A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in PyTorch
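The Wide and Deep architecture this package builds on sums a linear "wide" branch (memorization over sparse features) with an MLP "deep" branch (generalization over dense features such as text or image embeddings) before the output activation. A minimal NumPy sketch of that idea — all dimensions and weights here are illustrative, not the package's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical feature sizes for illustration.
n_wide, n_deep, hidden = 8, 16, 4

# Wide branch: a plain linear model over sparse/crossed features.
W_wide = rng.normal(size=(n_wide,))

# Deep branch: a small MLP over dense features (e.g. embeddings).
W1 = rng.normal(size=(n_deep, hidden))
W2 = rng.normal(size=(hidden,))

def wide_and_deep(x_wide, x_deep):
    wide_logit = x_wide @ W_wide            # memorization branch
    deep_logit = relu(x_deep @ W1) @ W2     # generalization branch
    return sigmoid(wide_logit + deep_logit) # joint prediction

p = wide_and_deep(rng.normal(size=(n_wide,)), rng.normal(size=(n_deep,)))
```

In practice the two branches are trained jointly; the package extends this pattern with dedicated text and image towers feeding the deep side.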
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text". The system takes a video as input and generates an English caption describing it.
An intelligent multimodal-learning based system for video, product, and ads analysis. On top of it, many downstream applications can be built, such as product recommendation and video retrieval.
Towards Generalist Biomedical AI
This repository contains the official implementation code of the paper "Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis", accepted at EMNLP 2021.
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Robust multimodal integration method implemented in PyTorch and TensorFlow
Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
The code for our IEEE Access (2020) paper "Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion".
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
A two-stage multimodal loss model, combined with rigid-body transformations, to regress 3D bounding boxes
Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"
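MMTM fuses two CNN streams by squeezing each modality's feature map to a channel descriptor, mixing them through a shared joint projection, and re-weighting each stream's channels with a scaled sigmoid gate. A rough MMTM-like sketch, not the official implementation — weights are random and channel counts are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Illustrative channel counts for the two modalities and the joint space.
c_a, c_b, c_z = 6, 4, 5

# Hypothetical weights: one shared squeeze projection, and one
# excitation projection per modality.
W_z = rng.normal(size=(c_a + c_b, c_z))
W_ea = rng.normal(size=(c_z, c_a))
W_eb = rng.normal(size=(c_z, c_b))

def mmtm_like_fusion(feat_a, feat_b):
    """feat_a: (c_a, H, W) and feat_b: (c_b, H, W) conv feature maps."""
    # Squeeze: global average pooling per channel, then concatenate.
    s = np.concatenate([feat_a.mean(axis=(1, 2)), feat_b.mean(axis=(1, 2))])
    z = np.maximum(s @ W_z, 0.0)          # joint representation
    # Excite: per-modality channel gates (2*sigmoid scaling as in the paper).
    g_a = 2.0 * sigmoid(z @ W_ea)
    g_b = 2.0 * sigmoid(z @ W_eb)
    # Re-weight each stream's channels; spatial shape is preserved.
    return feat_a * g_a[:, None, None], feat_b * g_b[:, None, None]

a, b = mmtm_like_fusion(rng.normal(size=(c_a, 8, 8)),
                        rng.normal(size=(c_b, 8, 8)))
```

Because the module only rescales channels, it can be dropped between matching stages of two pretrained CNN backbones without changing their spatial dimensions.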
A tool for extracting multimodal features from videos.
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
CapDec: SOTA zero-shot image captioning using CLIP and GPT-2, EMNLP 2022 (Findings)
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset