Stars
Multimodal Sarcasm Detection Dataset
The repository for "A survey on image-text multimodal models"
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization
Official PyTorch Implementation of SSMix (Findings of ACL 2021)
Lime: Explaining the predictions of any machine learning classifier
Microsoft COCO: Common Objects in Context for huggingface datasets
An open source implementation of CLIP.
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Code for ALBEF: a new vision-language pre-training method
MixGen: A New Multi-Modal Data Augmentation
Code for the EMNLP 2023 Findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"
Code for "A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking"
MMSA is a unified framework for Multimodal Sentiment Analysis.
Official implementation for NeurIPS'23 paper "Geodesic Multi-Modal Mixup for Robust Fine-Tuning"
MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations (ACL 2023)
PyTorch implementation for the paper: Multivariate, Multi-frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation, CVPR 2023.
Source code for ICASSP 2022 paper "MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations"
Official PyTorch implementation of "Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup" (ICML'20)