Stars
Multimodal sentiment analysis: multiple fusion methods based on BERT+ResNet
MMSA is a unified framework for Multimodal Sentiment Analysis.
Meta-Transformer for Unified Multimodal Learning
Reading list for research topics in multimodal machine learning
Code for COLING 2022 paper: Modeling Intra- and Inter-Modal Relations: Hierarchical Graph Contrastive Learning for Multimodal Sentiment Analysis
Multi-Modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
Code for the paper "Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis"
Code for NLPCC 2020 paper: A Multimodal Emotion Recognition Method Based on Multi-Task Learning
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
GPT4V-level open-source multi-modal model based on Llama3-8B
✨✨Latest Advances on Multimodal Large Language Models
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance.
Collection of AWESOME vision-language models for vision tasks
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
🎬 UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (CVPR 2022)
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023)
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"