Stars
Multimodal sentiment analysis: multiple fusion methods based on BERT+ResNet
MMSA is a unified framework for Multimodal Sentiment Analysis.
Meta-Transformer for Unified Multimodal Learning
Reading list for research topics in multimodal machine learning
Code for COLING 2022 paper: Modeling Intra- and Inter-Modal Relations: Hierarchical Graph Contrastive Learning for Multimodal Sentiment Analysis
Multi-Modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
Code for the paper "Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis"
Code for NLPCC 2020 paper: A Multimodal Emotion Recognition Method Based on Multi-Task Learning
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
GPT4V-level open-source multi-modal model based on Llama3-8B
✨✨Latest Advances on Multimodal Large Language Models
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance.
Collection of AWESOME vision-language models for vision tasks
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
🎬 UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (CVPR 2022)
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023)
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"