This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

OpenEdge ABL 684 143 Updated Mar 15, 2023

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,115 75 Updated Jul 10, 2024

yunlong10 / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

946 53 Updated Jun 24, 2024

HenryHZY / VL-PET

[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"

Python 51 1 Updated Sep 21, 2023

sfimediafutures / CLIPping-the-Deception

Code and pre-trained models for our paper "CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection".

Python 28 3 Updated Jul 8, 2024

ZhangHanDong / rust-nn

Rust Deep Neural Network

Rust 7 Updated Jun 1, 2024

lucazanella / lavad

Official implementation of "Harnessing Large Language Models for Training-free Video Anomaly Detection", CVPR 2024

Python 32 Updated May 28, 2024

yjh0410 / MAE

PyTorch implementation of Masked AutoEncoder

Python 7 2 Updated Apr 2, 2024

YaoFANGUK / video-subtitle-extractor

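A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.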

Python 5,274 592 Updated Feb 21, 2024

TruongKhang / TopicFM

[AAAI2023] TopicFM: Robust, Efficient, and Interpretable Topic-Assisted Feature Matching

Python 101 4 Updated May 7, 2024

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 2,871 234 Updated Jul 5, 2024

LinlyAC / VDT-AGPReID

View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network (CVPR'24)

Python 26 1 Updated Mar 26, 2024

iyzyi / XvideosCreeper

Web crawler for the Xvideos site

Python 2 Updated May 26, 2024

ynu-yangpeng / GLMC

[CVPR2023] Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions

Python 67 12 Updated Sep 1, 2023

IDT-ITI / MMFusion-IML

Code and trained models for our paper: K. Triaridis, V. Mezaris, "Exploring Multi-Modal Fusion for Image Manipulation Detection and Localization", Proc. 30th Int. Conf. on MultiMedia Modeling (MMM …

Python 46 3 Updated Apr 1, 2024

bcmi / SSP-AI-Generated-Image-Detection

The code for "A Single Simple Patch is All You Need for AI-generated Image Detection"

Python 41 1 Updated May 19, 2024

yformer / EfficientSAM

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Jupyter Notebook 1,978 143 Updated Jun 6, 2024

rsreetech / MultiModalSearch

In this repository I demonstrate how you can perform multimodal (image+text) search to find similar images+texts given a test image+text from a multimodal (texts+images) database. I use the Kaggle …

Jupyter Notebook 11 2 Updated May 22, 2021

Icemoon zyl9737

Lists (12)

🉑 academic

🎨 AIGC & GPTs

🍰 CPP & Go

🤖 embedded

🥇 federated learning

❄️ Java

🖥️ MLSys

👥 open source

📦 python & AI

🧰 tools

🌕 Full-stack development

🌵 General learning

Stars