Stars
DSPy: The framework for programming—not prompting—foundation models
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Long Context Transfer from Language to Vision
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
GPT4V-level open-source multi-modal model based on Llama3-8B
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
An open-source implementation for training LLaVA-NeXT.
LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
⚡ Dynamically generated stats for your GitHub READMEs
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"