Apple AI/ML
- Cupertino, CA
- haotian-zhang.github.io/
- @HaotianZhang4AI
Stars
[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
🔥 stable, simple, state-of-the-art VQVAE toolkit & cookbook
Code & Data for Grounded 3D-LLM with Referent Tokens
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
Open-source evaluation toolkit of large vision-language models (LVLMs), support GPT-4v, Gemini, QwenVLPlus, 50+ HF models, 20+ benchmarks
Code for 3D-LLM: Injecting the 3D World into Large Language Models
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
Implementation of Infini-Transformer in Pytorch
Lumina-T2X is a unified framework for Text to Any Modality Generation
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Vector (and Scalar) Quantization, in Pytorch
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Code for V-IRL: Grounding Virtual Intelligence in Real Life
Taming Transformers for High-Resolution Image Synthesis
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly …
Emu Series: Generative Multimodal Models from BAAI
Official implementation of SEED-LLaMA (ICLR 2024).
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
When do we not need larger vision models?