[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation

359 11 Updated Jul 1, 2024
Jupyter Notebook 940 108 Updated Apr 27, 2024

🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook

Python 20 Updated Jun 23, 2024

Code&Data for Grounded 3D-LLM with Referent Tokens

Python 51 Updated Jul 1, 2024

LLM101n: Let's build a Storyteller

13,083 567 Updated Jun 28, 2024

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Python 199 6 Updated Jul 1, 2024

Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

Python 632 72 Updated Jul 3, 2024

Code for 3D-LLM: Injecting the 3D World into Large Language Models

Python 841 55 Updated Jun 6, 2024

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Python 4,380 316 Updated Jul 1, 2024

Implementation of Infini-Transformer in Pytorch

Python 94 Updated May 9, 2024
Python 104 6 Updated Jun 6, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 1,829 76 Updated Jun 29, 2024
Python 1,041 57 Updated Jul 1, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

2,845 106 Updated Jun 26, 2024

Multimodal Models in Real World

Jupyter Notebook 295 15 Updated Jun 21, 2024

Vector (and Scalar) Quantization, in Pytorch

Python 2,138 179 Updated Jul 1, 2024

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Python 454 37 Updated Jun 16, 2024
Python 175 5 Updated Apr 15, 2024

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 633 44 Updated Apr 9, 2024

Code for V-IRL: Grounding Virtual Intelligence in Real Life

Python 285 8 Updated Jun 10, 2024

Taming Transformers for High-Resolution Image Synthesis

Jupyter Notebook 5,534 1,105 Updated Apr 25, 2024

LL3M: Large Language and Multi-Modal Model in Jax

Python 56 3 Updated Apr 23, 2024

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly …

Python 3,780 285 Updated Apr 30, 2024

Emu Series: Generative Multimodal Models from BAAI

Python 1,558 80 Updated Mar 8, 2024

Official implementation of SEED-LLaMA (ICLR 2024).

Python 516 29 Updated Apr 11, 2024

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,542 105 Updated May 27, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Python 1,427 66 Updated Jul 2, 2024

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,074 273 Updated May 4, 2024

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 963 77 Updated Jul 2, 2024

When do we not need larger vision models?

Python 253 7 Updated Jun 27, 2024