Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,314 82 Updated Sep 6, 2024

Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 3,201 253 Updated Aug 14, 2024

Diffusion Feedback Helps CLIP See Better

Python 196 10 Updated Aug 24, 2024

Generate interleaved text and image content in a structured format you can directly pass to downstream APIs.

Python 21 1 Updated Aug 6, 2024

MINT-1T: A one trillion token multimodal interleaved dataset.

722 18 Updated Jul 31, 2024

[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality

Jupyter Notebook 24 2 Updated Sep 2, 2024

[ICCV 2023] Online Clustered Codebook

Python 133 9 Updated Dec 1, 2023

Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

74 2 Updated Jul 16, 2024

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

Jupyter Notebook 382 15 Updated Aug 28, 2024

Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook 102 2 Updated Jul 28, 2024

Code for Fast Training of Diffusion Models with Masked Transformers

Python 349 13 Updated May 15, 2024

Kolors Team

Python 3,430 219 Updated Sep 4, 2024

Code for the paper DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents, ICML 2024

Python 65 2 Updated Jun 12, 2024

Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Python 40 3 Updated May 25, 2024

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 629 35 Updated Aug 5, 2024

The official PyTorch implementation of Google's Gemma models

Python 5,236 499 Updated Jul 31, 2024

EVE: Encoder-Free Vision-Language Models

Python 200 4 Updated Jul 20, 2024

The open-source tool for building high-quality datasets and computer vision models

Python 8,062 537 Updated Sep 7, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,743 107 Updated Jul 29, 2024

A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis"

Python 23 1 Updated Jun 13, 2024
1 Updated Jun 18, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,185 46 Updated Aug 15, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,227 68 Updated Aug 21, 2024

Latte: Latent Diffusion Transformer for Video Generation.

Python 1,625 170 Updated Sep 8, 2024
Python 21 1 Updated May 9, 2024

Official implementation of FIFO-Diffusion: Generating Infinite Videos from Text without Training

Python 331 23 Updated Jul 15, 2024

A massively parallel, high-level programming language

Rust 17,166 423 Updated Sep 5, 2024

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 3,259 282 Updated Aug 15, 2024

PyTorch implementation of "Brain Decodes Deep Nets"

Jupyter Notebook 50 4 Updated Feb 7, 2024