Hhhhhhao
  • Pittsburgh

Organizations

@cmu-mlsp


A paper list of recent works on token compression for ViT and VLM

130 4 Updated Nov 6, 2024

A suite of image and video neural tokenizers

Python 640 14 Updated Nov 8, 2024

Official Pytorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 633 29 Updated Nov 8, 2024

🔥ImageFolder: Autoregressive Image Generation with Folded Tokens

53 Updated Oct 15, 2024

O1 Replication Journey: A Strategic Progress Report – Part I

1,259 34 Updated Oct 28, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,029 144 Updated Oct 31, 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Python 1,427 147 Updated Oct 28, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,011 44 Updated Nov 5, 2024

[Official Implementation] Acoustic Autoregressive Modeling 🔥

Python 57 5 Updated Aug 24, 2024
Python 446 45 Updated Oct 28, 2024
Python 231 14 Updated Nov 7, 2024

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 8,803 839 Updated Nov 9, 2024

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python 495 21 Updated Aug 16, 2024

A framework for few-shot evaluation of language models.

Python 6,924 1,851 Updated Nov 9, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 986 55 Updated Sep 27, 2024

This is the official implementation for ControlVAR.

Python 52 1 Updated Oct 12, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,303 56 Updated Aug 15, 2024

An unofficial implementation of both ViT-VQGAN and RQ-VAE in Pytorch

Python 284 34 Updated May 23, 2023

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 675 36 Updated Aug 5, 2024

This is a repo to track the latest autoregressive visual generation papers.

42 Updated Oct 12, 2024
Python 320 34 Updated Jul 19, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,829 112 Updated Jul 29, 2024

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

Python 869 60 Updated Sep 25, 2024
Python 11 Updated Jul 30, 2024

OpenMMLab Detection Toolbox and Benchmark

Python 29,561 9,462 Updated Aug 21, 2024

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 3,808 325 Updated Aug 14, 2024

4M: Massively Multimodal Masked Modeling

Python 1,602 93 Updated Oct 7, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,751 113 Updated Oct 30, 2024

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Python 687 28 Updated Sep 27, 2024

TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.

Python 1,805 152 Updated Nov 2, 2024