lxa9867

Xiang Li lxa9867

CMU PhD | Multimodal Understanding & Generation

32 followers · 16 following

Carnegie Mellon University
Pittsburgh, PA
https://lxa9867.github.io/

Stars

hustvl / ControlAR

Official code for "ControlAR: Controllable Image Generation with Autoregressive Models"

Python 115 4 Updated Nov 2, 2024

NVIDIA / Cosmos-Tokenizer

A suite of image and video neural tokenizers

Python 712 16 Updated Nov 8, 2024

x-cls / superclass

[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training

Python 93 2 Updated Nov 7, 2024

InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Python 3,955 309 Updated Nov 8, 2024

audi / MeshGPT

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Python 75 8 Updated Oct 30, 2024

robustsam / RobustSAM

RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)

Python 309 26 Updated Aug 31, 2024

satvik-dixit / mace

Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Python 5 Updated Nov 12, 2024

CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 212 9 Updated Oct 22, 2024

LeapLabTHU / AdaNAT

[ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

Python 31 1 Updated Sep 12, 2024

mit-han-lab / efficientvit

Efficient vision foundation models for high-resolution generation and perception.

Python 2,341 186 Updated Nov 12, 2024

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 15,820 1,149 Updated Oct 8, 2024

thu-ml / CCA

Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"

Python 17 Updated Oct 28, 2024

ChaofanTao / Autoregressive-Models-in-Vision-Survey

The paper collections for the autoregressive models in vision.

150 4 Updated Nov 12, 2024

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 6,951 817 Updated Nov 11, 2024

sihyun-yu / REPA

Official Pytorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 639 30 Updated Nov 8, 2024

poppuppy / SAR

32 Updated Oct 16, 2024

mit-han-lab / hart

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 331 14 Updated Oct 16, 2024

MiracleDance / CAR

CAR: Controllable AutoRegressive Modeling for Visual Generation

48 Updated Oct 8, 2024

lxa9867 / ImageFolder

🔥ImageFolder: Autoregressive Image Generation with Folded Tokens

54 Updated Oct 15, 2024

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

HTML 347 20 Updated Nov 12, 2024

appletea233 / AL-Ref-SAM2

AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

Python 68 9 Updated Oct 9, 2024

daixiangzi / Awesome-Token-Compress

A paper list of some recent works about Token Compress for Vit and VLM

134 4 Updated Nov 11, 2024

Xiaohao-Xu / Unified-Pretrain-AD

A Simple Yet Unified Self-supervised Pre-training Strategy for LiDAR-Camera 3D Perception

Python 5 Updated Sep 23, 2024

showlab / Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

208 9 Updated Nov 7, 2024

showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,017 44 Updated Nov 11, 2024

hp-l33 / AiM

Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"

Python 109 7 Updated Aug 27, 2024

qiuk2 / AAR

[Official Implementation] Acoustic Autoregressive Modeling 🔥

Python 57 5 Updated Aug 24, 2024

daixiangzi / VAR-CLIP

Implements VAR+CLIP for image generation

Python 78 2 Updated Aug 5, 2024

THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 9,003 852 Updated Nov 11, 2024

Alpha-VLLM / Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python 495 21 Updated Aug 16, 2024
