Official code for "ControlAR: Controllable Image Generation with Autoregressive Models"

Python 115 4 Updated Nov 2, 2024

A suite of image and video neural tokenizers

Python 712 16 Updated Nov 8, 2024

[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training

Python 93 2 Updated Nov 7, 2024

An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Python 3,955 309 Updated Nov 8, 2024

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Python 75 8 Updated Oct 30, 2024

RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)

Python 309 26 Updated Aug 31, 2024

Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Python 5 Updated Nov 12, 2024

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 212 9 Updated Oct 22, 2024

[ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

Python 31 1 Updated Sep 12, 2024

Efficient vision foundation models for high-resolution generation and perception.

Python 2,341 186 Updated Nov 12, 2024

Official inference repo for FLUX.1 models

Python 15,820 1,149 Updated Oct 8, 2024

Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"

Python 17 Updated Oct 28, 2024

A collection of papers on autoregressive models in vision.

150 4 Updated Nov 12, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 6,951 817 Updated Nov 11, 2024

Official PyTorch implementation of "Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think"

Python 639 30 Updated Nov 8, 2024
32 Updated Oct 16, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 331 14 Updated Oct 16, 2024

CAR: Controllable AutoRegressive Modeling for Visual Generation

48 Updated Oct 8, 2024

🔥ImageFolder: Autoregressive Image Generation with Folded Tokens

54 Updated Oct 15, 2024

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).

HTML 347 20 Updated Nov 12, 2024

AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

Python 68 9 Updated Oct 9, 2024

A list of recent papers on token compression for ViT and VLM

134 4 Updated Nov 11, 2024

A Simple Yet Unified Self-supervised Pre-training Strategy for LiDAR-Camera 3D Perception

Python 5 Updated Sep 23, 2024

📖 A repository for organizing papers, code, and other resources related to unified multimodal models.

208 9 Updated Nov 7, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,017 44 Updated Nov 11, 2024

Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"

Python 109 7 Updated Aug 27, 2024

[Official Implementation] Acoustic Autoregressive Modeling 🔥

Python 57 5 Updated Aug 24, 2024

Implements VAR+CLIP for image generation

Python 78 2 Updated Aug 5, 2024

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 9,003 852 Updated Nov 11, 2024

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python 495 21 Updated Aug 16, 2024