Stars
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV 2024 Oral
This project is the official implementation of 'DiffIR: Efficient Diffusion Model for Image Restoration', ICCV 2023
Scaling Diffusion Transformers with Mixture of Experts
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
TryOnDiffusion: A Tale of Two UNets Implementation
Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
SEED-Story: Multimodal Long Story Generation with Large Language Model
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling"
Official repo for the paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
Official PyTorch Implementation of Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
ReCo: Region-Controlled Text-to-Image Generation, CVPR 2023
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models