University of Science and Technology of China
Stars
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
We introduce OpenStory++, a large-scale open-domain dataset focusing on enabling MLLMs to perform storytelling generation tasks.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Public code release for the paper "ProCreate, Don’t Reproduce! Propulsive Energy Diffusion for Creative Generation"
CoreNet: A library for training deep neural networks
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various r…
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
SEED-Story: Multimodal Long Story Generation with Large Language Model
Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
This repo contains the code for 1D tokenizer and generator
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
[ICCV 2023] Efficient Diffusion Training via Min-SNR Weighting Strategy
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Large World Model -- Modeling Text and Video with Millions Context
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Open-Sora: Democratizing Efficient Video Production for All
Download the latest stable Synergy binaries.
PyTorch Implementation of Diffusion Schrodinger Bridge Matching
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"