Stars
Official inference repo for FLUX.1 models
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Open-Sora: Democratizing Efficient Video Production for All
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Utilities intended for use with Llama models.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Collection of AWESOME vision-language models for vision tasks
A collection of resources and papers on Diffusion Models
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
EVA Series: Visual Representation Fantasies from BAAI
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Strong, open-source foundation models for image recognition.
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Industry leading face manipulation platform
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
Character Animation (AnimateAnyone, Face Reenactment)
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions