- Y-tech, KuaiShou
- Beijing
- [email protected]
Stars
Official inference repo for FLUX.1 models
Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation
Code for ID-Specific Video Customized Diffusion
🔥 StableIdentity: Inserting Anybody into Anywhere at First Sight
RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or im…
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
An efficient video loader for deep learning with smart shuffling that's super easy to digest
SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Repository for Detail-revealing Deep Video Super-resolution https://arxiv.org/abs/1704.02738
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
Official PyTorch implementation of the paper: Flow Matching in Latent Space
Fitting 3DMM models to multiview (monocular) video data.
[CVPRW 2024] Official code for the paper "Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap".
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
Lumina-T2X is a unified framework for Text to Any Modality Generation
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
[arXiv 2024] Official PyTorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models"
[CVPR 2024] On the Content Bias in Fréchet Video Distance
Evaluating text-to-image/video/3D models with VQAScore