Starred repositories
[CVPRW 2022] MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images
Perceptual Artifacts Localization for Image Synthesis Tasks (ICCV 2023)
Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation"
The official implementation of "Relay Diffusion: Unifying diffusion process across resolutions for image synthesis" [ICLR 2024 Spotlight]
PyTorch implementation of FILM: Frame Interpolation for Large Motion (ECCV 2022).
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
Official implementation of the paper 'CLIP-DINOiser: Teaching CLIP a few DINO tricks'.
[NeurIPS 2023] Structural Pruning for Diffusion Models
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
A list of papers, docs, and code about diffusion distillation. This repo collects various distillation methods for diffusion models. PRs adding works (papers, repositories) missing from the list are welcome.
A mini-library for training consistency models.
Official inference repo for FLUX.1 models
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Open-Sora: Democratizing Efficient Video Production for All
llama3 implementation one matrix multiplication at a time