Stars
Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?
A suite of image and video neural tokenizers
Patch convolution to avoid large GPU memory usage of Conv2D
ElasticTok: Adaptive Tokenization for Image and Video
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Official inference repo for FLUX.1 models
Ongoing research training transformer models at scale
Official Implementation of Rethinking Score Distillation as a Bridge Between Image Distributions
Evaluating text-to-image/video/3D models with VQAScore
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition (ICLR 2024)
A framework for 4D reconstruction from monocular videos.
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Finetune ModelScope's Text To Video model using Diffusers 🧨
Machine Learning Engineering Open Book
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
Official PyTorch implementation of Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023).
get things from one computer to another, safely
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Text2Cinemagraph: Text-Guided Synthesis of Eulerian Cinemagraphs [SIGGRAPH ASIA 2023]
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Stable Diffusion web UI