Latent-based SR using MoE and a frequency-augmented VAE decoder
RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
RepVGG: Making VGG-style ConvNets Great Again
imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video.
Wavelet Convolutions for Large Receptive Fields. ECCV 2024.
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Codebase for the paper "Elucidating the design space of language models for image generation"
Accelerating Diffusion Transformers with Token-wise Feature Caching
Implementation of the Gumbel-Sigmoid distribution in PyTorch.
[ICLR 2024] Official pytorch implementation of "Denoising Task Routing for Diffusion Models"
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
This is a repo to track the latest autoregressive visual generation papers.
ImageFolder: Autoregressive Image Generation with Folded Tokens
The paper collections for the autoregressive visual models.
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Official PyTorch Implementation of "Effective Diffusion Transformer Architecture for Image Super-Resolution"
Official code for "ControlAR: Controllable Image Generation with Autoregressive Models"
Batch Face Processing for Modern Research, including face detection, face alignment, face reconstruction, head pose estimation
Accelerating Image Super-Resolution Networks with Pixel-Level Classification (ECCV 2024)
The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
AuraSR: GAN-based Super-Resolution for real-world images
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta