Stars
Generate a comprehensive review from an arXiv paper, then turn it into a blog post. This project powers the website below for Hugging Face's Daily Papers (https://huggingface.co/papers).
Training-free Regional Prompting for Diffusion Transformers 🔥
InstantIR: Blind Image Restoration with Instant Generative Reference 🔥
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Official PyTorch implementation of "Framer: Interactive Frame Interpolation".
Official implementation of “LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images”
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models
A simple screen parsing tool toward a pure vision-based GUI agent
DepthSplat: Connecting Gaussian Splatting and Depth
The code for the paper "Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness"
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
A custom Gradio component that toggles between on and off states.
The official repository for the paper "Tora: Trajectory-oriented Diffusion Transformer for Video Generation"
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Official implementation of 'Motion Inversion For Video Customization'
Depth Any Video with Scalable Synthetic Data
[CVPR 2024] Official Code for "AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation"
Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Repo for the papers "Intrinsic Image Decomposition via Ordinal Shading" (TOG 2023) and "Colorful Diffuse Intrinsic Image Decomposition in the Wild" (TOG 2024)