Stars
[NeurIPS 24] PromptFix: You Prompt and We Fix the Photo
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
[NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Fast and accurate automatic speech recognition (ASR) for edge devices
The official repository for the paper "Tora: Trajectory-oriented Diffusion Transformer for Video Generation"
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement 💥💥💥
The official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Efficient vision foundation models for high-resolution generation and perception.
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Movie Gen Bench: two media-generation evaluation benchmarks released with Meta Movie Gen
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
OCR, layout analysis, reading order, and table recognition in 90+ languages
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
COCO API - Dataset @ https://cocodataset.org/
Image composition toolbox: everything you want to know about image composition or object insertion
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.