Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Jupyter Notebook 7,413 550 Updated Nov 1, 2024

shallowdream204 / DreamClear

[NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

Python 718 36 Updated Oct 25, 2024

usefulsensors / moonshine

Fast and accurate automatic speech recognition (ASR) for edge devices

Python 2,105 90 Updated Nov 5, 2024

alibaba / Tora

The official repository for paper "Tora: Trajectory-oriented Diffusion Transformer for Video Generation"

Python 602 35 Updated Oct 31, 2024

Peterande / D-FINE

D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement 💥💥💥

Python 655 58 Updated Nov 6, 2024

tangqiaoyu / ToolAlpaca

The official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"

Python 857 39 Updated Oct 26, 2024

deepseek-ai / Janus

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Python 917 41 Updated Oct 31, 2024

opendatalab / DocLayout-YOLO

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 426 29 Updated Oct 31, 2024

mit-han-lab / efficientvit

Efficient vision foundation models for high-resolution generation and perception.

Python 2,323 185 Updated Nov 3, 2024

rhymes-ai / Allegro

Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.

Python 556 40 Updated Oct 31, 2024

fudan-generative-vision / hallo2

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

Python 3,674 514 Updated Nov 6, 2024

thu-ml / RoboticsDiffusionTransformer

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python 421 36 Updated Nov 7, 2024

facebookresearch / MovieGenBench

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

326 18 Updated Oct 19, 2024

gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,515 181 Updated Nov 6, 2024

VikParuchuri / surya

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 13,903 863 Updated Nov 7, 2024

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 6,812 792 Updated Nov 8, 2024

mit-han-lab / hart

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 317 14 Updated Oct 16, 2024

mit-han-lab / duo-attention

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 348 14 Updated Oct 31, 2024

cocodataset / cocoapi

COCO API - Dataset @ https://cocodataset.org/

Jupyter Notebook 6,099 3,757 Updated Apr 17, 2024

bcmi / libcom

Image composition toolbox: everything you want to know about image composition or object insertion

Python 531 32 Updated Oct 31, 2024

sihyun-yu / REPA

Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 632 29 Updated Nov 8, 2024

HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"

Python 7,699 868 Updated Nov 7, 2024

apple / ml-depth-pro

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Python 3,603 236 Updated Oct 5, 2024
