Stars
[ECCV'24] TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Resolving 3D Human Pose Ambiguities with 3D Scene Constraints https://prox.is.tue.mpg.de
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A synthetic data generator for text recognition
TAG-Bench: A benchmark for table-augmented generation (TAG)
Open-source framework for voice and multimodal conversational AI
Real-time interactive streaming digital human
Robust Singing Voice Transcription and MIDI Extraction
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with comma…
Official code of SmartEdit [CVPR-2024 Highlight]
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Official inference repo for FLUX.1 models
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Example UI implementing the RTVI web client
This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headles…
Vietnamese OCR Images Dataset
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, p…
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"