Stars
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
A generative speech model for daily dialogue.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Generative models for conditional audio generation
openvpi / DiffSinger
Forked from MoonInTheRiver/DiffSinger
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Stable diffusion for real-time music generation
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Instant voice cloning by MIT and MyShell.
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
Code for the paper Hybrid Spectrogram and Waveform Source Separation
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Chat凉宫春日 (Chat Haruhi Suzumiya), an open-source role-playing chatbot by Cheng Li, Ziang Leng, and others.
🦜🔗 Build context-aware reasoning applications
The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
State-of-the-art 2D and 3D Face Analysis Project
CVPR2023 (highlight) - UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View