AI-Speech
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech); reproduced demo: https://lifeiteng.github.io/valle/index.html
An unofficial PyTorch implementation of the audio LM VALL-E
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
AudioLDM: generate speech, sound effects, music, and beyond from text.
Unofficial Parallel WaveGAN (+ MelGAN, Multi-band MelGAN, HiFi-GAN, and StyleMelGAN) implementation in PyTorch
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Muzic: Music Understanding and Generation with Artificial Intelligence
A collection of neural vocoders suitable for singing voice synthesis tasks.
Unofficial VITS2 TTS implementation in PyTorch
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Instant voice cloning by MIT and MyShell.
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, and more.
Self-supervised speech representations.
🔊 Text-Prompted Generative Audio Model
Repository for training models for music source separation.
Inference and training library for high-quality TTS models.
Faster Whisper transcription with CTranslate2
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Multilingual Voice Understanding Model