Stars
🔊 Text-Prompted Generative Audio Model
Instant voice cloning by MIT and MyShell.
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Foundational model for human-like, expressive TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Robust Speech Recognition via Large-Scale Weak Supervision
A deep-learning-based Chinese speech recognition system
Foundational Models for State-of-the-Art Speech and Text Translation
Tracking the progress in end-to-end speech translation
List of direct speech-to-speech translation papers.
Just 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning)
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
🌏 Review notes for the Tsinghua EE postgraduate interview (Sept. 2017)
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Conformer-based Metric GAN for speech enhancement
Implementation of the paper "DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement"