This repository collects papers on zero-shot TTS. The materials will be used for the survey talk at Interspeech 2024.
You are invited to add your papers by sending a pull request. Feel free to give this repository a star if you enjoy the work.
This repo is updated regularly. Check back around Sep 10-15th.
- 2009: Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- 2015: A study of speaker adaptation for DNN-based speech synthesis
- 2016: Unsupervised speaker adaptation for DNN-based TTS synthesis
- 2018: Transfer learning from speaker verification to multispeaker text-to-speech synthesis
- 04/2024: SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
- 10/2023: High-Fidelity Audio Compression with Improved RVQGAN
- 08/2023: SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
- 10/2022: High Fidelity Neural Audio Compression (EnCodec)
- 08/2021: W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
- 07/2024: CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
- 06/2024: DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
- 06/2024: Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
- 04/2024: CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
- 03/2024: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
- 02/2024: BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
- 06/2023: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
- 06/2023: AudioPaLM: A Large Language Model That Can Speak and Listen
- 01/2023: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E)
- 07/2022: YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone