
TTS model cloning the voice of Nach


pablomm/nachotron-voice


nachotron-voice

Project completed in December 2022, training a Text-to-Speech model to clone the voice of Nach using a GAN. Examples:

For the project, we built a dataset of Nach's voice by transcribing the discography with Whisper and using Demucs to separate the voice from the music. We then trained a TTS model using a CoquiTTS model as the base, and had fun generating new songs with it. A presentation describing the project, prepared for the course 'Deep Learning for Audio Signal Processing', can be found here in PDF or PowerPoint format.

Code

All the project work was done using Colab, with the following notebooks:

  1. Data Collection. Process original discography in zip and store as separate songs.
  2. Music Source Separation. Separate voice from music using Demucs for all songs.
  3. Speech Transcription. Transcribe voice from all songs using Whisper.
  4. Noise Reduction. Reduce noise from all voice tracks.
  5. Dataset Preparation. Cut voice into segments detected as separate sentences by Whisper.
  6. Speaker Identification. Perform speaker identification to detect Nach's voice, and filter out other voices and cuts without voice.
  7. Train GAN. Train a TTS model using CoquiTTS as the base.
  8. Align Voice and Music. Experiment to automatically align voice generated with our model and a music track.
  9. Other Training. Retrain the model with a different dataset, and with different parameters.
  10. Demo. Demo of the model: loads a model checkpoint and generates a voice track from a given text.
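
As a rough illustration of step 5, the sketch below turns Whisper-style segments into sample ranges that could be used to cut a voice track. It is not taken from the notebooks. It assumes only that each segment is a dict with `start`, `end` (in seconds) and `text`, which is the shape of the `segments` list returned by Whisper's `transcribe`. The function name, the duration thresholds and the example segments are hypothetical.

```python
from typing import Dict, List, Tuple


def segments_to_sample_ranges(
    segments: List[Dict],
    sample_rate: int,
    min_seconds: float = 1.0,   # hypothetical threshold, not from the project
    max_seconds: float = 15.0,  # hypothetical threshold, not from the project
) -> List[Tuple[int, int, str]]:
    """Convert Whisper-style segments into (start_sample, end_sample, text) cuts."""
    cuts = []
    for seg in segments:
        duration = seg["end"] - seg["start"]
        text = seg["text"].strip()
        # Skip empty transcriptions and clips outside the chosen duration window.
        if not text or duration < min_seconds or duration > max_seconds:
            continue
        start = int(round(seg["start"] * sample_rate))
        end = int(round(seg["end"] * sample_rate))
        cuts.append((start, end, text))
    return cuts


# Made-up segments standing in for a real Whisper transcription result.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Primera frase de ejemplo."},
    {"start": 2.5, "end": 2.9, "text": " Eh."},  # too short, dropped
    {"start": 3.0, "end": 7.0, "text": " Segunda frase de ejemplo."},
]
cuts = segments_to_sample_ranges(example_segments, sample_rate=22050)
```

Each resulting tuple can then be used to slice the separated voice waveform (for example `audio[start:end]` on a NumPy array) and save the clip together with its text.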

The data used and the resulting dataset are not released due to potential copyright issues. However, the scripts can be used to generate a similar dataset from any discography and replicate the voice cloning (possibly with a more recent and better model).
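
For anyone rebuilding a dataset of their own, a minimal sketch of writing the transcriptions to a metadata file might look like the following. It assumes an LJSpeech-style layout (`id|text|normalized text`, one clip per line), which is one of the formats CoquiTTS can read. The format used in the original notebooks is not stated here, so treat this as an assumption. The function name and clip ids are hypothetical.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_ljspeech_metadata(rows: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (clip_id, text) pairs as LJSpeech-style 'id|text|text' lines.

    Returns the number of lines written.
    """
    count = 0
    for clip_id, text in rows:
        # The pipe is the column separator, so remove it from the text,
        # then collapse repeated whitespace.
        clean = " ".join(text.replace("|", " ").split())
        if not clean:
            continue  # skip clips with no transcription
        out.write(f"{clip_id}|{clean}|{clean}\n")
        count += 1
    return count


# Made-up rows; in practice these would come from the segmented clips.
buffer = io.StringIO()
written = write_ljspeech_metadata(
    [
        ("song01_0001", "  Primera frase  de ejemplo. "),
        ("song01_0002", "   "),  # empty transcription, skipped
    ],
    buffer,
)
```

Passing an open file instead of the `io.StringIO` buffer would write the same lines to disk, for example to a `metadata.csv` next to the folder of audio clips.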

There are many aspects of this project that could be improved, but it was created in just a weekend purely for fun. We hope you enjoy the results!


Acknowledgments

We thank Nach for many years of great music, and the resources and libraries used for this project: Colab, CoquiTTS, Demucs, deep-speaker and Whisper.
