iisys-hof/tts_webservice

UPDATE SEPTEMBER 2022

The phoneme dictionary was extended. A VITS model trained on speaker data of "Hokuspokus Clean" was added.

A multispeaker model by NVIDIA was added (https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_de_fastpitch_multispeaker_5).

Text-to-speech inference web service based on Tacotron 2 and Multi-Band MelGAN, trained on the HUI-Audio-Corpus-German and evaluated in "Neural Speech Synthesis in German". Try it out at https://narvi.sysint.iisys.de/projects/tts.

Requirements:

  • Linux-based OS (Ubuntu 18+, Debian 9, CentOS 7)
  • libfreetype6-dev
  • pkg-config
  • Python >= 3.8
  • python3-dev (for your respective version)
  • libsndfile
  • sox/ffmpeg

PyTorch may need to be installed separately (see https://pytorch.org/get-started/locally/)
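
A quick way to verify part of this list before installing anything else is a small check script. The following is only a sketch and not part of the repository: the function names are hypothetical, it treats either sox or ffmpeg on the PATH as sufficient (an assumption, since the list does not say whether both are needed), and it does not check the system packages such as libfreetype6-dev or libsndfile.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Return a dict mapping each checked requirement to True or False."""
    return {
        # The requirements list asks for Python >= 3.8.
        "python>=3.8": sys.version_info >= (3, 8),
        # Assumption: one of the two audio tools on the PATH is enough.
        "sox/ffmpeg": bool(shutil.which("sox") or shutil.which("ffmpeg")),
        # PyTorch may have to be installed separately; only test importability.
        "torch": importlib.util.find_spec("torch") is not None,
    }


def report(results):
    """Format the results as one line per requirement."""
    return "\n".join(
        f"{'ok     ' if ok else 'MISSING'}  {name}" for name, ok in results.items()
    )
```

Calling print(report(check_requirements())) lists each item with an "ok" or "MISSING" marker.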

Preparation:

Create a virtual environment and install the requirements. Then open a Python interpreter session in that virtual environment and run:

  • import nltk
  • nltk.download('punkt')

Before the TTS models can be used, download them from https://opendata.iisys.de/systemintegration/Models/speakers.tar.gz and extract them to tts_inferencer/speakers

Before the STT models can be used, download them from https://opendata.iisys.de/systemintegration/Models/asr_models.zip and extract them to asr_inferencer/models
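
The two download steps can also be scripted. The sketch below uses only the Python standard library; the URLs and target directories are the ones given above, while the function names and the "downloads" folder are hypothetical. Whether the archives contain their own top-level folder is not stated here, so check the resulting directory layout after extraction.

```python
import shutil
import urllib.request
from pathlib import Path

# Archive URL -> target directory, both taken from the instructions above.
MODEL_ARCHIVES = {
    "https://opendata.iisys.de/systemintegration/Models/speakers.tar.gz": "tts_inferencer/speakers",
    "https://opendata.iisys.de/systemintegration/Models/asr_models.zip": "asr_inferencer/models",
}


def extract_archive(archive_path, target_dir):
    """Unpack a .tar.gz or .zip archive into target_dir, creating it if needed."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # unpack_archive picks the format from the file extension.
    shutil.unpack_archive(str(archive_path), str(target))
    return target


def fetch_models(repo_root=".", download_dir="downloads"):
    """Download every archive in MODEL_ARCHIVES and unpack it (needs network access)."""
    downloads = Path(download_dir)
    downloads.mkdir(parents=True, exist_ok=True)
    for url, rel_target in MODEL_ARCHIVES.items():
        archive = downloads / url.rsplit("/", 1)[-1]
        if not archive.exists():  # skip archives fetched in an earlier run
            urllib.request.urlretrieve(url, str(archive))
        extract_archive(archive, Path(repo_root) / rel_target)
```

Run fetch_models() from the repository root; archives that are already present in the download folder are not fetched again.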

To start the server in debug settings, run "python3 app.py". Access it at http://127.0.0.1:5000.
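
To confirm from a script that the server has come up, a plain HTTP request against the address above is enough. This is a minimal sketch with a hypothetical helper name; it only checks that something answers on that address and makes no assumption about the routes the web service exposes.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:5000", timeout=2.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a similar network failure.
        return False
```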

Further Notes:

If symbolic links for tacotron2 models are broken, recreate them using "ln -s <checkpoint.pth> train.loss.best.pth" in the respective speakers//tacotron2 directories.
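
The same repair can be done from Python, which is convenient when several speaker directories are affected. The helper below is a hypothetical sketch, not part of the repository: it mirrors the "ln -s" command above, replaces an existing (possibly dangling) link, and refuses to touch a regular file with the link's name.

```python
import os
from pathlib import Path


def relink_best_checkpoint(model_dir, checkpoint_name, link_name="train.loss.best.pth"):
    """Point model_dir/link_name at checkpoint_name and return the link path."""
    model_dir = Path(model_dir)
    checkpoint = model_dir / checkpoint_name
    if not checkpoint.is_file():
        raise FileNotFoundError(checkpoint)
    link = model_dir / link_name
    if link.is_symlink():
        # is_symlink() is also True for a dangling link, which exists() would miss.
        link.unlink()
    elif link.exists():
        # Never delete a real checkpoint file that happens to use the link's name.
        raise FileExistsError(link)
    # Relative target, as when running "ln -s" inside model_dir.
    os.symlink(checkpoint_name, link)
    return link
```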

Keep in mind that this service does not include number normalization yet, so do not input any digits; write numbers out as words instead (for example "zwei" rather than "2").
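
Until normalization is available, input text can be pre-processed on the client side. The following sketch is not part of the service and its names are hypothetical: it spells out isolated single digits in German and reports whether any digits remain, so that the caller can reject or rewrite the text. Multi-digit numbers, decimals, dates and grammatical forms such as "ein" versus "eins" are deliberately left alone.

```python
import re

GERMAN_DIGITS = {
    "0": "null", "1": "eins", "2": "zwei", "3": "drei", "4": "vier",
    "5": "fünf", "6": "sechs", "7": "sieben", "8": "acht", "9": "neun",
}

# A single ASCII digit that is not part of a longer number or a decimal.
_SINGLE_DIGIT = re.compile(r"(?<![0-9.,])[0-9](?![.,]?[0-9])")


def spell_out_single_digits(text):
    """Replace isolated single digits with their German words."""
    return _SINGLE_DIGIT.sub(lambda m: GERMAN_DIGITS[m.group()], text)


def contains_digits(text):
    """Return True if any ASCII digit is left in the text."""
    return re.search(r"[0-9]", text) is not None
```

For example, spell_out_single_digits("Ich habe 2 Katzen") returns "Ich habe zwei Katzen", while "Ich habe 12 Katzen" is returned unchanged and contains_digits still reports True for it.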

The incorporated ASR model was taken from https://github.com/AASHISHAG/deepspeech-german. Check out their work: https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech.
