GitHub - rave974/neural-speech: complete instructions for tts/asr repos

complete instructions for tts/asr repos
donate

content

tts
- melspec generators
  - tacotron2 (nvidia)
- vocoders
  - end to end
  - seq to seq
asr

first time set-up (ubuntu 18.04.5 lts)

sudo apt update;sudo apt upgrade -y  
sudo apt install wget git python3 python3-pip python3-venv nvidia-driver-460 -y
# (cuda+cudnn)

how to make v. env

# (replace name with your project name)
mkdir name
python3.6 -m venv name/env  
source name/env/bin/activate 
pip install -U pip setuptools wheel  
cd name

tacotron2 (nvidia)
source

# (make v. env)
pip install torch==1.4.0 torchvision==0.5.0 #cuda 10.1 (update 2)
export CUDA_HOME=/usr/local/cuda #if apex setup gives error
git clone https://github.com/NVIDIA/apex
pip install --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex
git clone https://www.github.com/nvidia/tacotron2
cd tacotron2
git submodule init; git submodule update
# (in requirements.txt delete numpy==1.13.3)
pip install numpy==1.16 numba==0.48 tensorboard==2.5.0 jupyter
pip install -r req*

inference

# (download waveglow and tacotron checkpoints)  
# (write in terminal: "jupyter notebook" and navigate to "inference.ipynb")  
# (adjust checkpoint paths and run)

train

prepare dataset and filelists (default test dataset)

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2 (inside tacotron2 git cloned folder)   
tar -xvf *.tar.bz2  
sed -i -- 's,DUMMY,LJSpeech-1.1/wavs,g' filelists/*.txt

edit hparams.py (batch size, filelists location, amp, text cleaners)
edit text/symbols.py (add symbols for trained language)
single gpu

python train.py --output_directory=outdir --log_directory=logdir

multiple gpu

python multiproc train.py --output_directory=outdir --log_directory=logdir --n_gpus <number of gpus>

continue training. non-english language can be trained faster by continuing to train on english checkpoint. non-english continued checkpoint can be inferenced with english waveglow checkpoint. after 100 epochs good speech quality on a dataset with 3700wav files

python multiproc train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

rave974/neural-speech

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages