This repo contains a reimplementation of the neural vocoder from the Hi-Fi GAN paper. The model differs slightly from the original in two ways. First, we use a much larger multi-period discriminator with an additional sub-final layer. Second, we average the residual blocks' outputs in the generator instead of adding them, which leads to better stability during training.
Preprocessing steps rely heavily on the official implementation and Mel-GAN.
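The difference between adding and averaging the residual blocks' outputs can be illustrated with a minimal sketch. It uses plain Python lists in place of tensors, and the function name combine_block_outputs and the sample values are hypothetical, not taken from this repo's code:

```python
def combine_block_outputs(outputs, mode="mean"):
    """Combine equal-length per-block outputs elementwise.

    mode="sum" adds the outputs; mode="mean" divides that sum by the
    number of blocks, so the result keeps the scale of a single block.
    """
    if not outputs:
        raise ValueError("need at least one block output")
    length = len(outputs[0])
    if any(len(o) != length for o in outputs):
        raise ValueError("all block outputs must have the same length")

    # Elementwise sum across blocks.
    summed = [sum(values) for values in zip(*outputs)]
    if mode == "sum":
        return summed
    if mode == "mean":
        return [s / len(outputs) for s in summed]
    raise ValueError(f"unknown mode: {mode!r}")


# Three hypothetical residual blocks producing outputs of the same scale.
block_outputs = [[1.0, -2.0, 0.5], [1.0, -2.0, 0.5], [1.0, -2.0, 0.5]]
summed = combine_block_outputs(block_outputs, mode="sum")
averaged = combine_block_outputs(block_outputs, mode="mean")
print(summed)    # grows with the number of blocks
print(averaged)  # stays at the scale of one block
```

With averaging, the magnitude of the combined signal does not depend on how many residual blocks are used, which is one plausible reason for the more stable training noted above.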
First, clone the repo
git clone https://github.com/Mikezz1/hifi-gan
Then install all dependencies
cd hifi-gan
pip3 install -r requirements
Then download the model checkpoint (if the file is unavailable, use the gdrive link):
sh load_checkpoint.sh
To start training, run the following script. It takes one epoch to get distinguishable words, 4-5 epochs to get rid of the robotic voice and at least 20 epochs to achieve mostly clean sound.
python3 train.py --config="configs/base_config.yaml"
To run the model on test samples, first calculate melspecs for the reference audios:
python3 prepare_test.py
Make sure that you have reference audios audio_1.wav, audio_2.wav and audio_3.wav in the data folder (or specify other path / filenames inside the script). Then, run the inference script:
python3 inference.py --config='path/to/config' --mel_filenames='test_spec'
The mel_filenames option (test_spec in the command above) specifies the filename pattern of the source melspecs.
The config option is the path to the config. Make sure that you specified the path to the checkpoint in the config.
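Before running prepare_test.py, it may help to confirm that the reference audios are in place. The following is a minimal sketch, assuming the default data folder and the three filenames mentioned above; the helper missing_reference_audios is hypothetical and not part of this repo:

```python
from pathlib import Path


def missing_reference_audios(
    data_dir="data",
    filenames=("audio_1.wav", "audio_2.wav", "audio_3.wav"),
):
    """Return the expected reference audio filenames not found in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in filenames if not (data_dir / name).is_file()]


# Assumption: the script is run from the repo root, next to the data folder.
missing = missing_reference_audios()
if missing:
    print("Missing reference audios:", ", ".join(missing))
else:
    print("All reference audios found.")
```

If other paths or filenames are set inside prepare_test.py, pass the same values to the helper.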