IACM

"Debugged" code from https://github.com/facebookresearch/vocoder-benchmark, with results. The original wav files come from https://github.com/mikerapt/WaveRNN-PyTorch/tree/main/inference and from LJ Speech. The synthesized wav files contain an echo-like noise.

Here are the files synthesized with the WaveNet vocoder, using the configuration that the corresponding paper reports as the best for LJ Speech (μ-law compression on the input waveform and normal-loss as the loss function):

"For each of the vocoders, we start from the original configuration provided in the respective open-source implementation. However for WaveNet, there are different configurations that we can choose from. They vary in terms of input types and loss functions that can be used. For the input, we can use either raw waveform or pre-processed waveform using μ-law compression. For the loss function, there are two different options we can choose from: Mixture of Logistics (MoL-loss) and a single Gaussian distribution (normal-loss). We run different versions of the WaveNet model using each of the possible configurations and report the one with the best performance. We found that on LJ Speech and VCTK, it is better to use μ-law compression on the input waveform, while for LibriTTS, raw waveform input achieves the best results. For the loss function, using normal-loss helps to increase the overall performance".

Actually, the loss function used for WaveNet with waveforms pre-processed using μ-law compression was Mixture of Logistics (MoL-loss).
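For readers unfamiliar with the μ-law compression mentioned above, here is a minimal sketch of the standard companding formula and its inverse. It is not taken from the vocoder-benchmark code: the function names and the choice of μ = 255 (256 quantization classes) are assumptions made for illustration.

```python
import math

MU = 255  # assumed value; 255 gives 256 quantization classes


def mu_law_compress(x, mu=MU):
    """Compress one sample in [-1, 1] with the mu-law curve."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Invert mu_law_compress for a companded value in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(y, mu=MU):
    """Map a companded value in [-1, 1] to an integer class in 0..mu."""
    return int(math.floor((y + 1.0) / 2.0 * mu + 0.5))


def dequantize(q, mu=MU):
    """Map an integer class in 0..mu back to a companded value in [-1, 1]."""
    return 2.0 * q / mu - 1.0
```

A sample is typically passed through `mu_law_compress` and `quantize` before training, and through `dequantize` and `mu_law_expand` after generation. The round trip is lossy, with an error that is smaller for quiet samples than for loud ones.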

For WaveRNN, I had to change the loss function from cross entropy to `discretized_mix_logistic_loss`, since the waveforms were pre-processed using μ-law compression. In addition, the cross-entropy loss resulted in NaN for such input. The loss is selected by: `loss_func = F.cross_entropy if voc_model.mode == 'RAW' else discretized_mix_logistic_loss`
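The loss selection above can be made more explicit, with a guard that stops training as soon as the loss becomes NaN. The following is only a sketch: `select_loss` and `assert_finite` are hypothetical helpers that are not part of WaveRNN, the mode name `'MOL'` is an assumption (only `'RAW'` appears in the line above), and the two loss callables are passed in so that the sketch does not depend on any particular framework.

```python
import math


def select_loss(mode, cross_entropy, mol_loss):
    """Return the loss callable that matches the vocoder output mode.

    'RAW' selects categorical cross entropy over quantized sample classes;
    'MOL' (assumed name) selects the discretized mixture-of-logistics loss.
    """
    if mode == "RAW":
        return cross_entropy
    if mode == "MOL":
        return mol_loss
    raise ValueError(f"unsupported mode: {mode!r}")


def assert_finite(loss_value, step):
    """Raise as soon as a scalar loss value is NaN or infinite."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(f"non-finite loss {loss_value} at step {step}")
    return loss_value
```

In a training loop, the result of `select_loss(voc_model.mode, F.cross_entropy, discretized_mix_logistic_loss)` would take the place of `loss_func`, and `assert_finite(loss.item(), step)` would be called after each forward pass. Unlike the one-line conditional, an unknown mode raises an error instead of silently falling back to the mixture-of-logistics loss.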

Files in this repository:

- colab
- tutorials/tts
- 1000k_steps_1_gen_batched_target11000_overlap550-output.wav
- 1000k_steps_1_gen_batched_target11000_overlap550.wav
- 1000k_steps_1_target-output.wav
- 1000k_steps_1_target.wav
- 1000k_steps_2_gen_batched_target11000_overlap550-output.wav
- 1000k_steps_2_gen_batched_target11000_overlap550.wav
- 1000k_steps_2_target-output.wav
- 1000k_steps_2_target.wav
- 1000k_steps_3_gen_batched_target11000_overlap550-output.wav
- 1000k_steps_3_gen_batched_target11000_overlap550.wav
- 1000k_steps_3_target-output.wav
- 1000k_steps_3_target.wav
- 1000k_steps_4_gen_batched_target11000_overlap550-output.wav
- 1000k_steps_4_gen_batched_target11000_overlap550.wav
- 1000k_steps_4_target-output.wav
- 1000k_steps_4_target.wav
- 1000k_steps_5_gen_batched_target11000_overlap550-output.wav
- 1000k_steps_5_gen_batched_target11000_overlap550.wav
- 1000k_steps_5_target-output.wav
- 1000k_steps_5_target.wav
- LJ001-0001.wav
- LJ001-0001output.wav
- LJ001-0002.wav
- LJ001-0002output.wav
- LJ005-0242.wav
- LJ005-0242output.wav
- LJ050-0230.wav
- LJ050-0230output.wav
- LJ050-0231.wav
- LJ050-0232.wav
- LJ050-0233.wav
- LJ050-0234.wav
- LJ050-0235.wav
- LJ050-0236.wav
- LJ050-0237.wav
- LJ050-0238.wav
- LJ050-0239.wav
- README.md
- Αντίγραφο_Stress_recognition_in_Greek_speech_using_Wav2Vec2.ipynb

mmarkaki/IACM
