Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

Code for the paper "Parameter-Efficient Learning for Text-to-Speech Accent Adaptation".

Installation

This work depends on ESPnet2 with Python 3.8.

$ sudo apt-get install cmake
$ sudo apt-get install sox
$ sudo apt-get install zip
$ git clone https://github.com/Li-JEN/PEL-accent-adaptaion.git
$ cd PEL-accent-adaptaion/tools
$ ./setup_anaconda.sh miniconda espnet 3.8
$ ./activate_python.sh
$ make TH_VERSION=1.10.1 CUDA_VERSION=11.3
$ pip install pyopenjtalk
$ pip install typeguard==2.13.3
$ pip install Pillow==9.5.0
$ pip install numpy==1.23.0
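
After installation, the pinned versions above can be verified from Python. The following is a minimal sketch and not part of the repository: `check_pins`, `installed_version`, and the `lookup` parameter are hypothetical names, and the pins are simply copied from the pip commands above. It relies on `importlib.metadata`, which is in the standard library from Python 3.8.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINS = {"typeguard": "2.13.3", "Pillow": "9.5.0", "numpy": "1.23.0"}


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per mismatching package; print nothing if all pins match.
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package is installed at the expected version.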

For more detailed installation instructions, please refer to the official tutorial.

Training

This work modifies the JVS recipe to support parameter-efficient transfer learning (PETL).

$ cd egs2/jvs/tts1

Please follow the egs2/jvs/tts1 tutorial to prepare the required files.
Because FastSpeech2 needs duration alignments to train its duration predictor, you may need to first run the adaptation with an autoregressive (AR) model (Tacotron 2) to build a teacher model that provides the duration features, or check out the newest ESPnet for its integration with the Montreal Forced Aligner (MFA).
Before attempting PETL, please make sure you can run vanilla fine-tuning with FastSpeech2. Do not skip the first two steps; otherwise, files required for training FastSpeech2 will be missing.

$ ./run.sh \
    --stage 5 \
    --g2p pyopenjtalk_accent_with_pause \
    --write_collected_feats true \
    --teacher_dumpdir exp/tts_finetune_tacotron2_raw_phn_jaconv_pyopenjtalk_accent_with_pause/decode_use_teacher_forcingtrue_train.loss.ave \
    --tts_stats_dir exp/tts_finetune_tacotron2_raw_phn_jaconv_pyopenjtalk_accent_with_pause/decode_use_teacher_forcingtrue_train.loss.ave/stats \
    --train_config conf/tuning/finetune_fastspeech2.yaml \
    --train_args "--init_param downloads/0293a01e429a84a604304bf06f2cc0b0/exp/tts_train_fastspeech2_tacotron2_teacher_raw_phn_jaconv_pyopenjtalk_accent_with_pause/train.loss.ave_5best.pth:tts:tts" \
    --tag finetune_fastspeech2_raw_phn_jaconv_pyopenjtalk_accent_with_pause
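
The command above points --teacher_dumpdir and --tts_stats_dir at outputs of the Tacotron 2 teacher, so a skipped teacher step shows up as missing directories. A small pre-flight check can catch this before training starts. This is a sketch under assumptions: `missing_dirs` is a hypothetical helper, not part of ESPnet, and it only checks that the directories exist, not what they contain.

```python
from pathlib import Path

# Directories copied from the run.sh command above (relative to egs2/jvs/tts1).
TEACHER_DUMPDIR = (
    "exp/tts_finetune_tacotron2_raw_phn_jaconv_pyopenjtalk_accent_with_pause/"
    "decode_use_teacher_forcingtrue_train.loss.ave"
)
REQUIRED = [TEACHER_DUMPDIR, TEACHER_DUMPDIR + "/stats"]


def missing_dirs(paths, root="."):
    """Return the entries of `paths` that are not existing directories under `root`."""
    return [p for p in paths if not (Path(root) / p).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(REQUIRED)
    if missing:
        print("Missing teacher outputs; run the teacher steps first:")
        for p in missing:
            print("  " + p)
```

Run it from egs2/jvs/tts1 so that the relative paths resolve as they do for run.sh.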

To implement PETL:

$ ./run.sh \
    --stage 5 \
    --g2p pyopenjtalk_accent_with_pause \
    --write_collected_feats true \
    --teacher_dumpdir exp/tts_finetune_tacotron2_raw_phn_jaconv_pyopenjtalk_accent_with_pause/decode_use_teacher_forcingtrue_train.loss.ave \
    --tts_stats_dir exp/tts_finetune_tacotron2_raw_phn_jaconv_pyopenjtalk_accent_with_pause/decode_use_teacher_forcingtrue_train.loss.ave/stats \
    --train_config conf/tuning/finetune_fastspeech2_adapter.yaml \
    --train_args "--init_param downloads/0293a01e429a84a604304bf06f2cc0b0/exp/tts_train_fastspeech2_tacotron2_teacher_raw_phn_jaconv_pyopenjtalk_accent_with_pause/train.loss.ave_5best.pth:tts:tts" \
    --freeze_param "tts.length_regulator tts.encoder.encoders tts.encoder.after_norm tts.decoder.after_norm tts.decoder.encoders tts.feat_out tts.postnet tts.gst tts.pitch_predictor tts.energy_predictor tts.duration_predictor tts.energy_embed tts.pitch_embed" \
    --tag finetune_fastspeech2_raw_phn_jaconv_pyopenjtalk_accent_with_pause_adapter
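
The --freeze_param argument above lists module-name prefixes whose parameters are kept fixed, so that only the remaining parameters are updated. The sketch below illustrates this kind of prefix matching on parameter names in plain Python. It is an illustration under assumptions, not ESPnet's code: `is_frozen` and `split_by_prefix` are hypothetical helpers, the example parameter names are made up, and the matching rule (exact name, or prefix followed by a dot) is assumed.

```python
# Prefixes copied from the --freeze_param argument above.
FREEZE = (
    "tts.length_regulator tts.encoder.encoders tts.encoder.after_norm "
    "tts.decoder.after_norm tts.decoder.encoders tts.feat_out tts.postnet "
    "tts.gst tts.pitch_predictor tts.energy_predictor tts.duration_predictor "
    "tts.energy_embed tts.pitch_embed"
).split()


def is_frozen(param_name, prefixes):
    """Assumed rule: frozen if the name equals a prefix or lies beneath it."""
    return any(param_name == p or param_name.startswith(p + ".") for p in prefixes)


def split_by_prefix(param_names, prefixes):
    """Partition parameter names into (frozen, trainable) lists, keeping order."""
    frozen = [n for n in param_names if is_frozen(n, prefixes)]
    trainable = [n for n in param_names if not is_frozen(n, prefixes)]
    return frozen, trainable


# Hypothetical parameter names, for illustration only.
names = [
    "tts.encoder.encoders.0.self_attn.linear_q.weight",
    "tts.postnet.postnet.0.0.weight",
    "tts.hypothetical_adapter.down.weight",
]
frozen, trainable = split_by_prefix(names, FREEZE)
print(trainable)  # ['tts.hypothetical_adapter.down.weight']
```

Requiring a dot after the prefix keeps a prefix such as tts.gst from accidentally matching an unrelated module whose name merely starts with the same characters.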

We provide configuration files for Adapter and Reprogramming under the folder egs2/jvs/conf/tuning.

Citations

@article{yang2023parameter,
  title={Parameter-Efficient Learning for Text-to-Speech Accent Adaptation},
  author={Yang, Li-Jen and Yang, Chao-Han Huck and Chien, Jen-Tzung},
  journal={arXiv preprint arXiv:2305.11320},
  year={2023}
}
