Dartmouth LING 48 Final Project: Improving TTS for Shanghainese
Yuanhao Chen, [email protected], Spring 2023
The goal is to build a text-to-speech (TTS) system for Shanghainese from scratch that improves the production of tone sandhi over existing models, with special attention paid to text preprocessing.
See writeup/main.pdf.
pip install -r phonemisation/requirements.txt
pip install -r speech_synthesis/requirements.txt
pip install -r comparison_questionnaire/requirements.txt # for analysis of questionnaire results
For instructions on the speech synthesis model, see speech_synthesis/README.md.
Repository structure:
- phonemisation/: contains the phonemisation module
  - See phonemisation/__init__.py for an explanation of the output
  - Usage: python -m phonemisation "text to phonemise"
  - Mechanism: Chinese sentence → Chinese words (word segmentation) → Shanghainese pinyin (romanisation) → Shanghainese phonemes (phonemisation)
    - jieba is used for word segmentation
    - A Shanghainese dictionary I previously made is used for romanisation
      - Uses the Qieyun module to add the tone number 1 to syllables of the 陰平 yinping/inbin tone; other tones are phonologically unmarked
    - The romanisation_to_ipa function in romanisation.py performs the phonemisation step
- make_metadata.py: uses the phonemisation module to convert transcriptions into IPA and generate metadata for training
  - See data/ below
- data/: contains the dataset used for training
  - The transcriptions and audio files are adapted from this repo
  - Downsampled to 16 kHz for training
  - Currently, only shh.dict.cn/ is used for training
  - The */metadata.txt files are generated by make_metadata.py
- training/
  - Jupyter notebook for training the model
  - Intended to be uploaded to and run in the Google Colab environment; needs to be modified for local use
  - Uses the coqui-ai/TTS repo, which contains an implementation of VITS
- writeup/: the write-up
- speech_synthesis/: contains the speech synthesis model
  - See speech_synthesis/README.md for more details
- comparison_questionnaire/: contains the questionnaire and audio files used to compare speech produced by this model, the Apple model, and a human speaker
  - *-1.wav: produced by this model
  - *-2.wav: produced by Apple VoiceOver (MacBook Pro 14-inch, 2021; macOS Ventura 13.0.1)
  - *-3.wav: spoken by myself
  - stats.ipynb: Jupyter notebook for analysing the questionnaire results
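The first two stages of the phonemisation pipeline (segmentation, then dictionary romanisation with the 陰平 tone mark) can be sketched as below. This is a minimal, self-contained illustration, not the module's code: a forward-maximum-matching segmenter stands in for jieba (in the real module, jieba's own segmenter, e.g. `jieba.lcut(sentence)`, would take its place), and `TOY_LEXICON` and `YINPING_CHARS` are hypothetical placeholders. The latter stands in for the Qieyun lookup, and the Wugniu-style spellings are illustrative, not entries from the author's dictionary. Appending the `1` as a suffix is also an assumption about the notation.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: Chinese word -> one romanised syllable per character.
# Illustrative Wugniu-style spellings only, NOT the project's own dictionary.
TOY_LEXICON: Dict[str, List[str]] = {
    "今天": ["cin", "thi"],
    "上海": ["zaon", "he"],
}

# Hypothetical stand-in for the Qieyun lookup: characters treated as
# belonging to the 陰平 (yinping/inbin) tone category in this example.
YINPING_CHARS: Set[str] = {"今", "天"}


def segment(sentence: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Forward maximum matching; a simple stand-in for jieba's segmenter."""
    longest = max(map(len, lexicon), default=1)
    words: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(longest, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def romanise(words: List[str], lexicon: Dict[str, List[str]],
             yinping: Set[str]) -> List[List[str]]:
    """Romanise word by word, appending tone number 1 to 陰平 syllables only.

    Word boundaries are kept (one inner list per word) because Shanghainese
    tone sandhi operates over multi-syllable units.
    """
    result: List[List[str]] = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"no romanisation for {word!r}")
        result.append([syl + "1" if char in yinping else syl
                       for char, syl in zip(word, lexicon[word])])
    return result


words = segment("今天上海", TOY_LEXICON)
print(words)                                        # ['今天', '上海']
print(romanise(words, TOY_LEXICON, YINPING_CHARS))  # [['cin1', 'thi1'], ['zaon', 'he']]
```

Keeping words as separate inner lists, rather than flattening to a syllable stream, is the design choice that makes segmentation useful for downstream tone sandhi handling.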
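The final phonemisation stage (handled by romanisation_to_ipa in romanisation.py) can be pictured as greedy longest-match substitution of romanisation chunks by IPA symbols. The sketch below is hypothetical: `toy_romanisation_to_ipa` and its tiny `TOY_ROM_TO_IPA` table are made up for illustration, cover only a handful of syllables, and are not the project's mapping. Carrying the trailing tone number 1 through unchanged is likewise an assumption about the output format.

```python
from typing import Dict

# Hypothetical, deliberately tiny mapping from romanisation chunks to IPA.
# It is NOT the table used by romanisation_to_ipa in romanisation.py.
TOY_ROM_TO_IPA: Dict[str, str] = {
    "th": "tʰ", "c": "tɕ", "z": "z", "h": "h",
    "aon": "ɑ̃", "in": "in", "i": "i", "e": "e",
}


def toy_romanisation_to_ipa(syllable: str,
                            table: Dict[str, str] = TOY_ROM_TO_IPA) -> str:
    """Convert one romanised syllable to IPA by greedy longest match.

    A trailing tone number 1 (the 陰平 mark) is split off and re-appended
    unchanged; this output convention is an assumption.
    """
    tone = ""
    if syllable.endswith("1"):
        syllable, tone = syllable[:-1], "1"
    longest = max(map(len, table), default=1)
    ipa = []
    i = 0
    while i < len(syllable):
        # Prefer the longest chunk so that e.g. "th" wins over "t" + "h".
        for size in range(min(longest, len(syllable) - i), 0, -1):
            chunk = syllable[i:i + size]
            if chunk in table:
                ipa.append(table[chunk])
                i += size
                break
        else:  # no chunk of any length matched at position i
            raise ValueError(f"cannot map {syllable[i:]!r} in {syllable!r}")
    return "".join(ipa) + tone


print([toy_romanisation_to_ipa(s) for s in ["cin1", "thi1", "he"]])
# ['tɕin1', 'tʰi1', 'he']
```

Longest match matters because romanisation digraphs such as "th" would otherwise be split into two unrelated segments.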
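For make_metadata.py, one plausible shape is a text file with one line per clip pairing the clip ID, its transcription and its IPA. The sketch below assumes an LJSpeech-style, pipe-separated `clip_id|transcription|ipa` layout, similar to the LJSpeech metadata that coqui-ai/TTS can read through its dataset formatters. The actual format of */metadata.txt is not specified here, and `build_metadata_lines`, `write_metadata` and the `phonemise` callable are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def build_metadata_lines(rows: Iterable[Tuple[str, str]],
                         phonemise: Callable[[str], str]) -> List[str]:
    """Return one 'clip_id|transcription|ipa' line per (clip_id, text) pair.

    `phonemise` is a hypothetical callable (text -> IPA string), e.g. a
    wrapper around the phonemisation module.
    """
    lines = []
    for clip_id, text in rows:
        ipa = phonemise(text)
        # '|' is the field separator, so it must not appear inside a field.
        if any("|" in field for field in (clip_id, text, ipa)):
            raise ValueError(f"field contains '|' in clip {clip_id!r}")
        lines.append(f"{clip_id}|{text}|{ipa}")
    return lines


def write_metadata(lines: List[str], out_path: Path) -> None:
    """Write the lines as UTF-8, one per row, with a trailing newline."""
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


# Placeholder phonemiser for demonstration only.
print(build_metadata_lines([("0001", "今天")], lambda s: "tɕin1 tʰi1"))
# ['0001|今天|tɕin1 tʰi1']
```

Separating line construction from file writing keeps the formatting logic testable without touching the filesystem.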
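The naming convention in comparison_questionnaire/ (the -1, -2 or -3 suffix identifies the source of each clip) makes it easy to group clips programmatically, for example when analysing results. A minimal sketch: `source_of`, `count_by_source` and the `q01`-style file stems are hypothetical, and stats.ipynb may organise its analysis quite differently.

```python
import re
from collections import Counter
from pathlib import PurePath
from typing import Iterable

# Source of each clip, keyed by the numeric suffix in "*-N.wav".
SOURCES = {"1": "this model", "2": "Apple VoiceOver", "3": "human speaker"}
_SUFFIX = re.compile(r"-([123])\.wav$")


def source_of(filename: str) -> str:
    """Map e.g. 'q01-2.wav' to 'Apple VoiceOver' using the -N suffix."""
    match = _SUFFIX.search(PurePath(filename).name)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SOURCES[match.group(1)]


def count_by_source(filenames: Iterable[str]) -> Counter:
    """Count clips per source."""
    return Counter(source_of(name) for name in filenames)


print(count_by_source(["q01-1.wav", "q01-2.wav", "q01-3.wav", "q02-1.wav"]))
```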