This repository contains the code for the paper "Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition" (more features coming soon).
```
@inproceedings{wang-etal-2022-iq,
  author    = {Liming Wang and Siyuan Feng and Mark Hasegawa-Johnson and Chang D. Yoo},
  title     = {Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition},
  booktitle = {Annual Meeting of the Association for Computational Linguistics},
  year      = {2022}
}
```
Dependencies:
- ZeroSpeech 2021 baseline system
- UnsupSeg
- BEER
- Other dependencies are listed in `requirements.txt`
Simply run `bash run.sh` for the small datasets we provide. To reproduce the results in the paper, please download the full datasets and convert them to the same format as the small datasets with the following steps:
- Prepare the datasets. Download the LibriSpeech dataset and manually cut out the spoken word segments using the information provided in `resources/librispeech_word/librispeech_word.json`. Also download the TIMIT dataset, convert its audio files to .wav, and create the metadata files as done in `resources/TIMIT/test_subset`.
- Modify the paths and variables in `run.sh` and `configs/librispeech_word.conf`.
- Run `bash run.sh`.
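For the word-cutting step above, a minimal sketch of extracting one segment from a LibriSpeech utterance is shown below, using only the Python standard library. The metadata entry (`audio`, `word`, `begin`, `end` keys, with times in seconds) is a hypothetical example; consult `resources/librispeech_word/librispeech_word.json` for the actual schema. The demo synthesizes a one-second wav file so it is self-contained.

```python
import struct
import wave

def cut_segment(in_path, out_path, start_s, end_s):
    """Copy the frames in [start_s, end_s) from one wav file into a new wav file."""
    with wave.open(in_path, "rb") as wav_in:
        params = wav_in.getparams()
        rate = params.framerate
        wav_in.setpos(int(start_s * rate))                 # seek to the segment start
        frames = wav_in.readframes(int((end_s - start_s) * rate))
    with wave.open(out_path, "wb") as wav_out:
        wav_out.setparams(params)                          # nframes is fixed up on close
        wav_out.writeframes(frames)

# Demo: synthesize a 1 s, 16 kHz mono utterance of silence.
rate = 16000
with wave.open("utt.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(rate)
    f.writeframes(struct.pack("<" + "h" * rate, *([0] * rate)))

# Hypothetical metadata entry mirroring what librispeech_word.json might contain.
entry = {"audio": "utt.wav", "word": "example", "begin": 0.25, "end": 0.75}
cut_segment(entry["audio"], "example.wav", entry["begin"], entry["end"])

with wave.open("example.wav", "rb") as f:
    print(f.getnframes() / f.getframerate())  # prints 0.5 (the segment duration)
```

The real pipeline would loop over all entries in the JSON file and write one wav per word segment.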