lwang114/InformationQuantizer

Information Quantizer

This repository contains the code for the paper "Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition" (more features will be available soon).

@inproceedings{wang-etal-2022-iq,
  author={Liming Wang and Siyuan Feng and Mark Hasegawa-Johnson and Chang D. Yoo},
  title={Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2022}
}

Dependencies

How to run it?

To try the code on the small datasets provided, run bash run.sh. To reproduce the results in the paper, download the full datasets and convert them to the same format as the small datasets using the following steps:

  1. Prepare datasets. Download the LibriSpeech dataset and manually cut out the spoken word segments using the information in resources/librispeech_word/librispeech_word.json. Also download the TIMIT dataset, convert the audio files to .wav, and create the metadata files following the example in resources/TIMIT/test_subset.
  2. Modify the paths and variables in run.sh and configs/librispeech_word.conf.
  3. Run bash run.sh.
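
The word-cutting part of step 1 could look roughly like the sketch below. It is not the script used for the paper. It assumes the utterance is already a .wav file (LibriSpeech is distributed as FLAC, so a conversion step would come first) and that you have read a begin and end time in seconds for each word from librispeech_word.json. The function name cut_segment and its arguments are hypothetical, and the layout of the JSON file is not described here, so check it before wiring this in.

```python
import wave


def cut_segment(src_path, dst_path, begin_sec, end_sec):
    """Copy the audio between begin_sec and end_sec into a new .wav file.

    Hypothetical helper: the begin/end times are assumed to come from the
    word-level information in librispeech_word.json.
    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        n_frames = src.getnframes()
        # Convert seconds to frame indices and clamp them to the file length.
        start = min(max(0, int(round(begin_sec * rate))), n_frames)
        stop = min(max(start, int(round(end_sec * rate))), n_frames)
        src.setpos(start)
        frames = src.readframes(stop - start)

    with wave.open(dst_path, "wb") as dst:
        # Reuse the source format; the frame count in the header is
        # corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return stop - start
```

Calling cut_segment once per word entry, with one output file per word, gives a directory of spoken word segments that can then be arranged in the same layout as the small dataset.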
