LibriTTS corpus
Identifier: SLR60
Summary: Large-scale corpus of English speech derived from the original materials of the LibriSpeech corpus
Category: Speech
License: CC BY 4.0
Downloads (use a mirror closer to you):
dev-clean.tar.gz [1.2G] (Development set, "clean" speech) Mirrors: [US] [EU] [CN]
dev-other.tar.gz [924M] (Development set, "other", more challenging speech) Mirrors: [US] [EU] [CN]
test-clean.tar.gz [1.2G] (Test set, "clean" speech) Mirrors: [US] [EU] [CN]
test-other.tar.gz [964M] (Test set, "other" speech) Mirrors: [US] [EU] [CN]
train-clean-100.tar.gz [7.7G] (Training set derived from the original materials of the train-clean-100 subset of LibriSpeech) Mirrors: [US] [EU] [CN]
train-clean-360.tar.gz [27G] (Training set derived from the original materials of the train-clean-360 subset of LibriSpeech) Mirrors: [US] [EU] [CN]
train-other-500.tar.gz [44G] (Training set derived from the original materials of the train-other-500 subset of LibriSpeech) Mirrors: [US] [EU] [CN]
About this resource:
- The audio files are at 24kHz sampling rate.
- The speech is split at sentence breaks.
- Both original and normalized texts are included.
- Contextual information (e.g., neighbouring sentences) can be extracted.
- Utterances with significant background noise are excluded.
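The 24kHz sampling rate stated above can be spot-checked on an extracted file. The following is a minimal sketch, assuming the audio is stored as uncompressed PCM WAV that Python's standard `wave` module can open (the file format is not stated on this page). The function names and the `EXPECTED_RATE_HZ` constant are illustrative, not part of the corpus.

```python
import wave

# Sampling rate stated in "About this resource".
EXPECTED_RATE_HZ = 24000


def sample_rate_of(wav_path):
    """Return the sampling rate (Hz) recorded in a WAV file's header."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getframerate()


def has_expected_rate(wav_path, expected=EXPECTED_RATE_HZ):
    """Return True if the file's header reports the expected sampling rate."""
    return sample_rate_of(wav_path) == expected
```

Calling `has_expected_rate(path)` on a handful of extracted files is usually enough to catch a resampled or mislabelled copy. Only the header is read, so the check is cheap.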
The MD5 checksums of the downloads are as follows (note: not everyone will want to know this).
0c3076c1e5245bb3f0af7d82087ee207 dev-clean.tar.gz
815555d8d75995782ac3ccd7f047213d dev-other.tar.gz
7bed3bdb047c4c197f1ad3bc412db59f test-clean.tar.gz
ae3258249472a13b5abef2a816f733e4 test-other.tar.gz
4a8c202b78fe1bc0c47916a98f3a2ea8 train-clean-100.tar.gz
a84ef10ddade5fd25df69596a2767b2d train-clean-360.tar.gz
7b181dd5ace343a5f38427999684aa6f train-other-500.tar.gz
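A downloaded archive can be checked against the checksums listed above. The following is a minimal sketch using Python's standard `hashlib`. The checksum values are copied from this page; the helper names (`md5_of_file`, `verify_download`) and the chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the list on this page.
EXPECTED_MD5 = {
    "dev-clean.tar.gz": "0c3076c1e5245bb3f0af7d82087ee207",
    "dev-other.tar.gz": "815555d8d75995782ac3ccd7f047213d",
    "test-clean.tar.gz": "7bed3bdb047c4c197f1ad3bc412db59f",
    "test-other.tar.gz": "ae3258249472a13b5abef2a816f733e4",
    "train-clean-100.tar.gz": "4a8c202b78fe1bc0c47916a98f3a2ea8",
    "train-clean-360.tar.gz": "a84ef10ddade5fd25df69596a2767b2d",
    "train-other-500.tar.gz": "7b181dd5ace343a5f38427999684aa6f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the checksum listed for its name."""
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no checksum listed for {path.name}")
    return md5_of_file(path) == expected
```

For example, `verify_download("train-clean-100.tar.gz")` returns `False` for a truncated or corrupted download. Reading in chunks keeps memory use flat, which matters for the larger training archives.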