Recordings of chanted Pali sutras with the associated text, intended as a dataset for training TTS models.

pnfo/pali-tts-dataset

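Here is the output from running the create-dataset.js script. Total length is reported in hours; the average length is in seconds per label (for example, 3.8 hours over 980 labels is roughly 14 seconds each).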

Total labels => count: 980, length: 3.8 hours, average length: 13.86
Outliers labels => count: 930, length: 3.6 hours, average length: 13.88
Used labels => count: 767, length: 2.7 hours, average length: 12.88
characters=" '(),-.:;?abcdeghijklmnoprstuvxyñāīūḍḷṃṅṇṭ"
characters=" '(),-.:;?xංඅආඉඊඋඌඑඔකඛගඝඞචඡජඣඤටඨඩඪණතථදධනපඵබභමයරලවසහළ්ාිීුූෙො"
speakers={"wdevananda":214,"oshadir":213,"obhasa":149,"lankananda":191}
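The summary lines and the per-speaker tally above can be reproduced in a few lines of JavaScript. This is only a minimal sketch, not the code from create-dataset.js: the label shape (`text`, `speaker`, `length` in seconds) and the function names `summarize` and `countBySpeaker` are assumptions made for illustration.

```javascript
// Hypothetical label shape: { text, speaker, length }, with length in seconds.
// These field names are assumptions, not taken from create-dataset.js.

// Build a summary line in the same format as the script output.
function summarize(name, labels) {
  const totalSeconds = labels.reduce((sum, label) => sum + label.length, 0);
  const hours = (totalSeconds / 3600).toFixed(1);
  const average = labels.length
    ? (totalSeconds / labels.length).toFixed(2)
    : "0.00";
  return `${name} labels => count: ${labels.length}, length: ${hours} hours, average length: ${average}`;
}

// Count how many labels each speaker contributed.
function countBySpeaker(labels) {
  const counts = {};
  for (const { speaker } of labels) {
    counts[speaker] = (counts[speaker] || 0) + 1;
  }
  return counts;
}

// Made-up sample data, for illustration only.
const sample = [
  { text: "namo tassa", speaker: "obhasa", length: 9 },
  { text: "bhagavato", speaker: "obhasa", length: 12 },
  { text: "arahato", speaker: "oshadir", length: 15 },
];

console.log(summarize("Sample", sample));
console.log(`speakers=${JSON.stringify(countBySpeaker(sample))}`);
```

In the output above, the per-speaker counts (214 + 213 + 149 + 191) add up to the 767 used labels, which is the kind of consistency check a tally like this makes easy.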


Total labels => count: 63610, length: 480.0 hours, average length: 27.17
Usable labels => count: 20584, length: 46.9 hours, average length: 8.21
Used labels => count: 8990, length: 20.0 hours, average length: 8.01
{
  paragraph: 3357,
  centered: 739,
  heading: 2423,
  gatha: 2429,
  unindented: 42
}
characters=" '(),-.:;?abcdeghijklmnoprstuvxyñāīūḍḷṃṅṇṭ"
characters=" '(),-.:;?xංඅආඉඊඋඌඑඔකඛගඝඞචඡජඣඤටඨඩඪණතථදධනපඵබභමයරලවසහළ්ාිීුූෙො"
Extracted audio from flac files in 57.90 seconds
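The `characters=` lines list every distinct character that appears in the label text, once for the Roman-script text and once for the Sinhala-script text. The sketch below shows one way such a set could be derived. It is an assumption-laden illustration rather than the script's actual logic: `characterSet` is a hypothetical helper, and the NFC normalization step is an added precaution so that letters with diacritics are counted as single characters.

```javascript
// Collect the distinct characters used across all label texts,
// sorted by code unit (JavaScript's default string sort order).
function characterSet(texts) {
  const chars = new Set();
  for (const text of texts) {
    // Normalize to NFC so that a letter plus combining mark
    // (e.g. "m" + dot below) becomes one precomposed character.
    for (const ch of text.normalize("NFC")) {
      chars.add(ch);
    }
  }
  return [...chars].sort().join("");
}

// Made-up sample texts, for illustration only.
const texts = ["namo tassa,", "buddhaṃ."];
console.log(`characters=${JSON.stringify(characterSet(texts))}`);
```

The ordering in the script output (space and punctuation first, then ASCII letters, then letters with diacritics) is consistent with a plain code-point sort like this one.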

The releases contain a tar archive with all the WAV files, split into several parts. Download all the parts into one directory, then join and extract them as follows:

cat pali_dataset.tar.bz2.part* > pali_dataset.tar.bz2
tar -xjf pali_dataset.tar.bz2
