
This is a collection of text samples which will be donated to Mozilla's Common Voice project. Please note that not all samples may be included in this repository.

License

MPL-2.0 and 2 other licenses found

Licenses found

MPL-2.0
LICENSE
CC0-1.0
LICENSE-CC0.txt
MIT
LICENSE-MIT.txt

reinhart1010/common-text

Common Text

Sample Texts for Mozilla's Common Voice

The main GitHub repository includes all the source files for the text corpus, the iOS and Android apps, and the server that runs the service. There, all sample texts are located in the server/data directory.

This repository, which originally started as a GitHub gist to count word occurrences in the Common Voice corpus, lists all Common Voice texts which are

Available languages

A full list of languages is available on the Common Voice website. Note that not all languages shown in this repository are officially launched, either due to localization problems or a lack of text corpus.

Building

To run the scripts, make sure that you already have a copy of the Common Voice repository in the same directory where you will put or clone the common-text directory. For simplicity, I recommend placing both under your home directory.

./
|-common-text/
| |-scripts/
| | |-cv-count-latin.sh  // Script
| |-stats/
| | |-(Locale)/
| | | |-...              // Copy host
| |-...
|-voice-web/
| |-android/
| |-common/
| |-docker/
| |-docs/
| |-ios/
| |-locales/
| |-nubis/
| |-scripts/
| |-server/
| | |-data/
| | | |-(Locale)/
| | | | |-...            // Copy target
| | |-src/
| | |-...
| |-web/
| |-...
|-...
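
As a quick sanity check before running anything, the layout above can be verified with a small helper. This is only a sketch: `check_layout` is a hypothetical function that is not part of this repository, and it assumes that the two directories named in the tree (common-text/scripts and voice-web/server/data) are the ones that matter.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository.
# Checks that common-text/ and voice-web/ sit side by side under one base
# directory. The base directory defaults to $HOME.
check_layout() {
  local base_dir="${1:-$HOME}"
  local dir
  for dir in "common-text/scripts" "voice-web/server/data"; do
    if [ ! -d "$base_dir/$dir" ]; then
      # Report the first missing directory and stop.
      echo "missing: $base_dir/$dir" >&2
      return 1
    fi
  done
  echo "layout ok: $base_dir"
}
```

Calling `check_layout "$HOME"` prints a confirmation when both directories exist, and returns a non-zero status naming the first missing one otherwise.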

Contributing to this project

I welcome any pull requests that improve the extraction scripts. As of now, they are implemented in bash (Linux) and do not work for non-Latin scripts (e.g. Arabic, Chinese).
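
For contributors who want a starting point, here is a minimal sketch of word counting in bash. It is not the contents of cv-count-latin.sh; `count_words` is a hypothetical function written for illustration. It assumes words are separated by whitespace or ASCII punctuation, so apostrophes split words, and only ASCII letters are lowercased. It cannot segment scripts that do not put spaces between words.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the repository's cv-count-latin.sh.
# Reads text on stdin and prints "count word" lines, most frequent first.
count_words() {
  # Lowercase ASCII letters; other bytes pass through unchanged.
  LC_ALL=C tr '[:upper:]' '[:lower:]' |
    # Turn every run of whitespace or ASCII punctuation into one newline.
    LC_ALL=C tr -s '[:space:][:punct:]' '\n' |
    # Drop the empty line left by leading separators.
    sed '/^$/d' |
    LC_ALL=C sort |
    uniq -c |
    # Highest count first; ties broken alphabetically.
    LC_ALL=C sort -k1,1nr -k2,2 |
    # Strip the padding that uniq -c adds.
    awk '{ print $1, $2 }'
}
```

The function reads from standard input, so a file of sentences can be piped in, for example `count_words < sentences.txt` (the file name is a placeholder).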

If you would like to contribute more sample texts to this repository, please visit the Common Voice Sentence Collector. Any direct contributions to the sample texts will be overwritten by the texts hosted in Common Voice.

To learn more about this project, or start contributing, visit voice.mozilla.org.

License

This project is licensed under the Mozilla Public License 2.0. See the LICENSE file or https://mozilla.org/MPL/2.0/ for license details.

In accordance with the Common Voice database license requirements, sample texts (located under the stats//raw/ directory) must be released into the public domain or under a similar license such as CC0, Unlicense, or WTFPL.
