M2LP is part of the M2ASR project. It is an open-source package for multi-minority language processing tasks.
Multi-Minority Automatic Speech Recognition (M2ASR) is an NSFC-supported project that aims to provide speech recognition services for ethnic minorities in China. We plan to investigate, develop and architect the ASR service within 5 years, and we will publish all the tools, data, models and services for free.
M2ASR is a multi-party project involving CSLT@Tsinghua University, Xinjiang University and Northwest University for Nations (NUW). M2LP provides the basic functions for language processing, including text normalization, morphological segmentation and language modeling, together with all the related resources, functions and tools.
The basic design principles are reusability and modularity. The functions will be implemented in C/C++, Python or Perl, and each recipe will be a shell script that chains these modules together in a pipeline.
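The README does not define an interface for modules, so the following is only a minimal Python sketch of the pipeline idea, assuming each module is a filter that maps a stream of text lines to a stream of text lines (the same contract a stdin-to-stdout program has in a shell pipe). The names `normalize`, `drop_empty`, `pipeline` and the `--stdin` flag are hypothetical and are not taken from the M2LP code base.

```python
import sys
import unicodedata
from functools import reduce
from typing import Callable, Iterable, Iterator

# Hypothetical module contract: a function from a stream of lines to a stream
# of lines, mirroring a filter that reads stdin and writes stdout.
Module = Callable[[Iterable[str]], Iterator[str]]


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical text-normalization module: Unicode NFC, lowercase, collapse whitespace."""
    for line in lines:
        text = unicodedata.normalize("NFC", line).lower()
        yield " ".join(text.split())


def drop_empty(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical cleanup module: skip lines that are empty after normalization."""
    for line in lines:
        if line:
            yield line


def pipeline(*modules: Module) -> Module:
    """Chain modules left to right, like `a | b | c` in a shell recipe."""
    def run(lines: Iterable[str]) -> Iterator[str]:
        return reduce(lambda stream, module: module(stream), modules, iter(lines))
    return run


if __name__ == "__main__" and sys.argv[1:] == ["--stdin"]:
    # Run as a filter, e.g.: cat corpus.txt | python norm.py --stdin > corpus.norm.txt
    for out in pipeline(normalize, drop_empty)(sys.stdin):
        print(out)
```

Because every module shares the same line-stream contract, a recipe can reorder or swap modules without changing them, which is the reusability the design principle above aims for; a shell recipe gets the same effect by piping separate C/C++, Python or Perl programs together.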
rc : multilingual language resources
src : source code for language processing
md : released models