An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Updated Jul 27, 2024 - Python
OpusFilter - Parallel corpus processing toolkit
A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus
A parser for annotated MuseScore 3 files.
Bitextor generates translation memories from multilingual websites
Corpus processing library
Scripts for building a geo-located web corpus using Common Crawl data
Measure the similarity of text corpora for 74 languages
Scripts used for the preprocessing of the EstGEC-L2 corpus that contains Estonian L2 learner texts error-annotated in the M2 format.
Corpus analysis of plain text, providing the type-token ratio as well as some other statistics.
Napkin is a simple tool to produce statistical analysis of a text
Source code to evaluate the semantic severity (vertical expansion) of concepts.
A basic search engine that indexes a corpus for searching and ranks the documents in the data set.
Sense Tagged Instances For Finnish
A script that removes all English letters, emojis, Arabic tashkeel marks, and punctuation marks from a corpus.
Plotly-Dash NLP project. Measures document similarity using Latent Dirichlet Allocation and principal component analysis, followed by KMeans clustering. The project includes dynamic, interactive visualization.
This package provides utility classes and static methods for Python that make use of third-party software commonly used in text processing, such as Unitex-GramLab, TreeTagger, Apache-Tika and Google-Tesseract.