
corpus-processing

Here are 83 public repositories matching this topic.

BLKSerene / Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

translation tokenizer corpus linguistics tagger literature dependency-parser corpus-linguistics lemmatizer corpus-tools corpus-processing corpus-search corpus-statistics stopword corpus-analysis

Updated Jul 19, 2024
Python


bitextor / bitextor

Bitextor generates translation memories from multilingual websites

Updated Jun 18, 2024
Python

hankcs / TreebankPreprocessing

Python scripts for preprocessing the Penn Treebank and the Chinese Treebank

natural-language-processing corpus-processing

Updated Sep 2, 2020
Python

Helsinki-NLP / OpusFilter

OpusFilter: parallel corpus processing toolkit

nlp natural-language-processing machine-translation parallel-corpus corpus-tools corpus-processing

Updated Jun 26, 2024
Python

OHNLP / MedTator

A Serverless Text Annotation Tool for Corpus Development

nlp natural-language-processing serverless corpus-processing text-annotation-tool

Updated Jan 5, 2024
JavaScript

NathanDuran / Switchboard-Corpus

Utilities for Processing the Switchboard Dialogue Act Corpus

dialogue corpus corpus-data corpus-tools switchboard dialogues corpus-processing dialogue-data switchboard-corpus dialogue-act

Updated Jan 24, 2021
Python

StarlangSoftware / Corpus-Py

Corpus processing library

sentence-tokenizer sentence-segmentation corpus-processing turkish-sentence-segmentation turkish-sentence-tokenizer

Updated May 20, 2024
Python

zgornel / StringAnalysis.jl

Hard fork of JuliaText/TextAnalysis.jl

text-analysis text-processing random-projections latent-semantic-analysis corpus-processing

Updated Aug 14, 2023
Julia

NathanDuran / MRDA-Corpus

Utilities for Processing the Meeting Recorder Dialogue Act Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing dialogue-act

Updated Jan 24, 2021
Python

uma-pi1 / OPIEC

Reads data from OPIEC, an Open Information Extraction corpus

nlp natural-language-processing wiki wikipedia corpus information-extraction dataset corpora corpus-data nlp-resources wikipedia-dump corpus-tools natural-language-understanding open-information-extraction dataset-interface wikipedia-corpus corpus-processing nlp-datasets

Updated Jun 12, 2019
Java

Bibliome / alvisnlp

AlvisNLP corpus processing engine

java nlp workflow machine-learning natural-language-processing pipeline workflow-engine alvis corpus-processing

Updated Jun 25, 2024
Java


kennedyCzar / NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY

Plotly Dash NLP project. Measures document similarity using Latent Dirichlet Allocation and principal component analysis, followed by k-means clustering. The results are presented as interactive, dynamic visualizations.

Updated Sep 8, 2022
Python

StarlangSoftware / Corpus

Corpus processing library

sentence-tokenizer sentence-segmentation corpus-processing turkish-sentence-segmentation turkish-sentence-tokenizer

Updated Jul 20, 2024
Java

levindoneto / lanGen

N-gram language model that learns n-gram probabilities from a given corpus and generates new sentences by sampling each word from its conditional probability given the previously generated words.

natural-language-processing generator n-grams language-modelling corpus-processing ngram-language-model

Updated Feb 8, 2018
Python
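The lanGen description above follows the classic n-gram recipe: count transitions, normalize them into conditional probabilities, then sample word by word. The following is a minimal bigram sketch of that idea. It is not lanGen's actual code. It assumes whitespace tokenization and `<s>`/`</s>` sentence markers, and the names `train_bigrams` and `generate` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Estimate P(curr | prev) from whitespace-tokenized sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with hypothetical start/end markers so sentence boundaries are modeled.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    # Normalize each row of counts into a conditional probability distribution.
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {word: c / total for word, c in nexts.items()}
    return probs


def generate(probs, rng, max_len=20):
    """Sample words from P(next | previous) until </s> or max_len words."""
    word, output = "<s>", []
    for _ in range(max_len):
        candidates = probs[word]
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


corpus = ["the cat sat", "the dog sat", "a cat ran"]
model = train_bigrams(corpus)
print(model["the"])  # {'cat': 0.5, 'dog': 0.5}
print(generate(model, random.Random(0)))
```

A real n-gram generator would typically add higher orders and smoothing for unseen n-grams. The sketch omits both for brevity.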

versotym / rhymetagger

Simple collocation-driven rhyme recognition. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry.

language-processing corpus-processing versification

Updated Nov 20, 2021
Python

johentsch / ms3

A parser for annotated MuseScore 3 files.

Updated May 23, 2024
Python

ku-nlp / kyoto-reader

A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus

japanese coreference corpus-processing pyknp predicate-argument-structure

Updated Jun 26, 2024
Python

uma-pi1 / OPIEC-pipeline

ringoreality / uniblock

uniblock: scoring and filtering corpora with Unicode block information (and more).

nlp machine-translation corpus-processing emnlp2019

Updated Sep 21, 2019
Python
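Scoring text by Unicode block can be illustrated with a deliberately simplified heuristic: score each line by the fraction of its non-space characters that fall in an allowed set of blocks, then drop lines below a threshold. The sketch below is not uniblock's scoring method. Python's standard library has no block lookup, so the `BLOCKS` table is a small hand-written subset, and the function names and the 0.9 threshold are hypothetical.

```python
# Hand-written subset of Unicode block ranges (inclusive), for illustration only.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Cyrillic": (0x0400, 0x04FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}


def block_of(ch):
    """Return the name of the block containing ch, or 'Other' if not in the table."""
    cp = ord(ch)
    for name, (start, end) in BLOCKS.items():
        if start <= cp <= end:
            return name
    return "Other"


def block_score(line, allowed):
    """Fraction of non-space characters whose block is in `allowed`."""
    chars = [ch for ch in line if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(block_of(ch) in allowed for ch in chars)
    return hits / len(chars)


def filter_lines(lines, allowed, threshold=0.9):
    """Keep only lines whose block score reaches the (hypothetical) threshold."""
    return [line for line in lines if block_score(line, allowed) >= threshold]


allowed = {"Basic Latin", "Latin-1 Supplement"}
lines = ["Hello, world!", "Привет, мир!", "Café au lait", "中文 text"]
print(filter_lines(lines, allowed))  # ['Hello, world!', 'Café au lait']
```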

jonathandunn / corpus_similarity

Measure the similarity of text corpora for 74 languages

nlp language natural-language-processing text corpus corpora corpus-linguistics corpus-tools corpus-processing

Updated Jan 26, 2024
Python
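Corpus similarity is usually measured by comparing frequency profiles. One simple, generic way to do this is cosine similarity between character trigram counts. The sketch below uses that approach and is not the method implemented by corpus_similarity. The function names `char_ngrams`, `cosine_similarity`, and `profile_similarity` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def profile_similarity(corpus_a, corpus_b, n=3):
    """Compare two corpora (lists of documents) by pooled n-gram profiles."""
    profile_a, profile_b = Counter(), Counter()
    for doc in corpus_a:
        profile_a.update(char_ngrams(doc, n))
    for doc in corpus_b:
        profile_b.update(char_ngrams(doc, n))
    return cosine_similarity(profile_a, profile_b)


english_a = ["the cat sat on the mat"]
english_b = ["the dog sat on the log"]
noise = ["zzzz qqqq"]
print(profile_similarity(english_a, english_b))
print(profile_similarity(english_a, noise))  # 0.0 (no shared trigrams)
```

Character n-grams avoid language-specific tokenization, which is one reason they are a common baseline for multilingual comparison.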
