A corpus builder for evaluation of plagiarism detection tools
Updated Dec 12, 2016 - PHP
A golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Augmentation scripts for the bAbI Dialog Tasks dataset
A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.
A clean Fusha Arabic tagged corpus.
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
A set of corpus-based sampling & analysis M4L devices
Generate pseudo-English sentences for research in semantic composition
Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
Information Retrieval Lab
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
Create a corpus for fine-tuning an OpenAI model
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
Bitextor generates translation memories from multilingual websites
A parser for annotated MuseScore 3 files.
A full-text article retrieval pipeline for biomedical literature.
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing