-
Prompsit Language Engineering
-
escape-unk Public
Escape unknown symbols in SentencePiece vocabularies
Python, MIT License, Updated Jun 25, 2024
-
image_omr Public
Optical Music Recognition with RNNs in Keras
-
clean Public
A tool for downloading and cleaning parallel corpora
-
srx Public
Forked from bminixhofer/srx
A mostly compliant Rust implementation of the Segmentation Rules eXchange (SRX) 2.0 standard for text segmentation.
Rust, Apache License 2.0, Updated Sep 14, 2023
-
splitters Public
A CLI for Rust SRX sentence segmentation rules as a Python package.
Rust, GNU General Public License v3.0, Updated Sep 14, 2023
-
serde-fancy-regex Public
Forked from tailhook/serde-regex
A serde-regex fork to (de)serialize fancy-regex regular expressions
Rust, Apache License 2.0, Updated Sep 14, 2023
-
gaoya Public
Forked from serega/gaoya
Locality Sensitive Hashing
Rust, MIT License, Updated Aug 31, 2023
-
tmxt Public
Forked from sortiz/tmxt
Transform TMX to text
-
students Public
Forked from browsermt/students
Efficient teacher-student models and scripts to make them
Handlebars, Other, Updated Jul 5, 2023
-
Infinity-For-Reddit Public
Forked from Docile-Alligator/Infinity-For-Reddit
A Reddit client for Android
Java, GNU Affero General Public License v3.0, Updated Jun 13, 2023
-
datasketch Public
Forked from ekzhu/datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Python, MIT License, Updated Jun 7, 2023
-
terminology Public
Tools to annotate parallel data with terminology for NMT forced translation
-
lttoolbox Public
Forked from apertium/lttoolbox
Finite state compiler, processor and helper tools used by apertium
C++, GNU General Public License v2.0, Updated Dec 13, 2022
-
cyrillic-transliteration Public
Forked from opendatakosovo/cyrillic-transliteration
Transliterate Cyrillic script to Latin script and vice versa.
Python, MIT License, Updated Nov 18, 2022
-
sacrebleu Public
Forked from mjpost/sacrebleu
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Python, Apache License 2.0, Updated Oct 7, 2022
-
fastspell Public
Forked from mbanon/fastspell
Targeted language identifier, based on FastText and Hunspell.
Python, Updated Aug 18, 2022
-
arch-install Public
Forked from tom5760/arch-install
Simple bash script to install Arch Linux.
Shell, Other, Updated Mar 20, 2022
-
bicleaner Public
Forked from bitextor/bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
Python, GNU General Public License v3.0, Updated Nov 27, 2020
-
Domain_Adaptation Public
Forked from paracrawl/Domain_Adaptation
InDomain detection is a tool designed to extract in-domain data from large collections of data.
Python, GNU General Public License v3.0, Updated May 11, 2020
-
paraphrasing Public
A repository of paraphrasing-related tools: Sent2vec and paraphrase generation.
-
Computer-Vision Public
Computer vision repository
Jupyter Notebook, GNU General Public License v3.0, Updated Mar 31, 2019
-
LanguagePack Public
Forked from AnySoftKeyboard/LanguagePack
A language pack project for AnySoftKeyboard
HTML, Updated May 17, 2018
-
diceware-cat Public
Forked from 1ma/diceware-cat
Catalan dictionaries for generating Diceware passwords
TeX, Updated Jan 3, 2017