Oksana Dereza ancatmara

Celtic Studies, NLP for Low-Resource Languages, Computer-Assisted Historical Linguistics

86 followers · 49 following

Insight Centre for Data Analytics / University of Galway Library
Galway, Ireland
@ancatmara
in/oksana-dereza

Stars

kermitt2 / grobid

A machine learning software for extracting information from scholarly documents

Java · 3,282 stars · 440 forks · Updated Jul 18, 2024

bigscience-workshop / lam

Libraries, Archives and Museums (LAM)

79 stars · 6 forks · Updated Oct 4, 2022

huggingface / data-is-better-together

Let's build better datasets, together!

Jupyter Notebook · 185 stars · 27 forks · Updated Jul 19, 2024

aws-samples / amazon-textract-response-parser

Parse JSON response of Amazon Textract

TypeScript · 212 stars · 95 forks · Updated Jul 5, 2024

mbennett-uoe / whiiif

Simple IIIF Search service for OCRed texts

Python · 15 stars · 1 fork · Updated Dec 16, 2020

altomator / IIIF

IIIF experiments with Gallica content

JavaScript · 23 stars · 3 forks · Updated Jul 11, 2024

jpuigcerver / PyLaia

A deep learning toolkit specialized for handwritten document analysis

Python · 187 stars · 41 forks · Updated Jul 9, 2024

cneud / ocr-conversion

Conversions between various OCR formats

71 stars · 3 forks · Updated May 13, 2023

filak / hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

XSLT · 51 stars · 14 forks · Updated Jul 15, 2024

scribeocr / scribeocr

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

JavaScript · 67 stars · 9 forks · Updated Jul 19, 2024

tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)

C++ · 59,806 stars · 9,259 forks · Updated Jul 12, 2024

kscanne / gbb

NLP training/testing data

Python · 5 stars · Updated May 20, 2024

quinnanya / transkribus-models

Images of example pages from Transkribus model training sets to make it easier to find a match.

10 stars · Updated Jan 25, 2022

google-research / bert

TensorFlow code and pre-trained models for BERT

Python · 37,517 stars · 9,541 forks · Updated Jul 16, 2024

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust · 8,727 stars · 757 forks · Updated Jul 20, 2024

impresso / CLEF-HIPE-2020

Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at CLEF 2020.

SCSS · 22 stars · 5 forks · Updated May 16, 2024

CopticScriptorium / corpora

Public repository for Coptic SCRIPTORIUM Corpora Releases

CSS · 30 stars · 13 forks · Updated Jun 12, 2024

Herodotos-Project / Herodotos-Project-Latin-NER-Tagger-Annotation

Latin texts annotated for named entities and NER tagger used for the Herodotos Project (Ohio State University / Ghent University)

Python · 10 stars · 4 forks · Updated Sep 26, 2022

stefan-it / europeana-bert

BERT and ELECTRA models trained on Europeana Newspapers

Python · 35 stars · 1 fork · Updated Dec 14, 2021

qurator-spk / dinglehopper

An OCR evaluation tool

Python · 57 stars · 12 forks · Updated Jul 19, 2024

WHaverals / CERberus

CERberus -- guardian against character errors 🐶🐶🐶

HTML · 23 stars · Updated Feb 15, 2024

slowwavesleep / ancient-lang-adapters

Source code for the submissions to SIGTYP 2024, EvaLatin 2024, and AXOLOTL 2024 shared tasks

Python · 2 stars · Updated Apr 11, 2024

dbmdz / historic-ner

Repository for "Towards Robust Named Entity Recognition for Historic German"

Python · 18 stars · 3 forks · Updated Dec 11, 2020

Synkied / hanzipy

Hanzipy is a Chinese character and NLP module for Chinese language processing in Python. It is primarily written to provide a framework for Chinese language learners to explore Chinese.

Python · 13 stars · 3 forks · Updated Jan 9, 2024

ljvmiranda921 / LiBERTus

Multilingual BERT model for Ancient and Historical Languages for SIGTYP Shared Task 2024

Python · 3 stars · Updated Mar 27, 2024

openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"

Python · 22,004 stars · 5,444 forks · Updated Jun 11, 2024

helboukkouri / character-bert

Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"

Python · 192 stars · 46 forks · Updated Oct 3, 2023

acl-org / acl-style-files

Official style files for papers submitted to venues of the Association for Computational Linguistics

TeX · 636 stars · 168 forks · Updated May 20, 2024

danielhers / semeval-ucca

Forked from bethard/semeval-codalab

Sample CodaLab competition for use as a template for SemEval tasks

HTML · 1 star · 2 forks · Updated Aug 7, 2019

petermekhaeil / salary-negotiating

Curated list of valuable salary negotiation advice.

232 stars · 13 forks · Updated Aug 15, 2023