View ancatmara's full-sized avatar

A machine learning software for extracting information from scholarly documents

Java 3,282 440 Updated Jul 18, 2024

Libraries, Archives and Museums (LAM)

79 6 Updated Oct 4, 2022

Let's build better datasets, together!

Jupyter Notebook 185 27 Updated Jul 19, 2024

Parse JSON response of Amazon Textract

TypeScript 212 95 Updated Jul 5, 2024

Simple IIIF Search service for OCRed texts

Python 15 1 Updated Dec 16, 2020

IIIF experiments with Gallica content

JavaScript 23 3 Updated Jul 11, 2024

A deep learning toolkit specialized for handwritten document analysis

Python 187 41 Updated Jul 9, 2024

Conversions between various OCR formats

71 3 Updated May 13, 2023

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

XSLT 51 14 Updated Jul 15, 2024

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

JavaScript 67 9 Updated Jul 19, 2024

Tesseract Open Source OCR Engine (main repository)

C++ 59,806 9,259 Updated Jul 12, 2024

NLP training/testing data

Python 5 Updated May 20, 2024

Images of example pages from Transkribus model training sets to make it easier to find a match.

10 Updated Jan 25, 2022

TensorFlow code and pre-trained models for BERT

Python 37,517 9,541 Updated Jul 16, 2024

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 8,727 757 Updated Jul 20, 2024

Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at CLEF 2020.

SCSS 22 5 Updated May 16, 2024

Public repository for Coptic SCRIPTORIUM Corpora Releases

CSS 30 13 Updated Jun 12, 2024

Latin texts annotated for named entities and NER tagger used for the Herodotos Project (Ohio State University / Ghent University)

Python 10 4 Updated Sep 26, 2022

BERT and ELECTRA models trained on Europeana Newspapers

Python 35 1 Updated Dec 14, 2021

An OCR evaluation tool

Python 57 12 Updated Jul 19, 2024

CERberus -- guardian against character errors 🐶🐶🐶

HTML 23 Updated Feb 15, 2024

Source code for the submissions to SIGTYP 2024, EvaLatin 2024, and AXOLOTL 2024 shared tasks

Python 2 Updated Apr 11, 2024

Repository for "Towards Robust Named Entity Recognition for Historic German"

Python 18 3 Updated Dec 11, 2020

Hanzipy is a Chinese character and NLP module for Chinese language processing for python. It is primarily written to help provide a framework for Chinese language learners to explore Chinese.

Python 13 3 Updated Jan 9, 2024

Multilingual BERT model for Ancient and Historical Languages for SIGTYP Shared Task 2024

Python 3 Updated Mar 27, 2024

Code for the paper "Language Models are Unsupervised Multitask Learners"

Python 22,004 5,444 Updated Jun 11, 2024

Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"

Python 192 46 Updated Oct 3, 2023

Official style files for papers submitted to venues of the Association for Computational Linguistics

TeX 636 168 Updated May 20, 2024

Sample CodaLab competition for use as a template for SemEval tasks

HTML 1 2 Updated Aug 7, 2019

Curated list of valuable salary negotiation advice.

232 13 Updated Aug 15, 2023