A repository for typefaces to train Tesseract and OCRopus for natural history collections and digital humanities.

jbest/typeface-corpus

typeface-corpus

This repository initially focuses on compiling data relevant to OCR work in the natural history collections and digital humanities communities. Both communities need to extract high-quality text from documents and images that contain a variety of typefaces. The goal is to compile a corpus of typeface samples in standardized formats so that these communities can significantly improve the quality of text generated by OCR engines such as Tesseract and OCRopus.

For details about the types of files and formatting, see the Submission Procedures document.
