Pipeline for ground truth creation to train text recognition models. Extracts OCR results from eScriptorium, prepares them for alignment with passim, and imports the valid alignments back into eScriptorium.
Updated Jun 26, 2024 - Python
Testing Tracer and Passim on medieval Irish & Welsh law texts
Code and data accompanying the Programming Historian tutorial on text reuse with Passim by Romanello & Hengchen.
Text preparation pipeline (digital witnesses) for training text recognition models. Retrieves texts from Sefaria.org, analyzes their structure, cleans and concatenates them, and creates an index of the text content. The texts are then ready for an alignment search against OCR results with Passim.
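The clean, concatenate, and index steps in the entry above can be sketched as follows. This is a minimal illustration, not code from that repository: the function names, the cleaning rule (Unicode NFC plus whitespace collapsing), and the index layout are hypothetical. The output record assumes Passim's JSON-lines input as we understand its documentation: each record carries `id` and `text`, plus an optional `series` field that Passim uses to skip comparisons between documents of the same series.

```python
import json
import re
import unicodedata


def clean(text: str) -> str:
    """Hypothetical cleaning step: NFC-normalize and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def concatenate_with_index(segments):
    """Join (ref, text) segments with single spaces.

    Returns the concatenated text and an index mapping each source
    reference to its [start, end) character span in that text.
    Segments that are empty after cleaning are skipped.
    """
    parts, index, offset = [], [], 0
    for ref, raw in segments:
        cleaned = clean(raw)
        if not cleaned:
            continue
        if parts:
            offset += 1  # account for the separating space
        index.append({"ref": ref, "start": offset, "end": offset + len(cleaned)})
        parts.append(cleaned)
        offset += len(cleaned)
    return " ".join(parts), index


def to_passim_record(doc_id: str, series: str, text: str) -> str:
    """Serialize one document as a JSON line (assumed Passim input shape)."""
    return json.dumps({"id": doc_id, "series": series, "text": text}, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical segments standing in for texts retrieved from Sefaria.org.
    segments = [("ref-1", "  In the  beginning "), ("ref-2", "and the earth\n was")]
    text, index = concatenate_with_index(segments)
    print(text)   # In the beginning and the earth was
    print(index)  # spans (0, 16) and (17, 34)
    print(to_passim_record("witness-1", "sefaria", text))
```

Keeping the character-span index separate from the Passim record means alignment offsets reported on the concatenated text can later be mapped back to the original source references.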