Deep learning experiments and library for the 'In Codice Ratio' OCR, part of a project involving an AI that can process documents from the Archivio Segreto Vaticano.

Kidel/In-Codice-Ratio-OCR-with-CNN

In Codice Ratio - OCR with CNN

Deep learning experiments and library for the OCR of In Codice Ratio, part of a project involving an artificial intelligence that can process documents from the Vatican Secret Archives.

In Codice Ratio (ICR) is a project curated by Roma Tre University in collaboration with the Vatican Secret Archives. Its purpose is to digitize the contents of documents and ancient texts from the Archive.

The problem we faced in this repository was just one part of ICR, essentially its core. We had to classify handwritten characters in Carolingian minuscule starting from an image of each character. The input is a set of possible cuts of the word to be read, and our system has to decide whether a cut is correct and, if it is, which character it shows.

Example

* Bad cut of the word "asseras", recognized as "----s"
  (image: bad cuts of the word "asseras")
* Good cut of the word "asseras", recognized as "asseras"
  (image: good cuts of the word "asseras")
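
The decision described above (reject a bad cut, otherwise name the character) can be illustrated with a minimal Python sketch. This is not the code from this repository: the function names, the 0.5 confidence threshold, and the score values are assumptions made up for illustration. Only the use of "-" for a rejected cut follows the example above.

```python
from typing import Dict, List

# Marker for a rejected cut, as in the "----s" example above.
REJECT = "-"


def classify_cut(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the most likely character for one cut, or REJECT.

    `scores` maps each candidate character to a confidence in [0, 1].
    The threshold value is an assumption, not taken from the project.
    """
    best_char = max(scores, key=scores.get)
    return best_char if scores[best_char] >= threshold else REJECT


def read_word(cut_scores: List[Dict[str, float]], threshold: float = 0.5) -> str:
    """Classify every cut of a word and join the results into a string."""
    return "".join(classify_cut(scores, threshold) for scores in cut_scores)


# Hypothetical scores: four badly cut pieces and one clean final "s".
bad_cuts = [
    {"a": 0.20, "s": 0.10},
    {"s": 0.30, "e": 0.25},
    {"e": 0.15, "r": 0.20},
    {"r": 0.10, "a": 0.35},
    {"s": 0.95, "a": 0.01},
]

# Hypothetical scores: seven clean cuts, one per letter of "asseras".
good_cuts = [{char: 0.9} for char in "asseras"]

bad_reading = read_word(bad_cuts)
good_reading = read_word(good_cuts)
print(bad_reading)
print(good_reading)
```

In a real pipeline the scores would come from the trained networks rather than from hand-written dictionaries, and the rejection rule might be learned rather than a fixed threshold.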

Other parts of ICR include segmentation software, which finds words in a document and provides possible letter cuts to the OCR, and a language model that filters out false positives among the cuts classified by the OCR. The dataset is provided via a crowdsourcing platform. Those parts are not included in this repository.

The folder "Notebooks" includes our experiments and examples. The folder "Relazione" contains a detailed report on what we did. The folder "Libreria" has everything needed to use, load, or retrain our networks.
