"Optical character recognition" (OCR) means the extraction of the text content from a document image.
This toolkit provides
- python library functions for building custom ocr applications
- a ready to use script ocr4gamera
This toolkit has been written for the Gamera framework and requires a working installation of Gamera, version 4 or later. See the Gamera homepage https://gamera.sf.net/
For a user's guide and a developer's guide see 'doc/html/index.html'. For release notes and a revision history see 'CHANGES'.
A comprehensive overview of design, usage and customization of the OCR toolkit can be found in the paper
C. Dalitz, R. Baston: Optical Character Recognition with the Gamera Framework. In C. Dalitz (Ed.): "Document Image Analysis with the Gamera Framework." Schriftenreihe des Fachbereichs Elektrotechnik und Informatik, Hochschule Niederrhein, vol. 8, pp. 53-65, Shaker Verlag (2009)
For a small exmaple including training data and test images, see https://gamera.informatik.hsnr.de/addons/ocr4gamera/
Installation is done with pip3
, which will install a python module
gamera-ocr. See the section "Installation" in doc/html/index.html or
doc/src/index.txt.
- Christoph Dalitz, , 2009-2023
- Andreas Miller, 2023
- Rene Baston, 2009
Please contact Christoph Dalitz for questions about this toolkit.
Thanks to Jakub Wilk, Robert Butz, and Fabian Schmitt for valuable contributions to this toolkit.
This toolkit is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, either version 3 of the license, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the file LICENSE for more details.