Fuzzy search modules for searching lists of words in low quality OCR and HTR text.
Updated Oct 18, 2024 - HTML
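As a rough illustration of fuzzy searching a word list in noisy OCR or HTR output, the sketch below scores every token in the text against every search word and keeps those above a similarity threshold. It is not taken from the repository: the function name `fuzzy_find`, the `threshold` parameter, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find(words, text, threshold=0.8):
    """Return (word, token, score) triples for tokens that approximately match a word.

    Hypothetical helper: the similarity measure (difflib ratio) and the
    default threshold are illustrative choices, not the module's own.
    """
    # Split the text into lowercase word-like tokens (letters, digits, underscore).
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for word in words:
        target = word.lower()
        for token in tokens:
            # ratio() is 1.0 for identical strings and drops as they diverge.
            score = SequenceMatcher(None, target, token).ratio()
            if score >= threshold:
                hits.append((word, token, round(score, 2)))
    return hits


if __name__ == "__main__":
    # "harb0ur" stands in for an OCR misread of "harbour".
    print(fuzzy_find(["harbour"], "The harb0ur was qulet"))
```

A lower threshold tolerates more recognition errors at the cost of more false matches, so it would normally be tuned against a sample of the actual text.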
Enhanced awesome-align for low-resource languages and noise simulation: https://arxiv.org/abs/2301.09685
This tool searches for text across all image (screenshot) files in a specified folder and returns a list of the file names in which the text was found. To save time, it runs OCR only on newly added files. When a screenshot or image is removed from the folder, its corresponding text file is archived, not deleted…
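The caching behaviour described above could be sketched roughly as follows. This is not the repository's code: `sync_index`, `search`, the directory layout, the `.txt` naming scheme, and the injected `ocr` callable (for instance a wrapper around an OCR engine such as Tesseract) are all hypothetical names chosen for illustration.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; the real tool may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def sync_index(image_dir, cache_dir, archive_dir, ocr):
    """Bring the text cache in line with the image folder.

    `ocr` is any callable that takes an image Path and returns its text.
    """
    image_dir, cache_dir, archive_dir = Path(image_dir), Path(cache_dir), Path(archive_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    archive_dir.mkdir(parents=True, exist_ok=True)

    images = {p.name: p for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}

    # Run OCR only for images that have no cached text file yet.
    for name, path in images.items():
        cached = cache_dir / (name + ".txt")
        if not cached.exists():
            cached.write_text(ocr(path), encoding="utf-8")

    # Archive (rather than delete) text files whose image is gone.
    for cached in list(cache_dir.glob("*.txt")):
        if cached.name[:-len(".txt")] not in images:
            shutil.move(str(cached), str(archive_dir / cached.name))


def search(cache_dir, query):
    """Return the sorted names of images whose cached text contains `query`."""
    query = query.lower()
    return sorted(
        p.name[:-len(".txt")]
        for p in Path(cache_dir).glob("*.txt")
        if query in p.read_text(encoding="utf-8").lower()
    )
```

Passing the OCR step in as a function keeps the sketch independent of any particular OCR library and makes the caching logic easy to exercise with a stub.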