Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Updated Jun 13, 2024 (HTML)
The ultimate open-source RAG framework
Read Japanese manga inside the browser with selectable text.
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Simple app to extract text from pictures using Tesseract
Tesseract.js OCR
Data Mining Historical Newspaper Metadata (METS/ALTO formats)
CERberus -- guardian against character errors 🐶🐶🐶
Some bits of JavaScript to transcribe scanned pages using PageXML
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
An OCR demo application using Tesseract.js and HTML with a smartphone camera.
Documentation for Papermerge DMS - Installation, Help, User Manual, REST API
OCR with JavaScript on the web