Generic framework for historical document processing
Updated Jul 9, 2021 - Python
An include filter for Pandoc
⚡ Cloud-native, AI-powered document processing pipelines on AWS.
A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud.
Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.
Unofficial mirror of git:https://git.lyx.org/lyx.git (updated daily; not affiliated with lyx.org).
A comprehensive list of annotated training datasets classified by use case.
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.
A Python command-line utility for automating some copyediting tasks in documents. It edits zipped, XML-based files (e.g., docx, odt, or epub) through XSLT stylesheets and can easily be extended with your own custom XSL stylesheets.
A Python framework for multi-modal document understanding with Amazon Bedrock
ClearCouncil: Automated tools for collecting, organizing, and embedding publicly available local state county council documents (minutes, agendas) into LLMs. Python, JS, and wget scripts are included for easy data retrieval and integration.
Semantic extraction from conference proceedings.
An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.
Spire.Doc for C++ is a professional Word library for C++, designed for developers to create, read, write, convert, merge, split, and compare Word documents on any C++ platform with fast, high-quality performance.
Text line detection for Urdu OCR (UTRNet)
School/College Stationery List OCR and Parsing
Use data from MongoDB in LangChain, Llama and OpenAI
An implementation of basic IR techniques from scratch.
Service to vectorize documents into a FAISS vector store.
Python tool for converting PDF files to text to simplify document processing tasks.