document-processing

formkiq / formkiq-core

A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. 🌟 Star to support our work!

aws ocr serverless headless cloud-storage document-database amazon-web-services dms document-management optical-character-recognition document-processing document-management-system document-api document-apis intelligent-document-processing document-layer

Updated Nov 23, 2024
Java

steindani / pandoc-include

An include filter for Pandoc

markdown pandoc pandoc-filter document-processing

Updated Dec 6, 2020
Haskell

parsee-ai / parsee-core

Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized for PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.

structured-data document-processing multimodal llm

Updated Nov 18, 2024
Python

awslabs / rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

multi-modal document-processing generative-ai intelligent-document-processing amazon-bedrock

Updated Nov 22, 2024
Python

cburschka / lyx

Unofficial mirror of git:https://git.lyx.org/lyx.git (updated daily; not affiliated with lyx.org).

latex mirror lyx document-processing

Updated Mar 21, 2023
C++

aws-solutions / enhanced-document-understanding-on-aws

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.

document-analysis document-processing

Updated Nov 18, 2024
JavaScript

afrozas / proceedings

Semantic extraction from conference proceedings.

semantic conferences spacy document-processing

Updated Jul 26, 2020
Python

kili-technology / awesome-datasets

A comprehensive list of annotated training datasets classified by use case.

Updated Jul 8, 2022

MBAigner / PDFSegmenter

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.

python pdf csv table annotations cluster-analysis document-processing layout-analysis detection-model page-segmentation

Updated Sep 11, 2020
Python

jmanhype / DSPy-Multi-Document-Agents

An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.

nlp distributed-systems ai query-optimization knowledge-management document-processing vector-search

Updated Aug 17, 2024
Python

greed2411 / tokyo

tokyo is a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.

clojure extension filetype text-extraction ring mime-types text-parser extract-text apache-tika document-processing text-parsing

Updated Jun 13, 2020
Clojure

iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

python information-retrieval document-conversion pdf-converter text-extraction pdf-parsing document-processing rag pdf-extraction retrieval-augmented-generation pdf-to-markdown

Updated Nov 22, 2024
Python

eklem / stopword-trainer

A module for creating stopword lists for any language, based on a set of documents.

nlp information-retrieval stopwords document-processing stopwords-removal

Updated Sep 28, 2024
JavaScript

jeanbaptisteb / doccleaner

A Python command-line utility for automating some copyediting tasks in documents. It edits zipped, XML-based files (e.g. docx, odt, or epub) through XSLT stylesheets, and can be extended fairly easily with your own custom XSL stylesheets.

docx text-processing odt document-processing xsl-transformation xsl-stylesheet xsl-sheet

Updated Jul 17, 2018
XSLT

abdur75648 / urdu-text-detection

Text line detection for Urdu OCR (UTRNet)

ocr text-detection document-processing urdu-text-detection urdu-ocr utrnet contournet

Updated Oct 8, 2024
Python

RPetitpierre / Generic_Semantic_Segmentation_of_Historical_Maps

computer-vision historical-maps document-processing

Updated Jan 17, 2022
Jupyter Notebook

document-processing

Here are 60 public repositories matching this topic...

DocumindHQ / documind

enoch3712 / ExtractThinker

dhlab-epfl / dhSegment

awslabs / project-lakechain

formkiq / formkiq-core
