document-processing

Here are 39 public repositories matching this topic...

dhlab-epfl / dhSegment

Generic framework for historical document processing

tensorflow python3 segmentation historical-data document-processing

Updated Jul 9, 2021
Python

steindani / pandoc-include

An include filter for Pandoc

markdown pandoc pandoc-filter document-processing

Updated Dec 6, 2020
Haskell

awslabs / project-lakechain

⚡ Cloud-native, AI-powered, document processing pipelines on AWS.

aws machine-learning natural-language-processing computer-vision serverless hacktoberfest document-processing aws-cdk generative-ai retrieval-augmented-generation

Updated Jul 19, 2024
TypeScript

A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. 🌟 Star to support our work!

aws ocr serverless headless cloud-storage document-database amazon-web-services dms document-management optical-character-recognition document-processing document-management-system document-api document-apis intelligent-document-processing document-layer

Updated Jul 23, 2024
Java

aws-solutions / enhanced-document-understanding-on-aws

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.

document-analysis document-processing

Updated Jul 18, 2024
JavaScript

cburschka / lyx

Unofficial mirror of git:https://git.lyx.org/lyx.git (updated daily; not affiliated with lyx.org).

latex mirror lyx document-processing

Updated Mar 21, 2023
C++

kili-technology / awesome-datasets

A comprehensive list of annotated training datasets classified by use case.

Updated Jul 8, 2022

MBAigner / PDFSegmenter

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned in CSV format.

python pdf csv table annotations cluster-analysis document-processing layout-analysis detection-model page-segmentation

Updated Sep 11, 2020
Python

jeanbaptisteb / doccleaner

A Python command-line utility for automating some copyediting tasks in documents. It edits zipped, XML-based files (e.g. docx, odt, or epub) through XSLT stylesheets, and can be extended fairly easily with your own custom XSL stylesheets.

docx text-processing odt document-processing xsl-transformation xsl-stylesheet xsl-sheet

Updated Jul 17, 2018
XSLT

awslabs / rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

multi-modal document-processing generative-ai intelligent-document-processing amazon-bedrock

Updated Jul 15, 2024
Python

johnsirmon / clearcouncil

ClearCouncil: Automated tools for collecting, organizing, and embedding publicly available local state county council documents (minutes, agendas) into LLMs. Python, JS, and wget scripts included for easy data retrieval and integration.

local-government wget open-data openai civic-tech gpt data-retrieval document-processing transparency-enhancing-technologies langchain langchain-python retrieval-augmented-generation

Updated Apr 16, 2024
Python

afrozas / proceedings

Semantic extraction from conference proceedings.

semantic conferences spacy document-processing

Updated Jul 26, 2020
Python

jmanhype / DSPy-Multi-Document-Agents

An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.

nlp distributed-systems ai query-optimization knowledge-management document-processing vector-search

Updated Apr 23, 2024
Python

eiceblue / Spire.Doc-for-C-

Spire.Doc for C++ is a professional Word C++ library designed for developers to create, read, write, convert, merge, split, and compare Word documents on any C++ platform with fast, high-quality performance.

cpp word docx class-library document-processing