A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Updated Jun 12, 2024 - Python
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Private chat with a local GPT over documents, images, video, etc. 100% private, Apache 2.0. Supports Ollama, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/
A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
borb is a library for reading, creating, and manipulating PDF files in Python.
💀 Generate a bunch of malicious PDF files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh
Open Source Document Management System for Digital Archives (Scanned Documents)
A library for converting HTML into PDFs using ReportLab
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Simple wrapper for tabula-java: extract tables from a PDF into pandas DataFrames
A Python library for reading and writing PDF, powered by QPDF