pdf

Here are 2,007 public repositories matching this topic...

paperless-ngx / paperless-ngx

A community-supported supercharged version of paperless: scan, index and archive all your physical documents

pdf machine-learning django angular ocr archiving dms document-management optical-character-recognition document-management-system

Updated Jun 12, 2024
Python

ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

python pdf ocr image-processing tesseract

Updated Jun 11, 2024
Python

h2oai / h2ogpt

Private chat with a local GPT over documents, images, video, etc. 100% private, Apache 2.0. Supports Ollama, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

pdf ai embeddings private gpt generative llm chatgpt gpt4all vectorstore privategpt llama2 mixtral

Updated Jun 12, 2024
Python

py-pdf / pypdf

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

python pdf help-wanted pdf-documents pypdf2 pdf-manipulation pdf-parsing pdf-parser

Updated Jun 11, 2024
Python

Kozea / WeasyPrint

The awesome document factory

css python html pdf converter weasyprint

Updated Jun 11, 2024
Python

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

pdf pdf-parsing table-extraction

Updated Jun 11, 2024
Python

pdfminer / pdfminer.six

Community-maintained fork of pdfminer - we fathom PDF

python pdf parser

Updated May 22, 2024
Python

pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

python pdf font data-science ocr tesseract epub mupdf text-processing pdf-documents extract-data table-extraction text-shaping xps pymupdf

Updated Jun 10, 2024
Python

atlanhq / camelot

Camelot: PDF Table Extraction for Humans

pdf table extract for-humans

Updated Jan 5, 2023
Python

jorisschellekens / borb

borb is a library for reading, creating, and manipulating PDF files in Python.

python pdf library sdk typesetting pdf-converter python3 pdf-conversion pdf-generation pdf-library

Updated May 15, 2024
Python

pdfarranger / pdfarranger

A small Python GTK application that helps the user merge or split PDF documents and rotate, crop, and rearrange their pages using an interactive and intuitive graphical interface.

linux pdf gtk python3 gtk3