This project is for learning PDF processing in Python with large language models (LLMs).

percent4/pdf-llm_series

This project covers PDF parsing in Python with LLMs.

PDF structure analysis uses PaddlePaddle Structure.

Main features:

pure PDF:

  • get basic PDF info
  • get text
  • get table data
  • get image
  • split PDF
  • merge PDF
  • OCR with scanned PDF

PDF structure analysis:

  • PDF table detection
  • PDF structure analysis
  • PDF recovery
  • PDF translation with DeepL

PDF with LLM:

  • chat with text-based PDF
  • chat with scanned PDF
  • chat with tables in PDF using table detection
  • multi-modal RAG for PDF
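As a rough sketch of the retrieval step behind "chat with text-based PDF", the example below splits already-extracted text into chunks, ranks them by keyword overlap with the question, and builds a prompt for an LLM. Keyword overlap is a stand-in chosen to keep the example dependency-free; the README does not describe the project's retrieval method, which may well use embeddings instead. All function names here are hypothetical, and the call to the LLM itself is left out.

```python
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split extracted PDF text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def _tokens(text: str) -> Counter:
    """Count lowercase alphanumeric tokens in text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return up to k chunks sharing the most tokens with the question."""
    query = _tokens(question)
    scored = []
    for position, chunk in enumerate(chunks):
        counts = _tokens(chunk)
        score = sum(min(n, counts[token]) for token, n in query.items())
        scored.append((score, position))
    # Highest score first; ties keep document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Chunks with no overlap at all are dropped.
    return [chunks[position] for score, position in scored[:k] if score > 0]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the context only."""
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )
```

For a scanned PDF, the same steps would apply after OCR has produced the text. The string returned by `build_prompt` is what would be sent to whichever LLM client the project uses.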
