This project parses PDFs in Python and combines the results with LLMs.
PDF structure analysis is done with PaddlePaddle Structure.
main features:
pure PDF:
- get basic PDF info
- get text
- get table data
- get image
- split PDF
- merge PDF
- OCR for scanned PDFs
PDF structure analysis:
- PDF table detection
- PDF structure analysis
- PDF recovery
- PDF translation with DeepL
PDF with LLM:
- chat with text-based PDF
- chat with scanned PDF
- chat with tables in PDF using table detection
- multi-modal RAG for PDF
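
The retrieval step behind the "chat with PDF" and RAG features can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: it assumes the PDF text has already been extracted into a list of strings, and it uses a toy bag-of-words cosine similarity in place of a real embedding model. The names `tokenize`, `cosine_similarity`, `retrieve`, `build_prompt`, and the sample `pages` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks that share words with the question."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks with no overlap at all so the prompt stays focused.
    return [c for score, c in scored[:top_k] if score > 0]


def build_prompt(question, chunks, top_k=2):
    """Assemble a prompt that an LLM could answer from the retrieved context."""
    context = "\n---\n".join(retrieve(question, chunks, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical page texts, standing in for text extracted from a PDF.
pages = [
    "Invoice total: 1,250 USD, due on 2024-03-01.",
    "Shipping address: 42 Example Street, Springfield.",
    "Payment terms: net 30 days from receipt.",
]
prompt = build_prompt("What is the invoice total?", pages, top_k=1)
print(prompt)
```

In a real pipeline the word-count vectors would likely be replaced by embeddings from a model, and the resulting prompt would be sent to the LLM of choice; the ranking-then-prompting flow stays the same.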