Dedoc is a library (service) for automated document parsing that converts documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)
html
pdf
ocr
table-of-contents
excel
html-parser
docx
documents
doc
scanned-documents
txt
document-analysis
odt
pdf-parser
table-recognition
docx-parser
document-content-extraction
logical-structure-extraction
Updated Aug 16, 2024 - Python