A Python module for extracting emails from a PDF.

history-lab/xmpdf

xmpdf

Extracts email metadata and text body from a PDF containing emails.

Installation

pip install xmpdf

Usage

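from xmpdf import Xmpdf

# pdf_file is a placeholder for the PDF containing emails; define it before this call
ems = Xmpdf(pdf_file)
# print summary info about the emails in the PDF file
print(ems.info())
# process each extracted email; process() stands in for your own handler
for m in ems.emails:
    process(m)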

OS Dependencies

xmpdf relies on the pdftotext package. If you encounter errors installing xmpdf, check the OS-level dependencies of pdftotext and make sure the required libraries are installed.
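
When an install fails, it can help to first confirm whether the pdftotext module is importable in the current environment. The following is a minimal sketch that uses only the Python standard library; the helper name has_module is hypothetical and is not part of xmpdf or pdftotext.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no top-level module of that name is installed
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("pdftotext"):
        print("pdftotext is importable")
    else:
        print("pdftotext is missing; check its OS-level dependencies and reinstall")
```

If the check reports the module as missing, the OS-level libraries that pdftotext needs are the likely place to look.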

Notes

  • Assumes an email ends when a new email begins
  • Works best with a standard email header (i.e., From:, To:, Sent:, Subject:)
  • The initial development of this package was funded in part by The Mellon Foundation’s “Email Archives: Building Capacity and Community” program.
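
The first two notes can be illustrated with a small sketch. This is not xmpdf's actual implementation; it only assumes that each email starts at a line beginning with "From:" and runs until the next such line. The names split_emails and parse_header are hypothetical.

```python
import re

# Assumption: a line that starts with "From:" marks the beginning of a new email.
HEADER_START = re.compile(r"^From:", re.MULTILINE)
# The standard header fields named in the notes above.
FIELD = re.compile(r"^(From|To|Sent|Subject):[ \t]*(.*)$", re.MULTILINE)


def split_emails(text):
    """Split extracted PDF text into chunks, one per email."""
    starts = [m.start() for m in HEADER_START.finditer(text)]
    if not starts:
        return []
    # Each email ends where the next one begins; the last ends at end of text.
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def parse_header(chunk):
    """Collect the first occurrence of each standard header field in a chunk."""
    fields = {}
    for name, value in FIELD.findall(chunk):
        fields.setdefault(name.lower(), value.strip())
    return fields
```

A sketch like this also shows why a standard header matters: a body line that happens to begin with "From:" (for example, in a quoted reply) would be mistaken for the start of a new email.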
