pdf-scraper

Scrape data from PDF files using Python.

Usage

The scripts directory contains the following scripts:

explore.py - use this exploratory script to locate the coordinates of the text fields you are interested in.
script.py - enter the coordinates found with explore.py into this script, then run it to extract the desired fields to an output file.

The files directory contains a sample PDF file to be scraped: an ADP-format pay stub template, chosen as an example of a document with a consistent structure.
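
The two-step workflow above (explore, then extract) can be sketched as follows. This is a minimal illustration, not the contents of explore.py or script.py. It assumes the pdfquery library named in the referenced article is installed; the function names, field names, and coordinates are all hypothetical.

```python
def bbox_selector(x0, y0, x1, y1, element="LTTextLineHorizontal"):
    """Build a pdfquery selector matching text lines inside a bounding box."""
    return f'{element}:in_bbox("{x0}, {y0}, {x1}, {y1}")'


def dump_layout(pdf_path, xml_path):
    """Exploration step: write the parsed layout tree to XML.

    Each text element in the XML carries a bbox attribute, which is where
    the coordinates for the extraction step come from.
    """
    import pdfquery  # third-party; imported lazily so the helper above has no dependency

    pdf = pdfquery.PDFQuery(pdf_path)
    pdf.load()
    pdf.tree.write(xml_path, pretty_print=True, encoding="utf-8")


def extract_fields(pdf_path, fields):
    """Extraction step: return {field name: text found inside its bounding box}."""
    import pdfquery

    pdf = pdfquery.PDFQuery(pdf_path)
    pdf.load()
    return {name: pdf.pq(bbox_selector(*bbox)).text() for name, bbox in fields.items()}


# Hypothetical field names and coordinates (x0, y0, x1, y1), for illustration only.
# Real values have to be read from the XML produced by dump_layout().
FIELDS = {
    "employee_name": (50.0, 600.0, 250.0, 620.0),
    "net_pay": (400.0, 100.0, 550.0, 120.0),
}
```

A typical session would call dump_layout() on the sample PDF once, read the bbox values of the fields of interest from the XML, put them into FIELDS, and then call extract_fields() on every PDF that shares the same layout.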

Versioning

This project follows Semantic Versioning: https://semver.org

Example

0.0.1
0.0.1-rc.1
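
The two example versions differ only in the pre-release tag, and under Semantic Versioning a pre-release sorts before the release it precedes. The sketch below shows that ordering with a simplified precedence key. It is an illustration only, not part of this repository, and the name precedence_key is hypothetical.

```python
import re

SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"      # major.minor.patch
    r"(?:-([0-9A-Za-z.-]+))?"    # optional pre-release tag, e.g. rc.1
    r"(?:\+[0-9A-Za-z.-]+)?$"    # optional build metadata, ignored for ordering
)


def precedence_key(version):
    """Return a sort key that orders version strings by semver precedence."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    pre = match.group(4)
    if pre is None:
        # A plain release ranks above every pre-release of the same version.
        pre_key = (1,)
    else:
        # Numeric identifiers compare as numbers and rank below alphanumeric ones.
        identifiers = tuple(
            (0, int(part), "") if part.isdigit() else (1, 0, part)
            for part in pre.split(".")
        )
        pre_key = (0, identifiers)
    return (major, minor, patch, pre_key)


versions = ["0.0.1", "0.0.1-rc.1"]
print(sorted(versions, key=precedence_key))
```

Sorting with this key places 0.0.1-rc.1 before 0.0.1, so a release candidate is never treated as newer than the final release.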

Local Development

make list    # list all containers and images

make build   # build image

make scan    # scan image

make start   # start container

make shell   # start shell in running container

make stop    # stop container

make remove  # remove container

make clean   # remove images

References

https://towardsdatascience.com/scrape-data-from-pdf-files-using-python-and-pdfquery-d033721c3b28
