PDF Index Extraction

This program generates an index of Biblical references and other references common in Old/New Testament scholarship.

Information

Dependencies

This program uses PyPDF2 to extract text from the PDF and the Python standard library for everything else.

Limitations

The program reads whatever text is embedded in the PDF. If no text is embedded, or if the embedded text is not in reading order, the program will not work. Scanned content is especially prone to both problems.
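A quick way to check whether a PDF is affected by this limitation is to look at how much text each page yields. The sketch below is not taken from this repository: the function names and the `min_chars` threshold are hypothetical, and it assumes a PyPDF2 release that provides `PdfReader` and `page.extract_text()`.

```python
def extract_page_texts(pdf_path):
    """Return the embedded text of every page as a list of strings."""
    # PyPDF2 is imported lazily so the helper below works without it.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty value for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def pages_lacking_text(page_texts, min_chars=20):
    """Return 1-based page numbers whose embedded text looks empty.

    min_chars is an arbitrary, hypothetical threshold, not a program setting.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]
```

If `pages_lacking_text(extract_page_texts("book.pdf"))` lists most of the pages, the file is probably a scan without a text layer and would need OCR before indexing. This check cannot detect text that is embedded but out of order.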

Output

The basic output is a list of references in the correct order. By default it is written to "index.txt" in the base directory.
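To illustrate what "in the correct order" could mean, here is a minimal sketch that groups page numbers by reference and sorts the references by canonical book order. The `BOOK_ORDER` list, the tuple layout, and the tab-separated line format are all assumptions made for the example; the program's real ordering data lives in the sort_order directory and its actual output format may differ.

```python
from collections import defaultdict

# Hypothetical, truncated canonical order used only for this example.
BOOK_ORDER = ["Genesis", "Exodus", "Leviticus"]


def build_index_lines(found):
    """Turn (book, chapter, verse, page) tuples into sorted index lines."""
    pages = defaultdict(set)
    for book, chapter, verse, page in found:
        pages[(book, chapter, verse)].add(page)

    rank = {book: position for position, book in enumerate(BOOK_ORDER)}

    def sort_key(ref):
        book, chapter, verse = ref
        # Unknown books sort after known ones, alphabetically.
        return (rank.get(book, len(rank)), book, chapter, verse)

    lines = []
    for book, chapter, verse in sorted(pages, key=sort_key):
        page_list = ", ".join(str(p) for p in sorted(pages[(book, chapter, verse)]))
        lines.append(f"{book} {chapter}:{verse}\t{page_list}")
    return lines


def write_index(lines, path="index.txt"):
    """Write the index lines to a text file, one entry per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Using a set per reference means a verse cited twice on the same page is listed only once.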

Use

You can run the main.py script with arguments to change the input file. Minor modifications to the script will allow multiple input files. If the PDF is not paginated from 1, you must also specify the base page number.
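The base page number matters because the position of a page inside the PDF file is not necessarily the number printed on it. A minimal sketch of the mapping, using a hypothetical function name that is not part of main.py:

```python
def printed_page_number(pdf_page_index, base_page=1):
    """Map a zero-based PDF page index to the page number printed in the book.

    base_page is the printed number of the first page in the PDF file.
    """
    return base_page + pdf_page_index
```

For a chapter excerpt whose first page is printed as 145, `printed_page_number(0, base_page=145)` gives 145, and the fifth page of the file maps to 149. Without the offset, every entry in the index would point to the wrong page.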

Modification and Extension

Expanding the alias files in the aliases directory will improve the behavior of the program, which is designed to be easily modified to include additional aliases. I plan to improve the configurability of the formatters in the future and eventually add a .tex output and possibly others.
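To show why more aliases help, the sketch below matches abbreviated book names in text and normalizes them to one canonical name. It does not reflect the actual format of the alias files or the logic in reference_parser.py: the `ALIASES` dictionary, the function name, and the `chapter:verse` pattern are assumptions made for the example.

```python
import re

# Hypothetical alias table: abbreviation -> canonical book name.
ALIASES = {"Gen": "Genesis", "Gn": "Genesis", "Exod": "Exodus", "Ex": "Exodus"}


def find_references(text, aliases=ALIASES):
    """Return (book, chapter, verse) tuples for references found in text."""
    # Longest names first, so "Exod" is tried before "Ex".
    names = sorted(set(aliases) | set(aliases.values()), key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\.?\s+(\d+):(\d+)"
    )
    results = []
    for match in pattern.finditer(text):
        # Canonical names are not keys in the table, so they map to themselves.
        book = aliases.get(match.group(1), match.group(1))
        results.append((book, int(match.group(2)), int(match.group(3))))
    return results
```

With this table, "Gen 1:1" and "Genesis 2:7" both resolve to Genesis, while "Ge 1:1" is missed until an entry for "Ge" is added. Each abbreviation found in real scholarship that is missing from the table is a reference missing from the index.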

License

GPLv3
