A collection of PDF command line tools and wrappers for Linux, written in Bash. They are convenience tools that spare you from remembering long and cryptic options and switches.
The heavy lifting is done by backend tools such as pdftk, Ghostscript and the Poppler utilities.
The scripts are meant to be installed in a user's home directory. To do this quickly, the Makefile in the root of the repository has a target called user_install.
$ make user_install
Install pdftools under /home/<user>/bin
-> Installing bin/img2pdf to /home/<user>/bin/img2pdf
-> Installing bin/ocrpdf to /home/<user>/bin/ocrpdf
-> Installing bin/pdf2pdfa to /home/<user>/bin/pdf2pdfa
-> Installing bin/pdfcat to /home/<user>/bin/pdfcat
-> Installing bin/pdfmeta to /home/<user>/bin/pdfmeta
-> Installing bin/pdfresize to /home/<user>/bin/pdfresize
-> Installing bin/scan2jpg to /home/<user>/bin/scan2jpg
-> Installing bin/scan2pdf to /home/<user>/bin/scan2pdf
-> Installing bin/scan2png to /home/<user>/bin/scan2png
To uninstall everything, use the target user_uninstall.
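The installed scripts can only be called by name if the install directory is on the search path. The following is a minimal sketch of such a check, assuming the install directory is $HOME/bin as in the output above. The function name path_contains is made up for illustration and is not part of pdftools.

```shell
# Return 0 if directory $1 is one of the entries of the colon-separated
# list $2 (defaults to the current PATH).
# `path_contains` is an illustrative name, not part of pdftools.
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is not on PATH; add it in your shell profile"
fi
```

Wrapping both sides in colons lets a single pattern match the first, last and middle entries alike.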
A script to convert PNGs, TIFFs or JPEGs to PDF files.
- License: MIT
- Requires: bash, pdfcat, ImageMagick, pdftk
img2pdf first.png second.jpg
img2pdf --delete first.png second.jpg
img2pdf --rotate 180 myimage.png
Write the result to output.pdf:
img2pdf --output output.pdf --rotate 180 page1.png page2.jpg
Usage: img2pdf <options> <img-file> [<img-file> ... ]
Options:
  -h | --help             This message
  -d | --delete           Delete the images after creating the PDF file.
  -o | --output <name>    Write the output to specified file <name>.
  -r | --rotate <value>   Rotate the image by <value>
                          Where value can be a positive or negative
                          integer between 0 and 360.
  -V | --version          Display version and exit
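The help text allows positive and negative rotation values. The sketch below shows one way such a value could be validated and folded into the range 0 to 359 before it is passed to an image tool. It assumes the accepted range is -360 to 360 and that a negative value means a counter-clockwise turn. The function normalize_rotation is hypothetical and is not taken from img2pdf.

```shell
# Validate a rotation value and print the equivalent angle in 0..359.
# `normalize_rotation` is an illustrative helper, not img2pdf's own code.
normalize_rotation() {
    case "$1" in
        # Reject empty input, a lone minus, non-digits, and a minus sign
        # anywhere but the first position.
        ''|-|*[!0-9-]*|?*-*)
            echo "invalid rotation: $1" >&2; return 1 ;;
        # Reject leading zeros: $(( )) would read them as octal.
        0[0-9]*|-0[0-9]*)
            echo "invalid rotation: $1" >&2; return 1 ;;
    esac
    if [ "$1" -lt -360 ] || [ "$1" -gt 360 ]; then
        echo "rotation out of range: $1" >&2
        return 1
    fi
    # Shell % keeps the sign of the dividend, so shift into 0..359.
    echo $(( ($1 % 360 + 360) % 360 ))
}

normalize_rotation -90
```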
Runs PDFs through OCR and saves the output as a text searchable PDF with the same name.
ℹ️ Only works with PDFs comprising a single JPEG, LZW or ZIP compressed image per page. LZW compressed images are converted to ZIP compressed ones during the OCR process.
- License: MIT
- Requires: bash, pdfcat, pdfimages (poppler-utils), pdftk, tesseract
ocrpdf first.pdf second.pdf
ocrpdf --lang deu german.pdf
ocrpdf --lang deu+fra+eng scanned_*.pdf
Usage: ocrpdf [options] <file> [<file> [,,]]
Options:
  -h | --help           This message
  -q | --quiet          Don't send display processed file names
  -V | --version        Print version information and exit
  -l | --lang <lang>    Set the OCR languages to use.
                        For multiple languages concatenate with a '+'
                        E.g eng+deu for English and German
                        Default: deu+eng+fra+ita+jpn+osd
Description:
  Runs PDFs through OCR and saves the output as a text searchable PDF with
  the same name.
Disclaimer:
  Only works with PDFs comprised of a single JPEG, LZW or ZIP compressed
  image per page. LZW compressed images will be converted to ZIP compressed
  ones during the OCR process.
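Because --lang takes a '+' separated list, it can help to check that every requested language is installed before starting a long batch. The sketch below assumes the installed languages are available as a whitespace-separated list. The helper missing_langs is hypothetical and not part of ocrpdf.

```shell
# Print every language from the '+'-separated list $1 that does not
# appear in the whitespace-separated list $2, one per line.
# `missing_langs` is an illustrative helper, not part of ocrpdf.
missing_langs() {
    requested=$1
    # Normalize newlines to spaces and pad so whole words can be matched.
    available=" $(echo "$2" | tr '\n' ' ') "
    old_ifs=$IFS
    IFS='+'
    set -f                      # no globbing while splitting on '+'
    for lang in $requested; do
        case "$available" in
            *" $lang "*) ;;     # installed, nothing to report
            *) echo "$lang" ;;
        esac
    done
    set +f
    IFS=$old_ifs
}

missing_langs "deu+fra+eng" "deu eng osd"
```

With tesseract installed, the second argument could come from `tesseract --list-langs`, after dropping the header line it prints first.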
A quick hack to replace pdfunite, which destroys too much of the original's metadata.
- License: MIT
- Requires: bash, pdftk >= 2.0
pdfcat first.pdf second.pdf > merged.pdf
pdfcat myscan*.pdf > merged.pdf
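The redirection in the examples works because the merged document is written to standard output. With pdftk this can be done with the cat operation and `-` as the output name. The sketch below only prints the command instead of running it, so it works without pdftk installed. It is an assumption about how such a wrapper could be built, not the actual pdfcat source, and pdfcat_command is a made-up name.

```shell
# Print the pdftk command line that would concatenate the given files to
# standard output. Illustrative only: file names are not quoted for re-use.
pdfcat_command() {
    if [ "$#" -lt 1 ]; then
        echo "usage: pdfcat_command <pdf> [<pdf> ...]" >&2
        return 1
    fi
    echo "pdftk $* cat output -"
}

pdfcat_command first.pdf second.pdf
```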
A wrapper script around pdftk to manipulate a PDF's metadata.
- License: MIT
- Requires: bash >= 4.0, pdftk >= 2.0
pdfmeta --keywords "rainbow, magical, unicorn" unicorn.pdf rainbow.pdf
pdfmeta --creation-date "2017-01-01 22:30:45" unicorn.pdf
Usage: pdfmeta <options> <pdf> [[<pdf>] ..]
Options:
  -h | --help                 This message
  -k | --keywords             Comma separated list of keywords
  -s | --subject              Define the PDFs subject
  -t | --title                Define the PDFs title
  -c | --creator              Define the PDFs creator program or library
  -p | --producer             Define the PDFs producing program
  -C | --creation-date        Set the creation date of the PDF
  -M | --modification-date    Set the modification date of the PDF
  -V | --version              Display version and exit
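pdftk reads metadata changes as InfoBegin / InfoKey / InfoValue records, and PDF stores dates in the form D:YYYYMMDDhhmmss. The sketch below shows how the option values above could be turned into such records. Both helper names are hypothetical, and the real pdfmeta may handle more cases, for example time zones.

```shell
# Convert "YYYY-MM-DD hh:mm:ss" to the PDF date form "D:YYYYMMDDhhmmss".
# `to_pdf_date` and `info_record` are illustrative, not pdfmeta's own code.
to_pdf_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\ [0-9][0-9]:[0-9][0-9]:[0-9][0-9])
            # Drop the separators, keep the digits.
            echo "D:$(echo "$1" | tr -d ' :-')" ;;
        *)
            echo "expected YYYY-MM-DD hh:mm:ss, got: $1" >&2
            return 1 ;;
    esac
}

# Emit one metadata record in the format read by pdftk's update_info.
info_record() {
    printf 'InfoBegin\nInfoKey: %s\nInfoValue: %s\n' "$1" "$2"
}

info_record Keywords "rainbow, magical, unicorn"
info_record CreationDate "$(to_pdf_date "2017-01-01 22:30:45")"
```

The printed records could then be passed to pdftk's update_info operation, which takes an info data file name or `-` for standard input.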
ℹ️ On Ubuntu 18.04 (bionic)
A wrapper around Ghostscript to reduce the size of a scanned document.
ℹ️ pdfresize very likely does not work with PDF documents containing JBIG2 images.
- License: MIT
- Requires: bash, ghostscript
pdfresize --input input.pdf --output output.pdf
pdfresize --quality screen --input input.pdf --output output.pdf
Usage: pdfresize [-q pdfsettings] -i <input> -o <output>
Options:
  -h | --help                This message
  -i | --input <input>       A PDF file preferably of high resolution
  -o | --output <output>     Name of the PDF file to save the result to
  -q | --quality <quality>   Quality settings for output PDF.
                             See quality keywords for acceptable input.
  -V | --version             Print version and exit.
Quality keywords:
  screen   - low-resolution; comparable to "Screen Optimized" in Acrobat Distiller
  ebook    - medium-resolution; comparable to "eBook" in Acrobat Distiller
  printer  - comparable to "Print Optimized" in Acrobat Distiller
  prepress - comparable to "Prepress Optimized" in Acrobat Distiller
  default  - intended to be useful across a wide variety of uses
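The quality keywords correspond to the values of Ghostscript's -dPDFSETTINGS switch for the pdfwrite device. The sketch below shows how a keyword could be validated and turned into a Ghostscript command line. The command is printed, not executed, and both function names are hypothetical. The actual pdfresize script may pass additional options.

```shell
# Map a quality keyword to Ghostscript's -dPDFSETTINGS switch.
# `pdfsettings_flag` and `resize_command` are illustrative names only.
pdfsettings_flag() {
    case "$1" in
        screen|ebook|printer|prepress|default)
            echo "-dPDFSETTINGS=/$1" ;;
        *)
            echo "unknown quality keyword: $1" >&2
            return 1 ;;
    esac
}

# Print (do not run) a Ghostscript command for quality $1, input $2, output $3.
resize_command() {
    flag=$(pdfsettings_flag "$1") || return 1
    echo "gs -sDEVICE=pdfwrite $flag -dNOPAUSE -dBATCH -dQUIET -sOutputFile=$3 $2"
}

resize_command screen input.pdf output.pdf
```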
Small script to convert a PDF to PDF/A type.
ℹ️ This is an early beta, and all metadata in the PDF will be lost!
Convert sample.pdf to a PDF/A-2 named sample_a.pdf:
pdf2pdfa sample.pdf
Convert sample.pdf to a PDF/A-2 named sample_pdfa.pdf:
pdf2pdfa --suffix _pdfa sample.pdf
Convert sample.pdf to a PDF/A-1 named sample_a.pdf:
pdf2pdfa --level 1 sample.pdf
Convert sample.pdf to a PDF/A-3, exiting on errors:
pdf2pdfa --level 3 --strict sample.pdf
Convert sample.pdf to a PDF/A-2 with color model CMYK:
pdf2pdfa --color-model CMYK sample.pdf
Usage: pdf2pdfa [<options>] <pdf_file> [<pdf_file> [..]]
Options:
  -c | --color-model <model>   Color model to use for the conversion.
                               Valid input is RGB or CMYK.
                               Default: RGB
  -h | --help                  This message
  -l | --level <number>        PDF-A specification level to use.
                               Valid input is 1 (A-1), 2 (A-2) and 3 (A-3).
                               Default: 2
  -S | --strict                Exit if errors are encountered during conversion.
  -s | --suffix <suffix>       Append <suffix> to filename
                               Default '_a'
  -V | --version               Display version and exit.
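The examples above show that the suffix is inserted before the .pdf extension. The sketch below shows one way to derive the output name. It assumes a lowercase .pdf extension, and pdfa_name is a made-up helper, not pdf2pdfa's own code.

```shell
# Derive the output name: strip a trailing ".pdf", append the suffix
# (default "_a"), then put the extension back.
# `pdfa_name` is illustrative; an uppercase ".PDF" is not handled here.
pdfa_name() {
    suffix=${2-_a}
    echo "${1%.pdf}${suffix}.pdf"
}

pdfa_name sample.pdf
pdfa_name sample.pdf _pdfa
```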
A frontend for scanimage. It has only been tested against the Canon LiDE 210 scanner.
Some notable features:
- Can OCR scanned documents using tesseract.
- Scans a few predefined sizes such as A4 and A5, among others.
- When symlinked to scan2png it produces PNG output; when symlinked to scan2jpg it produces JPEG output.
- Command line mode handles single-page scans only; interactive mode handles multi-page scans.
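A script can change its behavior through symlinks by looking at the name it was invoked as. The sketch below shows the idea, assuming the three names scan2pdf, scan2png and scan2jpg. The function format_for_name is hypothetical and is not the actual scan2pdf code.

```shell
# Pick the output file extension from the name the script was called as.
# `format_for_name` is an illustrative helper, not scan2pdf's own code.
format_for_name() {
    # ${1##*/} strips any leading directory part, like basename.
    case "${1##*/}" in
        scan2png) echo png ;;
        scan2jpg) echo jpg ;;
        *)        echo pdf ;;
    esac
}

# In a real script the argument would be "$0".
format_for_name /home/user/bin/scan2png
```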
Scan a single page to scan_YYYY-MM-DD_hh-mm-ss.pdf:
scan2pdf
Scan a single page with OCR to scan_YYYY-MM-DD_hh-mm-ss.pdf:
scan2pdf --ocr
scan2pdf --interactive --ocr
Enter filename [scan_2022-01-26_23-15-30]: (1)
1) Scan document (2)
2) Finish scan (3)
3) Wrap up and quit (4)
Choose action > 1 (5)
Choose action > 1 (6)
Choose action > 3 (7)
(1) Provide a file name or press Enter to accept the default name.
(2) Menu option 1 scans a page, then returns to the prompt.
(3) Menu option 2 writes all pages to a PDF file and prompts for a new name.
(4) Menu option 3 writes all pages to a PDF file and exits.
(5) Scan one page.
(6) Scan another page.
(7) Write the PDF and exit.
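The menu flow described above can be sketched as a small loop that reads one action per line. This is a simplified model, not the scan2pdf implementation: it leaves out the file name prompt and only counts pages where a real script would call the scanner. The function name interactive_session is made up.

```shell
# Read menu choices from standard input and report what a session would do.
# `interactive_session` is an illustrative model of the menu, nothing more.
interactive_session() {
    pages=0
    while read -r choice; do
        case "$choice" in
            1) pages=$((pages + 1))
               echo "scanned page $pages" ;;
            2) echo "wrote $pages page(s), starting a new document"
               pages=0 ;;
            3) echo "wrote $pages page(s), exiting"
               return 0 ;;
            *) echo "unknown action: $choice" >&2 ;;
        esac
    done
}

# Same sequence as the session above: scan, scan, wrap up.
printf '1\n1\n3\n' | interactive_session
```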
Scan a single page to scan_YYYY-MM-DD_hh-mm-ss.jpg:
scan2jpg
Usage: scan2pdf <options>
--interactive -I Interactive mode
--type -t Document Type
Possible values are:
d[ocument] for a text document
i[llustration] for a drawing
ph[otograph] for a photographic picture
pr[int] for a scan from a print e.g. newspaper
r[aw] for not applying any post-processing
Default: document
--resolution -r Resolution of scan
Possible values are 75, 150, 300, 600, 1200
Default: 300
--page -p Page Size
Possible values are A4, A5, A6, Letter, CreditCard, CD-Cover
Default: A4
--depth -d Color depth of scan
1 for LineArt (Black & White)
8 for Grayscale and Color
16 for Color
Default: 8
--format -f PDF image compression
Possible values are jpeg, zip, lzw
Default: jpeg
--quality -q Recommended for jpeg, zip, png
Values for jpeg from 0 to 100
Values for png and zip from 0 to 9
Default: 90
--mode -m Color mode of scan
Possible values are Lineart, Gray, Color
Default: Color
--ocr -R Run the scan through character recognition
Default: false
--ocr-lang -L Set the language for the character recognition
Every language 'tesseract' supports
Default: deu+eng+fra+ita+jpn+osd
--output -o Filename of PDF file
Default: scan_2022-01-26_23-10-20
--orientation -O Document orientation
Possible options p[ortrait], l[andscape]
Default: portrait
--scanner -s Set the scanner to be used
E.g: genesys:libusb:001:005
--help -h This message
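Two details of the help text lend themselves to a short sketch: the abbreviated document types and the timestamped default file name. The code below assumes that only the short form shown in brackets and the full word are accepted, while the real script may accept other prefixes. Both function names are hypothetical.

```shell
# Expand a document type given in short or full form.
# `expand_type` and `default_name` are illustrative, not scan2pdf's code.
expand_type() {
    case "$1" in
        d|document)     echo document ;;
        i|illustration) echo illustration ;;
        ph|photograph)  echo photograph ;;
        pr|print)       echo print ;;
        r|raw)          echo raw ;;
        *) echo "unknown document type: $1" >&2
           return 1 ;;
    esac
}

# Build a default name of the form scan_YYYY-MM-DD_hh-mm-ss.
default_name() {
    date "+scan_%Y-%m-%d_%H-%M-%S"
}

expand_type ph
default_name
```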