A pure Ruby library to merge PDF files, number pages and maybe more...

HireArt/combine_pdf

CombinePDF - the ruby way for merging PDF files

CombinePDF is a nifty library, written in pure Ruby, that parses PDF files and combines (merges) them with other PDF files, watermarks them or stamps them (all using the PDF file format and pure Ruby code).

Install

Install with RubyGems:

gem install combine_pdf

Help Wanted

I need help maintaining the CombinePDF Ruby gem.

I wrote this gem because I needed to solve an issue with Bates-numbering existing PDF documents. However, for the last three years or so I have been maintaining the project for no reason at all, except that I enjoyed sharing it with the community.

I love this gem, but I feel it's time for me to step back from maintaining it and concentrate on my music and other things I want to develop.

Please hit me up if you would like to join in and eventually take over.

Known Limitations

Quick rundown:

  • When reading PDF Forms, some form data might be lost. I tried fixing this to the best of my ability, but I'm not sure it all works just yet.

  • When combining PDF Forms, form data might be unified. I couldn't fix this because this is how PDF forms work (filling a field fills in the data in any field with the same name), but frankly, I kinda liked the issue... it's almost a feature.

  • When the same TOC (table of contents) data is merged more than once, one of the references will be unified with the other (meaning that if the pages look the same, both references will link to the same page instead of linking to two different pages). You can fix this by adding content to the pages before merging the PDF files (e.g., add empty text boxes to all the pages).

  • Some links and data (URL links and PDF "Named Destinations") are stored at the root of a PDF, and the pages don't link back to them. Keeping this information requires merging the PDF objects rather than their pages.

    Some links will be lost when ripping pages out of PDF files and merging them with another PDF.

  • Some encrypted PDF files (usually the ones you can't view without a password) will fail quietly instead of noisily.

  • Sometimes CombinePDF will raise an exception even if the PDF could be parsed (e.g., when PDF optional content exists)... I find it better to err on the side of caution, although for optional content PDFs the exception is avoidable using CombinePDF.load(pdf_file, allow_optional_content: true).

  • The CombinePDF gem runs recursive code to both parse and format the PDF files. Hence, PDF files that have heavily nested objects, as well as those that were combined in a way that results in cyclic nesting, might overflow the stack, resulting in an exception or program failure.
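Since some encrypted files fail quietly, it can help to check the raw bytes yourself before loading. The sketch below uses hypothetical helpers (likely_encrypted? and preflight! are not part of CombinePDF). It assumes that an encrypted PDF declares an /Encrypt entry in its trailer, as the PDF specification requires, and simply searches for that token, so a match means "probably encrypted" and false positives are possible. The allow_optional_content: true option in the usage comment is the one described above.

```ruby
# Hypothetical preflight helpers; NOT part of the CombinePDF API.

# Heuristic: encrypted PDFs carry an /Encrypt entry in the trailer.
# The token could also appear elsewhere in the raw bytes, so a match
# only means the file is *probably* encrypted.
def likely_encrypted?(pdf_data)
  pdf_data.b.include?("/Encrypt") # compare on a binary copy of the data
end

# Raise a descriptive error instead of letting an encrypted file fail quietly.
# Returns the data unchanged when no /Encrypt token is found.
def preflight!(pdf_data, name = "PDF")
  if likely_encrypted?(pdf_data)
    raise ArgumentError, "#{name} looks encrypted (found /Encrypt)"
  end
  pdf_data
end

# Usage with CombinePDF (requires the gem and a real file):
#   require "combine_pdf"
#   preflight!(File.binread("file.pdf"), "file.pdf")
#   pdf = CombinePDF.load("file.pdf", allow_optional_content: true)
```

Only pass allow_optional_content: true for files you know contain optional content; otherwise the cautious default stays in effect.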

CombinePDF is written natively in Ruby and should (presumably) work on all Ruby platforms that are compatible with Ruby 2.0.

However, PDF files are quite complex creatures and no guarantee is provided.

For example, PDF Forms are known to have issues and form data might be lost when attempting to combine PDFs with filled form data (also, forms are global objects, not page-specific, so you should combine the whole PDF for any data to have any chance of being preserved).
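Because form fields live in one document-wide namespace, two documents that both contain a field named, say, "name" end up sharing a single field once merged. The toy model below is only an illustration and is not how CombinePDF stores forms. It assumes each form can be represented as a Hash of field name => value and, as a simplification, lets the later document's value win. The hypothetical colliding_field_names helper lists the shared names, so you can rename those fields before combining.

```ruby
# Toy model, NOT CombinePDF internals: a form is a Hash of field name => value.
# In a PDF, widgets that share a fully qualified field name belong to the same
# field and therefore share one value, so merging documents merges namespaces.

# Returns the field names that appear in more than one of the given forms.
def colliding_field_names(*forms)
  counts = Hash.new(0)
  forms.each { |form| form.each_key { |name| counts[name] += 1 } }
  counts.select { |_name, count| count > 1 }.keys
end

# Merges forms into a single namespace. Assumption of this model: a later
# value for the same name replaces the earlier one.
def merge_form_namespaces(*forms)
  forms.reduce({}) { |merged, form| merged.merge(form) }
end

# Example:
#   a = { "name" => "Alice", "date" => "2020-01-01" }
#   b = { "name" => "Bob", "signature" => "B." }
#   colliding_field_names(a, b)   # "name" is shared
#   merge_form_namespaces(a, b)   # only one "name" value survives
```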

The same applies to PDF links and the table of contents, which all have global attributes and could be corrupted or lost when combining PDF data.

If this library causes loss of data or burns down your house, I'm not to blame, as stated in the MIT license. That being said, I'm using the library happily after testing it against different solutions.

Combine/Merge PDF files or Pages

To combine PDF files (or data):

require "combine_pdf"

pdf = CombinePDF.new
pdf << CombinePDF.load("file1.pdf") # one way to combine, very fast.
pdf << CombinePDF.load("file2.pdf")
pdf.save "combined.pdf"

Or even a one-liner:

(CombinePDF.load("file1.pdf") << CombinePDF.load("file2.pdf") << CombinePDF.load("file3.pdf")).save("combined.pdf")
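To merge an arbitrary list of files rather than a fixed chain, the same << pattern can run in a loop. The sketch below is a hypothetical helper, not part of CombinePDF; it uses only CombinePDF.new, CombinePDF.load, << and save, as shown above. The loader keyword (defaulting to CombinePDF) is an assumption of this sketch that exists only so the helper can be exercised without real PDF files.

```ruby
# Hypothetical helper (not part of CombinePDF): merges every file in `paths`,
# in order, into one document and saves it to `output`.
# `loader` defaults to CombinePDF; it is a parameter only so the helper can be
# tried out with a stand-in object. The default is evaluated at call time, so
# CombinePDF must be loaded (require "combine_pdf") before calling without it.
def combine_files(paths, output, loader: CombinePDF)
  combined = loader.new
  paths.each { |path| combined << loader.load(path) } # same << as above
  combined.save(output)
  combined
end

# Usage with CombinePDF:
#   require "combine_pdf"
#   combine_files(["file1.pdf", "file2.pdf", "file3.pdf"], "combined.pdf")
```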

You can also add just the odd or even pages:

pdf = CombinePDF.new
CombinePDF.load("file.pdf").pages.each_with_index do |page, index|
  # index is 0-based, so odd indexes are the even page numbers
  pdf << page if index.odd?
end
pdf.save "even_pages.pdf"

Note that adding the pages one by one is slower than adding the whole file.
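The odd/even selection above can be pulled into a small reusable helper. This is a hypothetical sketch, not part of CombinePDF. It works on any Array, so it should also work on the array returned by pages, and it counts pages from 1, so :even means pages 2, 4, 6 and so on.

```ruby
# Hypothetical helper (not part of CombinePDF): selects pages by the parity
# of their 1-based page number. `parity` must be :odd or :even.
def pages_by_parity(pages, parity)
  unless %i[odd even].include?(parity)
    raise ArgumentError, "parity must be :odd or :even, got #{parity.inspect}"
  end

  want_even = (parity == :even)
  # with_index(1) numbers the pages starting at 1
  pages.select.with_index(1) { |_page, number| number.even? == want_even }
end

# Usage with CombinePDF (still adds pages one by one, so it is slower than
# adding the whole file):
#   pdf = CombinePDF.new
#   pages_by_parity(CombinePDF.load("file.pdf").pages, :even).each { |page| pdf << page }
#   pdf.save "even_pages.pdf"
```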

Add content to existing pages (Stamp / Watermark)