Insert image and flatten form #268

Open
vekonylaszlo opened this issue Mar 4, 2024 · 1 comment

Comments

vekonylaszlo commented Mar 4, 2024

Hey there,

I've been digging into the docs and GitHub discussions but I'm a bit stuck. I'm trying to figure out how to add an image from memory to a specific rect in a PDF and then flatten the form. Any pointers on how to tackle this would be awesome.

For reference, this is what I'm currently trying:

#[derive(Debug, Clone, Default)]
pub struct Rectangle {
    left: f32,
    bottom: f32,
    width: f32,
    height: f32,
}

pub fn add_image_on_coordinates(&mut self, coord: Rectangle) {
    let image_path = "barcode.png";
    let pages = self.document.get_pages();
    // First page of the document (page number -> object id).
    let page_obj_id = pages.iter().next();
    // NOTE: `coord` is not used yet; the position and size are hard-coded.
    let (x, y) = (0.0, 0.0);
    if let Some(page_oid) = page_obj_id {
        if let Ok(stream) = xobject::image(image_path) {
            self.document
                .insert_image(*page_oid.1, stream, (x + 10., y + 10.), (50., 50.))
                .unwrap();
            self.document.save("output.pdf").expect("should have saved");
        }
    }
}

But the PDF is corrupted after save.
Thanks in advance for any help!

chriskyndrid commented Jul 11, 2024

Flattening is a bit complicated. First you need to identify all the form fields:

let catalog = document
    .trailer
    .get(b"Root")
    .and_then(|obj| obj.as_reference())?;
let catalog_dict = document.get_object(catalog)?.as_dict()?;
let acroform = catalog_dict
    .get(b"AcroForm")
    .and_then(|obj| obj.as_reference())?;
let acroform_dict = document.get_object(acroform)?.as_dict()?;
let fields_list = acroform_dict.get(b"Fields")?.as_array()?;

You then need to iterate over the fields, and potentially their Kids, and extract the coordinates of each field's bounding box. Then you need to write your text into those bounding boxes. You will also need to calculate the space available within each bounding box and measure the width of your words in pixels to determine where the next line must start (also paying attention to the height of the words in pixels), and then truncate, downsize (reduce the font size), etc., depending on how the text fits in your boxes and on your use case.
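
As a rough illustration of the fitting step described above (this is not the library mentioned in this comment), here is a minimal std-only Rust sketch. It assumes a field's bounding box has already been read from its /Rect array as `[llx, lly, urx, ury]`. The names `FieldBox`, `text_width`, `wrap`, and `fit` are hypothetical, and the glyph width (0.5 em per character) and line height (1.2 × font size) are placeholder assumptions; real code would measure text with the font's actual width table.

```rust
/// Bounding box of a form field, as found in its /Rect array:
/// [llx, lly, urx, ury] in PDF user-space units.
#[derive(Debug, Clone, Copy)]
struct FieldBox {
    llx: f32,
    lly: f32,
    urx: f32,
    ury: f32,
}

impl FieldBox {
    fn width(&self) -> f32 {
        self.urx - self.llx
    }
    fn height(&self) -> f32 {
        self.ury - self.lly
    }
}

/// ASSUMPTION: every glyph is 0.5 em wide. Replace this with a lookup
/// in the font's width table for real measurements.
fn text_width(text: &str, font_size: f32) -> f32 {
    text.chars().count() as f32 * font_size * 0.5
}

/// Greedy word wrap: start a new line when the next word would overflow.
/// A single word wider than `max_width` is placed on its own line.
fn wrap(text: &str, max_width: f32, font_size: f32) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let candidate = if current.is_empty() {
            word.to_string()
        } else {
            format!("{} {}", current, word)
        };
        if current.is_empty() || text_width(&candidate, font_size) <= max_width {
            current = candidate;
        } else {
            lines.push(std::mem::replace(&mut current, word.to_string()));
        }
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

/// Shrink the font in 0.5-unit steps from `max_size` down to `min_size`
/// (assumed positive) until the wrapped text fits the box.
/// ASSUMPTION: line height is 1.2 × font size.
/// Returns `None` if nothing fits, so the caller can truncate instead.
fn fit(text: &str, b: FieldBox, max_size: f32, min_size: f32) -> Option<(f32, Vec<String>)> {
    let mut size = max_size;
    while size >= min_size {
        let lines = wrap(text, b.width(), size);
        let too_wide = lines.iter().any(|l| text_width(l, size) > b.width());
        if !too_wide && lines.len() as f32 * size * 1.2 <= b.height() {
            return Some((size, lines));
        }
        size -= 0.5;
    }
    None
}
```

With a 100 × 30 box, `fit("hello world from lopdf", b, 12.0, 6.0)` keeps the font at 12 and wraps onto two lines; with a 100 × 20 box it drops to 9 and fits on one line. When `fit` returns `None`, truncating the text is the remaining option.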

I just spent the last week putting together a fully functioning library for my codebase on top of lopdf. It is designed to generate documents from templates we design, map data from our system onto the forms, flatten them (technically, if we flatten, we don't fill the forms first; we just extract the coordinates and render the content), and then optimize the document, which includes deduplicating fonts, moving all text objects into XObjects, etc., to reduce the size of the resulting PDF. Many of our use cases involve duplicating document templates and merging them (say, a 10-page invoice), and merging doesn't inherently include optimizations with this library. That said, after I wrote a bunch of code to process the form fill, flattening, and optimization components in parallel, lopdf under the hood is very fast.

For a 1200-page PDF with over 10,000 fields mapped, fully optimized, with lots of vector graphics, etc., it takes about 3 seconds to generate on my development machine, including all the logic to go back and forth from our API (actix-driven). The optimization process has the largest impact on production, but it is vital for us.

Although this crate is low-level, I'm pretty impressed with its speed at parsing and manipulating PDFs.
