Maddie Abboud edited this page Oct 26, 2021 · 2 revisions

Command-line Converter

The command-line PDF-to-HTML converter is contained in the PDFToHTML.jar package, which can be downloaded and run directly on any Java-enabled platform.

To convert a PDF file to an HTML web page, run:

```shell
java -jar PDFToHTML.jar <input_file> [<output_file>]
```

where

  • <input_file> is the path to the source PDF file to be converted.
  • <output_file> is the optional name of the output HTML file. If it is omitted, the output file gets the same name as the input file, with the html suffix.
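The default output name can be illustrated with a small Java helper. This is a hypothetical sketch, not code from PDFToHTML.jar: the class name `OutputNames` is invented for illustration, and it assumes the html suffix replaces a trailing `.pdf` extension (matched case-insensitively) and is otherwise appended, which the page does not state explicitly.

```java
import java.util.Locale;

// Hypothetical helper (not part of PDFToHTML.jar) that mimics the documented
// default: the output name is the input name with the html suffix.
// Assumption: a trailing ".pdf" (any case) is replaced; otherwise ".html" is appended.
final class OutputNames {
    static String defaultOutputName(String inputPath) {
        String lower = inputPath.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            // drop the 4-character ".pdf" extension before adding ".html"
            return inputPath.substring(0, inputPath.length() - 4) + ".html";
        }
        return inputPath + ".html";
    }

    public static void main(String[] args) {
        System.out.println(defaultOutputName("report.pdf"));    // prints report.html
        System.out.println(defaultOutputName("docs/Scan.PDF")); // prints docs/Scan.html
    }
}
```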

Options:

  • -fm=[mode]: font conversion mode, where [mode] is one of EMBED_BASE64, SAVE_TO_DIR or IGNORE_FONTS.
  • -fdir=[path]: directory to extract fonts to, e.g. dir/my-font-dir.
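The `-name=value` option syntax can be sketched as a minimal parser. `ConverterOptions` and its nested `FontMode` enum are hypothetical names used only for illustration; the real PDFToHTML.jar may parse its arguments differently, and since its default font mode is not documented here, the sketch simply leaves options that were not given as `null`.

```java
// Hypothetical sketch of parsing the documented -fm and -fdir options.
// It is NOT the PDFToHTML.jar source; it only mirrors the option syntax above.
final class ConverterOptions {
    enum FontMode { EMBED_BASE64, SAVE_TO_DIR, IGNORE_FONTS }

    FontMode fontMode = null; // null means -fm was not given
    String fontDir = null;    // null means -fdir was not given

    static ConverterOptions parse(String[] args) {
        ConverterOptions opts = new ConverterOptions();
        for (String arg : args) {
            if (arg.startsWith("-fm=")) {
                // valueOf throws IllegalArgumentException for an unknown mode name
                opts.fontMode = FontMode.valueOf(arg.substring("-fm=".length()));
            } else if (arg.startsWith("-fdir=")) {
                opts.fontDir = arg.substring("-fdir=".length());
            }
            // positional arguments (<input_file>, <output_file>) are not handled here
        }
        return opts;
    }

    public static void main(String[] args) {
        ConverterOptions o = parse(new String[] {"-fm=SAVE_TO_DIR", "-fdir=dir/my-font-dir", "file.pdf"});
        System.out.println(o.fontMode + " " + o.fontDir); // prints SAVE_TO_DIR dir/my-font-dir
    }
}
```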

Library

Basic Usage
Pdf2Dom may be used as a DOM interface to the Apache PDFBox™ library. The following example shows how to obtain a DOM model from a PDF file:
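```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;
import org.w3c.dom.Document;

// load the PDF file using PDFBox
PDDocument pdf = PDDocument.load(new File("file.pdf"));
// create the DOM parser
PDFDomTree parser = new PDFDomTree();
// parse the file and get the DOM Document
Document dom = parser.createDOM(pdf);
pdf.close(); // release the PDFBox resources once the DOM is built
```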
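Because `createDOM` returns a standard `org.w3c.dom.Document`, the usual JAXP APIs should be able to turn it into HTML text. The sketch below uses only JDK classes; `DomToHtml` is a hypothetical helper name, and a small hand-built document stands in for the parser output so the code runs without a PDF.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes any DOM Document to a string with the
// JDK's identity transformer, using the "html" output method.
final class DomToHtml {
    static String serialize(Document dom) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates an empty Document, wrapping the checked configuration exception.
    static Document emptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // stand-in for the Document returned by parser.createDOM(pdf)
        Document doc = emptyDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        System.out.println(serialize(doc));
    }
}
```

With a real parser, the call would presumably be `DomToHtml.serialize(parser.createDOM(pdf))`; whether the output matches what PDFToHTML.jar writes is not guaranteed.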
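Config Options

Font handling can be configured by passing a PDFDomTreeConfig to the parser:

```java
// start from the default configuration
PDFDomTreeConfig config = PDFDomTreeConfig.createDefaultConfig();
// fontDir: the directory to extract fonts to (same role as the -fdir option)
config.setFontExtractDirectory(fontDir);
// font conversion mode (same values as the -fm option)
config.setFontMode(SAVE_TO_DIR);

// create a parser that uses this configuration
PDFDomTree parser = new PDFDomTree(config);
```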
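As a rough illustration of what the three font modes imply for the generated page, the hypothetical helper below builds a CSS `@font-face` rule for each mode. It is not Pdf2Dom code: the class `FontFaceSketch`, the rule format, the MIME type parameter and the `.ttf` file name are all assumptions chosen for the example.

```java
import java.util.Base64;

// Hypothetical illustration of the three font modes; NOT Pdf2Dom's implementation.
// Assumptions: fonts are referenced via @font-face, saved files use a ".ttf" name.
final class FontFaceSketch {
    enum FontMode { EMBED_BASE64, SAVE_TO_DIR, IGNORE_FONTS }

    /** Returns an @font-face rule for one font, or "" when fonts are ignored. */
    static String fontFace(FontMode mode, String family, byte[] fontData,
                           String mimeType, String fontDir) {
        String src;
        switch (mode) {
            case EMBED_BASE64:
                // font bytes inlined into the page as a base64 data: URI
                src = "data:" + mimeType + ";base64,"
                        + Base64.getEncoder().encodeToString(fontData);
                break;
            case SAVE_TO_DIR:
                // font assumed to be written to fontDir (not done here) and referenced by path
                src = fontDir + "/" + family + ".ttf";
                break;
            default:
                // IGNORE_FONTS: no rule, so the browser falls back to its own fonts
                return "";
        }
        return "@font-face { font-family: '" + family + "'; src: url('" + src + "'); }";
    }
}
```

Embedding keeps the page self-contained at the cost of size, while saving to a directory keeps the HTML small but requires shipping the font files alongside it.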

API Documentation

See the PDFDomTree API documentation for more information.

The Pdf2Dom API documentation is generated from the latest snapshot.
