PyMuPDF introduced PDF Optional Content (OC) support in v1.18.3 and significantly extended it in v1.18.4.
We believe we now cover the most frequently used features, but we expect further improvements going forward.
The last section of this README is a synopsis of the optional-content-related methods in PyMuPDF.
Our support includes the following features:
- Create, update and remove OC layers, the so-called Optional Content Configuration Dictionaries (OCCDs).
- Create OCGs and use the resulting xref as the vehicle for object association. If the PDF document did not previously have OC support, the required entries will be created (i.e. the `/OCProperties` dictionary in the PDF catalog).
- Associate annotations, images, Form XObjects, drawings and text with an existing OCG (optional content group). This will cause those objects to be shown or hidden whenever their OCG is set to ON or OFF.
- Optional content attachments can also be removed for annotations, images and Form XObjects (but not for drawings and text).
- Details of all of the PDF's OCGs, OC layers and temporary visibility status can be retrieved and modified.
- PDF object type OCMD (Optional Content Membership Dictionary) is also supported. OCMDs allow setting object visibility based on logical expressions involving the status of one or more OCGs.
The following features are not yet available but may be included in future versions:
- The `/Order` array in an OC configuration is currently maintained automatically by adding the xref of every created OCG. According to the PDF specification, however, this array may be used to establish an advanced, hierarchical structure of a document's optional content. We may consider offering an interface to edit this object.
Creates a PDF with one page, which is divided into 2 x 2 equal-sized rectangles.
The first 4 pages of a `source.pdf` are displayed in those sub-rectangles, each associated with its own OCG. These 4 OCGs are linked together via a radio button group: whenever one source page is set to be displayed (ON), the other three are switched to OFF.
Please note that this effect is supported by some PDF viewers (e.g. Adobe Acrobat and Nitro 5), but not by all of them.
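The radio button behavior described above can be modeled in a few lines of plain Python. This is a minimal sketch of what a supporting viewer does, not PyMuPDF code: the function name `switch_on`, the state dictionary and the xref numbers 11 to 14 are assumptions made for illustration.

```python
def switch_on(states, rbgroups, xref):
    """Return new OCG states after turning `xref` ON.

    states:   dict mapping OCG xref -> True (ON) / False (OFF)
    rbgroups: list of lists of OCG xrefs; at most one member of each
              group may be ON at a time
    """
    new_states = dict(states)
    new_states[xref] = True
    for group in rbgroups:
        if xref in group:
            # Every other member of the same radio button group goes OFF.
            for other in group:
                if other != xref:
                    new_states[other] = False
    return new_states


# Hypothetical xrefs for the four OCGs of the 2 x 2 example.
states = {11: True, 12: False, 13: False, 14: False}
rbgroups = [[11, 12, 13, 14]]

states = switch_on(states, rbgroups, 13)
print(states)  # {11: False, 12: False, 13: True, 14: False}
```

OCGs that are not members of any group are unaffected, which is why a viewer without radio button support simply shows several source pages at once.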
Requires v1.18.4. Creates two objects on a PDF page, of which exactly one is displayed at a time.
First of all, here is a list of abbreviations and original PDF technical terms used throughout this section:
- xref: just an abbreviation of "PDF cross reference number".
- OC: just an abbreviation of "optional content".
- OCCD: OC configuration dictionary. This object type is used to quickly establish a document-wide setting of all ON/OFF states. There always is a base or default OCCD stored under key `/D` in the `/OCProperties` dictionary of the PDF catalog. Optional additional OCCDs are stored in the `/Configs` array of `/OCProperties`. In PDF viewers (Adobe Acrobat, etc.) you will find terms like "layer" for this notion.
- OCG: OC group. This PDF object serves as an attribute for other PDF objects, which should have the same visibility status (ON or OFF) at all times. Every PDF supporting OC must have at least one OCG, and all valid OCGs must occur in the central `/OCGs` array of the `/OCProperties` dictionary.
- OCMD: OC membership dictionary. This PDF object represents a logical expression about the state of one or more OCGs. The boolean value of the expression is in turn interpreted as ON (true) or OFF (false). Instead of an OCG, an OCMD can also be used to control the visibility of PDF objects.
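To make the OCMD idea concrete, here is a minimal sketch of how a logical expression over OCG states could be evaluated. It is a plain Python model, not PyMuPDF's implementation: the nested-list notation with the operators `"and"`, `"or"` and `"not"`, the function name `evaluate` and the xref numbers are all assumptions chosen for this illustration.

```python
def evaluate(expr, states):
    """Evaluate a visibility expression against OCG states.

    expr:   an OCG xref (int), or a list [operator, operand, ...] with
            operator "and", "or" or "not" (illustrative notation)
    states: dict mapping OCG xref -> True (ON) / False (OFF)
    """
    if isinstance(expr, int):
        return states[expr]
    op, *operands = expr
    values = [evaluate(item, states) for item in operands]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    if op == "not":
        if len(values) != 1:
            raise ValueError("'not' takes exactly one operand")
        return not values[0]
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical OCG xrefs and their current states.
states = {7: True, 8: False}

# The object is visible when OCG 7 is ON and OCG 8 is OFF.
expr = ["and", 7, ["not", 8]]
print(evaluate(expr, states))  # True
```

The result plays the same role as the ON/OFF state of a single OCG: `True` means the attached object is shown, `False` means it is hidden.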
Please note that this section lists the method names as defined in v1.18.5. The following renames have occurred compared to v1.18.4:
new name | old names |
---|---|
add_layer | add_layer_config, addLayerConfig |
get_layers | layer_configs |
get_layer | get_oc_states |
set_layer | set_oc_states |
add_ocg | addOCG |
get_ocgs | getOCGs |
Add a new OCG to the PDF and return its xref (cross reference number). If the PDF did not previously support OC at all, the required changes to the PDF catalog are made automatically. Please note that, once created, an OCG can neither be deleted nor modified. Its visibility, which can be set either permanently or temporarily, is not an attribute of the OCG object itself, but is stored in other places.
Return a dictionary of all existing OCGs across all OCCDs. The dictionary key is the OCG xref. An empty dictionary indicates missing OC support in this PDF.
Synopsis of all defined optional OC configurations. This is an overview report of the entries in the `/Configs` array. The `/D` layer is not included.
List the detail content of the specified OCCD (default or optional OCCD). This is a dictionary with lists of cross reference numbers for OCGs contained in the arrays `/ON`, `/OFF` or radio button groups (`/RBGroups`).
Set the content of a given OCCD. This is a permanent visibility status change of OCGs under this configuration.
Add a new OCCD. This always is a new entry in the `/Configs` array of `/OCProperties`.
Activate the specified OCCD. This causes the visibility state of all OCGs it mentions to be set accordingly. Optionally, this OCCD can be made the default configuration. In this case all `/Configs` entries will be deleted - so the PDF will end up just having the default (`/D`) layer.
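The effect of activating an OCCD can be pictured as applying its ON and OFF lists to the current OCG states. The following is a simplified model in plain Python, not the PyMuPDF method itself: the function name `activate`, the dictionary keys `"on"` and `"off"` and the xref numbers are assumptions for this sketch, and OCGs that the configuration does not mention are simply left unchanged here.

```python
def activate(states, config):
    """Return OCG states after activating a configuration.

    states: dict mapping OCG xref -> True (ON) / False (OFF)
    config: dict with optional "on" and "off" lists of OCG xrefs
            (key names chosen for this sketch)
    """
    new_states = dict(states)
    for xref in config.get("on", []):
        new_states[xref] = True
    for xref in config.get("off", []):
        new_states[xref] = False
    return new_states


# Hypothetical OCG xrefs and a hypothetical configuration.
states = {4: True, 5: True, 6: False}
print_layout = {"on": [6], "off": [4]}

print(activate(states, print_layout))  # {4: False, 5: True, 6: True}
```

OCG 5 keeps its state because the configuration does not mention it; only the listed OCGs change.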
Return the xref of the OCG or OCMD attached to an image or form XObject. Returns 0 if the object is independent from OC handling. Parameter is the image's xref.
Attach an OCG / OCMD to an image or form XObject to control its OC-related visibility. Parameters are the image xref and the OC xref. Replaces any previous value or removes OC support if 0 is used instead of an OC xref.
Show the current visibility status of all active OCGs. This information is the same as offered by the user interface of PDF viewers.
Modify the visibility status of an OCG. This is the same function as offered by the user interface of PDF viewers.
Attach an OCG or OCMD to the annotation. Parameter is the OC xref. Any previous value will be overwritten. A zero will remove the previous value.
Retrieve the xref of the attached OC entry. A zero indicates that the annotation is not subject to OC.
Methods `Page.insertImage`, `Page.showPDFpage`, `Page.insertText`, `Page.insertTextbox`, `Shape.insertText`, `Shape.insertTextbox`, and `Shape.finish` support the `oc` parameter, which accepts the xref of an OCG or OCMD.
Returns a list of items `(name, xref, type)`, where type is one of "ocg" / "ocmd", xref the cross reference number of the object and name is the property name. These items represent objects referenced in the page's `/Contents` object. E.g. for the following PDF definitions, this method would return `[("MC0", 7, "ocg")]`.
```
5 0 obj
<<
/Type /Page
...
/Resources<< ... /Properties<<... /MC0 7 0 R>> ... >>
/Contents 6 0 R
...
>>
endobj
6 0 obj
<< ... >>
stream
...
/OC /MC0 BDC
... text or drawing commands
EMC
...
endstream
endobj
7 0 obj
<</Type/OCG ...>>
endobj
```
This shows (relevant parts of) a page definition (xref 5), its content definition (xref 6), and an OCG object definition (xref 7).
The basic takeaway is the relationship between the reference name `/MC0` in the `/Resources/Properties` dictionary, the reference syntax in the contents source and the relationship to the OCG's xref.
If the OCG is OFF, then everything between statements `/OC /MC0 BDC` and `EMC` is ignored: the respective text or graphics are not displayed.
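The rule that content between `/OC /MC0 BDC` and `EMC` is skipped when the OCG is OFF can be sketched as a small filter. This is an illustrative model only, under strong simplifying assumptions: the content stream is given as decoded text with one statement per line, and the function name `visible_lines` and the sample data are made up for this example. Real content streams need a proper PDF tokenizer.

```python
import re

# Matches the opening of an OC-controlled marked-content section,
# e.g. "/OC /MC0 BDC", and captures the property name ("MC0").
OC_BEGIN = re.compile(r"^/OC\s+/(\S+)\s+BDC$")


def visible_lines(content, properties, states):
    """Return the statements of `content` that a viewer would render.

    content:    content stream source as text, one statement per line
    properties: dict mapping property name -> OCG xref
    states:     dict mapping OCG xref -> True (ON) / False (OFF)
    """
    result = []
    stack = []  # one visibility flag per open marked-content section
    for line in content.splitlines():
        text = line.strip()
        match = OC_BEGIN.match(text)
        if match:
            # Look up the OCG behind the property name and note its state.
            stack.append(states[properties[match.group(1)]])
        elif text.endswith("BDC") or text.endswith("BMC"):
            stack.append(True)  # marked content not controlled by OC
        elif text == "EMC":
            if stack:
                stack.pop()
        elif text and all(stack):
            # Kept only if every enclosing OC section is ON.
            result.append(text)
    return result


content = """\
BT (always shown) Tj ET
/OC /MC0 BDC
BT (optional) Tj ET
EMC
"""
properties = {"MC0": 7}  # as in /Properties << /MC0 7 0 R >>

print(visible_lines(content, properties, {7: True}))
# ['BT (always shown) Tj ET', 'BT (optional) Tj ET']
print(visible_lines(content, properties, {7: False}))
# ['BT (always shown) Tj ET']
```

The stack handles nested sections: a statement is kept only when all surrounding OC sections are ON. The marked-content operators themselves are dropped from the output.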