PyPXML

A Python library for parsing, converting, and modifying PageXML files.

Setup

pip install pypxml

Install from source

  1. Clone repository: git clone https://github.com/jahtz/pypxml
  2. Install package: cd pypxml && pip install .
  3. Test with pypxml --version
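The installation can also be checked from within Python. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of pypxml, and the distribution name "pypxml" is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "pypxml"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints the version string if pypxml is installed, otherwise None.
print(installed_version())
```

A None result usually means that pip installed the package into a different environment than the one running the interpreter.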

CLI

pypxml [OPTIONS] COMMAND [ARGS]...

API

PyPXML provides a feature-rich Python API for working with PageXML files.

Example: Edit existing PageXML

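from pypxml import PageXML, PageType

# Load an existing PageXML file
pxml = PageXML.from_xml('path_to_pagexml.xml')

# Add a paragraph text region and give it a Coords child holding its outline points
text_region = pxml.create_element(PageType.TextRegion, type='paragraph', id='tr_001')
text_region.create_element(PageType.Coords, points='1,2 3,4 5,6 ...')

# Print the type of every region on the page
for region in pxml.regions:
    print(region.type)

# Write the modified document to a new file
pxml.to_xml('path_to_output.xml')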
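For orientation, the XML structure that the example above manipulates can be sketched with the standard library alone. This is not how pypxml is implemented; it only illustrates the PcGts > Page > TextRegion > Coords nesting and the format of the points attribute. The helpers q and parse_points, the image attributes, and the choice of the 2019-07-15 PAGE namespace are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# PAGE content schema namespace (2019-07-15 version, assumed here).
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ET.register_namespace("", PAGE_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified tag name (hypothetical helper)."""
    return f"{{{PAGE_NS}}}{tag}"


def parse_points(points: str) -> list:
    """Turn a points string such as '1,2 3,4' into a list of (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


# Build a minimal document: one page containing one paragraph region.
root = ET.Element(q("PcGts"))
page = ET.SubElement(root, q("Page"), imageFilename="example.png",
                     imageWidth="100", imageHeight="100")
region = ET.SubElement(page, q("TextRegion"), id="tr_001", type="paragraph")
ET.SubElement(region, q("Coords"), points="1,2 3,4 5,6")

# Collect the type of every text region, similar in spirit to the loop above.
region_types = [r.get("type") for r in page.iter(q("TextRegion"))]

# Decode the region outline into numeric coordinates.
polygon = parse_points(region.find(q("Coords")).get("points"))

# Serialize the tree back to an XML string.
xml_text = ET.tostring(root, encoding="unicode")
print(region_types, polygon)
```

Working at this level means handling namespaces and string-encoded coordinates by hand, which is the kind of bookkeeping a dedicated PageXML library is meant to take over.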

ZPD

Developed at the Centre for Philology and Digitality (ZPD), University of Würzburg.
