Automated Testing

This sub-project contains Python-based code for testing the PDF output produced by earlier CI stages, or by you locally.

The desired properties of the PDFs under test (by default, all PDFs in the project root) are configured in the tests config file. Once that is done to your liking, set up the testing framework and run the tests (or don't, and use CI instead, where everything is already done for you and Just Works™).

These tests are helpful for checking basic properties. A more involved approach is shown here; it includes checks for publisher-specific requirements, which lets us detect errors and iterate much faster.
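As an illustration of the basic kind of check mentioned above, here is a minimal sketch using only the standard library. It is not the project's actual test code: the function names `find_pdfs` and `looks_like_pdf` are made up for this example, and the real tests are driven by the tests config file.

```python
from pathlib import Path


def find_pdfs(root: Path) -> list[Path]:
    """Return all PDF files directly inside `root`, sorted by name."""
    return sorted(p for p in root.glob("*.pdf") if p.is_file())


def looks_like_pdf(path: Path) -> bool:
    """Check for the `%PDF-` magic bytes that start a PDF file."""
    with path.open("rb") as f:
        return f.read(5) == b"%PDF-"


if __name__ == "__main__":
    # Assumption: the script is run from the project root.
    for pdf in find_pdfs(Path.cwd()):
        status = "ok" if looks_like_pdf(pdf) else "NOT A PDF"
        print(f"{pdf.name}: {status}")
```

A real test suite would feed the result of such a discovery step into individual test cases, so that each PDF is reported separately.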


Setup takes two steps. Unfortunately, it is not as easy as it could be, owing to the nature of Python's ecosystem.

Python dependencies

The project uses poetry for dependency management. Once you have it installed according to their documentation, it is very easy to pull in the dependencies of this project. In the directory containing the poetry config file, run

poetry install

If a lock file is present, this installs the exact versions pinned there. Otherwise, poetry resolves the dependencies from the config file. That's it... almost.

Python itself

You will also need a suitable Python interpreter, i.e. a compatible Python version. If your system's Python is compatible with what is listed in the config, you do not need to do anything. The easiest way to test this is to just run:

poetry run pytest

and see whether it fails. If the interpreter is unsuitable, poetry will complain accordingly. In that case, pyenv has worked well for me for setting up a suitable local or system-wide Python interpreter of any desired version.
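If you would rather check the interpreter version by hand first, a small sketch like the following can help. The minimum version used here is purely hypothetical; the actual constraint is whatever the project's config lists, and it may also include an upper bound that this sketch does not check.

```python
import sys

# Hypothetical minimum; look up the real constraint in the project's config.
MINIMUM_PYTHON = (3, 9)


def is_compatible(version: tuple, minimum: tuple = MINIMUM_PYTHON) -> bool:
    """Return True if `version` is at least `minimum` (major, minor)."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = sys.version_info
    status = "OK" if is_compatible(current) else "too old"
    print(f"Python {current.major}.{current.minor}: {status}")
```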

The setup for the CI pipeline (GitHub Actions) is quite different. Take a look if you like, but the steps there are not applicable to local usage.


After the setup, you can simply run:

poetry run pytest

Prefixing commands with poetry run makes sure they run in the suitable virtual environment, with the correct packages installed in the correct versions, and with the Python version set up via pyenv, if any.

Any sub-commands or flags after pytest are courtesy of pytest, not poetry. There, you can for example specify which tests to run.


Alternatively, the testing procedure is tucked away and made accessible via the Makefile. It requires you to have GNU make installed. If you are on Linux, you might already have it, since a lot of development workflows rely on it. The tests can then be run using:

make test

or any of the other targets in that Makefile:

make test-self
make test-pdfs

Side note

Sadly, an inherent issue is that PDF parsing/text extraction is incredibly hard. A PDF might look fine to human eyes, but getting a computer to reliably recognize and parse the same content in an automated fashion is, as of today, basically impossible.
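To give a flavor of the problem: extracted text often contains typographic ligatures, words hyphenated across line breaks, and irregular whitespace, so a naive string comparison fails even when the page looks right. The following is a minimal sketch of a normalization step one might apply before comparing text. The function name `normalize_extracted` is made up for this example, and it is not necessarily what this project does.

```python
import re
import unicodedata


def normalize_extracted(text: str) -> str:
    """Normalize text extracted from a PDF before comparing it."""
    # NFKC folds compatibility characters, e.g. the U+FB01 "fi" ligature
    # becomes the two plain letters "f" and "i".
    text = unicodedata.normalize("NFKC", text)
    # Re-join words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse any run of whitespace (including newlines) into one space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    raw = "A speci\ufb01c hyphen-\nated   word\n"
    print(normalize_extracted(raw))
```

Even this small step is lossy: a legitimately hyphenated compound that happens to break at the hyphen is joined into one word, which is exactly the kind of ambiguity that makes automated checks unreliable.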