Comparing algorithms for causality analysis in a fair and just way.


A work-in-progress framework for evaluating causal estimators. It aims to make methods easier to compare by evaluating them on both generated and existing datasets.


  • make the package itself independent of Sacred, just advocate it as best practice
  • migrate all files ending in -old and delete them if no longer necessary
  • create some proper unit tests and use pytest instead of the standard-library unittest
  • add the final bachelor thesis as pdf under references
  • use Sphinx (check out the docs folder) to create a command reference and some explanations
  • convert configs/ into a config.yaml
    • the content of the config should not be accessed from everywhere; pass only the necessary information!
  • adhere to PEP 8 and other standards. Use pre-commit (which is set up below) to check and correct all mistakes
  • don't fix things like the random seed within the package. It's a library, so advocate doing this outside (name this as best practice in the docs)
  • separate modules that only do math from plotting modules. Why would the generators/acic module need matplotlib as a dependency?
  • follow the import order: first Python standard-library modules, then external packages, then the modules of your own package
  • use PyCharm and check the curly yellow underline hints on how to improve the code
  • add some example notebooks in the notebooks folder
  • add the libraries which are required (no visualisation) to setup.cfg under requires
  • check the licences of third-party methods and note them accordingly. Within the subpackage, add a docstring stating the licences and the original authors
  • do not set environment variables inside the library (e.g. os.environ['L_ALL']); rather state this somewhere in the docs
  • never print anything in a library; use the logging module instead. This takes a while to comprehend
  • move the module into the scripts folder because it's actually using the package (fix the imports accordingly)
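
Three of the points above (no fixed random seed, no printing, no config access from everywhere) can be sketched together. This is a minimal illustration, not code from the package: the function sample_treatment and its parameters are hypothetical names chosen for this example.

```python
import logging
import random

# A library module gets a named logger and does not configure handlers itself.
# The NullHandler keeps the library silent unless the application opts in.
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())


def sample_treatment(n_units, treated_share, random_state=None):
    """Hypothetical helper: draw a binary treatment indicator per unit.

    Only the values that are needed are passed in, instead of reading a
    global config. The seed is chosen by the caller, not by the library.
    """
    rng = random.Random(random_state)
    assignment = [1 if rng.random() < treated_share else 0 for _ in range(n_units)]
    logger.info("assigned %d of %d units to treatment", sum(assignment), n_units)
    return assignment
```

A script or notebook would then fix the seed and enable log output itself, for example with logging.basicConfig(level=logging.INFO) followed by sample_treatment(100, 0.5, random_state=42).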


In order to set up the necessary environment:

  1. create an environment justcause with the help of conda,
    conda env create -f environment.yaml
  2. activate the new environment with
    conda activate justcause
  3. install justcause with:
    python install # or `develop`
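
To check that the installation worked, one option is to ask Python whether the package can be found on the import path of the active environment. The sketch below uses only the standard library; is_installed is a hypothetical helper name, and justcause is the package name from the steps above.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found on the current import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Run this inside the activated conda environment.
    print("justcause importable:", is_installed("justcause"))
```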

Optional and needed only once after git clone:

  1. install several pre-commit git hooks with:

    pre-commit install

    and check out the configuration under .pre-commit-config.yaml. The -n, --no-verify flag of git commit can be used to deactivate pre-commit hooks temporarily.

  2. install nbstripout git hooks to remove the output cells of committed notebooks with:

    nbstripout --install --attributes notebooks/.gitattributes

    This is useful to avoid large diffs due to plots in your notebooks. A simple nbstripout --uninstall will revert these changes.
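
To see why stripping outputs keeps diffs small, here is a simplified sketch of the idea. It is not nbstripout's actual implementation, which handles more metadata; the function strip_outputs and the tiny in-memory notebook are assumptions made for illustration, based on the nbformat 4 layout of a notebook as a JSON dictionary.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict (nbformat 4 layout) without cell outputs."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            # Outputs (plots, tables, text) are what make notebook diffs large.
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Tiny in-memory notebook with one markdown cell and one executed code cell.
example = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {
            "cell_type": "code",
            "source": ["1 + 1"],
            "execution_count": 3,
            "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["2"]}}],
        },
    ],
}
clean = strip_outputs(example)
```

The source of every cell is kept, so only code and markdown changes show up in a commit.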

Then take a look into the scripts and notebooks folders.

Dependency Management & Reproducibility

  1. Always keep your abstract (unpinned) dependencies updated in environment.yaml and eventually in setup.cfg if you want to ship and install your package via pip later on.
  2. Create concrete dependencies as environment.lock.yaml for the exact reproduction of your environment with:
    conda env export -n justcause -f environment.lock.yaml
    For multi-OS development, consider using --no-builds during the export.
  3. Update your current environment with respect to a new environment.lock.yaml using:
    conda env update -f environment.lock.yaml --prune
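
The difference between abstract and concrete dependencies can be made tangible with a small check. The sketch below is a heuristic under stated assumptions, not part of conda or of this package: is_pinned and unpinned are hypothetical helpers, and a spec counts as pinned only if it fixes an exact version in conda style (name=1.2.3, optionally followed by =build) or pip style (name==1.2.3).

```python
import re

# Exact pins: `name=1.2.3`, `name=1.2.3=build` (conda) or `name==1.2.3` (pip).
# Ranges (`>=`), wildcards (`*`) and bare names are treated as abstract.
_PINNED = re.compile(r"^[A-Za-z0-9_.\-]+={1,2}\d[^=<>!~*]*(=[^=]+)?$")


def is_pinned(spec):
    """Return True if the dependency spec fixes an exact version."""
    return bool(_PINNED.match(spec.strip()))


def unpinned(specs):
    """Return the specs that do not fix an exact version."""
    return [spec for spec in specs if not is_pinned(spec)]
```

Applied to the dependency entries of a lock file, unpinned should return an empty list, while the entries of environment.yaml are expected to be mostly unpinned.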

Project Organization

├── AUTHORS.rst             <- List of developers and maintainers.
├── CHANGELOG.rst           <- Changelog to keep track of new features and fixes.
├── LICENSE.txt             <- License as chosen on the command-line.
├──               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yaml        <- The conda environment file for reproducibility.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual PYTHON_PKG, e.g. train_model.
├── setup.cfg               <- Declarative configuration of your project.
├──                <- Use `python develop` to install for development
│                              or create a distribution with `python bdist_wheel`.
├── src
│   └── justcause           <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `py.test`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
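
The notebook naming convention from the tree above (a number for ordering, the creator's initials, a description) can be checked mechanically. The following is a sketch under assumptions: the README gives only the single example 1.0-fw-initial-data-exploration, so the exact pattern (lowercase initials, hyphen-separated lowercase description) and the helper name parse_notebook_name are illustrative guesses.

```python
import re

# <number>-<initials>-<description>, e.g. 1.0-fw-initial-data-exploration
_NOTEBOOK_NAME = re.compile(
    r"^(\d+(?:\.\d+)*)-([a-z]+)-([a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_notebook_name(stem):
    """Split a notebook file stem into (number, initials, description).

    Returns None if the stem does not follow the assumed convention.
    """
    match = _NOTEBOOK_NAME.match(stem)
    return match.groups() if match else None
```

Such a check could be run over the notebooks folder in a test or a pre-commit hook to keep the ordering consistent.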


This project has been set up using PyScaffold 3.2.2 and the dsproject extension 0.4. For details and usage information on PyScaffold see