Comparing algorithms for causality analysis in a fair and just way.

A work-in-progress framework for evaluating causal estimators. It aims to make the comparison of methods easier by allowing them to be compared across both generated and existing datasets.
Open TODOs:

- make the package itself independent of Sacred; just advocate it as best practice
- migrate all files ending in `-old` and delete them if no longer necessary
- create proper unit tests and use pytest instead of the shipped unittest
- add the final bachelor thesis as PDF under `references`
- use Sphinx (check out the `docs` folder) to create a command reference and some explanations
- convert `configs/config.py` into a `config.yaml`
- the content of the config should not be accessed from everywhere; pass only the necessary information instead
- adhere to PEP 8 and other standards; use `pre-commit` (which is set up below) to check and correct all mistakes
- don't fix things like the random seed within the package; it's a library, so advocate doing this outside (name this as best practice within the docs)
- separate modules that only do math from plotting modules; why would the `generators/acic` module need `matplotlib` as a dependency?
- follow import order: first Python internal modules, then external ones, then the modules of your own package
- use PyCharm and check the curly yellow underline hints for how to improve the code
- add some example notebooks in the `notebooks` folder
- add the libraries which are required (no visualisation) into `setup.cfg` under `requires`
- check licences of third-party methods and note them accordingly; within the `__init__.py` of the subpackage, add a docstring stating the licences and the original authors
- do not set environment variables (e.g. `os.environ['LC_ALL']`) inside the library; rather, state this somewhere in the docs
- never print something in a library; use the `logging` module for logging (takes a while to comprehend)
- move the `experiment.py` module into the `scripts` folder because it's actually using the package (fix the imports accordingly)
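The last two best-practice points above (log instead of print, and leave seeding to the caller) can be sketched together in a few lines. All names here are illustrative and not part of the justcause API:

```python
import logging
import random

# Library side: obtain a module-level logger and attach a NullHandler so the
# library stays silent unless the application configures logging. The library
# itself never prints, never seeds RNGs, and never sets environment variables.
logger = logging.getLogger("justcause_sketch")  # hypothetical module name
logger.addHandler(logging.NullHandler())

def average_effect(n_samples: int) -> float:
    """Toy 'estimator' that reports progress via logging, not print."""
    logger.info("evaluating with %d samples", n_samples)
    return sum(random.random() for _ in range(n_samples)) / n_samples

# Application side: the user's script configures logging and fixes the seed,
# so runs are reproducible without the library hard-coding anything.
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    random.seed(42)
    print("estimate:", average_effect(100))
```

Because the seed lives in the calling script, two runs with the same seed yield identical results, while other users remain free to choose their own seeding strategy.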
In order to set up the necessary environment:

- create an environment `justcause` with the help of conda: `conda env create -f environment.yaml`
- activate the new environment with `conda activate justcause`
- install `justcause` with: `python setup.py install` (or `develop`)

Optional and needed only once after `git clone`:

- install several pre-commit git hooks with `pre-commit install` and check out the configuration under `.pre-commit-config.yaml`. The `-n, --no-verify` flag of `git commit` can be used to deactivate the pre-commit hooks temporarily.
- install nbstripout git hooks to remove the output cells of committed notebooks with `nbstripout --install --attributes notebooks/.gitattributes`. This is useful to avoid large diffs due to plots in your notebooks. A simple `nbstripout --uninstall` will revert these changes.

Then take a look into the `scripts` and `notebooks` folders.
- Always keep your abstract (unpinned) dependencies updated in `environment.yaml` and eventually in `setup.cfg` if you want to ship and install your package via `pip` later on.
- Create concrete dependencies as `environment.lock.yaml` for the exact reproduction of your environment with: `conda env export -n justcause -f environment.lock.yaml`. For multi-OS development, consider using `--no-builds` during the export.
- Update your current environment with respect to a new `environment.lock.yaml` using: `conda env update -f environment.lock.yaml --prune`
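For orientation, an abstract (unpinned) `environment.yaml` could look like the sketch below; the package list and Python version here are placeholders, and the file shipped in the repository defines the authoritative dependencies:

```yaml
# Sketch of an abstract environment file, not the project's actual one.
name: justcause
channels:
  - defaults
  - conda-forge
dependencies:
  - python>=3.6        # illustrative version constraint
  - pip
  - pip:
      - -e .           # install the justcause package itself in develop mode
```

Keeping only unpinned constraints here, and pinning exact versions in `environment.lock.yaml`, separates "what the package needs" from "what was installed last time".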
```
├── AUTHORS.rst             <- List of developers and maintainers.
├── CHANGELOG.rst           <- Changelog to keep track of new features and fixes.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yaml        <- The conda environment file for reproducibility.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual Python package, e.g. train_model.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- Use `python setup.py develop` to install for development
│                              or create a distribution with `python setup.py bdist_wheel`.
├── src
│   └── justcause           <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `py.test`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```
This project has been set up using PyScaffold 3.2.2 and the dsproject extension 0.4. For details and usage information on PyScaffold see https://pyscaffold.org/.