Skip to content

mcejp/goeieDAG

Repository files navigation

goeieDAG

/ɣu.jə.ˈdɑx/: hello, good day (Dutch greeting used during daytime)

goeieDAG provides a unified Python API to Ninja and Make (TODO) build systems, aiming to make it extremely easy to benefit from parallel processing in any graph-like workflow.

Installation

pip install goeieDAG==0.0.2

Usage

from pathlib import Path

import goeiedag
from goeiedag import ALL_INPUTS, INPUT, OUTPUT

workdir = Path("output")

graph = goeiedag.Graph()

# Extract OS name from /etc/os-release
graph.add(["grep", "^NAME=", INPUT, ">", OUTPUT],
          inputs=["/etc/os-release"],
          outputs=["os-name.txt"])
# Get username
graph.add(["whoami", ">", OUTPUT],
          inputs=[],
          outputs=["username.txt"])
# Glue together to produce output
graph.add(["cat", ALL_INPUTS, ">", OUTPUT.result],
          inputs=["os-name.txt", "username.txt"],
          outputs=dict(result="result.txt"))  # can also use a dictionary and refer to inputs/outputs by name

goeiedag.build_all(graph, workdir)

# Print output
print((workdir / "result.txt").read_text())

Q&A

Why use the files and commands model rather than Python objects and functions?

  • It is a tested and proven paradigm (make traces back to 1976!)
  • It provides an obvious way of evaluating which products need rebuilding (subject to an accurate dependency graph)
  • It naturally isolates and parallelizes individual build tasks
  • It is agnostic as to how data objects are serialized (convenient for the library author...)
  • Graph edges are implicitly defined by input/output file names
  • A high-quality executor (Ninja) is available and installable via a Python package

How is this different from using the Ninja package directly?

  • Simpler mental model & usage: no need to separately define build rules or think about implicit/explicit inputs and outputs
  • API accepts Paths; no need to cast everything to str!
  • Higher-level API in general (for example, the output directory is created automatically)

Why is Python 3.10+ required?

The code makes use of PEP-604.

Similar projects

  • Ninja (Python package) -- provides a lower-level API, used by goeieDAG as back-end
  • TaskGraph -- similar project, but centered around Python functions and in-process parallelism
  • Snakemake -- similar goals, but a stand-alone tool rather than a library
  • Dask -- different execution model; caching of intermediate results is left up to the user
  • doit