runner

Sample parameters, run model ensemble over multiple processors

Requirements

Python 2.7 and 3

Python libraries:

numpy (tested with 1.11)
scipy (tested with 0.16 and 0.18)
tabulate

These libraries can be installed with pip, e.g., pip install tabulate.

Install

Clone / download the runner repository from https://github.com/alex-robinson/runner
Install runner to your Python installation via pip:

cd runner 
pip install ./

Now check that system command job is available by running job -h.

Note that install method python setup.py install should be avoided if possible to maintain Python system integrity.

Usage

Use command-line help:

Examples

Parameter sampling

Factorial combination of parameters:

job product a=2,3,4 b=0,1

 a      b
 2      0
 2      1
 3      0
 3      1
 4      0
 4      1

Monte-carlo sampling:

job sample a=U?0,1 b=N?0,1 --size 10 --seed 4

     a      b
0.425298236238 0.988953805595
0.904416005793 2.62482283016
0.68629932356 0.705445219934
0.397627445478 -0.766770633921
0.577938292179 -0.522609467132
0.0967029839014 -0.14215407458
0.71638422414 0.0495725958965
0.26977288246 0.519632323554
0.197268435996 -1.60068615198
0.800898609767 -0.948326628599

The above command draws 10 samples from "a" as uniform distribution between 0 and 1 and "b" as normal distribution of mean 0 and standard deviation 1. The seed parameter sets the random state, to make the sampling reproducible. Sampling method defaults to Latin Hypercube Sampling, built on the pyDOE package (copied in runner to reduce external dependencies).

Run model ensemble

The canonical form of job run is:

job run [OPTIONS] -- EXECUTABLE [OPTIONS]
job run [OPTIONS] -m CUSTOM

where EXECUTABLE is your model executable or a command, followed by its arguments, and CUSTOM indicates a custom model interface. Note the -- that separates job run arguments OPTIONS from the executable. When there is no ambiguity in the command-line arguments (as seen by python's argparse) it may be dropped. job run options determine in which manner to run the model, which parameter values to vary (the ensemble), and how to communicate these parameter values to the model. The most straightforward way it to use formattable command-line argument. For instance using canonical UNIX command echo as executable:

job run -p a=2,3,4 b=0,1 -o out -- echo --a {a} --b {b} --out {}

The standard output is written in log files under out/RUNID:

cat out/*/log.out

--a 2 --b 0 --out out/0
--a 2 --b 1 --out out/1
--a 3 --b 0 --out out/2
--a 3 --b 1 --out out/3
--a 4 --b 0 --out out/4
--a 4 --b 1 --out out/5

The command above runs an ensemble of 6 model versions, by calling echo --a {a} --b {b} --out {} where {a}, {b} and {} are formatted using runtime with parameter and run directory values, as displayed in the output above. Parameter ensembles generated by job sample can also be provided as input via -i/--params-file option instead of -p/--params. The job run parameter -o/--out-dir indicates experiment directory, under which individual ensemble member run directories {} will be created. There are a number of options to determine how this should be done (e.g. -a/--auto-dir to create sub-directory based on parameter names and values). The command is executed in the background and the standard output is saved to log files.

There are a number of other ways to communicate parameter values to your model (see also --arg-prefix parameter, e.g. with --arg-prefix "--{} " to achieve the same result with less redundancy, when parameter names match). Parameters can also be passed via a file:

job run -p a=2,3,4 b=0,1 -o out --file-name params.txt --file-type linesep --line-sep " "

cat out/*/params.txt

a 2
b 0
a 2
b 1
a 3
b 0
a 3
b 1
a 4
b 0
a 4
b 1

with a number of other file types. File types that involve grouping, such as namelist, require a group prefix with a . separator in the parameter name:

job run -p g1.a=0,1 g2.b=2. -o out --file-name params.txt --file-type namelist

cat out/*/params.txt

&g1
 a               = 0          
/
&g2
 b               = 2.0        
/
&g1
 a               = 1          
/
&g2
 b               = 2.0        
/

Additionally, parameters can be set as environment variables via --env-prefix argument (e.g. --env-prefix "" for direct access via $NAME within the script).

Custom model interface

For more complex model setup you may subclass runner.model.ModelInterface methods setup and postprocess. See examples/custom.py how to do that.

Note for use on the cluster

The current version makes use of python's multiprocessing.Pool to handle parallel tasks. When running on the cluster, it is up to the user to allocate ressources, via sbatch, e.g. by simply writing the command in a bash script:

# write job script
echo job run -p a=2,3,4 b=0,1 -o out -- echo --a {a} --b {b} --out {} > jobrun.sh

# submit with slurm with 10 procs
sbatch -n 10 jobrun.sh

Name		Name	Last commit message	Last commit date
Latest commit History 272 Commits
doc		doc
examples		examples
runner		runner
scripts		scripts
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
makefile		makefile
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini
versioneer.py		versioneer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

runner

Requirements

Install

Usage

Examples

Parameter sampling

Run model ensemble

Custom model interface

Note for use on the cluster

About

Releases

Packages

Languages

License

alex-robinson/runner

Folders and files

Latest commit

History

Repository files navigation

runner

Requirements

Install

Usage

Examples

Parameter sampling

Run model ensemble

Custom model interface

Note for use on the cluster

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages