Functional highly reproducible bioinformatics pipelines

PapenfussLab/bionix

BioNix is a tool for reproducible bioinformatics that unifies workflow engines, package managers, and containers. It is implemented as a lightweight library on top of the Nix deployment system.

BioNix is currently a work in progress, so documentation is sparse. Please get in contact with us if you need more information or help, or would like to contribute (see the bottom of this page).

Installation

BioNix has no dependencies beyond Nix, which can be installed with:

curl -L https://nixos.org/nix/install | sh

If you do not have root access, a variety of rootless install options are available.

API docs can be generated by running nix build in the doc directory and then opening result/OEBPS/index.html.

Examples

Several examples are available in ./examples/. The main example is presented in ./examples/default.nix and can be built using nix build in ./examples/. This sample pipeline performs variant calling using platypus, alignment using bwa mem, and preprocessing using samtools.

See the documentation in ./examples/README.md for more detail about this pipeline and the other examples.

  • The pipeline itself is specified in examples/call.nix and examples/default.nix.
  • The BioNix wrapper to run platypus is in tools/platypus-callVariants.nix.
  • The Nix expression for the platypus software itself can be found in nixpkgs.
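
As a rough sketch, building the main example from a checkout of the repository might look like the following. The function name build_bionix_examples is hypothetical, and the snippet assumes that the current directory is the root of a BioNix checkout and that Nix is already installed.

```shell
# Sketch only: build_bionix_examples is a hypothetical helper name.
# Assumes the current directory is the root of a BioNix checkout.
build_bionix_examples() {
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd examples || exit 1
    # Build the example pipeline in ./examples/.
    nix build
  )
}
```

Calling build_bionix_examples then runs the build. By default nix build leaves a result symlink in the directory it was run from, so the pipeline output should be reachable under examples/result.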

Constructing workflows

Writing workflows requires some familiarity with the Nix programming language and deployment system. Good introductions can be found here and here.

To understand how to construct workflows, it is recommended to study the examples provided. Thanks to the flexibility of Nix, workflows can be constructed in different ways to suit the intended purpose, and the examples illustrate some of the ways one might approach various problems.

For constructing tool wrappers, take a look at the existing wrappers in the ./tools/ directory. The wrappers for BWA are a good starting point.

HPC execution

BioNix supports submitting jobs to computing queues rather than building them directly with the Nix build engine. The two supported engines are Slurm and PBS, represented by the slurm and qsub entries in the root BioNix tree. Each takes an attribute set of default parameters and returns a new tree of tools. Simply use tools out of these trees to submit jobs, and specify resource requirements as ordinary configuration options to the tools.

The following resource parameters can be specified:

  • ppn: The number of cores to request;
  • mem: The amount of memory to request (GB);
  • walltime: A string defining the maximum walltime.

Because job submission relies on side effects, sandboxed builds cannot be used: the sandbox must be disabled (--option sandbox false with nix-build or --no-sandbox with nix build).
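
The two invocations mentioned above can be wrapped as small shell helpers. This is only a sketch: both function names are hypothetical, and any extra arguments are passed straight through to Nix.

```shell
# Sketch only: both function names are hypothetical helpers.

# Classic CLI: forward all arguments to nix-build with the sandbox disabled.
bionix_nix_build_nosandbox() {
  nix-build --option sandbox false "$@"
}

# Newer CLI: the equivalent invocation using nix build.
bionix_build_nosandbox() {
  nix build --no-sandbox "$@"
}
```

On multi-user Nix installations the sandbox setting may be ignored unless the invoking user is trusted by the Nix daemon, so it is worth checking the build output for a warning to that effect.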

Slurm specifics

Slurm jobs are submitted by executing the salloc binary on the cluster. By default this is assumed to be /usr/bin/salloc; if this is not the case on your cluster then you need to additionally specify the path to salloc via the salloc parameter.

When launching the build, it is important that the TMPDIR environment variable points to a location which is on shared storage (i.e., available from all nodes). This will be the location used for temporary files during the execution of stages.
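
A minimal sketch of setting TMPDIR before launching a build is shown below. The BIONIX_SHARED_TMP variable and the fallback path are hypothetical; substitute a directory that every compute node on your cluster can see.

```shell
# Sketch only: BIONIX_SHARED_TMP and the fallback path are hypothetical.
# Replace them with a directory on storage shared by all nodes.
shared_tmp="${BIONIX_SHARED_TMP:-$HOME/bionix-tmp}"

# Create the directory if needed and only use it if it is writable.
if mkdir -p "$shared_tmp" && [ -w "$shared_tmp" ]; then
  # Builds launched from this shell will use this location for temporary files.
  export TMPDIR="$shared_tmp"
  echo "TMPDIR=$TMPDIR"
else
  echo "error: $shared_tmp could not be created or is not writable" >&2
fi
```

The check only confirms that the directory is writable from the submitting machine. It cannot verify that the path is actually on shared storage, which still has to be confirmed for your cluster.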

PBS specifics

The PBS wrapper is considerably more complicated because initiating interactive processes under PBS is not as reliable as with Slurm's salloc. Consequently, jobs are submitted via non-interactive queue submissions, and the queue is polled to determine when each submitted job has completed.

The path to the PBS executables (i.e., qsub and qstat) has to be given in the qsubPath attribute. Furthermore, a temporary directory that's shared across all nodes must be specified in tmpDir.

Distributed execution

Nix has support for distributing jobs across a collection of machines. See the manual and wiki for more information.

Citing

  1. Bedő, J., Di Stefano, L., & Papenfuss, A. T. (2020). Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix. GigaScience, 9(11). https://doi.org/10.1093/gigascience/giaa121

Getting help and contributing

For general questions and to report problems, please open an issue. For real-time help there is a chat room at #bionix:nixos.org.
