ramBLe - A Parallel Framework for Bayesian Learning

ramBLe (A Parallel Framework for Bayesian Learning) supports multiple constraint-based algorithms for structure learning from data in parallel.

Requirements

  • gcc (with C++14 support) is used for compiling the project.
    The project has been tested only on Linux, using gcc version 9.2.0.
  • Boost libraries are used for parsing the command line options, logging, and a few other purposes.
    Tested with version 1.70.0.
  • MPI is used for execution in parallel.
    Tested with MVAPICH2 version 2.3.3.
  • SCons is required for building the project.
    Tested with version 3.1.2.
  • The following repositories are used as submodules:
    • BN Utils contains common utilities for BN learning in parallel and scripts for post-processing.
    • mxx is used as a C++ wrapper for MPI.
    • Graph API is used as a lightweight wrapper around Boost.Graph.
    • C++ Utils are used for logging and timing.
  • Google Test (optional) framework is used for unit testing in this project.
    If this dependency is not satisfied, then the unit tests are not built. See the relevant section in Building for more information.
    Tested with version 1.10.0.

Building

After the dependencies have been installed, the project can be built as:

scons

This will create an executable named ramble, which can be used for constraint-based structure learning.
By default, all paths in the CPATH and LIBRARY_PATH environment variables are used as include paths and library paths, respectively.
Paths to external includes and libraries at non-default locations can also be specified as:

scons LOCALINCLUDES=<comma-delimited list of paths> LOCALLIBS=<comma-delimited list of paths>
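
Because both options take a comma-delimited list, it can help to assemble the value from individual directories first. The sketch below is one way to do that in POSIX shell; the /opt/... prefixes and the join_by_comma helper are hypothetical and serve only as illustration. It prints the resulting command instead of running it.

```shell
# join_by_comma is a hypothetical helper, not part of ramBLe.
# It joins its arguments with commas, the delimiter used by LOCALINCLUDES and LOCALLIBS.
join_by_comma() {
    # The subshell keeps the IFS change from leaking into the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

# Example install prefixes (assumptions); replace with the paths on your system.
local_includes=$(join_by_comma /opt/boost/include /opt/mvapich2/include)
local_libs=$(join_by_comma /opt/boost/lib /opt/mvapich2/lib)

# Print the command for review instead of running it.
echo "scons LOCALINCLUDES=$local_includes LOCALLIBS=$local_libs"
```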

Unit Tests

The unit tests are built by default if Google Test is available. To build only the executable, run:

scons TEST=0

Debug

To build the debug version of the executable, run:

scons DEBUG=1

The debug version of the executable is named ramble_debug.

Logging

By default, logging is disabled in the release build and enabled in the debug build. To change the default behavior, pass the LOGGING=[0,1] argument to scons:

scons LOGGING=1 # Enables logging in the release build

Please be aware that enabling logging affects performance.

Timing

Timing of high-level operations can be enabled by passing the TIMER=1 argument to scons.
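
The build options described in this section are passed as key=value arguments to scons, and the LOCALINCLUDES/LOCALLIBS example already passes two of them at once. A minimal sketch, assuming DEBUG, LOGGING, and TIMER can be combined in the same way; the shell variables debug, logging, and timer are names chosen here, and the command is printed, not executed:

```shell
# Toggles chosen for this example; an empty value leaves the project's default untouched.
debug=1      # request the debug build (ramble_debug)
logging=     # keep the default, which is enabled for debug builds
timer=1      # time high-level operations

args=""
if [ -n "$debug" ]; then args="$args DEBUG=$debug"; fi
if [ -n "$logging" ]; then args="$args LOGGING=$logging"; fi
if [ -n "$timer" ]; then args="$args TIMER=$timer"; fi

build_cmd="scons$args"
echo "$build_cmd"    # prints: scons DEBUG=1 TIMER=1
```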

Execution

Once the project has been built, the executable can be used for learning a Bayesian network (BN) as follows:

./ramble -f test/coronary.csv -n 6 -m 1841 -d -o test/coronary.dot

For running in parallel, the following can be executed:

mpirun -np 8 ./ramble -f test/coronary.csv -n 6 -m 1841 -d -o test/coronary.dot
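
To see how the run time changes with the number of processes, the same command can be repeated for several values of -np. The sketch below only prints the commands; the ramble_cmd helper, the process counts, and the per-run output file names are assumptions for illustration.

```shell
# ramble_cmd is a hypothetical helper that prints the command for a given process count.
ramble_cmd() {
    # Each run writes to its own .dot file so that the results can be compared afterwards.
    echo "mpirun -np $1 ./ramble -f test/coronary.csv -n 6 -m 1841 -d -o test/coronary_np$1.dot"
}

# Process counts to try (an arbitrary choice for this example).
for p in 1 2 4 8; do
    ramble_cmd "$p"
done
```

A build with TIMER=1 reports the timing of high-level operations, which is useful for this kind of comparison.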

Please execute the following for more information on all the options that the executable accepts:

./ramble --help

Algorithms

The algorithm for learning BNs is chosen with the -a option of the executable. The currently supported algorithms are listed below.

Local-to-Global Learning

The algorithms in this category first learn the local neighborhood of each variable separately and then combine these neighborhoods to get the complete network.

Blanket Learning

This class of algorithms first finds the Markov blanket (MB) of each variable and then uses it to get the parents and the children (PC) of that variable.

Direct Learning

This class of algorithms directly finds the PC sets of nodes.

  • mmpc corresponds to the Max-Min PC (MMPC) algorithm by Tsamardinos et al., as corrected by Pena et al.
  • si.hiton.pc corresponds to the Semi-interleaved HITON-PC algorithm by Aliferis et al.
  • hiton (sequential only) corresponds to the HITON-PC algorithm by Aliferis et al., as corrected by Pena et al.
  • getpc (sequential only) corresponds to the Get PC algorithm by Pena et al.

Global Learning

This class of algorithms learns the network directly by iteratively eliminating edges between variables that are found to be independent.

  • pc.stable corresponds to the PC-stable algorithm by Colombo et al.
  • pc.stable.2 is an alternate parallel algorithm for PC-stable that learns the same network as pc.stable.
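
As an illustration of the -a option, the sketch below prints one command per algorithm name listed above and uses mpirun only for those not marked as sequential only. The algo_cmd helper, the process count, and the output file naming are assumptions; the remaining arguments are copied from the example in Execution.

```shell
# algo_cmd is a hypothetical helper that prints the command for one algorithm name.
algo_cmd() {
    run="./ramble -f test/coronary.csv -n 6 -m 1841 -d -a $1 -o test/coronary_$1.dot"
    case "$1" in
        # Marked "sequential only" in the list above, so no mpirun.
        hiton|getpc) echo "$run" ;;
        # All other listed algorithms are launched through mpirun.
        *) echo "mpirun -np 8 $run" ;;
    esac
}

for algo in mmpc si.hiton.pc hiton getpc pc.stable pc.stable.2; do
    algo_cmd "$algo"
done
```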

Publication

Ankit Srivastava, Sriram Chockalingam, and Srinivas Aluru. "A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery." In 2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), IEEE Computer Society, 2020.

The experiments reported in the publication can be reproduced using EXPERIMENTS.md.

Licensing

Our code is licensed under the Apache License 2.0 (see LICENSE).