ramBLe (A Parallel Framework for Bayesian Learning) supports multiple constraint-based algorithms for structure learning from data in parallel.
- gcc (with C++14 support) is used for compiling the project. This project has been tested only on Linux, using gcc version 9.2.0.
- Boost libraries are used for parsing the command line options, logging, and a few other purposes. Tested with version 1.70.0.
- MPI is used for execution in parallel. Tested with MVAPICH2 version 2.3.3.
- SCons is required for building the project. Tested with version 3.1.2.
- The following repositories are used as submodules:
- The Google Test framework (optional) is used for unit testing in this project. If this dependency is not satisfied, the unit tests are not built. See the relevant section in Building for more information. Tested with version 1.10.0.
After the dependencies have been installed, the project can be built as:
scons
This will create an executable named ramble, which can be used for constraint-based structure learning.
By default, all the paths in the CPATH and LIBRARY_PATH environment variables are used as include paths and library paths.
Paths to external includes and libraries at non-default locations can also be specified as:
scons LOCALINCLUDES=<comma-delimited list of paths> LOCALLIBS=<comma-delimited list of paths>
The unit tests are built by default. The following can be executed for building only the executable:
scons TEST=0
For building the debug version of the executable, the following can be executed:
scons DEBUG=1
The debug version of the executable is named ramble_debug.
By default, logging is disabled in the release build and enabled in the debug build.
To change the default behavior, the LOGGING=[0,1] argument can be passed to scons:
scons LOGGING=1 # Enables logging in the release build
Please be aware that enabling logging will affect performance.
Timing of high-level operations can be enabled by passing the TIMER=1 argument to scons.
Once the project has been built, the executable can be used for learning a Bayesian network (BN) as follows:
./ramble -f test/coronary.csv -n 6 -m 1841 -d -o test/coronary.dot
For running in parallel, the following can be executed:
mpirun -np 8 ./ramble -f test/coronary.csv -n 6 -m 1841 -d -o test/coronary.dot
Please execute the following for more information on all the options that the executable accepts:
./ramble --help
The algorithm for learning BNs can be chosen by passing the desired algorithm to the executable using the -a option. The currently supported algorithms are listed below.
The algorithms in this category first learn the local neighborhood of each variable separately and then combine these neighborhoods to get the complete network.
This class of algorithms first finds the Markov blanket (MB) of each variable and then uses it to get the parents and the children (PC) of that variable.
- gs corresponds to the Grow-Shrink (GS) algorithm by Margaritis & Thrun.
- iamb corresponds to the Incremental Association MB (IAMB) algorithm by Tsamardinos et al.
- inter.iamb corresponds to the Interleaved Incremental Association MB (InterIAMB) algorithm by Tsamardinos et al.
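To make the MB-first idea concrete, here is a minimal Python sketch of the grow and shrink phases of GS. It is not the ramBLe implementation: the toy DAG in PARENTS, the d_separated oracle (which stands in for a statistical independence test on data), and the function names are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy DAG, given as node -> set of parents:
# A -> C <- B, C -> D, B -> E
PARENTS = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": {"B"}}


def ancestors(nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(PARENTS[n])
    return seen


def d_separated(x, y, z):
    """Independence oracle: True if x and y are d-separated given the set z.

    Uses the moralized ancestral graph criterion. A real run would use a
    statistical test on the data instead of this oracle.
    """
    keep = ancestors({x, y} | set(z))
    adj = {n: set() for n in keep}
    for child in keep:
        parents = PARENTS[child]
        for p in parents:  # undirected version of each parent -> child edge
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(parents, 2):  # "marry" co-parents
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff no path connects them once z is removed.
    seen, stack = set(), [x]
    while stack:
        n = stack.pop()
        if n in seen or n in z:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return y not in seen


def grow_shrink_mb(target, variables, indep):
    """Return the Markov blanket of target found by a grow and a shrink phase."""
    mb = []
    changed = True
    while changed:  # grow: add anything still dependent given the current MB
        changed = False
        for v in variables:
            if v != target and v not in mb and not indep(target, v, set(mb)):
                mb.append(v)
                changed = True
    for v in list(mb):  # shrink: drop false positives admitted while growing
        if indep(target, v, set(mb) - {v}):
            mb.remove(v)
    return set(mb)


variables = sorted(PARENTS)
for t in variables:
    print(t, sorted(grow_shrink_mb(t, variables, d_separated)))
```

In this toy graph the grow phase for A temporarily admits E, and the shrink phase removes it again once B is in the blanket, which is why both phases are needed.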
This class of algorithms directly finds the PC sets of nodes.
- mmpc corresponds to the Max-Min PC (MMPC) algorithm by Tsamardinos et al. and corrected by Pena et al.
- si.hiton.pc corresponds to the Semi-interleaved HITON-PC algorithm by Aliferis et al.
- hiton (sequential only) corresponds to the HITON-PC algorithm by Aliferis et al. and corrected by Pena et al.
- getpc (sequential only) corresponds to the Get PC algorithm by Pena et al.
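The step of combining local neighborhoods into one network can be illustrated with a small Python sketch. It assumes a symmetry rule in which an edge is kept only if each endpoint lists the other in its PC set. This rule is a common choice for local-to-global learning, but it is an assumption here and is not confirmed as what ramBLe does; the function name and the example PC sets are hypothetical.

```python
def combine_neighborhoods(pc):
    """Build an undirected skeleton from per-variable PC sets.

    Assumed rule: keep the edge x - y only if y is in PC(x) and x is in PC(y).
    """
    edges = set()
    for x, neighbors in pc.items():
        for y in neighbors:
            if x in pc.get(y, set()):
                edges.add(frozenset((x, y)))
    return edges


# Hypothetical PC sets; B lists E, but E does not list B.
pc_sets = {
    "A": {"C"},
    "B": {"C", "E"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
for edge in sorted(sorted(e) for e in combine_neighborhoods(pc_sets)):
    print(" - ".join(edge))
```

Because the B and E neighborhoods disagree, the B - E edge is dropped under this rule.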
This class of algorithms learns the network directly by iteratively eliminating edges between variables that are found to be independent.
- pc.stable corresponds to the PC-stable algorithm by Colombo et al.
- pc.stable.2 is an alternate parallel algorithm for PC-stable that learns the same network as pc.stable.
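The edge-elimination idea behind PC-stable can be sketched in Python as follows. This is a minimal, sequential illustration of the skeleton phase only, not the ramBLe implementation of either variant. The chain_indep oracle (a stand-in for a statistical test, encoding the chain A - B - C - D) and all names are assumptions made for the example.

```python
from itertools import combinations

ORDER = ["A", "B", "C", "D"]  # hypothetical chain A - B - C - D


def chain_indep(x, y, z):
    """Oracle for a chain: x and y are independent given z iff z contains
    a variable lying strictly between them."""
    i, j = sorted((ORDER.index(x), ORDER.index(y)))
    return any(v in z for v in ORDER[i + 1:j])


def pc_stable_skeleton(variables, indep):
    """Return (adjacency sets, separating sets) of the learned skeleton."""
    adj = {v: set(variables) - {v} for v in variables}  # start fully connected
    sepset = {}
    size = 0
    # Continue while some variable has enough neighbors for a set of this size.
    while any(len(adj[v]) - 1 >= size for v in variables):
        # Freeze the neighborhoods for this level. Conditioning sets are drawn
        # from the snapshot, so removals made within a level do not influence
        # later tests at the same level; this is what makes the result
        # independent of the variable ordering.
        snapshot = {v: set(adj[v]) for v in variables}
        for x in variables:
            for y in sorted(snapshot[x]):
                if y not in adj[x]:
                    continue  # edge already removed from the other endpoint
                for z in combinations(sorted(snapshot[x] - {y}), size):
                    if indep(x, y, set(z)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(z)
                        break
        size += 1
    return adj, sepset


skeleton, separators = pc_stable_skeleton(ORDER, chain_indep)
for v in ORDER:
    print(v, sorted(skeleton[v]))
```

The tests for all pairs within one level depend only on the frozen snapshot, so they do not depend on each other.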
The experiments reported in the publication can be reproduced using EXPERIMENTS.md.
Our code is licensed under the Apache License 2.0 (see LICENSE).