Extension of the state-of-the-art causal discovery method PCMCI augmented with a feature-selection method based on Transfer Entropy. The algorithm, starting from a prefixed set of variables, identifies the correct subset of features and possible links between them which describe the observed process. Then, from the selected features and links, a causal model is built.
Current state-of-the-art causal discovery approaches suffer in terms of speed and accuracy of the causal analysis when the process to be analysed is composed by a large number of features. F-PCMCI is able to select the most meaningful features from a set of variables and build a causal model from such selection. To this end, the causal analysis results faster and more accurate.
In the following it is presented an example showing a comparison between causal models obtained by PCMCI and F-PCMCI causal discovery algorithms on the same data. The latter have been created by defining a 6-variables system defined as follows:
min_lag = 1
max_lag = 1
np.random.seed(1)
nsample = 1500
nfeature = 6
d = np.random.random(size = (nsample, feature))
for t in range(max_lag, nsample):
d[t, 0] += 2 * d[t-1, 1] + 3 * d[t-1, 3]
d[t, 2] += 1.1 * d[t-1, 1]**2
d[t, 3] += d[t-1, 3] * d[t-1, 2]
d[t, 4] += d[t-1, 4] + d[t-1, 5] * d[t-1, 0]
Causal Model by PCMCI | Causal Model by F-PCMCI |
---|---|
Execution time ~ 6min 50sec | Execution time ~ 2min 45sec |
The causal analysis performed by the F-PCMCI results not only faster but also more accurate. Indeed, the causal model derived by the F-PCMCI agrees with the structure of the system of equations, instead the one derived by the PCMCI presents spurious links:
-
$X_2$ →$X_4$ -
$X_2$ →$X_5$
Note that, since all the 6 variables were involved in the evolution of the system, the F-PCMCI did not remove any of them. In the following example instead, we added a new variable in the system which is defined just by the noise component (as
Causal Model by PCMCI | Causal Model by F-PCMCI |
---|---|
Execution time ~ 8min 40sec | Execution time ~ 3min 00sec |
In this case the F-PCMCI removes the
If you found this useful for your work, please cite this papers:
@inproceedings{castri2023fpcmci,
title={Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios},
author={Castri, Luca and Mghames, Sariah and Hanheide, Marc and Bellotto, Nicola},
booktitle={Conference on Causal Learning and Reasoning (CLeaR)},
year={2023},
}
- tigramite>=5.1.0.3
- pandas>=1.5.2
- netgraph>=4.10.2
- networkx>=2.8.6
- ruptures>=1.1.7
- scikit_learn>=1.1.3
- torch>=1.11.0
- gpytorch>=1.4
- dcor>=0.5.3
- h5py>=3.7.0
- jpype1>=1.5.0
- mpmath>=1.3.0
Before installing the F-PCMCI package, you need to install Java and the IDTxl package used for the feature-selection process, following the guide described here. Once complete, you can install the current release of F-PCMCI
with:
pip install fpcmci
For a complete installation Java - IDTxl - F-PCMCI, follow the following procedure.
Verify that you have not already installed Java:
java -version
if the latter returns Command 'java' not found, ...
, you can install Java by the following commands, otherwise you can jump to IDTxl installation.
# Java
sudo apt-get update
sudo apt install default-jdk
Then, you need to add JAVA_HOME to the environment
sudo nano /etc/environment
JAVA_HOME="/lib/jvm/java-11-openjdk-amd64/bin/java" # Paste the JAVA_HOME assignment at the bottom of the file
source /etc/environment
# IDTxl
git clone https://github.com/pwollstadt/IDTxl.git
cd IDTxl
pip install -e .
pip install fpcmci
Version | Changes |
---|---|
4.4.1 | installation simplified dag fix and bundle_parallel_edges param added node_proximity param added in timeseries_dag get_skeleton, get_val_matrix, get_pval_matrix added into DAG clean_cls param added to F-PCMCI constructor |
4.4.0 | neglect autodependent nodes fix |
4.3.3 | sourcelist method signature fixed in Node.py |
4.3.2 | documentation improved |
4.3.1 | README and index.md updated |
4.3.0 | alpha level fix in PCMCI timeseries_dag fix adaptation to DAG structure |
4.2.1 | fixed dependency error in setup.py |
4.2.0 | causal model with only selected features fix adapted to tigramite 5.2 get_causal_matrix FPCMCI method added f_alpha and pcmci_alpha instead of alpha requirements changed tutorials adapted to new version |
4.1.2 | tutorials adapted to 4.1.1 and get_SCM method added in FPCMCI |
4.1.1 | PCMCI dependencies fix: FPCMCI causal model field added, FPCMCI.run() and .run_pcmci() outputs the selected variables and the corresponding causal model |
4.1.0 | FSelector and FValidator turned into FPCMCI and PCMCI show_edge_label removed and dag optimized new package included in the setup.py added tutorials new example in README.md |
4.0.1 | online documentation and paths fixes |
4.0.0 | package published |