diff --git a/data/.gitignore b/data/.gitignore
index 8b13789..0cbfe16 100644
--- a/data/.gitignore
+++ b/data/.gitignore
@@ -1 +1,15 @@
+# This gitignore lists the data files that are not kept in the repo
+# and thus have to be downloaded automatically or manually (when registration
+# or journal access is needed)
+
+# The temporary download folder:
+tmp/
+
+
+################
+# PPIs
+#################
+
+# Bossi & Lehner PPI file
+ppis/CRG.integrated.human.interactome.txt
diff --git a/data/README.md b/data/README.md
new file mode 100644
index 0000000..4d53e6e
--- /dev/null
+++ b/data/README.md
@@ -0,0 +1,13 @@
+Data Sources
+============
+
+Protein-Protein Interaction Networks
+------------------------------------
+
+### Bossi & Lehner composite PPI network
+
+From the supplementary section of the paper ["Tissue specificity and the human protein interaction network"](http://www.ncbi.nlm.nih.gov/pubmed/19357639).
+
+This can be automatically downloaded via the download_data.sh script.
+
+
diff --git a/data/download_data.sh b/data/download_data.sh
new file mode 100755
index 0000000..3913da2
--- /dev/null
+++ b/data/download_data.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+# This script downloads all publicly available data into the correct folders.
+# Run this before using the pipeline.
+# Don't forget to download the data that is not publicly available (see README.md for
+# sources and further explanations).
+
+# A few settings
+TMP_FOLDER=tmp
+
+
+# (re)create the tmp folder; -f keeps rm quiet when the folder does not exist yet
+rm -rf "$TMP_FOLDER"
+mkdir -p "$TMP_FOLDER"
+
+
+####################
+# PPI networks
+###################
+
+# 1.) Bossi & Lehner:
+BOSSI_URL=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2683721/bin/msb200917-s2.zip
+BOSSI_ZIP=$TMP_FOLDER/bossi-lehner-suppl.zip
+
+curl -o "$BOSSI_ZIP" "$BOSSI_URL"
+unzip "$BOSSI_ZIP" -d "$TMP_FOLDER"
+cp "$TMP_FOLDER/CRG-human-interactome/CRG.integrated.human.interactome.txt" ppis/
+
+
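
The download step in `download_data.sh` assumes that `ppis/` already exists and that `curl` received the zip rather than an error page or a redirect. A minimal sketch of a more defensive variant is given below; it is not part of the patch. The helper names `fetch_zip`, `install_file` and `download_bossi` are hypothetical, and the script is assumed to be run from inside `data/`. The URL, zip name and archive path are taken unchanged from the patch.

```shell
#!/bin/bash
# Sketch only: a more defensive variant of the Bossi & Lehner download step.
# fetch_zip, install_file and download_bossi are hypothetical helper names.
set -eu

TMP_FOLDER=tmp
BOSSI_URL=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2683721/bin/msb200917-s2.zip
BOSSI_ZIP=$TMP_FOLDER/bossi-lehner-suppl.zip

# Download a URL to a file.
# -f makes curl fail on HTTP errors, -L makes it follow redirects.
fetch_zip() {
    local url="$1" out="$2"
    mkdir -p "$(dirname "$out")"
    curl -fL -o "$out" "$url"
}

# Copy one file into a target folder, creating the folder if needed.
# Returns 1 (and copies nothing) when the source file is missing.
install_file() {
    local src="$1" dest_dir="$2"
    if [ ! -f "$src" ]; then
        echo "missing file: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/"
}

# The full step; defined here but not called automatically.
download_bossi() {
    rm -rf "$TMP_FOLDER"
    mkdir -p "$TMP_FOLDER"
    fetch_zip "$BOSSI_URL" "$BOSSI_ZIP"
    unzip -o "$BOSSI_ZIP" -d "$TMP_FOLDER"
    install_file "$TMP_FOLDER/CRG-human-interactome/CRG.integrated.human.interactome.txt" ppis
}
```

Calling `download_bossi` at the end of the script would run the whole step. Keeping the copy in its own function means a missing or renamed file inside the archive stops the script with a clear message instead of a bare `cp` error.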