# Import hetionet from `dhimmel/integrate`

[`dhimmel/integrate`](https://github.com/dhimmel/integrate) performs the data integration, hetnet creation, permutation, and Neo4j export for v1.0 of hetionet. This repository (`hetio/hetionet`) hosts only the completed hetnets and network descriptions. This notebook populates `hetio/hetionet` by copying files from `dhimmel/integrate`: some are downloaded from GitHub, and others exist only in a local copy of that repository.

In [1]:
import os
import shutil
import urllib.request
import tarfile

In [2]:
# dhimmel/integrate commit
commit = 'ffd1a48b4051c41fc8cef6e8847d0687f1a722bc'

# Name and version for hetionet
name = 'hetionet-v1.0'

## Import from GitHub

In [3]:
integrate_to_hetionet = {
    # Tabular TSVs
    'data/nodes.tsv': 'hetnet/tsv/{}-nodes.tsv'.format(name),
    'data/edges.sif.gz': 'hetnet/tsv/{}-edges.sif.gz'.format(name),

    # JSON
    'data/metagraph.json': 'hetnet/json/{}-metagraph.json'.format(name),
    'data/hetnet.json.bz2': 'hetnet/json/{}.json.bz2'.format(name),

    # Description
    'data/summary/metanodes.tsv': 'describe/nodes/metanodes.tsv',
    'data/summary/metaedges.tsv': 'describe/edges/metaedges.tsv',
    'data/summary/metaedge-styles.tsv': 'describe/edges/metaedge-styles.tsv',
    'data/summary/degrees.xlsx': 'describe/degree/degrees.xlsx',
    'viz/degrees.pdf': 'describe/degree/degrees.pdf',

    # Neo4j nomenclature mappings
    'neo4j/nomenclature/labels.tsv': 'hetnet/neo4j/labels.tsv',
    'neo4j/nomenclature/types.tsv': 'hetnet/neo4j/types.tsv',
}

In [4]:
for integrate_path, hetionet_path in integrate_to_hetionet.items():
    # Skip files that have already been downloaded
    if os.path.exists(hetionet_path):
        continue
    # Download the file as it existed at the pinned commit
    url = 'https://github.com/dhimmel/integrate/raw/{}/{}'.format(commit, integrate_path)
    urllib.request.urlretrieve(url, filename=hetionet_path)
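
The download loop above relies on every destination directory (for example `hetnet/tsv`) already existing, because `urllib.request.urlretrieve` does not create missing directories. The following is a minimal sketch of a more defensive variant, not the notebook's own method. The helper names `raw_url` and `download_if_missing`, and the injectable `retrieve` parameter, are hypothetical and introduced only for illustration.

```python
import os
import urllib.request


def raw_url(commit, path, repo='dhimmel/integrate'):
    """Build the GitHub raw-file URL for a path at a specific commit."""
    return 'https://github.com/{}/raw/{}/{}'.format(repo, commit, path)


def download_if_missing(url, destination, retrieve=urllib.request.urlretrieve):
    """Download url to destination unless the file already exists.

    Returns True if a download was attempted and False if it was skipped.
    The retrieve argument (hypothetical) allows substituting the downloader.
    """
    if os.path.exists(destination):
        return False
    # Create the parent directory, if the destination has one
    directory = os.path.dirname(destination)
    if directory:
        os.makedirs(directory, exist_ok=True)
    retrieve(url, filename=destination)
    return True
```

With these helpers, the loop body could reduce to `download_if_missing(raw_url(commit, integrate_path), hetionet_path)`.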

## Import local files

Several files were too large to upload to `dhimmel/integrate`. These files are instead copied from a local copy of that repository.

In [5]:
# Specify the local path to the integrate repository
prepend = '../construct/integrate'

### Permuted JSON hetnets

In [6]:
# Specify the IDs of permuted hetnets in dhimmel/integrate
perm_ids = range(1, 1 + 5)

In [7]:
# Map each permuted hetnet in dhimmel/integrate to its path in hetio/hetionet
local_map = dict()
for i in perm_ids:
    filename = 'hetnet_perm-{}.json.bz2'.format(i)
    local_map['data/permuted/{}'.format(filename)] = 'hetnet/permuted/json/{}-perm-{}.json.bz2'.format(name, i)

In [8]:
for integrate_path, hetionet_path in local_map.items():
    integrate_path = os.path.join(prepend, integrate_path)
    shutil.copy(integrate_path, hetionet_path)
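
Because these files exist only locally, `shutil.copy` raises an error at the first missing source file, which can leave the copy half finished. One way to check up front is sketched below; the function name `missing_sources` is hypothetical and not part of the original notebook.

```python
import os


def missing_sources(path_map, prepend):
    """Return the source paths from path_map that do not exist under prepend.

    path_map maps source paths (relative to prepend) to destination paths,
    in the same shape as local_map in the notebook.
    """
    missing = []
    for integrate_path in path_map:
        source = os.path.join(prepend, integrate_path)
        if not os.path.exists(source):
            missing.append(source)
    return missing
```

In the notebook, a non-empty result from `missing_sources(local_map, prepend)` would suggest that the local repository at `prepend` lacks some of the permuted hetnets.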

### Neo4j Databases

In [9]:
# Map each Neo4j installation directory in dhimmel/integrate to an archive path prefix
neo4j_map = {
    'neo4j-community-2.3.3_rephetio-v2.0': 'hetnet/neo4j/{}.db'.format(name),
}

for i in perm_ids:
    integrate_filename = 'neo4j-community-2.3.3_rephetio-v2.0_perm-{}'.format(i)
    neo4j_map[integrate_filename] = 'hetnet/permuted/neo4j/{}-perm-{}.db'.format(name, i)

# Compress each graph.db directory into a .tar.bz2 archive
for integrate_filename, hetionet_path in neo4j_map.items():
    integrate_path = os.path.join(prepend, 'neo4j', integrate_filename, 'data', 'graph.db')
    with tarfile.open('{}.tar.bz2'.format(hetionet_path), 'w:bz2') as tar:
        tar.add(integrate_path, arcname=os.path.basename(integrate_path))
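
Passing `arcname=os.path.basename(integrate_path)` stores each database under a top-level `graph.db` directory inside the archive, rather than under the full local path. The sketch below shows one way to confirm an archive's layout after writing it; the function names `archive_directory` and `list_archive` are hypothetical and only illustrate the same `tarfile` calls used above.

```python
import os
import tarfile


def archive_directory(source_dir, archive_path):
    """Write source_dir into a bzip2-compressed tar archive, rooted at its basename."""
    with tarfile.open(archive_path, 'w:bz2') as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))


def list_archive(archive_path):
    """Return the member names stored in a bzip2-compressed tar archive."""
    with tarfile.open(archive_path, 'r:bz2') as tar:
        return tar.getnames()
```

Every name returned by `list_archive` for one of the archives created above should start with `graph.db`, so extracting the archive yields a single `graph.db` directory.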
