{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Import hetionet from `dhimmel/integrate`\n", "\n", "[`dhimmel/integrate`](https://github.com/dhimmel/integrate) performs the data integration, creation, permutation, and neo4j export for v1.0 of hetionet. This repository (`hetio/hetionet`) hosts only the completed hetnets and network descriptions. This notebook copies files from `dhimmel/integrate`, some on the GitHub and some created locally, to populate `hetio/hetionet`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os\n", "import shutil\n", "import urllib.request\n", "import tarfile" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# dhimmel/integrate commit\n", "commit = 'ffd1a48b4051c41fc8cef6e8847d0687f1a722bc'\n", "\n", "# Name and version for hetionet\n", "name = 'hetionet-v1.0'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import from GitHub" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "integrate_to_hetionet = {\n", " # Tabular TSVs\n", " 'data/nodes.tsv': 'hetnet/tsv/{}-nodes.tsv'.format(name),\n", " 'data/edges.sif.gz': 'hetnet/tsv/{}-edges.sif.gz'.format(name),\n", " \n", " # JSON\n", " 'data/metagraph.json': 'hetnet/json/{}-metagraph.json'.format(name),\n", " 'data/hetnet.json.bz2': 'hetnet/json/{}.json.bz2'.format(name),\n", " \n", " # Description\n", " 'data/summary/metanodes.tsv': 'describe/nodes/metanodes.tsv',\n", " 'data/summary/metaedges.tsv': 'describe/edges/metaedges.tsv',\n", " 'data/summary/metaedge-styles.tsv': 'describe/edges/metaedge-styles.tsv',\n", " 'data/summary/degrees.xlsx': 'describe/degree/degrees.xlsx',\n", " 'viz/degrees.pdf': 'describe/degree/degrees.pdf',\n", "\n", " # Neo4j nomencalture mappings\n", " 'neo4j/nomenclature/labels.tsv': 'hetnet/neo4j/labels.tsv',\n", " 'neo4j/nomenclature/types.tsv': 'hetnet/neo4j/types.tsv',\n", "}" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for integrate_path, hetionet_path in integrate_to_hetionet.items():\n", " url = 'https://github.com/dhimmel/integrate/raw/{}/{}'.format(commit, integrate_path)\n", " if os.path.exists(hetionet_path):\n", " continue\n", " urllib.request.urlretrieve(url, filename=hetionet_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import local files\n", "\n", "Several files were not uploaded to `dhimmel/integrate` due to filesize. These files are copied over locally." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Specify the local path to the integrate repository\n", "prepend = '../construct/integrate'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Permuted JSON hetnets" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Specify the IDs of permuted hetnets in dhimmel/integrate\n", "perm_ids = range(1, 1 + 5)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "local_map = dict()\n", "for i in perm_ids:\n", " filename = 'hetnet_perm-{}.json.bz2'.format(i)\n", " local_map['data/permuted/{}'.format(filename)] = 'hetnet/permuted/json/{}-perm-{}.json.bz2'.format(name, i)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for integrate_path, hetionet_path in local_map.items():\n", " integrate_path = os.path.join(prepend, integrate_path)\n", " shutil.copy(integrate_path, hetionet_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Neo4j Databases" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "neo4j_map = {\n", " 'neo4j-community-2.3.3_rephetio-v2.0': 'hetnet/neo4j/{}.db'.format(name),\n", "}\n", "\n", "for i in perm_ids:\n", " integrate_filename = 'neo4j-community-2.3.3_rephetio-v2.0_perm-{}'.format(i)\n", " neo4j_map[integrate_filename] = 'hetnet/permuted/neo4j/{}-perm-{}.db'.format(name, i)\n", "\n", "for integrate_filename, hetionet_path in neo4j_map.items():\n", " integrate_path = os.path.join(prepend, 'neo4j', integrate_filename, 'data', 'graph.db')\n", " with tarfile.open('{}.tar.bz2'.format(hetionet_path), \"w:bz2\") as tar:\n", " tar.add(integrate_path, arcname=os.path.basename(integrate_path))\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }