This repository quantifies term cooccurrence in MEDLINE. It is designed to compute the cooccurrence of every pair of terms drawn from two MeSH termsets. The repository computes the MEDLINE cooccurrences used by the Rephetio hetnet. See the corresponding Thinklab discussion for more information.
eutility.py defines an esearch_query function for retrieving PubMed IDs matching a user-defined query.
cooccurrence.py computes the cooccurrences between two termsets whose associated PubMed IDs have already been retrieved.
diseases.ipynb computes disease-disease cooccurrence.
symptoms.ipynb computes symptom-disease cooccurrence.
tissues.ipynb computes anatomy-disease cooccurrence. This notebook depends on data/disease-pmids.tsv.gz, a dataset created by symptoms.ipynb.
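The two steps described above (retrieve PubMed IDs per term, then count shared IDs for every pair of terms) can be sketched as follows. This is a minimal illustration, not the code in eutility.py or cooccurrence.py: build_esearch_url, parse_esearch_json, fisher_greater, and cooccurrence are hypothetical names, and the network request itself is left out. The sketch assumes each termset is a dict mapping a term to a set of integer PubMed IDs, and, by default, restricts the universe of articles to those tagged with at least one term from each termset (an assumption). Pairs are scored here by observed and expected counts, enrichment, and a one-sided Fisher's exact test, which is one common choice for cooccurrence.

```python
"""Hypothetical sketch of a MEDLINE cooccurrence pipeline (not the repository's code)."""

import json
import math
from itertools import product
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retstart=0, retmax=10000):
    """Return an ESearch URL for PubMed IDs matching `term` (hypothetical helper).

    ESearch caps `retmax` at 10,000 IDs per request; `retstart` offsets
    into the result list.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "retstart": retstart,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch_json(body):
    """Extract the PubMed IDs from an ESearch JSON response body as ints."""
    result = json.loads(body)["esearchresult"]
    return {int(pmid) for pmid in result["idlist"]}


def fisher_greater(observed, total, n_a, n_b):
    """One-sided Fisher's exact test: P(X >= observed), where X is the
    hypergeometric count of shared articles given `total` articles,
    `n_a` tagged with term A and `n_b` tagged with term B."""
    tail = sum(
        math.comb(n_a, i) * math.comb(total - n_a, n_b - i)
        for i in range(observed, min(n_a, n_b) + 1)
    )
    return tail / math.comb(total, n_b)


def cooccurrence(termset_a, termset_b, universe=None):
    """Score every (term_a, term_b) pair by the PubMed IDs they share.

    `termset_a` and `termset_b` map each term to a set of PubMed IDs.
    Assumption of this sketch: by default the universe (a set) is the
    articles tagged with at least one term from each termset.
    """
    if universe is None:
        universe = set().union(*termset_a.values()) & set().union(*termset_b.values())
    total = len(universe)
    # Drop articles outside the universe before counting.
    restricted_a = {term: ids & universe for term, ids in termset_a.items()}
    restricted_b = {term: ids & universe for term, ids in termset_b.items()}
    rows = []
    for (term_a, ids_a), (term_b, ids_b) in product(restricted_a.items(), restricted_b.items()):
        observed = len(ids_a & ids_b)
        # Expected overlap if the two terms were tagged independently.
        expected = len(ids_a) * len(ids_b) / total if total else 0.0
        rows.append({
            "term_a": term_a,
            "term_b": term_b,
            "observed": observed,
            "expected": expected,
            "enrichment": observed / expected if expected else float("nan"),
            "p_fisher": fisher_greater(observed, total, len(ids_a), len(ids_b)),
        })
    return rows


if __name__ == "__main__":
    print(build_esearch_url("asthma[MeSH Terms]"))
    # Toy termsets with made-up PubMed IDs, for illustration only.
    diseases = {"D1": {1, 2, 3, 4}, "D2": {5, 6}}
    symptoms = {"S1": {1, 2, 3, 7}, "S2": {5, 8}}
    for row in cooccurrence(diseases, symptoms):
        print(row)
```

The exact integer arithmetic in fisher_greater gets slow when the universe holds millions of articles. scipy.stats.fisher_exact with alternative="greater", applied to the 2×2 table [[observed, n_a - observed], [n_b - observed, total - n_a - n_b + observed]], computes the same one-sided p-value.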
# create environment
conda env create --file=environment.yml
# update environment
conda env update --file=environment.yml
# activate environment
conda activate medline
# run jupyter lab for notebook development
jupyter lab
On 2021-04-09, ownership of this repository on GitHub was transferred from dhimmel/medline to hetio/medline.
The hetio organization has GitHub LFS quota, which provides a more convenient way to store large compressed files.
At the time of the transfer, the default (and only) branch was gh-pages.
The gh-pages branch was renamed to pre-lfs-archive.
A new default branch, main, was created, and its history was migrated to use Git LFS.
For the version of this repository used by Project Rephetio to create Hetionet v1.0, refer to the v1.0 release.
This repository is released under CC0 1.0.