PyHealth is designed for both ML researchers and medical practitioners. It makes healthcare AI applications easier to deploy, more flexible, and more customizable. [Tutorials]
PyHealth supports diverse electronic health records (EHRs) such as MIMIC and eICU, as well as all OMOP-CDM based databases, and provides various advanced deep learning algorithms for important healthcare tasks such as diagnosis-based drug recommendation, patient hospitalization and mortality prediction, and ICU length-of-stay forecasting.
Building a healthcare AI pipeline can take as few as 10 lines of code in PyHealth.
- You can install from PyPI:
pip install pyhealth
- or from the GitHub source:
git clone https://github.com/sunlabuiuc/PyHealth.git
cd PyHealth
pip install .
- Required Dependencies
python>=3.8
torch>=1.8.0
rdkit>=2022.03.4
scikit-learn>=0.24.2
networkx>=2.6.3
pandas>=1.3.2
tqdm
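To check which of the dependencies listed above are already present in an environment, a small standard-library script can help. This is only a sketch: the helper name `installed_versions` is made up for illustration, and it assumes each distribution is registered under the same name as in the list (for example, some environments may provide RDKit under a different distribution name).

```python
from importlib import metadata

# Distribution names copied from the dependency list above (assumption:
# each package is registered under exactly this name).
REQUIRED = ["torch", "rdkit", "scikit-learn", "networkx", "pandas", "tqdm"]


def installed_versions(packages):
    """Return {package: version string, or None when it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

The script only reports versions; comparing them against the minimum versions is left to the reader.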
All healthcare tasks in our package follow a five-stage pipeline:
load dataset -> define task function -> build ML/DL model -> model training -> inference
We try hard to keep each stage as separate as possible, so that people can customize their own pipeline by using only our data processing steps or only the ML models. Each step calls one module, and we introduce them with an example.
- STEP 1: <pyhealth.datasets> provides a clean structure for the dataset, independent of the tasks. We support MIMIC-III, MIMIC-IV, and eICU, as well as standard OMOP-formatted data. The dataset is stored in a unified Patient-Visit-Event structure.
from pyhealth.datasets import MIMIC3Dataset
mimic3dataset = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map all NDC codes in these tables to ATC level-3 codes
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
)
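The Patient-Visit-Event structure can be pictured as three nested containers. The sketch below uses plain dataclasses to show the idea; the class and field names (`Event`, `Visit`, `Patient`, `codes_from`, `add_event`) are hypothetical illustrations and are not PyHealth's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    code: str        # a medical code, e.g. a diagnosis or drug code
    table: str       # source table the event came from
    vocabulary: str  # coding system of the code


@dataclass
class Visit:
    visit_id: str
    events: List[Event] = field(default_factory=list)

    def codes_from(self, table: str) -> List[str]:
        """Return the codes of all events recorded in one source table."""
        return [e.code for e in self.events if e.table == table]


@dataclass
class Patient:
    patient_id: str
    visits: Dict[str, Visit] = field(default_factory=dict)

    def add_event(self, visit_id: str, event: Event) -> None:
        """Attach an event to a visit, creating the visit on first use."""
        visit = self.visits.setdefault(visit_id, Visit(visit_id))
        visit.events.append(event)


# Toy data only: one patient with two visits.
patient = Patient("p1")
patient.add_event("v1", Event("428.0", "DIAGNOSES_ICD", "ICD9CM"))
patient.add_event("v1", Event("00527051210", "PRESCRIPTIONS", "NDC"))
patient.add_event("v2", Event("82101", "DIAGNOSES_ICD", "ICD9CM"))
print(patient.visits["v1"].codes_from("DIAGNOSES_ICD"))
```

Keeping events grouped under visits, and visits under patients, is what lets a task definition later decide which visits and which code types become model inputs.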
- STEP 2: <pyhealth.tasks> takes the <pyhealth.datasets> object as input and defines how to process each patient's data into a set of samples for the task. The package provides several task examples, such as drug recommendation and length of stay prediction.
from pyhealth.tasks import drug_recommendation_mimic3_fn
from pyhealth.datasets import split_by_patient, get_dataloader
mimic3dataset.set_task(task_fn=drug_recommendation_mimic3_fn) # use default task
train_ds, val_ds, test_ds = split_by_patient(mimic3dataset, [0.8, 0.1, 0.1])
# create dataloaders
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
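To make the idea of a task function more concrete, here is a rough sketch of what "turn each patient's data into samples" and "split by patient" can look like. It is not the implementation of `drug_recommendation_mimic3_fn` or `split_by_patient`; the function names, the dictionary layout of a visit, and the filtering rule are all assumptions chosen for illustration.

```python
import random


def drug_recommendation_samples(patient_id, visits):
    """Build one sample per visit that has conditions, procedures and drugs.

    `visits` is a hypothetical list of dicts with three lists of codes each.
    """
    samples = []
    for index, visit in enumerate(visits):
        # Skip visits that lack any of the three code types.
        if not (visit["conditions"] and visit["procedures"] and visit["drugs"]):
            continue
        samples.append({
            "patient_id": patient_id,
            "visit_index": index,
            "conditions": visit["conditions"],
            "procedures": visit["procedures"],
            "drugs": visit["drugs"],  # the label for this task
        })
    return samples


def split_patient_ids(patient_ids, ratios, seed=0):
    """Shuffle patient ids and cut them into train/val/test groups.

    Splitting on patient ids keeps all samples of one patient in one group.
    """
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


visits = [
    {"conditions": ["428.0"], "procedures": ["3893"], "drugs": ["A03C"]},
    {"conditions": ["82101"], "procedures": ["7935"], "drugs": []},
]
print(drug_recommendation_samples("p1", visits))
print(split_patient_ids([f"p{i}" for i in range(10)], [0.8, 0.1, 0.1]))
```

Splitting at the patient level rather than the sample level avoids having visits of the same patient in both the training and the test set.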
- STEP 3: <pyhealth.models> provides the healthcare ML models. This module also provides model layers, such as pyhealth.models.RETAINLayer, for building customized ML architectures. Our model layers can be used as easily as torch.nn.Linear.
from pyhealth.models import Transformer
model = Transformer(
    dataset=mimic3dataset,
    feature_keys=["conditions", "procedures"],
    label_key="drugs",
    mode="multilabel",
    operation_level="visit",
)
- STEP 4: <pyhealth.trainer> is the training manager. Give it the train_loader, the val_loader, and val_metric, and specify other arguments such as epochs, optimizer, and learning rate. The trainer automatically saves the best model and outputs its path at the end.
from pyhealth.trainer import Trainer
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc_samples",
)
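The "save the best model" behaviour described in this step boils down to tracking a monitored validation score across epochs. The following is a minimal sketch of that bookkeeping only; `select_best_epoch` is a hypothetical helper, not part of `pyhealth.trainer`, and the scores are made-up numbers.

```python
def select_best_epoch(epoch_scores, higher_is_better=True):
    """Return (epoch, score) of the epoch a trainer would checkpoint.

    `epoch_scores` holds one validation score per epoch, in order.
    """
    best_epoch, best_score = None, None
    for epoch, score in enumerate(epoch_scores):
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            # A real trainer would write the model weights to disk here.
            best_epoch, best_score = epoch, score
    return best_epoch, best_score


# Made-up validation scores for five epochs of a metric where higher is better.
print(select_best_epoch([0.30, 0.42, 0.41, 0.45, 0.44]))
```

For a loss-style metric, where lower values are better, the same helper can be called with `higher_is_better=False`.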
- STEP 5: <pyhealth.metrics> provides: (i) common evaluation metrics, with the same usage as <pyhealth.metrics>; (ii) metrics (weighted by patients) for patient-level tasks; (iii) special metrics in healthcare, such as drug-drug interaction (DDI) rate.
trainer.evaluate(test_loader)
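As an illustration of a healthcare-specific metric, the sketch below computes a DDI rate under one common definition: the fraction of drug pairs inside the recommended sets that appear in a list of known interacting pairs. This definition and the function name `ddi_rate` are assumptions for illustration; the metric in `pyhealth.metrics` may be defined or normalized differently.

```python
from itertools import combinations


def ddi_rate(recommended_sets, interacting_pairs):
    """Fraction of within-set drug pairs that are known to interact."""
    known = {frozenset(pair) for pair in interacting_pairs}
    total_pairs = 0
    ddi_pairs = 0
    for drugs in recommended_sets:
        # Look at every unordered pair of distinct drugs in one recommendation.
        for a, b in combinations(sorted(set(drugs)), 2):
            total_pairs += 1
            if frozenset((a, b)) in known:
                ddi_pairs += 1
    return ddi_pairs / total_pairs if total_pairs else 0.0


# Toy drug names and interactions, not real pharmacology.
recommended = [["A", "B", "C"], ["A", "D"]]
interactions = [("A", "B"), ("C", "D")]
print(ddi_rate(recommended, interactions))
```

A lower value means the recommended combinations contain fewer known interactions.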
- <pyhealth.medcode> provides two core functionalities: (i) looking up information for a given medical code (e.g., name, category, sub-concept); (ii) mapping codes across coding systems (e.g., ICD9CM to CCSCM). This module can be easily applied to your research.
- For code mapping between two coding systems
from pyhealth.medcode import CrossMap
codemap = CrossMap.load("ICD9CM", "CCSCM")
codemap.map("82101") # use it like a dict
codemap = CrossMap.load("NDC", "ATC")
codemap.map("00527051210")
- For code ontology lookup within one system
from pyhealth.medcode import InnerMap
icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0") # get detailed info
icd9cm.get_ancestors("428.0") # get parents
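Conceptually, both functionalities are table lookups: an ontology lookup walks child-to-parent links, and a cross-system mapping returns the target codes for a source code. The sketch below shows this with small hand-written tables. The tables, the placeholder codes `SRC-*` and `TGT-*`, and the helper names are illustrative assumptions; they are not the data or the internals of `InnerMap` and `CrossMap`.

```python
# Hand-written child -> parent links for a small fragment of an
# ICD-9-CM-style hierarchy (toy data for illustration).
PARENT = {
    "428.0": "428",
    "428.1": "428",
    "428": "420-429",
    "420-429": "390-459",
}

# Toy cross-system table with placeholder codes; one source code may map
# to several target codes.
CROSS = {
    "SRC-1": ["TGT-A"],
    "SRC-2": ["TGT-A", "TGT-B"],
}


def get_ancestors(code, parent=PARENT):
    """Return all ancestors of a code, nearest first."""
    ancestors = []
    while code in parent:
        code = parent[code]
        ancestors.append(code)
    return ancestors


def map_code(code, table=CROSS):
    """Return the target codes for a source code, or [] when unmapped."""
    return list(table.get(code, []))


print(get_ancestors("428.0"))
print(map_code("SRC-2"))
```

Returning a list from `map_code` keeps one-to-many mappings and unmapped codes in the same shape, which is convenient when mapping whole tables.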
- <pyhealth.tokenizer> is used for transformations between string-based tokens and integer-based indices, based on the overall token space. We provide flexible functions to tokenize 1D, 2D and 3D lists. This module can be used in many other scenarios.
from pyhealth.tokenizer import Tokenizer
# Example: we use a list of ATC3 codes as the token space
token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D',
    'A03E', 'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B', 'A07C',
    'A12B', 'A12C', 'A13A', 'A14A', 'A14B', 'A16A']
tokenizer = Tokenizer(tokens=token_space, special_tokens=["<pad>", "<unk>"])
# 2d encode
tokens = [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']]
indices = tokenizer.batch_encode_2d(tokens) # [[8, 9, 10, 11], [12, 1, 1, 0]]
# 2d decode
indices = [[8, 9, 10, 11], [12, 1, 1, 0]]
tokens = tokenizer.batch_decode_2d(indices) # [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', '<unk>', '<unk>']]
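The encode and decode behaviour can be sketched in a few lines of plain Python: special tokens take the first indices, unknown tokens fall back to `<unk>`, and shorter rows are padded with `<pad>`. `ToyTokenizer` below is a simplified stand-in written for illustration with a deliberately small token list; it is not the `pyhealth.tokenizer.Tokenizer` implementation.

```python
class ToyTokenizer:
    """Minimal token <-> index mapping with padding and unknown handling."""

    def __init__(self, tokens, special_tokens=("<pad>", "<unk>")):
        # Special tokens come first, so "<pad>" is 0 and "<unk>" is 1.
        self.idx2token = list(special_tokens) + list(tokens)
        self.token2idx = {t: i for i, t in enumerate(self.idx2token)}
        self.pad = self.token2idx["<pad>"]
        self.unk = self.token2idx["<unk>"]

    def batch_encode_2d(self, batch):
        """Encode a list of token lists and pad rows to the same length."""
        width = max((len(row) for row in batch), default=0)
        encoded = []
        for row in batch:
            ids = [self.token2idx.get(t, self.unk) for t in row]
            encoded.append(ids + [self.pad] * (width - len(ids)))
        return encoded

    def batch_decode_2d(self, batch):
        """Decode index lists back to tokens, dropping padding."""
        return [[self.idx2token[i] for i in row if i != self.pad]
                for row in batch]


toy = ToyTokenizer(["A01A", "A02A", "A03C"])
encoded = toy.batch_encode_2d([["A01A", "A03C"], ["A02A", "B035", "C129"]])
decoded = toy.batch_decode_2d(encoded)
print(encoded)  # [[2, 4, 0], [3, 1, 1]]
print(decoded)  # [['A01A', 'A03C'], ['A02A', '<unk>', '<unk>']]
```

Note that decoding is lossy for unknown tokens: `B035` and `C129` come back as `<unk>` because they were never in the token space.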
We provide the following tutorials to help users get started with PyHealth.
Tutorial 0: Introduction to pyhealth.data
Tutorial 1: Introduction to pyhealth.datasets
Tutorial 2: Introduction to pyhealth.tasks
Tutorial 3: Introduction to pyhealth.models
Tutorial 4: Introduction to pyhealth.trainer
Tutorial 5: Introduction to pyhealth.metrics
Tutorial 6: Introduction to pyhealth.tokenizer
Tutorial 7: Introduction to pyhealth.medcode
The following tutorials will help users build their own task pipelines.
Pipeline 1: Drug Recommendation
Pipeline 2: Length of Stay Prediction
Pipeline 3: Readmission Prediction
Pipeline 4: Mortality Prediction
Users can customize their healthcare AI pipeline simply by calling one module:
- process your OMOP data via pyhealth.datasets
- process open EHR data (e.g., MIMIC, eICU) via pyhealth.datasets
- define your own task on existing databases via pyhealth.tasks
- use existing healthcare models or build upon them (e.g., RETAIN) via pyhealth.models
- map codes for conditions and medications via pyhealth.medcode
We provide the following datasets for general purpose healthcare AI research:
Dataset | Module | Year | Information |
---|---|---|---|
MIMIC-III | pyhealth.datasets.MIMIC3BaseDataset | 2016 | MIMIC-III Clinical Database |
MIMIC-IV | pyhealth.datasets.MIMIC4BaseDataset | 2020 | MIMIC-IV Clinical Database |
eICU | pyhealth.datasets.eICUBaseDataset | 2018 | eICU Collaborative Research Database |
OMOP | pyhealth.datasets.OMOPBaseDataset | | OMOP-CDM schema based dataset |
Model Name | Type | Module | Year | Reference |
---|---|---|---|---|
Logistic Regression (LR) | classical ML | pyhealth.models.MLModel | | sklearn.linear_model.LogisticRegression |
Random Forest (RF) | classical ML | pyhealth.models.MLModel | | sklearn.ensemble.RandomForestClassifier |
Neural Networks (NN) | classical ML | pyhealth.models.MLModel | | sklearn.neural_network.MLPClassifier |
Convolutional Neural Network (CNN) | deep learning | pyhealth.models.CNN | 1989 | Handwritten Digit Recognition with a Back-Propagation Network |
Recurrent Neural Nets (RNN) | deep learning | pyhealth.models.RNN | 2011 | Recurrent neural network based language model |
Transformer | deep learning | pyhealth.models.Transformer | 2017 | Attention Is All You Need |
RETAIN | deep learning | pyhealth.models.RETAIN | 2016 | RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism |
GAMENet | deep learning | pyhealth.models.GAMENet | 2019 | GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination |
MICRON | deep learning | pyhealth.models.MICRON | 2021 | Change Matters: Medication Change Prediction with Recurrent Residual Networks |
SafeDrug | deep learning | pyhealth.models.SafeDrug | 2021 | SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations |
- Here is a temporary benchmark doc on healthcare tasks. We will put the results in this section below.