Reproducing SafeDrug model For CS598: Deep Learning for Healthcare (Spring '22)

samhq/cs598dl4h-project

Reproducing SafeDrug For CS598: Deep Learning for Healthcare (Spring '22)

Paper and Group Details

Project Structure

  • data/
    • processing.py: The data preprocessing file.
    • input/
    • output/
      • atc3toSMILES.pkl: drug ID (we use ATC-3 level code to represent drug ID) to drug SMILES string dict
      • ddi_A_final.pkl: DDI (drug-drug interaction) adjacency matrix
      • ddi_matrix_H.pkl: H mask structure (This file is created by ddi_mask_H.py)
      • ehr_adj_final.pkl: used in GAMENet baseline (if two drugs appear in one set, then they are connected)
      • records_final.pkl: The final diagnosis-procedure-medication EHR records of each patient, used for train/val/test split.
      • voc_final.pkl: diagnosis/procedure/medication index-to-code dictionary
  • src/
    • SafeDrug.py: the SafeDrug model
    • baseline models:
      • GAMENet.py
      • DMNC.py
      • Leap.py
      • Retain.py
      • ECC.py
      • LR.py
    • setting file
      • model.py
      • util.py
      • layer.py
    • analysis file
      • Result-Analysis.ipynb
  • dependency.sh
  • requirements.txt
  • README.md
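
The description of ehr_adj_final.pkl (two drugs are connected if they appear in the same set) can be illustrated with a small sketch. It assumes that records are nested as patient → visit → [diagnoses, procedures, medications] with integer indices; `build_ehr_adjacency` and `toy_records` are hypothetical names for illustration and are not taken from processing.py.

```python
from itertools import combinations


def build_ehr_adjacency(records, num_meds):
    """Return a symmetric 0/1 matrix where adj[i][j] = 1 if
    medications i and j were prescribed in the same visit."""
    adj = [[0] * num_meds for _ in range(num_meds)]
    for patient in records:
        for visit in patient:
            meds = visit[2]  # assumed layout: [diagnoses, procedures, medications]
            for i, j in combinations(sorted(set(meds)), 2):
                adj[i][j] = 1
                adj[j][i] = 1
    return adj


# Toy data (illustrative only): two patients, three visits in total.
toy_records = [
    [[[0, 1], [0], [0, 2]]],
    [[[2], [1], [1, 2]], [[0], [0], [3]]],
]
adj = build_ehr_adjacency(toy_records, num_meds=4)
for row in adj:
    print(row)
```

Medication 3 is prescribed alone in its visit, so its row and column stay all zeros.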

After processing is complete, we get the following statistics:

# patients  6350
# clinical events  15032
# diagnosis  1958
# med  112
# procedure 1430
# avg of diagnoses  10.5089143161256
# avg of medicines  11.647751463544438
# avg of procedures  3.8436668440659925
# avg of visits  2.367244094488189
# max of diagnoses  128
# max of medicines  64
# max of procedures  50
# max of visit  29
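
As a rough guide to how such statistics relate to the records, here is a minimal sketch. It assumes the same nesting as above (patient → visit → [diagnoses, procedures, medications]); `summarize_records` is a hypothetical helper, not the code in processing.py. Note that the reported numbers are consistent with "clinical events" being visits: 15032 / 6350 ≈ 2.3672, the average number of visits.

```python
def summarize_records(records):
    """Compute dataset statistics from nested patient/visit records."""
    visits = [visit for patient in records for visit in patient]
    n_visits = len(visits)

    def unique(idx):
        # Number of distinct codes at position idx across all visits.
        return len({code for visit in visits for code in visit[idx]})

    def avg(idx):
        return sum(len(visit[idx]) for visit in visits) / n_visits

    def longest(idx):
        return max(len(visit[idx]) for visit in visits)

    return {
        "patients": len(records),
        "clinical_events": n_visits,
        "diagnosis": unique(0),
        "procedure": unique(1),
        "med": unique(2),
        "avg_diagnoses": avg(0),
        "avg_procedures": avg(1),
        "avg_medicines": avg(2),
        "avg_visits": n_visits / len(records),
        "max_diagnoses": longest(0),
        "max_procedures": longest(1),
        "max_medicines": longest(2),
        "max_visits": max(len(patient) for patient in records),
    }


# Toy data (illustrative only).
toy_records = [
    [[[0, 1], [0], [0, 2]]],
    [[[2], [1], [1, 2]], [[0], [0], [3]]],
]
stats = summarize_records(toy_records)
for name, value in stats.items():
    print(f"# {name}  {value}")
```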

Execution

Step 1: Environment Setup

  • First, create a conda environment with RDKit (RDKit: Open-Source Cheminformatics Software) installed

    conda create -c conda-forge -n SafeDrug rdkit
    conda activate SafeDrug
  • Clone this repository to your preferred location. The commands below assume it is cloned into your home directory.

    cd ~
    git clone git@github.com:samhq/cs598dl4h-project.git
  • In the SafeDrug environment, run the following commands to install the required Python packages (choose the variant that matches your GPU support)

    cd ~/cs598dl4h-project
    
    # if you don't have GPU
    ./dependency.sh
    
    # if you have GPU
    ./dependency.sh 1

Step 2: Obtaining Data and Processing

  • Go to https://physionet.org/content/mimiciii/1.4/ to download the MIMIC-III dataset (you may need to complete the required credentialing first)

    wget -r -N -c -np --user [account] --ask-password https://physionet.org/files/mimiciii/1.4/
  • Go into the download folder, unzip the three required files, and copy them to the ~/cs598dl4h-project/data/input/ folder

    cd ~/physionet.org/files/mimiciii/1.4
    gzip -d PROCEDURES_ICD.csv.gz # procedure information
    gzip -d PRESCRIPTIONS.csv.gz  # prescription information
    gzip -d DIAGNOSES_ICD.csv.gz  # diagnosis information
    cp PROCEDURES_ICD.csv PRESCRIPTIONS.csv DIAGNOSES_ICD.csv ~/cs598dl4h-project/data/input/
  • Download the additional files into the ~/cs598dl4h-project/data/input/ folder

    cd ~/cs598dl4h-project/data/input/
    ./get_additional_files.sh
  • Process the data to generate the complete records_final.pkl

    cd ~/cs598dl4h-project/data
    python processing.py
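
If you prefer to script parts of Step 2, the sketch below shows a Python equivalent of the unzip-and-copy step, plus a check that the output files listed under Project Structure exist after processing. `extract_tables` and `missing_outputs` are hypothetical helpers written for this README, not part of the repository. Unlike `gzip -d`, the extraction leaves the original .gz files in place. ddi_matrix_H.pkl is listed as created by ddi_mask_H.py, so it may be reported as missing until that script has run.

```python
import gzip
import shutil
from pathlib import Path

REQUIRED_TABLES = ["PROCEDURES_ICD.csv", "PRESCRIPTIONS.csv", "DIAGNOSES_ICD.csv"]

EXPECTED_OUTPUTS = [
    "atc3toSMILES.pkl",
    "ddi_A_final.pkl",
    "ddi_matrix_H.pkl",
    "ehr_adj_final.pkl",
    "records_final.pkl",
    "voc_final.pkl",
]


def extract_tables(src_dir, dst_dir, tables=REQUIRED_TABLES):
    """Decompress <table>.gz from src_dir into dst_dir and return the written paths."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in tables:
        target = dst_dir / name
        with gzip.open(src_dir / f"{name}.gz", "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the names of expected output files not found in output_dir."""
    output_dir = Path(output_dir)
    return [name for name in expected if not (output_dir / name).is_file()]


# Example usage (paths follow the layout assumed in this README):
#   home = Path.home()
#   extract_tables(home / "physionet.org/files/mimiciii/1.4",
#                  home / "cs598dl4h-project/data/input")
#   print(missing_outputs(home / "cs598dl4h-project/data/output"))
```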

Step 3: Run Model(s)

To run the SafeDrug model, run the following:

cd ~/cs598dl4h-project/src
python SafeDrug.py

The available arguments are:

usage: SafeDrug.py [-h] [--Test] [--model_name=MODEL_NAME]
               [--resume_path=RESUME_PATH] [--lr=LR]
               [--target_ddi=TARGET_DDI] [--kp=KP] [--dim=DIM]
               [--epoch=EPOCH]

optional arguments:
  -h, --help                  show this help message and exit
  --Test                      test mode
  --model_name MODEL_NAME     model name
  --resume_path RESUME_PATH   resume path
  --lr LR                     learning rate
  --target_ddi TARGET_DDI     target ddi
  --kp KP                     coefficient of P signal
  --dim DIM                   dimension
  --epoch EPOCH               how many epoch
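
For readers who want to see how this interface maps onto `argparse`, the following is a reconstruction from the help text above, not the actual source of SafeDrug.py. The argument types are assumptions, and no defaults are set here because the help text does not state them; the values passed in the example are placeholders.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options shown in the help text (types assumed)."""
    parser = argparse.ArgumentParser(prog="SafeDrug.py")
    parser.add_argument("--Test", action="store_true", help="test mode")
    parser.add_argument("--model_name", type=str, help="model name")
    parser.add_argument("--resume_path", type=str, help="resume path")
    parser.add_argument("--lr", type=float, help="learning rate")
    parser.add_argument("--target_ddi", type=float, help="target ddi")
    parser.add_argument("--kp", type=float, help="coefficient of P signal")
    parser.add_argument("--dim", type=int, help="dimension")
    parser.add_argument("--epoch", type=int, help="how many epoch")
    return parser


# Placeholder values, chosen only to show the parsing behaviour.
args = build_parser().parse_args(["--lr", "0.0005", "--epoch", "10"])
print(args.lr, args.epoch, args.Test)
```

Evaluating a saved checkpoint would presumably combine `--Test` with `--resume_path`, though the exact behaviour depends on the script itself.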

If you want to run all models consecutively, then run:

cd ~/cs598dl4h-project/src
./run_models.sh [NUMBER_OF_EPOCHS]

Step 4: Analysis of the results

Please check the Jupyter notebook (Result-Analysis.ipynb in src/).

Results

| Model | DDI | Jaccard | F1-score | PRAUC | Avg. # of Drugs |
|---|---|---|---|---|---|
| LR | 0.0775 | 0.4900 | 0.6470 | 0.7553 | - |
| ECC | 0.0806 | 0.4868 | 0.6428 | 0.7602 | - |
| RETAIN | 0.0851 ± 0.0028 | 0.4711 ± 0.0140 | 0.6337 ± 0.0129 | 0.7512 ± 0.0126 | 17.9925 ± 0.8751 |
| LEAP | 0.0689 ± 0.0028 | 0.4369 ± 0.0117 | 0.6002 ± 0.0116 | 0.6467 ± 0.0068 | 19.1096 ± 0.1240 |
| GAMENet | 0.0836 ± 0.0067 | 0.4790 ± 0.0260 | 0.6382 ± 0.0240 | 0.7393 ± 0.0247 | 25.1478 ± 1.1325 |
| SafeDrug | 0.0627 ± 0.0023 | 0.5051 ± 0.0150 | 0.6624 ± 0.0134 | 0.7604 ± 0.0117 | 19.3245 ± 0.5557 |
| SafeDrug* | 0.0589 ± 0.0005 | 0.5213 ± 0.0030 | 0.6768 ± 0.0027 | 0.7647 ± 0.0025 | 19.9178 ± 0.1604 |

SafeDrug*: values from the original SafeDrug model paper
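
To make the table columns concrete, here is a minimal sketch of the set-based Jaccard and F1 scores and a per-visit DDI rate, using their common definitions. These are assumptions for illustration: the repository's util.py may compute or aggregate them differently (for example, over all visits in the test set), and the function names below are hypothetical.

```python
from itertools import combinations


def jaccard(true_meds, pred_meds):
    """|intersection| / |union| of the true and predicted medication sets."""
    true_set, pred_set = set(true_meds), set(pred_meds)
    union = true_set | pred_set
    return len(true_set & pred_set) / len(union) if union else 0.0


def f1_score(true_meds, pred_meds):
    """Harmonic mean of precision and recall over medication sets."""
    true_set, pred_set = set(true_meds), set(pred_meds)
    hits = len(true_set & pred_set)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_set)
    recall = hits / len(true_set)
    return 2 * precision * recall / (precision + recall)


def ddi_rate(pred_meds, ddi_adj):
    """Fraction of predicted medication pairs flagged as interacting in ddi_adj."""
    pairs = list(combinations(sorted(set(pred_meds)), 2))
    if not pairs:
        return 0.0
    interacting = sum(1 for i, j in pairs if ddi_adj[i][j] or ddi_adj[j][i])
    return interacting / len(pairs)


# Toy example (illustrative only): 4 medications, one known interaction (1, 3).
toy_ddi = [[0] * 4 for _ in range(4)]
toy_ddi[1][3] = toy_ddi[3][1] = 1
true_meds, pred_meds = [0, 1, 2], [1, 2, 3]
print(jaccard(true_meds, pred_meds))
print(f1_score(true_meds, pred_meds))
print(ddi_rate(pred_meds, toy_ddi))
```

Lower is better for DDI, while higher is better for Jaccard, F1-score, and PRAUC.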

Further analysis can be found in the Jupyter notebook (Result-Analysis.ipynb in src/).

Credits

Our work is based on the original code at https://github.com/ycq091044/SafeDrug.
