Commit 187d89f: initial commit of IAIABL code

alinajadebarnett committed Oct 13, 2021
Showing 44 changed files with 7,579 additions and 0 deletions.
13 changes: 13 additions & 0 deletions LICENSE
@@ -0,0 +1,13 @@
The copyrights of this software are owned by Duke University. As such, two licenses to this software are offered:
1. An open-source license under the MIT license for non-commercial use.
2. A custom license with Duke University, for commercial use or for use without the MIT license restrictions.

As a recipient of this software, you may choose which license to receive the code under. Outside contributions to the Duke-owned code base cannot be accepted unless the contributor transfers the copyright to those changes over to Duke University.
To enter a custom license agreement without the MIT license restrictions, please contact the Digital Innovations department at Duke Office for Translation & Commercialization (OTC) (https://olv.duke.edu/software/) at [email protected].

Please note that this software is distributed AS IS, WITHOUT ANY WARRANTY; and without the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
--

(c) Copyright 2021. Duke University. All Rights Reserved.
Developed by Alina Jade Barnett, Fides Regina Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Y. Lo,
and Cynthia Rudin at Duke University.
89 changes: 89 additions & 0 deletions README.md
@@ -0,0 +1,89 @@
# IAIA-BL

This code package implements IAIA-BL from the manuscript "IAIA-BL:
A Case-based Interpretable Deep Learning Model for Classification
of Mass Lesions in Digital Mammography" by Alina Jade Barnett, Fides
Regina Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Y. Lo,
and Cynthia Rudin.

This code package was developed by the authors at Duke University and
the University of Maine, and is licensed under a dual license (see
LICENSE for more information regarding the use and distribution of
this code package).

## Prerequisites
Any operating system on which you can run GPU-accelerated PyTorch,
with Python 3.6.9. For the required packages, see requirements.txt.
### Recommended hardware
2 NVIDIA Tesla P100 GPUs or 2 NVIDIA Tesla V100 GPUs

## Installation instructions
1. Git clone the repository to /usr/xtmp/IAIABL/.
2. Set up your environment using Python 3.6.9 and requirements.txt.
   (Optional) Set up the environment so that "source
   /home/virtual_envs/ml/bin/activate" activates it. You can set up the
   environment differently if you choose, but all included .sh scripts
   will attempt to activate the environment at
   /home/virtual_envs/ml/bin/activate. A quick sanity check of the
   resulting environment is sketched below.

Typical install time: less than 10 minutes.
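
The following is a minimal, hypothetical sanity check (my sketch, not a
file in this repository) for confirming that the environment matches
the prerequisites above:

```python
# Hypothetical environment check (not a repository file): confirms the
# interpreter version and that GPU-accelerated PyTorch can see the GPUs.
import sys
import torch

print("Python:", sys.version.split()[0])        # expect 3.6.9
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())  # expect 2 on the recommended hardware
```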

## Train the model
1. In train.sh, set the appropriate file locations for train_dir,
test_dir, push_dir, and finer_dir:
1. train_dir is the directory containing the augmented training set
2. test_dir is the directory containing the test set
3. push_dir is the directory containing the original (unaugmented) training
set, onto which prototypes can be projected
4. finer_dir is the directory containing the augmented set of training
examples with fine-scale annotations

2. Run train.sh

## Reproducing figures
No data is provided with this code repository. The following scripts
are included to demonstrate how the figures and results in the paper
were created; they require data to be provided. Type "source
scriptname.sh" into the command line to run each script.

1. see_explanations.sh

The expected output from see_explanations.sh is the set of figures from
the manuscript that begin with "An automatically generated explanation
of mass margin classification." The output images are written to the
relative file location "./visualizations_of_expl/".

2. see_prototype_grid.sh

The expected output from see_prototype_grid.sh is a grid of prototypes
for a given model. The file location of the output image is printed to
the command line.

3. run_gradCAM.sh

The expected output from run_gradCAM.sh shows the activation precision of the
sample data. It also saves a visualization to
/usr/xtmp/IAIABL/gradCAM_imgs/view.png. The columns, from left to right, are
"Original Image," "GradCAM heatmap," "GradCAM++ heatmap," "GradCAM heatmap
overlaid on the original image," and "GradCAM++ heatmap overlaid on the
original image." The rows are "Last layer, using a network trained on natural
images," "6th layer, using a network trained on natural images," "Blank," and
"Last layer, using a network trained to identify the mass margin."

4. The mal_for_reviewers.ipynb Jupyter notebook is also included.

The expected output from mal_for_reviewers.ipynb appears in the cells of the notebook.

Expected run time for these four demo files: 10 minutes.

## Other functions
The following scripts require more of the (private) dataset in order to
run correctly, but are included to aid reproducibility:
1. dataaugment.sh - for offline data augmentation
2. plot_graph.sh - plots a variety of graphs
3. run_global_analysis.sh - provides a global analysis of the model
4. train_vanilla_malignancy.sh - for training the baseline models

## Expected Data Location
Scripts are set up to expect data as numpy arrays in
/usr/xtmp/IAIABL/Lo1136i/test/Circumscribed/, where Circumscribed is the
mass margin label. A sketch of reading this layout appears below.
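
As an illustration, the following sketch (my addition; it assumes one
.npy file per image, consistent with how dataHandling.py loads data)
walks the expected directory structure:

```python
# Minimal sketch of reading the expected data layout: each mass margin
# label (e.g. Circumscribed) is a subdirectory of numpy arrays.
import os
import numpy as np

data_root = "/usr/xtmp/IAIABL/Lo1136i/test/"
for margin_label in sorted(os.listdir(data_root)):
    class_dir = os.path.join(data_root, margin_label)
    for fname in sorted(os.listdir(class_dir)):
        if fname.endswith(".npy"):
            arr = np.load(os.path.join(class_dir, fname))
            print(margin_label, fname, arr.shape)
```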
173 changes: 173 additions & 0 deletions dataHandling.py
@@ -0,0 +1,173 @@
from __future__ import division
import numpy as np
import os
import pandas as pd
import argparse
import sys
import random
import png
import matplotlib
matplotlib.use("Agg")  # select the non-interactive backend before pyplot is imported
from matplotlib.pyplot import imsave, imread
import matplotlib.pyplot as plt
from PIL import Image
import cv2
import torchvision.datasets as datasets
from skimage.transform import resize
import ast
import pickle
import csv
import pydicom as dcm
import Augmentor
from tqdm import tqdm
import pathlib
from torch import randint, manual_seed
from copy import copy
from collections import defaultdict

def random_flip(input, axis, with_fa=False):
    # Flip with probability 0.5. When with_fa is True, `input` is a stacked
    # (image, fine-annotation mask) array, so the flip axis shifts by one.
    ran = random.random()
    if ran > 0.5:
        if with_fa:
            axis += 1
        return np.flip(input, axis=axis)
    else:
        return input

def random_crop(input, with_fa=False):
    # With probability 0.8, crop to a random 90%-by-90% window.
    ran = random.random()
    if ran > 0.2:
        # find a random place to be the left upper corner of the crop
        if with_fa:
            rx = int(random.random() * input.shape[1] // 10)
            ry = int(random.random() * input.shape[2] // 10)
            return input[:, rx: rx + int(input.shape[1] * 9 // 10), ry: ry + int(input.shape[2] * 9 // 10)]
        else:
            rx = int(random.random() * input.shape[0] // 10)
            ry = int(random.random() * input.shape[1] // 10)
            return input[rx: rx + int(input.shape[0] * 9 // 10), ry: ry + int(input.shape[1] * 9 // 10)]
    else:
        return input

def random_rotate_90(input, with_fa=False):
    # Rotate 90 degrees with probability 0.5 (about the spatial axes when
    # a mask channel is stacked in front).
    ran = random.random()
    if ran > 0.5:
        if with_fa:
            return np.rot90(input, axes=(1, 2))
        return np.rot90(input)
    else:
        return input

def random_rotation(x, chance, with_fa=False):
    # With probability `chance`, rotate by a random angle in [0, 90) degrees,
    # expanding the canvas (which creates black edges). With with_fa, the
    # image x[0] and its fine-annotation mask x[1] are rotated together.
    ran = random.random()
    if with_fa:
        img = Image.fromarray(x[0])
        mask = Image.fromarray(x[1])
        if ran > 1 - chance:
            # create black edges
            angle = np.random.randint(0, 90)
            img = img.rotate(angle=angle, expand=1)
            mask = mask.rotate(angle=angle, expand=1, fillcolor=1)
        return np.stack([np.asarray(img), np.asarray(mask)])
    img = Image.fromarray(x)
    if ran > 1 - chance:
        # create black edges
        angle = np.random.randint(0, 90)
        img = img.rotate(angle=angle, expand=1)
    return np.asarray(img)

def augment_numpy_images(path, targetNumber, targetDir, skip=None, rot=True, with_fa=False):
    # Repeatedly apply random crop/rotation/flip augmentations to every .npy
    # image under each class subdirectory of `path` until `targetNumber`
    # augmented examples per class have been written to `targetDir`.
    classes = os.listdir(path)
    if not os.path.exists(targetDir):
        os.mkdir(targetDir)
    for class_ in classes:
        if not os.path.exists(targetDir + class_):
            os.makedirs(targetDir + class_)

    for class_ in classes:
        count, round = 0, 0
        while count < targetNumber:
            round += 1
            for root, dirs, files in os.walk(os.path.join(path, class_)):
                for file in files:
                    if skip and skip in file:
                        continue
                    filepath = os.path.join(root, file)
                    arr = np.load(filepath)
                    print("loaded ", file)
                    print(arr.shape)
                    try:
                        arr = random_crop(arr, with_fa)
                        print(arr.shape)
                        if rot:
                            arr = random_rotation(arr, 0.9, with_fa)
                            print(arr.shape)
                        arr = random_flip(arr, 0, with_fa)
                        arr = random_flip(arr, 1, with_fa)
                        arr = random_rotate_90(arr, with_fa)
                        arr = random_rotate_90(arr, with_fa)
                        arr = random_rotate_90(arr, with_fa)
                        print(arr.shape)
                        # Reject augmented crops that are too small or that are
                        # at least 80% white or 80% black.
                        if with_fa:
                            whites = arr.shape[2] * arr.shape[1] - np.count_nonzero(np.round(arr[0] - np.amax(arr[0]), 2))
                            black = arr.shape[2] * arr.shape[1] - np.count_nonzero(np.round(arr[0], 2))
                            if arr.shape[2] < 10 or arr.shape[1] < 10 or black >= arr.shape[2] * arr.shape[1] * 0.8 or \
                                    whites >= arr.shape[2] * arr.shape[1] * 0.8:
                                print("illegal content")
                                continue
                        else:
                            whites = arr.shape[0] * arr.shape[1] - np.count_nonzero(np.round(arr - np.amax(arr), 2))
                            black = arr.shape[0] * arr.shape[1] - np.count_nonzero(np.round(arr, 2))
                            if arr.shape[0] < 10 or arr.shape[1] < 10 or black >= arr.shape[0] * arr.shape[1] * 0.8 or \
                                    whites >= arr.shape[0] * arr.shape[1] * 0.8:
                                print("illegal content")
                                continue

                        # Save a visualization of every 10th example (image in
                        # the red/green channels, mask in the blue channel).
                        if count % 10 == 0:
                            if not os.path.exists("./visualizations_of_augmentation/" + class_ + "/"):
                                os.makedirs("./visualizations_of_augmentation/" + class_ + "/")
                            if with_fa:
                                imsave("./visualizations_of_augmentation/" + class_ + "/" + str(count), np.transpose(np.stack([arr[0], arr[0], arr[1]]), (1, 2, 0)))
                            else:
                                imsave("./visualizations_of_augmentation/" + class_ + "/" + str(count), np.transpose(np.stack([arr, arr, arr]), (1, 2, 0)))

                        np.save(targetDir + class_ + "/" + file[:-4] + "aug" + str(round), arr)
                        count += 1
                        print(count)
                    except Exception as e:
                        print("something is wrong in try, details:", e)
                        if not os.path.exists("./error_of_augmentation/" + class_ + "/"):
                            os.makedirs("./error_of_augmentation/" + class_ + "/")
                        np.save("./error_of_augmentation/" + class_ + "/" + str(count), arr)
                    if count > targetNumber:
                        break
            print(count)

def window_augmentation(wwidth, wcen):
    # Randomly jitter a DICOM display window (width, center), leaving the
    # full-range default window (4096, 2047) untouched.
    if wcen == 2047 and wwidth == 4096:
        return wwidth, wcen
    else:
        new_wcen = np.random.randint(-100, 300)
        new_wwidth = np.random.randint(-200, 300)
        wwidth += new_wwidth
        wcen += new_wcen
        return wwidth, wcen

if __name__ == "__main__":

    print("Data augmentation")
    # augment_numpy_images walks every class subdirectory under `path`
    # (Spiculated, Circumscribed, Indistinct), so a single call augments
    # all mass-margin classes.
    augment_numpy_images(
        path="/usr/xtmp/mammo/npdata/datasetname_with_fa/train/",
        targetNumber=5000,
        targetDir="/usr/xtmp/mammo/npdata/datasetname_with_fa/train_augmented_5000/",
        rot=True,
        with_fa=True)
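
For reference, here is a hypothetical smoke test for the helpers defined
above; it assumes dataHandling.py is importable (with its dependencies
installed) and uses a random stand-in array rather than real mammography
data:

```python
# Hypothetical smoke test (not a repository file) for the augmentation
# helpers in dataHandling.py, run on a random stand-in image+mask pair.
import numpy as np
from dataHandling import (random_crop, random_rotation, random_flip,
                          random_rotate_90, window_augmentation)

dummy = np.stack([np.random.rand(100, 120).astype(np.float32),  # image
                  np.ones((100, 120), dtype=np.float32)])       # fine-annotation mask

arr = random_crop(dummy, with_fa=True)
arr = random_rotation(arr, 0.9, with_fa=True)
arr = random_flip(arr, 0, with_fa=True)
arr = random_flip(arr, 1, with_fa=True)
arr = random_rotate_90(arr, with_fa=True)
print(arr.shape)  # (2, H', W'): image and mask stay aligned

# Non-default DICOM display windows are jittered; the full-range
# default (width 4096, center 2047) is returned unchanged.
print(window_augmentation(3000, 1500))
print(window_augmentation(4096, 2047))  # -> (4096, 2047)
```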
