Commit 187d89f: initial commit of IAIABL code

alinajadebarnett committed Oct 13, 2021
Showing 44 changed files with 7,579 additions and 0 deletions.
13 changes: 13 additions & 0 deletions LICENSE
@@ -0,0 +1,13 @@
The copyrights of this software are owned by Duke University. As such, two licenses to this software are offered:
1. An open-source license under the MIT license for non-commercial use.
2. A custom license with Duke University, for commercial use or for use without the MIT license restrictions.

As a recipient of this software, you may choose which license to receive the code under. Outside contributions to the Duke-owned code base cannot be accepted unless the contributor transfers the copyright to those changes over to Duke University.
To enter a custom license agreement without the MIT license restrictions, please contact the Digital Innovations department at Duke Office for Translation & Commercialization (OTC) (https://olv.duke.edu/software/) at [email protected].

Please note that this software is distributed AS IS, WITHOUT ANY WARRANTY; and without the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
--

(c) Copyright 2021. Duke University. All Rights Reserved.
Developed by Alina Jade Barnett, Fides Regina Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Y. Lo,
and Cynthia Rudin at Duke University.
89 changes: 89 additions & 0 deletions README.md
@@ -0,0 +1,89 @@
# IAIA-BL

This code package implements IAIA-BL from the manuscript "IAIA-BL:
A Case-based Interpretable Deep Learning Model for Classification
of Mass Lesions in Digital Mammography" by Alina Jade Barnett, Fides
Regina Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Y. Lo,
and Cynthia Rudin.

This code package was developed by the authors at Duke University and
the University of Maine, and is licensed under a dual license (see
LICENSE for more information regarding the use and distribution of
this code package).

## Prerequisites
Any operating system on which you can run GPU-accelerated PyTorch,
with Python 3.6.9. For the required packages, see requirements.txt.
### Recommended hardware
2 NVIDIA Tesla P100 GPUs or 2 NVIDIA Tesla V100 GPUs

## Installation instructions
1. Git clone the repository to /usr/xtmp/IAIABL/.
2. Set up your environment using Python 3.6.9 and requirements.txt.
   (Optional) Set up the environment so that "source
   /home/virtual_envs/ml/bin/activate" activates it. You can set up the
   environment differently if you choose, but all included .sh scripts
   will attempt to activate the environment at
   /home/virtual_envs/ml/bin/activate. A quick sanity check of the
   resulting environment is sketched below.

Typical install time: less than 10 minutes.
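
The following is a minimal, hypothetical sanity check (my sketch, not a
file in this repository) for confirming that the environment matches
the prerequisites above:

```python
# Hypothetical environment check (not a repository file): confirms the
# interpreter version and that GPU-accelerated PyTorch can see the GPUs.
import sys
import torch

print("Python:", sys.version.split()[0])        # expect 3.6.9
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())  # expect 2 on the recommended hardware
```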

## Train the model
1. In train.sh, set the appropriate file locations for train_dir,
test_dir, push_dir, and finer_dir:
1. train_dir is the directory containing the augmented training set
2. test_dir is the directory containing the test set
3. push_dir is the directory containing the original (unaugmented) training
set, onto which prototypes can be projected
4. finer_dir is the directory containing the augmented set of training
examples with fine-scale annotations

2. Run train.sh

## Reproducing figures
No data is provided with this code repository. The following scripts
are included to demonstrate how the figures and results in the paper
were created; they require data to be provided. Type "source
scriptname.sh" into the command line to run each script.

1. see_explanations.sh

The expected output from see_explanations.sh is the set of figures from
the manuscript that begin with "An automatically generated explanation
of mass margin classification." The output images are written to the
relative file location "./visualizations_of_expl/".

2. see_prototype_grid.sh

The expected output from see_prototype_grid.sh is a grid of prototypes
for a given model. The file location of the output image is printed to
the command line.

3. run_gradCAM.sh

The expected output from run_gradCAM.sh shows the activation precision of the
sample data. It also saves a visualization to
/usr/xtmp/IAIABL/gradCAM_imgs/view.png. The columns, from left to right, are
"Original Image," "GradCAM heatmap," "GradCAM++ heatmap," "GradCAM heatmap
overlaid on the original image," and "GradCAM++ heatmap overlaid on the
original image." The rows are "Last layer, using a network trained on natural
images," "6th layer, using a network trained on natural images," "Blank," and
"Last layer, using a network trained to identify the mass margin."

4. The mal_for_reviewers.ipynb Jupyter notebook is also included.

The expected output from mal_for_reviewers.ipynb appears in the cells of the notebook.

Expected run time for these four demo files: 10 minutes.

## Other functions
The following scripts require more of the (private) dataset in order to
run correctly, but are included to aid reproducibility:
1. dataaugment.sh - for offline data augmentation
2. plot_graph.sh - plots a variety of graphs
3. run_global_analysis.sh - provides a global analysis of the model
4. train_vanilla_malignancy.sh - for training the baseline models

## Expected Data Location
Scripts are set up to expect data as numpy arrays in
/usr/xtmp/IAIABL/Lo1136i/test/Circumscribed/, where Circumscribed is the
mass margin label. A sketch of reading this layout appears below.
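
As an illustration, the following sketch (my addition; it assumes one
.npy file per image, consistent with how dataHandling.py loads data)
walks the expected directory structure:

```python
# Minimal sketch of reading the expected data layout: each mass margin
# label (e.g. Circumscribed) is a subdirectory of numpy arrays.
import os
import numpy as np

data_root = "/usr/xtmp/IAIABL/Lo1136i/test/"
for margin_label in sorted(os.listdir(data_root)):
    class_dir = os.path.join(data_root, margin_label)
    for fname in sorted(os.listdir(class_dir)):
        if fname.endswith(".npy"):
            arr = np.load(os.path.join(class_dir, fname))
            print(margin_label, fname, arr.shape)
```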
173 changes: 173 additions & 0 deletions dataHandling.py
@@ -0,0 +1,173 @@
from __future__ import division
import numpy as np
import os
import pandas as pd
import argparse
import sys
import random
import png
import matplotlib
matplotlib.use("Agg")  # select the non-interactive backend before pyplot is imported
from matplotlib.pyplot import imsave, imread
import matplotlib.pyplot as plt
from PIL import Image
import cv2
import torchvision.datasets as datasets
from skimage.transform import resize
import ast
import pickle
import csv
import pydicom as dcm
import Augmentor
from tqdm import tqdm
import pathlib
from torch import randint, manual_seed
from copy import copy
from collections import defaultdict

def random_flip(input, axis, with_fa=False):
    # Flip with probability 0.5. When with_fa is True, `input` is a stacked
    # (image, fine-annotation mask) array, so the flip axis shifts by one.
    ran = random.random()
    if ran > 0.5:
        if with_fa:
            axis += 1
        return np.flip(input, axis=axis)
    else:
        return input

def random_crop(input, with_fa=False):
    # With probability 0.8, crop to a random 90%-by-90% window.
    ran = random.random()
    if ran > 0.2:
        # find a random place to be the left upper corner of the crop
        if with_fa:
            rx = int(random.random() * input.shape[1] // 10)
            ry = int(random.random() * input.shape[2] // 10)
            return input[:, rx: rx + int(input.shape[1] * 9 // 10), ry: ry + int(input.shape[2] * 9 // 10)]
        else:
            rx = int(random.random() * input.shape[0] // 10)
            ry = int(random.random() * input.shape[1] // 10)
            return input[rx: rx + int(input.shape[0] * 9 // 10), ry: ry + int(input.shape[1] * 9 // 10)]
    else:
        return input

def random_rotate_90(input, with_fa=False):
    # Rotate 90 degrees with probability 0.5 (about the spatial axes when
    # a mask channel is stacked in front).
    ran = random.random()
    if ran > 0.5:
        if with_fa:
            return np.rot90(input, axes=(1, 2))
        return np.rot90(input)
    else:
        return input

def random_rotation(x, chance, with_fa=False):
    # With probability `chance`, rotate by a random angle in [0, 90) degrees,
    # expanding the canvas (which creates black edges). With with_fa, the
    # image x[0] and its fine-annotation mask x[1] are rotated together.
    ran = random.random()
    if with_fa:
        img = Image.fromarray(x[0])
        mask = Image.fromarray(x[1])
        if ran > 1 - chance:
            # create black edges
            angle = np.random.randint(0, 90)
            img = img.rotate(angle=angle, expand=1)
            mask = mask.rotate(angle=angle, expand=1, fillcolor=1)
        return np.stack([np.asarray(img), np.asarray(mask)])
    img = Image.fromarray(x)
    if ran > 1 - chance:
        # create black edges
        angle = np.random.randint(0, 90)
        img = img.rotate(angle=angle, expand=1)
    return np.asarray(img)

def augment_numpy_images(path, targetNumber, targetDir, skip=None, rot=True, with_fa=False):
    # Repeatedly apply random crop/rotation/flip augmentations to every .npy
    # image under each class subdirectory of `path` until `targetNumber`
    # augmented examples per class have been written to `targetDir`.
    classes = os.listdir(path)
    if not os.path.exists(targetDir):
        os.mkdir(targetDir)
    for class_ in classes:
        if not os.path.exists(targetDir + class_):
            os.makedirs(targetDir + class_)

    for class_ in classes:
        count, round = 0, 0
        while count < targetNumber:
            round += 1
            for root, dirs, files in os.walk(os.path.join(path, class_)):
                for file in files:
                    if skip and skip in file:
                        continue
                    filepath = os.path.join(root, file)
                    arr = np.load(filepath)
                    print("loaded ", file)
                    print(arr.shape)
                    try:
                        arr = random_crop(arr, with_fa)
                        print(arr.shape)
                        if rot:
                            arr = random_rotation(arr, 0.9, with_fa)
                            print(arr.shape)
                        arr = random_flip(arr, 0, with_fa)
                        arr = random_flip(arr, 1, with_fa)
                        arr = random_rotate_90(arr, with_fa)
                        arr = random_rotate_90(arr, with_fa)
                        arr = random_rotate_90(arr, with_fa)
                        print(arr.shape)
                        # Reject augmented crops that are too small or that are
                        # at least 80% white or 80% black.
                        if with_fa:
                            whites = arr.shape[2] * arr.shape[1] - np.count_nonzero(np.round(arr[0] - np.amax(arr[0]), 2))
                            black = arr.shape[2] * arr.shape[1] - np.count_nonzero(np.round(arr[0], 2))
                            if arr.shape[2] < 10 or arr.shape[1] < 10 or black >= arr.shape[2] * arr.shape[1] * 0.8 or \
                                    whites >= arr.shape[2] * arr.shape[1] * 0.8:
                                print("illegal content")
                                continue
                        else:
                            whites = arr.shape[0] * arr.shape[1] - np.count_nonzero(np.round(arr - np.amax(arr), 2))
                            black = arr.shape[0] * arr.shape[1] - np.count_nonzero(np.round(arr, 2))
                            if arr.shape[0] < 10 or arr.shape[1] < 10 or black >= arr.shape[0] * arr.shape[1] * 0.8 or \
                                    whites >= arr.shape[0] * arr.shape[1] * 0.8:
                                print("illegal content")
                                continue

                        # Save a visualization of every 10th example (image in
                        # the red/green channels, mask in the blue channel).
                        if count % 10 == 0:
                            if not os.path.exists("./visualizations_of_augmentation/" + class_ + "/"):
                                os.makedirs("./visualizations_of_augmentation/" + class_ + "/")
                            if with_fa:
                                imsave("./visualizations_of_augmentation/" + class_ + "/" + str(count), np.transpose(np.stack([arr[0], arr[0], arr[1]]), (1, 2, 0)))
                            else:
                                imsave("./visualizations_of_augmentation/" + class_ + "/" + str(count), np.transpose(np.stack([arr, arr, arr]), (1, 2, 0)))

                        np.save(targetDir + class_ + "/" + file[:-4] + "aug" + str(round), arr)
                        count += 1
                        print(count)
                    except Exception as e:
                        print("something is wrong in try, details:", e)
                        if not os.path.exists("./error_of_augmentation/" + class_ + "/"):
                            os.makedirs("./error_of_augmentation/" + class_ + "/")
                        np.save("./error_of_augmentation/" + class_ + "/" + str(count), arr)
                    if count > targetNumber:
                        break
            print(count)

def window_augmentation(wwidth, wcen):
    # Randomly jitter a DICOM display window (width, center), leaving the
    # full-range default window (4096, 2047) untouched.
    if wcen == 2047 and wwidth == 4096:
        return wwidth, wcen
    else:
        new_wcen = np.random.randint(-100, 300)
        new_wwidth = np.random.randint(-200, 300)
        wwidth += new_wwidth
        wcen += new_wcen
        return wwidth, wcen

if __name__ == "__main__":

    print("Data augmentation")
    # augment_numpy_images walks every class subdirectory under `path`
    # (Spiculated, Circumscribed, Indistinct), so a single call augments
    # all mass-margin classes.
    augment_numpy_images(
        path="/usr/xtmp/mammo/npdata/datasetname_with_fa/train/",
        targetNumber=5000,
        targetDir="/usr/xtmp/mammo/npdata/datasetname_with_fa/train_augmented_5000/",
        rot=True,
        with_fa=True)
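
For reference, here is a hypothetical smoke test for the helpers defined
above; it assumes dataHandling.py is importable (with its dependencies
installed) and uses a random stand-in array rather than real mammography
data:

```python
# Hypothetical smoke test (not a repository file) for the augmentation
# helpers in dataHandling.py, run on a random stand-in image+mask pair.
import numpy as np
from dataHandling import (random_crop, random_rotation, random_flip,
                          random_rotate_90, window_augmentation)

dummy = np.stack([np.random.rand(100, 120).astype(np.float32),  # image
                  np.ones((100, 120), dtype=np.float32)])       # fine-annotation mask

arr = random_crop(dummy, with_fa=True)
arr = random_rotation(arr, 0.9, with_fa=True)
arr = random_flip(arr, 0, with_fa=True)
arr = random_flip(arr, 1, with_fa=True)
arr = random_rotate_90(arr, with_fa=True)
print(arr.shape)  # (2, H', W'): image and mask stay aligned

# Non-default DICOM display windows are jittered; the full-range
# default (width 4096, center 2047) is returned unchanged.
print(window_augmentation(3000, 1500))
print(window_augmentation(4096, 2047))  # -> (4096, 2047)
```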
