To train new models, you can either work within the InnerEye/ directory hierarchy or create a local hierarchy beside it
and with the same internal organization (although with far fewer files).
We recommend the latter as it offers more flexibility and better separation of concerns. Here we will assume you
create a directory InnerEyeLocal beside InnerEye.
As well as your configurations (dealt with below) you will need these files:
- settings.yml: A file similar to InnerEye\settings.yml containing all your Azure settings. The value of extra_code_directory should (in our example) be 'InnerEyeLocal', and model_configs_namespace should be 'InnerEyeLocal.ML.configs'.
- A folder like InnerEyeLocal that contains your additional code, and model configurations.
- A file InnerEyeLocal/ML/runner.py that invokes the InnerEye training runner, but that points the code to your environment and Azure settings.
from pathlib import Path
import os
from InnerEye.ML import runner


def main() -> None:
    current = os.path.dirname(os.path.realpath(__file__))
    project_root = Path(os.path.realpath(os.path.join(current, "..", "..")))
    runner.run(project_root=project_root,
               yaml_config_file=project_root / "relative/path/to/settings.yml",
               post_cross_validation_hook=None)


if __name__ == '__main__':
    main()
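The project_root computation in the runner script above walks two levels up from the directory that holds runner.py. As a minimal sketch of that path logic only, the following compares it with a pathlib-only formulation on a throwaway directory layout. The helper names project_root_for and project_root_os_path are hypothetical and not part of InnerEye.

```python
import os
import tempfile
from pathlib import Path


def project_root_for(script_file: str) -> Path:
    """Return the directory two levels above the folder that holds script_file.

    For <root>/InnerEyeLocal/ML/runner.py this is <root>.
    """
    return Path(script_file).resolve().parents[2]


def project_root_os_path(script_file: str) -> Path:
    """The os.path formulation used in the runner script, for comparison."""
    current = os.path.dirname(os.path.realpath(script_file))
    return Path(os.path.realpath(os.path.join(current, "..", "..")))


# Build a throwaway layout and check that both formulations agree.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "InnerEyeLocal" / "ML" / "runner.py"
    script.parent.mkdir(parents=True)
    script.touch()
    same_root = project_root_for(str(script)) == project_root_os_path(str(script))
    root_is_tmp = project_root_for(str(script)) == Path(tmp).resolve()
```

If you move runner.py to a different depth in your local hierarchy, the number of levels walked up needs to change accordingly.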
You will find a variety of model configurations here. Those not ending in Base.py reference open-sourced data and can
be used as they are. Those ending in Base.py are partially specified, and can be used by having other model
configurations inherit from them and supply the missing parameter values: a dataset ID at least, and optionally other
values. For example, a Prostate model might inherit very simply from ProstateBase by creating Prostate.py in the
directory InnerEyeLocal/ML/configs/segmentation with the following contents:
from InnerEye.ML.configs.segmentation.ProstateBase import ProstateBase


class Prostate(ProstateBase):
    def __init__(self) -> None:
        super().__init__(
            ground_truth_ids=["femur_r", "femur_l", "rectum", "prostate"],
            azure_dataset_id="name-of-your-AML-dataset-with-prostate-data")
The allowed parameters and their meanings are defined in SegmentationModelBase.
The class name must be the same as the basename of the file containing it, so Prostate.py must contain Prostate.
In settings.yml, set model_configs_namespace to InnerEyeLocal.ML.configs so this config is found by the runner.
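The naming rule above (the file basename must match a class defined in the file) can be checked before submitting a job. The following is a minimal sketch using only the standard library. The helper defines_matching_class is hypothetical and is not how the InnerEye runner itself locates configurations.

```python
import ast
import tempfile
from pathlib import Path


def defines_matching_class(config_file: Path) -> bool:
    """Return True if config_file defines a top-level class named after the file."""
    tree = ast.parse(config_file.read_text())
    class_names = {node.name for node in tree.body if isinstance(node, ast.ClassDef)}
    return config_file.stem in class_names


# Throwaway files: Prostate.py defines Prostate, Wrong.py does not define Wrong.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "Prostate.py"
    good.write_text("class Prostate:\n    pass\n")
    bad = Path(tmp) / "Wrong.py"
    bad.write_text("class Prostate:\n    pass\n")
    good_ok = defines_matching_class(good)
    bad_ok = defines_matching_class(bad)
```

Parsing with ast avoids importing the configuration module, so the check works even when InnerEye's dependencies are not installed.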
A Head and Neck model might inherit from HeadAndNeckBase by creating HeadAndNeck.py with the following contents:
from InnerEye.ML.configs.segmentation.HeadAndNeckBase import HeadAndNeckBase


class HeadAndNeck(HeadAndNeckBase):
    def __init__(self) -> None:
        super().__init__(
            ground_truth_ids=["parotid_l", "parotid_r", "smg_l", "smg_r", "spinal_cord"],
            azure_dataset_id="name-of-your-AML-dataset-with-head-and-neck-data")
- Set up your model configuration as above and update azure_dataset_id to the name of your Dataset in the AML workspace. It is enough to put your dataset into blob storage. The dataset should be contained in a folder at the root of the datasets container. The InnerEye runner will check if there is a dataset in the AzureML workspace already, and if not, generate it directly from blob storage.
- Train a new model, for example Prostate:
python InnerEyeLocal/ML/runner.py --azureml=True --model=Prostate --train=True
Alternatively, you can train the model on your current machine if it is powerful enough. In
this case, you would simply omit the --azureml flag, and instead of specifying azure_dataset_id
in the class constructor, you can use local_dataset="my/data/folder",
where the folder my/data/folder contains a dataset.csv file and all the files that are referenced therein.
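Before starting a local training run, it can help to confirm that the folder really contains dataset.csv and every file it references. The following is a minimal sketch of such a check. The helper missing_dataset_files is hypothetical, and the column name filePath is an assumption: substitute whichever column of your dataset.csv holds the relative file paths.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def missing_dataset_files(dataset_folder: Path, path_column: str = "filePath") -> List[str]:
    """List files referenced in dataset.csv that do not exist under dataset_folder.

    path_column is an assumed column name holding paths relative to dataset_folder.
    """
    dataset_csv = dataset_folder / "dataset.csv"
    if not dataset_csv.is_file():
        raise FileNotFoundError(f"No dataset.csv in {dataset_folder}")
    with dataset_csv.open(newline="") as f:
        referenced = {row[path_column] for row in csv.DictReader(f)}
    return sorted(p for p in referenced if not (dataset_folder / p).is_file())


# Throwaway dataset with two referenced files, only one of which exists.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "dataset.csv").write_text("subject,filePath\n1,s1/ct.nii.gz\n2,s2/ct.nii.gz\n")
    (folder / "s1").mkdir()
    (folder / "s1" / "ct.nii.gz").touch()
    missing = missing_dataset_files(folder)
```

An empty result means every referenced file was found.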
To speed up training in AzureML, you can use multiple machines, by specifying the additional --num_nodes argument.
For example, to use 2 machines to train, specify:
python InnerEyeLocal/ML/runner.py --azureml=True --model=Prostate --num_nodes=2
On each of the 2 machines, all available GPUs will be used. Model inference will always use only one machine.
For the Prostate model, we observed a 2.8x speedup for model training when using 4 nodes, and a 1.65x speedup when using 2 nodes.
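To put the speedups quoted above in context, they can be expressed as a fraction of ideal linear scaling. This is plain arithmetic on the two numbers given, not an additional measurement. The helper parallel_efficiency is hypothetical.

```python
def parallel_efficiency(speedup: float, num_nodes: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup divided by node count."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return speedup / num_nodes


efficiency_2_nodes = parallel_efficiency(1.65, 2)  # about 0.825
efficiency_4_nodes = parallel_efficiency(2.8, 4)   # about 0.7
```

In other words, the observed runs reached roughly 82% of linear scaling on 2 nodes and roughly 70% on 4 nodes, so the returns from adding nodes diminish.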
AzureML structures all jobs in a hierarchical fashion:
- The top-level concept is a workspace.
- Inside of a workspace, there are multiple experiments. Upon starting a training run, the name of the experiment needs to be supplied. The InnerEye toolbox is set up specifically to work with git repositories, and it automatically sets the experiment name to match the name of the current git branch.
- Inside of an experiment, there are multiple runs. When starting the InnerEye toolbox as above, a run will be created.
- A run can have child runs - see below in the discussion about cross validation.
For running K-fold cross validation, the InnerEye toolbox schedules multiple training runs in the cloud that run at the same time (provided that the cluster has capacity). This means that a complete cross validation run usually takes as long as a single training run.
To start cross validation, you can either modify the number_of_cross_validation_splits property of your model,
or supply it on the command line: Provide all the usual switches, and add --number_of_cross_validation_splits=N,
for some N greater than 1; a value of 5 is typical. This will start a HyperDrive run: A parent AzureML job, with N
child runs that will execute in parallel. You can see the child runs in the AzureML UI in the "Child Runs" tab.
The dataset splits for those N child runs will be computed from the union of the Training and Validation sets.
The Test set is unchanged. Note that the Test set can be empty, in which case the union of all validation sets
for the N child runs will be the full dataset.
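The splitting rule described above can be illustrated with a small sketch: pool the training and validation subjects, divide the pool into N folds, and use each fold once as the validation set while leaving the test set untouched. The function cross_validation_splits and the round-robin fold assignment are assumptions made for illustration. The toolbox's own assignment of subjects to folds may differ.

```python
from typing import Dict, List, Sequence


def cross_validation_splits(train: Sequence[str], val: Sequence[str], test: Sequence[str],
                            num_splits: int) -> List[Dict[str, List[str]]]:
    """Build num_splits train/val splits from the union of train and val; test is unchanged."""
    if num_splits < 2:
        raise ValueError("num_splits must be greater than 1")
    pool = sorted(set(train) | set(val))
    # Round-robin assignment of subjects to folds (an illustrative choice).
    folds = [pool[i::num_splits] for i in range(num_splits)]
    splits = []
    for i, val_fold in enumerate(folds):
        train_fold = [s for j, fold in enumerate(folds) if j != i for s in fold]
        splits.append({"train": train_fold, "val": list(val_fold), "test": list(test)})
    return splits


# Empty test set: the validation folds together cover the full dataset.
splits = cross_validation_splits(train=["s1", "s2", "s3", "s4"], val=["s5", "s6"], test=[], num_splits=3)
all_val = sorted(s for split in splits for s in split["val"])
```

Each subject appears in exactly one validation fold, which is why the union of the validation sets equals the full dataset when the test set is empty.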
To train further with an already-created model, give the above command with additional switches like these:
--run_recovery_id=foo_bar:foo_bar_12345_abcd --start_epoch=120
The run recovery ID is of the form "experiment_id:run_id". When you trained your original model, it will have been
queued as a "Run" inside of an "Experiment". The experiment will be given a name derived from the branch name - for
example, branch foo/bar will queue a run in experiment foo_bar. Inside the "Tags" section of your run, you should
see an element run_recovery_id. It will look something like foo_bar:foo_bar_12345_abcd.
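As a small illustration of how the pieces of a run recovery ID fit together, the sketch below derives the experiment name from a branch name and joins it with a run ID. Both helper functions are hypothetical. The rule of replacing "/" with "_" is inferred from the foo/bar example and is an assumption: the toolbox may sanitize other characters as well.

```python
def experiment_name_from_branch(branch: str) -> str:
    """Map a git branch name to an experiment name by replacing '/' with '_' (assumed rule)."""
    return branch.replace("/", "_")


def run_recovery_id(experiment: str, run_id: str) -> str:
    """Compose a run recovery ID of the form experiment_id:run_id."""
    return f"{experiment}:{run_id}"


experiment = experiment_name_from_branch("foo/bar")
recovery_id = run_recovery_id(experiment, "foo_bar_12345_abcd")
```

In practice, copy the value from the run_recovery_id tag of your run rather than constructing it by hand.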
If you are recovering a HyperDrive run, the value of --run_recovery_id should be that of the parent run,
and --number_of_cross_validation_splits should have the same value as in the recovered run.
For example:
--run_recovery_id=foo_bar:HD_55d4beef-7be9-45d7-89a5-1acf1f99078a --start_epoch=120 --number_of_cross_validation_splits=5
The run recovery ID of a parent HyperDrive run is currently not displayed in the "Details" section of the AzureML UI. The easiest way to get it is to go to any of the child runs and use its run recovery ID without the final underscore and digit.
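The step of dropping the final underscore and digit from a child run's recovery ID can be sketched as follows. The helper parent_run_recovery_id is hypothetical, and the child ID in the example is made up by appending _0 to the parent ID shown above. The pattern also accepts a multi-digit child index, which is an assumption for runs with ten or more splits.

```python
import re


def parent_run_recovery_id(child_recovery_id: str) -> str:
    """Strip the trailing '_<digits>' child index from a HyperDrive child run recovery ID."""
    parent, num_subs = re.subn(r"_\d+$", "", child_recovery_id)
    if num_subs == 0:
        raise ValueError(f"No trailing child index in {child_recovery_id!r}")
    return parent


parent_id = parent_run_recovery_id("foo_bar:HD_55d4beef-7be9-45d7-89a5-1acf1f99078a_0")
```

The function raises an error when no child index is present, so that passing a parent ID by mistake does not silently return a wrong value.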
To evaluate an existing model on a test set, you can use models from previous runs in AzureML or from local checkpoints.
This is similar to continuing training using a run_recovery object, but you will need to set --train to False.
Thus your command should look like this:
python InnerEyeLocal/ML/runner.py --azureml=True --model=Prostate --train=False --cluster=my_cluster_name \
--run_recovery_id=foo_bar:foo_bar_12345_abcd --start_epoch=120
To evaluate a model using a local checkpoint, use --local_weights_path to specify the path to the model checkpoint
and set --train to False.
python InnerEyeLocal/ML/runner.py --model=Prostate --train=False --local_weights_path=path_to_your_checkpoint
Alternatively, to submit an AzureML run to apply a model to a single image on your local disc,
you can use the script submit_for_inference.py, with a command of this form:
python InnerEye/Scripts/submit_for_inference.py --image_file ~/somewhere/ct.nii.gz --model_id Prostate:555 \
--settings ../somewhere_else/settings.yml --download_folder ~/my_existing_folder
An ensemble model will be created automatically and registered in the AzureML model registry whenever cross-validation
models are trained. The ensemble model creation is done by the child whose cross_validation_split_index is 0;
you can identify this child by looking at the "Child Runs" tab in the parent run page in AzureML.
To find the registered ensemble model, find the HyperDrive parent run in AzureML. In the "Details" tab, there is an entry for "Registered models" that links to the ensemble model that was just created. Note that each of the child runs also registers a model, namely the one that was built from its specific subset of data, without taking into account the other cross-validation folds.
As well as registering the model, child run 0 runs the ensemble model on the validation and test sets. The results are
aggregated based on the ensemble_aggregation_type value in the model config,
and the generated posteriors are passed to the usual model testing downstream pipelines, e.g. metrics computation.
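To make the aggregation step concrete, the sketch below combines the posteriors of several cross-validation models by taking their element-wise mean. Averaging is an assumption chosen for illustration: the aggregation actually applied is determined by ensemble_aggregation_type, and the function average_posteriors is hypothetical rather than part of InnerEye.

```python
from typing import List, Sequence


def average_posteriors(posteriors_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the posteriors produced by each cross-validation model."""
    if not posteriors_per_model:
        raise ValueError("Need posteriors from at least one model")
    length = len(posteriors_per_model[0])
    if any(len(p) != length for p in posteriors_per_model):
        raise ValueError("All models must produce posteriors of the same length")
    num_models = len(posteriors_per_model)
    return [sum(values) / num_models for values in zip(*posteriors_per_model)]


# Three models, each giving a foreground probability for the same four voxels.
ensemble = average_posteriors([
    [0.9, 0.2, 0.6, 0.0],
    [0.7, 0.4, 0.6, 0.0],
    [0.8, 0.0, 0.3, 0.0],
])
```

The aggregated posteriors can then be thresholded and scored exactly like those of a single model, which is why the downstream metrics pipeline needs no changes.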
Once your HyperDrive AzureML runs are completed, you can visualize the results by running the
plot_cross_validation.py script locally:
python InnerEye/ML/visualizers/plot_cross_validation.py --run_recovery_id ... --epoch ...
filling in the run recovery ID of the parent run and the epoch number (one of the test epochs, e.g. the last epoch)
for which you want results plotted. The script will also output several ..._outliers.txt files with all of the outliers
across the splits and a portal query to find them in the production portal, and run statistical tests to compute the
significance of differences between scores across the splits and with respect to other runs that you specify.
This is done for you during the run itself (see below), but you can use the script post hoc to compare arbitrary runs
with each other. Details of the tests can be found in wilcoxon_signed_rank_test.py and mann_whitney_test.py.
- AzureML writes all its results to the storage account you have specified. Inside of that account, you will find a container named azureml. You can access that with Azure Storage Explorer. The checkpoints and other files of a run will be in folder azureml/ExperimentRun/dcid.my_run_id, where my_run_id is the "Run Id" visible in the "Details" section of the run. If you want to download all the results files or a large subset of them, we recommend you access them this way.
- The results can also be viewed in the "Outputs and Logs" section of the run. This is likely to be more convenient for viewing and inspecting single files.
- All files that the model training writes to the ./outputs folder are automatically uploaded at the end of the AzureML training job, and are put into outputs in Blob Storage and in the run itself. Similarly, what the model training writes to the ./logs folder gets uploaded to logs.
- You can monitor the file system that is mounted on the compute node by navigating to your storage account in Azure. In the blade, click on "Files" and navigate through to azureml/azureml/my_run_id. This will show all files that are mounted as the working directory on the compute VM.
The organization of the outputs directory is as follows:
- A checkpoints directory containing the checkpointed model file(s).
- For each test epoch NNN, a directory epoch_NNN, each of whose subdirectories Test and Val contains the following:
  - A metrics.csv file, giving the Dice and Hausdorff scores for every structure of every subject in the test and validation sets respectively.
  - A metrics_aggregates.csv file, aggregating the information in metrics.csv by subject to give minimum, maximum, mean and standard deviation values for both Dice and Hausdorff scores.
  - A metrics_boxplot.png file, containing box-and-whisker plots for the same information.
  - Various files identifying the dataset and structure names.
  - A thumbnails directory, containing an image file for the maximal predicted slice for each structure of each test or validation subject.
  - For each test or validation subject, a directory containing a Nifti file for each predicted structure.
- If there are comparison runs (specified by the config parameter comparison_blob_storage_paths), there will be a subdirectory named after each of those runs, each containing its own epoch_NNN subdirectory, and there will be a file MetricsAcrossAllRuns.csv directly under outputs, combining the data from the metrics.csv files of the current run and the comparison run(s).
- Additional files directly under outputs:
  - args.txt contains the configuration information.
  - buildinformation.json contains information on the build, partially overlapping with the content of the "Details" tab.
  - dataset.csv for the whole dataset (see "Creating Datasets" for details), and test_dataset.csv, train_dataset.csv and val_dataset.csv for those subsets of it.
  - BaselineComparisonWilcoxonSignedRankTestResults.txt, containing the results of comparisons between the current run and any specified baselines (earlier runs) to compare with. Each paragraph of that file compares two models and indicates, for each structure, when the Dice scores for the second model are significantly better or worse than the first. For full details, see the source code.
  - A directory scatterplots, containing a jpg file for every pairing of the current model with one of the baselines. Each one is named AAA_vs_BBB.jpg, where AAA and BBB are the run IDs of the two models. Each plot shows the Dice scores on the test set for the models.
  - For both segmentation and classification models an IPython Notebook report.ipynb will be generated in the outputs directory.
    - For segmentation models, this report is based on the full image results of the model checkpoint that performed the best on the validation set. This report will contain detailed metrics per structure, and outliers to help model development.
    - For classification models, the report is based on the validation and test results from the last epoch. It shows metrics on the validation and test sets, ROC and PR Curves, and a list of the best and worst performing images from the test set.
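If you want to recompute summary statistics from a downloaded metrics.csv yourself, a sketch along the following lines may help. The column names Patient, Structure and Dice are assumptions about the file layout, the grouping by structure is one plausible reading of how the aggregates are formed, and the function aggregate_metric is hypothetical. Check the header of your own metrics.csv before relying on it.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Dict


def aggregate_metric(metrics_csv: str, metric: str = "Dice",
                     group_by: str = "Structure") -> Dict[str, Dict[str, float]]:
    """Return min, max, mean and standard deviation of metric for each group_by value."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(metrics_csv)):
        values[row[group_by]].append(float(row[metric]))
    return {
        name: {
            "min": min(v),
            "max": max(v),
            "mean": statistics.mean(v),
            # The sample standard deviation needs at least two values.
            "std": statistics.stdev(v) if len(v) > 1 else 0.0,
        }
        for name, v in values.items()
    }


# Made-up scores for two subjects and two structures (assumed column names).
example_csv = """Patient,Structure,Dice
1,prostate,0.8
2,prostate,0.9
1,rectum,0.7
2,rectum,0.5
"""
aggregates = aggregate_metric(example_csv)
```

To process a real file, pass the text of metrics.csv, for example Path("metrics.csv").read_text(), and set metric to the name of the Hausdorff column to aggregate distances instead.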
Ensemble models are created by the zero'th child (with cross_validation_split_index=0) in each
cross-validation run. Results from inference on the test and validation sets are uploaded to the
parent run, and can be found in epoch_NNN directories as above.
In addition, various scores and plots from the ensemble and from individual child
runs are uploaded to the parent run, in the CrossValResults directory. This contains:
- Subdirectories named 0, 1, 2, ... for all the child runs including the zero'th one, as well as ENSEMBLE, containing their respective epoch_NNN directories.
- Files Dice_Test_Splits.jpg and Dice_Val_Splits.jpg, containing box plots of the Dice scores on those datasets for each structure and each (component and ensemble) model. These give a visual overview of the results in the metrics.csv files detailed above. When there are many different structures, several such plots are created, with a different subset of structures in each one.
- Similarly, HausdorffDistance_mm_Test_splits.jpg and HausdorffDistance_mm_Val_splits.jpg contain box plots of Hausdorff distances.
- MetricsAcrossAllRuns.csv combines the data from all the metrics.csv files.
- Test_outliers.txt and Val_outliers.txt highlight particular outlier scores (both Dice and Hausdorff) in the test and validation sets respectively.
- A scatterplots directory and a file CrossValidationWilcoxonSignedRankTestResults.txt, for comparisons between the ensemble and its component models.
There is also a directory BaselineComparisons, containing the Wilcoxon test results and
scatterplots for the ensemble, as described above for single runs.