All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
For each Pull Request, the affected code parts should be briefly described and added here in the "Upcoming" section. Once a release is done, the "Upcoming" section becomes the release changelog, and a new, empty "Upcoming" section should be created.
- (#446) Guard `save_outlier` so that it works when the institution id and series id columns are missing.
- (#441) Add script to move models from one AzureML workspace to another: `python InnerEye/Scripts/move_model.py`
- (#417) Added a generic way of adding PyTorch Lightning models to the toolbox. It is now possible to train almost any Lightning model with the InnerEye toolbox in AzureML, with only minimal code changes. See the MD documentation for details.
- (#430) Update conversion to 1.0.1 InnerEye-DICOM-RT to add: manufacturer, SoftwareVersions, Interpreter and ROIInterpretedTypes.
- (#385) Add the ability to train a model on multiple nodes in AzureML. Example: Add `--num_nodes=2` to the commandline arguments to train on 2 nodes.
- (#366) and (#407) add new parameters to the `score.py` script: `use_dicom` and `result_zip_dicom_name`. If `use_dicom==True` then the input file should be a zip of a DICOM series. This will be unzipped and converted to Nifti format before processing. The result will then be converted to a DICOM-RT file, zipped and stored as `result_zip_dicom_name`.
- (#416) Add a GitHub action that checks if `CHANGELOG.md` has been modified.
- (#412) Dataset files can now have arbitrary names, set via the config field `dataset_csv`, and are no longer restricted to be called `dataset.csv`. This allows having a single set of image files in a folder, but multiple datasets derived from it.
- (#391) Support for multilabel classification tasks. Multilabel models can be trained by adding the parameter `class_names` to the config for classification models. `class_names` should contain the name of each label class in the dataset, and the order of names should match the order of class label indices in `dataset.csv`. `dataset.csv` supports multiple labels (indices corresponding to `class_names`) per subject in the label column. Multiple labels should be encoded as a string with labels separated by a `|`, for example "0|2|4". Note that this PR does not add support for multiclass models, where the labels are mutually exclusive.
- (#425) The number of layers in a Unet is no longer fixed at 4, but can be set via the config field `num_downsampling_paths`. A lower number of layers may be useful for decreasing memory requirements, or for working with smaller images. (The minimum image size in any dimension when using a network of n layers is 2**n.)
- (#426) Flake8, mypy, and testing the HelloWorld model now run in a GitHub action, no longer in Azure Pipelines.
- (#405) Cross-validation runs for classification models now also generate a report notebook summarising the metrics from the individual splits. Also includes minor formatting improvements for standard classification reports.
- (#438) Add links and small docs to InnerEye-Gateway and InnerEye-Inference
- (#439) Enable automatic job recovery from the last recovery checkpoint in case of job pre-emption on AML. Users can now keep more than one recovery checkpoint.
- (#442) Enable defining custom scalar losses (`ScalarLoss.CustomClassification` and `CustomRegression`), prediction targets (`ScalarModelBase.target_names`), and reporting (`ModelConfigBase.generate_custom_report()`) in scalar configs, providing more flexibility for defining model configs with custom behaviour while leveraging the existing InnerEye workflows.
- (#444) Added setup scripts and documentation to work with the FastMRI challenge datasets.
- (#445) Adding test coverage for the `HelloContainer` model with multiple GPUs.
- (#450) Adds the metric "Accuracy at threshold 0.5" to the classification report (`classification_crossval_report.ipynb`).
- (#451) Write a file `model_outputs.csv` with columns `subject`, `prediction_target`, `label`, `model_output` and `cross_validation_split_index`. This file is not written out for sequence models.
- (#440) Added support for training of self-supervised models (BYOL and SimCLR) based on the bring-your-own-model framework. Provides example configurations for training SSL models on CIFAR10/100 datasets, as well as on chest X-ray datasets such as NIH Chest-Xray or the RSNA Pneumonia Detection Challenge datasets. See the SSL doc for more details.
- (#455) All models trained on AzureML are registered. The codepath previously allowed only segmentation models (subclasses of `SegmentationModelBase`) to be registered. Models are registered after a training run or if the `only_register_model` flag is set. Models may be legacy InnerEye config-based models or may be defined using the LightningContainer class. Additionally, the `TrainHelloWorldAndHelloContainer` job in the PR build has been split into two jobs, `TrainHelloWorld` and `TrainHelloContainer`. A pytest marker `after_training_hello_container` has been added to run tests after training is finished in the `TrainHelloContainer` job.
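The `|`-separated label encoding introduced in (#391) can be illustrated with a small standalone sketch that turns a label string such as "0|2|4" into a multi-hot vector over `class_names`. `parse_multilabel` is a hypothetical helper, not the toolbox's own parser, and it assumes that an empty label string means the subject has none of the classes.

```python
from typing import List, Sequence


def parse_multilabel(label_field: str, class_names: Sequence[str]) -> List[int]:
    """Hypothetical helper: turn a label string like "0|2|4" into a multi-hot vector."""
    target = [0] * len(class_names)
    if not label_field.strip():
        # Assumption: an empty label column means no positive class for this subject.
        return target
    for token in label_field.split("|"):
        index = int(token)
        # Indices refer to positions in class_names, so they must be in range.
        if not 0 <= index < len(class_names):
            raise ValueError(f"Label index {index} is out of range for {len(class_names)} classes")
        target[index] = 1
    return target


class_names = ["class0", "class1", "class2", "class3", "class4"]
print(parse_multilabel("0|2|4", class_names))  # [1, 0, 1, 0, 1]
```

Because several positions can be 1 at once, this encoding only fits multilabel tasks; mutually exclusive multiclass labels are not covered by this PR.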
- (#385) Starting an AzureML run now uses the `ScriptRunConfig` object, rather than the deprecated `Estimator` object.
- (#385) When registering a model, the name of the Python execution environment is added as a tag. This tag is read when running inference, and the execution environment is re-used.
- (#411) Upgraded to PyTorch 1.8.0, PyTorch-Lightning 1.1.8 and AzureML SDK 1.23.0
- (#432) Upgraded to PyTorch-Lightning 1.2.7. Add end-to-end test for classification cross-validation. WARNING: upgrading the PL version causes multi-node training to hang.
- (#437) Upgrade to PyTorch-Lightning 1.2.8.
- (#439) Recovery checkpoints are now named `recovery_epoch=x.ckpt` instead of `recovery.ckpt` or `recovery-v0.ckpt`.
- (#451) Change the signature for function `generate_custom_report` in `ModelConfigBase` to take only the path to the reports folder and a `ModelProcessing` object.
- (#444) The method `before_training_on_rank_zero` of the `LightningContainer` class has been renamed to `before_training_on_global_rank_zero`. The order in which the hooks are called has been changed.
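The (#439) checkpoint naming scheme `recovery_epoch=x.ckpt` embeds the epoch in the file name, so the most recent recovery checkpoint can be picked out by parsing that number. The sketch below is a hypothetical helper, not the toolbox's own recovery logic, and it assumes `x` is a non-negative integer epoch.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Matches names such as "recovery_epoch=12.ckpt" and captures the epoch number.
RECOVERY_PATTERN = re.compile(r"^recovery_epoch=(\d+)\.ckpt$")


def latest_recovery_checkpoint(files: Iterable[Path]) -> Optional[Path]:
    """Hypothetical helper: return the recovery checkpoint with the highest epoch, or None."""
    best_epoch = -1
    best_file: Optional[Path] = None
    for file in files:
        match = RECOVERY_PATTERN.match(file.name)
        if match is None:
            # Ignore files that do not follow the recovery naming scheme.
            continue
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch, best_file = epoch, file
    return best_file


names = ["recovery_epoch=9.ckpt", "recovery_epoch=10.ckpt", "last.ckpt"]
print(latest_recovery_checkpoint(Path(n) for n in names))  # recovery_epoch=10.ckpt
```

The epoch is compared as an integer rather than by sorting file names, since "recovery_epoch=9.ckpt" sorts after "recovery_epoch=10.ckpt" as a string.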
- (#422) Documentation - clarified `setting_up_aml.md` datastore creation instructions and fixed small typos in `hello_world_model.md`
- (#432) Fixed cross-validation for classification models. Fixed multi-GPU metrics aggregation. Added end-to-end test for classification cross-validation. Fixed a bug in the DDP setting when running multi-node with 1 GPU per node.
- (#435) If parameter `model` in `AzureConfig` is not set, display an error message and terminate the run.
- (#437) Fixed multi-node DDP bug in PL v1.2.8. Re-add end-to-end test for multi-node.
- (#445) Fixed a bug when running inference for container models on machines with >1 GPU
- (#439) Deprecated `start_epoch` config argument.
- (#450) Delete unused `classification_report.ipynb`.
- (#455) Removed the AzureRunner conda environment. The full InnerEye conda environment is needed to submit a training job to AzureML.
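The (#425) rule that a Unet with n layers (`num_downsampling_paths = n`) needs every image dimension to be at least 2**n can be written as a quick pre-flight check. `min_image_size` and `fits_network` are hypothetical helpers; they only check the minimum stated in this changelog and do not cover any other shape constraints the network may impose.

```python
from typing import Sequence


def min_image_size(num_downsampling_paths: int) -> int:
    """Smallest allowed size in any image dimension for a Unet with n layers: 2**n."""
    return 2 ** num_downsampling_paths


def fits_network(image_shape: Sequence[int], num_downsampling_paths: int) -> bool:
    """Hypothetical check: True if every dimension is at least 2**num_downsampling_paths."""
    minimum = min_image_size(num_downsampling_paths)
    return all(size >= minimum for size in image_shape)


print(min_image_size(4))             # 16
print(fits_network((64, 64, 8), 4))  # False, because 8 < 16
print(fits_network((64, 64, 8), 3))  # True, because 8 >= 8
```

This shows why lowering `num_downsampling_paths` helps with thin or small images: each layer removed halves the minimum size.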
- (#323) There are new model configuration fields (and hence, commandline options), in particular for controlling PyTorch Lightning (PL) training:
  - `max_num_gpus` controls how many GPUs are used at most for training (default: all GPUs, value -1).
  - `pl_num_sanity_val_steps` controls the PL trainer flag `num_sanity_val_steps`.
  - `pl_deterministic` controls the PL trainer flags `benchmark` and `deterministic`.
  - `generate_report` controls whether an HTML report will be written (default: True).
  - `recovery_checkpoint_save_interval` determines how often a checkpoint for training recovery is saved.
- (#336) New extensions of `SegmentationModelBase`: `HeadAndNeckBase` and `ProstateBase`. Use these classes to build your own Head&Neck or Prostate models, by just providing a list of foreground classes.
- (#363) Grouped dataset splits and k-fold cross-validation. This allows, for example, training on datasets with multiple images per subject without leaking data from the same subject across train/test/validation sets or cross-validation folds. To use this functionality, simply provide the name of the CSV grouping column (`group_column`) when creating the `DatasetSplits` object in your model config's `get_model_train_test_dataset_splits()` method. See the `InnerEye.ML.utils.split_dataset.DatasetSplits` class for details.
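The idea behind the (#363) grouped k-fold splits is that whole groups, not individual rows, are assigned to folds, so all images of one subject end up on the same side of every split. The sketch below is a hypothetical, simplified illustration and not the `DatasetSplits` implementation: it assigns sorted groups to folds round-robin and does not try to balance folds by row count.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def grouped_kfold(groups: Sequence[str], k: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: k (train, validation) row-index folds; no group spans both sides."""
    rows_by_group: Dict[str, List[int]] = defaultdict(list)
    for row, group in enumerate(groups):
        rows_by_group[group].append(row)
    unique_groups = sorted(rows_by_group)
    if k > len(unique_groups):
        raise ValueError("k cannot exceed the number of distinct groups")
    folds = []
    for fold in range(k):
        # Whole groups are assigned round-robin, so every row of a subject lands in one fold.
        val_groups = set(unique_groups[fold::k])
        val_rows = [r for g in unique_groups if g in val_groups for r in rows_by_group[g]]
        train_rows = [r for g in unique_groups if g not in val_groups for r in rows_by_group[g]]
        folds.append((sorted(train_rows), sorted(val_rows)))
    return folds


# Two images each for subjects "a" and "b", one image for subject "c".
subjects = ["a", "a", "b", "b", "c"]
for train, val in grouped_kfold(subjects, k=3):
    print(train, val)
```

Here `subjects` plays the role of the values in the `group_column` of the dataset CSV.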
- (#323) The codebase has undergone a massive refactoring, to use PyTorch Lightning as the foundation for all training. As a consequence of that:
  - Training is now using Distributed Data Parallel with synchronized `batchnorm`. The number of GPUs to use can be controlled by a new commandline argument `max_num_gpus`.
  - Several classes, like `ModelTrainingSteps*`, have been removed completely.
  - The final model is now always the one that is written at the end of all training epochs.
  - The old code had options to run full image inference at multiple epochs (i.e., multiple checkpoints). These have been removed, alongside the respective commandline options `save_start_epoch`, `save_step_epochs`, `epochs_to_test`, `test_diff_epochs`, `test_step_epochs`, `test_start_epoch`.
  - The commandline option `register_model_only_for_epoch` is now called `only_register_model`, and is boolean.
  - All metrics are written to AzureML and Tensorboard in a unified format. A training Dice score for 'bladder' would previously be called Train_Dice/bladder, now it is train/Dice/bladder.
  - Due to a different checkpoint format, it is no longer possible to use checkpoints written by the previous version of the code.
  - The arguments of the `score.py` script changed: `data_root` -> `data_folder`, it no longer assumes a fixed `data` subfolder. `project_root` -> `model_root`, `test_image_channels` -> `image_files`.
  - By default, the visualization of patch sampling for segmentation models will run on only 1 image (down from 5). This is because patch sampling is expensive to compute, taking 1 min per large CT scan.
- (#336) Renamed `HeadAndNeckBase` to `HeadAndNeckPaper`, and `ProstateBase` to `ProstatePaper`.
- (#427) Move dicom loading function from SimpleITK to pydicom. Loading time improved by 30x.
- When registering a model, it now has a consistent folder structure, described here. This folder structure is present irrespective of using InnerEye as a submodule or not. In particular, exactly 1 Conda environment will be contained in the model.
- The commandline options to control which checkpoint is saved, and which is used for inference, have been removed: `save_start_epoch`, `save_step_epochs`, `epochs_to_test`, `test_diff_epochs`, `test_step_epochs`, `test_start_epoch`
- Removed blobxfer completely. When downloading a dataset from Azure, we now use AzureML dataset downloading tools. Please remove the following fields from your settings.yml file: 'datasets_storage_account' and 'datasets_container'.
- Removed `ProstatePaperBase`.
- Removed ability to perform sub-fold cross validation. The parameters `number_of_cross_validation_splits_per_fold` and `cross_validation_sub_fold_split_index` have been removed from ScalarModelBase.
- This is the baseline release.