Enable building an ensemble model from the cross validation checkpoints of a BYO Lightning model #529
Conversation
…dels. But only the check, not the inference loop yet.
Trying to work out how checkpoints are passed between training and inference (or where they should be) by tracing through the new model in run_ml.
Into two for unit testing, with ensemble from xval checkpoints.
I do not know why the instantiated model_config cannot live in deep_learning_config
Need to fix documentation and check it works for real on AML
This is a good first step, but I feel the interface is not yet as easy as it should be. Anybody who wants to use ensemble models now has to define two models - and I think that's simply not going to fly. Can you think of a simpler solution?
As for testing - you have added a lot of complicated switching logic, that I was not fully able to digest. Is there a way to test that too?
@@ -250,7 +251,8 @@ def __init__(self) -> None:
# This method must be overridden by any subclass of LightningContainer. It returns the model that you wish to
# train, as a LightningModule
def create_model(self) -> LightningModule:
    return HelloRegression()
    self._model = HelloRegression()  # TODO: why does LightningContainer need all three, _model, model, and create_model?
Not sure I get the question? This method here should only return a freshly generated model, the _model properties are populated elsewhere.
ensemble_model_name: str = param.String(
    doc=("The class name of the ensemble model to build from the cross validation checkpoints of a Lightning model."))
I feel that this extra requirement is a key bottleneck in your design. Anybody who wants to implement an ensemble model now needs to define the model itself (ensemble member), and then another class that is the ensemble logic. For the other models, we have been able to achieve that without extra overhead - for example, we can define a Prostate model, and get Prostate ensemble models for free.
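One way to drop the two-class requirement could be a generic wrapper that is built automatically from the cross validation checkpoints, so users only ever define the member model. The sketch below is a hypothetical illustration, not the repository's API: the `GenericEnsemble` name and its mean-averaging strategy are assumptions.

```python
import torch
from torch import nn


class GenericEnsemble(nn.Module):
    """Hypothetical sketch: wraps trained copies of a single member model and
    averages their outputs, so no separate ensemble class is needed."""

    def __init__(self, members):
        super().__init__()
        self.members = nn.ModuleList(members)
        for member in self.members:
            member.eval()  # ensemble members are inference-only

    @torch.no_grad()
    def forward(self, *args, **kwargs):
        # Run every member on the same inputs and average the predictions.
        outputs = [member(*args, **kwargs) for member in self.members]
        return torch.stack(outputs).mean(dim=0)
```

With a wrapper like this, a "Prostate ensemble for free" style could be kept: the framework instantiates one member per cross validation fold and hands back the wrapper.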
def __init__(self, outputs_folder: Optional[Path] = None) -> None:
    """
    Sets up the list of models that forms the ensemble. These models should inherit from both InnerEyeInference and
    LightningModule, but since mypy does not have support for specifying intersection types we specify just one,
I don't get this?
"""
for model in self.ensemble_models:
    assert isinstance(model, LightningModule)  # mypy
    model.eval()
Calling model.eval() is critical. If users forget to call the superclass method, they will run their model in a wrong way. Can we bake that into local_checkpoint?
self.model_config_loader = model_config_loader
self.ensemble_model: Optional[InnerEyeEnsembleInference] = None
If you find a way of not requiring a separate class for ensemble models, this would become obsolete, and hence simplify the code.
# 0, then wait for the sibling runs, build the ensemble model, and write a report for that.
if not self.is_offline_run and PARENT_RUN_CONTEXT is not None:
    sibling_runs_checkpoint_handler = self.wait_and_collect_sibling_runs_if_required()
    logging.info("DEBUGGING: about to create_ensemble_model_and_run_inference_from_lightningmodule_checkpoints")
use logging.debug instead?
:param lightning_model: The LightningContainer to be used.
:param checkpoint_paths: The path to the checkpoint that should be used for inference.
"""
lightning_model = lightning_container.create_model()
I don't think you should create the model again here - it should already be accessible via lightning_container.model?
# Register the model, and then run inference as required. No models should be registered when running outside
# AzureML.
if not self.is_offline_run:
    if self.should_register_model():
        self.register_model(checkpoint_paths, ModelProcessing.ENSEMBLE_CREATION)
This fragment also exists somewhere else?
Replaced by #549
Closing #527
The method MLRunner.run_inference_for_lightning_models takes a list of checkpoint paths as an argument, but then asserts that only one of them is used.
We want to change this so that the checkpoints gleaned from a BYO Lightning cross validation run can be used as an ensemble model.
AB#4219