What exactly is saved in '.../checkpoints/last.ckpt' when training is done for a segmentation task? #744

davidkvcs · 2022-06-17T14:41:26Z

Is there an existing issue for this?

I have searched the existing issues

Issue summary

When my segmentation task training is done, I am left with a single file containing weights, "last.ckpt", after 120 epochs using the HeadAndNeckBase class.

I used the "Building Models" guide, which says that report.ipynb uses the model checkpoint that performed best on the validation set, but it doesn't clarify whether the finally saved "last.ckpt" holds the weights of the last epoch or of the best-performing checkpoint in that training session.

What documentation should be provided?

I would appreciate it if you could clarify exactly what is saved when segmentation model training is done, so there is no doubt whether I am pointing to the correct file when testing an existing model using python InnerEye/ML/runner.py --model=Prostate --no-train --local_weights_path=path_to_your_checkpoint.

Suggestion: Maybe rename the finally saved model to "best.ckpt" or "final.ckpt" to avoid confusion, or document clearly in the section "Training a new model" whether the user is required to define what is finally saved and how it is named.

AB#6248

ant0nsc · 2022-06-21T09:29:48Z

Hi @davidkvcs, thanks for bringing that up.
In an earlier version of the InnerEye toolbox, we had code that renamed the last.ckpt to best.ckpt, because effectively we did not use any sophisticated logic to choose the checkpoint. The documentation in Building Models is a reflection of that old way of doing things.
The current code uses last.ckpt to store the weights of the last training epoch, and uses that for inference too. It is the checkpoint that should be used for subsequent inference runs as you are suggesting.

We'll keep this issue open for now until we have a documentation update. I hope that my answer helps in the meantime.
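To double-check which epoch a given last.ckpt comes from, one option is to open the file and look at its metadata. The sketch below is not part of the InnerEye toolbox. It assumes the file is a standard PyTorch Lightning checkpoint, i.e. a dictionary with "epoch", "global_step" and "state_dict" entries; the helper names summarize_checkpoint and load_and_summarize are hypothetical.

```python
from typing import Any, Dict


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the bookkeeping fields out of an already-loaded checkpoint dict.

    Assumes the Lightning-style layout ("epoch", "global_step", "state_dict").
    Missing keys are reported as None or 0 instead of raising.
    """
    state_dict = ckpt.get("state_dict") or {}
    return {
        "epoch": ckpt.get("epoch"),
        "global_step": ckpt.get("global_step"),
        "num_tensors": len(state_dict),
    }


def load_and_summarize(path: str) -> Dict[str, Any]:
    """Load a checkpoint file on the CPU and summarize it (hypothetical helper)."""
    import torch  # imported lazily so summarize_checkpoint works without torch

    # Recent PyTorch releases default to weights_only=True, which can reject
    # checkpoints that store extra Python objects. For a file you trust, you
    # may need to pass weights_only=False here.
    ckpt = torch.load(path, map_location="cpu")
    return summarize_checkpoint(ckpt)
```

Calling load_and_summarize on your checkpoint path should report the stored epoch counter. Whether that counter is zero-based or one-based depends on the Lightning version, so a 120-epoch run may show either 119 or 120.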

davidkvcs added the documentation label Jun 17, 2022

ant0nsc mentioned this issue Jun 21, 2022

DOC: Improve doc for checkpointing #747

Merged

peterhessey assigned ant0nsc Jun 21, 2022

ant0nsc closed this as completed in #747 Jun 21, 2022
