This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

What exactly is saved in '.../checkpoints/last.ckpt' when training is done for a segmentation task? #744

Closed
1 task done
davidkvcs opened this issue Jun 17, 2022 · 1 comment · Fixed by #747
Labels
documentation Improvements or additions to documentation

Comments


davidkvcs commented Jun 17, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Issue summary

When training of my segmentation task finishes (120 epochs, using the HeadAndNeckBase class), I am left with a single file containing weights: "last.ckpt".

I followed the "Building Models" guide, which says that report.ipynb uses the model checkpoint that performed best on the validation set. It does not clarify whether the final "last.ckpt" holds the weights of the last epoch or of the best-performing checkpoint in that training session.

What documentation should be provided?

I would appreciate it if you could clarify exactly what is saved when segmentation model training finishes, so that there is no doubt I am pointing to the correct file when testing an existing model with python InnerEye/ML/runner.py --model=Prostate --no-train --local_weights_path=path_to_your_checkpoint.

Suggestion: consider renaming the final saved model to "best.ckpt" or "final.ckpt" to avoid confusion, or document clearly in the section "Training a new model" whether the user is required to define what is finally saved and how it is named.

AB#6248

@davidkvcs davidkvcs added the documentation Improvements or additions to documentation label Jun 17, 2022
Contributor

ant0nsc commented Jun 21, 2022

Hi @davidkvcs, thanks for bringing that up.
In an earlier version of the InnerEye toolbox, we had code that renamed last.ckpt to best.ckpt, because in practice we did not use any sophisticated logic to choose the checkpoint. The documentation in Building Models reflects that old way of doing things.
The current code uses last.ckpt to store the weights of the last training epoch, and uses that for inference too. It is the checkpoint that should be used for subsequent inference runs as you are suggesting.
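For readers who want to script this, here is a minimal sketch of assembling the inference command around last.ckpt. It only uses the flags quoted in this issue (--model, --no-train, --local_weights_path). The helper name build_inference_command, the script path InnerEye/ML/runner.py, and the example output folder are assumptions made for illustration, not part of the toolbox.

```python
from pathlib import Path
from typing import List


def build_inference_command(checkpoint: Path, model: str) -> List[str]:
    """Build the runner command line that evaluates `model` with `checkpoint`.

    Hypothetical helper: it only checks that the checkpoint file exists and
    has a .ckpt suffix, then assembles the flags quoted in this issue.
    """
    if checkpoint.suffix != ".ckpt":
        raise ValueError(f"Expected a .ckpt file, got: {checkpoint.name}")
    if not checkpoint.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint}")
    return [
        "python",
        "InnerEye/ML/runner.py",  # assumed script location in the repository
        f"--model={model}",
        "--no-train",
        f"--local_weights_path={checkpoint}",
    ]


if __name__ == "__main__":
    # Hypothetical output folder; substitute the folder of your own run.
    ckpt = Path("outputs/my_run/checkpoints/last.ckpt")
    try:
        print(" ".join(build_inference_command(ckpt, "Prostate")))
    except FileNotFoundError as err:
        print(err)
```

The returned list can be passed to subprocess.run from the repository root. Failing early on a missing or misnamed file is a design choice here, so that a typo such as "last.cpkt" is caught before a long inference job starts.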

We'll keep this issue open for now until we have a documentation update. I hope that my answer helps in the meantime.
