This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Allow recovery from training runs on a local machine. #178

Closed
Shruthi42 opened this issue Aug 25, 2020 · 0 comments · Fixed by #282

Comments

@Shruthi42
Contributor

Right now, there's no easy way to recover training from, or run inference on, a checkpoint produced by a local run.

Shruthi42 added a commit that referenced this issue Nov 3, 2020
- Adds a parameter `weights_url` to DeepLearningConfig to download model weights from a URL.
- Adds a parameter `local_weights_path` to DeepLearningConfig to initialize model weights from a local checkpoint. This can also be used to perform inference on a checkpoint from a local training run.
- Refactors all checkpoint logic, including recovery from run_recovery, into a class CheckpointHandler.
- Adds a parameter `epochs_to_test` to DeepLearningConfig which can be used to specify a list of epochs to test in a training/inference run.
- Deprecates DeepLearningConfig parameters `test_diff_epochs`, `test_step_epochs` and `test_start_epoch`.

Closes #178 
Closes #297
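
As a rough illustration of how the three new parameters in the commit above might fit together, here is a minimal sketch. It does not use the real DeepLearningConfig or CheckpointHandler: `SketchConfig`, `resolve_weights_source` and `resolve_test_epochs` are hypothetical names, and the rules that the two weight sources are mutually exclusive and that testing defaults to the final epoch are assumptions, not confirmed behaviour.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for DeepLearningConfig, holding only the new fields."""
    num_epochs: int
    weights_url: Optional[str] = None
    local_weights_path: Optional[Path] = None
    epochs_to_test: Optional[List[int]] = None


def resolve_weights_source(config: SketchConfig) -> Optional[str]:
    """Describe where initial weights would come from, or None if neither is set."""
    # Assumption: a URL and a local checkpoint cannot both be given.
    if config.weights_url and config.local_weights_path:
        raise ValueError("Set only one of weights_url and local_weights_path.")
    if config.weights_url:
        return f"url:{config.weights_url}"
    if config.local_weights_path:
        return f"local:{config.local_weights_path}"
    return None


def resolve_test_epochs(config: SketchConfig) -> List[int]:
    """Return the sorted, de-duplicated list of epochs to test."""
    # Assumption: if no list is given, only the final epoch is tested.
    epochs = config.epochs_to_test or [config.num_epochs]
    invalid = [e for e in epochs if not 1 <= e <= config.num_epochs]
    if invalid:
        raise ValueError(f"Epochs out of range: {invalid}")
    return sorted(set(epochs))
```

With this sketch, running inference on a checkpoint from a local training run would mean setting `local_weights_path` to that checkpoint file and listing the epochs of interest in `epochs_to_test`.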