Allow recovery from training runs on a local machine. #178

Shruthi42 · 2020-08-25T12:42:09Z

Right now, there's no easy way to recover training or run inference from a training run that was executed on a local machine.

- Adds a parameter `weights_url` to DeepLearningConfig to download model weights from a URL.
- Adds a parameter `local_weights_path` to DeepLearningConfig to initialize model weights from a local checkpoint. This can also be used to perform inference on a checkpoint from a local training run.
- Refactors all checkpoint logic, including recovering from run_recovery, into a class CheckpointHandler.
- Adds a parameter `epochs_to_test` to DeepLearningConfig which can be used to specify a list of epochs to test in a training/inference run.
- Deprecates DeepLearningConfig parameters `test_diff_epochs`, `test_step_epochs` and `test_start_epoch`.

Closes #178
Closes #297
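
To make the intent of the new parameters concrete, here is a minimal sketch of how a weights source and a list of test epochs might be resolved. It is not the code from the PR: `WeightsOptions`, `resolve_weights_source` and `epochs_for_testing` are hypothetical names, and both the "only one source at a time" rule and the "test the last epoch by default" rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class WeightsOptions:
    """Hypothetical stand-in for the three new DeepLearningConfig parameters."""
    weights_url: Optional[str] = None
    local_weights_path: Optional[Path] = None
    epochs_to_test: Optional[List[int]] = None


def resolve_weights_source(options: WeightsOptions) -> Tuple[str, Optional[str]]:
    """Return (kind, location) describing where initial weights come from."""
    # Assumption: a URL and a local path cannot both be given.
    if options.weights_url and options.local_weights_path:
        raise ValueError("Set only one of weights_url and local_weights_path.")
    if options.local_weights_path:
        return "local", str(options.local_weights_path)
    if options.weights_url:
        return "url", options.weights_url
    # Neither parameter set: no external weights are loaded.
    return "none", None


def epochs_for_testing(options: WeightsOptions, num_epochs: int) -> List[int]:
    """Return the sorted, de-duplicated epochs to test, limited to 1..num_epochs."""
    # Assumption: with no explicit list, only the final epoch is tested.
    if options.epochs_to_test is None:
        return [num_epochs]
    return sorted(e for e in set(options.epochs_to_test) if 1 <= e <= num_epochs)
```

For example, `WeightsOptions(local_weights_path=Path("best.ckpt"), epochs_to_test=[5, 10])` would resolve to a local checkpoint and test epochs 5 and 10, which roughly corresponds to the "inference on a checkpoint from a local training run" case described above.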

Shruthi42 mentioned this issue Nov 2, 2020

Load model weights from URL or local checkpoint #282

Merged

Shruthi42 closed this as completed in #282 Nov 3, 2020
