
Training is done, but it was interrupted by an error during prediction; how should I restart it? #2319

Closed
WanzeeCho opened this issue Jun 26, 2024 · 1 comment

@WanzeeCho

2024-06-26 00:09:52.543149: Epoch 999
2024-06-26 00:09:52.544152: Current learning rate: 2e-05
2024-06-26 00:11:41.186138: train_loss -0.6875
2024-06-26 00:11:41.186829: val_loss -0.646
2024-06-26 00:11:41.186978: Pseudo dice [0.9203, 0.7646]
2024-06-26 00:11:41.187284: Epoch time: 108.65 s
2024-06-26 00:11:48.493574: Training done.
2024-06-26 00:11:48.587392: Using splits from existing split file: /home/mip/disk3/WZ_Code/nnUNetv2/nnUNet_preprocessed/Dataset903_fakeCeT1_EMA_student_1/splits_final.json
2024-06-26 00:11:48.589633: The split file contains 5 splits.
2024-06-26 00:11:48.589791: Desired fold for training: 2
2024-06-26 00:11:48.589883: This split has 336 training and 84 validation cases.
2024-06-26 00:11:48.591720: predicting crossmoda2021_ldn_100_hrT2_qsa
2024-06-26 00:11:48.600965: crossmoda2021_ldn_100_hrT2_qsa, shape torch.Size([1, 120, 448, 448]), rank 0

This is the end of the log file.

Then an error occurred:

File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-9dc572, line 5; fatal : Unsupported .version 8.2; current version is '8.1'
ptxas fatal : Ptx assembly aborted due to errors

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
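
For reference, one way to apply that fallback when re-running is to disable compilation before launching nnUNet. This is only a sketch: it assumes PyTorch honours the TORCHDYNAMO_DISABLE environment variable and that nnUNet v2 checks an nnUNet_compile environment variable before calling torch.compile.

```bash
# Sketch: fall back to eager mode so Inductor/Triton is never invoked.
# Assumptions: PyTorch respects TORCHDYNAMO_DISABLE, and nnUNet v2 reads
# the nnUNet_compile environment variable to decide whether to compile.
export TORCHDYNAMO_DISABLE=1   # disable torch.compile/dynamo globally
export nnUNet_compile=f        # ask nnUNet not to compile the network
```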

What should I do to run this to completion? Do I have to retrain fold 2?


WanzeeCho commented Jul 11, 2024

Maybe the --c option might help
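
A sketch of the relevant commands (the 3d_fullres configuration name is an assumption here; substitute whatever configuration was actually trained):

```bash
# --c resumes an interrupted training run from the latest checkpoint.
nnUNetv2_train Dataset903_fakeCeT1_EMA_student_1 3d_fullres 2 --c

# Since training itself finished (epoch 999), --val should re-run only the
# validation/prediction step instead of retraining fold 2.
nnUNetv2_train Dataset903_fakeCeT1_EMA_student_1 3d_fullres 2 --val
```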

no offense but useless issue report
