
Training is done, but it was interrupted by an error during prediction; how should I restart it? #2319

Closed
WanzeeCho opened this issue Jun 26, 2024 · 1 comment

@WanzeeCho

2024-06-26 00:09:52.543149: Epoch 999
2024-06-26 00:09:52.544152: Current learning rate: 2e-05
2024-06-26 00:11:41.186138: train_loss -0.6875
2024-06-26 00:11:41.186829: val_loss -0.646
2024-06-26 00:11:41.186978: Pseudo dice [0.9203, 0.7646]
2024-06-26 00:11:41.187284: Epoch time: 108.65 s
2024-06-26 00:11:48.493574: Training done.
2024-06-26 00:11:48.587392: Using splits from existing split file: /home/mip/disk3/WZ_Code/nnUNetv2/nnUNet_preprocessed/Dataset903_fakeCeT1_EMA_student_1/splits_final.json
2024-06-26 00:11:48.589633: The split file contains 5 splits.
2024-06-26 00:11:48.589791: Desired fold for training: 2
2024-06-26 00:11:48.589883: This split has 336 training and 84 validation cases.
2024-06-26 00:11:48.591720: predicting crossmoda2021_ldn_100_hrT2_qsa
2024-06-26 00:11:48.600965: crossmoda2021_ldn_100_hrT2_qsa, shape torch.Size([1, 120, 448, 448]), rank 0

This is the end of the log file.

Then an error occurred:

File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-9dc572, line 5; fatal : Unsupported .version 8.2; current version is '8.1'
ptxas fatal : Ptx assembly aborted due to errors

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
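
For reference, one way to apply that fallback when re-running is to disable compilation before launching nnUNet. This is only a sketch: it assumes PyTorch honours the TORCHDYNAMO_DISABLE environment variable and that nnUNet v2 checks an nnUNet_compile environment variable before calling torch.compile.

```bash
# Sketch: fall back to eager mode so Inductor/Triton is never invoked.
# Assumptions: PyTorch respects TORCHDYNAMO_DISABLE, and nnUNet v2 reads
# the nnUNet_compile environment variable to decide whether to compile.
export TORCHDYNAMO_DISABLE=1   # disable torch.compile/dynamo globally
export nnUNet_compile=f        # ask nnUNet not to compile the network
```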

What should I do to run this to completion? Do I have to retrain fold 2?


WanzeeCho commented Jul 11, 2024

Maybe the --c option might help
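
A sketch of the relevant commands (the 3d_fullres configuration name is an assumption here; substitute whatever configuration was actually trained):

```bash
# --c resumes an interrupted training run from the latest checkpoint.
nnUNetv2_train Dataset903_fakeCeT1_EMA_student_1 3d_fullres 2 --c

# Since training itself finished (epoch 999), --val should re-run only the
# validation/prediction step instead of retraining fold 2.
nnUNetv2_train Dataset903_fakeCeT1_EMA_student_1 3d_fullres 2 --val
```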

no offense but useless issue report
