Cannot continue previous training in SLURM #2367

Open
sabinevater opened this issue Jul 11, 2024 · 0 comments

Dear nnUNet team,

I trained a model on a cluster, but the training was aborted due to time constraints. I wanted to continue it by adding the --c flag to my SLURM script:

CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 017 2d 0 --npz --c &
...
wait

I changed nothing in the script except adding this flag to each line, and I still get the following error:

#########################

/slurm_script: line 15: cd: /work/user: No such file or directory
Traceback (most recent call last):
  File "/home/user/nnunet/bin/nnUNetv2_train", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
  File "/home/user/nnUNet/nnunetv2/run/run_training.py", line 275, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/user/nnUNet/nnunetv2/run/run_training.py", line 196, in run_training
    nnunet_trainer = get_trainer_from_args(dataset_name_or_id, configuration, fold, trainer_class_name,
  File "/home/user/nnUNet/nnunetv2/run/run_training.py", line 62, in get_trainer_from_args
    preprocessed_dataset_folder_base = join(nnUNet_preprocessed, maybe_convert_to_dataset_name(dataset_name_or_id))
  File "/home/user/nnUNet/nnunetv2/utilities/dataset_name_id_conversion.py", line 74, in maybe_convert_to_dataset_name
    return convert_id_to_dataset_name(dataset_name_or_id)
  File "/home/user/nnUNet/nnunetv2/utilities/dataset_name_id_conversion.py", line 48, in convert_id_to_dataset_name
    raise RuntimeError(f"Could not find a dataset with the ID {dataset_id}. Make sure the requested dataset ID "
RuntimeError: Could not find a dataset with the ID 17. Make sure the requested dataset ID exists and that nnU-Net knows where raw and preprocessed data are located (see Documentation - Installation). Here are$nnUNet_preprocessed=/work/user/data/nnUNet_preprocessed
nnUNet_results=/work/user/data/nnUNet_results
nnUNet_raw=/work/user/data/nnUNet_raw
If something is not right, adapt your environment variables.

############################
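
Both the `cd` failure on the first line and the RuntimeError involve paths under /work/user, so one thing worth ruling out is whether those directories are visible from the node the job runs on. Below is a minimal sketch of such a check. `check_nnunet_paths` is a hypothetical helper, not part of nnU-Net, and the `Dataset<3-digit ID>_` folder prefix is an assumption about how nnU-Net v2 names its dataset folders.

```python
import os
from pathlib import Path

# The three environment variables named in the error message above.
NNUNET_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def check_nnunet_paths(dataset_id, env=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for debugging only, not an nnU-Net function.
    """
    env = os.environ if env is None else env
    problems = []

    # Each variable must be set and must point to a directory that exists
    # on the machine where this code runs (e.g. the compute node).
    for name in NNUNET_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name}={value} is not a directory on this node")

    # Assumption: preprocessed datasets live in folders named
    # Dataset<3-digit ID>_<name>, e.g. Dataset017_<name> for ID 17.
    prep = env.get("nnUNet_preprocessed")
    if prep and Path(prep).is_dir():
        prefix = f"Dataset{int(dataset_id):03d}_"
        if not any(p.name.startswith(prefix) for p in Path(prep).iterdir()):
            problems.append(f"no folder starting with {prefix} in {prep}")

    return problems


if __name__ == "__main__":
    for line in check_nnunet_paths(17) or ["all nnU-Net paths look fine"]:
        print(line)
```

Running this from inside the SLURM script, just before the nnUNetv2_train lines, and once more from the interactive shell where the command works, should show whether the two environments see the same directories.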

The script worked fine before I added --c. Also, if I run the command directly on the command line (just CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 017 2d 0 --npz --c), it runs without problems.

Does anybody know how to fix this? Any tips would be much appreciated.

Kind regards!
