This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

NODE_RANK KeyError on training runs #801

Closed
1 task done
peterhessey opened this issue Sep 13, 2022 · 0 comments
Labels
bug Something isn't working

peterhessey commented Sep 13, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Bug summary

Training runs in InnerEye fail with the error below because of recent changes in AML: the NODE_RANK environment variable is not set when set_environment_variables_for_multi_node reads it. This is fixed in the latest hi-ml release (v0.2.5), so IE-DL needs to be updated to that version.

Code for reproduction

python InnerEye/ML/runner.py --model=Lung --azureml

Actual outcome

Training run fails

Error messages

  File "InnerEye/ML/runner.py", line 466, in <module>
    main()
  File "InnerEye/ML/runner.py", line 460, in main
    run(project_root=fixed_paths.repository_root_directory(),
  File "InnerEye/ML/runner.py", line 456, in run
    return runner.run()
  File "InnerEye/ML/runner.py", line 220, in run
    self.run_in_situ(azure_run_info)
  File "InnerEye/ML/runner.py", line 408, in run_in_situ
    set_environment_variables_for_multi_node()
  File "/mnt/azureml/cr/j/bc3f99f19bb745519fd9272cfd730249/exe/wd/InnerEye/Azure/azure_runner.py", line 313, in set_environment_variables_for_multi_node
    env_vars = ", ".join(f"{var} = {os.environ[var]}" for var in [ENV_MASTER_ADDR, ENV_MASTER_PORT, ENV_NODE_RANK])
  File "/mnt/azureml/cr/j/bc3f99f19bb745519fd9272cfd730249/exe/wd/InnerEye/Azure/azure_runner.py", line 313, in <genexpr>
    env_vars = ", ".join(f"{var} = {os.environ[var]}" for var in [ENV_MASTER_ADDR, ENV_MASTER_PORT, ENV_NODE_RANK])
  File "/azureml-envs/azureml_e12c14b51edf42f47eec39c741162949/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'NODE_RANK'
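
The failing line indexes os.environ directly, so any variable that AML does not set raises KeyError before the message can even be logged. As a minimal sketch only (this is not the fix that shipped in hi-ml), the same message can be built with a lookup that tolerates missing variables. The helper name describe_env_vars and the string values "MASTER_ADDR" and "MASTER_PORT" are assumptions for illustration; only "NODE_RANK" is confirmed by the traceback.

```python
import os
from typing import Iterable, Mapping, Optional

# "NODE_RANK" is the key reported in the KeyError above.
# The other two values are assumed; the traceback shows only the constant names.
ENV_MASTER_ADDR = "MASTER_ADDR"
ENV_MASTER_PORT = "MASTER_PORT"
ENV_NODE_RANK = "NODE_RANK"


def describe_env_vars(names: Iterable[str],
                      environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: format 'NAME = value' pairs without raising
    KeyError when a variable is absent from the environment."""
    env = os.environ if environ is None else environ
    # Mapping.get returns the placeholder instead of raising for missing keys.
    return ", ".join(f"{name} = {env.get(name, '<not set>')}" for name in names)


if __name__ == "__main__":
    print(describe_env_vars([ENV_MASTER_ADDR, ENV_MASTER_PORT, ENV_NODE_RANK]))
```

This only keeps the logging line from crashing; whether training can proceed without NODE_RANK depends on how the multi-node setup uses it, which the traceback does not show.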

Expected outcome

Successful training run

System info

No response

AB#7305
