Prevent infinite recursion when DS_ACCELERATOR is set to cuda (microsoft#4962)

When DS_ACCELERATOR is overridden to CUDA, `get_accelerator` calls
`is_current_accelerator_supported` to validate the value. That function
calls `get_accelerator` again, and because `ds_accelerator` has not been
initialized yet, the two functions call each other until DeepSpeed hits
the recursion limit.

```
    elif is_current_accelerator_supported():
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 48, in is_current_accelerator_supported
    return get_accelerator().device_name() in SUPPORTED_ACCELERATOR_LIST
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 101, in get_accelerator
    elif is_current_accelerator_supported():
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 48, in is_current_accelerator_supported
    return get_accelerator().device_name() in SUPPORTED_ACCELERATOR_LIST
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 101, in get_accelerator
    elif is_current_accelerator_supported():
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 48, in is_current_accelerator_supported
    return get_accelerator().device_name() in SUPPORTED_ACCELERATOR_LIST
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 101, in get_accelerator
    elif is_current_accelerator_supported():
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 48, in is_current_accelerator_supported
    return get_accelerator().device_name() in SUPPORTED_ACCELERATOR_LIST
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/accelerator/real_accelerator.py", line 59, in get_accelerator
    if "DS_ACCELERATOR" in os.environ.keys():
  File "/usr/lib/python3.8/_collections_abc.py", line 717, in __contains__
    return key in self._mapping
  File "/usr/lib/python3.8/_collections_abc.py", line 666, in __contains__
    self[key]
  File "/usr/lib/python3.8/os.py", line 672, in __getitem__
    value = self._data[self.encodekey(key)]
RecursionError: maximum recursion depth exceeded
```

This change fixes that by comparing the accelerator directly with the
supported list of accelerators.
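
The cycle and the fix can be reproduced in isolation. The following is a minimal sketch, not DeepSpeed's actual code: `StubAccelerator`, `reset_accelerator`, the `_buggy`/`_fixed` suffixes, the `"cpu"` default, and the two-entry supported list are all assumptions made for illustration. Only the membership check in `get_accelerator_fixed` mirrors the changed line.

```python
import os

# Illustrative subset; the real SUPPORTED_ACCELERATOR_LIST in DeepSpeed is longer.
SUPPORTED_ACCELERATOR_LIST = ["cuda", "cpu"]

ds_accelerator = None  # cached accelerator; stays None until a lookup succeeds


class StubAccelerator:
    """Hypothetical stand-in for a real accelerator object."""

    def __init__(self, name):
        self._name = name

    def device_name(self):
        return self._name


def reset_accelerator():
    """Hypothetical helper: clear the cache so each lookup starts fresh."""
    global ds_accelerator
    ds_accelerator = None


def is_current_accelerator_supported():
    # Re-enters get_accelerator_buggy() while ds_accelerator is still None.
    return get_accelerator_buggy().device_name() in SUPPORTED_ACCELERATOR_LIST


def get_accelerator_buggy():
    global ds_accelerator
    if ds_accelerator is not None:
        return ds_accelerator
    accelerator_name = os.environ.get("DS_ACCELERATOR", "cpu")
    if accelerator_name == "cpu":
        pass  # stands in for the per-accelerator import checks
    elif is_current_accelerator_supported():  # never returns: the cache is unset
        raise ValueError(f'Value "{accelerator_name}" is not supported')
    ds_accelerator = StubAccelerator(accelerator_name)
    return ds_accelerator


def get_accelerator_fixed():
    global ds_accelerator
    if ds_accelerator is not None:
        return ds_accelerator
    accelerator_name = os.environ.get("DS_ACCELERATOR", "cpu")
    if accelerator_name == "cpu":
        pass  # stands in for the per-accelerator import checks
    elif accelerator_name not in SUPPORTED_ACCELERATOR_LIST:  # plain membership test
        raise ValueError(f'DS_ACCELERATOR must be one of {SUPPORTED_ACCELERATOR_LIST}. '
                         f'Value "{accelerator_name}" is not supported')
    ds_accelerator = StubAccelerator(accelerator_name)
    return ds_accelerator
```

With `DS_ACCELERATOR=cuda`, `get_accelerator_buggy()` raises `RecursionError` because nothing sets the cache before the re-entrant call. `get_accelerator_fixed()` only inspects the string from the environment, so it cannot re-enter itself.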
ShukantPal authored and amaurya committed Feb 17, 2024
1 parent a308656 commit afa8921
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion accelerator/real_accelerator.py
@@ -98,7 +98,7 @@ def get_accelerator():
             except ImportError as e:
                 raise ValueError(
                     f"HPU_Accelerator requires habana_frameworks.torch.hpu, which is not installed on this system.")
-        elif is_current_accelerator_supported():
+        elif accelerator_name not in SUPPORTED_ACCELERATOR_LIST:
             raise ValueError(f'DS_ACCELERATOR must be one of {SUPPORTED_ACCELERATOR_LIST}. '
                              f'Value "{accelerator_name}" is not supported')
         ds_set_method = "override"
