Sparse attention map::at triton error #472
Comments
Thanks for bringing this to our attention! If you’re having the same issue with NVIDIA’s repo I recommend opening an issue on DeepSpeed as well, and linking the two. Personally, I have a lot going on right now but I can look into this in a week or so.
Thank you! I've opened an issue on the DeepSpeed repo: microsoft/DeepSpeed#1595. I'll also look into it to see if I can find anything.
Downgrading my Nvidia driver fixed the issue!
sudo apt install nvidia-driver-440
This installed Nvidia driver 460.91.03 (not sure why it's not 440), but it works! I was previously on driver version 495.44.
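For anyone landing here with the same error, a quick way to see which driver you are on is to query nvidia-smi. The snippet below is only a rough sketch: the helper names are made up for illustration, and the two versions it compares against are just the data points reported in this thread (495.44 failed, 460.91.03 worked), not an official compatibility list.

```python
import subprocess

# Data points reported in this thread only; not an official compatibility list.
FAILING_DRIVER = (495, 44)
WORKING_DRIVER = (460, 91, 3)


def parse_driver_version(text):
    """Turn a version string such as '460.91.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_driver_version(out.splitlines()[0])


def describe(version):
    """Relate a driver version to the two versions mentioned in this issue."""
    if version == FAILING_DRIVER:
        return "same driver that produced IndexError: map::at in this issue"
    if version == WORKING_DRIVER:
        return "same driver that worked after the downgrade in this issue"
    return "driver not mentioned in this issue; behaviour unknown"


if __name__ == "__main__":
    try:
        print(describe(installed_driver_version()))
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available on this machine")
```

If it reports the failing driver, the downgrade described above is the only fix confirmed in this thread.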
Describe the bug
When training with sparse attention, Triton throws IndexError: map::at. This is the full traceback:
To Reproduce
I cloned GPT-NeoX into my home directory.
Then I started and attached to a docker container with CUDA 10.2:
docker run --gpus all -ti -d --name gptneox -v ~/:/home pytorch/pytorch:1.8.1-cuda10.2-cudnn7-devel
docker attach gptneox
The container had Python 3.6 installed, so I installed Python 3.8. I also installed libopenmpi-dev, as it's required to install all the GPT-NeoX dependencies:
I created and activated a new Python 3.8 virtual env and then pip installed PyTorch 1.8 and the latest version of Apex (commit aa756cec4359aff3df1d9abb68dc6e6e92920e0c):
Then I installed the GPT-NeoX dependencies:
I downloaded a dataset:
Then I started training without sparse attention, which worked fine:
However, once I added in the sparse attention config, it threw the error mentioned above. This is the command that caused the error:
Expected behavior
Training should run when sparse attention is enabled.
Proposed solution
I'm not sure how to fix this. I did run into this same issue when trying Microsoft's DeepSpeed examples, so this may be an issue inherited from DeepSpeed, rather than something introduced by DeeperSpeed.
Screenshots
N/A
Environment (please complete the following information):
small.yml
sparse.yml
local_setup.yml
NVCC 10.2
Nvidia driver 495.44
gcc 7.5.0
pip freeze
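Since the fix turned out to depend on the driver version, it may help to capture the toolchain versions listed above in one go when reporting a similar problem. This is a minimal sketch, not part of GPT-NeoX or DeepSpeed; the function names are hypothetical, and it assumes gcc, nvcc and nvidia-smi are on the PATH (any tool that is missing is reported as None).

```python
import shutil
import subprocess
import sys


def command_output(command):
    """Return the stripped output of a command, or None if it cannot be run."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError:
        return None
    return (result.stdout or result.stderr).strip() or None


def collect_environment():
    """Gather the version information asked for in the issue template."""
    return {
        "python": sys.version.split()[0],
        "gcc": command_output(["gcc", "--version"]),
        "nvcc": command_output(["nvcc", "--version"]),
        "nvidia_driver": command_output(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
        ),
    }


if __name__ == "__main__":
    for name, value in collect_environment().items():
        print(f"{name}: {value}")
```

The output of pip freeze can be pasted alongside this report to cover the Python packages.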
Additional context
None