
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED #12

Closed

TheFiZi opened this issue Feb 22, 2023 · 5 comments

Comments

TheFiZi commented Feb 22, 2023

This happens randomly when generating a level.

Using the prompts: no blocks, no pipes, many goombas, fireball

shape: torch.Size([1, 678]), torch.Size([1, 1393]) first: 56, last: 51:  99%|██████████████████████████████████████████████████████████████████▌| 1392/1400 [02:58<00:01,  7.82it/s]
Traceback (most recent call last):
  File "/home/me/apps/mariogpt/capturePlay.py", line 38, in <module>
    generated_level = mario_lm.sample(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/lm/gpt.py", line 54, in sample
    return sampler(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 248, in __call__
    return self.sample(*args, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 223, in sample
    next_tokens, encoder_hidden_states = self.step(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 158, in step
    out = self.mario_lm.lm(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1043, in forward
    transformer_outputs = self.transformer(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 887, in forward
    outputs = block(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 388, in forward
    attn_outputs = self.attn(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 329, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 216, in _attn
    attn_output = torch.matmul(attn_weights, value)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

I'm using:

generated_level = mario_lm.sample(
    prompts=prompts,
    num_steps=1400,
    #num_steps=100,
    temperature=2.0,
    use_tqdm=True
)

It feels like this was happening less frequently in 0.1.2. I just upgraded to 0.1.3.
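
For context: CUDA kernel launches are asynchronous, so the frame a CUDA error is reported from is not necessarily where the failure actually happened. A minimal sketch of forcing synchronous launches to get a trustworthy traceback; the import path matches the traceback above, but the constructor arguments and prompt format are assumptions, not taken from this issue:

import os

# CUDA errors surface asynchronously; forcing synchronous kernel launches
# makes the Python traceback point at the op that actually failed.
# Must be set before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from mario_gpt.lm import MarioLM  # import path assumed from the traceback above

mario_lm = MarioLM()  # constructor arguments assumed
generated_level = mario_lm.sample(
    prompts=["no blocks, no pipes, many goombas, fireball"],  # prompt wording from the report
    num_steps=1400,
    temperature=2.0,
    use_tqdm=True,
)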

TheFiZi commented Feb 22, 2023

It just dawned on me: I wonder if these are happening because the generation process sometimes goes over the amount of memory available on my GPU? I'm just using a dinky Quadro P620 right now, which only has 2GB of VRAM.
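
One way to test that hypothesis is to compare the card's total memory against what PyTorch is actually holding. A minimal sketch using standard torch.cuda calls (device index 0 assumed):

import torch

def report_gpu_memory(device: int = 0) -> None:
    gib = 1024 ** 3
    # Total memory physically on the card.
    total = torch.cuda.get_device_properties(device).total_memory / gib
    # Memory currently held by live tensors.
    allocated = torch.cuda.memory_allocated(device) / gib
    # Memory reserved by PyTorch's caching allocator (>= allocated).
    reserved = torch.cuda.memory_reserved(device) / gib
    print(f"total={total:.2f} GiB, allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

report_gpu_memory()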

shyamsn97 (Owner) commented:

Not sure, actually. What torch version are you using? Maybe an upgrade is needed.
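
For anyone landing here with the same question, a quick snippet that reports the installed torch/CUDA pairing:

import torch

print(torch.__version__)              # torch build, e.g. "1.13.1+cu117"
print(torch.version.cuda)             # CUDA toolkit torch was built against
print(torch.cuda.is_available())      # whether a usable GPU was detected
print(torch.cuda.get_device_name(0))  # e.g. "Quadro P620"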

TheFiZi commented Feb 22, 2023

> Not sure, actually. What torch version are you using? Maybe an upgrade is needed.

Same response as #13 (comment) :)

shyamsn97 (Owner) commented:

Yeah, I find it strange that it's happening at a random point in the generation. Seems like it could be some weird CUDA issue lol

TheFiZi commented Feb 24, 2023

I'm going to close this off as a not-enough-memory issue. I ran the default generation example and it peaked at ~6GB of VRAM.

The Quadro I was running it on only has 2GB.
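
A sketch of how a peak figure like that ~6GB can be measured with PyTorch's peak-memory counters (mario_lm and prompts as defined earlier in this thread):

import torch

# Clear the high-water mark before the run being measured.
torch.cuda.reset_peak_memory_stats()

generated_level = mario_lm.sample(
    prompts=prompts,
    num_steps=1400,
    temperature=2.0,
    use_tqdm=True,
)

# High-water mark of tensor allocations since the reset, in GiB.
peak = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory: {peak:.2f} GiB")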

TheFiZi closed this as completed Feb 24, 2023