
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED #12

Closed

TheFiZi opened this issue Feb 22, 2023 · 5 comments

Comments

TheFiZi commented Feb 22, 2023

This happens randomly when generating a level.

Using the prompts: no blocks, no pipes, many goombas, fireball

shape: torch.Size([1, 678]), torch.Size([1, 1393]) first: 56, last: 51:  99%|██████████████████████████████████████████████████████████████████▌| 1392/1400 [02:58<00:01,  7.82it/s]
Traceback (most recent call last):
  File "/home/me/apps/mariogpt/capturePlay.py", line 38, in <module>
    generated_level = mario_lm.sample(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/lm/gpt.py", line 54, in sample
    return sampler(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 248, in __call__
    return self.sample(*args, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 223, in sample
    next_tokens, encoder_hidden_states = self.step(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 158, in step
    out = self.mario_lm.lm(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1043, in forward
    transformer_outputs = self.transformer(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 887, in forward
    outputs = block(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 388, in forward
    attn_outputs = self.attn(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 329, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 216, in _attn
    attn_output = torch.matmul(attn_weights, value)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

I'm using:

generated_level = mario_lm.sample(
    prompts=prompts,
    num_steps=1400,
    #num_steps=100,
    temperature=2.0,
    use_tqdm=True
)

It feels like this was happening less frequently in 0.1.2. I just upgraded to 0.1.3.
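
For context: CUDA kernel launches are asynchronous, so the frame a CUDA error is reported from is not necessarily where the failure actually happened. A minimal sketch of forcing synchronous launches to get a trustworthy traceback; the import path matches the traceback above, but the constructor arguments and prompt format are assumptions, not taken from this issue:

import os

# CUDA errors surface asynchronously; forcing synchronous kernel launches
# makes the Python traceback point at the op that actually failed.
# Must be set before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from mario_gpt.lm import MarioLM  # import path assumed from the traceback above

mario_lm = MarioLM()  # constructor arguments assumed
generated_level = mario_lm.sample(
    prompts=["no blocks, no pipes, many goombas, fireball"],  # prompt wording from the report
    num_steps=1400,
    temperature=2.0,
    use_tqdm=True,
)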

TheFiZi commented Feb 22, 2023

It just dawned on me: I wonder if these are happening because the generation process sometimes goes over the amount of memory available on my GPU? I'm just using a dinky Quadro P620 right now, which only has 2GB of VRAM.
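
One way to test that hypothesis is to compare the card's total memory against what PyTorch is actually holding. A minimal sketch using standard torch.cuda calls (device index 0 assumed):

import torch

def report_gpu_memory(device: int = 0) -> None:
    gib = 1024 ** 3
    # Total memory physically on the card.
    total = torch.cuda.get_device_properties(device).total_memory / gib
    # Memory currently held by live tensors.
    allocated = torch.cuda.memory_allocated(device) / gib
    # Memory reserved by PyTorch's caching allocator (>= allocated).
    reserved = torch.cuda.memory_reserved(device) / gib
    print(f"total={total:.2f} GiB, allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

report_gpu_memory()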

shyamsn97 (Owner) commented:

Not sure, actually. What torch version are you using? Maybe an upgrade is needed.
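
For anyone landing here with the same question, a quick snippet that reports the installed torch/CUDA pairing:

import torch

print(torch.__version__)              # torch build, e.g. "1.13.1+cu117"
print(torch.version.cuda)             # CUDA toolkit torch was built against
print(torch.cuda.is_available())      # whether a usable GPU was detected
print(torch.cuda.get_device_name(0))  # e.g. "Quadro P620"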

TheFiZi commented Feb 22, 2023

> Not sure, actually. What torch version are you using? Maybe an upgrade is needed.

Same response as #13 (comment) :)

shyamsn97 (Owner) commented:

Yeah, I find it strange that it's happening at a random point in the generation. Seems like it could be some weird CUDA issue lol

TheFiZi commented Feb 24, 2023

I'm going to close this off as a not-enough-memory issue. I ran the default generation example and it peaked at ~6GB of VRAM.

The Quadro I was running it on only has 2GB.
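
A sketch of how a peak figure like that ~6GB can be measured with PyTorch's peak-memory counters (mario_lm and prompts as defined earlier in this thread):

import torch

# Clear the high-water mark before the run being measured.
torch.cuda.reset_peak_memory_stats()

generated_level = mario_lm.sample(
    prompts=prompts,
    num_steps=1400,
    temperature=2.0,
    use_tqdm=True,
)

# High-water mark of tensor allocations since the reset, in GiB.
peak = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory: {peak:.2f} GiB")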

TheFiZi closed this as completed Feb 24, 2023