
Error on interactive generation #555

Closed
tonigi opened this issue Feb 15, 2022 · 10 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@tonigi

tonigi commented Feb 15, 2022

Describe the bug
Setting "text-gen-type": "interactive" results in IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [3]. Other generation types work.

To Reproduce
Steps to reproduce the behavior:

  1. Install, adapt 20B to local environment, add "text-gen-type": "interactive" config
  2. Run inference
  3. Enter arbitrary prompt when requested
  4. See error

Expected behavior
Should work like non-interactive mode.

Environment (please complete the following information):

  • GPUs: 4xV100
  • Configs: 20B + "pipe-parallel-size": 1 + "text-gen-type": "interactive"

Additional context
Using ppc64le, so some libraries are not exactly as pinned. Please ignore the issue if it does not occur on more common platforms.
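For context, the failing index operation can be reproduced in isolation. This is a minimal NumPy sketch, not the actual gpt-neox code; the shapes are illustrative stand-ins for the "[4], [3]" mismatch in the report:

```python
import numpy as np

# Illustrative shapes only: 4 logit rows on this rank but 3 target
# tokens, mirroring the "[4], [3]" mismatch from the report.
logits_2d = np.zeros((4, 8))                 # [rows, partition vocab]
arange_1d = np.arange(logits_2d.shape[0])    # shape [4]
masked_target_1d = np.array([5, 7, 0])       # shape [3] -- too short

try:
    predicted_logits_1d = logits_2d[arange_1d, masked_target_1d]
except IndexError as err:
    # NumPy, like torch, refuses to broadcast integer index arrays
    # of incompatible lengths, raising a "shape mismatch" IndexError.
    print(type(err).__name__, err)
```

The two index arrays must have broadcast-compatible shapes; when the row index and the target index disagree in length, the lookup fails before any values are read.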

@tonigi tonigi added the bug Something isn't working label Feb 15, 2022
@StellaAthena
Member

Thank you for the bug report. Can you check that this wasn’t inadvertently caused by #539?

@tonigi
Author

tonigi commented Feb 16, 2022

Uhm no, I must be misunderstanding something. In both main and 2189a4f, interactive seems to work only with 3-word prompts.

@Adrian-1234

> Uhm no, I'm misunderstanding something. In both main and 2189a4f, interactive seems to only work with 3-word prompts.

Ditto for me also.

@StellaAthena StellaAthena added the good first issue Good for newcomers label Mar 22, 2022
@slash-under
Contributor

Traceback (most recent call last):
  File "generate.py", line 88, in <module>
    main()
  File "generate.py", line 71, in main
    generate_samples_interactive(
  File "/workspace/gpt-neox/megatron/text_generation_utils.py", line 745, in generate_samples_interactive
    for (
  File "/workspace/gpt-neox/megatron/text_generation_utils.py", line 311, in stream_tokens
    logits = forward_model(model, model_inputs, neox_args.is_pipe_parallel)
  File "/workspace/gpt-neox/megatron/text_generation_utils.py", line 155, in forward_model
    loss, logits = model.eval_batch(model_inputs, return_logits=True)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 394, in eval_batch
    self._exec_schedule(sched)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1308, in _exec_schedule
    self._exec_instr(**cmd.kwargs)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 700, in _exec_forward_pass
    self.loss = self.loss_model(outputs, labels)
  File "/workspace/gpt-neox/megatron/model/gpt2_model.py", line 67, in cross_entropy
    losses = mpu.vocab_parallel_cross_entropy(output.float().contiguous(), labels)
  File "/workspace/gpt-neox/megatron/mpu/cross_entropy.py", line 114, in vocab_parallel_cross_entropy
    return _VocabParallelCrossEntropy.apply(vocab_parallel_logits, target)
  File "/workspace/gpt-neox/megatron/mpu/cross_entropy.py", line 60, in forward
    predicted_logits_1d = logits_2d[arange_1d, masked_target_1d]
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [27], [3]

@Adrian-1234

Newbie alert! A quick test of input-file and interactive runs across 8 GPUs:

cross_entropy.py line 60, with a debug print added:

    print("DEBUG", arange_1d, masked_target_1d)
    predicted_logits_1d = logits_2d[arange_1d, masked_target_1d]

For "text-gen-type": "input-file",

DEBUG tensor([0, 1, 2, 3, 4, 5, 6, 7, 8], device='cuda:6') tensor([ 58, 46434, 0], device='cuda:6')
DEBUG tensor([0, 1, 2, 3, 4, 5, 6, 7, 8], device='cuda:7') tensor([0, 0, 0], device='cuda:7')

For "text-gen-type": "interactive",

DEBUG DEBUG tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
    36, 37, 38], device='cuda:6') tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
    36, 37, 38], device='cuda:7') tensor([  513,   417,  1158,   368,   476,  1014,  3812,   253,  1386,   670,
     3347, 47301,    13,  2167,   253,  5301,   310,  4931,   247,  1652,
     2372, 16593,   984,   247,  3347,  3024,  3542,   407,   247,  5145,
      588,   320,  1805, 14109,   407,  1529,  5145,    15,     0],
   device='cuda:6')
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:7')
DEBUG DEBUG tensor([0], device='cuda:7') tensor([0], device='cuda:7')
tensor([0], device='cuda:6') tensor([0], device='cuda:6')
DEBUG tensor([0], device='cuda:7') DEBUG tensor([0], device='cuda:7')
tensor([0], device='cuda:6') tensor([0], device='cuda:6')
DEBUG DEBUG tensor([0], device='cuda:7') tensor([0], device='cuda:7')
tensor([0], device='cuda:6') tensor([0], device='cuda:6')
DEBUG DEBUG tensor([0], device='cuda:7') tensor([0], device='cuda:7')
tensor([0], device='cuda:6') tensor([0], device='cuda:6')
DEBUG tensor([0], device='cuda:7') DEBUG tensor([0], device='cuda:7')
tensor([0], device='cuda:6') tensor([0], device='cuda:6')
DEBUG DEBUG tensor([0], device='cuda:6') tensor([0], device='cuda:7') tensor([0], device='cuda:6')

I don't understand enough about how this has been written, but could it be something to do with the large number of tensor elements in interactive mode, perhaps caused by the code not properly dimensioning the tensor with the interactive input?

@slash-under
Contributor

slash-under commented Mar 30, 2022

With that line added on my end...

Context prompt >>> this cat
DEBUG  tensor([0, 1], device='cuda:3') tensor([0, 0, 0], device='cuda:3')
DEBUG  tensor([0, 1], device='cuda:2') tensor([   58, 46434,     0], device='cuda:2')

The same prompt, passed in as a file:

DEBUG  tensor([0, 1], device='cuda:3') tensor([0, 0], device='cuda:3')
DEBUG  tensor([0, 1], device='cuda:2') tensor([5798,    0], device='cuda:2')

These tensors don't look properly dimensioned to me for smaller inputs either; it looks like we might have the root cause outlined.
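The dimensioning point above can be illustrated with a hypothetical NumPy sketch (not the actual gpt-neox code; the sequence length, vocab partition size, and padding scheme are assumptions). The row index is always derived from the logits, so the targets must be built from the same padded sequence that produced those logits, not from the raw prompt tokens:

```python
import numpy as np

# Hypothetical shapes: 9 padded positions on this pipeline stage,
# a 16-entry vocab partition, and a 3-token raw prompt.
seq_len, vocab_part = 9, 16
logits_2d = np.random.rand(seq_len, vocab_part)

raw_prompt = [58, 46434, 0]                      # 3 unpadded tokens
padded = raw_prompt + [0] * (seq_len - len(raw_prompt))
masked_target_1d = np.array(padded) % vocab_part  # length matches logits

arange_1d = np.arange(logits_2d.shape[0])
# Both index arrays now have length 9, so the lookup succeeds.
predicted_logits_1d = logits_2d[arange_1d, masked_target_1d]
print(predicted_logits_1d.shape)  # (9,)
```

Under this reading, the bug is that interactive mode handed the loss a target tensor built from a differently padded (or unpadded) copy of the prompt than the one the forward pass saw.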

@slash-under
Contributor

See #604 for the greater prototyping effort underway to ensure that all processes have the correct context_length and context_tokens.

@slash-under
Contributor

This is resolved.

@Kyle1668
Contributor

I'm observing behavior similar to this issue. Whenever I enter an interactive input longer than three tokens, I receive an error like this.

Context prompt >>> Are humans born with virtue?
Traceback (most recent call last):
  File "generate.py", line 88, in <module>
    main()
  File "generate.py", line 71, in main
    generate_samples_interactive(
  File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 746, in generate_samples_interactive
    for (
  File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 312, in stream_tokens
    logits = forward_model(model, model_inputs, neox_args.is_pipe_parallel)
  File "/home/mchorse/gpt-neox/megatron/text_generation_utils.py", line 155, in forward_model
    loss, logits = model.eval_batch(model_inputs, return_logits=True)
  File "/home/mchorse/anaconda3/envs/kyle1668/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 394, in eval_batch
    self._exec_schedule(sched)
  File "/home/mchorse/anaconda3/envs/kyle1668/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1308, in _exec_schedule
    self._exec_instr(**cmd.kwargs)
  File "/home/mchorse/anaconda3/envs/kyle1668/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 700, in _exec_forward_pass
    self.loss = self.loss_model(outputs, labels)
  File "/home/mchorse/gpt-neox/megatron/model/gpt2_model.py", line 67, in cross_entropy
    losses = mpu.vocab_parallel_cross_entropy(output.float().contiguous(), labels)
  File "/home/mchorse/gpt-neox/megatron/mpu/cross_entropy.py", line 114, in vocab_parallel_cross_entropy
    return _VocabParallelCrossEntropy.apply(vocab_parallel_logits, target)
  File "/home/mchorse/gpt-neox/megatron/mpu/cross_entropy.py", line 60, in forward
    predicted_logits_1d = logits_2d[arange_1d, masked_target_1d]
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [6], [3]

I don't receive any errors when the length of the input does not exceed three tokens.

Context prompt >>> Is virtue innate
Generated Text: ?
Generated Text: ? Can
Generated Text: ? Can '
Generated Text: ? Can 'nob
Generated Text: ? Can 'noble
Generated Text: ? Can 'noble sentiment
Generated Text: ? Can 'noble sentiment,'
Generated Text: ? Can 'noble sentiment,' '
Generated Text: ? Can 'noble sentiment,' 'just
Generated Text: ? Can 'noble sentiment,' 'just pride
Generated Text: ? Can 'noble sentiment,' 'just pride'
Generated Text: ? Can 'noble sentiment,' 'just pride' be
Generated Text: ? Can 'noble sentiment,' 'just pride' be fost
Generated Text: ? Can 'noble sentiment,' 'just pride' be fostered
Generated Text: ? Can 'noble sentiment,' 'just pride' be fostered in
Generated Text: ? Can 'noble sentiment,' 'just pride' be fostered in every
...
Generated Text: ? Can 'noble sentiment,' 'just pride' be fostered in every human being by wholesome influences, moral laws, sound education, the workings of love? Miss Edgeworth says, no; that society is 'inert, boys mischievous, parents fanatical, children hopeless'; that much remains to be done, 'if admiration and emulation be to be our next goal in heaven; our saints and philosophers, our poets and warriors, are there, perhaps, only a preparation

Environment

  • 8x NVIDIA A40
  • Ubuntu 20.04

@jdagdelen

I think I'm also experiencing this. Any interactive prompt longer than three words has issues.
