Fix SequentialWrapper Generation (pipe_parallel_size = 0) #1031

Merged — 2 commits into EleutherAI:main on Sep 18, 2023

Conversation

xu-song (Contributor) commented Sep 15, 2023

This PR fixes the following bug when generating with pipe_parallel_size = 0.

Bug Reproduction

Set pipe_parallel_size = 0, then run:

$ python ./deepy.py generate.py -d configs 125M.yml local_setup.yml text_generation.yml

The following error occurs when generating with pipe_parallel_size = 0:

  is_pipe_parallel ................ False.......................default
  pipe_parallel_size .............. 0...........................default

Traceback (most recent call last):
  File "generate.py", line 91, in <module>
    main()
  File "generate.py", line 73, in main
    generate_samples_interactive(
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/text_generation_utils.py", line 782, in generate_samples_interactive
    for (
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/text_generation_utils.py", line 319, in stream_tokens
    logits = forward_model(model, model_inputs, neox_args.is_pipe_parallel)
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/text_generation_utils.py", line 137, in forward_model
    return model.module(model_inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/utils.py", line 182, in forward
    x = func(forward_input)
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/utils.py", line 175, in exec_func
    inputs = layer(inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/transformer.py", line 916, in forward
    return super().forward(hidden_states, attention_mask), attention_mask
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/transformer.py", line 860, in forward
    attention_output, attention_bias = self.attention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/transformer.py", line 688, in forward
    context_layer = self.attention(
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/transformer.py", line 451, in attention
    attention_probs = self.scale_mask_softmax(attention_scores, attention_mask)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/fused_softmax.py", line 146, in forward
    return self.forward_torch_softmax(input, mask)
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/fused_softmax.py", line 190, in forward_torch_softmax
    mask_output = self.mask_func(input, mask) if mask is not None else input
  File "/workspace/gpt-neox/gpt-neox-dev-latest/megatron/model/gpt2_model.py", line 52, in gpt2_attention_mask_func
    attention_scores.masked_fill_(ltor_mask, mask_value)
RuntimeError: The expanded size of the tensor (1) must match the existing size (4) at non-singleton dimension 2.  Target sizes: [1, 12, 1, 4].  Tensor sizes: [1, 1, 4, 4]

The above procedure is easy to reproduce.

Analysis

  • attention_score.size= [1, 12, 1, 4]
  • attention_mask.size = [1, 1, 4, 4] (wrong size)

The correct size of attention_mask should be [1, 1, 1, 1] for generation_step > 1, so that it can broadcast over the [1, 12, 1, 4] attention scores.
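
A minimal, self-contained sketch of the shape mismatch (shapes copied from the traceback above; the -10000.0 fill value is only illustrative):

import torch

# Shapes taken from the traceback.
scores = torch.randn(1, 12, 1, 4)                      # attention_scores at generation_step > 1
bad_mask = torch.zeros(1, 1, 4, 4, dtype=torch.bool)   # mask built for the whole 4-token prompt
good_mask = torch.zeros(1, 1, 1, 1, dtype=torch.bool)  # mask sized for a single new token

scores.masked_fill_(good_mask, -10000.0)  # broadcasts cleanly over [1, 12, 1, 4]
scores.masked_fill_(bad_mask, -10000.0)   # RuntimeError: mask cannot be broadcast to the
                                          # scores' shape (size 4 vs 1 at dimension 2)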

Root Cause

The sequential generation path (SequentialWrapper) is missing a batch_fn, which leads to an incorrectly sized attention_mask. The pipe-parallel path, by contrast, installs one:

if neox_args.is_pipe_parallel:
    model.set_has_attention_mask(True)
    if neox_args.curriculum_learning:
        curr_scheduler = CurriculumScheduler(neox_args.curriculum_learning)
        if iteration is not None and iteration > 0:
            curr_scheduler.update_difficulty(iteration)
    else:
        curr_scheduler = None
    model.set_batch_fn(
        partial(
            get_batch_pipe, neox_args=neox_args, curr_scheduler=curr_scheduler
        )
    )

A similar implementation can be found in DeepSpeed's PipelineEngine (deepspeed/runtime/pipe/engine.py#L578), which handles the pipe_parallel_size > 0 case.
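
A minimal sketch of the kind of change implied, not the PR's actual diff: it assumes the SequentialWrapper (reached via model.module when pipe_parallel_size = 0) exposes a set_batch_fn hook analogous to PipelineEngine's, and passes curr_scheduler=None for simplicity.

from functools import partial

if neox_args.is_pipe_parallel:
    # Existing branch shown above: PipelineEngine gets a batch_fn.
    ...
else:
    # SequentialWrapper path (pipe_parallel_size = 0): install the same batch_fn
    # so the attention mask is rebuilt with the right shape at every step.
    # NOTE: set_batch_fn on SequentialWrapper is assumed here for illustration.
    model.module.set_batch_fn(
        partial(get_batch_pipe, neox_args=neox_args, curr_scheduler=None)
    )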

xu-song requested a review from a team as a code owner on September 15, 2023
xu-song changed the title from "Fix SequentialGeneration" to "Fix SequentialWrapper Generation (pipe_parallel_size = 0)" on Sep 16, 2023
Quentin-Anthony merged commit 70af6e8 into EleutherAI:main on Sep 18, 2023
2 of 5 checks passed