AttributeError: 'NoneType' object has no attribute 'dp_process_group' at evaluating medium gpt-2 model #474

Closed
sameeravithana opened this issue Dec 3, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@sameeravithana

sameeravithana commented Dec 3, 2021

Describe the bug

Traceback (most recent call last):
  File "evaluate.py", line 46, in <module>
    main()
  File "evaluate.py", line 35, in main
    model, neox_args = setup_for_inference_or_eval(inference=False, get_key_value=False)
  File "../megatron/utils.py", line 410, in setup_for_inference_or_eval
    model, _, _ = setup_model_and_optimizer(
  File "../megatron/training.py", line 390, in setup_model_and_optimizer
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "../lib/python3.8/site-packages/deepspeed/__init__.py", line 128, in initialize
    engine = PipelineEngine(args=args,
  File "../lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 60, in __init__
    super().__init__(*super_args, **super_kwargs)
  File "../lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 207, in __init__
    self._configure_checkpointing(dist_init_required)
  File "../lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 514, in _configure_checkpointing
    group=self.optimizer.dp_process_group)
AttributeError: 'NoneType' object has no attribute 'dp_process_group'

To Reproduce
./deepy.py evaluate.py medium.yaml *.yaml

@sameeravithana sameeravithana added the bug Something isn't working label Dec 3, 2021
@StellaAthena
Member

What happens when you call ./deepy.py evaluate.py medium.yaml *.yaml?

@sameeravithana
Author

That is the command I ran: ./deepy.py evaluate.py medium.yaml *.yaml. It gives the same error.

<SKIP the initial logs>
[2021-12-03 12:55:51,023] [INFO] [module.py:363:_partition_layers] Partitioning pipeline stages with method type:transformer|mlp
stage=0 layers=29
     0: EmbeddingPipe
     1: _pre_transformer_block
     2: ParallelTransformerLayerPipe
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: _post_transformer_block
    27: NormPipe
    28: ParallelLinearPipe
  loss: partial
[2021-12-03 12:55:51,102] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
DeepSpeed is enabled.
[2021-12-03 12:55:51,103] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.15+eb7f5cf, git-hash=eb7f5cf, git-branch=fetch_upstream
[2021-12-03 12:55:51,103] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
[2021-12-03 12:55:51,103] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
[2021-12-03 12:55:51,104] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
Traceback (most recent call last):
  File "evaluate.py", line 46, in <module>
    main()
  File "evaluate.py", line 35, in main
    model, neox_args = setup_for_inference_or_eval(inference=False, get_key_value=False)
  File "../megatron/utils.py", line 410, in setup_for_inference_or_eval
    model, _, _ = setup_model_and_optimizer(
  File "../megatron/training.py", line 390, in setup_model_and_optimizer
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "../lib/python3.8/site-packages/deepspeed/__init__.py", line 128, in initialize
Traceback (most recent call last):
  File "evaluate.py", line 46, in <module>
    engine = PipelineEngine(args=args,
  File "../lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 60, in __init__
    main()
  File "evaluate.py", line 35, in main
    super().__init__(*super_args, **super_kwargs)
  File "../lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 207, in __init__
    model, neox_args = setup_for_inference_or_eval(inference=False, get_key_value=False)
  File "../megatron/utils.py", line 410, in setup_for_inference_or_eval
    model, _, _ = setup_model_and_optimizer(
  File "../megatron/training.py", line 390, in setup_model_and_optimizer
    self._configure_checkpointing(dist_init_required)
  File "../lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 514, in _configure_checkpointing
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "../lib/python3.8/site-packages/deepspeed/__init__.py", line 128, in initialize
    group=self.optimizer.dp_process_group)
AttributeError: 'NoneType' object has no attribute 'dp_process_group'

@sdtblck
Contributor

sdtblck commented Dec 12, 2021

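Hi @SamTube405 - this error comes from DeepSpeed trying to load ZeRO optimizer states whenever a ZeRO stage is specified in your config, even when load_optim is set to false. No optimizer is built for evaluation, so self.optimizer is None when DeepSpeed reads dp_process_group from it.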

This should be fixed in a future update, but for now you can just set your zero_stage to 0 in the config when evaluating.
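
The workaround above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the ZeRO stage sits under a "zero_optimization" -> "stage" key in the loaded config, the helper name with_zero_disabled is hypothetical, and the sample settings are made up rather than copied from medium.yaml. In practice, editing the stage directly in the config file used for evaluation does the same job.

```python
import copy


def with_zero_disabled(config: dict) -> dict:
    """Return a copy of `config` with the ZeRO stage forced to 0.

    Hypothetical helper, not part of GPT-NeoX or DeepSpeed.
    """
    patched = copy.deepcopy(config)
    # Assumption: ZeRO settings live under "zero_optimization" -> "stage".
    zero_cfg = patched.setdefault("zero_optimization", {})
    zero_cfg["stage"] = 0
    return patched


# Made-up training settings for illustration only.
train_config = {"zero_optimization": {"stage": 1, "cpu_offload": False}}

# Evaluation uses a patched copy; the training config is left untouched.
eval_config = with_zero_disabled(train_config)
```

Working on a copy keeps the training config unchanged, so the original ZeRO stage is still in place when training resumes.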
