Merge pull request EleutherAI#938 from EleutherAI/force_multi
Remove duplicate deepspeed config and allow forced multinode
StellaAthena authored May 17, 2023
2 parents 162ea36 + 21a43a2 commit 1a43a58
Showing 3 changed files with 8 additions and 2 deletions.
2 changes: 1 addition & 1 deletion configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments
 
 - **git_hash**: str
 
-Default = b181124
+Default = b130d58
 
 current git hash of repository
5 changes: 5 additions & 0 deletions megatron/neox_arguments/deepspeed_args.py
@@ -351,6 +351,11 @@ class NeoXArgsDeepspeedRunner(NeoXArgsTemplate):
 If true, overrides the default check where DeepSpeed confirms that the headnode is accessible via ssh.
 """
 
+force_multi: bool = False
+"""
+If true, force multi-node launcher mode; helps in cases where the user wants to launch on a single remote node.
+"""
+
 comment: str = None
 """
 Adds a `--comment` to the DeepSpeed launch command. In DeeperSpeed this is passed on to the SlurmLauncher as well. Sometimes necessary for cluster rules, or so I've heard.
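For reference, a NeoX runner config could enable the new flag alongside the existing runner options. The snippet below is a minimal, hypothetical sketch (the file name, hostfile path, and surrounding key are illustrative, not part of this commit):

    # hypothetical local_setup.yml -- illustrative only
    {
      # hostfile listing the single remote node to launch on
      "hostfile": "/path/to/hostfile",
      # force the DeepSpeed multi-node launcher even though only one node is listed
      "force_multi": true
    }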
3 changes: 2 additions & 1 deletion megatron/training.py
@@ -624,7 +624,8 @@ def setup_model_and_optimizer(neox_args, use_cache=False, iteration=None):
 lr_scheduler=_lr_scheduler,
 dist_init_required=False,
 model_parameters=_model_params,
-config_params=neox_args.deepspeed_config,
+# Need to remove the below so that it doesn't conflict with --deepspeed_config required by autotuning
+#config_params=neox_args.deepspeed_config,
 mpu=mpu if not neox_args.is_pipe_parallel else None,
 )
 model.total_params = get_total_params(model.module)
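The intent of the training.py change is that the DeepSpeed config is supplied once, through the --deepspeed_config command-line path that autotuning requires, rather than also being passed programmatically. A rough sketch of the resulting call site is below (variable names follow the surrounding diff; the args=neox_args keyword and the overall structure are assumptions, not the repo's verbatim code):

    import deepspeed

    # model, optimizer, _lr_scheduler, _model_params, neox_args and mpu are assumed
    # to have been built earlier in setup_model_and_optimizer()
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
        args=neox_args,  # neox_args carries the config path given via --deepspeed_config
        model=model,
        optimizer=optimizer,
        lr_scheduler=_lr_scheduler,
        dist_init_required=False,
        model_parameters=_model_params,
        # config_params is intentionally omitted so it cannot conflict with the
        # --deepspeed_config flag required by autotuning
        mpu=mpu if not neox_args.is_pipe_parallel else None,
    )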
