Updates bf16 demo config and mixed precision documentation. (#941)
* Pre-commit

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Do not check for overflow if not using fp16

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
dashstander and github-actions committed May 18, 2023
1 parent 1a43a58 commit 03f4f77
Showing 4 changed files with 7 additions and 18 deletions.
7 changes: 2 additions & 5 deletions configs/README.md
@@ -259,7 +259,7 @@ N.B - `OneBitAdam` requires you to use deepspeed's internal lr scheduler because
 Checkpointing works by trading compute for memory. Rather than storing all intermediate activations of the entire computation graph for computing backward, the checkpointed part does not save intermediate activations, and instead recomputes them in backward pass.
 
 ### Mixed Precision Training Settings:
-gpt-neox's mixed precision training is configured identically to DeepSpeed's, please see [their documentation](https://www.deepspeed.ai/docs/config-json/#fp16-training-options) for more information.
+gpt-neox's fp16 training is configured identically to DeepSpeed's, please see [their documentation](https://www.deepspeed.ai/docs/config-json/#fp16-training-options) for more information.
 An example config for fp16 training:
 
 ```yaml
@@ -272,7 +272,7 @@
 },
 ```
 
-To train in fp32, simply set `fp16["enabled"]` to `false`.
+Alternatively you can use the `precision` config which can be set to `fp16`, `bfloat16`, or `fp32`. If you set `"precision": "fp16"` without adding a `"fp16": {...}` dict, then it will simply use DeepSpeed's defaults for fp16 training.
 
 
 ### SLURM Settings
@@ -312,6 +312,3 @@ To make this JSON just remove the comment and use all lowercase for the boolean:
 "comm_backend_name": "nccl"
 }
 ```
-
-
-** TODO: bf16 docs **
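For readers skimming the diff, the two configuration styles described above look roughly like this. This is a minimal sketch: the keys in the `fp16` block mirror the DeepSpeed fp16 options removed from `configs/bf16_125M.yml` below, and the values are illustrative rather than recommended defaults.

```yaml
# Option 1: configure fp16 explicitly through DeepSpeed's fp16 options
"fp16": {
  "enabled": true,
  "loss_scale": 0,           # 0 selects dynamic loss scaling
  "loss_scale_window": 1000,
  "hysteresis": 2,
  "min_loss_scale": 1
},

# Option 2: use the `precision` key ("fp16", "bfloat16", or "fp32").
# With "fp16" and no `fp16` dict, DeepSpeed's fp16 defaults are used.
"precision": "fp16",
```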
10 changes: 1 addition & 9 deletions configs/bf16_125M.yml
@@ -57,15 +57,7 @@
   "hidden_dropout": 0.0,
   "attention_dropout": 0.0,
 
-  # precision settings
-  "fp16": {
-    "enabled": true,
-    "type": "bfloat16", # set bf16 as precision
-    "loss_scale": 0,
-    "loss_scale_window": 1000,
-    "hysteresis": 2,
-    "min_loss_scale": 1
-  },
+  "precision": "bfloat16",
 
   "fp32_allreduce": True, # without a patch to torch, bf16 models have to do the allreduce in fp32
   # misc. training settings
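The `fp32_allreduce` comment above concerns gradient communication rather than the forward pass. The sketch below illustrates the general idea with a hypothetical helper (not code from this repository); it assumes `torch.distributed` has already been initialized.

```python
import torch
import torch.distributed as dist


def allreduce_grad_in_fp32(grad: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: all-reduce a bf16 gradient in fp32.

    Upcasting before the reduction avoids accumulating rounding error in the
    low-precision dtype; `fp32_allreduce: True` asks the framework to do the
    equivalent on the model's behalf.
    """
    buf = grad.to(torch.float32)  # upcast bf16 -> fp32
    dist.all_reduce(buf)          # sum across data-parallel ranks
    return buf.to(grad.dtype)     # cast back to the parameter dtype
```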
2 changes: 1 addition & 1 deletion configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments
 
 - **git_hash**: str
 
-  Default = b130d58
+  Default = 83e820c
 
   current git hash of repository
 
6 changes: 3 additions & 3 deletions megatron/training.py
@@ -625,7 +625,7 @@ def setup_model_and_optimizer(neox_args, use_cache=False, iteration=None):
         dist_init_required=False,
         model_parameters=_model_params,
         # Need to remove the below so that it doesn't conflict with --deepspeed_config required by autotuning
-        #config_params=neox_args.deepspeed_config,
+        # config_params=neox_args.deepspeed_config,
         mpu=mpu if not neox_args.is_pipe_parallel else None,
     )
     model.total_params = get_total_params(model.module)
@@ -792,8 +792,8 @@ def train(
         )
         iteration += 1
         neox_args.iteration = iteration
-
-        overflow_monitor.check(skipped_iter) # check for repeated overflow
+        if neox_args.precision == "fp16":
+            overflow_monitor.check(skipped_iter) # check for repeated overflow
         if neox_args.log_gradient_noise_scale: # log noise scale if applicable
             noise_scale_logger.update()
 
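The rationale for guarding the overflow check is the difference in dynamic range: fp16 overflows easily and depends on dynamic loss scaling, so monitoring repeated overflows only makes sense there, whereas bf16 keeps fp32's exponent range. A quick sanity check with stock PyTorch:

```python
import torch

# fp16 tops out around 6.5e4, so losses and gradients can overflow and the
# loss scaler (and overflow_monitor) has real work to do.
print(torch.finfo(torch.float16).max)   # 65504.0

# bf16 shares fp32's 8-bit exponent, so overflow is not the concern and the
# check is skipped whenever neox_args.precision != "fp16".
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
print(torch.finfo(torch.float32).max)   # ~3.40e38
```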
