Clean up Neox configuration #132
Conversation
Created a draft implementation. Example usage:
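```
./deepy.py pretrain_gpt2.py -d configs ds_pretrain_gpt2.yml eleutherai_cluster.yaml
```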
The parameters in the yaml files are automatically separated into the DS runner, Megatron and DS config file parts. They are then converted into the "old" format and provided to the scripts. I wanted to do it this way so that I made as few changes as possible to the megatron codebase - making it easier in the future to merge upstream changes. Some parameters are also automatically derived, such as the megatron "fp16" param from the DS runner "fp16" param.
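(To make the separation concrete, a rough sketch of how keys in a single yml get routed. The grouping is my reading of the description above - the Megatron-style keys are borrowed from the diff below, and the fp16 block follows DeepSpeed's usual schema rather than anything specified in this PR:)

```
# routed to Megatron (converted back into the "old" command-line args)
"num-attention-heads": "16",
"seq-length": "1024",
"max-position-embeddings": "1024",

# routed to the DeepSpeed config file
"train_micro_batch_size_per_gpu": 4,
"gradient_clipping": 1.0,

# set once; the megatron fp16 flag is derived from it automatically
"fp16": {
    "enabled": true
}
```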
configs/ds_pretrain_gpt2.yml
"num-attention-heads":"16", | ||
"seq-length":"1024", | ||
"max-position-embeddings":"1024", | ||
"batch-size":"9", |
Why 9?
It’s taken from the corresponding examples script: examples/ds_pretrain_gpt2.sh
It was mentioned in megatron keys but not defined.
The example above exactly replicates the parameters used in examples/ds_pretrain_gpt2.sh - try it for yourself. It is not intended to show all possible configurations. I can create such a config later.
@ShivanshuPurohit I am going to undo your commit as pipe-parallel-size isn't used in the original example.
I see. No problem. I thought all the keys were supposed to be initialized.
Okay, deduplicated the remaining params. ZeRO parameters should be set deepspeed style, like so:
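(A minimal sketch of what a deepspeed-style ZeRO block looks like; the stage and flags below are illustrative placeholders taken from DeepSpeed's standard zero_optimization schema, not values from this PR:)

```
"zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "reduce_scatter": true,
    "overlap_comm": false,
    "contiguous_gradients": false
}
```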
same with optimizer params, like so:
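(Likewise a sketch of the deepspeed-style optimizer block; the lr and eps values are placeholders, while the betas match the config shown later in this diff and "adam" is one of the options listed just below:)

```
"optimizer": {
    "type": "adam",
    "params": {
        "lr": 0.00015,
        "betas": [0.9, 0.95],
        "eps": 1.0e-8
    }
}
```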
(options are "adam", "onebitadam", "cpu_adam", "cpu_torch_adam") gradient clipping should be set with "gradient_clipping" instead of clip-grads and i think that's about it. Josh and I figured out the batch size related problems - so when doing model parallel Should be ready to merge now imo - maybe would be good to get solid documentation first though, to avoid confusion |
@sdtblck looks a lot better! Do you think there are other params that would be worth bundling together, deepspeed-style? I mostly have the checkpointing args in mind here, I think.
@StellaAthena i think whether we do checkpointing args deepspeed style or not is inconsequential, really. But i can set it up that way if you think it'd be more user friendly.
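(If the checkpointing args were bundled deepspeed-style, they would presumably look something like DeepSpeed's own activation_checkpointing block - shown here only to illustrate the shape, not as something this PR implements:)

```
"activation_checkpointing": {
    "partition_activations": true,
    "cpu_checkpointing": false,
    "contiguous_memory_optimization": false
}
```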
"train_batch_size": 224, | ||
"train_micro_batch_size_per_gpu": 4, | ||
"steps_per_print": 10, | ||
"optimizer": { |
this doesn't actually do anything because the optimizer is initialized within the megatron code. I believe there is an optimizer arg in megatron/arguments.py
- the only time we need the 'optimizer' in the deepspeed config is when we're using onebitadam, since in that case the optimizer has to be initialized within the deepspeed code, because reasons.
We should find a cleaner way to do this
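(For the onebitadam case the deepspeed config would carry the optimizer block itself. A rough sketch - the OneBitAdam-specific fields come from DeepSpeed's documented config schema and all values are illustrative:)

```
"optimizer": {
    "type": "OneBitAdam",
    "params": {
        "lr": 0.00015,
        "betas": [0.9, 0.95],
        "eps": 1.0e-8,
        "freeze_step": 23000,
        "cuda_aware": false
    }
}
```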
"betas": [0.9, 0.95] | ||
} | ||
}, | ||
"gradient_clipping": 1.0, |
duplicate of clip-grad
Clean up neox configuration so config files can be used instead of a mishmash of files, command line args and environment variables.
Aim:
Nice to haves:
Todo:
(micro_batch_per_gpu*GAS*n_gpus)
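(For reference on the batch-size relation the fragment above refers to: DeepSpeed requires train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * n_gpus. In the sketch below only the 224 and 4 come from the diff above; the GAS and GPU counts are one illustrative combination that satisfies the relation:)

```
# train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * n_gpus
# e.g. 224 = 4 * 7 * 8
"train_batch_size": 224,
"train_micro_batch_size_per_gpu": 4,
"gradient_accumulation_steps": 7    # assuming 8 GPUs; any GAS * n_gpus = 56 is consistent
```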