Latest DeepSpeed Support #663
Conversation
@Quentin-Anthony Can you list which DeeperSpeed features would be lost with this move?
Small stuff like logging format, some more detailed timers, and the forward hooks functionality in DeeperSpeed. I've already pushed the major features into upstream DeepSpeed. My thoughts are that most gpt-neox users don't need/rely on these features and can switch to the latest DeepSpeed.
The only thing I disagree with here is the detailed timers, which I and I think many others find quite useful. Would there be an easy way to make them part of GPT-NeoX as opposed to DeeperSpeed?
No, there's no way to bring those out of DeeperSpeed. Should we update the DeeperSpeed main branch to just be the DeepSpeed main branch, but with timers (throwing everything else away)? We'd have to update it periodically, but merges would be pretty simple that way. I think bringing these timers into upstream DeepSpeed would be a hard sell.
Who would do the selling, though?
Us to the DeepSpeed team. I'm saying it would be difficult to convince them that these timers are needed when they already have the FLOPs profiler and communication logger. |
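For context on the profiling tools mentioned above: upstream DeepSpeed's FLOPs profiler and coarse timers are enabled through the `ds_config` JSON. A minimal sketch (the field values here are illustrative, not recommendations):

```json
{
  "wall_clock_breakdown": true,
  "flops_profiler": {
    "enabled": true,
    "profile_step": 10,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true
  }
}
```

`wall_clock_breakdown` prints per-step forward/backward/optimizer timings, while `flops_profiler` reports per-module FLOPs and latency at the chosen step. These are coarser than DeeperSpeed's detailed timers, which is the gap being discussed.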
Signed-off-by: Dashiell Stander <[email protected]>
* WIP: Add support for Maximal Update Parametrization and Hyperparameter Transfer (mup)
* Update to use MuAdam and MuSGD, fix minor errors
* Fix more errors with arguments
* Fix error caused by not calling to_sequential on delta model
* Update NeoXArgs docs automatically
* Address PR feedback
* Fix minor error
* Update NeoXArgs docs automatically
* Revert small.yml config
* Update NeoXArgs docs automatically
* Reinitialize weights using mup's replacements after set_base_shapes is called
* Update NeoXArgs docs automatically
* Implement rescale parameters on the output layer, adjust learning rate based on width
* Update NeoXArgs docs automatically
* Remove debug prints
* Update NeoXArgs docs automatically
* Add preliminary support for coord check (WIP: not yet functional in this commit)
* Update NeoXArgs docs automatically
* Add untracked file from last commit
* Update NeoXArgs docs automatically
* Update for coord check plots
* Update NeoXArgs docs automatically
* Add all but one (and a half) of the new hyperparameters from the zero-shot hp transfer paper
* Update NeoXArgs docs automatically
* Add last mup HP
* Add mup readme file
* Update NeoXArgs docs automatically
* Revert changes to configs/small.yml
* Update NeoXArgs docs automatically
* Update README-MUP.md
* Update NeoXArgs docs automatically
* Clean up code for PR
* Update NeoXArgs docs automatically
* Make mup import optional
* Update NeoXArgs docs automatically
* Revert "Update NeoXArgs docs automatically" (reverts commit a7b97fd)
* Update NeoXArgs docs automatically
* Revert "Update NeoXArgs docs automatically" (reverts commit 8161a56)
* Update NeoXArgs docs automatically
* Add neox arg for mup delta model width scale
* Update NeoXArgs docs automatically

Co-authored-by: Nick Sarkauskas <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Stella Biderman <[email protected]>
Co-authored-by: Quentin-Anthony <[email protected]>
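As background for the μP commits above, the core idea of hyperparameter transfer is that a learning rate tuned on a small base model carries over to wider models if per-layer rates are rescaled by width. The sketch below is a simplified illustration of the Adam-style μP rule (hidden-matrix learning rates scale as base_width / width; embeddings and biases keep the base rate), not GPT-NeoX's or the mup library's actual implementation, and the function name and widths are hypothetical:

```python
def mup_adam_lr(base_lr, base_width, width, param_kind):
    """Simplified muP learning-rate rule for Adam-style optimizers.

    Hidden (width x width) weight matrices get lr scaled by
    base_width / width; input embeddings, biases, and layernorm
    gains keep the base lr. The mup library achieves a similar
    effect via per-parameter optimizer groups in MuAdam.
    """
    if param_kind == "hidden":
        return base_lr * base_width / width
    return base_lr


# Transferring a base lr tuned at width 256 to a width-1024 model:
print(mup_adam_lr(6e-4, 256, 1024, "hidden"))     # -> 0.00015
print(mup_adam_lr(6e-4, 256, 1024, "embedding"))  # -> 0.0006
```

Under this rule, the same base learning rate can be reused as the model is widened, which is what "adjust learning rate based on width" in the commit list refers to.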
…nt for using the SlurmRunner
Signed-off-by: Dashiell Stander <[email protected]>
@StellaAthena @ShivanshuPurohit
Note: we will not merge this unless we decide to get rid of DeeperSpeed.
This branch does away with DeeperSpeed entirely and is instead based on upstream DeepSpeed. Only minor gpt-neox changes are needed to make this work, but we lose some DeeperSpeed features. Feel free to use this branch unless your gpt-neox code explicitly relies on DeeperSpeed features.
Tested with: