* WIP: Add support for Maximal Update Parametrization and Hyperparameter Transfer (mup)
* Update to use MuAdam and MuSGD, fix minor errors
* Fix more errors with arguments
* Fix error caused by not calling to_sequential on delta model
* Update NeoXArgs docs automatically
* Address PR feedback
* Fix minor error
* Update NeoXArgs docs automatically
* Revert small.yml config
* Update NeoXArgs docs automatically
* Reinitialize weights using mup's replacements after set_base_shapes is called
* Update NeoXArgs docs automatically
* Implement rescale parameters on the output layer, adjust learning rate based on width
* Update NeoXArgs docs automatically
* Remove debug prints
* Update NeoXArgs docs automatically
* Add preliminary support for coord check (WIP: not yet functional in this commit)
* Update NeoXArgs docs automatically
* Add untracked file from last commit
* Update NeoXArgs docs automatically
* Update for coord check plots
* Update NeoXArgs docs automatically
* Add all but one (and a half) of the new hyperparameters from the zero-shot hp transfer paper
* Update NeoXArgs docs automatically
* Add last mup HP
* Add mup readme file
* Update NeoXArgs docs automatically
* Revert changes to configs/small.yml
* Update NeoXArgs docs automatically
* Update README-MUP.md
* Update NeoXArgs docs automatically
* Clean up code for PR
* Update NeoXArgs docs automatically
* Make mup import optional
* Update NeoXArgs docs automatically
* Revert "Update NeoXArgs docs automatically". This reverts commit a7b97fd.
* Update NeoXArgs docs automatically
* Revert "Update NeoXArgs docs automatically". This reverts commit 8161a56.
* Update NeoXArgs docs automatically
* Add neox arg for mup delta model width scale
* Update NeoXArgs docs automatically

Co-authored-by: Nick Sarkauskas
Co-authored-by: github-actions
Co-authored-by: Stella Biderman
Co-authored-by: Quentin-Anthony
Commit 0535bfb (parent 38f4ede): 11 changed files with 785 additions and 57 deletions.
# How to use Mup (https://github.com/microsoft/mup)

## Add mup neox args to your config

```
# mup
"use-mup": true,
"save-base-shapes": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank
"base-shapes-file": "base-shapes", # load base shapes from this file
"coord-check": false, # generate coord check plots to verify mup's implementation in neox
# mup hp search
"mup-init-scale": 1.0,
"mup-attn-temp": 1.0,
"mup-output-temp": 1.0,
"mup-embedding-mult": 1.0,
"mup-rp-embedding-mult": 1.0,
```

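Taken together, these flags select one of three kinds of run: a one-off base-shape generation run, a one-off coord check run, or normal training. The following is a minimal Python sketch of that selection, assuming the config has already been parsed into a dict with the same hyphenated keys. `mup_run_mode` is a hypothetical helper, not part of gpt-neox, and rejecting both one-off flags at once is an assumption of this sketch.

```python
from typing import Any, Dict


def mup_run_mode(cfg: Dict[str, Any]) -> str:
    """Hypothetical helper: classify a run from the mup flags in a parsed config.

    Base-shape generation and coord checking are one-off runs that exit;
    everything else is a normal training run (with or without mup).
    """
    use_mup = cfg.get("use-mup", False)
    save_shapes = cfg.get("save-base-shapes", False)
    coord_check = cfg.get("coord-check", False)

    if (save_shapes or coord_check) and not use_mup:
        raise ValueError("save-base-shapes and coord-check require use-mup: true")
    if save_shapes and coord_check:
        # Assumption of this sketch: enabling both one-off modes is a config error.
        raise ValueError("enable only one of save-base-shapes / coord-check per run")
    if save_shapes:
        return "save-base-shapes"  # writes <base-shapes-file>.<rank>, then exits
    if coord_check:
        return "coord-check"  # writes coord check plots, then exits
    return "train-mup" if use_mup else "train"


cfg = {
    "use-mup": True,
    "save-base-shapes": False,
    "base-shapes-file": "base-shapes",
    "coord-check": False,
    "mup-init-scale": 1.0,
}
print(mup_run_mode(cfg))  # train-mup
```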
## Generate base shapes

1. Set `use-mup` to `true`.
2. Set `save-base-shapes` to `true`.
3. Run once. gpt-neox instantiates a base model and a delta model, saves one file per rank named `<base-shapes-file>.<rank>`, and then exits immediately.
4. Set `save-base-shapes` back to `false`.

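Conceptually, comparing the parameter shapes of the base and delta models (which, as we understand it, differ only in width) reveals which dimensions grow with width; mup then scales initialization and learning rates relative to the base model's size along those dimensions. The sketch below illustrates that comparison with plain tuples instead of PyTorch tensors. It is a simplified illustration, not mup's own base-shape code; `infer_width_dims`, `base_shapes_path`, and the parameter names and sizes are made up for the example.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def infer_width_dims(
    base: Dict[str, Shape], delta: Dict[str, Shape]
) -> Dict[str, List[Tuple[int, bool]]]:
    """Pair every base-model dimension with a flag that is True when the
    dimension differs in the delta model, i.e. when it scales with width."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta models must have the same parameter names")
    result: Dict[str, List[Tuple[int, bool]]] = {}
    for name, base_shape in base.items():
        delta_shape = delta[name]
        if len(base_shape) != len(delta_shape):
            raise ValueError(f"{name}: base and delta shapes have different ranks")
        result[name] = [(b, b != d) for b, d in zip(base_shape, delta_shape)]
    return result


def base_shapes_path(base_shapes_file: str, rank: int) -> str:
    """File name pattern from step 3: one base-shapes file per rank."""
    return f"{base_shapes_file}.{rank}"


# Made-up example: hidden size 256 (base) vs 512 (delta), vocabulary fixed.
base = {"embed.weight": (50304, 256), "mlp.dense.weight": (1024, 256)}
delta = {"embed.weight": (50304, 512), "mlp.dense.weight": (2048, 512)}
print(infer_width_dims(base, delta))
# {'embed.weight': [(50304, False), (256, True)], 'mlp.dense.weight': [(1024, True), (256, True)]}
print(base_shapes_path("base-shapes", 0))  # base-shapes.0
```

The vocabulary dimension stays fixed between the two models, so it is not treated as a width dimension; only the hidden and MLP dimensions are.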
## Generate coord check plots (optional)

1. Keep `use-mup` set to `true`.
2. Set `coord-check` to `true`.
3. Run once. gpt-neox writes JPG plots similar to those at https://github.com/microsoft/mutransformers/blob/main/README.md#coord-check and then exits immediately.
4. Set `coord-check` back to `false`.

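A coord check plots the typical size of activation coordinates (for example, their mean absolute value) against model width; with a correct mup implementation the curves should stay roughly flat as width grows rather than blowing up or vanishing. The pure-Python sketch below measures that quantity for a single random linear layer at initialization only, contrasting a fan-in-scaled init with a unit-variance init to show what a failing check looks like. It illustrates the idea and is not gpt-neox's coord check, which also tracks activations over the first few training steps, where mup and standard parametrization actually diverge. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List


def coord_size(values: List[float]) -> float:
    """Mean absolute value of a vector's coordinates."""
    return sum(abs(v) for v in values) / len(values)


def linear_output_coord_size(
    width: int, init_std: float, n_out: int, rng: random.Random
) -> float:
    """Coord size of y = W x, with x_j ~ N(0, 1) and W_ij ~ N(0, init_std^2)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    y = [sum(rng.gauss(0.0, init_std) * xj for xj in x) for _ in range(n_out)]
    return coord_size(y)


def coord_check(
    widths: List[int], fan_in_init: bool, n_out: int = 256, seed: int = 0
) -> Dict[int, float]:
    """Coord size of one random linear layer at initialization, per width."""
    rng = random.Random(seed)
    sizes: Dict[int, float] = {}
    for width in widths:
        # Fan-in init keeps Var(y_i) near 1; unit init makes it grow like width.
        std = 1.0 / math.sqrt(width) if fan_in_init else 1.0
        sizes[width] = linear_output_coord_size(width, std, n_out, rng)
    return sizes


widths = [64, 256, 1024]
print(coord_check(widths, fan_in_init=True))   # roughly flat across widths
print(coord_check(widths, fan_in_init=False))  # grows roughly like sqrt(width)
```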
## Tune mup hyperparameters and LR

The values under `# mup hp search` correspond to Appendix F.4 of https://arxiv.org/pdf/2203.03466.pdf. Tune these together with the LR using a random search on the scaled-up config (tested with 6-7B.yml), but with `hidden-size` set to the value from the scaled-down config (small.yml).

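A minimal random-search sketch of this step, assuming each trial is scored by a caller-supplied `evaluate` function (for example, the validation loss of a short gpt-neox run at the scaled-down width). The search ranges, the log-uniform sampling, and the flat `lr` key are assumptions for illustration, not values from the paper or from gpt-neox; `toy_loss` is a made-up stand-in so the sketch runs on its own.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Hypothetical log-uniform search ranges; these are NOT the paper's values.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "lr": (1e-4, 1e-2),  # flat "lr" key is a simplification
    "mup-init-scale": (0.25, 4.0),
    "mup-attn-temp": (0.25, 4.0),
    "mup-output-temp": (0.25, 4.0),
    "mup-embedding-mult": (0.25, 4.0),
    "mup-rp-embedding-mult": (0.25, 4.0),
}


def sample_config(rng: random.Random) -> Dict[str, float]:
    """Draw every hyperparameter log-uniformly from its range."""
    return {
        key: math.exp(rng.uniform(math.log(lo), math.log(hi)))
        for key, (lo, hi) in SEARCH_SPACE.items()
    }


def random_search(
    evaluate: Callable[[Dict[str, float]], float], n_trials: int, seed: int = 0
) -> Tuple[Dict[str, float], float]:
    """Return the sampled config with the lowest loss reported by `evaluate`."""
    if n_trials < 1:
        raise ValueError("n_trials must be positive")
    rng = random.Random(seed)
    best_cfg: Dict[str, float] = {}
    best_loss = math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        loss = evaluate(cfg)  # in practice: loss of a short run at the small width
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss


def toy_loss(cfg: Dict[str, float]) -> float:
    """Toy stand-in for a training run: minimized at lr = 1e-3, multipliers = 1."""
    return sum(math.log(v / (1e-3 if k == "lr" else 1.0)) ** 2 for k, v in cfg.items())


best_cfg, best_loss = random_search(toy_loss, n_trials=200)
print(best_loss, best_cfg)
```

Sampling on a log scale is a common choice for learning rates and multiplicative factors, since their plausible values span orders of magnitude.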
## Transfer

With the best LR and mup HPs set, restore the original `hidden-size` in the scaled-up config and run again.
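Under mup the tuned values are meant to carry over unchanged to the wider model, so the transfer run uses the scaled-up config as-is except for the tuned hyperparameters. A minimal sketch of assembling that config, assuming configs are plain dicts with the hyphenated keys shown earlier; `build_transfer_config` and the flat `lr` key are hypothetical (adjust to wherever your config stores the learning rate), and the numbers are illustrative only.

```python
from typing import Any, Dict

MUP_HP_KEYS = (
    "mup-init-scale",
    "mup-attn-temp",
    "mup-output-temp",
    "mup-embedding-mult",
    "mup-rp-embedding-mult",
)


def build_transfer_config(scaled_up: Dict[str, Any], tuned: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the tuned mup HPs and LR onto the original scaled-up config.

    The scaled-up config keeps its own (large) hidden-size; the one-off
    base-shape and coord check modes are switched off for the real run.
    """
    cfg = dict(scaled_up)  # shallow copy: leave the caller's dict untouched
    for key in MUP_HP_KEYS + ("lr",):
        cfg[key] = tuned[key]
    cfg.update({"use-mup": True, "save-base-shapes": False, "coord-check": False})
    return cfg


scaled_up = {"hidden-size": 4096, "use-mup": True, "lr": 6e-4}  # illustrative values
tuned = {"hidden-size": 768, "lr": 2e-3, **{k: 1.5 for k in MUP_HP_KEYS}}  # from the search
final = build_transfer_config(scaled_up, tuned)
print(final["hidden-size"], final["lr"])  # 4096 0.002
```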