[muP] Rework #1087

Draft: wants to merge 107 commits into main from rework-mup
Changes from 1 commit

Commits (107 total)
0d921f7
changed ordering for setting up norm_factor
lintangsutawika Dec 1, 2023
abee54d
Update NeoXArgs docs automatically
invalid-email-address Dec 1, 2023
a08c3ef
updated muP args to the minimum required
lintangsutawika Dec 1, 2023
c35e830
calculate m_width
lintangsutawika Dec 1, 2023
2807e52
Merge branch 'main' of https://github.com/EleutherAI/gpt-neox into re…
lintangsutawika Dec 1, 2023
2d127df
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika Dec 1, 2023
81fdc4d
Update NeoXArgs docs automatically
invalid-email-address Dec 1, 2023
7d6b246
changed ordering for setting up norm_factor
lintangsutawika Dec 1, 2023
a0d1929
updated muP args to the minimum required
lintangsutawika Dec 1, 2023
d63b3b8
calculate m_width
lintangsutawika Dec 1, 2023
9be82fe
Update NeoXArgs docs automatically
invalid-email-address Dec 1, 2023
66214d9
removed redundant line
lintangsutawika Dec 1, 2023
17b7183
removed redundant lines
lintangsutawika Dec 1, 2023
a6bad07
Update NeoXArgs docs automatically
invalid-email-address Dec 1, 2023
63984bd
removed redundant lines
lintangsutawika Dec 1, 2023
02687a8
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika Dec 1, 2023
11114e2
Update NeoXArgs docs automatically
invalid-email-address Dec 1, 2023
05c4de3
modify init with mup
lintangsutawika Dec 1, 2023
71a91e4
divide logits by the m_width
lintangsutawika Dec 1, 2023
99c8ce0
moved position of mup parameters being processed
lintangsutawika Dec 1, 2023
b253ab6
add note
lintangsutawika Dec 1, 2023
1919499
made param groups to hold flag for mup scaling
lintangsutawika Dec 6, 2023
17678e0
lr scale
lintangsutawika Dec 6, 2023
2bd5ae6
update config
lintangsutawika Dec 6, 2023
6642291
adjust process of mup variables
lintangsutawika Dec 6, 2023
8be6c66
remove calling save_base_shapes
lintangsutawika Dec 18, 2023
c9fb18b
lr adjustments is done in train_step to address lr being reset due to…
lintangsutawika Dec 18, 2023
795371c
lr scaling for mup is moved here instead
lintangsutawika Dec 18, 2023
087beee
removed mup usage for coord check
lintangsutawika Jan 3, 2024
16d04b1
merged with main
lintangsutawika Jan 3, 2024
e7b7bf6
latest update on coord check implementation
lintangsutawika Jan 24, 2024
8dea9ce
fix merge conflict
lintangsutawika Feb 2, 2024
3664eba
changed `mup_m_width` to `mup_width_multiplier`
lintangsutawika Feb 2, 2024
6a46247
fixed notations
lintangsutawika Feb 2, 2024
7439f9a
correct scale
lintangsutawika Feb 2, 2024
5b2d31c
m_emb * embed(X)
lintangsutawika Feb 2, 2024
98caa82
removed mup rescale in the layers
lintangsutawika Feb 2, 2024
5c99637
removed mup rescale in the layers
lintangsutawika Feb 2, 2024
a636f06
adjust mup_m_emb to mup_embedding_multiplier
lintangsutawika Feb 2, 2024
39190c5
add multiplier mup_output_multiplier
lintangsutawika Feb 20, 2024
2489cc0
reorder model loading
lintangsutawika Feb 20, 2024
23b8776
removed comments
lintangsutawika Feb 20, 2024
10e935e
removed comments
lintangsutawika Feb 20, 2024
a0aca99
implement full process
lintangsutawika Feb 20, 2024
9472b35
set neox_args.iteration to 0 for coord_check mode
lintangsutawika Feb 21, 2024
5c5f2df
move mup_width_multiplier init
lintangsutawika Feb 21, 2024
7eca3e7
mup_coord_check returns 2 df
lintangsutawika Feb 21, 2024
c9a3a65
can run
lintangsutawika Feb 21, 2024
a7877d4
remove comments
lintangsutawika Feb 22, 2024
bd9d399
add hooks
lintangsutawika Feb 22, 2024
fe180d3
remove comments
lintangsutawika Feb 22, 2024
b240c19
uncomment activation data
lintangsutawika Feb 22, 2024
93b4241
plot coords
lintangsutawika Feb 22, 2024
d4899fc
removed variables, add way to plot only from rank 0
lintangsutawika Feb 22, 2024
f589e29
changed key name in dict
lintangsutawika Feb 22, 2024
8261e0d
remove print
lintangsutawika Feb 22, 2024
25aa786
fix how width_multiplier is applied
lintangsutawika Feb 22, 2024
4d246a1
updated plot config
lintangsutawika Feb 22, 2024
84c5380
update files
lintangsutawika Feb 26, 2024
b2f1101
Merge branch 'main' into rework-mup
lintangsutawika Feb 26, 2024
42d4cde
Update NeoXArgs docs automatically
invalid-email-address Feb 26, 2024
4c477d5
init function, add input embedding different initialization
lintangsutawika Feb 27, 2024
64dc4c5
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika Feb 27, 2024
65c103e
change output layer to normal
lintangsutawika Feb 27, 2024
08b5d40
change from mean to std
lintangsutawika Feb 27, 2024
2ca94a8
double attention head for every hidden size doubled
lintangsutawika Feb 27, 2024
7483246
Merge branch 'main' into rework-mup
lintangsutawika Feb 27, 2024
497485c
Update NeoXArgs docs automatically
invalid-email-address Feb 27, 2024
34fb7ca
added args
lintangsutawika Feb 27, 2024
2d53f1f
simplify coordcheck
lintangsutawika Feb 27, 2024
7897610
separate sp and mup configs
lintangsutawika Feb 27, 2024
4f39209
perform coordcheck for sp and mup separately
lintangsutawika Feb 27, 2024
5f84a3f
Update NeoXArgs docs automatically
invalid-email-address Feb 27, 2024
479b854
update
lintangsutawika Feb 28, 2024
21a7e32
update how params are sorted
lintangsutawika Feb 28, 2024
bb2e0c9
remove unused comments
lintangsutawika Feb 28, 2024
bf1ce06
adjust
lintangsutawika Feb 29, 2024
50a3dba
simplify
lintangsutawika Feb 29, 2024
c4c1660
fix mup embedding multiplier
lintangsutawika Feb 29, 2024
1c35911
embeddingpipe fix init
lintangsutawika Feb 29, 2024
84be4d4
changed how manual seed is loaded
lintangsutawika Feb 29, 2024
fbb4daf
removed musgd and other changes
lintangsutawika Feb 29, 2024
fa142ff
update config
lintangsutawika Feb 29, 2024
ad2336f
fixed how params are sorted
lintangsutawika Feb 29, 2024
fe73bc3
update how seed is computed
lintangsutawika Feb 29, 2024
a3bd44c
update to follow pre-commit format
lintangsutawika Feb 29, 2024
56b6c9b
update from main
lintangsutawika Feb 29, 2024
2365fd5
update
lintangsutawika Feb 29, 2024
e8639a0
Update NeoXArgs docs automatically
invalid-email-address Feb 29, 2024
47e1438
fix lr weighting
lintangsutawika Mar 5, 2024
a064f9b
hard set to 1.0 if neox_args.use_mup is false
lintangsutawika Mar 5, 2024
b0da27a
Merge branch 'main' into rework-mup
Quentin-Anthony Apr 21, 2024
6fe55f4
Update NeoXArgs docs automatically
invalid-email-address Apr 21, 2024
8bf8bcd
add new parameters
lintangsutawika May 2, 2024
7f0b033
add parameter checks
lintangsutawika May 2, 2024
f802869
updates to argument processing for mup
lintangsutawika May 2, 2024
cc71104
add data save and descriptions being printed
lintangsutawika May 2, 2024
c8feb39
update mup
lintangsutawika May 2, 2024
b6b3a02
update seed
lintangsutawika May 2, 2024
847e892
remove print text
lintangsutawika May 2, 2024
1b0027c
fixed kv
lintangsutawika May 2, 2024
055596f
update
lintangsutawika May 2, 2024
fabb45b
update descriptions being printed
lintangsutawika May 2, 2024
5ccf693
removed unused lines
lintangsutawika May 2, 2024
9dd583b
Merge branch 'rework-mup' of https://github.com/EleutherAI/gpt-neox i…
lintangsutawika May 2, 2024
6a8ad71
Merge branch 'main' into rework-mup
lintangsutawika May 2, 2024
485cad4
Update NeoXArgs docs automatically
invalid-email-address May 2, 2024
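
Several commits above introduce the arguments mup_embedding_multiplier, mup_width_multiplier, and mup_output_multiplier (e.g. "m_emb * embed(X)" and "divide logits by the m_width"). As rough orientation, the sketch below shows how multipliers of this kind typically enter a muP-style forward pass; it is an illustration of the general technique under stated assumptions, not the gpt-neox implementation, and the function and module names are made up for the example.

import torch
import torch.nn as nn

def mup_forward(embed, blocks, unembed, tokens,
                mup_embedding_multiplier=1.0,
                mup_output_multiplier=1.0,
                mup_width_multiplier=1.0):
    # m_emb * embed(X): scale the token embeddings before the transformer blocks.
    hidden = mup_embedding_multiplier * embed(tokens)
    for block in blocks:
        hidden = block(hidden)
    # Divide the logits by the width multiplier so their scale stays roughly
    # width-independent as hidden size grows; apply an optional output multiplier.
    return mup_output_multiplier * unembed(hidden) / mup_width_multiplier

# Toy usage, taking mup_width_multiplier = hidden_size / base_hidden_size.
hidden_size, base_hidden_size, vocab = 256, 64, 1000
embed = nn.Embedding(vocab, hidden_size)
blocks = [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
unembed = nn.Linear(hidden_size, vocab, bias=False)
tokens = torch.randint(0, vocab, (2, 8))
logits = mup_forward(embed, blocks, unembed, tokens,
                     mup_width_multiplier=hidden_size / base_hidden_size)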
lr adjustment is done in train_step to address lr being reset due to lr_scheduling
lintangsutawika committed Dec 18, 2023
commit c9fb18ba12b8974f9310e5094e75317c71666192
megatron/training.py (11 changes: 5 additions & 6 deletions)

@@ -585,12 +585,6 @@ def get_optimizer(model, neox_args):
     else:
         raise ValueError(f"Optimizer type {neox_args.optimizer_type} not recognized")
 
-    # This is where the LR scaling is applied
-    if neox_args.use_mup:
-        for pg in optimizer.param_groups:
-            if ("lr_adjust" in pg) and pg["lr_adjust"] is True:
-                pg["lr"] /= neox_args.mup_m_width
-
     if neox_args.deepspeed:
         # fp16 wrapper is not required for DeepSpeed.
         return optimizer, param_groups
@@ -729,6 +723,11 @@ def backward_step(neox_args, timers, optimizer, model, loss):
 def train_step(neox_args, timers, data_iterator, model, optimizer, lr_scheduler):
     """Single training step."""
 
+    if neox_args.use_mup:
+        for pg in optimizer.param_groups:
+            if ("lr_adjust" in pg) and pg["lr_adjust"] is True:
+                pg["lr"] /= neox_args.mup_m_width
+
     # Pipeline parallelism schedules forward/backward/step
     if neox_args.is_pipe_parallel:
        reduced_loss = train_step_pipe(
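
For context outside the gpt-neox codebase, here is a minimal, self-contained sketch of the interaction this commit addresses: the LR scheduler rewrites each param group's "lr" every step, so a one-time division in get_optimizer() gets overwritten, while re-applying the adjustment inside the training step keeps the effective LR at base_lr / mup_m_width for the flagged groups. The DummyScheduler class, the param-group layout, and the concrete values are illustrative assumptions, not the project's actual scheduler or optimizer.

class DummyScheduler:
    """Stand-in for an LR scheduler that resets pg["lr"] from a base value each step."""

    def __init__(self, param_groups, base_lr):
        self.param_groups = param_groups
        self.base_lr = base_lr

    def step(self):
        for pg in self.param_groups:
            pg["lr"] = self.base_lr  # overwrites any earlier in-place scaling


def apply_mup_lr_adjustment(param_groups, mup_m_width):
    """Divide the LR of groups flagged with "lr_adjust" by the muP width multiplier."""
    for pg in param_groups:
        if pg.get("lr_adjust", False):
            pg["lr"] /= mup_m_width


# Per-step usage, mirroring the placement inside train_step() above:
param_groups = [{"lr": 6e-4, "lr_adjust": True}, {"lr": 6e-4}]
scheduler = DummyScheduler(param_groups, base_lr=6e-4)
for _ in range(3):
    scheduler.step()                                         # scheduler resets lr...
    apply_mup_lr_adjustment(param_groups, mup_m_width=4.0)   # ...so re-scale every step
    assert param_groups[0]["lr"] == 6e-4 / 4.0
    assert param_groups[1]["lr"] == 6e-4                     # unflagged group is untouched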