
Draft PR Adding mistral 0.1 #1131

Merged (69 commits) on Feb 23, 2024

Changes shown from 1 commit
ab38f60
add support for flash attention 2
zhangir-azerbayev Aug 9, 2023
840c09f
change cosine decay to chinchilla style
zhangir-azerbayev Aug 9, 2023
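A Chinchilla-style schedule warms up linearly and then decays the learning rate along a cosine curve to a floor of roughly 10% of the peak over the training budget. The sketch below is illustrative only; the function name and arguments are hypothetical and are not the schedule code from this PR:

```python
import math

def chinchilla_cosine_lr(step, max_lr, min_lr, warmup_iters, decay_iters):
    """Linear warmup, then cosine decay from max_lr to min_lr.

    Chinchilla-style runs typically set min_lr to about 0.1 * max_lr and
    decay_iters to the full training length. Names here are illustrative.
    """
    if step < warmup_iters:
        # linear warmup from 0 to max_lr
        return max_lr * step / max(1, warmup_iters)
    # progress through the decay phase, clamped to [0, 1]
    t = min(1.0, (step - warmup_iters) / max(1, decay_iters - warmup_iters))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```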
ae26360
set default warmup to none so that warmup_iters can be set
zhangir-azerbayev Aug 9, 2023
bf4cab5
fixed bug
zhangir-azerbayev Aug 9, 2023
ff86462
fixed chinchilla lr
zhangir-azerbayev Aug 9, 2023
757320b
add s3 checkpoint syncing
haileyschoelkopf Aug 9, 2023
a765819
Merge branch 'add-s3-ckpting' into math-lm-2
haileyschoelkopf Aug 9, 2023
8a11029
rotary embedding in fp32
zhangir-azerbayev Aug 10, 2023
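Computing the rotary-embedding tables in fp32 avoids precision loss in the sine/cosine values when the rest of the model runs in fp16/bf16. A minimal NumPy sketch of the idea follows; the function names are illustrative and are not the repo's implementation:

```python
import numpy as np

def rope_inv_freq(head_dim, base=10000.0):
    # inverse frequencies, kept in float32 regardless of model dtype
    return base ** (-np.arange(0, head_dim, 2, dtype=np.float32) / head_dim)

def apply_rope(x, positions):
    """Rotate consecutive (even, odd) feature pairs of x by position-dependent angles.

    x: (seq_len, head_dim), positions: (seq_len,). Angles are computed in fp32.
    """
    inv_freq = rope_inv_freq(x.shape[-1])
    angles = np.outer(positions.astype(np.float32), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```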
d869e47
fix for seq_len < max_seq_len
zhangir-azerbayev Aug 10, 2023
52ba5e4
some fixes, still not working
zhangir-azerbayev Aug 11, 2023
dfedf05
?'
zhangir-azerbayev Aug 11, 2023
e680f68
Merge branch 'fix_rotary_precision' into math-lm-2-rotary
zhangir-azerbayev Aug 12, 2023
fcbd8a1
fix bugs; evaluate on step 0
zhangir-azerbayev Aug 13, 2023
416bafa
Merge branch 'math-lm-2' into math-lm-2-rotary
haileyschoelkopf Aug 13, 2023
334bbd5
first attempt at gqa
zhangir-azerbayev Aug 23, 2023
3c8616f
gqa works in kv_heads==query_heads case
haileyschoelkopf Sep 8, 2023
e59c873
gqa working
haileyschoelkopf Sep 15, 2023
801192e
workaround for FSX quota
haileyschoelkopf Sep 15, 2023
e52b749
update with llemma
zhangir-azerbayev Oct 5, 2023
6bc724b
update with recent PR
zhangir-azerbayev Oct 5, 2023
48d394e
README and requirements updated
AIproj Oct 25, 2023
694bc7f
Added Mistral config
AIproj Oct 25, 2023
612de29
Added sliding window through flash attention 2
AIproj Oct 25, 2023
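Sliding-window attention restricts each query to the most recent `window` keys (in addition to the causal constraint). FlashAttention 2 applies this inside the fused kernel; purely as an illustration of the mask semantics, not the actual kernel, the boolean mask can be built explicitly:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean (seq_len, seq_len) mask: True where query i may attend to key j.

    Causal (j <= i) and within the last `window` positions (j > i - window).
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)
```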
9bd58f1
Added sliding window
AIproj Oct 25, 2023
d5d90dc
Mistral should likely use mp=2 like llama2
AIproj Oct 27, 2023
67638e1
Update gitignore
AIproj Nov 1, 2023
b521408
Removed unused CPCargo import
AIproj Nov 1, 2023
c842ea9
Conversion script (WIP)
AIproj Nov 1, 2023
aa50fd1
Fixed missing slurm environ vars
AIproj Nov 1, 2023
6a86310
updated mistral config
AIproj Nov 8, 2023
b5f2c6a
updated job script
AIproj Nov 8, 2023
44e0397
initial commit conversion mistral hf to sequential
AIproj Nov 8, 2023
fa71c63
Added stacking q, k, v appropriately for mp ranks
AIproj Nov 12, 2023
e263367
pp=0 support from end of 2023
AIproj Jan 20, 2024
bcfb279
Cleaning up config and removing Autoconfig in conversion script
AIproj Jan 25, 2024
753ef0f
Cleaned up conversion example script
AIproj Jan 25, 2024
3488dae
cleanup: add back configs folder, discard Llemma readme
haileyschoelkopf Feb 13, 2024
a55d69c
cleanup: remove llemma lr sched changes, re-add requirements/ folder
haileyschoelkopf Feb 13, 2024
a521a82
docs: add explanation of intermediate_size behavior
haileyschoelkopf Feb 13, 2024
4df0c4e
args: add argument checking for num_kv_heads, clean up usage syntax
haileyschoelkopf Feb 13, 2024
beb66d4
args: prevent num KV heads < TP worldsize
haileyschoelkopf Feb 13, 2024
08f80fe
readd triton flash attn func
haileyschoelkopf Feb 13, 2024
7325880
cleanup: use tools/ dir from main
haileyschoelkopf Feb 13, 2024
9b2331f
docs: re-add mistral , GQA as supported
haileyschoelkopf Feb 13, 2024
d6accd8
cleanup: delete duplicate tools/ files
haileyschoelkopf Feb 13, 2024
975d8f8
cleanup: use fp32 rope (non-fused) from main
haileyschoelkopf Feb 13, 2024
23b7577
cleanup: no longer block out GQA codepaths in conversion scripts
haileyschoelkopf Feb 13, 2024
9704976
Merge branch 'main' into adding-mistral-0.1
Quentin-Anthony Feb 14, 2024
54135b4
cleanup: gqa code a bit
haileyschoelkopf Feb 14, 2024
594d926
add llama2, llemma configs
haileyschoelkopf Feb 14, 2024
0827bb8
add non-flash GQA ; refactor modeling code
haileyschoelkopf Feb 21, 2024
558bdd8
clean up mistral config for commit
haileyschoelkopf Feb 21, 2024
726935f
further cleanup configs dir
haileyschoelkopf Feb 21, 2024
4cec223
remove slurm script from llemma
haileyschoelkopf Feb 21, 2024
eca632d
update seqlen params for codellama, llemma configs
haileyschoelkopf Feb 21, 2024
b07e63a
add more comments to GQA code, and make reshapes more readable
haileyschoelkopf Feb 21, 2024
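Grouped-query attention (GQA) shares each KV head across a group of query heads, so the reshapes amount to repeating the KV heads along the head axis until they match the query head count. A minimal NumPy sketch under that assumption (names are illustrative, not the repo's code):

```python
import numpy as np

def grouped_query_attention(q, k, v, num_q_heads, num_kv_heads):
    """q: (seq, num_q_heads, head_dim); k, v: (seq, num_kv_heads, head_dim)."""
    group = num_q_heads // num_kv_heads
    # repeat each KV head so every group of query heads shares one KV head
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    # scaled dot-product scores, per head: (num_q_heads, seq_q, seq_k)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)
```

With `num_kv_heads == num_q_heads` this reduces to standard multi-head attention; with `num_kv_heads == 1` it is multi-query attention.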
f0dcf17
make inv_freq non-persistent
haileyschoelkopf Feb 22, 2024
95afe82
actually, just ensure mistral has inv_freqs as a persistent buffer
haileyschoelkopf Feb 22, 2024
5cfe8ee
non-flash GQA works, so ensure arguments.py permits it
haileyschoelkopf Feb 22, 2024
627a287
no longer use our own copies of flash attention interface functions
haileyschoelkopf Feb 22, 2024
63c2fbe
remove unused mpu util fn
haileyschoelkopf Feb 22, 2024
e768492
delete unused config file
haileyschoelkopf Feb 22, 2024
caa440d
fix diff on mpu/utils.py
haileyschoelkopf Feb 22, 2024
74fde98
remove slurm scripts that won't be in this PR
haileyschoelkopf Feb 22, 2024
e7d1282
run pre-commit
haileyschoelkopf Feb 22, 2024
50ed9b5
Merge remote-tracking branch 'upstream/main' into adding-mistral-0.1
haileyschoelkopf Feb 22, 2024
ace0e94
update tests for conversion scripts
haileyschoelkopf Feb 22, 2024
157ec47
add flash version check for sliding window
Quentin-Anthony Feb 23, 2024
db9947e
pre-commit
Quentin-Anthony Feb 23, 2024
fix diff on mpu/utils.py
haileyschoelkopf committed Feb 22, 2024
commit caa440db5d2501de035d616a8c7a7219d5290408
4 changes: 2 additions & 2 deletions megatron/mpu/utils.py
@@ -1,7 +1,7 @@
-# Copyright (c) 2021, EleutherAI
+# Copyright (c) 2024, EleutherAI
 # This file is based on code by the authors denoted below and has been modified from its original version.
 #
-# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.