merge upstream #63 (Closed)
Wants to merge 114 commits.
Commits (114):
43ea51c
README Update (#1017)
StellaAthena Aug 28, 2023
2922bef
Bump transformers version and update enwik8 link (#1024)
dashstander Sep 13, 2023
960ed3d
Fix Generation with Sequential Model (#1026)
xu-song Sep 15, 2023
97e376c
Fix broken link (#1022)
StellaAthena Sep 15, 2023
7821aa7
add llama training config (#1023)
xu-song Sep 15, 2023
cdc94ee
Create README_llama.md (#1027)
Quentin-Anthony Sep 15, 2023
737c913
Rename README_llama.md to README.md (#1028)
Quentin-Anthony Sep 15, 2023
c883e8c
Add llama generation script (#1030)
xu-song Sep 15, 2023
d9166bf
Fix bf16 for zero > 0 and pipeline parallelism > 0 (#1032)
dashstander Sep 18, 2023
fcd5f92
Remove support for lazy dataset implementation (#1033)
dashstander Sep 18, 2023
70af6e8
Fix SequentialWrapper Generation (pipe_parallel_size = 0) (#1031)
xu-song Sep 18, 2023
8903a96
integrated flash attention 2 (#1035)
a663E-36z1120 Sep 20, 2023
0ce77ab
Fix register_buffer parameter (#1036)
xu-song Sep 20, 2023
444c0ef
Add flash 2.x message to README.md (#1037)
Quentin-Anthony Sep 20, 2023
f9503b7
Add s3 checkpoint syncing (#1010)
haileyschoelkopf Sep 23, 2023
390d37c
Fixed final value of linear decay lr (#1039)
foggy-frost-forest Sep 23, 2023
e431ff5
Fix final value of exponential decay lr (#1040)
Quentin-Anthony Sep 23, 2023
2ab05be
Remove the NeoX implementation of GPT2Tokenizer (#1042)
dashstander Sep 25, 2023
3bfedf4
Pre-compute RoPE embeddings in fp32 (#1041)
dashstander Sep 25, 2023
ba51ca0
Patch LR Annealing Bug (#1046)
dashstander Sep 27, 2023
5f36401
Improve FLOPS Calculation (#1044)
dashstander Sep 27, 2023
5fa85ad
adding boilerplate coverity scan to submit to public analysis (#1047)
jaimemcc-intel Sep 28, 2023
f44db66
Add section to the README detailing how to start distributed jobs (#1…
dashstander Sep 29, 2023
2c60645
Fix readme typos (#1049)
Quentin-Anthony Sep 29, 2023
b14d6f7
Update citation list (#1052)
Quentin-Anthony Sep 29, 2023
93cac79
Update CITATION.cff (#1053)
Quentin-Anthony Sep 29, 2023
7a8569f
Remove duplicated hf_config (#1054)
xu-song Oct 1, 2023
3f43f07
Organize the `tools` directory (#1055)
dashstander Oct 2, 2023
f6ac04d
Add documentation about using labelled datasets (#1056)
dashstander Oct 4, 2023
e001a04
LR scheduler fix no longer breaks inference (#1060)
dashstander Oct 17, 2023
b02d989
Lion Optimizer (#1062)
andylolu2 Oct 20, 2023
e277bc7
fix lion optimizer documentation (#1067)
jahatef Oct 31, 2023
f574f22
Fix preprocess_data.py link (#1064)
Quentin-Anthony Oct 31, 2023
fcc5af5
Edge-casing for multi-GPU HF-to-NeoX conversion (#1065)
haileyschoelkopf Nov 1, 2023
8c9fc00
Create tools __init__.py for import (#1068)
Quentin-Anthony Nov 1, 2023
a10f69c
Pin version of `lm_eval` (#1070)
haileyschoelkopf Nov 1, 2023
41f019e
fixed case when ntasks_per_node is used instead (#1069)
AIproj Nov 1, 2023
90aa131
Update README.md
StellaAthena Nov 5, 2023
04dc2ba
When processing mlp.dense_4h_to_h.bias and attention.dense.bias, tp_r…
kyuheejang Nov 7, 2023
f214358
Merge pull request #1072 from kyuheejang/Fixing-neox-to-huggingface
StellaAthena Nov 7, 2023
d8028f8
Resolve error in the `test_neoxargs_usage` unit test (#1074)
mkerin Nov 8, 2023
10bf788
Update neox_args.py (#1081)
jahatef Nov 16, 2023
f48d3a6
Update README.md (#1082)
StellaAthena Nov 22, 2023
efea81f
Update README.md
StellaAthena Nov 30, 2023
3be59a4
Extend ci suite (#1080)
mkerin Dec 4, 2023
a2b2020
Patch coverity scan (#1090)
jaimemcc-intel Dec 4, 2023
050f560
Corrects FLOPs formula as per 1093 (#1094)
StellaAthena Dec 6, 2023
f19b2ec
Update CODEOWNERS
StellaAthena Dec 19, 2023
07166da
Bump transformers from 4.30.2 to 4.36.0 in /requirements (#1097)
dependabot[bot] Dec 20, 2023
9283eff
Pins old DeeperSpeed until bug is fixed (#1095)
StellaAthena Dec 20, 2023
9eef954
Update README.md
StellaAthena Dec 22, 2023
a48e09e
Update README.md
StellaAthena Dec 22, 2023
613e5a6
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
be7eeda
Update README.md
StellaAthena Dec 22, 2023
2117afc
Update README.md
StellaAthena Dec 22, 2023
8dba5b6
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
f161245
Add QK Normalization (#1100)
lintangsutawika Dec 22, 2023
7fb3b3c
Update README.md
StellaAthena Dec 22, 2023
a7509f0
Update README.md
StellaAthena Dec 22, 2023
8eaac4e
Merge branch 'main' into StellaAthena-patch-4-1
StellaAthena Dec 22, 2023
4d5a811
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
05cc29c
Merge pull request #1099 from EleutherAI/StellaAthena-patch-4-1
StellaAthena Dec 22, 2023
e25446e
Merge branch 'main' into StellaAthena-patch-4
StellaAthena Dec 22, 2023
287f9f7
Merge pull request #1102 from EleutherAI/StellaAthena-patch-4
StellaAthena Dec 22, 2023
b27e409
Lm eval 0.4.0 support (#1101)
haileyschoelkopf Dec 23, 2023
1148a0f
Update README.md
StellaAthena Dec 23, 2023
e5a7ea7
Update neox_args.py (#1107)
StellaAthena Dec 26, 2023
eca6b1a
Fix repo for CI (#1106)
yang Jan 4, 2024
98716eb
Fix install, Dockerfile, CI (#1104)
yang Jan 4, 2024
77605ca
Fused Rotary Embeddings (fixed) (#1108)
yang Jan 5, 2024
f14782a
Add pythia 14M and 31M configs (#1111)
segyges Jan 5, 2024
e6e944a
Add docker compose and change containerized setup instructions to use…
segyges Jan 9, 2024
92b1b6f
Fix openwebtext2 downloader, backport improvements to DataDownloader …
segyges Jan 11, 2024
90f70ff
Bump jinja2 from 3.1.2 to 3.1.3 in /requirements (#1120)
dependabot[bot] Jan 13, 2024
6399155
Enable passing of `--account` to `srun` / SlurmLauncher (#1126)
haileyschoelkopf Jan 19, 2024
7a8fa2f
update copyrights (#1128)
jahatef Jan 24, 2024
3d8fec0
fused layernorm (#1105)
yang Jan 26, 2024
e5602c3
Contributing Guide (#1138)
jahatef Jan 29, 2024
1c133bf
moved eval import and added to docs (#1139)
R0n12 Jan 30, 2024
032ec8c
Update lm_eval v0.4 to PyPI dependencies (#1141)
haileyschoelkopf Feb 1, 2024
91c44bc
Remove gas (beano) (#1144)
segyges Feb 5, 2024
f7373f8
Improve Conversion Utilities (#1124)
haileyschoelkopf Feb 8, 2024
412cf6e
Fixes distributed tests, and skips tests that are broken. (#1149)
jahatef Feb 21, 2024
46d179c
Memory profiling (#1153)
jahatef Feb 21, 2024
eee03b2
add profiling to readme (#1154)
jahatef Feb 23, 2024
a7638a8
Python version update (#1122)
segyges Feb 23, 2024
72d1803
Minor changes (#1125)
segyges Feb 23, 2024
f36aed7
Draft PR Adding mistral 0.1 (#1131)
AIproj Feb 23, 2024
9663802
[Bug?] Fix profiling argument names (#1155)
haileyschoelkopf Feb 26, 2024
3c03fc7
Update cpu_ci.yml (#1159)
jaimemcc-intel Feb 29, 2024
19596b0
Improve argument validation for Flash-attn + SWA (#1162)
haileyschoelkopf Mar 2, 2024
119950c
Single node Pythia 14M training on ngc pytorch 24.02 container (#1170)
tf-nv Mar 4, 2024
7b8187a
Remove unnecessary fp32/bf16 conversion (#1169)
DayOfThePenguin Mar 4, 2024
31cfe52
Ignore markdown for pre-commit (#1171)
Quentin-Anthony Mar 4, 2024
e109bf5
Make rotary freqs buffer non-persistent (#1168)
haileyschoelkopf Mar 4, 2024
df8cf24
Support Lion with Zero Optimizer (#1166)
DayOfThePenguin Mar 4, 2024
86758c3
Add MoE (#1129)
yang Mar 7, 2024
63b9fa1
remove `best_download` as dependency (#1179)
haileyschoelkopf Mar 8, 2024
90d4cb3
Fix documentation for --jsonl-keys argument of preprocess_data script…
KeitaW Mar 8, 2024
8c13642
clean up dockerfile: (#1175)
tf-nv Mar 8, 2024
c1fa994
When using kv cache and flash attention in conjunction, it's crucial …
chaochen99 Mar 8, 2024
1e7abe7
Remove gas from Pythia configs (#1181)
yang Mar 8, 2024
82ddc66
Fix moe_loss in gpt_j_residual path (#1180)
yang Mar 8, 2024
6809bbc
Add Mamba Architecture (#1157)
haileyschoelkopf Mar 10, 2024
03186de
Switch to using Cuda Flash Attn for Alibi (#1183)
haileyschoelkopf Mar 13, 2024
277141e
Mamba + Tensor Parallel Support (#1184)
haileyschoelkopf Mar 15, 2024
7267a74
[ZeRO-3] Partitioned init with `deepspeed.zero.Init()` (#1190)
R0n12 Mar 19, 2024
e6b5261
Small typo in the README
Mar 26, 2024
4085302
Merge pull request #1196 from edouardoyallon/typo_readme
StellaAthena Mar 26, 2024
1960b66
Added more papers
StellaAthena Mar 26, 2024
3616658
Update README.md
StellaAthena Mar 26, 2024
977448e
making PR triggered CPU test for changes to megatron (#1195)
jaimemcc-intel Apr 1, 2024
51a7de9
[AMD] Supporting fused kernels build using JIT (#1188)
R0n12 Apr 1, 2024
01657aa
[ZeRO-3] Ensured passing neox deepspeed_config when using partitioned…
R0n12 Apr 1, 2024
Changes from 1 commit:
Support Lion with Zero Optimizer (EleutherAI#1166)
* feat: deepspeed zero lion support

* feat: bump DeeperSpeed version to one that includes DeepSpeed FusedLion

* feat: bump DeeperSpeed version to include pipeline logging fix

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
DayOfThePenguin and Quentin-Anthony committed Mar 4, 2024
commit df8cf244be4b73d3c6d50cd610f4b1a57af0c678
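For context on what the two Lion classes compute: the sketch below is a minimal reference implementation of the Lion update rule (sign of an interpolated momentum plus decoupled weight decay, as published by Chen et al., 2023). It is illustrative only; the commit itself uses megatron's `Lion` or `deepspeed.ops.lion.FusedLion` as shown in the diff below, not this code, and the function name and defaults here are hypothetical.

```python
# Minimal reference sketch of the Lion update rule, for illustration only.
# The function name, signature, and defaults are hypothetical; the actual
# optimizers used by this commit are megatron's Lion and DeepSpeed's FusedLion.
import torch


@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
    """Apply one Lion update to `param` in place; `momentum` is the state buffer."""
    beta1, beta2 = betas
    # Update direction: sign of an interpolation between momentum and gradient.
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    # Decoupled weight decay (AdamW-style), then the signed step.
    param.mul_(1 - lr * weight_decay)
    param.add_(update, alpha=-lr)
    # Momentum is refreshed with a second interpolation coefficient.
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)
    return param, momentum
```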
13 changes: 11 additions & 2 deletions megatron/training.py
@@ -538,9 +538,18 @@ def get_optimizer(model, neox_args):
             **neox_args.optimizer["params"],
         )
     elif neox_args.optimizer_type.lower() == "lion":
-        from .optimizers import Lion
+        # if we want the deepspeed zero lion...megatron lion will throw DeepSpeed Error
+        if neox_args.zero_optimization["stage"] != 0:
+            from deepspeed.ops.lion import FusedLion
 
-        optimizer = Lion(
+            lion_optimizer = FusedLion
+        # if not zero
+        else:
+            from .optimizers import Lion
+
+            lion_optimizer = Lion
+
+        optimizer = lion_optimizer(
             param_groups,
             weight_decay=neox_args.weight_decay,
             **neox_args.optimizer["params"],
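To see when the new branch fires, here is a hypothetical NeoX-style configuration fragment, written as a Python dict purely for illustration (actual configs are YAML files); the keys mirror the `neox_args` fields referenced in the diff above, and the values are placeholders. With `"type": "lion"` and a non-zero ZeRO stage, `get_optimizer` imports `deepspeed.ops.lion.FusedLion`; with stage 0 it falls back to the in-repo `Lion`.

```python
# Hypothetical config fragment, for illustration only. Key names follow the
# neox_args fields used in the diff; values are placeholders, not recommendations.
neox_config = {
    "optimizer": {
        "type": "lion",  # selects the Lion branch in get_optimizer()
        "params": {"lr": 1.0e-4, "betas": [0.9, 0.99]},
    },
    "weight_decay": 0.1,
    "zero_optimization": {
        "stage": 1,  # any non-zero stage routes to deepspeed.ops.lion.FusedLion
    },
}
```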
2 changes: 1 addition & 1 deletion requirements/requirements.txt
@@ -1,5 +1,5 @@
 best_download
-git+https://github.com/EleutherAI/DeeperSpeed.git@b9260436e7da3e297fc6bedfd27d9e69fbba6f5c#egg=deepspeed
+git+https://github.com/EleutherAI/DeeperSpeed.git@02e2ebf7dee6aaab3d89094ed470a4609763c742#egg=deepspeed
 ftfy>=6.0.1
 git+https://github.com/EleutherAI/lm_dataformat.git@4eec05349977071bf67fc072290b95e31c8dd836
 huggingface_hub>=0.11.0