
merge upstream #64

Merged
merged 117 commits into CERC-AAI:main on Apr 4, 2024
Conversation

kshitijkg
Member

No description provided.

kshitijkg and others added 30 commits August 12, 2023 16:10
* Fixed final value of cosine decay lr

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
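
For reference, the cosine-decay fix above amounts to making the schedule bottom out at a configured minimum LR instead of decaying to zero; Chinchilla-style schedules typically anneal to roughly 10% of the peak LR. A minimal sketch of that shape (function and argument names here are illustrative, not the gpt-neox scheduler):

```python
import math

def cosine_decay_lr(step, max_lr, min_lr, warmup_iters, decay_iters):
    """Linear warmup, then cosine decay that ends exactly at min_lr
    (illustrative sketch only, not the gpt-neox scheduler)."""
    if step < warmup_iters:
        return max_lr * step / max(1, warmup_iters)
    progress = min(1.0, (step - warmup_iters) / max(1, decay_iters - warmup_iters))
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)
```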
* Update Dockerfile

* Update Dockerfile
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Update transformers version

Signed-off-by: Dashiell Stander <[email protected]>

* Update the enwik8 URL to the one HF uses, the old one is down.

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
* Update README.md

Fix broken link

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Fix bugs so we can use bf16 with zero > 0

Signed-off-by: Dashiell Stander <[email protected]>

* Typo

Signed-off-by: Dashiell Stander <[email protected]>

* Typo

Signed-off-by: Dashiell Stander <[email protected]>

* With the DeepSpeed updates there may be no need to do grad_accum in fp32

Signed-off-by: Dashiell Stander <[email protected]>

* Add warning about necessity of fp32 grad_accum with bf16, pp>0, and zero1

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
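
Context for the bf16/ZeRO commits above: bf16 has only ~8 mantissa bits, so summing many small micro-batch gradients directly in bf16 loses precision, which is why the added warning recommends fp32 gradient accumulation with bf16, pp>0, and ZeRO-1. A toy sketch of the idea (not the DeepSpeed/NeoX code path):

```python
import torch

# Toy illustration: keep an fp32 accumulator per parameter so small bf16
# gradient contributions are not lost to rounding across micro-batches.
params = [torch.nn.Parameter(torch.zeros(1024, dtype=torch.bfloat16))]
fp32_accum = [torch.zeros(p.shape, dtype=torch.float32) for p in params]

def accumulate(micro_batch_grads):
    for acc, g in zip(fp32_accum, micro_batch_grads):
        acc += g.float()                      # upcast before accumulating

@torch.no_grad()
def apply_and_reset(lr=1e-4):
    for p, acc in zip(params, fp32_accum):
        p -= (lr * acc).to(p.dtype)           # single downcast at apply time
        acc.zero_()
```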
* Remove lazy dataset implementation option

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Fix SequentialGeneration

* Fix SequentialGeneration
* Fix register_buffer parameter

* Fix register_buffer parameter
* Add flash 2.x message to README.md

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* add s3 checkpoint syncing

* Update NeoXArgs docs automatically

* remove CPCargo requirement

* Update NeoXArgs docs automatically

* Make s3 imports try-except and separate requirements to s3 file

* Update NeoXArgs docs automatically

* Announce feature

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
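
The "try-except" import pattern referenced above keeps the S3 dependency optional; roughly like this (the boto3 calls are standard, but this is only a sketch of the pattern, not the sync code added in the commit):

```python
try:
    import boto3
except ImportError:
    boto3 = None

def upload_checkpoint(local_path, bucket, key):
    """Best-effort upload; skipped when the optional s3 requirements are absent."""
    if boto3 is None:
        print("boto3 not installed; skipping s3 checkpoint sync")
        return
    boto3.client("s3").upload_file(local_path, bucket, key)
```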
* Try out just using the HF implementation

Signed-off-by: Dashiell Stander <[email protected]>

* Rely solely on HF tokenizer.

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
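
Relying solely on the HF tokenizer means loading a single `tokenizer.json` with the `tokenizers` library instead of maintaining a separate vocab/merges implementation; a minimal usage sketch (the file path is a placeholder):

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")   # placeholder path
ids = tok.encode("Hello world").ids
text = tok.decode(ids)
```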
* Pre-commit

Signed-off-by: Dashiell Stander <[email protected]>

* Sequence dimension is 0

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Ensure that LR annealing is correct even after loading from checkpoint. Patch from Eric Nguyen

Co-authored-by: Eric Nguyen <[email protected]>
Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Test whether we need the whole patch

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Turns out we do not need the entire patch, just one line

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: Eric Nguyen <[email protected]>
Co-authored-by: github-actions <[email protected]>
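
The one-line LR-annealing fix above is about resuming the scheduler at the saved iteration instead of restarting warmup/decay after a checkpoint load. A generic sketch of the failure mode and fix, using a stock PyTorch scheduler rather than the NeoX one:

```python
import torch

model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=1000, eta_min=1e-5)

# Stand-in for a real checkpoint: the scheduler state (or at least the
# iteration count) must be restored, otherwise the LR silently restarts
# from the top of the schedule after resuming.
checkpoint = {"iteration": 500, "lr_scheduler": sched.state_dict()}
sched.load_state_dict(checkpoint["lr_scheduler"])
start_iter = checkpoint["iteration"]
```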
* Use Megatron-DeepSpeed flops calculation

Signed-off-by: Dashiell Stander <[email protected]>

* Use Megatron-DeepSpeed flops calculation

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Direct comparison of FLOPS calculations

Signed-off-by: Dashiell Stander <[email protected]>

* Remove test logging

Signed-off-by: Dashiell Stander <[email protected]>

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
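
For context on the FLOPS comparison above: the Megatron-style estimate counts the dense matmuls of a transformer and multiplies by 3x (forward + backward) or 4x when activations are recomputed. A hedged sketch of that formula (the exact constants used by the Megatron-DeepSpeed calculator may differ slightly):

```python
def transformer_train_flops_per_iter(batch, seq, layers, hidden, vocab,
                                     checkpoint_activations=True):
    """Approximate training FLOPs per iteration for a dense transformer
    (Megatron-style estimate; treat the constants as an approximation)."""
    # forward ~ 24*B*s*L*h^2 * (1 + s/(6h) + V/(16*L*h)); backward ~ 2x forward;
    # activation recomputation adds roughly one extra forward pass.
    fwd = 24 * batch * seq * layers * hidden ** 2 * (
        1 + seq / (6 * hidden) + vocab / (16 * layers * hidden)
    )
    return (4 if checkpoint_activations else 3) * fwd

# e.g. a ~6.7B-parameter shape: batch 1024, seq 2048, 32 layers, hidden 4096
print(f"{transformer_train_flops_per_iter(1024, 2048, 32, 4096, 50257):.3e}")
```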
* adding boilerplate coverity scan to submit to public analysis

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>

* Add documentation about kicking off distributed jobs

Signed-off-by: Dashiell Stander <[email protected]>

* Add documentation about kicking off distributed jobs

Signed-off-by: Dashiell Stander <[email protected]>

* Add documentation about kicking off distributed jobs

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Added more info on run command modification and cleaned up a bit

* slight cleanup

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Fix readme typo

* Update NeoXArgs docs automatically

* More typos

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Update CITATION.cff

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
segyges and others added 29 commits February 22, 2024 16:52
* Switch default command for docker image

* Rename pythia paths docker file for clarity

* Update docker build to use python 3.10

* Update github workflows to use ubuntu 22.04 and python 3.10

* Bump pytorch library patch versions

* Add pytest-html for reasonably formatted test reports

* Fix build after torch and cuda version bump

* Fix apex install for newer version

1) This, empirically, works, as tested by running the build and kicking off training.
2) Apex documentation says it is incorrect syntax and deprecated.
3) It takes so long to compile that it is probably, all by itself, something that needs fixing.
4) I will probably pull the fused adamw out of apex.
5) It has been building for twenty minutes so I am going to go do something else.

* Fix pip version to ensure apex compilation remains good

* Fix unit test for evaluate

* Fix pip requirement

Prevents possible build issues with apex, especially across divergent pip versions

* Update dockerfile to point to stripped-down apex repo

* Revert "Update dockerfile to point to stripped-down apex repo"

This reverts commit 40c7656.

* Update apex version in dockerfile

* Switch to downloading prebuilt apex wheel

* Clean up docker copy commands

* Have docker build conditionally get binaries or build apex

* Apply precommit
* Switch default command for docker image

* Rename pythia paths docker file for clarity

* Fix unit test for evaluate

* Update readme for testing to omit --forked argument

* Add pytest-html to requirements-dev.txt

* Revert "Update readme for testing to omit --forked argument"

This reverts commit 19021fc.

* Add data/ directory and .bin and .idx files in /tests/data to .gitignore

This keeps git from prompting you to commit (or forcing you to stash) data files

* Make .gitignore for data files slightly more elegant

* Add utility script for doing token counts on processed datasets

* Run precommit hook

* Fix token count script, run precommit
* add support for flash attention 2

* change cosine decay to chinchilla style

* set default warmup to none so that warmup_iters can be set

* fixed bug

* fixed chinchilla lr

* add s3 checkpoint syncing

* rotary embedding in fp32

* fix for seq_len < max_seq_len

* some fixes, still not working

* fix bugs; evaluate on step 0

* first attempt at gqa

* gqa works in kv_heads==query_heads case

* gqa working

* workaround for FSX quota

* update with llemma

* update with recent PR

* README and requirements updated

* Added Mistral config

* Added sliding window through flash attention 2

* Added sliding window

* Mistral should likely use mp=2 like llama2

* Update gitignore

* Removed unused CPCargo import

* Conversion script (WIP)

* Fixed missing slurm environ vars

* updated mistral config

* updated job script

* initial commit conversion mistral hf to sequential

* Added stacking q, k, v appropriately for mp ranks

* pp=0 support from end of 2023

* Cleaning up config and removing Autoconfig in conversion script

* Cleaned up conversion example script

* cleanup: add back configs folder, discard Llemma readme

* cleanup: remove llemma lr sched changes, re-add requirements/ folder

* docs: add explanation of intermediate_size behavior

* args: add argument checking for num_kv_heads, clean up usage syntax

* args: prevent num KV heads < TP worldsize

* readd triton flash attn func

* cleanup: use tools/ dir from main

* docs: re-add Mistral, GQA as supported

* cleanup: delete duplicate tools/ files

* cleanup: use fp32 rope (non-fused) from main

* cleanup: no longer block out GQA codepaths in conversion scripts

* cleanup: gqa code a bit

* add llama2, llemma configs

* add non-flash GQA ; refactor modeling code

* clean up mistral config for commit

* further cleanup configs dir

* remove slurm script from llemma

* update seqlen params for codellama, llemma configs

* add more comments to GQA code, and make reshapes more readable

* make inv_freq non-persistent

* actually, just ensure mistral has inv_freqs as a persistent buffer

* non-flash GQA works, so ensure arguments.py permits it

* no longer use our own copies of flash attention interface functions

* remove unused mpu util fn

* delete unused config file

* fix diff on mpu/utils.py

* remove slurm scripts that won't be in this PR

* run pre-commit

* update tests for conversion scripts

* add flash version check for sliding window

* pre-commit

---------

Co-authored-by: zhangir-azerbayev <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
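
The GQA commits above share each key/value head across a group of query heads; the non-flash path essentially repeats the KV heads before running standard attention. A self-contained sketch under that assumption (shapes and the helper name are illustrative, not the PR's modeling code):

```python
import torch

def grouped_query_attention(q, k, v):
    """q: [batch, n_q_heads, seq, head_dim]; k, v: [batch, n_kv_heads, seq, head_dim].
    Each group of n_q_heads // n_kv_heads query heads shares one KV head."""
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    assert n_q_heads % n_kv_heads == 0
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)   # expand KV heads to match query heads
    v = v.repeat_interleave(group, dim=1)
    return torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)

q = torch.randn(2, 8, 16, 64)   # 8 query heads
k = torch.randn(2, 2, 16, 64)   # 2 KV heads -> groups of 4
v = torch.randn(2, 2, 16, 64)
out = grouped_query_attention(q, k, v)
```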
* possibly fix profiling flag names

* actually, profile_backward already exists

* Update NeoXArgs docs automatically

* neox_args.profile was also used some places, update that too

* Update NeoXArgs docs automatically

* profiling --> profile

* Update NeoXArgs docs automatically

* Revert neox_arguments.md changes

* Update NeoXArgs docs automatically

* Update gen_docs since __name__ only returns the Literal for string args with Python 3.10

* Update NeoXArgs docs automatically

* Another update to preserve non-literals

* Update NeoXArgs docs automatically

* add union

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Update cpu_ci.yml

Update the workflow to point the CPU workflow at a self-hosted runner instead of GitHub-provided runners

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Improve argument validation for Flash-attn + SWA

* Update NeoXArgs docs automatically

* don't pass window_size if not necessary

* Update NeoXArgs docs automatically

* Update 7B.yml

* Update NeoXArgs docs automatically

* apply precommit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
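
On the "don't pass window_size if not necessary" point: flash-attn only accepts a `window_size` argument in newer releases (2.3+), so the kwarg has to be forwarded conditionally. A hedged sketch of the call (requires a CUDA device with flash-attn installed; the sliding-window value and kwargs handling are illustrative):

```python
import torch
from flash_attn import flash_attn_func  # window_size requires flash-attn >= 2.3

# [batch, seq_len, n_heads, head_dim], fp16 on GPU as flash-attn expects
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

sliding_window = 64                       # illustrative value
kwargs = {}
if sliding_window is not None:
    # (left, right) context limits; only forwarded when SWA is requested
    kwargs["window_size"] = (sliding_window, 0)
out = flash_attn_func(q, k, v, causal=True, **kwargs)
```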
* Pythia 14M training on ngc pytorch 24.02 container

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* feat: remove unnecessary bf16 conversions since no collective op is performed

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* ignore markdown for pre-commit

* only ignore end of file and trailing whitespace

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* make inv_freq non-persistent by default

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
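
Making `inv_freq` non-persistent keeps the rotary frequencies out of the checkpoint state_dict so they are always recomputed from the config; a minimal sketch of the relevant buffer registration:

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, base=10000, persistent=False):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False excludes inv_freq from the state_dict, so checkpoints
        # no longer pin a particular precision for it (the Mistral config above
        # keeps it persistent instead).
        self.register_buffer("inv_freq", inv_freq, persistent=persistent)
```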
* feat: deepspeed zero lion support

* feat: bump DeeperSpeed version to one that includes DeepSpeed FusedLion

* feat: bump DeeperSpeed version to include pipeline logging fix

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
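
For reference, the update rule behind the FusedLion kernel pulled in above is a sign-of-momentum step with decoupled weight decay; a plain-PyTorch sketch of the math (not the fused DeeperSpeed kernel):

```python
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    update = (beta1 * momentum + (1 - beta1) * grad).sign()  # interpolate, then take the sign
    param.mul_(1 - lr * wd).add_(update, alpha=-lr)          # decoupled weight decay + step
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)         # EMA momentum update
```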
* Add DeepSpeed MoE

Thanks to dayofthepenguin for extensive testing

Closes #479

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
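
DeepSpeed's MoE is integrated by wrapping an expert module in `deepspeed.moe.layer.MoE`; a rough sketch of the integration point (constructor arguments vary across DeepSpeed versions, and this is meant to run inside an already-initialized distributed/DeepSpeed job):

```python
import torch
import deepspeed  # assumes torch.distributed / DeepSpeed has been initialized

hidden = 1024
expert = torch.nn.Sequential(            # one expert MLP; replicated num_experts times
    torch.nn.Linear(hidden, 4 * hidden),
    torch.nn.GELU(),
    torch.nn.Linear(4 * hidden, hidden),
)
moe = deepspeed.moe.layer.MoE(hidden_size=hidden, expert=expert,
                              num_experts=8, k=1)   # top-1 gating
```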
* Update requirements.txt

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
- Eliminate already-installed apt packages
- The sparse attention requirement led to a triton downgrade
- flash attn is already part of the NGC container (in another version
  that is compatible with TE)
…to set the causal parameter of flash_varlen_qkv_fn to False. Failing to do so will lead to inaccurate results. (#1178)
* initial mamba support (no kernels, no parallelism)

* Mamba runs! Also, add flags for sel. scan and conv1d fused kernels

* Update NeoXArgs docs automatically

* add mamba_inner_fn ; try really hard to make A_log and D no-WD and stored in fp32

* cleanup print statements

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* add draft conversion script (tested working TP=1)

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* update parallelism checks for mamba--partition activations works

* add mamba requirements

* clean up and better comment mamba code

* clean up and better comment mamba code

* update arg validation in mamba

* more cleanup

* add flag for fp32 Alog/D, add init_methods support for mamba

* Update NeoXArgs docs automatically

* update conversion script name, add docstring

* name conversion script

* Update NeoXArgs docs automatically

* add demo configs

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* add arguments to control conv and (in,out)_proj biases in mamba separately

* Update NeoXArgs docs automatically

* make x_proj bias also controlled by flag

* Update NeoXArgs docs automatically

* pre-commit, add comments

* Update NeoXArgs docs automatically

* Add mamba import print

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
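
Outside of the NeoX wrappers, the block that these Mamba commits integrate comes from the `mamba_ssm` package (selective-scan and fused conv1d kernels, `mamba_inner_fn`); a standalone usage sketch, assuming a CUDA device and mamba-ssm installed:

```python
import torch
from mamba_ssm import Mamba  # provides the selective-scan / fused conv1d kernels

layer = Mamba(d_model=512, d_state=16, d_conv=4, expand=2).to("cuda")
x = torch.randn(2, 128, 512, device="cuda")   # [batch, seq_len, d_model]
y = layer(x)                                  # output has the same shape as x
```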
* add cuda support for flash attn w/ alibi, warn of deprecation of triton

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
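
The CUDA path for ALiBi works because newer flash-attn releases (2.4+) accept per-head slopes directly, replacing the deprecated triton kernel; a hedged sketch of the call (the slope construction is the standard geometric one, the rest is illustrative):

```python
import torch
from flash_attn import flash_attn_func  # alibi_slopes requires flash-attn >= 2.4

batch, seq, heads, dim = 1, 256, 8, 64
q = torch.randn(batch, seq, heads, dim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# standard geometric ALiBi slopes, one fp32 value per head
slopes = torch.tensor([2 ** (-8 * (i + 1) / heads) for i in range(heads)],
                      dtype=torch.float32, device="cuda")
out = flash_attn_func(q, k, v, causal=True, alibi_slopes=slopes)
```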
* TP works!

* merge TP mamba changes with most current MambaLayer

* cleanup TP, confirmed working still

* make shapes with TP>1 work with conversion

* tested and PP works, so no need for assert blocking it in arguments

* update comment

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* added ds zero.Init() to get_model

* Clean up conditional with block

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
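
Wrapping model construction in `deepspeed.zero.Init()` partitions parameters across ranks as they are created instead of materializing the full model on every GPU, and the follow-up commit passes the DeepSpeed config into that context. A hedged sketch (config contents are illustrative; the keyword name has changed across DeepSpeed versions):

```python
import torch
import deepspeed

ds_config = {"train_micro_batch_size_per_gpu": 1,
             "zero_optimization": {"stage": 3}}    # illustrative config

# Parameters created inside this context manager are sharded at construction
# time; passing the config lets ZeRO pick up dtype/offload settings.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    model = torch.nn.Linear(4096, 4096)
```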
* making PR triggered CPU test for changes to megatron

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* initial JIT load functions

* passing neox_arge to load() as optional for easy testing

* modified headers for correct copyright statements
… init (#1191)

* added ds zero.Init() to get_model

* Clean up conditional with block

* pre-commit

* ensured deepspeed configs are passed to init

---------

Co-authored-by: Quentin Anthony <[email protected]>
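
The "initial JIT load functions" refer to compiling the fused kernels on demand rather than at install time; PyTorch's C++ extension loader is the usual mechanism. A sketch with placeholder source paths (not the actual NeoX kernel files):

```python
from torch.utils.cpp_extension import load

# Builds and imports the extension the first time it is needed; subsequent
# calls reuse the cached build. Source paths below are placeholders.
fused_kernels = load(
    name="fused_example",
    sources=["fused_example.cpp", "fused_example_cuda.cu"],
    extra_cuda_cflags=["-O3"],
    verbose=True,
)
```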
kshitijkg merged commit 5790435 into CERC-AAI:main on Apr 4, 2024