forked from EleutherAI/gpt-neox
Add DeepSpeed MoE #4

Open: yang wants to merge 25 commits into main from moe2 (base: main)
Conversation
* update copyrights
* Update NeoXArgs docs automatically
* nvidia copyright years
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
* Add simple util for CUDA timings
* Add fused layernorm kernel from Megatron

  Closes EleutherAI#952
* change default fused layernorm to false
* Update test_setup.yml
* Update test_train_base.yml
---------
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: jahatef <[email protected]>
Co-authored-by: Jacob Hatef <[email protected]>
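The fused layernorm commit above swaps in Megatron's single-pass GPU kernel. As a reference for what either code path computes, here is a minimal pure-Python layer norm (an illustrative sketch, not the Megatron kernel itself):

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Reference (unfused) layer norm over one feature vector:
    normalize to zero mean / unit variance, then scale and shift.
    A fused kernel produces the same result in a single GPU pass."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (xi - mean) * inv_std + b for xi, g, b in zip(x, gamma, beta)]

# demo: identity scale/shift should give zero-mean, unit-variance output
y = layer_norm([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [0.0] * 4)
```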
* contributing guide
* Update NeoXArgs docs automatically
* Update CONTRIBUTING.md
* Update NeoXArgs docs automatically
* Remove microsoft references and link on main readme
* Update NeoXArgs docs automatically
* pre-commit
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>

* Update requirements.txt
* Update requirements.txt
* Update NeoXArgs docs automatically
* add note to neox_args.py
* pre-commit
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>

* Remove 'gas' configuration variable
* Remove gas from configs and config documentation
* Update training.py

* draft: unify sequential + PPModule conversion scripts
* Update NeoXArgs docs automatically
* draft: pull out model param names / model definition
* Update NeoXArgs docs automatically
* tested: neox models with TP = 1, PipelineModule, work
* Update NeoXArgs docs automatically
* draft: Llama + GQA QKV resharding
* Update NeoXArgs docs automatically
* update Llama conversion script to support Mistral and GQA
* Update NeoXArgs docs automatically
* test Mistral-7B conversion
* Update NeoXArgs docs automatically
* Update NeoXArgs docs automatically
* push documentation on imports / Llama loading
* push further readme updates (Mistral included)
* Prevent conversions for unsupported features, disclaim in README
* Update NeoXArgs docs automatically
* revert PR#1072 RowParallel bias conversion error
* remove sequential_to_hf and module_to_hf scripts, deprecated in favor of convert_neox_to_hf.py
* Update NeoXArgs docs automatically
* pre-commit
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
…#1149)

* Fixes distributed tests, and skips tests that are broken.
* Update NeoXArgs docs automatically
* improve pytest msgs and remove commented code
* pre-commit
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>

* Fixes distributed tests, and skips tests that are broken.
* memory profiling for gpt-neox. Only works for pp=0, pp=1+ needs DS commits.
* Update NeoXArgs docs automatically
* adds memory profiling for pipeline parallel
* Update NeoXArgs docs automatically
* fix spacing
* Update NeoXArgs docs automatically
* fix spacing again
* Update NeoXArgs docs automatically
* get rid of unwanted changes
* Update NeoXArgs docs automatically
* get rid of file
* Update NeoXArgs docs automatically
* Update NeoXArgs docs automatically
* add nsight systems support
* remove tests changes again
* Update NeoXArgs docs automatically
* add tests
* Update NeoXArgs docs automatically
* Update training.py
* Update NeoXArgs docs automatically
* Add assertion message
* pre-commit
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>

* add profiling to readme
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>

* Switch default command for docker image
* Rename pythia paths docker file for clarity
* Update docker build to use python 3.10
* Update github workflows to use ubuntu 22.04 and python 3.10
* Bump pytorch library patch versions
* Add pytest-html for reasonably formatted test reports
* Fix build after torch and cuda version bump
* Fix apex install for newer version

  1) This, empirically, works, as tested by running the build and kicking off training.
  2) Apex documentation says it is incorrect syntax and deprecated.
  3) It takes so long to compile that it is probably, all by itself, something that needs fixing.
  4) I will probably pull the fused adamw out of apex.
  5) It has been building for twenty minutes so I am going to go do something else.
* Fix pip version to ensure apex compilation remains good
* Fix unit test for evaluate
* Fix pip requirement

  Prevents possible build issues with apex, especially across divergent pip versions
* Update dockerfile to point to stripped-down apex repo
* Revert "Update dockerfile to point to stripped-down apex repo"

  This reverts commit 40c7656.
* Update apex version in dockerfile
* Switch to downloading prebuilt apex wheel
* Clean up docker copy commands
* Have docker build conditionally get binaries or build apex
* Apply precommit

* Switch default command for docker image
* Rename pythia paths docker file for clarity
* Fix unit test for evaluate
* Update readme for testing to omit --forked argument
* Add pytest-html to requirements-dev.txt
* Revert "Update readme for testing to omit --forked argument"

  This reverts commit 19021fc.
* Add data/ directory and .bin and .idx files in /tests/data to .gitignore

  This makes it so that git doesn't try to let you commit (or force you to stash) data files
* Make .gitignore for data files slightly more elegant
* Add utility script for doing token counts on processed datasets
* Run precommit hook
* Fix token count script, run precommit
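The token-count utility mentioned above operates on preprocessed Megatron-style datasets. A minimal sketch of the idea, assuming a flat `.bin` file of fixed-width token IDs (the real script in the PR also parses the `.idx` index, which this omits):

```python
import os
import tempfile

def count_tokens(bin_path, token_bytes=2):
    """Rough token count for a preprocessed .bin file: file size divided
    by bytes per token (a uint16 vocab means 2 bytes per token).
    Illustrative sketch only, not the PR's actual utility script."""
    size = os.path.getsize(bin_path)
    assert size % token_bytes == 0, "file size not a multiple of token width"
    return size // token_bytes

# demo: a temp file holding 100 two-byte tokens
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(b"\x00\x01" * 100)
    path = f.name
demo_count = count_tokens(path)
os.remove(path)
```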
* add support for flash attention 2
* change cosine decay to chinchilla style
* set default warmup to none so that warmup_iters can be set
* fixed bug
* fixed chinchilla lr
* add s3 checkpoint syncing
* rotary embedding in fp32
* fix for seq_len < max_seq_len
* some fixes, still not working
* fix bugs; evaluate on step 0
* first attempt at gqa
* gqa works in kv_heads==query_heads case
* gqa working
* workaround for FSX quota
* update with llemma
* update with recent PR
* README and requirements updated
* Added Mistral config
* Added sliding window through flash attention 2
* Added sliding window
* Mistral should likely use mp=2 like llama2
* Update gitignore
* Removed unused CPCargo import
* Conversion script (WIP)
* Fixed missing slurm environ vars
* updated mistral config
* updated job script
* initial commit conversion mistral hf to sequential
* Added stacking q, k, v appropriately for mp ranks
* pp=0 support from end of 2023
* Cleaning up config and removing Autoconfig in conversion script
* Cleaned up conversion example script
* cleanup: add back configs folder, discard Llemma readme
* cleanup: remove llemma lr sched changes, re-add requirements/ folder
* docs: add explanation of intermediate_size behavior
* args: add argument checking for num_kv_heads, clean up usage syntax
* args: prevent num KV heads < TP worldsize
* readd triton flash attn func
* cleanup: use tools/ dir from main
* docs: re-add mistral, GQA as supported
* cleanup: delete duplicate tools/ files
* cleanup: use fp32 rope (non-fused) from main
* cleanup: no longer block out GQA codepaths in conversion scripts
* cleanup: gqa code a bit
* add llama2, llemma configs
* add non-flash GQA; refactor modeling code
* clean up mistral config for commit
* further cleanup configs dir
* remove slurm script from llemma
* update seqlen params for codellama, llemma configs
* add more comments to GQA code, and make reshapes more readable
* make inv_freq non-persistent
* actually, just ensure mistral has inv_freqs as a persistent buffer
* non-flash GQA works, so ensure arguments.py permits it
* no longer use our own copies of flash attention interface functions
* remove unused mpu util fn
* delete unused config file
* fix diff on mpu/utils.py
* remove slurm scripts that won't be in this PR
* run pre-commit
* update tests for conversion scripts
* add flash version check for sliding window
* pre-commit
---------
Co-authored-by: zhangir-azerbayev <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
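The GQA work in the commit above hinges on each group of query heads sharing one key/value head. A minimal numpy sketch of non-flash grouped-query attention (shapes and names are illustrative, not the repo's actual modeling code):

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped-query attention: q has (num_query_heads, seq, dim),
    k and v have (num_kv_heads, seq, dim); each consecutive group of
    num_query_heads // num_kv_heads query heads shares one kv head."""
    hq, s, d = q.shape
    hkv = k.shape[0]
    assert hq % hkv == 0, "query heads must be a multiple of kv heads"
    group = hq // hkv
    # Broadcast each kv head across its group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

# demo: 4 query heads sharing 2 kv heads; heads 0 and 1 share kv head 0
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 3, 8))
k = rng.normal(size=(2, 3, 8))
v = rng.normal(size=(2, 3, 8))
q[1] = q[0]  # identical queries in the same group must give identical outputs
out = gqa_attention(q, k, v)
```

When `num_kv_heads == num_query_heads` this degenerates to standard multi-head attention, matching the "gqa works in kv_heads==query_heads case" milestone.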
* possibly fix profiling flag names
* actually, profile_backward already exists
* Update NeoXArgs docs automatically
* neox_args.profile was also used some places, update that too
* Update NeoXArgs docs automatically
* profiling --> profile
* Update NeoXArgs docs automatically
* Revert neox_arguments.md changes
* Update NeoXArgs docs automatically
* Update gen_docs since __name__ only returns the Literal for string args with Python 3.10
* Update NeoXArgs docs automatically
* Another update to preserve non-literals
* Update NeoXArgs docs automatically
* add union
* Update NeoXArgs docs automatically
* pre-commit
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>

* Update cpu_ci.yml

  Update the workflow to point the CPU workflow at a self-hosted runner rather than GitHub-provided runners
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>

* Improve argument validation for Flash-attn + SWA
* Update NeoXArgs docs automatically
* don't pass window_size if not necessary
* Update NeoXArgs docs automatically
* Update 7B.yml
* Update NeoXArgs docs automatically
* apply precommit
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
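The validation commit above (together with "add flash version check for sliding window" earlier in the PR) enforces that sliding-window attention is only used with a recent flash-attn, and that `window_size` is only passed when a window is actually configured. A sketch of that kind of check, with made-up argument names standing in for the real NeoXArgs fields:

```python
def check_swa_args(attention_config, sliding_window_width, flash_version):
    """Illustrative validation: sliding-window attention (SWA) requires a
    flash attention backend and flash-attn >= 2.3 (which introduced the
    window_size argument). Returns the extra kwargs to pass to the kernel.
    Names here are hypothetical, not the repo's actual fields."""
    uses_flash = "flash" in attention_config
    if sliding_window_width is not None:
        if not uses_flash:
            raise ValueError("sliding window attention requires flash attention")
        major, minor = (int(x) for x in flash_version.split(".")[:2])
        if (major, minor) < (2, 3):
            raise ValueError("sliding window attention requires flash-attn >= 2.3")
        # (width, 0): attend to `width` tokens back, none forward (causal).
        return {"window_size": (sliding_window_width, 0)}
    # Don't pass window_size when no window is set, so full attention is used.
    return {}

# demo
kwargs_windowed = check_swa_args(["flash"], 4096, "2.3.6")
kwargs_full = check_swa_args(["flash"], None, "2.3.6")
try:
    check_swa_args(["global"], 4096, "2.3.6")
    raised = False
except ValueError:
    raised = True
```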
…herAI#1170)

* Pythia 14M training on ngc pytorch 24.02 container
* pre-commit
---------
Co-authored-by: Quentin Anthony <[email protected]>

* feat: remove unnecessary bf16 conversions since no collective op is performed
* pre-commit
---------
Co-authored-by: Quentin Anthony <[email protected]>

* ignore markdown for pre-commit
* only ignore end of file and trailing whitespace
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>

* make inv_freq non-persistent by default
* Update NeoXArgs docs automatically
* Update NeoXArgs docs automatically
---------
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
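The `inv_freq` buffer that the commit above makes non-persistent is the table of rotary-embedding inverse frequencies; making it non-persistent means it is recomputed at load time rather than stored in (and read back from) checkpoints. Its contents are just a deterministic function of the head dimension, as in this sketch:

```python
def rope_inv_freq(head_dim, base=10000.0):
    """Rotary-embedding inverse frequencies: inv_freq[i] = base**(-2i / d)
    for i in [0, d/2). Because this is fully determined by head_dim and
    base, there is no need to persist it in checkpoints. Illustrative
    sketch of the standard RoPE formula, not the repo's buffer code."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# demo: frequencies for an 8-dim head
freqs = rope_inv_freq(8)
```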
* feat: deepspeed zero lion support
* feat: bump DeeperSpeed version to one that includes DeepSpeed FusedLion
* feat: bump DeeperSpeed version to include pipeline logging fix
* pre-commit
---------
Co-authored-by: Quentin Anthony <[email protected]>
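The FusedLion work above wires DeepSpeed's fused CUDA implementation of the Lion optimizer into ZeRO. For reference, the Lion update rule itself is simple (sign of an interpolated momentum); a scalar pure-Python sketch, not DeepSpeed's fused kernel:

```python
def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion step per element: update direction is the sign of an
    interpolation between momentum and gradient; momentum is then updated
    with a second interpolation. Decoupled weight decay as in AdamW."""
    sign = lambda x: (x > 0) - (x < 0)
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        update = sign(beta1 * m + (1 - beta1) * g)
        new_params.append(p - lr * (update + wd * p))
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum

# demo: positive gradient with zero momentum moves the param down by lr
p, m = lion_step([1.0], [1.0], [0.0], lr=0.1)
```

Because the update is a sign, Lion's step size is controlled entirely by `lr` rather than by gradient magnitude, which is why a fused single-kernel version is cheap and attractive.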
Force-pushed from a14cedd to 89e16e8.
Thanks to dayofthepenguin for extensive testing. Closes EleutherAI#479.