Add DeepSpeed MoE #4

Open · wants to merge 25 commits into main from moe2
Conversation

@yang (Owner) commented Mar 4, 2024

Thanks to dayofthepenguin for extensive testing

Closes EleutherAI#479

jahatef and others added 21 commits on January 24, 2024 at 13:18
* update copyrights

* Update NeoXArgs docs automatically

* nvidia copyright years

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Add simple util for CUDA timings (see the timing sketch after this commit message)

* Add fused layernorm kernel from Megatron

Closes EleutherAI#952

* change default fused layernorm to false

* Update test_setup.yml

* Update test_train_base.yml

---------

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: jahatef <[email protected]>
Co-authored-by: Jacob Hatef <[email protected]>
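
The CUDA-timing utility itself is not shown in this view; as a point of reference only, a minimal sketch of timing a GPU region with CUDA events (the function name is illustrative and does not mirror the util's actual API):

```python
import torch

def time_cuda(fn, *args, **kwargs):
    # Minimal sketch of CUDA-event timing (illustrative; not the PR's util).
    # Events are recorded on the current stream, so synchronize before
    # reading the elapsed time.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    out = fn(*args, **kwargs)
    end.record()
    torch.cuda.synchronize()
    return out, start.elapsed_time(end)  # milliseconds
```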
* contributing guide

* Update NeoXArgs docs automatically

* Update CONTRIBUTING.md

* Update NeoXArgs docs automatically

* Remove microsoft references and link on main readme

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Update requirements.txt

* Update requirements.txt

* Update NeoXArgs docs automatically

* add note to neox_args.py

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Remove 'gas' configuration variable

* Remove gas from configs and config documentation

* Update training.py
* draft: unify sequential + PPModule conversion scripts

* Update NeoXArgs docs automatically

* draft: pull out model param names / model definition

* Update NeoXArgs docs automatically

* tested: neox models with TP = 1, PipelineModule, work

* Update NeoXArgs docs automatically

* draft: Llama + GQA QKV resharding (see the resharding sketch after this commit message)

* Update NeoXArgs docs automatically

* update Llama conversion script to support Mistral and GQA

* Update NeoXArgs docs automatically

* test Mistral-7B conversion

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* push documentation on imports / Llama loading

* push further readme updates (Mistral included)

* Prevent conversions for unsupported features, disclaim in README

* Update NeoXArgs docs automatically

* revert PR#1072 RowParallel bias conversion error

* remove sequential_to_hf and module_to_hf scripts, deprecated in favor of convert_neox_to_hf.py

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
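
For readers unfamiliar with the QKV resharding mentioned above, here is a minimal sketch of the idea, assuming HF-style separate q/k/v projection weights of shape [num_heads * head_dim, hidden]; the function name and layout are illustrative and do not mirror convert_neox_to_hf.py:

```python
import torch

def shard_gqa_qkv(q_w, k_w, v_w, num_q_heads, num_kv_heads, tp_size):
    # Split Q/K/V projection weights across tensor-parallel ranks: each rank
    # gets a contiguous slice of query heads and of KV heads.
    assert num_q_heads % tp_size == 0 and num_kv_heads % tp_size == 0
    head_dim = q_w.shape[0] // num_q_heads
    hidden = q_w.shape[1]
    q_shards = q_w.view(num_q_heads, head_dim, -1).chunk(tp_size, dim=0)
    k_shards = k_w.view(num_kv_heads, head_dim, -1).chunk(tp_size, dim=0)
    v_shards = v_w.view(num_kv_heads, head_dim, -1).chunk(tp_size, dim=0)
    return [
        (q.reshape(-1, hidden), k.reshape(-1, hidden), v.reshape(-1, hidden))
        for q, k, v in zip(q_shards, k_shards, v_shards)
    ]
```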
…#1149)

* Fixes distributed tests, and skips tests that are broken (a skip example follows this commit message).

* Update NeoXArgs docs automatically

* improve pytest msgs and remove commented code

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
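
As a small illustration of the kind of guarded skip used for broken or environment-dependent distributed tests (the test name and condition below are hypothetical, not tests from this repository):

```python
import pytest
import torch

# Hypothetical example: skip a multi-GPU test with an explicit reason so the
# report states why it did not run, instead of failing on single-GPU machines.
@pytest.mark.skipif(torch.cuda.device_count() < 2,
                    reason="distributed test requires at least 2 GPUs")
def test_tensor_parallel_forward():
    ...
```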
* Fixes distributed tests, and skips tests that are broken.

* memory profiling for gpt-neox. Only works for pp=0; pp=1+ needs DS commits. (A memory-logging and NVTX sketch follows this commit message.)

* Update NeoXArgs docs automatically

* adds memory profiling for pipeline parallel

* Update NeoXArgs docs automatically

* fix spacing

* Update NeoXArgs docs automatically

* fix spacing again

* Update NeoXArgs docs automatically

* get rid of unwanted changes

* Update NeoXArgs docs automatically

* get rid of file

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* add nsight systems support

* remove tests changes again

* Update NeoXArgs docs automatically

* add tests

* Update NeoXArgs docs automatically

* Update training.py

* Update NeoXArgs docs automatically

* Add assertion message

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
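
The memory-profiling and Nsight Systems hooks above live in the training code; as a rough sketch of the underlying PyTorch calls (the helper names here are illustrative, not the PR's actual functions):

```python
import contextlib
import torch

def log_memory(tag: str) -> None:
    # Report current and peak CUDA allocations for this rank, in MiB.
    alloc = torch.cuda.memory_allocated() / 2**20
    peak = torch.cuda.max_memory_allocated() / 2**20
    print(f"[{tag}] allocated={alloc:.1f} MiB, peak={peak:.1f} MiB")

@contextlib.contextmanager
def nvtx_range(name: str):
    # Mark a region so it shows up as a named range in an Nsight Systems trace.
    torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        torch.cuda.nvtx.range_pop()
```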
* add profiling to readme

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Switch default command for docker image

* Rename pythia paths docker file for clarity

* Update docker build to use python 3.10

* Update github workflows to use ubuntu 22.04 and python 3.10

* Bump pytorch library patch versions

* Add pytest-html for reasonably formatted test reports

* Fix build after torch and cuda version bump

* Fix apex install for newer version

1) This, empirically, works, as tested by running the build and kicking off training.
2) Apex documentation says it is incorrect syntax and deprecated.
3) It takes so long to compile that it is probably, all by itself, something that needs fixing.
4) I will probably pull the fused adamw out of apex (a fallback sketch follows this commit message).
5) It has been building for twenty minutes so I am going to go do something else.

* Fix pip version to ensure apex compilation remains good

* Fix unit test for evaluate

* Fix pip requirement

Prevents possible build issues with apex, especially across divergent pip versions

* Update dockerfile to point to stripped-down apex repo

* Revert "Update dockerfile to point to stripped-down apex repo"

This reverts commit 40c7656.

* Update apex version in dockerfile

* Switch to downloading prebuilt apex wheel

* Clean up docker copy commands

* Have docker build conditionally get binaries or build apex

* Apply precommit
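
On the point above about pulling fused AdamW out of apex, a hedged sketch of the usual fallback pattern, preferring apex's fused optimizer when it is installed and otherwise using stock PyTorch (this is not the repository's actual optimizer wiring):

```python
import torch

try:
    # apex's FusedAdam uses AdamW-style decoupled weight decay by default.
    from apex.optimizers import FusedAdam as AdamImpl
except ImportError:
    from torch.optim import AdamW as AdamImpl

def build_optimizer(params, lr: float = 1e-4, weight_decay: float = 0.01):
    # Hypothetical helper: same call signature either way.
    return AdamImpl(params, lr=lr, weight_decay=weight_decay)
```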
* Switch default command for docker image

* Rename pythia paths docker file for clarity

* Fix unit test for evaluate

* Update readme for testing to omit --forked argument

* Add pytest-html to requirements-dev.txt

* Revert "Update readme for testing to omit --forked argument"

This reverts commit 19021fc.

* Add data/ directory and .bin and .idx files in /tests/data to .gitignore

This prevents git from prompting you to commit (or making you stash) data files

* Make .gitignore for data files slightly more elegant

* Add utility script for doing token counts on processed datasets (a counting sketch follows this commit message)

* Run precommit hook

* Fix token count script, run precommit
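
As a rough illustration of counting tokens in a preprocessed dataset, a minimal sketch assuming token ids are stored as fixed-width integers in a flat .bin file; the real Megatron-style indexed dataset records dtype and document sizes in the accompanying .idx file, which the actual script should consult:

```python
import numpy as np

def count_tokens(bin_path: str, dtype=np.uint16) -> int:
    # Hypothetical helper: memory-map the token ids and return their count.
    # The dtype is an assumption; vocabularies over 65535 tokens need uint32.
    tokens = np.memmap(bin_path, dtype=dtype, mode="r")
    return int(tokens.shape[0])
```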
* add support for flash attention 2

* change cosine decay to chinchilla style (a schedule sketch follows this commit message)

* set default warmup to none so that warmup_iters can be set

* fixed bug

* fixed chinchilla lr

* add s3 checkpoint syncing

* rotary embedding in fp32

* fix for seq_len < max_seq_len

* some fixes, still not working

* ?' :

* fix bugs; evaluate on step 0

* first attempt at gqa

* gqa works in kv_heads==query_heads case

* gqa working

* workaround for FSX quota

* update with llemma

* update with recent PR

* README and requirements updated

* Added Mistral config

* Added sliding window through flash attention 2

* Added sliding window

* Mistral should likely use mp=2 like llama2

* Update gitignore

* Removed unused CPCargo import

* Conversion script (WIP)

* Fixed missing slurm environ vars

* updated mistral config

* updated job script

* initial commit conversion mistral hf to sequential

* Added stacking q, k, v appropriately for mp ranks

* pp=0 support from end of 2023

* Cleaning up config and removing Autoconfig in conversion script

* Cleaned up conversion example script

* cleanup: add back configs folder, discard Llemma readme

* cleanup: remove llemma lr sched changes, re-add requirements/ folder

* docs: add explanation of intermediate_size behavior

* args: add argument checking for num_kv_heads, clean up usage syntax

* args: prevent num KV heads < TP worldsize

* readd triton flash attn func

* cleanup: use tools/ dir from main

* docs: re-add Mistral, GQA as supported

* cleanup: delete duplicate tools/ files

* cleanup: use fp32 rope (non-fused) from main

* cleanup: no longer block out GQA codepaths in conversion scripts

* cleanup: gqa code a bit

* add llama2, llemma configs

* add non-flash GQA; refactor modeling code (a GQA sketch follows this commit message)

* clean up mistral config for commit

* further cleanup configs dir

* remove slurm script from llemma

* update seqlen params for codellama, llemma configs

* add more comments to GQA code, and make reshapes more readable

* make inv_freq non-persistent

* actually, just ensure mistral has inv_freqs as a persistent buffer

* non-flash GQA works, so ensure arguments.py permits it

* no longer use our own copies of flash attention interface functions

* remove unused mpu util fn

* delete unused config file

* fix diff on mpu/utils.py

* remove slurm scripts that won't be in this PR

* run pre-commit

* update tests for conversion scripts

* add flash version check for sliding window

* pre-commit

---------

Co-authored-by: zhangir-azerbayev <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
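
The "chinchilla style" decay above refers to a cosine schedule that decays to a non-zero floor after warmup; the following is a minimal sketch under that reading, not the repository's exact learning-rate code:

```python
import math

def cosine_lr(step: int, max_lr: float, min_lr: float,
              warmup_iters: int, decay_iters: int) -> float:
    # Linear warmup, then cosine decay from max_lr down to a min_lr floor
    # (rather than decaying to zero). Illustrative only.
    if step < warmup_iters:
        return max_lr * step / max(1, warmup_iters)
    progress = min(1.0, (step - warmup_iters) / max(1, decay_iters - warmup_iters))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```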
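
For the non-flash GQA path above, the core trick is that each group of query heads shares one key/value head; a minimal sketch of that expansion, with an assumed tensor layout that is not taken from the PR's modeling code:

```python
import torch

def repeat_kv(kv: torch.Tensor, num_q_heads: int) -> torch.Tensor:
    # kv: [batch, num_kv_heads, seq_len, head_dim]; expand so every group of
    # num_q_heads // num_kv_heads query heads attends to the same KV head.
    batch, num_kv_heads, seq_len, head_dim = kv.shape
    assert num_q_heads % num_kv_heads == 0
    return kv.repeat_interleave(num_q_heads // num_kv_heads, dim=1)
```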
* possibly fix profiling flag names

* actually, profile_backward already exists

* Update NeoXArgs docs automatically

* neox_args.profile was also used some places, update that too

* Update NeoXArgs docs automatically

* profiling --> profile

* Update NeoXArgs docs automatically

* Revert neox_arguments.md changes

* Update NeoXArgs docs automatically

* Update gen_docs since __name__ only returns the Literal for string args with Python 3.10 (a typing sketch follows this commit message)

* Update NeoXArgs docs automatically

* Another update to preserve non-literals

* Update NeoXArgs docs automatically

* add union

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
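
The gen_docs issue above is about rendering argument types whose __name__ differs across Python versions; a hedged sketch of a more version-stable approach using typing introspection (not the PR's actual gen_docs change):

```python
from typing import Literal, Union, get_args, get_origin

def render_type(tp) -> str:
    # Illustrative only: render Literal/Union arguments explicitly instead of
    # relying on __name__, whose behavior varies across Python versions.
    origin = get_origin(tp)
    if origin is Literal:
        return "Literal" + str(list(get_args(tp)))
    if origin is Union:
        return " | ".join(render_type(a) for a in get_args(tp))
    return getattr(tp, "__name__", str(tp))
```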
* Update cpu_ci.yml

Update the CPU workflow to target a self-hosted runner instead of GitHub-provided runners

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Improve argument validation for Flash-attn + SWA

* Update NeoXArgs docs automatically

* don't pass window_size if not necessary (see the sketch after this commit message)

* Update NeoXArgs docs automatically

* Update 7B.yml

* Update NeoXArgs docs automatically

* apply precommit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
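
On only passing window_size when needed: a hedged sketch of the call-site pattern, assuming flash-attn >= 2.3 where flash_attn_func accepts a window_size tuple and the default means no sliding window. This is not the repository's exact attention code, and the window convention is an assumption:

```python
from flash_attn import flash_attn_func  # assumes flash-attn >= 2.3 is installed

def sliding_window_attn(q, k, v, sliding_window=None, causal=True):
    # Only forward the window argument when a sliding window is configured,
    # so non-windowed models keep the library default behavior.
    kwargs = {}
    if sliding_window is not None:
        kwargs["window_size"] = (sliding_window, 0)  # left-context window (convention assumed)
    return flash_attn_func(q, k, v, causal=causal, **kwargs)
```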
…herAI#1170)

* Pythia 14M training on ngc pytorch 24.02 container

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* feat: remove unnecessary bf16 conversions since no collective op is performed

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* ignore markdown for pre-commit

* only ignore end of file and trailing whitespace

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* make inv_freq non-persistent by default (buffer sketch after this commit message)

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
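
The persistence change above concerns whether RoPE's inv_freq buffer is saved in checkpoints; a minimal sketch of the relevant registration (module skeleton only, not the repository's rotary-embedding implementation):

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps inv_freq out of the state_dict, so it is
        # recomputed at init instead of being restored from the checkpoint.
        self.register_buffer("inv_freq", inv_freq, persistent=False)
```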
* feat: deepspeed zero lion support (Lion update sketch after this commit message)

* feat: bump DeeperSpeed version to one that includes DeepSpeed FusedLion

* feat: bump DeeperSpeed version to include pipeline logging fix

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
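
For reference, the Lion update rule itself (from the "Symbolic Discovery of Optimization Algorithms" paper) in plain PyTorch; the PR relies on DeeperSpeed's fused/ZeRO implementation rather than anything like this sketch:

```python
import torch

@torch.no_grad()
def lion_step(p, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # The sign of an interpolated momentum drives the update, with decoupled
    # weight decay applied to the parameter before the step.
    update = (beta1 * m + (1 - beta1) * grad).sign()
    p.mul_(1 - lr * wd).add_(update, alpha=-lr)
    # Momentum is refreshed with the slower beta after the parameter step.
    m.mul_(beta2).add_(grad, alpha=1 - beta2)
    return p, m
```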
@yang force-pushed the moe2 branch 2 times, most recently from a14cedd to 89e16e8 on March 6, 2024 at 21:30