forked from microsoft/DeepSpeed
Update deeperspeed final #46
Merged
Conversation
* add quant unit test
* add codeowner
* format fix
* fix undefined symbol: curandSetPseudoRandomGeneratorSeed
* modify ref fn name and add comment
* add comments
* add 4bit quant 16groups
* fix
* modify groups in ref code
* parameterize tensor shape
* single param
* detach tensor
* remove -lcurand flag
* add back -lcurand flag

Co-authored-by: Ammar Ahmad Awan <[email protected]>
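The group-wise quantization these unit tests exercise can be illustrated with a toy symmetric quantizer. This is a sketch of the general technique, not DeepSpeed's CUDA kernel; all names below are hypothetical.

```python
def quantize_group(values, bits=4):
    # Symmetric per-group quantization: scale each group by its max
    # magnitude so values map into the signed range of the requested
    # bit width, e.g. [-7, 7] for 4-bit.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize_group(quants, scale):
    # Inverse mapping back to approximate real values.
    return [q * scale for q in quants]
```

A real kernel would operate on many groups in parallel (e.g. the 16-group configuration mentioned above), but the per-group math is the same.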
MOE residual matmul unit tests Co-authored-by: Michael Wyatt <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
* Fix formatting
* Remove redundant variable
Co-authored-by: Ammar Ahmad Awan <[email protected]>
* mem access for quantize kernel
* format
* format fp32
* modify quant kernel
* modify quant kernel2
* modify format
* format
* fix comments in pytest
* fix comments in pytest
* format
* rerun

Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Connor Holmes <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
* Unify macro definitions and constants in a single file
* Conversion utility implementation
* Fix reversion from formatting
* Bugfixes after testing with correct DeepSpeed
* Inline markers are available on both HIP + CUDA
Co-authored-by: Saeyeol Lee <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
…2358) Co-authored-by: Reza Yazdani <[email protected]>
* format
* remove round fn
Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
…t#2356) Co-authored-by: Olatunji Ruwase <[email protected]>
* Collect error messages in results.csv

Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
* batch of refactored tests
* more test refactoring
* fp16 test refactor
* more refactors
* added DistributedFixture class
* applied DistributedFixture to first batch of tests as a trial
* added DistributedFixture test and documentation
* last tests
* fixes for refactored tests
* remove subdirs in workflow files
* fix pytest syntax error
* fix another syntax error
* update imports
* use DistFixture with elastic checkpoint test
* missing import
* update to shared class tmpdir for elastic test
* moved test files
* avoid duplicate test file name
* last refactor and moving test files
* formatting
* fix broken import
* testing forked AMD tests
* update abstract method
* use blob storage for accelerate and transformers tests
* upgrade torch for accelerate CI

Co-authored-by: Olatunji Ruwase <[email protected]>
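The DistributedFixture idea above can be sketched conceptually: a fixture class that runs per-rank setup once and shares the result across tests. The real DeepSpeed class spawns distributed processes; this toy single-process version (all names illustrative) only shows the shape of the pattern.

```python
class DistributedFixture:
    """Toy stand-in: run a per-rank setup once and collect the results.

    The actual test fixture launches `world_size` processes; here we
    just loop over simulated ranks to illustrate the control flow.
    """
    world_size = 2

    def run(self, rank, results):
        raise NotImplementedError  # subclasses put per-rank setup here

    def __call__(self):
        results = {}
        for rank in range(self.world_size):
            self.run(rank, results)
        return results

class SaveCheckpoint(DistributedFixture):
    # Example subclass: each rank contributes one checkpoint shard name.
    def run(self, rank, results):
        results[rank] = f"shard-{rank}"
```

Tests that depend on expensive multi-rank setup (e.g. writing an elastic checkpoint) can then reuse one fixture instance instead of repeating the setup per test.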
Co-authored-by: Michael Wyatt <[email protected]>
* data efficiency library update
* data efficiency library update
* data efficiency update
* data efficiency update
* Make z3 respect comm dtype
* Support fp32 comm dtype
* Remove obsolete assert
* Code cleanup
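The communication dtype is selected through the DeepSpeed config; `communication_data_type` is the documented key, while the surrounding values here are only illustrative.

```python
# Illustrative DeepSpeed config fragment: force ZeRO-3 gradient
# communication (all-reduce / reduce-scatter) to fp32 instead of the
# model's dtype. Other fields are placeholder values.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "communication_data_type": "fp32",
}
```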
* Modify table for compatible web format
* Add tutorial links to navigation
* Add news bit to main readme
* Update docs/_tutorials/automatic-tensor-parallelism.md

Co-authored-by: Michael Wyatt <[email protected]>
* Check device count before running dist tests
* fixing format for "Check device count before running dist tests"
* Check device count against max world size
* Check GPU count before launching dist tests
* double-check GPU actually exists

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
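The guard described above, checking the device count against the requested world size before launching a distributed test, reduces to a simple predicate. A minimal sketch (helper name assumed; in practice the device count would come from `torch.cuda.device_count()`):

```python
def can_run_dist_test(world_size: int, device_count: int) -> bool:
    # Double-check GPUs actually exist and cover the requested world
    # size; otherwise the distributed test should be skipped rather
    # than fail at process-group initialization.
    return device_count > 0 and device_count >= world_size
```

A test harness would call this before spawning ranks and skip (not fail) when it returns False.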
Co-authored-by: Jeff Rasley <[email protected]>
* Remove deprecated `torch._six` imports (closes microsoft#2845)
* Support older versions of PyTorch as well

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
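Supporting both old and new PyTorch after removing `torch._six` typically means a guarded import; a shim in the spirit of this change (the exact DeepSpeed patch may differ):

```python
# Newer PyTorch removed the private torch._six module, so fall back to
# the stdlib equivalent when it (or torch itself) is absent.
try:
    from torch._six import inf  # present in older PyTorch releases
except ImportError:
    from math import inf

def is_unbounded(norm: float) -> bool:
    # Example use: an `inf` max-norm means "no gradient clipping".
    return norm == inf
```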
Co-authored-by: Michael Wyatt <[email protected]> Co-authored-by: Conglong Li <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
* Enable tensor fragments for zero 2
* Update deepspeed/utils/tensor_fragment.py
* Support offload
* Support multi-gpu
* Cleanup
* WIP
* Update deepspeed/runtime/zero/stage3.py
* Support padding
* z3 optimizer state support; aligned api
* Support frozen z3 params
* Unit tests
* Check NVMe offload capability
* Formatting
* Docs
* More docs
* Update docs/code-docs/source/zero3.rst
* Support unsharded fp32 grad
* Remove debug prints
* Fix off-by-one detection of empty grads
* Fix off-by-one error
* Skip ranks with no gradient data
* Formatting
* Add license
* Fix license

Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
This PR updates the replace_fn function used when loading inference checkpoints. The container is now passed to load_model_with_checkpoint() so that load_params() can be called from there; load_params() is also updated to access the variables in the policy.
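The flow described above can be sketched as follows. Only the function names `load_model_with_checkpoint()` and `load_params()` come from the PR description; the class layout and helper bodies are hypothetical.

```python
class Container:
    """Hypothetical stand-in for the inference container."""
    def __init__(self, policy):
        self.policy = policy  # injection-policy metadata for this layer

    def load_params(self, state_dict, prefix):
        # With the container in hand, load_params can consult
        # self.policy while selecting tensors from the checkpoint shard.
        return {k: v for k, v in state_dict.items() if k.startswith(prefix)}

def load_model_with_checkpoint(state_dict, container, prefix=""):
    # The container (not a bare module) now receives the checkpoint,
    # so parameter loading happens via container.load_params().
    return container.load_params(state_dict, prefix)
```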
* microsoft#1213: Fix CPUAdam for when `vendor_id_raw` is not provided
* formatting (yapf) fix

Co-authored-by: Olatunji Ruwase <[email protected]>
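The fix pattern is to treat a missing `vendor_id_raw` as unknown instead of raising `KeyError` when probing CPU info. An illustrative guard (not the actual patch; the vendor strings are the standard x86 identifiers):

```python
def detect_vendor(cpu_info: dict) -> str:
    # vendor_id_raw can be absent on some platforms (e.g. certain ARM
    # or virtualized environments), so default instead of indexing.
    vendor = cpu_info.get("vendor_id_raw", "unknown")
    if "GenuineIntel" in vendor:
        return "intel"
    if "AuthenticAMD" in vendor:
        return "amd"
    return "unknown"
```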
Updates `deepspeed/monitor/monitor.py` to instantiate objects with the correct configs. Relevant issue: microsoft#2853

Co-authored-by: Olatunji Ruwase <[email protected]>
* MPICH support
* MPICH changes
* MPICH changes
* MPICH changes
* MPICH changes
* accelerator runtime modifications
* Accelerator runtime changes
* Accelerator runtime modifications
* Remove redundant print from single node
* Move hostfile to tmp
* Code cleanup for MPICH class
* Code cleanup, rm whitespace
* Removing mpiexec environment check details
* Not needed tmp hostfile as pass directly
* Remove debugging comments
* rm print statement
* Revert comm changes as WA not needed
* Use MPICHRunner as class name
* No need to use args.force_multi and args.launcher. These should be set in the DeepSpeedExamples gpt-3.6b .sh script as:
  launcher=MPICH
  run_cmd="deepspeed --hostfile=${hostfile_ds} --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --launcher=${launcher} --force_multi pretrain_gpt2.py $@ ${gpt_options}"
* Adhere to code pattern
* Rm empty lines in MPICHRunner class
* Uncomment check for num nodes and workers when used hostfile_deepspeed in gpt-3.6b.sh
* pass MPICH hostfile through launcher_args in gpt-3.6b.sh
* Clean code and remove args hostfile
* fix merge
* fix merge
* clean up and fix format
* add ut

Co-authored-by: Abhilash Majumder <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
* check kernel injection supported models
* Clarify why user should use kernel injection
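Checking whether a model supports kernel injection reduces to membership in a supported-architecture list. A minimal sketch (the model list here is illustrative, not DeepSpeed's actual registry):

```python
# Hypothetical supported-model set; the real list lives in DeepSpeed's
# inference policy registry and is longer.
SUPPORTED_KERNEL_INJECTION = {"bert", "gpt2", "gpt-neo", "opt"}

def kernel_injection_supported(model_type: str) -> bool:
    # Kernel injection only benefits architectures with fused-kernel
    # implementations; unsupported models should fall back to
    # automatic tensor parallelism instead.
    return model_type.lower() in SUPPORTED_KERNEL_INJECTION
```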
Co-authored-by: Jeff Rasley <[email protected]>
…icrosoft#2221) Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Rajhans Samdani <[email protected]>
…f op_builder (microsoft#2963) Co-authored-by: Logan Adams <[email protected]>
No description provided.