forked from microsoft/DeepSpeed
Update deeperspeed final #46
Merged
Conversation
* add quant unit test
* add codeowner
* format fix
* fix undefined symbol: curandSetPseudoRandomGeneratorSeed
* modify ref fn name and add comment
* add comments
* add 4bit quant 16groups
* fix
* modify groups in ref code
* parameterize tensor shape
* single param
* detach tensor
* remove -lcurand flag
* add back -lcurand flag

Co-authored-by: Ammar Ahmad Awan <[email protected]>
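The group-wise quantization these unit tests exercise can be illustrated with a toy symmetric quantizer. This is a sketch of the general technique, not DeepSpeed's CUDA kernel; all names below are hypothetical.

```python
def quantize_group(values, bits=4):
    # Symmetric per-group quantization: scale each group by its max
    # magnitude so values map into the signed range of the requested
    # bit width, e.g. [-7, 7] for 4-bit.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize_group(quants, scale):
    # Inverse mapping back to approximate real values.
    return [q * scale for q in quants]
```

A real kernel would operate on many groups in parallel (e.g. the 16-group configuration mentioned above), but the per-group math is the same.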
MOE residual matmul unit tests Co-authored-by: Michael Wyatt <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
* Fix formatting
* Remove redundant variable
Co-authored-by: Ammar Ahmad Awan <[email protected]>
* mem access for quantize kernel
* format
* format fp32
* modify quant kernel
* modify quant kernel2
* modify format
* format
* fix comments in pytest
* fix comments in pytest
* format
* rerun

Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Connor Holmes <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
* Unify macro definitions and constants in a single file
* Conversion utility implementation
* Fix reversion from formatting
* Bugfixes after testing with correct DeepSpeed
* Inline markers are available on both HIP + CUDA
Co-authored-by: Saeyeol Lee <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
…2358) Co-authored-by: Reza Yazdani <[email protected]>
* format
* remove round fn
Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
…t#2356) Co-authored-by: Olatunji Ruwase <[email protected]>
* Collect error messages in results.csv

Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
* batch of refactored tests
* more test refactoring
* fp16 test refactor
* more refactors
* added DistributedFixture class
* applied DistributedFixture to first batch of tests as a trial
* added DistributedFixture test and documentation
* last tests
* fixes for refactored tests
* remove subdirs in workflow files
* fix pytest syntax error
* fix another syntax error
* update imports
* use DistFixture with elastic checkpoint test
* missing import
* update to shared class tmpdir for elastic test
* moved test files
* avoid duplicate test file name
* last refactor and moving test files
* formatting
* fix broken import
* testing forked AMD tests
* update abstract method
* use blob storage for accelerate and transformers tests
* upgrade torch for accelerate CI

Co-authored-by: Olatunji Ruwase <[email protected]>
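The DistributedFixture idea above can be sketched conceptually: a fixture class that runs per-rank setup once and shares the result across tests. The real DeepSpeed class spawns distributed processes; this toy single-process version (all names illustrative) only shows the shape of the pattern.

```python
class DistributedFixture:
    """Toy stand-in: run a per-rank setup once and collect the results.

    The actual test fixture launches `world_size` processes; here we
    just loop over simulated ranks to illustrate the control flow.
    """
    world_size = 2

    def run(self, rank, results):
        raise NotImplementedError  # subclasses put per-rank setup here

    def __call__(self):
        results = {}
        for rank in range(self.world_size):
            self.run(rank, results)
        return results

class SaveCheckpoint(DistributedFixture):
    # Example subclass: each rank contributes one checkpoint shard name.
    def run(self, rank, results):
        results[rank] = f"shard-{rank}"
```

Tests that depend on expensive multi-rank setup (e.g. writing an elastic checkpoint) can then reuse one fixture instance instead of repeating the setup per test.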
Co-authored-by: Michael Wyatt <[email protected]>
* data efficiency library update
* data efficiency library update
* data efficiency update
* data efficiency update
* Make z3 respect comm dtype
* Support fp32 comm dtype
* Remove obsolete assert
* Code cleanup
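The communication dtype is selected through the DeepSpeed config; `communication_data_type` is the documented key, while the surrounding values here are only illustrative.

```python
# Illustrative DeepSpeed config fragment: force ZeRO-3 gradient
# communication (all-reduce / reduce-scatter) to fp32 instead of the
# model's dtype. Other fields are placeholder values.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "communication_data_type": "fp32",
}
```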
* Modify table for compatible web format
* Add tutorial links to navigation
* Add news bit to main readme
* Update docs/_tutorials/automatic-tensor-parallelism.md

Co-authored-by: Michael Wyatt <[email protected]>
* Check device count before running dist tests
* fixing format for "Check device count before running dist tests"
* Check device count against max world size
* Check GPU count before launching dist tests
* double-check GPU actually exists

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
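The guard described above, checking the device count against the requested world size before launching a distributed test, reduces to a simple predicate. A minimal sketch (helper name assumed; in practice the device count would come from `torch.cuda.device_count()`):

```python
def can_run_dist_test(world_size: int, device_count: int) -> bool:
    # Double-check GPUs actually exist and cover the requested world
    # size; otherwise the distributed test should be skipped rather
    # than fail at process-group initialization.
    return device_count > 0 and device_count >= world_size
```

A test harness would call this before spawning ranks and skip (not fail) when it returns False.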
Co-authored-by: Jeff Rasley <[email protected]>
* Remove deprecated `torch._six` imports (closes microsoft#2845)
* Support older versions of PyTorch as well

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
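Supporting both old and new PyTorch after removing `torch._six` typically means a guarded import; a shim in the spirit of this change (the exact DeepSpeed patch may differ):

```python
# Newer PyTorch removed the private torch._six module, so fall back to
# the stdlib equivalent when it (or torch itself) is absent.
try:
    from torch._six import inf  # present in older PyTorch releases
except ImportError:
    from math import inf

def is_unbounded(norm: float) -> bool:
    # Example use: an `inf` max-norm means "no gradient clipping".
    return norm == inf
```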
Co-authored-by: Michael Wyatt <[email protected]> Co-authored-by: Conglong Li <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]>
* Enable tensor fragments for zero 2
* Update deepspeed/utils/tensor_fragment.py
* Support offload
* Support multi-gpu
* Cleanup
* WIP
* Update deepspeed/runtime/zero/stage3.py
* Support padding
* z3 optimizer state support; aligned api
* Support frozen z3 params
* Unit tests
* Check NVMe offload capability
* Formatting
* Docs
* More docs
* Update docs/code-docs/source/zero3.rst
* Support unsharded fp32 grad
* Remove debug prints
* Fix off-by-one detection of empty grads
* Fix off-by-one error
* Skip ranks with no gradient data
* Formatting
* Add license
* Fix license

Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
This PR updates the replace_fn function used when loading inference checkpoints. The container is now passed to load_model_with_checkpoint() so that load_params() can be called from there; load_params() is also updated to access the variables in the policy.
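The flow described above can be sketched as follows. Only the function names `load_model_with_checkpoint()` and `load_params()` come from the PR description; the class layout and helper bodies are hypothetical.

```python
class Container:
    """Hypothetical stand-in for the inference container."""
    def __init__(self, policy):
        self.policy = policy  # injection-policy metadata for this layer

    def load_params(self, state_dict, prefix):
        # With the container in hand, load_params can consult
        # self.policy while selecting tensors from the checkpoint shard.
        return {k: v for k, v in state_dict.items() if k.startswith(prefix)}

def load_model_with_checkpoint(state_dict, container, prefix=""):
    # The container (not a bare module) now receives the checkpoint,
    # so parameter loading happens via container.load_params().
    return container.load_params(state_dict, prefix)
```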
* microsoft#1213: Fix CPUAdam for when `vendor_id_raw` is not provided
* formatting (yapf) fix

Co-authored-by: Olatunji Ruwase <[email protected]>
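The fix pattern is to treat a missing `vendor_id_raw` as unknown instead of raising `KeyError` when probing CPU info. An illustrative guard (not the actual patch; the vendor strings are the standard x86 identifiers):

```python
def detect_vendor(cpu_info: dict) -> str:
    # vendor_id_raw can be absent on some platforms (e.g. certain ARM
    # or virtualized environments), so default instead of indexing.
    vendor = cpu_info.get("vendor_id_raw", "unknown")
    if "GenuineIntel" in vendor:
        return "intel"
    if "AuthenticAMD" in vendor:
        return "amd"
    return "unknown"
```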
Updates `deepspeed/monitor/monitor.py` to instantiate objects with the correct configs. Relevant issue: microsoft#2853

Co-authored-by: Olatunji Ruwase <[email protected]>
* MPICH support
* MPICH changes
* MPICH changes
* MPICH changes
* MPICH changes
* accelerator runtime modifications
* Accelerator runtime changes
* Accelerator runtime modifications
* Remove redundant print from single node
* Move hostfile to tmp
* Code cleanup for MPICH class
* Code cleanup, rm whitespace
* Removing mpiexec environment check details
* Not needed tmp hostfile as pass directly
* Remove debugging comments
* rm print statement
* Revert comm changes as WA not needed
* Use MPICHRunner as class name
* No need to use args.force_multi and args.launcher. These should be set in the DeepSpeedExamples gpt-3.6b .sh script as:
  launcher=MPICH
  run_cmd="deepspeed --hostfile=${hostfile_ds} --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --launcher=${launcher} --force_multi pretrain_gpt2.py $@ ${gpt_options}"
* Adhere to code pattern
* Rm empty lines in MPICHRunner class
* Uncomment check for num nodes and workers when used hostfile_deepspeed in gpt-3.6b.sh
* pass MPICH hostfile through launcher_args in gpt-3.6b.sh
* Clean code and remove args hostfile
* fix merge
* fix merge
* clean up and fix format
* add ut

Co-authored-by: Abhilash Majumder <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
* check kernel injection supported models
* Clarify why user should use kernel injection
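Checking whether a model supports kernel injection reduces to membership in a supported-architecture list. A minimal sketch (the model list here is illustrative, not DeepSpeed's actual registry):

```python
# Hypothetical supported-model set; the real list lives in DeepSpeed's
# inference policy registry and is longer.
SUPPORTED_KERNEL_INJECTION = {"bert", "gpt2", "gpt-neo", "opt"}

def kernel_injection_supported(model_type: str) -> bool:
    # Kernel injection only benefits architectures with fused-kernel
    # implementations; unsupported models should fall back to
    # automatic tensor parallelism instead.
    return model_type.lower() in SUPPORTED_KERNEL_INJECTION
```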
Co-authored-by: Jeff Rasley <[email protected]>
…icrosoft#2221) Co-authored-by: Olatunji Ruwase <[email protected]> Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Rajhans Samdani <[email protected]>
…f op_builder (microsoft#2963) Co-authored-by: Logan Adams <[email protected]>
No description provided.