Implement the MPU from Megatron #75
The Megatron code contains an "MPU" library. MPU stands for "model parallelism unit"; its purpose is to allow custom tensor slicing across GPUs. DeepSpeed allows you to hook up an MPU, but doesn't provide one. The goal is to port the MPU from Megatron to GPT-NeoX. This is a modified clone of Megatron: https://github.com/EleutherAI/MegatronPipeline
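
For reference, the interface DeepSpeed expects from the object passed as `mpu` is small: accessors for the model-parallel and data-parallel process groups, ranks, and world sizes. Below is a minimal sketch of that interface, modeled loosely on Megatron's `mpu/initialize.py`; the group layout shown is illustrative, not the exact Megatron code.

```python
# Minimal sketch of the MPU interface DeepSpeed expects.
# The group construction below is an illustrative layout, not Megatron's exact code.
import torch.distributed as dist

_MODEL_PARALLEL_GROUP = None
_DATA_PARALLEL_GROUP = None

def initialize_model_parallel(model_parallel_size):
    """Split the world into model-parallel and data-parallel process groups.

    Example: with world_size = 8 and model_parallel_size = 2, the
    model-parallel groups are [0,1] [2,3] [4,5] [6,7] and the
    data-parallel groups are [0,2,4,6] [1,3,5,7].
    """
    global _MODEL_PARALLEL_GROUP, _DATA_PARALLEL_GROUP
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    # Data-parallel groups: ranks that hold the same slice of the model.
    # Note: every rank must call new_group() for every group, in the same order.
    for i in range(model_parallel_size):
        ranks = list(range(i, world_size, model_parallel_size))
        group = dist.new_group(ranks)
        if rank in ranks:
            _DATA_PARALLEL_GROUP = group

    # Model-parallel groups: ranks that together hold one full copy of the model.
    for i in range(world_size // model_parallel_size):
        ranks = list(range(i * model_parallel_size, (i + 1) * model_parallel_size))
        group = dist.new_group(ranks)
        if rank in ranks:
            _MODEL_PARALLEL_GROUP = group

# Accessors DeepSpeed calls on the object passed as `mpu`.
def get_model_parallel_group():
    return _MODEL_PARALLEL_GROUP

def get_model_parallel_rank():
    return dist.get_rank(group=_MODEL_PARALLEL_GROUP)

def get_model_parallel_world_size():
    return dist.get_world_size(group=_MODEL_PARALLEL_GROUP)

def get_data_parallel_group():
    return _DATA_PARALLEL_GROUP

def get_data_parallel_rank():
    return dist.get_rank(group=_DATA_PARALLEL_GROUP)

def get_data_parallel_world_size():
    return dist.get_world_size(group=_DATA_PARALLEL_GROUP)
```

A module (or object) exposing these functions is then handed to DeepSpeed at setup, e.g. `model_engine, optimizer, _, _ = deepspeed.initialize(args=args, model=model, mpu=mpu)`, after which DeepSpeed uses the data-parallel group for gradient all-reduce and leaves the model-parallel group to the user's layers.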
You may find the (minimalistic) descriptions DeepSpeed provides helpful:
https://www.deepspeed.ai/features/#model-parallelism
https://www.deepspeed.ai/tutorials/megatron/
The full DeepSpeed docs can be found here: https://deepspeed.readthedocs.io/en/latest/index.html
Superseded by codebase refactoring.