Implement the MPU from Megatron #75
The Megatron code contains an "MPU" library. MPU stands for "model parallelism unit"; its purpose is to allow custom tensor slicing across GPUs. DeepSpeed allows you to hook up an MPU, but doesn't provide one. The goal is to port the MPU from Megatron to GPT-NeoX. This is a modified clone of Megatron: https://github.com/EleutherAI/MegatronPipeline
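
For reference, the interface DeepSpeed expects from the object passed as `mpu` is small: accessors for the model-parallel and data-parallel process groups, ranks, and world sizes. Below is a minimal sketch of that interface, modeled loosely on Megatron's `mpu/initialize.py`; the group layout shown is illustrative, not the exact Megatron code.

```python
# Minimal sketch of the MPU interface DeepSpeed expects.
# The group construction below is an illustrative layout, not Megatron's exact code.
import torch.distributed as dist

_MODEL_PARALLEL_GROUP = None
_DATA_PARALLEL_GROUP = None

def initialize_model_parallel(model_parallel_size):
    """Split the world into model-parallel and data-parallel process groups.

    Example: with world_size = 8 and model_parallel_size = 2, the
    model-parallel groups are [0,1] [2,3] [4,5] [6,7] and the
    data-parallel groups are [0,2,4,6] [1,3,5,7].
    """
    global _MODEL_PARALLEL_GROUP, _DATA_PARALLEL_GROUP
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    # Data-parallel groups: ranks that hold the same slice of the model.
    # Note: every rank must call new_group() for every group, in the same order.
    for i in range(model_parallel_size):
        ranks = list(range(i, world_size, model_parallel_size))
        group = dist.new_group(ranks)
        if rank in ranks:
            _DATA_PARALLEL_GROUP = group

    # Model-parallel groups: ranks that together hold one full copy of the model.
    for i in range(world_size // model_parallel_size):
        ranks = list(range(i * model_parallel_size, (i + 1) * model_parallel_size))
        group = dist.new_group(ranks)
        if rank in ranks:
            _MODEL_PARALLEL_GROUP = group

# Accessors DeepSpeed calls on the object passed as `mpu`.
def get_model_parallel_group():
    return _MODEL_PARALLEL_GROUP

def get_model_parallel_rank():
    return dist.get_rank(group=_MODEL_PARALLEL_GROUP)

def get_model_parallel_world_size():
    return dist.get_world_size(group=_MODEL_PARALLEL_GROUP)

def get_data_parallel_group():
    return _DATA_PARALLEL_GROUP

def get_data_parallel_rank():
    return dist.get_rank(group=_DATA_PARALLEL_GROUP)

def get_data_parallel_world_size():
    return dist.get_world_size(group=_DATA_PARALLEL_GROUP)
```

A module (or object) exposing these functions is then handed to DeepSpeed at setup, e.g. `model_engine, optimizer, _, _ = deepspeed.initialize(args=args, model=model, mpu=mpu)`, after which DeepSpeed uses the data-parallel group for gradient all-reduce and leaves the model-parallel group to the user's layers.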
You may find the (minimalistic) descriptions DeepSpeed provides helpful:
https://www.deepspeed.ai/features/#model-parallelism
https://www.deepspeed.ai/tutorials/megatron/
The full DeepSpeed docs can be found here: https://deepspeed.readthedocs.io/en/latest/index.html
Superseded by codebase refactoring.