Migrate tensor parallelism code to use OSLO #578
Comments
I will actively support this work.
The main problem is that the model is currently loaded on the CPU and then moved to the GPU. OSLO was originally designed for transformers, and there was no way to pass downloaded checkpoints directly to the GPU in transformers (at least when I was developing it, so I didn't worry about this). But we need to implement something like deepspeed.zero.Init internally so that the model is allocated on the GPU from the start. I will try this starting tomorrow.
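For reference, a minimal sketch of the idea behind deepspeed.zero.Init-style construction, using only plain PyTorch: build the model on the "meta" device (so no CPU memory is allocated for weights) and then materialize parameters directly on the target device. This is an illustration of the pattern, not the actual OSLO or DeepSpeed implementation; the model and sizes are placeholders.

```python
import torch
import torch.nn as nn

# Sketch: instead of building the full model on CPU and copying it to the GPU,
# build it on the "meta" device (no storage allocated), then materialize the
# parameters directly on the target device. deepspeed.zero.Init achieves a
# similar effect by intercepting parameter construction.
with torch.device("meta"):
    model = nn.Linear(1024, 1024)  # placeholder module; no real storage yet

# Materialize directly on the target device (cuda if available; cpu here
# so the sketch runs anywhere).
target = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to_empty(device=target)  # allocates storage on the target device

# to_empty leaves weights uninitialized, so re-run initialization before use.
nn.init.xavier_uniform_(model.weight)
nn.init.zeros_(model.bias)
```

The key point is that parameters are never allocated twice; the only full-size allocation happens on the target device.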
@hyunwoongko actually in NeoX we also load onto the CPU and then move to the GPU, so I'm not sure this is a problem.
This is actually something we have a workaround for. I don't know if Transformers ever got around to merging it, though.
@sdtblck please check my branch. https://github.com/EleutherAI/gpt-neox/tree/kevin_new |
@sdtblck Did you check my branch? |
@hyunwoongko -- Would you like to restart this effort? |
Is your feature request related to a problem? Please describe.
It would be good to remove the Megatron tensor-parallelism code from NeoX; OSLO currently has support for this, with a slightly nicer interface.
Describe the solution you'd like
Steps:
- Remove the `mpu` dependency from internal code as much as possible (so anything that is currently an `mpu.[Column|Row]ParallelLinear` or `mpu.VocabParallelEmbedding` should be replaced with its plain PyTorch equivalent, `nn.Linear` or `nn.Embedding` respectively).