Implement Pipeline Parallelism #45
According to the DeepSpeed documentation,
Although our code is defined in terms of layers, our layers do not have this feedforward structure. It shouldn't be much work to rearrange things, but it could be fiddly, especially with the token and positional embeddings. Since we never write any forward-pass code in pipeline-parallel mode, we may have to create a token embedding layer and a positional embedding layer. I should have time to try this out tomorrow.
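A framework-free sketch of the rearrangement described above, not DeepSpeed's API: DeepSpeed's `PipelineModule` takes the model as a sequence of layers, each consuming the previous layer's output. The hypothetical `EmbeddingLayer` below folds the token and positional lookups into a single layer so the chain stays purely sequential. Nested lists stand in for tensors, and all names here are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A "layer" here is any callable mapping one activation to the next.
Layer = Callable[..., object]


class EmbeddingLayer:
    """Hypothetical layer: token embedding + positional embedding in one step,
    so the embedding stage takes a single input like every other layer."""

    def __init__(self, token_table: Sequence[Vector], pos_table: Sequence[Vector]):
        self.token_table = token_table
        self.pos_table = pos_table

    def __call__(self, token_ids: Sequence[int]) -> List[Vector]:
        if len(token_ids) > len(self.pos_table):
            raise ValueError("sequence longer than the positional table")
        # Position i adds pos_table[i] to the embedding of token_ids[i].
        return [
            [t + p for t, p in zip(self.token_table[tok], self.pos_table[i])]
            for i, tok in enumerate(token_ids)
        ]


def run_sequential(layers: Sequence[Layer], x: object) -> object:
    """Feedforward structure: each layer's output is the next layer's input."""
    for layer in layers:
        x = layer(x)
    return x
```

With the embeddings folded in, the whole model becomes a flat list such as `[EmbeddingLayer(...), block_1, block_2, ...]`, which is the shape a pipeline wrapper can cut into stages.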
Hey @StellaAthena, I think @anthony-dipofi is already working on this; apologies, I meant to assign him. Maybe you can check in on his progress.
Still working on this, but I pushed what I have so far to https://github.com/EleutherAI/gpt-neox/tree/pipeline_parrallel . The main change was creating a new model class that generates the LayerSpec, while keeping it as similar to the original model as possible.
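The reason to generate `LayerSpec`s rather than built modules is that each pipeline stage can construct only its own slice of the model. A plain-Python sketch of that idea follows. `Spec`, `stage_bounds`, and `build_stage` are hypothetical names, and the contiguous equal-count split is an assumption, roughly like DeepSpeed's `partition_method="uniform"` (its default, `"parameters"`, balances stages by parameter count instead).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class Spec:
    """Hypothetical deferred layer: stores the constructor and its arguments
    instead of an already-built layer."""
    ctor: Callable[..., Any]
    args: Tuple[Any, ...] = ()
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def build(self) -> Any:
        return self.ctor(*self.args, **self.kwargs)


def stage_bounds(num_layers: int, num_stages: int) -> List[int]:
    """Boundaries of contiguous, roughly equal-sized stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    return [round(i * num_layers / num_stages) for i in range(num_stages + 1)]


def build_stage(specs: Sequence[Spec], stage_id: int, num_stages: int) -> List[Any]:
    """Instantiate only the layers that belong to one stage."""
    b = stage_bounds(len(specs), num_stages)
    return [s.build() for s in specs[b[stage_id]:b[stage_id + 1]]]
```

Until `build_stage` is called, nothing is constructed, so a rank hosting stage 1 of 3 never allocates the other stages' layers.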
Running
Yes, so pipelining requires some changes to the training loop and to how the Dataset class represents the data. I wasn't really sure how to integrate that with the other changes being made to data loading, so I just got it working with enwik8, which is what's in the train_enwik8_pipeline.py file. I think what's required is making changes in train.py similar to what's in train_enwik8_pipeline.py, the main thing being using
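The DeepSpeed pipeline tutorial has the engine pull each batch as an `(inputs, labels)` tuple from a data iterator passed to `train_batch()`, which is why the Dataset representation has to change. Below is a minimal sketch for a byte-level corpus such as enwik8, assuming next-token labels shifted by one position. `NextTokenDataset` and `batches` are hypothetical and are not the code in `train_enwik8_pipeline.py`.

```python
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[List[int], List[int]]


class NextTokenDataset:
    """Hypothetical: slices one long token stream into fixed-length
    (inputs, labels) pairs, with labels shifted one position to the right."""

    def __init__(self, data: Sequence[int], seq_len: int):
        if seq_len < 1:
            raise ValueError("seq_len must be >= 1")
        self.data = data
        self.seq_len = seq_len

    def __len__(self) -> int:
        # Each sample needs seq_len + 1 tokens (inputs plus one lookahead).
        return max(0, (len(self.data) - 1) // self.seq_len)

    def __getitem__(self, idx: int) -> Pair:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.seq_len
        chunk = self.data[start:start + self.seq_len + 1]
        return list(chunk[:-1]), list(chunk[1:])


def batches(dataset: NextTokenDataset, batch_size: int) -> Iterator[Tuple[List[List[int]], List[List[int]]]]:
    """Yield (inputs, labels) tuples, dropping an incomplete final batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(dataset) - batch_size + 1, batch_size):
        pairs = [dataset[i] for i in range(start, start + batch_size)]
        yield [p[0] for p in pairs], [p[1] for p in pairs]
```

Dropping the last partial batch is a design assumption here: it keeps every batch the same shape, which tends to simplify splitting batches into micro-batches across stages.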
This sorta works, to the point where I am going to declare it done. However, there are some problems; see #62.
Should be fairly easy, as our net is already expressed in terms of layers.
https://www.deepspeed.ai/tutorials/pipeline/