Training without Pipeline Parallelism #5
When training without pipeline parallelism, the sequential wrapper is used: https://github.com/floatingsnake/gpt-neox/blob/magma/megatron/training.py#L461. The code for to_sequential is here: https://github.com/floatingsnake/gpt-neox/blob/magma/megatron/model/gpt2_model.py#L343
However, all the adapters that were added are lost when this conversion is done. This is probably because the model is rebuilt from self.specs, which is not updated when the adapters are added.
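To illustrate the suspected failure mode, here is a minimal, self-contained sketch. The class and method names are hypothetical stand-ins, not the actual gpt-neox code: the point is only that if to_sequential rebuilds the model from a stored list of layer specs, anything inserted into the live layer list after construction is silently dropped by the rebuild.

```python
import torch.nn as nn

class ToyPipelineModel:
    """Toy stand-in for the pipelined model (hypothetical, simplified)."""

    def __init__(self, specs):
        # specs: callables that build each layer, captured once at init
        self.specs = specs
        self.layers = [spec() for spec in specs]

    def add_adapter(self, adapter_spec):
        # Adapters are inserted into the live layer list only;
        # self.specs is NOT updated -- this mirrors the suspected bug.
        self.layers.append(adapter_spec())

    def to_sequential(self):
        # Rebuilds the model from self.specs, so anything added to
        # self.layers after construction (the adapters) is lost here.
        return nn.Sequential(*[spec() for spec in self.specs])

model = ToyPipelineModel([lambda: nn.Linear(8, 8), lambda: nn.ReLU()])
model.add_adapter(lambda: nn.Linear(8, 8))

print(len(model.layers))            # 3 -- adapter present in the live model
print(len(model.to_sequential()))   # 2 -- adapter dropped by the rebuild
```

Following the reasoning in the report, one fix along these lines would be to append the adapter specs to self.specs at insertion time, so the rebuild reproduces them.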
Comments

Hi, I have tested on a small model with pp=1, mp=1, and the output of the model looks fine: https://github.com/floatingsnake/gpt-neox/blob/magma/mytests/test_model_build.py
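For context, a forward-pass smoke test of this kind usually just builds a tiny model and checks that the output has the expected shape and finite values. The exact contents of the linked script are not reproduced here; this is a generic hedged sketch of that pattern:

```python
import torch
import torch.nn as nn

# Generic smoke test: build a tiny model and sanity-check its output.
tiny = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))
tokens = torch.randint(0, 100, (2, 8))   # (batch, seq_len)
logits = tiny(tokens)                    # (batch, seq_len, vocab)

assert logits.shape == (2, 8, 100)
assert torch.isfinite(logits).all()
print("output looks fine:", logits.shape)
```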
As we have abandoned the sequential wrapper, and mp=1, pp=1 works well without it, we can reopen the issue when it is needed.
Yes, I had changed that line to test the sequential wrapper. But yeah, solving this is not a high priority for now, since we are moving away from the sequential wrapper :)