Pipeline parallelism and gradient checkpointing (edit: and ZeRO 2!) don’t work together #62
Comments
Update: we can now use any two of the following three options: ZeRO Stage 2, Parallel Pipelining, and Activation Checkpointing. If all three are enabled, it throws the following error:
if you open the
To turn off activation checkpointing, set
To turn off pipelining, run
To turn off ZeRO Stage 2, use
I ran all pairs of the three options, and the results are as follows.
Zero2+pipeline: does not work (with contiguous gradients both on and off)
All of the errors and warnings that occur for zero2+pipeline:
(this one shows up 4 times)
(this one pops up 4 times and I think 2 of them got mangled together here) As well as the warning that stella mentioned. |
With contiguous gradients off, the |
Checkpoint+pipeline works with contiguous gradients both on and off. Therefore, I don't think that setting is a major factor in zero2 breaking, but I'll keep it off for the remainder of my tests.
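For anyone reproducing the on/off comparison above, here is a minimal sketch of how the two settings could be toggled in a DeepSpeed-style config. It assumes the standard `zero_optimization` section with its `stage` and `contiguous_gradients` keys; the helper name `make_ds_config` and the batch size are hypothetical, and this is not the config file used in this repo.

```python
import json

def make_ds_config(zero_stage, contiguous_gradients):
    """Build a minimal DeepSpeed-style config dict (illustrative values only)."""
    return {
        "train_batch_size": 8,  # hypothetical value, not taken from this thread
        "zero_optimization": {
            "stage": zero_stage,  # 2 enables ZeRO Stage 2; 0 disables ZeRO
            "contiguous_gradients": contiguous_gradients,
        },
    }

# The two variants compared above: ZeRO Stage 2 with the flag on and off.
configs = {
    "zero2_contiguous_on": make_ds_config(2, True),
    "zero2_contiguous_off": make_ds_config(2, False),
}

for name, cfg in configs.items():
    print(name, json.dumps(cfg))
```

Each dict can be dumped to a JSON file and passed to the training script in whatever way the project already loads its DeepSpeed config.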
Focusing on the The only place where |
So the only place where |
With the patch applied:
Zero2+pipeline now works
Zero2+checkpoint+pipeline now works
Profiling results: patched, zero2+checkpoint+pipeline: samples/sec: 1159.741, max vram: 3245MiB |
With DeepSpeed's updates this seems to run just fine. Whether it runs efficiently is still an open question, though.
Turns out we weren’t using gradient checkpointing at all! You can add checkpointing to the params without initializing the checkpointer, and you can initialize the checkpointer without actually using it! #90 should actually implement gradient checkpointing.
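As a reminder of what gradient checkpointing is supposed to buy once it is actually wired into the forward pass, here is a toy sketch in plain Python. It is not DeepSpeed's implementation: the scalar `tanh` "layers", the function names, and the stored-activation counter are all assumptions made for illustration. The idea shown is the usual one: keep activations only at segment boundaries, then recompute each segment during the backward pass.

```python
import math

def layer(x):
    # Toy layer: its gradient depends on the input, so the input must be
    # either stored or recomputed for the backward pass.
    return math.tanh(x)

def layer_grad(x):
    return 1.0 - math.tanh(x) ** 2

def grad_full(x, n_layers):
    """Plain backprop: store every layer input."""
    inputs = []
    for _ in range(n_layers):
        inputs.append(x)
        x = layer(x)
    grad = 1.0
    for inp in reversed(inputs):
        grad *= layer_grad(inp)
    return grad, len(inputs)  # gradient, number of stored activations

def grad_checkpointed(x, n_layers, segment):
    """Checkpointed backprop: store only segment-boundary inputs."""
    checkpoints = []  # (start index, input at that index)
    for i in range(n_layers):
        if i % segment == 0:
            checkpoints.append((i, x))
        x = layer(x)
    grad = 1.0
    peak = len(checkpoints)
    for start, inp in reversed(checkpoints):
        end = min(start + segment, n_layers)
        seg_inputs = []
        h = inp
        for _ in range(start, end):  # recompute this segment's inputs
            seg_inputs.append(h)
            h = layer(h)
        peak = max(peak, len(checkpoints) + len(seg_inputs))
        for s in reversed(seg_inputs):
            grad *= layer_grad(s)
    return grad, peak  # gradient, peak number of stored activations

g_full, stored_full = grad_full(0.5, 16)
g_ckpt, stored_ckpt = grad_checkpointed(0.5, 16, 4)
print(g_full, stored_full)
print(g_ckpt, stored_ckpt)
```

Both functions return the same gradient; the checkpointed version holds fewer activations at once, at the cost of one extra forward pass per segment. If checkpointing is only set in the params but never used in the forward pass, the run behaves like `grad_full` and the memory saving never appears.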
Pipeline parallelism and gradient checkpointing both work when you use them individually. However, when you turn them both on, you get a mysterious
KeyError: 0
from somewhere deep in DeepSpeed.