[DDP] DDP bucket memory release during fwd step #128696
Hi @lichenlu, this is a really good question! After discussion with @awgu, this is indeed a design question for DDP. I'm just curious about your workload: in your case, is memory peaking during forward instead of backward? I want to make sure releasing DDP buckets would actually reduce peak memory. [Ignore the original comment below, which assumed DDP freezes buckets]
The way you describe it
Or do you need GPU memory for other things outside forward/backward/optimizer, but still within the PyTorch CUDA caching allocator? Just to confirm how much value freeing the DDP buckets outside of backward would bring.
Yes, in our case the memory peaks during forward. Our model is a Stable Diffusion model, and the peak occurs at the VAE encoder stage.
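One way to confirm where the peak sits is to measure allocated memory separately around the forward and backward passes. Below is a minimal sketch using `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`; the small `Sequential` model is a hypothetical stand-in for the VAE encoder, and the `peak_during` helper is not part of any PyTorch API.

```python
import torch

# Hypothetical stand-in for a memory-heavy encoder (the real model in
# this issue is a Stable Diffusion VAE encoder).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, 64)
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
x = torch.randn(32, 64, device=device)

def peak_during(fn):
    """Return peak allocated bytes while fn() runs (CUDA only; None on CPU)."""
    if device != "cuda":
        fn()
        return None
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated()

state = {}
fwd_peak = peak_during(lambda: state.setdefault("loss", model(x).sum()))
bwd_peak = peak_during(lambda: state["loss"].backward())
print("forward peak:", fwd_peak, "backward peak:", bwd_peak)
```

Comparing the two numbers on the real workload would show directly whether freeing the DDP buckets during forward could lower the overall peak.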
Thanks for explaining this in detail. I will bring a proposal to the team for discussion.
It seems like a reasonable feature request on the surface. However, it adds complexity to the DDP logic and would probably need one more flag that users have to set and we have to maintain. This feature is positioned as a way to 'squeeze in' a large model with DDP, and FSDP is our offering for data parallelism on large models; if FSDP could be used instead of DDP, this might be a non-issue. Would FSDP be an option for this case?
🚀 The feature, motivation and pitch
The DDP buckets always reside in GPU HBM, and their total size equals the sum of the sizes of all the module's weight gradients.
During the forward and optimizer stages this memory sits unused.
Could this memory be released and re-allocated at runtime to make room for training larger models such as Stable Diffusion?
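For context, the bucket behavior being described can be seen with DDP's existing knobs: `bucket_cap_mb` controls the bucket granularity, and `gradient_as_bucket_view=True` at least avoids keeping a second copy of the gradients outside the buckets. A minimal single-process sketch (gloo backend, world size 1, purely for illustration; real training would launch multiple ranks, e.g. via torchrun):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group just so DDP can be constructed.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(1024, 1024)  # ~4 MB of fp32 gradients
ddp = DDP(
    model,
    bucket_cap_mb=25,              # bucket size knob; buckets persist across steps
    gradient_as_bucket_view=True,  # grads become views into the buckets, saving one copy
)
ddp(torch.randn(8, 1024)).sum().backward()
# The bucket memory allocated during backward stays resident through the
# next forward and optimizer step -- the behavior this issue asks to make
# releasable.
dist.destroy_process_group()
```

Even with `gradient_as_bucket_view`, the buckets themselves are never freed between iterations, which is exactly the memory this feature request targets.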
Alternatives
No response
Additional context
No response
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @robieta @chaekit @aaronenyeshi @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise