Why doesn't 1-bit adam improve pipeline parallel speed? #206

Closed
sdtblck opened this issue Apr 5, 2021 · 0 comments
Labels
bug Something isn't working

Comments

@sdtblck
Contributor

sdtblck commented Apr 5, 2021

I made several changes to DeepSpeed's 1-bit Adam to get it working with pipeline-parallel models, but it doesn't seem to improve training speed under pipeline parallelism at all.

This could simply be because data-parallel communication isn't our bottleneck in the current scheme, or it could be something else.

Conglong Li (1-bit Adam dev) suggested the cause might be that 1-bit Adam was calling compressed_allreduce many times per step, adding a lot of overhead. Testing showed this wasn't the case: it was only called once or twice per step, depending on the number of parameter groups. See this issue thread: microsoft/DeepSpeed#818 (comment)
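
A rough way to sanity-check the first hypothesis (that data-parallel communication isn't the bottleneck) is an Amdahl's-law style bound: if the gradient allreduce is only a small share of the step, compressing it can't speed up the step by much. The sketch below is only an illustration. The function name, the timings, and the 5x compression ratio are made-up placeholders, not measurements from this setup, and it assumes communication does not overlap with compute.

```python
def max_speedup_from_compression(step_time_s, allreduce_time_s, compression_ratio):
    """Upper bound on whole-step speedup if only the allreduce gets faster.

    Assumes (hypothetically) that allreduce time scales inversely with the
    compression ratio and that nothing else in the step changes.
    """
    if step_time_s <= 0:
        raise ValueError("step_time_s must be positive")
    if not 0 <= allreduce_time_s <= step_time_s:
        raise ValueError("allreduce_time_s must be between 0 and step_time_s")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")

    # Time spent on everything except the data-parallel allreduce.
    other_time_s = step_time_s - allreduce_time_s
    # New step time with the allreduce shrunk by the compression ratio.
    new_step_time_s = other_time_s + allreduce_time_s / compression_ratio
    return step_time_s / new_step_time_s


# Placeholder numbers: a 1.0 s step where the allreduce takes 0.05 s.
small_share = max_speedup_from_compression(1.0, 0.05, 5.0)
# Same step, but with the allreduce taking half of the step time.
large_share = max_speedup_from_compression(1.0, 0.5, 5.0)
print(f"allreduce =  5% of step: at most {small_share:.3f}x faster")
print(f"allreduce = 50% of step: at most {large_share:.3f}x faster")
```

If profiling showed the allreduce taking only a few percent of each step, a negligible end-to-end speedup from 1-bit Adam would be the expected result, not a sign that something is broken.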

StellaAthena added the bug label on Apr 5, 2021
2 participants