I made several changes to DeepSpeed's 1-bit Adam to get it working with pipeline-parallel models, but it doesn't seem to improve speed under pipeline parallelism at all.
This could simply be because data-parallel communication isn't the bottleneck in the current scheme, or it could be something else.
Conglong Li (1-bit Adam dev) thought the cause might be that 1-bit Adam was calling `compressed_allreduce` many times per step, adding a lot of overhead, but testing showed this wasn't the case: it was only called once or twice per step, depending on the number of parameter groups. See this issue thread: microsoft/DeepSpeed#818 (comment)
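For anyone who wants to repeat the call-count check, here is a rough sketch of the kind of instrumentation that works. It is not the DeepSpeed code: `compressed_allreduce` below is a no-op stand-in, and `fake_optimizer_step` and `calls_per_step` are hypothetical helpers that just assume one call per parameter group. In a real run you would wrap the actual communication function with the same `count_calls` decorator.

```python
import functools
from collections import Counter

# Global tally of how often each wrapped function has been called.
call_counts = Counter()


def count_calls(fn):
    """Wrap fn so that every invocation is recorded in call_counts."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@count_calls
def compressed_allreduce(buffer):
    # Stand-in only: the real function compresses and communicates.
    return buffer


def fake_optimizer_step(param_groups):
    # Assumption for illustration: one compressed allreduce per group.
    for group in param_groups:
        compressed_allreduce(group)


def calls_per_step(param_groups, steps=3):
    """Average number of compressed_allreduce calls per optimizer step."""
    call_counts.clear()
    for _ in range(steps):
        fake_optimizer_step(param_groups)
    return call_counts["compressed_allreduce"] / steps


if __name__ == "__main__":
    print(calls_per_step([[1.0, 2.0], [3.0]]))
```

If the per-step count stays this low, per-call overhead is unlikely to explain the missing speedup, which points back at data-parallel communication not being the bottleneck.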