Fix DeepSpeed (ZeRO2 + Pipeline Parallel) #67

Closed
StellaAthena opened this issue Jan 16, 2021 · 1 comment

StellaAthena commented Jan 16, 2021

A bug in the DeepSpeed library prevents Pipeline Parallelism and ZeRO Stage 2 from being used at the same time. @leogao2 has a rudimentary patch that allows the code to run (see here), but it causes a significant slowdown. We need to find a better way to do this. For additional context on the problem, see #62.

Profiling results from initial patch:

  • patched, zero2+checkpoint+pipeline: samples/sec: 1159.741, max vram: 3245MiB
  • patched, zero2+checkpoint: samples/sec: 1120.8568733324405, max vram: 1704MiB
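
To make the two runs easier to compare, here is a minimal sketch that computes the throughput and peak-VRAM ratios between them. The numbers are copied from the list above; the dictionary layout and variable names are only illustrative and are not part of the patch or of DeepSpeed.

```python
# Figures copied from the two patched profiling runs listed above.
# The structure and names here are illustrative assumptions.
runs = {
    "zero2+checkpoint+pipeline": {"samples_per_sec": 1159.741, "max_vram_mib": 3245},
    "zero2+checkpoint": {"samples_per_sec": 1120.8568733324405, "max_vram_mib": 1704},
}

with_pp = runs["zero2+checkpoint+pipeline"]
without_pp = runs["zero2+checkpoint"]

# Ratio > 1 means the pipeline-parallel run has the larger value.
throughput_ratio = with_pp["samples_per_sec"] / without_pp["samples_per_sec"]
vram_ratio = with_pp["max_vram_mib"] / without_pp["max_vram_mib"]

print(f"throughput, with vs. without pipeline: {throughput_ratio:.3f}x")
print(f"max VRAM, with vs. without pipeline: {vram_ratio:.3f}x")
```

On these figures the pipeline run is roughly 3.5% faster and uses roughly 1.9x the peak VRAM. Both runs use the patch and no unpatched baseline is listed, so the size of the slowdown cannot be read off these two numbers alone.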
@StellaAthena StellaAthena added bug Something isn't working help wanted This issue needs assistance labels Jan 16, 2021
@StellaAthena StellaAthena added this to To do in 1T or BUST via automation Jan 16, 2021
1T or BUST automation moved this from To do to Done Jan 16, 2021
@g-karthik

@StellaAthena @leogao2 How is this a significant slowdown? Looks like the SamplesPerSec are quite high, w/ and w/o PP.

Also, what do you attribute the reduction in VRAM to in the case w/o PP?
