[feature request] `zero3_init_flag` should be `True` by default for stage 3 #922
Comments
btw, actually, I wanted to also say thanks for making the … In the HF Trainer it's always on for Z3.
Hello Stas, thank you for the feature requests, suggestions and queries. The above PR addresses these. With respect to points 2 and 3 regarding the documentation ambiguity, I wanted to mention that users conventionally set up the DeepSpeed config via …
Thank you for the PR, @pacman100! How does one … As a different use-case, at m4 we use …
Working on an in-depth tutorial / dive into the …
Hello, are we good to close this issue?
Yes, please, and thank you for the awesome work, @pacman100! P.S. You can always include …
This feature request is about the DeepSpeed plugin's `zero3_init_flag` here (and a few related issues): accelerate/src/accelerate/utils/dataclasses.py
Lines 381 to 387 in 7889ba6
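The following is a minimal sketch of the requested behavior. It uses a hypothetical stand-in dataclass, `ZeroPluginSketch`, which is not Accelerate's real `DeepSpeedPlugin`. If the user leaves `zero3_init_flag` unset, it resolves to `True` for stage 3 and `False` otherwise. An explicit value is respected. The sketch also assumes the flag is forced off outside stage 3, since `zero.Init` partitions parameters at construction only under ZeRO-3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZeroPluginSketch:
    """Hypothetical stand-in for a DeepSpeed plugin config (not Accelerate's class)."""

    zero_stage: int = 2
    # None means "not set by the user"; the real value is resolved in __post_init__.
    zero3_init_flag: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.zero3_init_flag is None:
            # Requested default: turn zero.Init on automatically for stage 3 only.
            self.zero3_init_flag = self.zero_stage == 3
        elif self.zero3_init_flag and self.zero_stage != 3:
            # Assumption: the flag is meaningless outside stage 3, so switch it off.
            self.zero3_init_flag = False


print(ZeroPluginSketch(zero_stage=3).zero3_init_flag)                         # True
print(ZeroPluginSketch(zero_stage=2).zero3_init_flag)                         # False
print(ZeroPluginSketch(zero_stage=3, zero3_init_flag=False).zero3_init_flag)  # False
```

The rare users who want the model to stay unsharded until the last moment can still opt out explicitly with `zero3_init_flag=False`.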
IMHO, this flag should be `True` by default. ZeRO stage 3 is meant for large models, and it's very unlikely the user will be able to load those models without `zero.Init`. So why not help users by giving them one less thing to figure out?

I realize that one may not want to use `deepspeed.zero.Init` in some special models, so that the model remains unsharded until the last moment. That saves the hassle of adding `deepspeed.zero.GatheredParameters` during `init_weights` and some other early setup stages. But this is a very rare situation. I guess in those cases a user may want to turn it off, but that's unlikely to work for any largish model.

Also, there is a documentation ambiguity: what does `default=None` mean when defining the default behavior in the docs? This is super confusing: is it off by default? After reading the code, I see it is later set to `False`, but how is the user to know that? The doc here says `None`: https://huggingface.co/docs/accelerate/package_reference/deepspeed#accelerate.DeepSpeedPlugin
The relevant code that I needed to decipher to find out the default is here: accelerate/src/accelerate/utils/dataclasses.py
Lines 452 to 456 in 7889ba6
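To illustrate the ambiguity, here is a hypothetical dataclass that follows the same pattern: the field declares `default=None`, and `__post_init__` picks the concrete value later. The environment variable name `SKETCH_ZERO3_SAVE_16BIT_MODEL` is made up for this example. Introspecting the field shows `None`, which is presumably what signature-based docs pick up. An instance actually ends up with `False` when the variable is unset.

```python
import os
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class SaveFlagSketch:
    """Hypothetical: declared default is None, the real default is decided later."""

    zero3_save_16bit_model: Optional[bool] = field(default=None)

    def __post_init__(self) -> None:
        if self.zero3_save_16bit_model is None:
            # Made-up env var name; an unset variable falls back to "false".
            raw = os.environ.get("SKETCH_ZERO3_SAVE_16BIT_MODEL", "false")
            self.zero3_save_16bit_model = raw.lower() == "true"


# What the field declaration advertises vs. what an instance actually holds.
declared = {f.name: f.default for f in fields(SaveFlagSketch)}
print(declared["zero3_save_16bit_model"])       # None
print(SaveFlagSketch().zero3_save_16bit_model)  # False, if the variable is unset
```

Stating the resolved default and the variable that overrides it in the field's docstring would be one way to close that gap.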
The same ambiguity applies to `zero3_save_16bit_model` (the doc also says `None`): the user has no way of telling what the default is.

(This is possibly another issue: all the Accelerate `DEEPSPEED_*` env vars should probably be `ACCELERATE_DEEPSPEED_*` env vars. That would match the recent rename of all Accelerate env vars, which these seem to have missed.) Please let me know if you prefer a separate issue about it.

Thank you.
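Here is a minimal sketch of how such a rename could stay backward compatible. It reads the new `ACCELERATE_DEEPSPEED_<NAME>` variable first and falls back to the legacy `DEEPSPEED_<NAME>` one. The helper `read_deepspeed_env` and the `ZERO3_INIT` suffix are hypothetical, not Accelerate's actual code. The helper accepts an explicit mapping in place of `os.environ` so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def read_deepspeed_env(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: prefer the ACCELERATE_-prefixed variable,
    fall back to the legacy DEEPSPEED_-prefixed one, else return None."""
    env = os.environ if env is None else env
    new_key = f"ACCELERATE_DEEPSPEED_{name}"
    legacy_key = f"DEEPSPEED_{name}"
    if new_key in env:
        return env[new_key]
    # Legacy name is still honoured so existing launch scripts keep working.
    return env.get(legacy_key)


print(read_deepspeed_env("ZERO3_INIT", {"DEEPSPEED_ZERO3_INIT": "true"}))  # true
print(read_deepspeed_env("ZERO3_INIT", {
    "ACCELERATE_DEEPSPEED_ZERO3_INIT": "false",
    "DEEPSPEED_ZERO3_INIT": "true",
}))  # false
print(read_deepspeed_env("ZERO3_INIT", {}))  # None
```

A real migration would likely also warn when only the legacy name is set, so users know to switch before it is removed.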