
[feature request] zero3_init_flag should be True by default for stage 3 #922

Closed · stas00 opened this issue Dec 14, 2022 · 6 comments
Labels: enhancement (New feature or request), feature request (Request for a new feature to be added to Accelerate)

stas00 (Contributor) commented Dec 14, 2022

This feature request is about the DeepSpeed plugin's zero3_init_flag (and a few related issues):

```python
zero3_init_flag: bool = field(
    default=None,
    metadata={
        "help": "Flag to indicate whether to enable `deepspeed.zero.Init` for constructing massive models. "
        "Only applicable with ZeRO Stage-3."
    },
)
```
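
For reference, opting in explicitly today looks roughly like this (a minimal sketch; the exact Accelerator wiring and values are illustrative):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Enable deepspeed.zero.Init explicitly rather than relying on the
# (undocumented) default; zero_stage=3 since the flag only applies there.
deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, zero3_init_flag=True)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
```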

  1. IMHO, this flag should be True by default: ZeRO Stage 3 exists for large models, and it's very unlikely the user will be able to load such models without zero.Init, so why not give users one less thing to figure out?

    I realize that one may want to avoid deepspeed.zero.Init, to save the hassle of adding deepspeed.zero.GatheredParameters during init_weights and other early setup stages in some special models, so that the model remains unsharded until the last moment (see the sketch after this list). But this is a very rare situation, and while in those cases a user may want to turn the flag off, that's unlikely to work for any largish model anyway.

  2. Also, there is a documentation ambiguity: what does default=None mean when documenting the default behavior? This is super confusing: is it off by default? After reading the code I can see it's later set to False, but how is the user to know that? The doc here says None:
    https://huggingface.co/docs/accelerate/package_reference/deepspeed#accelerate.DeepSpeedPlugin

    The relevant code that I needed to decipher to find out the default is here:

```python
if self.zero3_init_flag is None:
    self.zero3_init_flag = os.environ.get("DEEPSPEED_ZERO3_INIT", "false") == "true"
if self.zero3_init_flag and not self.hf_ds_config.is_zero3():
    warnings.warn("DeepSpeed Zero3 Init flag is only applicable for ZeRO Stage 3. Setting it to False.")
    self.zero3_init_flag = False
```

  3. The same ambiguity applies to zero3_save_16bit_model (it also says None): the user has no way of telling what the default is.

  4. (Possibly a separate issue:) all the Accelerate DEEPSPEED_* env vars should probably be ACCELERATE_DEEPSPEED_* env vars, as in the recent rename of all the other Accelerate env vars; these got missed. Please let me know if you prefer a separate issue about this.
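
To illustrate the hassle mentioned in point 1: under zero.Init, weight initialization has to gather the sharded parameters first, along these lines (a minimal sketch; the module type and std value are illustrative, not from any particular model):

```python
import deepspeed
import torch

def init_weights(module):
    # Under deepspeed.zero.Init parameters are sharded across ranks, so they
    # must be gathered before any in-place initialization. modifier_rank=0
    # means rank 0 performs the modification and the result is redistributed.
    if isinstance(module, torch.nn.Linear):
        with deepspeed.zero.GatheredParameters(module.weight, modifier_rank=0):
            module.weight.data.normal_(mean=0.0, std=0.02)
```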

Thank you.

@stas00 stas00 changed the title [feature request] [feature request] zero3_init_flag should be True by default Dec 14, 2022
@muellerzr muellerzr added enhancement New feature or request feature request Request for a new feature to be added to Accelerate labels Dec 14, 2022
@stas00 stas00 changed the title [feature request] zero3_init_flag should be True by default [feature request] zero3_init_flag should be True by default for stage 3 Dec 14, 2022
stas00 (Contributor, Author) commented Dec 15, 2022

By the way, I also wanted to say thanks for making the zero.Init functionality configurable, as I'm using it right now while debugging ZeRO-3 with m4, which doesn't quite work yet due to nested zero.Init calls (via nested from_pretrained).

In the HF Trainer it's always on for Z3.

pacman100 (Contributor) commented

Hello Stas, thank you for the feature requests, suggestions, and queries. The above PR addresses these.

With respect to points 2 and 3 regarding the documentation ambiguity, I wanted to mention that users conventionally set up the DeepSpeed config via the accelerate config command and answer the questionnaire, and because of this the zero3_init_flag and zero3_save_16bit_model values are never None.

stas00 (Contributor, Author) commented Dec 16, 2022

Thank you for the PR, @pacman100

How does a single accelerate config command address many different projects with various needs? One could run it for each project, but needs also change over the course of a project, so I'm not sure how that substitutes for clear documentation.

As a different use case: at m4 we use an accelerate_config.yaml template and then manually adjust flags as needed, which is much faster than running an interactive config, especially since these flags change during experimentation. So it's very helpful when the docs clearly state what the defaults are. I already see our template includes a bunch of flags we don't use, like fsdp, but I don't know if it's safe to remove them without knowing what the defaults are. (I have yet to check; not asking for an answer, just sharing the situation.)
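
For illustration, such a template might look roughly like this (a sketch only: the keys mirror what accelerate config writes out, and every value here is illustrative):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  zero3_init_flag: true         # set explicitly, so the unclear default doesn't bite
  zero3_save_16bit_model: true  # likewise made explicit
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: bf16
num_processes: 8
```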

muellerzr (Collaborator) commented

By the way, an in-depth tutorial / deep dive into the config.yaml is something I'll be looking at in the new year to help with the CLI docs, so hopefully I can address that then, and @pacman100 can assist to make sure we get all the DeepSpeed material in there :)

pacman100 (Contributor) commented

Hello, are we good to close this issue?

stas00 (Contributor, Author) commented Dec 20, 2022

Yes please, and thank you for the awesome work, @pacman100!

P.S. You can always include `Fixes: https://github.com/huggingface/accelerate/issues/922` in the OP of the PR to have the merged PR automatically close the corresponding issue (just in case this GitHub trick is new to you; if it isn't, please ignore).
