
[feature request] zero3_init_flag should be True by default for stage 3 #922

Closed · stas00 opened this issue Dec 14, 2022 · 6 comments
Labels: enhancement (New feature or request), feature request (Request for a new feature to be added to Accelerate)

stas00 (Contributor) commented Dec 14, 2022

This feature request is about the DeepSpeed plugin's zero3_init_flag (and a few related issues):

```python
zero3_init_flag: bool = field(
    default=None,
    metadata={
        "help": "Flag to indicate whether to enable `deepspeed.zero.Init` for constructing massive models. "
        "Only applicable with ZeRO Stage-3."
    },
)
```
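
For reference, opting in explicitly today looks roughly like this (a minimal sketch; the exact Accelerator wiring and values are illustrative):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Enable deepspeed.zero.Init explicitly rather than relying on the
# (undocumented) default; zero_stage=3 since the flag only applies there.
deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, zero3_init_flag=True)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
```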

  1. IMHO, this flag should be True by default: ZeRO Stage 3 exists for large models, and it's very unlikely the user will be able to load such models without zero.Init, so why not give users one less thing to figure out?

    I realize that one may want to avoid deepspeed.zero.Init, to save the hassle of adding deepspeed.zero.GatheredParameters during init_weights and other early setup stages in some special models, so that the model remains unsharded until the last moment (see the sketch after this list). But this is a very rare situation, and while in those cases a user may want to turn the flag off, that's unlikely to work for any largish model anyway.

  2. Also, there is a documentation ambiguity: what does default=None mean when documenting the default behavior? This is super confusing: is it off by default? After reading the code I can see it's later set to False, but how is the user to know that? The doc here says None:
    https://huggingface.co/docs/accelerate/package_reference/deepspeed#accelerate.DeepSpeedPlugin

    The relevant code that I needed to decipher to find out the default is here:

```python
if self.zero3_init_flag is None:
    self.zero3_init_flag = os.environ.get("DEEPSPEED_ZERO3_INIT", "false") == "true"
if self.zero3_init_flag and not self.hf_ds_config.is_zero3():
    warnings.warn("DeepSpeed Zero3 Init flag is only applicable for ZeRO Stage 3. Setting it to False.")
    self.zero3_init_flag = False
```

  3. The same ambiguity applies to zero3_save_16bit_model (it also says None): the user has no way of telling what the default is.

  4. (Possibly a separate issue:) all the Accelerate DEEPSPEED_* env vars should probably be ACCELERATE_DEEPSPEED_* env vars, as in the recent rename of all the other Accelerate env vars; these got missed. Please let me know if you prefer a separate issue about this.
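
To illustrate the hassle mentioned in point 1: under zero.Init, weight initialization has to gather the sharded parameters first, along these lines (a minimal sketch; the module type and std value are illustrative, not from any particular model):

```python
import deepspeed
import torch

def init_weights(module):
    # Under deepspeed.zero.Init parameters are sharded across ranks, so they
    # must be gathered before any in-place initialization. modifier_rank=0
    # means rank 0 performs the modification and the result is redistributed.
    if isinstance(module, torch.nn.Linear):
        with deepspeed.zero.GatheredParameters(module.weight, modifier_rank=0):
            module.weight.data.normal_(mean=0.0, std=0.02)
```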

Thank you.

@stas00 stas00 changed the title [feature request] [feature request] zero3_init_flag should be True by default Dec 14, 2022
@muellerzr muellerzr added enhancement New feature or request feature request Request for a new feature to be added to Accelerate labels Dec 14, 2022
@stas00 stas00 changed the title [feature request] zero3_init_flag should be True by default [feature request] zero3_init_flag should be True by default for stage 3 Dec 14, 2022
stas00 (Contributor, Author) commented Dec 15, 2022

By the way, I also wanted to say thanks for making the zero.Init functionality configurable, as I'm using it right now while debugging ZeRO-3 with m4, which doesn't quite work yet due to nested zero.Init calls (via nested from_pretrained).

In the HF Trainer it's always on for Z3.

pacman100 (Contributor) commented

Hello Stas, thank you for the feature requests, suggestions, and queries. The above PR addresses these.

With respect to points 2 and 3 regarding the documentation ambiguity, I wanted to mention that users conventionally set up the DeepSpeed config via the accelerate config command and answer the questionnaire, and because of this the zero3_init_flag and zero3_save_16bit_model values are never None.

stas00 (Contributor, Author) commented Dec 16, 2022

Thank you for the PR, @pacman100

How does a single accelerate config command address many different projects with various needs? One could run it for each project, but needs also change over the course of a project, so I'm not sure how that substitutes for clear documentation.

As a different use case: at m4 we use an accelerate_config.yaml template and then manually adjust flags as needed, which is much faster than running an interactive config, especially since these flags change during experimentation. So it's very helpful when the docs clearly state what the defaults are. I already see our template includes a bunch of flags we don't use, like fsdp, but I don't know if it's safe to remove them without knowing what the defaults are. (I have yet to check; not asking for an answer, just sharing the situation.)
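
For illustration, such a template might look roughly like this (a sketch only: the keys mirror what accelerate config writes out, and every value here is illustrative):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  zero3_init_flag: true         # set explicitly, so the unclear default doesn't bite
  zero3_save_16bit_model: true  # likewise made explicit
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: bf16
num_processes: 8
```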

muellerzr (Collaborator) commented

By the way, an in-depth tutorial / deep dive into the config.yaml is something I'll be looking at in the new year to help with the CLI docs, so hopefully I can address that then, and @pacman100 can assist to make sure we get all the DeepSpeed material in there :)

pacman100 (Contributor) commented

Hello, are we good to close this issue?

stas00 (Contributor, Author) commented Dec 20, 2022

Yes please, and thank you for the awesome work, @pacman100!

P.S. You can always include `Fixes: https://github.com/huggingface/accelerate/issues/922` in the OP of the PR to have the merged PR automatically close the corresponding issue (just in case this GitHub trick is new to you; if it isn't, please ignore).
