Add DeepSpeed bf16 configuration #787
Signed-off-by: Dashiell Stander <[email protected]>
Since this seems to be a dict with a single value, wouldn’t it make more sense for the code to set the configs that DeepSpeed expects automatically when the precision is set to bf16, instead of making the user specify it?
I think at its core my question is whether this is actually more intuitive than:
As things currently work we can't just have a The changes I've made (and didn't push all of last night, whoops) make it so that a user only has to set the I'm open to other approaches though.
I still need to test this to confirm there aren't any stupid bugs, but otherwise @StellaAthena and @Quentin-Anthony I'd appreciate it if you took a look at this when you have a moment. If people still don't like this (and honestly I don't love it), then I think we should open another issue where we figure out a way for people to pass stuff straight into DeepSpeed without it getting checked by NeoX or needing to be checked into our code. I know we've talked about it before. In that scenario we'd be able to just handle the default cases when people set
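To make the pass-through idea above concrete, here is a minimal sketch of one way it could work. Everything in it is hypothetical: `build_deepspeed_config`, `validated`, and `passthrough` are illustrative names, not existing NeoX or DeepSpeed APIs. The only assumption is that the NeoX-checked settings and the unchecked user settings both arrive as plain dicts.

```python
from typing import Any, Dict


def build_deepspeed_config(
    validated: Dict[str, Any], passthrough: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge unchecked user settings into the NeoX-validated DeepSpeed config.

    Hypothetical helper: `validated` holds the keys NeoX already checked,
    `passthrough` holds keys the user wants handed to DeepSpeed untouched.
    """
    config = dict(validated)  # copy so the caller's dict is not mutated
    for key, value in passthrough.items():
        # Refuse to silently override something NeoX already decided.
        if key in config and config[key] != value:
            raise ValueError(f"passthrough key {key!r} conflicts with NeoX config")
        config[key] = value
    return config


merged = build_deepspeed_config(
    {"train_micro_batch_size_per_gpu": 4},
    {"bf16": {"enabled": True}},
)
```

Raising on conflicts instead of letting one side win is a design choice of this sketch; a real implementation might prefer an explicit precedence rule.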
eval_tasks/eval_adapter.py
Outdated
@@ -30,6 +30,7 @@ def _download_file(*args, **kwargs):
import sys
import dataclasses
from functools import partial
from pathlib import Path
What's this change for?
Default = None

Configuration for using bfloat16 floating-point format as an alternative to FP16. BFLOAT16 requires hardware support (e.g., NVIDIA A100).
Why'd we remove these details?
They're tied to the specific bf16 config that you wanted removed. I can move them to the section on the precision config?
For main DeepSpeed, setting the precision configuration parameter is insufficient to enable bf16 training. DeepSpeed uses a special bf16 dict configuration to enable using bf16, as documented here. This adds that configuration to NeoX and main DeepSpeed does the rest of the work.
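As a rough illustration of the mapping discussed in this PR, the sketch below derives the DeepSpeed precision block from a single precision setting. The function name `deepspeed_precision_block` and the accepted spellings of the precision value are assumptions made for this example, not the code in this PR. The `{"enabled": true}` shape of the `bf16` and `fp16` blocks follows the DeepSpeed config JSON documentation.

```python
from typing import Any, Dict


def deepspeed_precision_block(precision: str) -> Dict[str, Any]:
    """Return the DeepSpeed config keys implied by a precision setting.

    Hypothetical helper; the accepted precision strings are assumptions.
    """
    if precision in ("bf16", "bfloat16"):
        # DeepSpeed enables bf16 through its own dict, not a precision flag.
        return {"bf16": {"enabled": True}}
    if precision == "fp16":
        return {"fp16": {"enabled": True}}
    if precision == "fp32":
        # Full precision needs no extra block.
        return {}
    raise ValueError(f"unsupported precision: {precision!r}")


ds_config = {"train_micro_batch_size_per_gpu": 4}
ds_config.update(deepspeed_precision_block("bfloat16"))
```

With a helper like this, a user would set only the precision value and the dict that DeepSpeed expects would be filled in for them.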