Remove global vars #277
Conversation
This is amazing! Great job.
logits = logits[:, -1].view(batch_size, -1).contiguous()

if args.greedy:
    # we have to use neox_args instead of kwargs here because deepspeed :|
This reverts the bugfix from #253.
local_rank = os.environ.get("LOCAL_RANK")
if local_rank is None:
    print("utils.local_rank() environment variable LOCAL_RANK not set, defaulting to 0", flush=True)
I think we should default to -1 here? (That's torch / deepspeed's default when the local rank isn't set, i.e. we're not distributed.)
Some functions rely on the rank being equal to 0 (e.g. tensorboard initialization):
gpt-neox/megatron/neox_arguments/arguments.py
Line 106 in 319ad17

if self.tensorboard_dir and self.rank == 0:

Line 17 in 319ad17

def print_rank_0(*message):
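A minimal sketch of the default-to--1 suggestion, reconciled with the rank-0 gating quoted above. The function and variable names here are hypothetical illustrations, not the project's actual code:

```python
import os

def local_rank() -> int:
    # Hypothetical sketch: return -1 (torch / deepspeed's convention for
    # "not distributed") when LOCAL_RANK is unset, instead of 0.
    value = os.environ.get("LOCAL_RANK")
    if value is None:
        print("LOCAL_RANK not set, defaulting to -1 (not distributed)", flush=True)
        return -1
    return int(value)

def is_rank_0(rank: int) -> bool:
    # Callers that gate on rank == 0 (e.g. tensorboard setup, print_rank_0)
    # would then need to treat -1 (single process) like rank 0.
    return rank in (0, -1)
```

With this default, the rank-0 checks stay correct in single-process runs only if every call site is updated to accept -1 as well, which is the tension the review comments are pointing at.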
This pull request removes the global variables. The exception is the mpu global variables, which are more complicated and potentially breaking because they are used in torch.autograd.Function implementations.
Training for some steps on a small config results in equal loss graphs:
Tests are adjusted to the new setup and pass:
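The shape of the refactor — configuration passed as an explicit argument instead of read from module-level globals — can be sketched roughly like this. The class and field names below are illustrative stand-ins, not the project's real `NeoXArgs` definition:

```python
from dataclasses import dataclass

@dataclass
class NeoXArgs:
    # Minimal stand-in for the real neox_args container; the fields here
    # are illustrative, not the project's full argument set.
    greedy: bool = False
    rank: int = 0

def print_rank_0(neox_args: NeoXArgs, *message):
    # Before the refactor this kind of helper read a module-level global;
    # afterwards the caller passes the args object explicitly.
    if neox_args.rank == 0:
        print(*message, flush=True)

def pick_token(neox_args: NeoXArgs, logits):
    # The sampling strategy comes from the explicit args object rather
    # than a global `args`; only greedy decoding is sketched here.
    if neox_args.greedy:
        return max(range(len(logits)), key=logits.__getitem__)
    raise NotImplementedError("only greedy decoding shown in this sketch")
```

Passing the args object explicitly makes each function's dependencies visible in its signature, so tests can construct a `NeoXArgs` directly instead of mutating process-wide state first.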