
Remove global vars #277

Merged
merged 68 commits into main from remove_global_vars on Apr 30, 2021
Conversation

sweinbach (Contributor)

This pull request removes the global variables. The exception is the mpu global variables; these are more complicated and potentially breaking, since they are used in torch.autograd.Function implementations.

Training for some steps on a small config results in equal loss graphs:

[image: loss graph comparison]

Tests are adjusted to the new setup and pass:

[image: passing test run]

@sweinbach sweinbach requested a review from a team as a code owner April 30, 2021 16:50
@sweinbach sweinbach linked an issue Apr 30, 2021 that may be closed by this pull request
@StellaAthena (Member) left a comment:

This is amazing! Great job.

@StellaAthena StellaAthena merged commit 56a4cd0 into main Apr 30, 2021
@StellaAthena StellaAthena deleted the remove_global_vars branch April 30, 2021 18:26
logits = logits[:, -1].view(batch_size, -1).contiguous()

if args.greedy:
# we have to use neox_args instead of kwargs here because deepspeed :|
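For context, the snippet above slices out the logits for the last position and then branches on greedy decoding. A minimal sketch of what the greedy branch does, in plain Python (the torch version operates on tensors; the function name here is an assumption for illustration):

```python
def greedy_next_token(last_token_logits):
    # Greedy decoding: pick the index (token id) with the highest score.
    return max(range(len(last_token_logits)), key=last_token_logits.__getitem__)
```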
Contributor:

This reverts the bugfix from #253.


local_rank = os.environ.get("LOCAL_RANK")
if local_rank is None:
print("utils.local_rank() environment variable LOCAL_RANK not set, defaulting to 0", flush=True)
Contributor:

I think we should default to -1 here (torch/deepspeed's default if local rank isn't set, i.e. when we're not distributed).
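A minimal sketch of that suggestion, as a standalone helper (the helper name and signature are assumptions, not the repo's actual code):

```python
import os

def local_rank(default=-1):
    # Read LOCAL_RANK from the environment; fall back to `default` when it is
    # unset (-1 is torch/deepspeed's convention for non-distributed runs).
    value = os.environ.get("LOCAL_RANK")
    return default if value is None else int(value)
```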

sweinbach (Contributor, Author):

Some functions rely on the rank being equal to 0, e.g. tensorboard initialization:

if self.tensorboard_dir and self.rank == 0:

Would we need to change these functions along the lines of the print_rank_0 function?

def print_rank_0(*message):
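The concern above can be sketched as a small guard (a hypothetical helper, not the repo's actual fix): code that gates on `rank == 0` would skip the non-distributed case if the default became -1, so such checks could accept both values.

```python
def is_main_process(rank):
    # Treat rank 0 (the distributed main process) and rank -1 (not
    # distributed) as the single process that should, e.g., initialize
    # tensorboard or print log messages.
    return rank in (0, -1)
```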

Successfully merging this pull request may close these issues.

Get rid of global_vars
3 participants