
Train reporters for different layers in parallel #86

Merged: 14 commits merged into main from train-parallelism on Feb 19, 2023
Conversation

norabelrose (Member):
Fixes #64, based on draft PR #82
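For readers outside the thread: the change trains one reporter per layer concurrently instead of looping over layers serially. A minimal sketch of the idea, assuming hypothetical names (train_reporter and the layer count here are illustrative, not the PR's actual API):

```python
from multiprocessing import Pool, cpu_count

def train_reporter(layer: int) -> str:
    # Hypothetical worker: in the real PR this would fit a reporter
    # (probe) on the hidden states extracted for this layer and save it.
    return f"trained reporter for layer {layer}"

if __name__ == "__main__":
    num_layers = 24  # illustrative: one task per transformer layer
    with Pool(processes=min(num_layers, cpu_count())) as pool:
        for msg in pool.map(train_reporter, range(num_layers)):
            print(msg)
```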

AlexTMallen (Collaborator) left a comment:
It looks good to me overall, except for the somewhat obscure handling of layer indices. It's not super critical to clarify that, though, since this is a temporary state of affairs while the layer information isn't available.

Two resolved review threads on elk/training/train.py.
AlexTMallen (Collaborator) left a comment:

lgtm

extract(args, "train")
maybe_barrier()
print("wow")
Collaborator:

I guess the wow thing should be removed (?)

norabelrose (Member, Author):

oh sorry that was a stupid debugging thing LOL

extract(args, "validation")

if rank == 0:
print("hi")
Collaborator:

and hi
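The rank / maybe_barrier pattern in these hunks is the usual multi-process idiom: synchronize every worker, then let only rank 0 do one-off work such as logging. A minimal sketch of that idiom, assuming torch.distributed is the backend (the PR's actual maybe_barrier helper isn't shown in this thread):

```python
import torch.distributed as dist

def maybe_barrier() -> None:
    # Assumed behavior: only synchronize when a distributed process
    # group is actually running, so single-process runs are unaffected.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()

def after_extraction() -> None:
    rank = dist.get_rank() if dist.is_initialized() else 0
    maybe_barrier()  # wait until every worker has finished extraction
    if rank == 0:    # one-off work (logging, saving) on a single rank
        print("extraction finished on all ranks")
```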

```diff
@@ -19,8 +19,11 @@ def list_runs(args):
     )
     for timestamp, run in subfolders:
         # Read the arguments used to run this experiment
-        with open(run / "args.json", "r") as f:
-            run_args = json.load(f)
+        try:
```
Collaborator:

Isn't it cleaner to check for the file with an if instead of using a try / except and then continue?

norabelrose (Member, Author):

The reason I did it this way is that args.json really should exist; it's an anomaly if it doesn't. Arguably we should be indicating the error somewhere, but elk list is going to change a lot soon anyway, so I can't be bothered.
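For illustration, the two options being weighed: EAFP (what the PR does) versus an explicit existence check. The except body below is inferred from the thread, since the hunk above cuts off at the try:, and the loop around it is a hypothetical reconstruction.

```python
import json
from pathlib import Path

def list_runs(root: Path) -> None:
    """Hypothetical reconstruction of the loop under discussion."""
    for run in sorted(root.iterdir()):
        # EAFP, as in the PR: attempt the read and skip anomalous runs.
        # This also avoids a race between checking and opening the file.
        try:
            with open(run / "args.json", "r") as f:
                run_args = json.load(f)
        except FileNotFoundError:
            continue  # args.json should always exist; silently skip
        print(run.name, run_args)

        # The LBYL alternative suggested above would instead be:
        #   if not (run / "args.json").exists():
        #       continue
```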

norabelrose merged commit 3ae6307 into main on Feb 19, 2023.
norabelrose deleted the train-parallelism branch on Feb 19, 2023 at 09:26.
Development: merging this pull request may close "Parallelize probe training across layers" (#64).
3 participants