
Train reporters for different layers in parallel #86

Merged: 14 commits merged into main from train-parallelism on Feb 19, 2023
Conversation

norabelrose (Member):
Fixes #64, based on draft PR #82
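For readers outside the thread: the change trains one reporter per layer concurrently instead of looping over layers serially. A minimal sketch of the idea, assuming hypothetical names (train_reporter and the layer count here are illustrative, not the PR's actual API):

```python
from multiprocessing import Pool, cpu_count

def train_reporter(layer: int) -> str:
    # Hypothetical worker: in the real PR this would fit a reporter
    # (probe) on the hidden states extracted for this layer and save it.
    return f"trained reporter for layer {layer}"

if __name__ == "__main__":
    num_layers = 24  # illustrative: one task per transformer layer
    with Pool(processes=min(num_layers, cpu_count())) as pool:
        for msg in pool.map(train_reporter, range(num_layers)):
            print(msg)
```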

AlexTMallen (Collaborator) left a comment:
It looks good to me overall, except for the somewhat obscure handling of layer indices. It's not super critical to clarify that, though, since this is a temporary state of affairs while the layer information isn't available.

Two resolved review threads on elk/training/train.py.
AlexTMallen (Collaborator) left a comment:

lgtm

extract(args, "train")
maybe_barrier()
print("wow")
Collaborator:

I guess the wow thing should be removed (?)

norabelrose (Member, Author):

oh sorry that was a stupid debugging thing LOL

extract(args, "validation")

if rank == 0:
print("hi")
Collaborator:

and hi
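The rank / maybe_barrier pattern in these hunks is the usual multi-process idiom: synchronize every worker, then let only rank 0 do one-off work such as logging. A minimal sketch of that idiom, assuming torch.distributed is the backend (the PR's actual maybe_barrier helper isn't shown in this thread):

```python
import torch.distributed as dist

def maybe_barrier() -> None:
    # Assumed behavior: only synchronize when a distributed process
    # group is actually running, so single-process runs are unaffected.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()

def after_extraction() -> None:
    rank = dist.get_rank() if dist.is_initialized() else 0
    maybe_barrier()  # wait until every worker has finished extraction
    if rank == 0:    # one-off work (logging, saving) on a single rank
        print("extraction finished on all ranks")
```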

```diff
@@ -19,8 +19,11 @@ def list_runs(args):
     )
     for timestamp, run in subfolders:
         # Read the arguments used to run this experiment
-        with open(run / "args.json", "r") as f:
-            run_args = json.load(f)
+        try:
```
Collaborator:

Isn't it cleaner to check for the file with an if instead of using a try / except and then continue?

norabelrose (Member, Author):

The reason I did it this way is that args.json really should exist; it's an anomaly if it doesn't. Arguably we should be indicating the error somewhere, but elk list is going to change a lot soon anyway, so I can't be bothered.
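For illustration, the two options being weighed: EAFP (what the PR does) versus an explicit existence check. The except body below is inferred from the thread, since the hunk above cuts off at the try:, and the loop around it is a hypothetical reconstruction.

```python
import json
from pathlib import Path

def list_runs(root: Path) -> None:
    """Hypothetical reconstruction of the loop under discussion."""
    for run in sorted(root.iterdir()):
        # EAFP, as in the PR: attempt the read and skip anomalous runs.
        # This also avoids a race between checking and opening the file.
        try:
            with open(run / "args.json", "r") as f:
                run_args = json.load(f)
        except FileNotFoundError:
            continue  # args.json should always exist; silently skip
        print(run.name, run_args)

        # The LBYL alternative suggested above would instead be:
        #   if not (run / "args.json").exists():
        #       continue
```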

norabelrose merged commit 3ae6307 into main on Feb 19, 2023.
norabelrose deleted the train-parallelism branch on Feb 19, 2023 at 09:26.
Development: merging this pull request may close "Parallelize probe training across layers" (#64).
3 participants