
Initial implementation of semi-supervised training #58

Merged · 16 commits into main · Feb 14, 2023
Conversation

@norabelrose (Member) commented Feb 13, 2023

Draft PR, fixes #54. As a bonus, also partially fixes #36 (for DDP only)

@norabelrose norabelrose marked this pull request as draft February 13, 2023 18:02
@norabelrose norabelrose marked this pull request as ready for review February 14, 2023 06:35
@FabienRoger (Collaborator) previously approved these changes Feb 14, 2023

lgtm
(haven't tested it since I don't have multiple GPUs)

Resolved review threads (outdated): elk/extraction/prompt_collator.py, elk/extraction/extraction_main.py
Comment on lines -66 to +71
-        self.loss = js_loss if loss == "js" else ccs_squared_loss
+        self.unsupervised_loss = js_loss if loss == "js" else ccs_squared_loss
+        self.supervised_weight = supervised_weight
Collaborator:
This logic should probably happen outside of the CCS class, which could simply take a loss fn as argument. But this cleanup can wait a future PR.
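The suggested cleanup might look roughly like this. A hypothetical sketch, not the repo's actual API: the `Probe` class name and constructor signature are illustrative, and `ccs_squared_loss` follows the standard CCS consistency-plus-confidence formulation.

```python
import torch
from torch import Tensor


def ccs_squared_loss(p0: Tensor, p1: Tensor) -> Tensor:
    # Consistency: p(x+) and 1 - p(x-) should agree.
    consistency = ((p0 - (1 - p1)) ** 2).mean()
    # Confidence: penalize hedging toward p0 = p1 = 0.5.
    confidence = torch.minimum(p0, p1).pow(2).mean()
    return consistency + confidence


class Probe:
    """Hypothetical: takes the loss fn as an argument, so the
    js/ccs selection logic lives in the caller, not the class."""

    def __init__(self, loss_fn=ccs_squared_loss):
        self.loss_fn = loss_fn

    def loss(self, p0: Tensor, p1: Tensor) -> Tensor:
        return self.loss_fn(p0, p1)
```

The caller would then construct `Probe(loss_fn=js_loss if loss == "js" else ccs_squared_loss)` instead of passing a string.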

@@ -9,7 +9,7 @@
 import torch.distributed as dist


-@torch.autocast("cuda", enabled=torch.cuda.is_available())
+@torch.autocast("cuda", enabled=torch.cuda.is_available())  # type: ignore
Member Author:

I'm not seeing typing issues here. Are you using mypy?

Collaborator:

I think this was a pylance error. I'll check

Collaborator:

Pylance
Pytorch 1.13.1+cu117

Collaborator:

I don't see any issues here
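For context, the decorator under discussion can be exercised like this; a minimal sketch, where `train_step` is a made-up function name. With `enabled=False` on CPU-only machines the autocast is a no-op, which is why the decorator is safe to apply unconditionally. The `# type: ignore` is presumably needed because some PyTorch stubs don't declare `autocast` as usable as a decorator.

```python
import torch


@torch.autocast("cuda", enabled=torch.cuda.is_available())
def train_step(x: torch.Tensor) -> torch.Tensor:
    # Inside the autocast region, eligible CUDA ops run in reduced
    # precision; on CPU-only machines nothing changes.
    return x @ x.T
```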

Resolved review threads (outdated): elk/training/ccs.py, elk/training/train.py
@FabienRoger (Collaborator):

Another issue I just caught: shuffling now happens after selecting the first max_examples, which means that for datasets where points are sorted by label (such as IMDB) you can't use a small max_examples.

@FabienRoger FabienRoger dismissed their stale review February 14, 2023 16:52

I failed to see some problems

@norabelrose (Member Author):

Another issue I just caught: shuffling now happens after selecting the first max_examples, which means that for datasets where points are sorted by label (such as IMDB) you can't use a small max_examples.

Fixed

Comment on lines 56 to 60
-        self.dataset = self.dataset.select(range(max_examples))
         if dist.is_initialized():
             self.dataset = self.dataset.shard(dist.get_world_size(), dist.get_rank())

         self.dataset = self.dataset.shuffle(seed=seed)
+        if max_examples:
+            self.dataset = self.dataset.select(range(max_examples))
Collaborator:
Doesn't that select world_size x max_examples examples?

Member Author:

Yep!
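The arithmetic behind the question can be sketched without torch.distributed; `shard` below mimics the strided split of `datasets.Dataset.shard`, and the numbers are hypothetical:

```python
def shard(dataset, num_shards, index):
    # Strided split, like datasets.Dataset.shard(contiguous=False).
    return dataset[index::num_shards]


world_size = 4
max_examples = 10
dataset = list(range(1_000))

# Each rank shards first, then selects max_examples from its own shard,
# so the job as a whole consumes world_size * max_examples examples.
per_rank = [shard(dataset, world_size, rank)[:max_examples] for rank in range(world_size)]
total = sum(len(chunk) for chunk in per_rank)  # 4 * 10 = 40, not 10
```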

@@ -85,6 +90,9 @@ def reset_parameters(self):
         for layer in self.probe:
             if isinstance(layer, nn.Linear):
                 layer.reset_parameters()
+        elif self.init == "zero":
Collaborator:

Just curious, why did you add this?

Member Author:

I noticed that you tend to get non-trivial probes with high accuracy even without any confidence loss, and I wanted to test the hypothesis that it's an initialization issue: it's hard for the optimizer to reach the trivial solution of outputting 0.5 all the time. Turns out to be true; see the messages in the ELK channel from a couple days ago.
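The hypothesis is easy to demonstrate: with all-zero weights and biases, a linear-plus-sigmoid probe emits exactly 0.5 for every input, i.e. the trivial solution the optimizer otherwise struggles to reach. A sketch with a made-up probe shape:

```python
import torch
import torch.nn as nn

probe = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

# Zero-init, as an `init == "zero"` branch would do.
for layer in probe:
    if isinstance(layer, nn.Linear):
        nn.init.zeros_(layer.weight)
        nn.init.zeros_(layer.bias)

x = torch.randn(8, 16)
out = probe(x)  # every entry is sigmoid(0) = 0.5
```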

import torch.nn as nn


def maybe_all_gather(x: Tensor) -> Tensor:
Collaborator:

Probably a dumb question, but why the "maybe"s? Is it because there are maybe multiple processes or maybe just one?

Member Author:

Yeah, it's because these methods do nothing when the script isn't run with torchrun.
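A sketch of the pattern; this illustrates the idea rather than the exact body in the repo, and `all_gather_into_tensor` assumes a recent PyTorch (≥ 1.13):

```python
import torch
import torch.distributed as dist
from torch import Tensor


def maybe_all_gather(x: Tensor) -> Tensor:
    # No-op when the script isn't launched with torchrun, so
    # single-process and DDP runs share one code path.
    if not dist.is_initialized():
        return x

    # Gather each rank's rows into one big tensor on every rank.
    buffer = x.new_empty(dist.get_world_size() * x.shape[0], *x.shape[1:])
    dist.all_gather_into_tensor(buffer, x)
    return buffer
```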

@norabelrose norabelrose merged commit 32e633b into main Feb 14, 2023
@norabelrose norabelrose deleted the semi-supervised branch February 14, 2023 21:10