
Support custom loss functions #17

Closed
norabelrose opened this issue Feb 2, 2023 · 4 comments · Fixed by #111
@norabelrose
Member

norabelrose commented Feb 2, 2023

We all want to change the CCS loss function in various ways, so we need a flexible way of defining and specifying loss functions.

We need some sort of API, maybe a class that can be inherited from, for defining entirely new loss functions programmatically and passing them into the CCS class. We also need a small library of predefined loss functions that can be accessed by name from the command line.
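
A minimal sketch of what that API could look like, assuming a PyTorch-style codebase; the names `CcsLoss`, `LOSS_REGISTRY`, and `ConsistencyLoss` are illustrative, not the repo's actual API:

```python
import torch
from torch import Tensor

# Hypothetical names throughout (CcsLoss, LOSS_REGISTRY, ConsistencyLoss):
# this is a sketch of the proposed API, not the repo's actual code.
LOSS_REGISTRY: dict[str, type["CcsLoss"]] = {}


class CcsLoss(torch.nn.Module):
    """Base class a user subclasses to define a new loss."""

    def __init_subclass__(cls, name: str | None = None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Auto-register every subclass so it can be selected by name
        # from the command line, e.g. --loss consistency.
        LOSS_REGISTRY[name or cls.__name__.lower()] = cls


class ConsistencyLoss(CcsLoss, name="consistency"):
    def forward(self, p_pos: Tensor, p_neg: Tensor) -> Tensor:
        # CCS consistency term: P(x) and P(not x) should sum to 1.
        return ((p_pos + p_neg - 1.0) ** 2).mean()
```

The command-line layer would then only need a lookup like `LOSS_REGISTRY[args.loss]()` to instantiate whichever loss was named.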

The custom losses need to be able to specify what inputs they take. For example, a conjunction/disjunction consistency loss would need hidden states from N independent propositions, while prompt-invariance losses will take M different variants of the same proposition. We'll need some sort of data collation logic to piece together the prompts required by the given loss and then extract the hidden states from the model.
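
Building on the hypothetical `CcsLoss` sketch above, one way to express this is for each loss to carry an input spec that the collation code reads; `InputSpec`, `PromptInvarianceLoss`, and `collate` are again invented names for illustration:

```python
from dataclasses import dataclass

from torch import Tensor


# Again hypothetical: a loss declares how many prompts it consumes per
# example, and the collation step reads this to build its batches.
@dataclass(frozen=True)
class InputSpec:
    num_propositions: int = 1  # N independent statements (conjunction/disjunction)
    num_variants: int = 1      # M paraphrases of one statement (prompt invariance)


class PromptInvarianceLoss(CcsLoss, name="invariance"):
    input_spec = InputSpec(num_variants=4)

    def forward(self, hiddens: Tensor) -> Tensor:
        # hiddens: (batch, M, d); penalize variance across the M variants.
        return hiddens.var(dim=1).mean()


def collate(hiddens: Tensor, spec: InputSpec) -> Tensor:
    """Reshape flat per-prompt hidden states to (batch, N, M, d)."""
    n, m = spec.num_propositions, spec.num_variants
    return hiddens.reshape(-1, n, m, hiddens.shape[-1])
```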

This is a big task which should probably be split into multiple PRs.

@FabienRoger
Collaborator

FabienRoger commented Feb 2, 2023

Additionally, would you want to train on multiple datasets at once? (either with different losses, or at least with different numbers of variants)

@lauritowal
Collaborator

lauritowal commented Feb 2, 2023

Additionally, would you want to train on multiple datasets at once? (either with different losses, or at least with different numbers of variants)

You mean train one probe on multiple datasets?

@FabienRoger
Collaborator

Exactly. It can help with:

  • having a purely consistency-based probe (because when the number of classes varies, an easy way to be consistent is to be truthlike)
  • exploring whether you can find a direction which is truthlike according to many datasets/losses simultaneously.

@FabienRoger
Collaborator

But if it's too complicated, maybe start with something that works in simpler cases?

@norabelrose norabelrose added this to the PyPI 0.2 Release milestone Feb 15, 2023
@AlexTMallen AlexTMallen self-assigned this Mar 6, 2023
@AlexTMallen AlexTMallen linked a pull request Mar 7, 2023 that will close this issue