Skip to content

Commit

Permalink
Merge branch 'joss-paper' of https://github.com/EleutherAI/elk into j…
Browse files Browse the repository at this point in the history
…oss-paper
  • Loading branch information
lauritowal committed Aug 11, 2023
2 parents ae1ca60 + a453126 commit 5aaa18e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion joss/paper.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ bibliography: paper.bib

Language models are proficient at predicting successive tokens in a sequence of text. However, they often inadvertently mirror human errors and misconceptions, even when equipped with the capability to "know better." This behavior becomes particularly concerning when models are trained to generate text that is highly rated by human evaluators, leading to the potential output of erroneous statements that may go undetected. Our solution is to directly elicit latent knowledge (([elk](`https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit`) [@author:elk]) from within the activations of a language model to mitigate this challenge.

`elk` is a specialized library developed to provide both the original and an enhanced version of the CSS methodology. Described in the paper "Discovering Latent Knowledge in Language Models Without Supervision" by Burns et al. [@author:burns]. In addition, we have implemented an approach based on the Contrastive Representation Clustering (CRC) method (2022) from the same paper.
`elk` is a specialized library developed to provide both the original and an enhanced version of the CSS methodology. Described in the paper "Discovering Latent Knowledge in Language Models Without Supervision" by Burns et al. [@author:burns]. In addition, we have implemented an approach, called VINC, based on the Contrastive Representation Clustering (CRC) method from the same paper.

`elk` serves as a tool for those seeking to investigate the veracity of model output and explore the underlying beliefs embedded within the model. The library offers:

Expand Down

0 comments on commit 5aaa18e

Please sign in to comment.