Commit

update draft
lauritowal committed Jul 31, 2023
1 parent 930c745 commit 5553fd3
Showing 2 changed files with 13 additions and 85 deletions.
62 changes: 5 additions & 57 deletions joss/paper.bib
@@ -1,59 +1,7 @@
@article{Pearson:2017,
url = {http://adsabs.harvard.edu/abs/2017arXiv170304627P},
Archiveprefix = {arXiv},
Author = {{Pearson}, S. and {Price-Whelan}, A.~M. and {Johnston}, K.~V.},
Eprint = {1703.04627},
Journal = {ArXiv e-prints},
Keywords = {Astrophysics - Astrophysics of Galaxies},
Month = mar,
Title = {{Gaps in Globular Cluster Streams: Pal 5 and the Galactic Bar}},
Year = 2017
@article{burns,
title={Discovering latent knowledge in language models without supervision},
author={Burns, Collin and Ye, Haotian and Klein, Dan and Steinhardt, Jacob},
journal={arXiv preprint arXiv:2212.03827},
year={2022}
}

@book{Binney:2008,
url = {http://adsabs.harvard.edu/abs/2008gady.book.....B},
Author = {{Binney}, J. and {Tremaine}, S.},
Booktitle = {Galactic Dynamics: Second Edition, by James Binney and Scott Tremaine.~ISBN 978-0-691-13026-2 (HB).~Published by Princeton University Press, Princeton, NJ USA, 2008.},
Publisher = {Princeton University Press},
Title = {{Galactic Dynamics: Second Edition}},
Year = 2008
}

@article{gaia,
author = {{Gaia Collaboration}},
title = "{The Gaia mission}",
journal = {Astronomy and Astrophysics},
archivePrefix = "arXiv",
eprint = {1609.04153},
primaryClass = "astro-ph.IM",
keywords = {space vehicles: instruments, Galaxy: structure, astrometry, parallaxes, proper motions, telescopes},
year = 2016,
month = nov,
volume = 595,
doi = {10.1051/0004-6361/201629272},
url = {http://adsabs.harvard.edu/abs/2016A%26A...595A...1G},
}

@article{astropy,
author = {{Astropy Collaboration}},
title = "{Astropy: A community Python package for astronomy}",
journal = {Astronomy and Astrophysics},
archivePrefix = "arXiv",
eprint = {1307.6212},
primaryClass = "astro-ph.IM",
keywords = {methods: data analysis, methods: miscellaneous, virtual observatory tools},
year = 2013,
month = oct,
volume = 558,
doi = {10.1051/0004-6361/201322068},
url = {http://adsabs.harvard.edu/abs/2013A%26A...558A..33A}
}

@misc{fidgit,
author = {A. M. Smith and K. Thaney and M. Hahnel},
title = {Fidgit: An ungodly union of GitHub and Figshare},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/arfon/fidgit}
}
36 changes: 8 additions & 28 deletions joss/paper.md
@@ -26,47 +26,27 @@ affiliations:
date: 13 August 2017
bibliography: paper.bib

# Optional fields if submitting to a AAS journal too, see this blog post:
# https://blog.joss.theoj.org/2018/12/a-new-collaboration-with-aas-publishing
aas-doi: 10.3847/xxxxx <- update this with the DOI from AAS once you know it.
aas-journal: Astrophysical Journal <- The name of the AAS journal.
---

# Summary

`elk` is a library designed to elicit latent knowledge ([ELK](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) [@author:elk]) from language models. It includes implementations of both the original and an enhanced version of the Contrast-Consistent Search (CCS) method, as well as an approach based on the CRC method. Designed for researchers, `elk` offers features such as multi-GPU support, integration with Hugging Face, and continuous improvement by a dedicated team. The EleutherAI Discord's `elk` channel provides a platform for collaboration and discussion related to the library and associated research.
`elk` is a library designed to elicit latent knowledge ([elk](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/) [@author:elk]) from language models. It includes implementations of both the original and an enhanced version of the Contrast-Consistent Search (CCS) method, as well as an approach based on the CRC method. Designed for researchers, `elk` offers features such as multi-GPU support, integration with Hugging Face, and continuous improvement by a dedicated team. The EleutherAI Discord's `elk` channel provides a platform for collaboration and discussion related to the library and associated research.

# Statement of need

Language models are proficient at predicting successive tokens in a sequence of text. However, they often inadvertently mirror human errors and misconceptions, even when equipped with the capability to "know better." This behavior becomes particularly concerning when models are trained to generate text that is highly rated by human evaluators, leading to the potential output of erroneous statements that may go undetected. Our solution is to directly Elicit Latent Knowledge (ELK) from within the activations of a language model to mitigate this challenge.
Language models are proficient at predicting successive tokens in a sequence of text. However, they often inadvertently mirror human errors and misconceptions, even when equipped with the capability to "know better." This behavior becomes particularly concerning when models are trained to generate text that is highly rated by human evaluators, leading to the potential output of erroneous statements that may go undetected. Our solution is to directly elicit latent knowledge ([elk](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) [@author:elk]) from within the activations of a language model to mitigate this challenge.

`elk` is a specialized library developed to provide both the original and an enhanced version of the CCS methodology. Described in the paper "Discovering Latent Knowledge in Language Models Without Supervision" by Burns et al. [@author:burns], the CCS method has been instrumental in our understanding of language models. In addition, we have implemented an approach based on the Contrastive Representation Clustering (CRC) method from the same 2022 paper. The CRC technique allows for the discovery of features in the hidden states of a language model that adhere to specific logical consistency requirements. Interestingly, these features have proven to be highly effective for question-answering and text classification tasks, even when trained without labels.
`elk` is a specialized library that provides both the original and an enhanced version of the CCS methodology described in the paper "Discovering Latent Knowledge in Language Models Without Supervision" by Burns et al. [@burns]. In addition, we have implemented an approach based on the Contrastive Representation Clustering (CRC) method from the same 2022 paper.
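
As a rough illustration of the two ideas from Burns et al., the sketch below writes out the CCS objective (a consistency term plus a confidence term over the probabilities assigned to a statement and its negation) and a CRC-style direction taken as the top principal component of the normalized contrast-pair differences. It is a minimal NumPy sketch, not the `elk` API: the function names `normalize`, `ccs_loss`, and `crc_tpc_direction` are hypothetical, and the hidden states are assumed to arrive as arrays of shape `(n_examples, hidden_dim)`.

```python
import numpy as np


def normalize(h):
    """Center and scale each hidden dimension across the examples."""
    return (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-8)


def ccs_loss(p_pos, p_neg):
    """CCS objective: consistency plus confidence, averaged over examples.

    p_pos and p_neg are probe outputs in [0, 1] for a statement and its
    negation (hypothetical inputs; any probe producing them would do).
    """
    # Consistency: p(x+) should equal 1 - p(x-).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution p(x+) = p(x-) = 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))


def crc_tpc_direction(h_pos, h_neg):
    """Top principal component of the normalized contrast-pair differences."""
    diffs = normalize(h_pos) - normalize(h_neg)
    diffs = diffs - diffs.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]
```

Projecting the normalized differences onto the returned direction and thresholding at zero gives an unsupervised two-way split of the examples; which side corresponds to "true" is not determined without labels.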

Designed with the research community in mind, elk serves as a powerful tool for those seeking to investigate the veracity of model output and explore the underlying beliefs embedded within the model. The library offers:
`elk` serves as a tool for those seeking to investigate the veracity of model output and explore the underlying beliefs embedded within the model. The library offers:

Multi-GPU Support: Efficient extraction, training, and evaluation through parallel processing.
Integration with Hugging Face: Easy utilization of models and datasets from a popular source.
Active Development and Support: Continuous improvement by a dedicated team of researchers and engineers.
- Multi-GPU Support: Efficient extraction, training, and evaluation through parallel processing.
- Integration with Hugging Face: Easy utilization of models and datasets from a popular source.
- Active Development and Support: Continuous improvement by a dedicated team of researchers and engineers.

For collaboration, discussion, and support, the [EleutherAI Discord's elk channel](https://discord.com/channels/729741769192767510/1070194752785489991) provides a platform for engaging with others interested in the library or related research projects.


# Citations

Citations to entries in paper.bib should be in
[rMarkdown](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html)
format.

If you want to cite a software repository URL (e.g. something on GitHub without a preferred
citation) then you can do it with the example BibTeX entry below for @fidgit.

For a quick reference, the following citation commands can be used:
- `@author:2001` -> "Author et al. (2001)"
- `[@author:2001]` -> "(Author et al., 2001)"
- `[@author1:2001; @author2:2001]` -> "(Author1 et al., 2001; Author2 et al., 2002)"


# Acknowledgements

We want to thank [SERI MATS](https://www.serimats.org/) and [EleutherAI](https://www.eleuther.ai/) for supporting this work.

# References
