Commit

Update README.md
StellaAthena committed Oct 7, 2023
1 parent dbfca97 commit 91d72e8
Showing 1 changed file with 20 additions and 9 deletions.
README.md: 29 changes (20 additions, 9 deletions)
@@ -2,16 +2,27 @@

This repository is for EleutherAI's project *Pythia* which combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. For detailed info on the models, their training, and their behavior, please see our paper [Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling](https://arxiv.org/abs/2304.01373).

# Contents
## Contents

* [Models](#models)
- [Pythia: Interpreting Transformers Across Time and Scale](#pythia--interpreting-transformers-across-time-and-scale)
* [Models](#models)
* [Changelog](#changelog)
* [Quickstart](#quickstart)
* [Reproducing Training](#reproducing-training)
* [Dataset Viewer](#dataset-viewer)
* [Benchmark Scores](#benchmark-scores)
* [Other Papers](#other-papers)
* [License](#license)
- [Using Pythia](#using-pythia)
* [Quickstart](#quickstart)
* [Reproducing Training](#reproducing-training)
* [Exploring the Dataset](#exploring-the-dataset)
* [Pythia Paper Replication](#pythia-paper-replication)
- [Benchmark Scores](#benchmark-scores)
- [Research Building on Pythia](#research-building-on-pythia)
* [EleutherAI Projects](#eleutherai-projects)
* [Other EleutherAI Research](#other-eleutherai-research)
* [External Interpretability Research](#external-interpretability-research)
* [Other Research Projects](#other-research-projects)
- [Citation Details](#citation-details)
- [License](#license)

<small><i><a href='https://ecotrust-canada.github.io/markdown-toc/'>Table of contents generated with markdown-toc</a></i></small>


## Models

@@ -48,7 +59,7 @@ We also upload the pre-tokenized data files and a script to reconstruct the data
- We remedied a minor inconsistency that existed in the original suite: all models of size 2.8B parameters or smaller had a learning rate (LR) schedule which decayed to a minimum LR of 10% of the starting LR, but the 6.9B and 12B models all used an LR schedule which decayed to a minimum LR of 0. In the redone training runs, all models were trained with the LR decaying to a minimum of 0.1× their maximum LR (see the sketch below this list).
- The new `EleutherAI/pythia-1b` was trained in bf16, because in fp16 the model was corrupted by loss spikes late in training.
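
For readers who want to see what the schedule change amounts to, here is a minimal sketch of a cosine decay with a configurable LR floor; the function name, warmup handling, and default values are illustrative assumptions, not the exact GPT-NeoX training configuration:

```python
import math

def cosine_lr(step, max_steps, max_lr, warmup_steps=0, min_lr_ratio=0.1):
    """Cosine decay to a floor of min_lr_ratio * max_lr.

    min_lr_ratio=0.1 matches the redone runs (decay to 0.1x the max LR);
    min_lr_ratio=0.0 matches the old 6.9B/12B schedule (decay to 0).
    """
    if step < warmup_steps:  # linear warmup (illustrative)
        return max_lr * step / max(1, warmup_steps)
    progress = min(1.0, (step - warmup_steps) / max(1, max_steps - warmup_steps))
    min_lr = min_lr_ratio * max_lr
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With `min_lr_ratio=0.1` the LR never drops below 10% of its maximum, which is the behavior the retrained suite now uses uniformly.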

The old models ("v0") remain available at [https://huggingface.co/models?other=pythia_v0](https://huggingface.co/models?other=pythia_v0).
The old models ("v0") remain available [here](https://huggingface.co/models?other=pythia_v0) and may be useful for ablation studies.

[January 20, 2023]
On January 20, 2023, we chose to rename the Pythia model suite to include both embedding layer and unembedding layer parameters in our total parameter counts, in line with many other model suites and because we believe this convention better reflects the on-device memory usage of these models. We also discovered that due to a typo one of our models was smaller than we thought, and replaced it with a model of the intended size. See [here](https://huggingface.co/EleutherAI/pythia-410m-deduped#naming-convention-and-parameter-count) for more details.
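
To see what the counting convention covers in practice, here is a minimal sketch using Hugging Face `transformers`; the `gpt_neox.embed_in` and `embed_out` attribute names are assumptions about the GPT-NeoX model class rather than something specified in this README:

```python
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")

total = sum(p.numel() for p in model.parameters())
# Embedding (embed_in) and unembedding (embed_out) parameters, which the
# renamed suite now includes in the advertised parameter count.
embed = sum(p.numel() for p in model.gpt_neox.embed_in.parameters())
unembed = sum(p.numel() for p in model.embed_out.parameters())

print(f"total={total:,}  embedding={embed:,}  unembedding={unembed:,}  "
      f"non-embedding={total - embed - unembed:,}")
```

Subtracting the embedding and unembedding terms recovers the body-only count that the old naming convention was based on.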
