
Implement the Penn Tree Bank evaluation #5

Closed
StellaAthena opened this issue Sep 16, 2020 · 6 comments
Labels: feature request (A feature that isn't implemented yet.), good first issue (Good for newcomers)

Comments

@StellaAthena (Member)

We calculate zero-shot perplexity on the Penn Tree Bank (PTB) [MKM+94] dataset measured in [RWC+19]. We omit the 4 Wikipedia-related tasks in that work because they are entirely contained in our training data, and we also omit the one-billion word benchmark due to a high fraction of the dataset being contained in our training set. PTB escapes these issues due to predating the modern internet. Our largest model sets a new SOTA on PTB by a substantial margin of 15 points, achieving a perplexity of 20.50. Note that since PTB is a traditional language modeling dataset it does not have a clear separation of examples to define one-shot or few-shot evaluation around, so we measure only zero-shot.
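For concreteness, here is a minimal sketch (not the harness's actual implementation) of how zero-shot perplexity over a PTB-style test file could be computed with a HuggingFace causal LM. The model choice, the local `ptb.test.txt` path, and normalizing by subword tokens rather than words are all assumptions for illustration, so the numbers it prints would not match the ones quoted above.

```python
# Minimal zero-shot perplexity sketch, assuming a local PTB-style test file
# and a HuggingFace causal LM. Token-level normalization is a simplification.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

text = open("ptb.test.txt", encoding="utf-8").read()  # hypothetical path
input_ids = tokenizer(text, return_tensors="pt").input_ids[0]

max_len = model.config.n_positions  # 1024 for GPT-2
total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    # Score the text in non-overlapping windows (an approximation; a sliding
    # window that carries extra left context gives slightly better numbers).
    for start in range(0, input_ids.size(0), max_len):
        window = input_ids[start : start + max_len].unsqueeze(0)
        if window.size(1) < 2:
            continue  # nothing to predict in a one-token window
        out = model(window, labels=window)  # loss = mean next-token NLL
        n_predicted = window.size(1) - 1
        total_nll += out.loss.item() * n_predicted
        total_tokens += n_predicted

print("zero-shot perplexity:", math.exp(total_nll / total_tokens))
```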

@StellaAthena StellaAthena added the feature request A feature that isn't implemented yet. label Sep 16, 2020
@StellaAthena StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020
@StellaAthena StellaAthena changed the title Penn Tree Bank Implement the Penn Tree Bank evaluation Sep 16, 2020
@sdtblck (Contributor) commented Oct 3, 2020

@StellaAthena (Member, Author)

Dropping because it isn’t free.

Implementing Evaluations automation moved this from To do to Done Oct 4, 2020
@StellaAthena StellaAthena moved this from Done to Data integrated, Eval not done in Implementing Evaluations Oct 23, 2020
@StellaAthena StellaAthena added Eval Set and removed feature request A feature that isn't implemented yet. labels Oct 23, 2020
@StellaAthena StellaAthena moved this from Data integrated, Eval not done to Done in Implementing Evaluations Oct 23, 2020
@leogao2 leogao2 moved this from Done to Declined in Implementing Evaluations Oct 24, 2020
@leogao2 (Contributor) commented Nov 29, 2020

We received confirmation from OA that they use the version of PTB with all the <unk>s. So we are going to be using PTB now.
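To make the distinction concrete, here is a hedged sketch of what "the version with all the `<unk>`s" implies for a loader: the literal `<unk>` placeholders stay in the text the model scores, rather than being stripped or replaced. The file name and helper below are illustrative assumptions, not the harness's actual code.

```python
# Illustrative only: keep the literal "<unk>" placeholders when loading a
# Mikolov-style PTB split; the model is scored on the text exactly as written.
def load_ptb_lines(path="ptb.test.txt"):  # hypothetical local path
    lines = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                lines.append(line)  # "<unk>" stays in the string untouched
    return lines

docs = load_ptb_lines()
assert any("<unk>" in d for d in docs), "expected literal <unk> tokens in PTB"
```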

@leogao2 leogao2 reopened this Nov 29, 2020
@leogao2 leogao2 removed the Declined label Nov 29, 2020
@leogao2 leogao2 moved this from Declined to To do in Implementing Evaluations Nov 29, 2020
@StellaAthena (Member, Author)

> We received confirmation from OA that they use the version of PTB with all the s. So we are going to be using PTB now.

What does this mean?

@leogao2 (Contributor) commented Jan 28, 2021

Edited the comment for clarity. Apparently GitHub dropped the <unk> because it thought it was an HTML tag or something, and I never noticed.

@leogao2 leogao2 reopened this Jan 28, 2021
@leogao2 leogao2 added feature request A feature that isn't implemented yet. good first issue Good for newcomers labels Jan 28, 2021
@leogao2 leogao2 removed this from To do in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 added this to To do in Implementing Evaluations via automation Jan 28, 2021
@leogao2 leogao2 moved this from To do to In Progress in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 moved this from To do to In Progress in Implementing Evaluations Jan 29, 2021
@leogao2 (Contributor) commented Feb 8, 2021

Eric Hallahan is working on this.

@StellaAthena StellaAthena linked a pull request Feb 18, 2021 that will close this issue
@leogao2 leogao2 moved this from In Progress to Deferred in Implementing Evaluations Apr 3, 2021
StellaAthena referenced this issue in bigscience-workshop/lm-evaluation-harness Apr 27, 2022
Use eval-hackathon branch for installing prompt-source.
StellaAthena pushed a commit that referenced this issue Apr 29, 2022
Implementing Evaluations automation moved this from Deferred to Done, evaluations Mar 25, 2023
lintangsutawika pushed a commit that referenced this issue Jun 22, 2023
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
…rsion

Use eval-hackathon branch for installing prompt-source.
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023
…rsion

Use eval-hackathon branch for installing prompt-source.
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023