
Implement the Penn Tree Bank evaluation #5

Closed
StellaAthena opened this issue Sep 16, 2020 · 6 comments
Labels: feature request (A feature that isn't implemented yet.), good first issue (Good for newcomers)

Comments

@StellaAthena (Member)

We calculate zero-shot perplexity on the Penn Tree Bank (PTB) [MKM+94] dataset measured in [RWC+19]. We omit the 4 Wikipedia-related tasks in that work because they are entirely contained in our training data, and we also omit the one-billion word benchmark due to a high fraction of the dataset being contained in our training set. PTB escapes these issues due to predating the modern internet. Our largest model sets a new SOTA on PTB by a substantial margin of 15 points, achieving a perplexity of 20.50. Note that since PTB is a traditional language modeling dataset it does not have a clear separation of examples to define one-shot or few-shot evaluation around, so we measure only zero-shot.
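For concreteness, here is a minimal sketch (not the harness's actual implementation) of how zero-shot perplexity over a PTB-style test file could be computed with a HuggingFace causal LM. The model choice, the local `ptb.test.txt` path, and normalizing by subword tokens rather than words are all assumptions for illustration, so the numbers it prints would not match the ones quoted above.

```python
# Minimal zero-shot perplexity sketch, assuming a local PTB-style test file
# and a HuggingFace causal LM. Token-level normalization is a simplification.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

text = open("ptb.test.txt", encoding="utf-8").read()  # hypothetical path
input_ids = tokenizer(text, return_tensors="pt").input_ids[0]

max_len = model.config.n_positions  # 1024 for GPT-2
total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    # Score the text in non-overlapping windows (an approximation; a sliding
    # window that carries extra left context gives slightly better numbers).
    for start in range(0, input_ids.size(0), max_len):
        window = input_ids[start : start + max_len].unsqueeze(0)
        if window.size(1) < 2:
            continue  # nothing to predict in a one-token window
        out = model(window, labels=window)  # loss = mean next-token NLL
        n_predicted = window.size(1) - 1
        total_nll += out.loss.item() * n_predicted
        total_tokens += n_predicted

print("zero-shot perplexity:", math.exp(total_nll / total_tokens))
```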

@StellaAthena StellaAthena added the feature request A feature that isn't implemented yet. label Sep 16, 2020
@StellaAthena StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020
@StellaAthena StellaAthena changed the title Penn Tree Bank Implement the Penn Tree Bank evaluation Sep 16, 2020
@sdtblck (Contributor) commented Oct 3, 2020

@StellaAthena (Member, Author)

Dropping because it isn’t free.

Implementing Evaluations automation moved this from To do to Done Oct 4, 2020
@StellaAthena StellaAthena moved this from Done to Data integrated, Eval not done in Implementing Evaluations Oct 23, 2020
@StellaAthena StellaAthena added Eval Set and removed feature request A feature that isn't implemented yet. labels Oct 23, 2020
@StellaAthena StellaAthena moved this from Data integrated, Eval not done to Done in Implementing Evaluations Oct 23, 2020
@leogao2 leogao2 moved this from Done to Declined in Implementing Evaluations Oct 24, 2020
@leogao2 (Contributor) commented Nov 29, 2020

We received confirmation from OA that they use the version of PTB with all the <unk>s. So we are going to be using PTB now.
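To make the distinction concrete, here is a hedged sketch of what "the version with all the `<unk>`s" implies for a loader: the literal `<unk>` placeholders stay in the text the model scores, rather than being stripped or replaced. The file name and helper below are illustrative assumptions, not the harness's actual code.

```python
# Illustrative only: keep the literal "<unk>" placeholders when loading a
# Mikolov-style PTB split; the model is scored on the text exactly as written.
def load_ptb_lines(path="ptb.test.txt"):  # hypothetical local path
    lines = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                lines.append(line)  # "<unk>" stays in the string untouched
    return lines

docs = load_ptb_lines()
assert any("<unk>" in d for d in docs), "expected literal <unk> tokens in PTB"
```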

@leogao2 leogao2 reopened this Nov 29, 2020
@leogao2 leogao2 removed the Declined label Nov 29, 2020
@leogao2 leogao2 moved this from Declined to To do in Implementing Evaluations Nov 29, 2020
@StellaAthena (Member, Author)

> We received confirmation from OA that they use the version of PTB with all the s. So we are going to be using PTB now.

What does this mean?

@leogao2 (Contributor) commented Jan 28, 2021

Edited the comment for clarity. Apparently GitHub dropped the <unk> because it thought it was an HTML tag or something, and I never noticed.

@leogao2 leogao2 reopened this Jan 28, 2021
@leogao2 leogao2 added feature request A feature that isn't implemented yet. good first issue Good for newcomers labels Jan 28, 2021
@leogao2 leogao2 removed this from To do in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 added this to To do in Implementing Evaluations via automation Jan 28, 2021
@leogao2 leogao2 moved this from To do to In Progress in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 moved this from To do to In Progress in Implementing Evaluations Jan 29, 2021
@leogao2 (Contributor) commented Feb 8, 2021

Eric Hallahan is working on this.

@StellaAthena StellaAthena linked a pull request Feb 18, 2021 that will close this issue
@leogao2 leogao2 moved this from In Progress to Deferred in Implementing Evaluations Apr 3, 2021
StellaAthena referenced this issue in bigscience-workshop/lm-evaluation-harness Apr 27, 2022
Use eval-hackathon branch for installing prompt-source.
StellaAthena pushed a commit that referenced this issue Apr 29, 2022
Implementing Evaluations automation moved this from Deferred to Done, evaluations Mar 25, 2023
lintangsutawika pushed a commit that referenced this issue Jun 22, 2023
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
…rsion

Use eval-hackathon branch for installing prompt-source.
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023
…rsion

Use eval-hackathon branch for installing prompt-source.
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023