Implement `PTB` evaluation #135

EricHallahan · 2021-02-08T21:47:00Z

Implement the Penn Treebank evaluation as described in #5.

codecov · 2021-02-08T21:54:33Z

Codecov Report

Merging #135 (69ec7a8) into master (2b8956b) will decrease coverage by 8.37%.
The diff coverage is 100.00%.

❗ Current head 69ec7a8 differs from pull request most recent head b667da9. Consider uploading reports for the commit b667da9 to get more accurate results

@@            Coverage Diff             @@
##           master     #135      +/-   ##
==========================================
- Coverage   83.85%   75.47%   -8.38%     
==========================================
  Files          43       34       -9     
  Lines        3048     2035    -1013     
==========================================
- Hits         2556     1536    -1020     
- Misses        492      499       +7

| Impacted Files | Coverage Δ | |
|---|---|---|
| lm_eval/tasks/__init__.py | `100.00% <100.00%> (+8.16%)` | ⬆️ |
| lm_eval/tasks/ptb.py | `100.00% <100.00%> (ø)` | |
| lm_eval/tasks/drop.py | `0.00% <0.00%> (-91.61%)` | ⬇️ |
| lm_eval/tasks/coqa.py | `0.00% <0.00%> (-88.51%)` | ⬇️ |
| lm_eval/tasks/openbookqa.py | `42.85% <0.00%> (-57.15%)` | ⬇️ |
| lm_eval/tasks/squad.py | `52.63% <0.00%> (-47.37%)` | ⬇️ |
| lm_eval/utils.py | `59.25% <0.00%> (-22.71%)` | ⬇️ |
| lm_eval/models/dummy.py | `60.00% <0.00%> (-13.69%)` | ⬇️ |
| lm_eval/tasks/superglue.py | `88.18% <0.00%> (-11.82%)` | ⬇️ |
| lm_eval/tasks/pubmedqa.py | `90.47% <0.00%> (-6.83%)` | ⬇️ |

... and 34 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9c4967b...b667da9.

leogao2 · 2021-02-12T05:06:01Z

Blocked on figuring out why the heck our score is an order of magnitude worse than the value for 117M in the GPT-2 paper.

Initial support for PTB.

b1a5532

EricHallahan and others added 3 commits February 8, 2021 20:57

Add detokenizer.

b800210

PTB update

be82390

It now takes the log-likelihood of the full sentence rather than just the last word. Length normalization uses the *original*, pre-detokenization word count.
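
The normalization described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from `lm_eval/tasks/ptb.py`: the function name `word_perplexity` and its parameters are hypothetical, and it assumes the model returns natural-log token probabilities for the detokenized sentence.

```python
import math


def word_perplexity(token_logprobs, original_sentence):
    """Per-word perplexity normalized by the ORIGINAL word count.

    token_logprobs: natural-log probabilities of every token in the
        detokenized sentence, as scored by the model (assumed input).
    original_sentence: the raw, whitespace-tokenized PTB sentence,
        i.e. the text before any detokenization was applied.
    """
    # Count words in the original text, so the denominator does not
    # change with the detokenizer or the model's subword vocabulary.
    n_words = len(original_sentence.split())
    if n_words == 0:
        raise ValueError("original_sentence contains no words")

    # Log-likelihood of the full sentence, not just the last word.
    total_logprob = sum(token_logprobs)

    return math.exp(-total_logprob / n_words)
```

Dividing by the original word count rather than the number of model tokens keeps the result comparable with word-level perplexities reported elsewhere, which is presumably why the commit normalizes this way.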

Update test_tasks.py

6cfa325

StellaAthena linked an issue Feb 18, 2021 that may be closed by this pull request

Implement the Penn Tree Bank evaluation #5

Closed

leogao2 added 2 commits May 11, 2021 11:13

Merge branch 'master' into task-ptb

ade995d

# Conflicts: # lm_eval/tasks/__init__.py

Make PTB use the new rolling perplexity

b667da9

leogao2 closed this Jun 10, 2021
