Implement the WSC273 Winograd Schemas Challenge evaluation #12
Comments
I'm having trouble finding an official source for this, but this appears to be a BERT version of the WSC273 dataset: https://github.com/vid-koci/bert-commonsense/blob/master/data/wsc273.txt
The original was hosted on a Google Cloud Storage bucket and is likely still available at the link below. The examples from Trinh and Le's paper are included.
Reopening. In case anyone else ends up confused (as we were): there are four different Winograd schema datasets that will be in this harness:
Added null prompt support for T5 & Added BLIMP task template
From the GPT-3 paper.
The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py.
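As background for anyone picking this up: a WSC273-style evaluation typically substitutes each candidate referent for the ambiguous pronoun and picks the candidate whose completed sentence the language model scores higher. The sketch below is only an illustration of that scoring loop, not the harness's actual Task interface; `resolve_schema` and `log_likelihood` are hypothetical names, and the real implementation would plug into the interface in lm_eval/base.py.

```python
def resolve_schema(text, pronoun_marker, candidates, log_likelihood):
    """Pick the candidate referent the LM prefers for a Winograd schema.

    `text` contains a marker (e.g. "[it]") at the pronoun position; using a
    bracketed marker avoids accidentally replacing substrings like the "it"
    in "fit". `log_likelihood` is a stand-in for a real LM scoring function
    mapping a sentence to a float.
    """
    scored = []
    for cand in candidates:
        completed = text.replace(pronoun_marker, cand, 1)
        scored.append((log_likelihood(completed), cand))
    # Highest-scoring completion wins.
    return max(scored)[1]
```

A real scorer would sum token log-probabilities from the model; any deterministic float-valued function can be dropped in to exercise the loop.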