Implement the symbolic manipulations evaluation #26
Comments
I'd like to do this!
Are there any specific tools I should use, or will Python suffice?
@nicholaskross that's correct! The text above is the description of the task as it is presented in the GPT-3 paper, “Language Models are Few-Shot Learners.” If you look at the existing tests in this repo you can use them as a template. For this task there isn't any external data – you should randomly generate a dataset that meets the specifications. And yes, everything should be in Python. Welcome to the team :) Let me know if there are any questions I can answer. And feel free to ask anything in the #lm-thunderdome channel on Discord.
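The random generation described above can be sketched roughly as follows. This is a minimal illustration of the five symbolic-manipulation transforms from the GPT-3 paper (cycled letters, two anagram variants, random insertion, reversed words); the function names are our own, not from this repo, and the word list would come from [Nor09] as discussed below.

```python
import random

def cycle_letters(word, rng):
    """CL: rotate the word's letters by a random non-zero offset."""
    k = rng.randrange(1, len(word))
    return word[k:] + word[:k]

def anagram_inner(word, rng, keep=1):
    """A1/A2: shuffle all letters except the first and last `keep` characters."""
    head, tail = word[:keep], word[len(word) - keep:]
    mid = list(word[keep:len(word) - keep])
    rng.shuffle(mid)
    return head + "".join(mid) + tail

def random_insertion(word, rng):
    """RI: insert a random punctuation/space character after each letter but the last."""
    chars = "!?.,;' "
    return "".join(c + rng.choice(chars) for c in word[:-1]) + word[-1]

def reverse_word(word, rng=None):
    """RW: spell the word backwards (rng unused; kept for a uniform signature)."""
    return word[::-1]

def make_dataset(words, transform, seed=0):
    """Produce (scrambled, answer) pairs; fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    return [(transform(w, rng), w) for w in words]
```

Seeding the generator matters here: since there is no external data, reproducibility of the generated dataset is the only way two runs of the eval can be compared.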
Thanks! Will do
@nicholaskross Hey, wanted to ping you and check in. How is it coming?
Ah, sorry I haven't made much progress yet! I was busy with schoolwork etc. Hoping to get more energy soon... |
Not sure what you mean by meta-format, but sample texts can be found in Appendix G in the GPT-3 paper (https://arxiv.org/pdf/2005.14165.pdf). For example, on page 55 you can see what a CL task doc looked like
Ah, okay thanks! I'll have it output tests in that format, then (txt files until/unless we use a different one).
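A doc in that style might be assembled as below. The instruction line is only a placeholder approximating the Appendix G examples, and the helper names are hypothetical; check page 55 of the paper for the exact wording used for each task.

```python
def format_doc(scrambled, answer=None):
    # Placeholder prompt wording approximating the Appendix G examples;
    # the paper's exact phrasing per task should be used instead.
    text = f"Please unscramble the letters into a word, and write that word: {scrambled} ="
    return f"{text} {answer}" if answer is not None else text

def fewshot_context(train_pairs, test_scrambled):
    """Stack K solved examples above the unanswered test doc."""
    shots = "\n\n".join(format_doc(s, a) for s, a in train_pairs)
    return shots + "\n\n" + format_doc(test_scrambled)
```

Writing one doc per block to a txt file then matches the "txt files for now" plan above.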
@StellaAthena I have code that can create the dataset of common words (from [Nor09]) and the transformed words for the tasks. |
Nicholas, sorry about disappearing on you, I got distracted by other things. Are you still interested in doing this?
Fix max generation limit
From the GPT-3 paper:
This is a task where we need to create a custom dataset for evaluation.
The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py.
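As a rough orientation only, a task along those lines might be shaped like the skeleton below. The method names here follow common evaluation-harness conventions but are assumptions, not this repo's actual API; the real interface to implement against is the one in lm_eval/base.py, with the BoolQ task as a worked example.

```python
import random

class CycledLettersTask:
    """Hypothetical skeleton; method names are assumptions, not the repo's API.
    The authoritative interface is in lm_eval/base.py."""

    def __init__(self, words, seed=42):
        rng = random.Random(seed)
        self.docs = []
        for w in words:
            k = rng.randrange(1, len(w))  # cycle each word by a random offset
            self.docs.append({"scrambled": w[k:] + w[:k], "word": w})

    def validation_docs(self):
        return iter(self.docs)

    def doc_to_text(self, doc):
        # Prompt wording is a placeholder; see Appendix G of the GPT-3 paper.
        return f"Please unscramble the letters into a word, and write that word: {doc['scrambled']} ="

    def doc_to_target(self, doc):
        return " " + doc["word"]
```

Since the dataset is generated rather than downloaded, fixing the seed in the constructor keeps evaluation runs comparable.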