Klokan-qa task #1657
base: main
Thank you for the PR, this looks like a cool dataset!
A few questions before we could merge:
- Is there a (small? open weights) model that could be run on this task to get nonzero performance, to check that the implementation is correct?
- Could you add a lm_eval/tasks/klokan-qa/README.md file that follows the typical template format, mentioning the source of this dataset (with a link to the source that introduced it) and any other relevant details about the task implementation?
    - "</s>"
    - "<|im_end|>"
  do_sample: true
  temperature: 0.0000001
Why not temperature: 0.0 and do_sample: false?
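To illustrate the question above, here is a minimal sketch of why sampling at a temperature of 0.0000001 ends up picking the same token as greedy decoding, which is what makes the explicit do_sample: false setting the cleaner choice. The logits are made-up values for the five answer letters, and the helper names are hypothetical; the sketch assumes the backend treats do_sample: false as plain argmax decoding.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return the sampling probabilities for `logits` scaled by `temperature`."""
    # Subtract the max first so every exponent is <= 0 and exp() cannot overflow.
    top = max(logits)
    weights = [math.exp((x - top) / temperature) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_choice(logits):
    """Index of the largest logit, i.e. what argmax decoding would pick."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for the answer letters A-E (not taken from any model).
logits = [1.2, 3.4, 0.5, 3.1, -0.7]

# With a tiny temperature all probability mass collapses onto the top logit.
probs = softmax_with_temperature(logits, 1e-7)
print(probs)                  # [0.0, 1.0, 0.0, 0.0, 0.0]
print(greedy_choice(logits))  # 1
```

The two settings select the same token here, but the sampled variant still goes through the random sampling path and depends on a magic constant, while the greedy variant states the intent directly.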
Yeah, my bad
Took another look and had some other comments that might be helpful before merge! Lmk what you think.
  - name: "flexbile-extract"
    filter:
      - function: "regex"
Could this use the existing multiple-choice filter? Or was this meant to be regex_pattern: "([A-E])"?
lm-evaluation-harness/lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_cot_zeroshot_template_yaml
Lines 13 to 20 in e9a4054
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: -1
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
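As a rough illustration of the trade-off raised in this comment, the sketch below uses only Python's standard re module (it does not call utils.MultiChoiceRegexFilter) to compare a bare "([A-E])" pattern with a pattern anchored on the answer phrase. The sample response and the anchored pattern are assumptions made for the example, not part of the PR.

```python
import re

# Hypothetical chain-of-thought response (English stand-in for a model output).
response = "Even though Ostrava is large, the capital is Prague. Correct answer: B."

# Bare pattern: matches ANY capital letter A-E, including sentence-initial ones.
loose = re.findall(r"([A-E])", response)

# Anchored pattern (an assumption for this sketch): only a letter that follows
# the answer phrase, with optional parentheses around it.
anchored = re.findall(r"answer:\s*\(?([A-E])\)?", response)

print(loose)     # ['E', 'C', 'B']
print(anchored)  # ['B']

# Taking the first loose match picks up the "E" of "Even"; taking the last
# match recovers the intended letter in this example.
print(loose[0], loose[-1])  # E B
```

This is why selecting the last match, or matching the parenthesised "(B)" form, tends to be more robust than taking the first bare capital letter in a free-form answer.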
"Jsi expert v matematice, logice, českém jazyce a všeobecných znalostech.\nDostaneš otázku s 5 možnými odpověďmi (A, B, C, D, E) a tvým úkolem je vybrat správnou odpověď.\nNejprve bys měl o odpovědi přemýšlet a poté vypsat písmeno správné odpovědi.\nJazyk otázek a odpovědí je čeština.\n\n\
Q: Jak se jmenuje hlavní město České republiky?\n(A) Brno\n(B) Praha\n(C) Ostrava\n(D): Plzeň\n(E) Liberec\nA: Přestože Brno a Ostrava jsou velká města, Praha je hlavní město České republiky, správná odpověď je Praha. Správná odpověď: B.\n\n\
Q: Mařenka má 5 jablíček a 3 hrušky. Kolik má ovoce?\n(A) 5\n(B) 1\n(C) 3\n(D) 2\n(E) 8\nA: Jak jablka, tak hrušky jsou ovoce a tedy 3 + 5 = 8. Správná odpověď: E\n\n\
Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}"
Realized this might be missing the trailing \nA:?
- Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}"
+ Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}\nA:"
Also, if you intend people to only ever evaluate the task with these specific few-shot samples, consider setting num_fewshot: 0 in this config and adding
metadata:
  version: 3.0
  num_fewshot: 2
The first makes the task only ever evaluable "zero-shot" (no extra few-shot examples are added to your prompt), and the second makes the harness print that it is actually a 2-shot prompt.
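The effect of the trailing \nA: suggested in this comment can be sketched as follows. The render helper is a hypothetical stand-in that does plain string substitution (the harness itself renders its own templates), and the sample question is made up; the point is only that the final query then ends with the same "A:" cue that precedes each answer in the two hard-coded examples.

```python
# Final query line of the prompt, before and after the suggested change.
WITHOUT_CUE = "Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}"
WITH_CUE = WITHOUT_CUE + "\nA:"


def render(template, fields):
    """Hypothetical helper: fill {{name}} placeholders by string replacement."""
    for key, value in fields.items():
        template = template.replace("{{" + key + "}}", value)
    return template


# Made-up document with the same field names the template uses.
doc = {"question": "What is 2 + 2?", "A": "3", "B": "4", "C": "5", "D": "6", "E": "7"}

bare = render(WITHOUT_CUE, doc)
prompt = render(WITH_CUE, doc)

print(repr(bare[-6:]))    # '(E) 7'
print(repr(prompt[-9:]))  # '(E) 7\nA:'
```

Without the cue the prompt stops right after option (E), so the model has no signal that the answer turn has started; with it, the continuation lines up with the "A: ..." answers shown in the few-shot examples.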
Co-authored-by: Hailey Schoelkopf <[email protected]>
What is Klokan-qa