Klokan-qa task #1657

Open
wants to merge 5 commits into
base: main

hynky1999

What is Klokan-qa

  • Dataset of mathematical questions in Czech for elementary and high-school children
  • ~850 questions
  • The questions were written in Czech rather than translated, which makes this the only publicly available Czech math dataset that is not a translation

CLAassistant commented Apr 1, 2024

CLA assistant check
All committers have signed the CLA.

Contributor

haileyschoelkopf left a comment

Thank you for the PR, this looks like a cool dataset!

A few questions before we could merge:

  • Is there a (small? open weights) model that could be run on this task to get nonzero performance, to check that the implementation is correct?
  • Could you add a lm_eval/tasks/klokan-qa/README.md file that follows the typical template format, mentioning the source of this dataset (and a link to the source that introduced the dataset) and any other relevant details about the task implementation?

- "</s>"
- "<|im_end|>"
do_sample: true
temperature: 0.0000001
Contributor

Why not temperature: 0.0 and do_sample: false?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my bad
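
To illustrate the reviewer's point, here is a minimal Python sketch. It is not part of the PR or of lm-evaluation-harness: `softmax_with_temperature` and `greedy_pick` are hypothetical helpers and the logits are made up. It shows that sampling at a temperature of 1e-7 already puts essentially all probability mass on the highest-scoring token, so it behaves like greedy decoding. A temperature of exactly 0.0 cannot be plugged into the softmax, which is why the deterministic case is usually expressed as a plain argmax (what do_sample: false asks for) rather than as sampling with a tiny temperature.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature), computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_pick for T = 0")
    peak = max(logits)
    # Subtracting the max keeps every exponent <= 0, so exp() cannot overflow.
    weights = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_pick(logits):
    """Index of the largest logit: the deterministic, no-sampling choice."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [1.2, 3.4, 0.5, 3.1]  # made-up scores for four candidate tokens

near_zero = softmax_with_temperature(logits, 1e-7)
print(near_zero)            # effectively one-hot on index 1
print(greedy_pick(logits))  # 1
```

Both calls select the same token here; the greedy version simply does so without drawing a random sample at all.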

lm_eval/tasks/klokan-qa/klokan-qa.yaml (resolved)
Contributor

haileyschoelkopf left a comment

Took another look and had some other comments that might be helpful before merge! Lmk what you think.

lm_eval/tasks/klokan-qa/klokan-qa.yaml (outdated, resolved)

- name: "flexible-extract"
  filter:
    - function: "regex"
Contributor

Could you use the existing multiple-choice filter for this? Or was this meant to be regex_pattern: "([A-E])"?

- name: "flexible-extract"
  filter:
    - function: !function utils.MultiChoiceRegexFilter
      group_select: -1
      ignore_case: true
      ignore_punctuation: true
      regex_pattern: "(\\([A-Z]\\))"
    - function: "take_first"
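
As a rough illustration of how the two patterns mentioned above differ, here is a small Python sketch using only the standard `re` module. It does not reproduce `utils.MultiChoiceRegexFilter`: the `extract` helper, the `FALLBACK` placeholder, and the sample completions are assumptions made for this example. The `group_select` argument mimics the idea of choosing which match to keep (0 for the first, -1 for the last).

```python
import re

FALLBACK = "[invalid]"  # assumed placeholder for "no answer found"


def extract(pattern, text, group_select=0):
    """Return the match at index `group_select` (0 = first, -1 = last)."""
    matches = re.findall(pattern, text)
    return matches[group_select] if matches else FALLBACK


BARE = r"([A-E])"        # any capital letter A-E, anywhere in the text
PARENS = r"(\([A-Z]\))"  # a capital letter wrapped in parentheses

# Made-up completion in the style of the few-shot answers.
completion = "Dohromady je to 8 kusů ovoce. Správná odpověď: E"

print(extract(BARE, completion, 0))            # D  (the "D" of "Dohromady")
print(extract(BARE, completion, -1))           # E
print(extract(PARENS, completion, -1))         # [invalid]
print(extract(PARENS, "Odpověď je (E).", -1))  # (E)
```

The bare pattern picks up capital letters inside ordinary words, so taking the last match matters; the parenthesized pattern only fires when the model writes the letter as "(E)", which the few-shot answers in this prompt do not do.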

"Jsi expert v matematice, logice, českém jazyce a všeobecných znalostech.\nDostaneš otázku s 5 možnými odpověďmi (A, B, C, D, E) a tvým úkolem je vybrat správnou odpověď.\nNejprve bys měl o odpovědi přemýšlet a poté vypsat písmeno správné odpovědi.\nJazyk otázek a odpovědí je čeština.\n\n\
Q: Jak se jmenuje hlavní město České republiky?\n(A) Brno\n(B) Praha\n(C) Ostrava\n(D) Plzeň\n(E) Liberec\nA: Přestože Brno a Ostrava jsou velká města, Praha je hlavní město České republiky, správná odpověď je Praha. Správná odpověď: B.\n\n\
Q: Mařenka má 5 jablíček a 3 hrušky. Kolik má ovoce?\n(A) 5\n(B) 1\n(C) 3\n(D) 2\n(E) 8\nA: Jak jablka, tak hrušky jsou ovoce a tedy 3 + 5 = 8. Správná odpověď: E\n\n\
Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}"
Contributor

Realized this might be missing the trailing \nA:?

Suggested change
- Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}"
+ Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}\nA:"
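
To see what the trailing "\nA:" changes, here is a minimal Python sketch that fills the final question line of the template. The `render` helper is a hypothetical stand-in for the Jinja2 rendering that the harness performs, and the example document is made up. With the suggested change, the rendered prompt ends with "A:", matching the "Q: ... A: ..." pattern of the two worked examples, so the model is cued to continue with its reasoning and answer.

```python
import re

# Final line of the prompt, including the suggested trailing "\nA:".
QUESTION_TEMPLATE = (
    "Q: {{question}}\n(A) {{A}}\n(B) {{B}}\n(C) {{C}}\n(D) {{D}}\n(E) {{E}}\nA:"
)


def render(template, doc):
    """Replace each {{field}} with doc[field]; a tiny stand-in for Jinja2."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(doc[m.group(1)]), template)


doc = {  # made-up example, not taken from the dataset
    "question": "Kolik je 2 + 3?",
    "A": "4", "B": "5", "C": "6", "D": "7", "E": "8",
}

prompt = render(QUESTION_TEMPLATE, doc)
print(prompt)
```

Without the trailing "A:", the prompt would stop after the "(E)" option and the model would have to guess that an answer is expected next.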

Also, if you intend people to only ever evaluate the task with these specific few-shot samples, consider setting num_fewshot: 0 in this config and adding

metadata:
  version: 3.0
  num_fewshot: 2

so that the task can only ever be evaluated "zero-shot" (no extra few-shot examples added to your prompt here) and so that the results report it as a 2-shot prompt, respectively.
