Improvements to MGSM #1391
Comments
Hi! @juletx should be able to confirm but I think just using
|
Hi @leocnj @baberabb! Not sure why it's implemented like that but I'd say it's a bug. None of the mgsm tasks generates the correct few-shot prompt when testing them with
|
Thanks for the confirmation @juletx! Also looks like the |
Did more reading on why [6+1] was used. Inside
"en": {  # English
    "QUESTION": "Question:",
    "ANSWER": "Step-by-Step Answer:",
    "DIRECT": "Answer:",
    "REGEX": "The answer is (\\-?[0-9\\.\\,]+)",
...
yaml.dump(
    {
        "include": yaml_template,
        "dataset_name": lang,
        "task": f"mgsm_{lang}_direct",
        "doc_to_text": f"""{{% if answer is not none %}}"""
        f"""{{{{question+"\\n{ANSWER}"}}}}"""
        f"""{{% else %}}"""
        f"""{{{{"{QUESTION} "+question+"\\n{ANSWER}"}}}}"""
        f"""{{% endif %}}""",
        "doc_to_target": f"""{{% if answer is not none %}}"""
        f"""{{{{answer[{len(ANSWER)}+1]}}}}"""
        f"""{{% else %}}"""
        f"""{{{{answer_number|string}}}}"""
        f"""{{% endif %}}""",
        **filter_list,
    },
This will generate a yaml file like
# Generated by utils.py
dataset_name: en
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"Question:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_en
It looks like |
aah that makes more sense. "Step-by-Step Answer:" has 20 characters by my count though ("answer" has 6). |
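The character counts above can be checked directly. The following is a minimal sketch, not the repository's `utils.py`: it reuses the f-string from the generator snippet quoted in this thread and adds a hypothetical slice variant (`fixed_template`, an assumption made for comparison).

```python
# Prefixes quoted in the thread for English.
ANSWER = "Step-by-Step Answer:"
DIRECT = "Answer:"

print(len(ANSWER))    # 20
print(len(DIRECT))    # 7
print(len("Answer"))  # 6, the value seen in the generated answer[6+1]

# Same f-string as in the generator snippet: doubled braces become literal
# Jinja delimiters, and the offset is written as a single index.
index_template = f"""{{{{answer[{len(ANSWER)}+1]}}}}"""

# Hypothetical variant (an assumption, not taken from the repository):
# a trailing colon turns the index into a slice.
fixed_template = f"""{{{{answer[{len(ANSWER)}+1:]}}}}"""

print(index_template)  # {{answer[20+1]}}
print(fixed_template)  # {{answer[20+1:]}}
```

The `+1` accounts for the space that follows the prefix, so with a 20-character prefix the remainder of the answer starts at position 21.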
Made a PR to fix the issues we observed above. The generated contexts now look normal. For example, for
!!@@##@@!! -- Example 1
Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer:Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Question: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
Answer:There are 3 cars in the beginning, 2 more arrive, so now there should be 3 + 2 = 5 cars. The answer is 5.
Question: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?
Answer:
For
!!@@##@@!! -- Example 1
问题:罗杰有 5 个网球。他又买了 2 罐网球。每罐有 3 个网球。他现在有多少个网球?
逐步解答: 杰一开始有 5 个球。2 罐各 3 个网球就是 6 个网球。5 + 6 = 11。答案是 11。
问题:如果停车场里有 3 辆车,又来了 2 辆车,停车场里有多少辆车?
逐步解答: 开始有 3 辆车,又来了 2 辆,所以现在应该有 3 + 2 = 5 辆车。答案是 5。
问题:服务器机房里有九台电脑。从周一到周四,每天又安装了五台电脑。服务器机房里现在有多少台电脑?
逐步解答: |
* fix the issue #1391, wrong contexts in mgsm tasks
* fix yaml issue for having two target_delimiter lines. For COT tasks, keep the one with a space (default)
* regenerate all task yaml files
  - change naming so that file name will match with task name
  - task|file follows a consistent naming way, mgsm_(mode)_(lang) for three modes, i.e., direct, en_cot, and native_cot
* English CoTs should have a space as target_delimiter
* Update utils.py
* Apply suggestions from code review

Co-authored-by: Hailey Schoelkopf
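The `target_delimiter` items in this commit message concern the string placed between each prompt and its target. As a rough sketch only (a simplified stand-in, not the harness's actual implementation; `build_context` and the blank-line separator between examples are assumptions made for illustration), the effect of the delimiter can be seen like this:

```python
def build_context(fewshot, query_text, target_delimiter=" ", fewshot_delimiter="\n\n"):
    """Join (text, target) pairs, then append the unanswered query text."""
    shots = [text + target_delimiter + target for text, target in fewshot]
    return fewshot_delimiter.join(shots + [query_text])


# One few-shot pair and one query, taken from the examples in this thread.
fewshot = [
    (
        "Question: If there are 3 cars in the parking lot and 2 more cars "
        "arrive, how many cars are in the parking lot?\nAnswer:",
        "There are 3 cars in the beginning, 2 more arrive, so now there "
        "should be 3 + 2 = 5 cars. The answer is 5.",
    ),
]
query = (
    "Question: There were nine computers in the server room. Five more "
    "computers were installed each day, from monday to thursday. How many "
    "computers are now in the server room?\nAnswer:"
)

with_space = build_context(fewshot, query, target_delimiter=" ")
without_space = build_context(fewshot, query, target_delimiter="")

print(with_space)     # "...Answer: There are 3 cars..."
print(without_space)  # "...Answer:There are 3 cars..."
```

With an empty delimiter the target is glued to the colon, which is the `Answer:Roger started ...` shape visible in the English context above.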
The filter for the native-language tasks also needs updating: it currently uses the English format for the answer.
Hugging Face dataset French few-shot example:
https://huggingface.co/datasets/juletxara/mgsm/viewer/fr?row=0 |
Agreed-- just taking the final number from the response (as in lm-evaluation-harness/lm_eval/tasks/gsm8k/gsm8k-cot.yaml Lines 43 to 48 in a72babb
|
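To make the suggestion concrete, here is a hedged sketch of taking the final number from a response instead of matching an English phrase. `ENGLISH_PATTERN` is the regex quoted earlier in this thread; `NUMBER_PATTERN`, `last_number`, and the French sample string are assumptions written for this example, not the harness's filter.

```python
import re

# Pattern quoted earlier in the thread for English answers.
ENGLISH_PATTERN = re.compile(r"The answer is (\-?[0-9\.\,]+)")

# Assumed language-agnostic pattern: an optionally signed number with
# optional thousands separators and an optional decimal part.
NUMBER_PATTERN = re.compile(r"-?[0-9][0-9,]*(?:\.[0-9]+)?")


def last_number(response):
    """Return the last number in the response as a string, or None."""
    matches = NUMBER_PATTERN.findall(response)
    if not matches:
        return None
    # Drop thousands separators so "1,000" becomes "1000".
    return matches[-1].replace(",", "")


# Hypothetical French-style response, written for this example.
french = "5 + 6 = 11. La réponse est 11."

print(ENGLISH_PATTERN.search(french))  # None
print(last_number(french))             # 11
```

The English-phrase pattern finds nothing in a native-language response, while the last-number approach does not depend on the wording around the answer.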
…EleutherAI#1440)
* fix the issue EleutherAI#1391, wrong contexts in mgsm tasks
* fix yaml issue for having two target_delimiter lines. For COT tasks, keep the one with a space (default)
* regenerate all task yaml files
  - change naming so that file name will match with task name
  - task|file follows a consistent naming way, mgsm_(mode)_(lang) for three modes, i.e., direct, en_cot, and native_cot
* English CoTs should have a space as target_delimiter
* Update utils.py
* Apply suggestions from code review

Co-authored-by: Hailey Schoelkopf
Hi! Shouldn't
Native CoT
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nStep-by-Step Answer:"}}{% else %}{{"Question: "+question+"\nStep-by-Step Answer:"}}{% endif %}'
Example
✅ Independently of whether the evaluation is zero- or few-shot, the model is asked to generate a chain of thought.
Direct
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
Example:
✅ Independently of whether the evaluation is zero- or few-shot, the model is asked to directly output the answer.
As it is now, the only difference between these two setups seems to be that the prompt says "Step-by-Step Answer:" for CoT, and "Answer:" for direct. Am I missing something? Thank you! |
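The two template pairs in this comment can be mirrored in plain Python to see what each setup produces. This is a sketch under assumptions: the functions below only imitate the Jinja conditions quoted in the comment, and both sample documents are hypothetical records shaped like the examples in this thread (a few-shot record with an `answer`, a test record without one).

```python
def doc_to_text(doc, answer_prompt):
    """Mirror of the quoted doc_to_text template for a given answer prompt."""
    if doc["answer"] is not None:
        return doc["question"] + "\n" + answer_prompt
    return "Question: " + doc["question"] + "\n" + answer_prompt


def doc_to_target(doc, offset=21):
    """Mirror of the quoted doc_to_target template (answer[21:])."""
    if doc["answer"] is not None:
        return doc["answer"][offset:]
    return str(doc["answer_number"])


# Hypothetical few-shot record: question and answer already carry prefixes.
fewshot_doc = {
    "question": "Question: If there are 3 cars in the parking lot and 2 more "
                "cars arrive, how many cars are in the parking lot?",
    "answer": "Step-by-Step Answer: There are 3 cars in the beginning, 2 more "
              "arrive, so now there should be 3 + 2 = 5 cars. The answer is 5.",
    "answer_number": 5,
}
# Hypothetical test record: no worked answer, only the number.
test_doc = {
    "question": "How many cars are in the parking lot?",
    "answer": None,
    "answer_number": 5,
}

for prompt in ("Step-by-Step Answer:", "Answer:"):
    print(doc_to_text(test_doc, prompt))
print(doc_to_target(fewshot_doc))  # chain of thought without its prefix
print(doc_to_target(test_doc))     # 5
```

Under these assumptions the few-shot target is the same chain of thought in both setups, and only the prompt suffix changes, which is the observation the question raises.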
lm-evaluation-harness/lm_eval/tasks/mgsm/direct/mgsm_direct_en.yaml
Line 3 in 7411947
According to my understanding, for target, we need to obtain a substring of answer starting at position 6+1. The existing Jinja2 expression, however, returns only the character at position 7. Is this a bug?
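To illustrate the distinction being asked about: Jinja2 subscripts on strings follow Python semantics, so the plain-Python sketch below should mirror the template's behavior. The sample string is an assumption modeled on the few-shot answers shown in this thread.

```python
# Sample few-shot answer (assumed format, for illustration only).
answer = "Step-by-Step Answer: Roger started with 5 balls."

single_char = answer[6 + 1]   # one character at position 7
from_seven = answer[6 + 1:]   # substring from position 7 to the end
# Skip the full prefix plus the space that follows it.
remainder = answer[len("Step-by-Step Answer:") + 1:]

print(repr(single_char))  # '-'
print(from_seven)         # -Step Answer: Roger started with 5 balls.
print(remainder)          # Roger started with 5 balls.
```

An index returns exactly one character, so a target built from `answer[6+1]` can never contain the worked solution, whatever the offset is.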