forked from openai/evals
Commit
Merge pull request openai#260 from openai/sg-japan
[evals] added multilingual example and support
Showing 15 changed files with 313 additions and 67 deletions.
Git LFS file not shown
4 changes: 2 additions & 2 deletions
evals/registry/data/test_multiio/battles/joke_animals_vs_fruits.jsonl
Git LFS file not shown
4 changes: 2 additions & 2 deletions
evals/registry/data/test_multiio/battles/rap_animals_vs_fruits.jsonl
Git LFS file not shown
4 changes: 2 additions & 2 deletions
evals/registry/data/test_multiio/battles/rap_people_vs_fruits.jsonl
Git LFS file not shown
4 changes: 2 additions & 2 deletions
evals/registry/data/test_multiio/battles/rap_people_vs_people.jsonl
Git LFS file not shown
@@ -12,3 +12,4 @@ test-modelgraded:
     - rap-people-vs-people
     - rap-animals-vs-fruits
     - rap-people-vs-fruits
+    - mg-humor-people_jp
@@ -0,0 +1,12 @@
+# This file is generated by modelgraded_generator.py
+
+mg-humor-people_jp:
+  id: mg-humor-people_jp.dev.v0
+  metrics: [accuracy]
+mg-humor-people_jp.dev.v0:
+  class: evals.elsuite.modelgraded.classify:ModelBasedClassify
+  args:
+    samples_jsonl: test_modelgraded/humor_people_jp.jsonl
+    eval_type: cot_classify_jp
+    modelgraded_spec_file: humor_jp
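Reviewer note: the `class` field in the entry above uses a `module.path:ClassName` form. The sketch below shows one way such a string could be resolved to a Python class. The helper names `split_class_path` and `load_class` are hypothetical and are not taken from the evals codebase; the demonstration loads a standard-library class so that it runs without evals installed.

```python
import importlib
from typing import Tuple


def split_class_path(class_path: str) -> Tuple[str, str]:
    """Split 'package.module:ClassName' into (module path, class name)."""
    module_name, sep, class_name = class_path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {class_path!r}")
    return module_name, class_name


def load_class(class_path: str) -> type:
    """Import the module and return the named attribute (hypothetical helper)."""
    module_name, class_name = split_class_path(class_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


if __name__ == "__main__":
    # The registry value from this commit, split but not imported.
    print(split_class_path("evals.elsuite.modelgraded.classify:ModelBasedClassify"))
    # A standard-library class, used only to show the import step.
    print(load_class("collections:OrderedDict"))
```

Splitting on the first colon keeps dotted module paths intact and rejects strings that lack either half.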
@@ -0,0 +1,24 @@
+prompt: |-
+  以下の文章って面白い?
+  (始まり)
+  {completion}
+  (終わり)
+  5段階の中から選んでください。
+  1. 退屈 - このレベルの面白さはほとんど感じられず、退屈で時間が過ぎるのが遅いと感じるでしょう。興味を惹かれる要素がほとんどなく、全体的に魅力に欠けます。
+  2. やや面白い - このレベルでは、若干の面白さが感じられますが、それでも十分ではありません。物語性やコンセプトには一定の魅力があるものの、期待外れであり、時間を過ごすにはまだ物足りないかもしれません。
+  3. まあまあ面白い - このレベルの面白さは、平均的で満足できる範囲です。ある程度の興味深い要素やストーリーがあり、時間を過ごすのに適していますが、特別印象に残るものではないかもしれません。
+  4. 面白い - このレベルでは、かなりの面白さが感じられ、魅力的なストーリーやキャラクターが存在します。多くの人が楽しめる内容であり、興味を持続させる要素が豊富に含まれています。ただし、最高の評価には僅かに及ばない部分が残っています。
+  5. 大変面白い - このレベルの面白さは、非常に優れており、観る者を魅了し、心に強く残る体験ができます。独創的なストーリーや魅力的なキャラクターが際立ち、多くの人が共感や感動を覚えるでしょう。このレベルの面白さは、他のものと比較しても突出していると言えます。
+choice_scores: from_strings
+choice_strings: "12345"
+input_outputs:
+  input: completion
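Reviewer note: a minimal sketch of how the three fields in this spec could fit together, assuming that `{completion}` is filled by plain string substitution and that `choice_scores: from_strings` means each character of `choice_strings` is converted to its numeric value. Both points are assumptions made for illustration. The function names are hypothetical, the template is shortened, and none of this is the evals library's own code.

```python
from typing import Dict, Optional

# Shortened stand-in for the prompt in the spec above (assumption: the
# placeholder is filled with str.format-style substitution).
PROMPT_TEMPLATE = "以下の文章って面白い?\n(始まり)\n{completion}\n(終わり)"
CHOICE_STRINGS = "12345"


def build_prompt(template: str, completion: str) -> str:
    """Insert the text to be graded into the grading prompt."""
    return template.format(completion=completion)


def scores_from_strings(choice_strings: str) -> Dict[str, float]:
    """Map each single-character choice to its numeric value."""
    return {choice: float(choice) for choice in choice_strings}


def score_choice(choice: str, scores: Dict[str, float]) -> Optional[float]:
    """Return the score for a grader answer, or None if it is not a valid choice."""
    return scores.get(choice.strip())


if __name__ == "__main__":
    scores = scores_from_strings(CHOICE_STRINGS)
    print(build_prompt(PROMPT_TEMPLATE, "sample text to grade"))
    print(score_choice("4", scores))
    print(score_choice("6", scores))
```

Returning `None` for an unrecognized answer keeps invalid grader output distinguishable from a low score.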