Add option to ignore casing in Includes eval template #655

Merged Apr 12, 2023 (3 commits)
6 changes: 5 additions & 1 deletion evals/elsuite/basic/includes.py
@@ -13,12 +13,14 @@ def __init__(
         self,
         completion_fns: list[CompletionFn],
         samples_jsonl: str,
+        ignore_case: bool = False,
         *args,
         **kwargs,
     ):
         super().__init__(completion_fns, *args, **kwargs)
         assert len(completion_fns) == 1, "Includes only supports one completion fn"
         self.samples_jsonl = samples_jsonl
+        self.ignore_case = ignore_case
 
     def eval_sample(self, sample: Any, *_):
         prompt = sample["input"]
@@ -27,7 +29,9 @@ def eval_sample(self, sample: Any, *_):
         )
         sampled = result.get_completions()[0]
 
-        includes_answer = any([utils.get_answer(sampled, ref) for ref in sample["ideal"]])
+        includes_answer = any(
+            [utils.get_answer(sampled, ref, self.ignore_case) for ref in sample["ideal"]]
+        )
         evals.record.record_metrics(accuracy=float(includes_answer))
         return includes_answer

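A note for readers of the `eval_sample` change: the score comes from the truthiness of each `get_answer` result, not from a comparison with `None`. The sketch below is illustrative only. `score` is a hypothetical helper that mirrors the `any([...])` and `float(...)` steps, and each list element stands in for one `utils.get_answer(sampled, ref, self.ignore_case)` result as defined in the `evals/elsuite/utils.py` change in this PR (the text after the match, or `None`).

```python
from typing import Optional


def score(matches: list[Optional[str]]) -> float:
    # Hypothetical helper mirroring the diff: any([...]) followed by float(...).
    # Each element stands in for one get_answer(sampled, ref, ignore_case) result.
    includes_answer = any(matches)
    return float(includes_answer)


# No reference found in the completion: every result is None.
print(score([None, None]))            # 0.0
# One reference found mid-completion: the tail after it is non-empty.
print(score([None, ", of course."]))  # 1.0
# Reference found at the very end: the tail is "", which is falsy.
print(score([None, ""]))              # 0.0
```

The last case suggests that a completion ending exactly with the reference would be scored as a miss by the expression as written here. Comparing each result against `None` would be one way to change that, though whether that is desired is not something this diff settles.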
8 changes: 6 additions & 2 deletions evals/elsuite/utils.py
@@ -14,8 +14,12 @@
 )
 
 
-def get_answer(text, answer_prompt):
-    idx = text.rfind(answer_prompt)
+def get_answer(text, answer_prompt, ignore_case=False):
+    if ignore_case:
+        idx = text.lower().rfind(answer_prompt.lower())
+    else:
+        idx = text.rfind(answer_prompt)
+
     if idx == -1:
         return None
     return text[idx + len(answer_prompt) :]

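To make the effect of the new flag concrete, here is a minimal standalone sketch. The function body follows the new `get_answer` from the hunk above; the type hints and the sample string `completion` are assumptions added for illustration and are not part of the PR.

```python
from typing import Optional


def get_answer(text: str, answer_prompt: str, ignore_case: bool = False) -> Optional[str]:
    # Same logic as the new version in the hunk; type hints added for illustration.
    if ignore_case:
        # Search on lowercased copies, then slice the original text.
        idx = text.lower().rfind(answer_prompt.lower())
    else:
        idx = text.rfind(answer_prompt)

    if idx == -1:
        return None
    return text[idx + len(answer_prompt):]


completion = "Reasoning... ANSWER: yes. Final answer: No"  # made-up sample text

print(repr(get_answer(completion, "answer:")))                     # ' No'
print(repr(get_answer(completion, "ANSWER:")))                     # ' yes. Final answer: No'
print(repr(get_answer(completion, "ANSWER:", ignore_case=True)))   # ' No'
print(repr(get_answer(completion, "verdict:", ignore_case=True)))  # None
```

Because `rfind` returns the last occurrence, turning on `ignore_case` can move the match to a later position in the text, as the second and third calls show. The index is computed on the lowercased copy, so this sketch assumes lowercasing does not change the string's length, which holds for ASCII text.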
1 change: 1 addition & 0 deletions evals/registry/eval_sets/test-all.yaml
@@ -3,6 +3,7 @@ test:
     - test-match
     - test-fuzzy-match
     - test-includes
+    - test-includes-ignore-case
     - coqa-match
     - coqa-fact
     - coqa-fact-expl

1 change: 1 addition & 0 deletions evals/registry/eval_sets/test-basic.yaml
@@ -3,3 +3,4 @@ test-basic:
     - test-match
     - test-fuzzy-match
     - test-includes
+    - test-includes-ignore-case

10 changes: 10 additions & 0 deletions evals/registry/evals/test-basic.yaml
@@ -25,3 +25,13 @@ test-includes.s1.simple-v0:
   class: evals.elsuite.basic.includes:Includes
   args:
     samples_jsonl: test_fuzzy_match/samples.jsonl
+
+test-includes-ignore-case:
+  id: test-includes-ignore-case.s1.simple-v0
+  description: Example eval that uses fuzzy matching to score completions.
+  metrics: [accuracy]
+test-includes-ignore-case.s1.simple-v0:
+  class: evals.elsuite.basic.includes:Includes
+  args:
+    samples_jsonl: test_fuzzy_match/samples.jsonl
+    ignore_case: true