
Make dspy.Evaluate Typed and DSPy Only (No dsp) #889

Closed · wants to merge 5 commits

Conversation

KCaverly
Collaborator

In an effort to harden the Evaluate function, I have moved all metrics from dsp to dspy, added type hints, and made Evaluate Pydantic-validated. This should not affect any fundamental functionality; it is simply an effort to increase code quality and coverage.
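The Pydantic-validation change described above means constructor arguments are type-checked when an `Evaluate` is created. A minimal sketch of the pattern (field names here are illustrative, not the PR's exact fields):

```python
from pydantic import BaseModel, Field

class Evaluate(BaseModel):
    # Hypothetical fields mirroring common Evaluate arguments.
    # Pydantic validates types at construction time, so a bad
    # num_threads (e.g. a non-numeric string) raises a ValidationError
    # instead of failing later inside the evaluation loop.
    devset: list = Field(default_factory=list)
    num_threads: int = 1
    display_progress: bool = False
    display_table: bool = False

ev = Evaluate(devset=[{"question": "q", "answer": "a"}], num_threads=4)
```

The benefit over a plain `__init__` is that the validation is declarative: each field's type annotation doubles as its runtime check.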

@KCaverly KCaverly marked this pull request as ready for review April 22, 2024 21:23


class Evaluate(BaseModel):
# Example is not yet Pydantic valid
Collaborator


can remove this comment now :)

ncorrect = 0
ntotal = 0
reordered_devset = []

pbar = tqdm.tqdm(total=len(devset), dynamic_ncols=True, disable=not display_progress, file=sys.stdout)
for idx, arg in devset:
    with logging_redirect_tqdm():
        print(f"IDX: {idx}, {arg}, {type(idx)}, {type(arg)}")
Collaborator


should this be printed or logged and outputted in a verbose/debug setting?
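One way to address this, sketched with the standard library's `logging` module (the logger name and helper are hypothetical, not part of this PR):

```python
import logging

logger = logging.getLogger("dspy.evaluate")  # hypothetical logger name

def describe_example(idx, arg) -> str:
    # Build the same message the print() above emits.
    return f"IDX: {idx}, {arg}, {type(idx)}, {type(arg)}"

def log_example(idx, arg, verbose: bool = False) -> None:
    # Emit at DEBUG level only when verbose is set; logging also
    # respects the globally configured level, so callers can silence
    # this without touching the evaluation loop.
    if verbose:
        logger.debug(describe_example(idx, arg))
```

Routing the message through `logger.debug` keeps it compatible with `logging_redirect_tqdm()`, which already wraps the loop body above.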



def answer_exact_match(example: Example, pred: Prediction, frac: float = 0.90, *_args, **_kwargs) -> bool:
    if not isinstance(example.answer, (str, list)):
Collaborator


fine to leave as original assert statement?
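The trade-off between the two guards, sketched in isolation (hypothetical helper names, not code from this PR): `assert` is stripped when Python runs with `-O`, while an explicit `raise` always fires.

```python
def check_with_assert(answer):
    # Disappears under `python -O`; fine for internal invariants.
    assert isinstance(answer, (str, list)), "answer must be str or list"

def check_with_raise(answer):
    # Always enforced, even with optimizations enabled; preferable
    # for validating caller-supplied input.
    if not isinstance(answer, (str, list)):
        raise TypeError("answer must be str or list")
```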



def answer_passage_match(example: Example, pred: Prediction, *_args, **_kwargs):
    if not isinstance(example.answer, (str, list)):
Collaborator


same comment as above

        return dsp.answer_match(pred.answer, [example.answer], frac=frac)
    else:  # type(example.answer) is list
        return dsp.answer_match(pred.answer, example.answer, frac=frac)

def em(prediction: str, answers_list: list[str]) -> bool:
Collaborator


missing asserts?

        return dsp.passage_match(pred.context, [example.answer])
    else:  # type(example.answer) is list
        return dsp.passage_match(pred.context, example.answer)

def f1(prediction: str, answers_list: list[str]) -> float:
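For reference, `em` and `f1` signatures like the ones above conventionally wrap SQuAD-style answer metrics. A hedged sketch of what such metrics compute (the PR's actual bodies delegate to `dsp` helpers, so this is illustrative, not the diff's code):

```python
from collections import Counter

def em(prediction: str, answers_list: list[str]) -> bool:
    # Exact match: true if the normalized prediction equals any
    # acceptable answer.
    return any(prediction.strip().lower() == ans.strip().lower()
               for ans in answers_list)

def f1_score(prediction: str, ground_truth: str) -> float:
    # Token-overlap F1 against a single answer (SQuAD-style).
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

def f1(prediction: str, answers_list: list[str]) -> float:
    # Best F1 over all acceptable answers, matching the signature above.
    return max(f1_score(prediction, ans) for ans in answers_list)
```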

@arnavsinghvi11
Collaborator

Thanks for this PR @KCaverly! As we're moving the metrics from dsp to dspy, it seems we are missing some metrics in there (novel_f1_score, precision_score, hotpot_f1_score, HotPotF1, nF1). Granted, these are not used in the repo, but maybe we can keep them for now until a greater refactor (similar to the case with datasets).

We can probably revisit this later, but with this change it should be fine to remove them from the dsp/ folder now.

Also feel free to check PR #935 since there may be some overlap/changes.

@thomasahle
Collaborator

LGTM

@KCaverly KCaverly closed this Oct 1, 2024