
Make dspy.Evaluate Typed and DSPy Only (No dsp) #889

Closed · wants to merge 5 commits

Conversation

KCaverly
Collaborator

In an effort to harden the Evaluate function, I have moved all metrics from dsp to dspy, added type hints, and made Evaluate Pydantic-validated. This should not affect any fundamental functionality; it is simply an effort to increase code quality and coverage.
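The Pydantic-validation change described above means constructor arguments are type-checked when an `Evaluate` is created. A minimal sketch of the pattern (field names here are illustrative, not the PR's exact fields):

```python
from pydantic import BaseModel, Field

class Evaluate(BaseModel):
    # Hypothetical fields mirroring common Evaluate arguments.
    # Pydantic validates types at construction time, so a bad
    # num_threads (e.g. a non-numeric string) raises a ValidationError
    # instead of failing later inside the evaluation loop.
    devset: list = Field(default_factory=list)
    num_threads: int = 1
    display_progress: bool = False
    display_table: bool = False

ev = Evaluate(devset=[{"question": "q", "answer": "a"}], num_threads=4)
```

The benefit over a plain `__init__` is that the validation is declarative: each field's type annotation doubles as its runtime check.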

@KCaverly KCaverly marked this pull request as ready for review April 22, 2024 21:23


class Evaluate(BaseModel):
# Example is not yet Pydantic valid
Collaborator


can remove this comment now :)

ncorrect = 0
ntotal = 0
reordered_devset = []

pbar = tqdm.tqdm(total=len(devset), dynamic_ncols=True, disable=not display_progress, file=sys.stdout)
for idx, arg in devset:
    with logging_redirect_tqdm():
        print(f"IDX: {idx}, {arg}, {type(idx)}, {type(arg)}")
Collaborator


should this be printed or logged and outputted in a verbose/debug setting?
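One way to address this, sketched with the standard library's `logging` module (the logger name and helper are hypothetical, not part of this PR):

```python
import logging

logger = logging.getLogger("dspy.evaluate")  # hypothetical logger name

def describe_example(idx, arg) -> str:
    # Build the same message the print() above emits.
    return f"IDX: {idx}, {arg}, {type(idx)}, {type(arg)}"

def log_example(idx, arg, verbose: bool = False) -> None:
    # Emit at DEBUG level only when verbose is set; logging also
    # respects the globally configured level, so callers can silence
    # this without touching the evaluation loop.
    if verbose:
        logger.debug(describe_example(idx, arg))
```

Routing the message through `logger.debug` keeps it compatible with `logging_redirect_tqdm()`, which already wraps the loop body above.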



def answer_exact_match(example: Example, pred: Prediction, frac: float = 0.90, *_args, **_kwargs) -> bool:
    if not isinstance(example.answer, (str, list)):
Collaborator


fine to leave as original assert statement?
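The trade-off between the two guards, sketched in isolation (hypothetical helper names, not code from this PR): `assert` is stripped when Python runs with `-O`, while an explicit `raise` always fires.

```python
def check_with_assert(answer):
    # Disappears under `python -O`; fine for internal invariants.
    assert isinstance(answer, (str, list)), "answer must be str or list"

def check_with_raise(answer):
    # Always enforced, even with optimizations enabled; preferable
    # for validating caller-supplied input.
    if not isinstance(answer, (str, list)):
        raise TypeError("answer must be str or list")
```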



def answer_passage_match(example: Example, pred: Prediction, *_args, **_kwargs):
    if not isinstance(example.answer, (str, list)):
Collaborator


same comment as above

        return dsp.answer_match(pred.answer, [example.answer], frac=frac)
    else:  # type(example.answer) is list
        return dsp.answer_match(pred.answer, example.answer, frac=frac)

def em(prediction: str, answers_list: list[str]) -> bool:
Collaborator


missing asserts?

        return dsp.passage_match(pred.context, [example.answer])
    else:  # type(example.answer) is list
        return dsp.passage_match(pred.context, example.answer)

def f1(prediction: str, answers_list: list[str]) -> float:
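For reference, `em` and `f1` signatures like the ones above conventionally wrap SQuAD-style answer metrics. A hedged sketch of what such metrics compute (the PR's actual bodies delegate to `dsp` helpers, so this is illustrative, not the diff's code):

```python
from collections import Counter

def em(prediction: str, answers_list: list[str]) -> bool:
    # Exact match: true if the normalized prediction equals any
    # acceptable answer.
    return any(prediction.strip().lower() == ans.strip().lower()
               for ans in answers_list)

def f1_score(prediction: str, ground_truth: str) -> float:
    # Token-overlap F1 against a single answer (SQuAD-style).
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

def f1(prediction: str, answers_list: list[str]) -> float:
    # Best F1 over all acceptable answers, matching the signature above.
    return max(f1_score(prediction, ans) for ans in answers_list)
```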

@arnavsinghvi11
Collaborator

Thanks for this PR @KCaverly! As we're moving the metrics from dsp to dspy, it seems we are missing some metrics in there (novel_f1_score, precision_score, hotpot_f1_score, HotPotF1, nF1). Granted, these are not used in the repo, but maybe we can keep them for now until a greater refactor (similar to the case with datasets).

We can probably revisit this later, but with this change it should be fine to remove them from the dsp/ folder now.

Also feel free to check PR #935 since there may be some overlap/changes.

@thomasahle
Collaborator

LGTM

@KCaverly KCaverly closed this Oct 1, 2024