Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docstrings and type hints to Evaluate class. #935

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rpgoldman
Copy link
Contributor

Add docstrings and type hints for the Evaluate class.

Needs review!

@rpgoldman
Copy link
Contributor Author

Still some typing errors:

dspy/evaluate/evaluate.py:17: error: Cannot find implementation or library stub for module named "IPython.display"  [import-not-found]
dspy/evaluate/evaluate.py:147: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [overload-overlap]
dspy/evaluate/evaluate.py:147: error: Overloaded function signatures 1 and 3 overlap with incompatible return types  [overload-overlap]
dspy/evaluate/evaluate.py:161: error: Overloaded function signatures 2 and 3 overlap with incompatible return types  [overload-overlap]
dspy/evaluate/evaluate.py:264: error: "type[tqdm[Any]]" has no attribute "_instances"  [attr-defined]
dspy/evaluate/evaluate.py:303: error: "Series[Any]" not callable  [operator]

Will check these as type permits.

With kind help from the mypy folks, fixed the overload hints for the
`__call__` method on Evaluate.
@rpgoldman rpgoldman changed the title Add docstrings and type hints. Add docstrings and type hints to Evaluate class. May 2, 2024
@rpgoldman
Copy link
Contributor Author

With help from the mypy folks, the type errors are now fixed, and mypy likes the file.

@rpgoldman rpgoldman marked this pull request as ready for review May 2, 2024 16:45

# noinspection PyUnresolvedReferences
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these comments needed? prefer to remove non-code related comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are used to quash linter complaints. I'd prefer to keep them so that I don't see the same linter complaints over and over.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't maintain these comments throughout the repo and it's actually fine to flag the linter and add in the changes within the PR. would prefer to remove them.


try:
from IPython.display import HTML
from IPython.display import display as ipython_display
# noinspection PyPackageRequirements
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well

except ImportError:
ipython_display = print

def HTML(x):
def HTML(x): # noqa - linters dislike upper case function name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove this as well

@@ -29,17 +34,40 @@ def HTML(x):


class Evaluate:
"""
Reification of the process of evaluating a model. Invoked using its
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the evaulate class is for a DSPy program, not a model. Also, can the wording be simplified? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by simplifying the wording. I can't think of a synonym for "reification," if that's what you mean.

Class representing the process of evaluating a DSpy program.

Would that be better?

dspy/evaluate/evaluate.py Outdated Show resolved Hide resolved
score: float, results: list of Prediction
if `return_all_scores` is False and `return_outputs` is True.
score: float
if both flags are false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep consistent with phrasing in 222

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed this. Also one of the lines was wrong, and I added a check (to produce a more understandable error message when required arguments are missing).

@@ -132,8 +257,10 @@ def wrapped_program(example_idx, example):

# increment assert and suggest failures to program's attributes
if hasattr(program, "_assert_failures"):
# noinspection PyProtectedMember
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comments

@@ -151,11 +278,14 @@ def wrapped_program(example_idx, example):
if creating_new_thread:
del thread_stacks[threading.get_ident()]

devset = list(enumerate(devset))
tqdm.tqdm._instances.clear()
devset = list(enumerate(devset)) # This actually changes the type of `devset`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, but renamed. See if you are ok with the change, please.

df = df.map(truncate_cell)
else:
df = df.applymap(truncate_cell)
df = df.applymap(truncate_cell) # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comments

dspy/evaluate/evaluate.py Show resolved Hide resolved
Docstring for the return value had inconsistent wording and one error.
Also added explicit check for missing `metric` and `devset` arguments,
  since these do not have defaults.
@cdowellmdb
Copy link

@arnavsinghvi11 @rpgoldman what is left for this to be merged?

@rpgoldman
Copy link
Contributor Author

@cdowellmdb I think it's good to go. @arnavsinghvi11 didn't like the comments I put in to muffle linters; I prefer to keep them. Is that a show-stopper?

If not, do you need me to merge in the changes since this was pushed?


# noinspection PyUnresolvedReferences
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't maintain these comments throughout the repo and it's actually fine to flag the linter and add in the changes within the PR. would prefer to remove them.

==========
devset: Iterable[Example]
An iterable of Examples
metric: Callable, optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link was for L53 above, corresponding to the metric description

dspy/evaluate/evaluate.py Show resolved Hide resolved
self,
program,
metric: Optional[Callable] = None,
devset: Optional[Iterable] = None, # Needs more specific type, if possible
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latterUnion[List[bool], List[float]] makes sense for this!

program._assert_failures += dspy.settings.get("assert_failures")
if hasattr(program, "_suggest_failures"):
# noinspection PyProtectedMember
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flagging the comments

devset = list(enumerate(devset))
tqdm.tqdm._instances.clear()
ndevset: List[int, Example] = list(enumerate(devset))
# noinspection PyProtectedMember
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flagging the comments

@arnavsinghvi11
Copy link
Collaborator

Thanks for circling back to this @rpgoldman @cdowellmdb ! Left a few comments back on the review and it should be good to merge once the last parts are resolved! (There's also a merge conflict but I believe rebasing to main should fix this).

The in-line comments are definitely not a show-stopper, but I'd prefer to leave them out from committed code to ensure they catch any changes needed.

@rpgoldman
Copy link
Contributor Author

The in-line comments are definitely not a show-stopper, but I'd prefer to leave them out from committed code to ensure they catch any changes needed.

Not sure I understand this last. What's "they" here? I assume you are saying that we should continue to see these linter warnings. Here's an example of why I disagree:

E.g., I put in # noinspection PyProtectedMember because that piece of code uses a private member:

program._assert_failures += dspy.settings.get("assert_failures")

One of 2 things should happen, IMO:

  1. We add a method like increment_assert_failures to the sort of thing that counts assertion and suggestion failures and use that instead.
  2. We decide we want to continue to use the protected member, and as I do, quash the warning.

I'm ok with either solution, but a third solution, where we just have linters continue shout at us, seems generally bad. Bad because if we end up with lots of these linter shoutings then it's just human nature to start ignoring the linter warnings altogether. And we certainly cannot put "must build cleanly" into our testing process, if we do this.

I have been bitten many times by code repositories that emit lots of warnings, training their developers to ignore linter warnings. Because sooner or later there's a warning that's really important, but that signal is lost in the noise. I'm a strong believer in building clean. If there's a warning, either it's real and you fix it, or it's not real and you quash it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants