-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mypy, fix types for mypy #93
Conversation
elk/extraction/prompt_dataset.py
Outdated
assert isinstance(ds_dict, DatasetDict) | ||
_ds_dict = load_dataset(*data_path) # type: ignore | ||
assert isinstance(_ds_dict, DatasetDict) | ||
ds_dict: DatasetDict = _ds_dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately mypy needs this "dance"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this seems sufficiently ugly that I'm not sure I want it in the code. Is there really no cleaner way of doing the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ds_dict["train"].filter(lambda ex: ex[label_column] == i) | ||
for i in range(self.num_classes) | ||
] | ||
else: | ||
self.fewshot_strata: list[Dataset] = [] | ||
self.fewshot_strata = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise mypy will complain that the type already got declared. so the first branch must be the first the declare it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thejaminator Maybe we should just introduce a wrapper around
typing.cast
that also runtime-asserts the right type? That would actually be cleaner than what we're doing right now in a lot of places (sticking an assert statement right after a declaration). If it's supported by both Mypy and Pyright that'd be great.
ah yeah, basically the assert_is_instance i implement here https://github.com/EleutherAI/elk/pull/93/files/245859cb3d5a74e569e0800c191ee7ef4a02feb7..51db6995fe583d29ae5e3a863fda75724af6ce76#r1113128336
elk/training/reporter.py
Outdated
return float(loss) | ||
return ( | ||
float(loss) if loss is not None else raise_assertion_error("Loss is None") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm raising an assertion error here, otherwise we can keep the original implementation of returning torch.inf if num_epochs is somehow 0. I think its better to raise earlier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably just raise before the for-loop even starts if num_epochs < 1
. Would that fix the mypy error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think i can be fixed. because there is no "type-system" thing that says num_epochs will be a positive integer >= 1. so even if you raise before that, you can't prove to the type system that "this will definitely loop one time".
IDK why pyright doesn't complain though. Probably because it previously was torch.inf, and pyright infered that float works with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok actually i'll try it out by reverting to torch.inf
mypy.ini
Outdated
ignore_missing_imports = True | ||
|
||
[mypy-promptsource.*] | ||
ignore_missing_imports = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config for mypy
Thanks for the PR, I think we definitely do want to stick with Pyright as our primary type checker but it would be a bonus if the repo also type checks under Mypy. I just have mild code style concerns about having to re-declare variables in order to narrow the type. If there's some way around that then I'd be happy to merge this PR I think. |
@thejaminator Maybe we should just introduce a wrapper around |
assert isinstance(_ds_dict, DatasetDict) | ||
ds_dict: DatasetDict = _ds_dict | ||
|
||
ds_dict: DatasetDict = assert_is_instance(load_dataset(*data_path), DatasetDict) # type: ignore[arg-type] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@norabelrose another way is to make the assertion a oneliner. (the # type: ignore[arg-type] isn't coming from assert_is_instance, its coming from load_dataset(*data_path)
)
logit1: Tensor, | ||
confidence: float, | ||
base: float, | ||
) -> Tensor: | ||
p0, neg_p1 = logit0.sigmoid(), 1 - logit1.sigmoid() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix so that js_loss has the exact same type signature
I would hesitate against then making mypy in the pipeline because it adds more load for PR creators to having both pyright and mypy happy :D In my experience, pyright is more polished (less bugs). But the downside is that you can't ´extend it with type plugins for it. But I don't think you need plugins in this repo. |
Closes #72
I understand that pyright is already added. If you guys choose to go ahead with pyright instead of mypy, this MR can be closed.