Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mypy, fix types for mypy #93

Closed
wants to merge 9 commits into from

Conversation

thejaminator
Copy link
Collaborator

@thejaminator thejaminator commented Feb 21, 2023

Closes #72

I understand that pyright is already added. If you guys choose to go ahead with pyright instead of mypy, this MR can be closed.

assert isinstance(ds_dict, DatasetDict)
_ds_dict = load_dataset(*data_path) # type: ignore
assert isinstance(_ds_dict, DatasetDict)
ds_dict: DatasetDict = _ds_dict
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately mypy needs this "dance"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this seems sufficiently ugly that I'm not sure I want it in the code. Is there really no cleaner way of doing the same thing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ds_dict["train"].filter(lambda ex: ex[label_column] == i)
for i in range(self.num_classes)
]
else:
self.fewshot_strata: list[Dataset] = []
self.fewshot_strata = []
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise mypy will complain that the type already got declared. so the first branch must be the first the declare it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thejaminator Maybe we should just introduce a wrapper around typing.cast that also runtime-asserts the right type? That would actually be cleaner than what we're doing right now in a lot of places (sticking an assert statement right after a declaration). If it's supported by both Mypy and Pyright that'd be great.

ah yeah, basically the assert_is_instance i implement here https://github.com/EleutherAI/elk/pull/93/files/245859cb3d5a74e569e0800c191ee7ef4a02feb7..51db6995fe583d29ae5e3a863fda75724af6ce76#r1113128336

return float(loss)
return (
float(loss) if loss is not None else raise_assertion_error("Loss is None")
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm raising an assertion error here, otherwise we can keep the original implementation of returning torch.inf if num_epochs is somehow 0. I think its better to raise earlier

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just raise before the for-loop even starts if num_epochs < 1. Would that fix the mypy error?

Copy link
Collaborator Author

@thejaminator thejaminator Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think i can be fixed. because there is no "type-system" thing that says num_epochs will be a positive integer >= 1. so even if you raise before that, you can't prove to the type system that "this will definitely loop one time".

IDK why pyright doesn't complain though. Probably because it previously was torch.inf, and pyright infered that float works with that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok actually i'll try it out by reverting to torch.inf

mypy.ini Outdated
ignore_missing_imports = True

[mypy-promptsource.*]
ignore_missing_imports = True
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config for mypy

@norabelrose
Copy link
Member

Thanks for the PR, I think we definitely do want to stick with Pyright as our primary type checker but it would be a bonus if the repo also type checks under Mypy. I just have mild code style concerns about having to re-declare variables in order to narrow the type. If there's some way around that then I'd be happy to merge this PR I think.

@norabelrose
Copy link
Member

@thejaminator Maybe we should just introduce a wrapper around typing.cast that also runtime-asserts the right type? That would actually be cleaner than what we're doing right now in a lot of places (sticking an assert statement right after a declaration). If it's supported by both Mypy and Pyright that'd be great.

assert isinstance(_ds_dict, DatasetDict)
ds_dict: DatasetDict = _ds_dict

ds_dict: DatasetDict = assert_is_instance(load_dataset(*data_path), DatasetDict) # type: ignore[arg-type]

Copy link
Collaborator Author

@thejaminator thejaminator Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@norabelrose another way is to make the assertion a oneliner. (the # type: ignore[arg-type] isn't coming from assert_is_instance, its coming from load_dataset(*data_path) )

logit1: Tensor,
confidence: float,
base: float,
) -> Tensor:
p0, neg_p1 = logit0.sigmoid(), 1 - logit1.sigmoid()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix so that js_loss has the exact same type signature

@thejaminator
Copy link
Collaborator Author

thejaminator commented Feb 21, 2023

Thanks for the PR, I think we definitely do want to stick with Pyright as our primary type checker but it would be a bonus if the repo also type checks under Mypy. I just have mild code style concerns about having to re-declare variables in order to narrow the type. If there's some way around that then I'd be happy to merge this PR I think.

I would hesitate against then making mypy in the pipeline because it adds more load for PR creators to having both pyright and mypy happy :D

In my experience, pyright is more polished (less bugs). But the downside is that you can't ´extend it with type plugins for it. But I don't think you need plugins in this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add automatic type checking to the linting pipeline
2 participants