Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Lack of Default __len__ in Python Abstract Base Classes #122410

Open
kuraga opened this issue Mar 21, 2024 · 1 comment
Open

On Lack of Default __len__ in Python Abstract Base Classes #122410

kuraga opened this issue Mar 21, 2024 · 1 comment
Labels
module: data torch.utils.data module: docs Related to our documentation, both in docs/ and docblocks triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@kuraga
Copy link
Contributor

kuraga commented Mar 21, 2024

Thoughts about # NOTE [ Lack of Default __len__ in Python Abstract Base Classes ].

TL;DR:

  1. We don't have the __len__ ("abstract") methods but seems like they could be (via a TypeError or an abstract method implementations).
  2. We don't have the __len__ docstrings but they could be (without the method itself).
  3. We don't point in the documentation datasets have the size. (Or they don't?)

We have https://github.com/pytorch/pytorch/blob/v2.2.1/torch/utils/data/sampler.py#L70-L95 (introduced in #19228):

# NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]
#
# Many times we have an abstract class representing a collection/iterable of
# data, e.g., `torch.utils.data.Sampler`, with its subclasses optionally
# implementing a `__len__` method. In such cases, we must make sure to not
# provide a default implementation, because both straightforward default
# implementations have their issues:
#
#   + `return NotImplemented`:
#     Calling `len(subclass_instance)` raises:
#       TypeError: 'NotImplementedType' object cannot be interpreted as an integer

But what should return len(subclass_instance) without an implemented length value?
No matter, on other hand, NotImplemented is for binary operators only.
Let's remove this part of the comment?

#
#   + `raise NotImplementedError()`:
#     This prevents triggering some fallback behavior. E.g., the built-in
#     `list(X)` tries to call `len(X)` first, and executes a different code
#     path if the method is not found or `NotImplemented` is returned, while
#     raising a `NotImplementedError` will propagate and make the call fail
#     where it could have used `__iter__` to complete the call.

This comment says we should write list(iter(X)) instead of list(X). (See the list cconstructor documentation.)

#
# Thus, the only two sensible things to do are
#
#   + **not** provide a default `__len__`.
#
#   + raise a `TypeError` instead, which is what Python uses when users call
#     a method that is not defined on an object.
#     (@ssnl verifies that this works on at least Python 3.7.)

It is a good idea, isn't it? This comment says it is the correct one.

More options:

This comment suggest to make an abstract method.

Are we able to add the documentation stub (without the method itself)?

Thanks!

cc @svekars @brycebortree @VitalyFedyunin @ejguan @dzhulgakov @ssnl

@soulitzer soulitzer added module: docs Related to our documentation, both in docs/ and docblocks triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module module: data torch.utils.data labels Mar 22, 2024
@kuraga
Copy link
Contributor Author

kuraga commented Apr 2, 2024

Didn't see but #47055 is related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: data torch.utils.data module: docs Related to our documentation, both in docs/ and docblocks triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

2 participants