On Dataset, IterableDataset inheritance #120139
Labels: module: data (torch.utils.data), triaged
Analysis
In PyTorch
We have (https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html):
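In simplified form, the hierarchy there looks roughly like the sketch below. The classes are stdlib-only stand-ins that mirror the inheritance, not the verbatim PyTorch source, and `Countdown` is a hypothetical example class.

```python
from typing import Generic, Iterable, Iterator, TypeVar

T_co = TypeVar("T_co", covariant=True)


class Dataset(Generic[T_co]):
    """Stand-in for torch.utils.data.Dataset (simplified, not the real source)."""

    def __getitem__(self, index) -> T_co:
        raise NotImplementedError("Subclasses of Dataset should implement __getitem__.")


class IterableDataset(Dataset[T_co], Iterable[T_co]):
    """Stand-in for torch.utils.data.IterableDataset (simplified)."""


class Countdown(IterableDataset[int]):
    """Hypothetical stream-style dataset that only supports iteration."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.start, 0, -1))


stream = Countdown(3)
print(list(stream))                 # [3, 2, 1]
print(isinstance(stream, Dataset))  # True, although indexing is not supported
```

With this layout, an iterable-only dataset still passes an `isinstance(..., Dataset)` check, even though `stream[0]` ends up in the base class's `NotImplementedError`.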
So:
1. `Dataset`s should implement `__getitem__`.
2. `IterableDataset`s should implement `__iter__` and `__len__`.
In Python
Now, let's look at Python's Collections Abstract Base Classes documentation.
What does Python offer?
- `Iterable` requires `__iter__`.
- `__len__` is introduced in `Sized`.
- `Iterable`s don't have `__len__`.
- `Container` introduces `__contains__`.
- `Reversible` introduces `__reversed__`.
- `Collection` just joins the superclasses (`Sized`, `Iterable`, `Container`).
- `Sequence` has real implementations of `__iter__`, `__contains__`, `__reversed__` but requires `__len__` and `__getitem__`.
Applying back to PyTorch
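Before drawing conclusions, the ABC relationships summarized above can be checked directly against `collections.abc`. This is a minimal stdlib-only sketch; `Stream` and `Triple` are hypothetical classes introduced only for illustration.

```python
from collections.abc import Collection, Container, Iterable, Reversible, Sequence, Sized

# Print the abstract methods each ABC requires from a concrete subclass.
for abc_cls in (Iterable, Sized, Container, Reversible, Collection, Sequence):
    print(abc_cls.__name__, sorted(abc_cls.__abstractmethods__))


class Stream:
    """Hypothetical iterable-only class: defines __iter__ and nothing else."""

    def __iter__(self):
        return iter((1, 2, 3))


class Triple(Sequence):
    """Hypothetical sequence: defines only __len__ and __getitem__."""

    def __len__(self):
        return 3

    def __getitem__(self, index):
        if not 0 <= index < 3:
            raise IndexError(index)
        return index * 10


# An iterable is not automatically sized.
print(isinstance(Stream(), Iterable), isinstance(Stream(), Sized))

# Sequence supplies __iter__, __contains__ and __reversed__ as mixin methods.
print(list(Triple()), 20 in Triple(), list(reversed(Triple())))
```

The second print shows the point that matters for the proposal: a `Sequence` subclass gets iteration, membership tests and reversal from `__len__` and `__getitem__` alone.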
What do I see?
- `IterableDataset` can't be a subclass of `Dataset`. (It could be vice versa.)
- `IterableDataset`s shouldn't implement `__len__`, but `Dataset` could.
Proposition
Variant 1
Ok, we could have:
- `IterableDataset`: `__iter__`
- `Dataset`: `__len__`, `__iter__`, `__contains__`, `__getitem__`, `__reversed__`
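A rough sketch of how Variant 1 could look. Two things here are assumptions rather than part of the proposal as written: that `Dataset` subclasses `IterableDataset`, and that `__iter__`, `__reversed__` and `__contains__` get default implementations built on `__len__` and `__getitem__`. `Squares` is a hypothetical example dataset.

```python
from abc import ABC, abstractmethod
from typing import Generic, Iterator, TypeVar

T_co = TypeVar("T_co", covariant=True)


class IterableDataset(ABC, Generic[T_co]):
    """Variant 1 sketch: iteration is the only requirement."""

    @abstractmethod
    def __iter__(self) -> Iterator[T_co]: ...


class Dataset(IterableDataset[T_co]):
    """Variant 1 sketch: map-style dataset built on __len__ and __getitem__."""

    @abstractmethod
    def __len__(self) -> int: ...

    @abstractmethod
    def __getitem__(self, index: int) -> T_co: ...

    # Assumed defaults, derived from the two abstract methods above.
    def __iter__(self) -> Iterator[T_co]:
        return (self[i] for i in range(len(self)))

    def __reversed__(self) -> Iterator[T_co]:
        return (self[i] for i in reversed(range(len(self))))

    def __contains__(self, value: object) -> bool:
        return any(item is value or item == value for item in self)


class Squares(Dataset[int]):
    """Hypothetical example dataset."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


ds = Squares(4)
print(list(ds), list(reversed(ds)), 9 in ds, len(ds))
```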
Variant 2
Or even make `Dataset` a subclass of `Sequence`:
- `IterableDataset`: `__iter__`
- `Dataset`: `__len__`, `__getitem__`
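Variant 2 might be sketched as follows. It assumes `IterableDataset` simply inherits the abstract `__iter__` from `Iterable` and that `Dataset` picks up the `Sequence` mixin methods. Class bodies are deliberately empty, and `Squares` is again a hypothetical example.

```python
from typing import Iterable, Sequence, TypeVar

T_co = TypeVar("T_co", covariant=True)


class IterableDataset(Iterable[T_co]):
    """Variant 2 sketch: only __iter__ is required (inherited as abstract)."""


class Dataset(IterableDataset[T_co], Sequence[T_co]):
    """Variant 2 sketch: only __len__ and __getitem__ are required."""


class Squares(Dataset[int]):
    """Hypothetical example dataset."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


ds = Squares(4)

# __iter__, __contains__ and __reversed__ come from the Sequence mixins.
print(list(ds), 9 in ds, list(reversed(ds)))

# Only the two methods named in the proposal remain abstract on Dataset.
print(sorted(Dataset.__abstractmethods__))
```

Compared with Variant 1, nothing has to be reimplemented by hand, at the cost of also inheriting the rest of the `Sequence` interface (`index`, `count`).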
See Also
`Sequence` could be `Mapping`
I thought of a `Dataset` as a `Sequence` (`list`), i.e. with `int` indices.
But strictly speaking, indices are untyped. So it could be a `Mapping` (`dict`).
Type Annotations
I'd appreciate having all of the code above annotated with types.
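To illustrate both points at once, here is one hypothetical shape for a fully annotated, `Mapping`-style dataset. `MappingDataset` and its type variables are names made up for this sketch and are not part of PyTorch.

```python
from typing import Dict, Hashable, Iterator, Mapping, TypeVar

K = TypeVar("K", bound=Hashable)
V_co = TypeVar("V_co", covariant=True)


class MappingDataset(Mapping[K, V_co]):
    """Hypothetical dataset keyed by arbitrary hashable indices."""

    def __init__(self, samples: Dict[K, V_co]) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self._samples: Dict[K, V_co] = dict(samples)

    def __getitem__(self, key: K) -> V_co:
        return self._samples[key]

    def __iter__(self) -> Iterator[K]:
        return iter(self._samples)

    def __len__(self) -> int:
        return len(self._samples)


labels = MappingDataset({"cat": 0, "dog": 1})
print(labels["dog"], len(labels), "cat" in labels, list(labels.items()))
```

`Mapping` requires `__getitem__`, `__iter__` and `__len__`, and supplies `__contains__`, `keys`, `items`, `values` and `get` as mixins.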
NOTE [ Lack of Default __len__ in Python Abstract Base Classes ]
Thinking this through could resolve the nasty situation described in the note at https://github.com/pytorch/pytorch/blob/v2.2.1/torch/utils/data/sampler.py#L70-L95.
On that, see also #122410, #47055.
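As I read the linked note, the difficulty is that a default `__len__` on a base class interferes with built-ins such as `list()`, which probe for a length first. The sketch below shows that interaction on CPython with three hypothetical classes; it is not code from PyTorch.

```python
class NoLen:
    """Iterable with no __len__ at all."""

    def __iter__(self):
        return iter((1, 2, 3))


class LenRaisesTypeError(NoLen):
    """Default __len__ that raises TypeError."""

    def __len__(self):
        raise TypeError("this object has no len()")


class LenRaisesNotImplemented(NoLen):
    """Default __len__ that raises NotImplementedError."""

    def __len__(self):
        raise NotImplementedError


def try_list(obj):
    """Return list(obj), or the name of the exception that list() raised."""
    try:
        return list(obj)
    except Exception as exc:  # broad on purpose: the example reports what happened
        return type(exc).__name__


# list() falls back to plain iteration when there is no usable length...
print(try_list(NoLen()))
print(try_list(LenRaisesTypeError()))

# ...but any other exception from __len__ propagates and the call fails.
print(try_list(LenRaisesNotImplemented()))
```

This is one reason to leave `__len__` off an iterable-only base class rather than stub it out.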
Related Discussions, Issues and Commits
Thoughts
Thoughts? :)
cc @VitalyFedyunin @ejguan @dzhulgakov @ssnl