Add IterableDataset #19228

Closed
wants to merge 18 commits into from
address comments
ssnl committed Jun 20, 2019
commit 6a908d0d933833681a7650eba78d3e87232e24d4
19 changes: 7 additions & 12 deletions torch/utils/data/dataloader.py
@@ -95,17 +95,12 @@ class DataLoader(object):
 :ref:`multiprocessing-best-practices` on more details related
 to multiprocessing in PyTorch.
 
-.. note:: ``len(dataloader)`` is determined by length of the sampler used.
-When :attr:`dataset` is an :class:`~torch.utils.data.IterableDataset`,
+.. note:: ``len(dataloader)`` heuristic based on the length of the sampler used.
+When :attr:`dataset` is a subclass of :class:`~torch.utils.data.IterableDataset`,
 an infinite sampler is used, whose :meth:`__len__` is not
-implemented. With :class:`~torch.utils.data.IterableDataset` the
-actual iterator size depends on :attr:`num_workers`.
-If :attr:`num_workers > 0` (i.e., multi-process loading), each
-worker gets a copy of the same iterable dataset object and can
-return duplicate data, unless the dataset copies and/or the
-workers are configured differently (e.g., in :meth:`__iter__`).
-See :class:`~torch.utils.data.IterableDataset` for more details
-and examples.
+implemented. So one should not query this method unless they work
+with a map-style dataset. See `Dataset Types`_ for more details on
+these two types of dataset.
 """
 
 __initialized = False
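The docstring ties ``len(dataloader)`` to the sampler's length. The pure-Python sketch below illustrates that relationship under stated assumptions: the batch count follows the formula used by `torch.utils.data.BatchSampler.__len__` (ceiling division, or floor division when `drop_last=True`), and the iterable-style case is modelled by a sampler that yields forever and defines no `__len__`, so calling `len()` on it raises `TypeError`. `SequentialIndexSampler`, `InfiniteSampler`, and `loader_len` are hypothetical names for illustration only, not PyTorch APIs.

```python
class SequentialIndexSampler:
    """Hypothetical map-style sampler: yields indices 0..n-1 and knows its length."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

    def __len__(self):
        return self.n


class InfiniteSampler:
    """Hypothetical stand-in for the infinite sampler used with iterable-style
    datasets: it yields forever and deliberately defines no __len__."""

    def __iter__(self):
        while True:
            yield None


def loader_len(sampler, batch_size, drop_last=False):
    """Number of batches, mirroring BatchSampler's length formula (assumption)."""
    n = len(sampler)  # raises TypeError if the sampler has no __len__
    if drop_last:
        return n // batch_size                  # incomplete last batch dropped
    return (n + batch_size - 1) // batch_size   # ceiling division


print(loader_len(SequentialIndexSampler(10), batch_size=3))                  # 4
print(loader_len(SequentialIndexSampler(10), batch_size=3, drop_last=True))  # 3

try:
    loader_len(InfiniteSampler(), batch_size=3)
except TypeError as exc:
    print("no length:", exc)
```

This is why the new docstring advises querying the length only for map-style datasets: with the infinite sampler there is no length to report.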
@@ -612,9 +607,9 @@ def __init__(self, loader):
 
 self.index_queues = []
 self.workers = []
-# A list of booleans representing whether each worker still has word to
+# A list of booleans representing whether each worker still has work to
 # do, i.e., not having exhausted its iterable dataset object. It always
-# contains all `True`s if not using an iterable dataset
+# contains all `True`s if not using an iterable-style dataset
 # (i.e., if kind != Iterable).
 self.workers_status = []
 for i in range(self.num_workers):
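The docstring text removed in the first hunk notes that with `num_workers > 0` each worker gets a copy of the same iterable dataset object and can return duplicate data unless the copies are configured differently (e.g., in `__iter__`). The single-process simulation below sketches both that effect and the `workers_status` bookkeeping described in the comment above: workers are polled round-robin, a worker is marked `False` once its iterator is exhausted, and fetching stops when no worker has work left. It is a simplification under stated assumptions, not the real multiprocessing loader; `RangeIterable`, `iter_for_worker`, and `simulate_loading` are hypothetical. In actual PyTorch code a dataset's `__iter__` would typically read the worker id and worker count from `torch.utils.data.get_worker_info()` rather than receive them as arguments.

```python
import copy


class RangeIterable:
    """Hypothetical iterable-style dataset over range(start, end).

    With shard=True each worker yields only its own contiguous slice,
    so the per-worker copies do not return duplicate data.
    """

    def __init__(self, start, end, shard=False):
        self.start, self.end, self.shard = start, end, shard

    def iter_for_worker(self, worker_id, num_workers):
        if not self.shard:
            # Every copy yields the whole range -> duplicates across workers.
            return iter(range(self.start, self.end))
        per_worker = -(-(self.end - self.start) // num_workers)  # ceiling division
        lo = self.start + worker_id * per_worker
        hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))  # may be empty for trailing workers


def simulate_loading(dataset, num_workers):
    """Round-robin fetching from per-worker dataset copies (single process)."""
    # Each simulated worker receives its own copy of the dataset object.
    iterators = [
        copy.deepcopy(dataset).iter_for_worker(i, num_workers)
        for i in range(num_workers)
    ]
    # workers_status[i] stays True while worker i still has work to do.
    workers_status = [True] * num_workers
    results = []
    worker = 0
    while any(workers_status):
        if workers_status[worker]:
            try:
                results.append(next(iterators[worker]))
            except StopIteration:
                workers_status[worker] = False  # worker exhausted its copy
        worker = (worker + 1) % num_workers
    return results


print(sorted(simulate_loading(RangeIterable(0, 6), num_workers=2)))
# [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(sorted(simulate_loading(RangeIterable(0, 6, shard=True), num_workers=2)))
# [0, 1, 2, 3, 4, 5]
```

Without sharding every element appears once per worker; splitting the range by worker id removes the duplicates, and the status list lets the loop terminate as soon as every worker's copy is exhausted.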