Add IterableDataset (pytorch#19228)
Summary:
This is a modified version of pytorch#14705, since the commit structure of that PR is quite messy.

1. Add `IterableDataset`.
2. The data loader now has two modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`.
    2. `Map` otherwise.

3. Add better support for non-batched loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., the worker id, a copy of the dataset object) and can be used in `IterableDataset.__iter__` and `worker_init_fn` for per-worker configuration.
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
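The features above can be sketched together in a short example. This is a minimal illustration, not code from the PR: the `RangeIterableDataset` class and its per-worker sharding scheme are hypothetical, built only on the `IterableDataset`, `get_worker_info`, `ChainDataset`, and `batch_size=None` interfaces this PR introduces.

```python
from torch.utils.data import (ChainDataset, DataLoader, IterableDataset,
                              get_worker_info)


class RangeIterableDataset(IterableDataset):
    """Hypothetical iterable-style dataset that streams a range of ints."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: yield the full range.
            return iter(range(self.start, self.end))
        # In a worker process: shard the range across workers using the
        # worker id and worker count reported by get_worker_info().
        per_worker = -(-(self.end - self.start) // info.num_workers)  # ceil
        lo = self.start + info.id * per_worker
        return iter(range(lo, min(lo + per_worker, self.end)))


# batch_size=None disables automatic batching (non-batched / bulk loading),
# so the loader yields individual samples instead of collated batches.
loader = DataLoader(RangeIterableDataset(0, 8), batch_size=None)
print([int(x) for x in loader])  # [0, 1, 2, 3, 4, 5, 6, 7]

# ChainDataset chains iterable-style datasets, the way ConcatDataset
# concatenates map-style ones.
chained = ChainDataset([RangeIterableDataset(0, 3),
                        RangeIterableDataset(3, 6)])
print([int(x) for x in chained])  # [0, 1, 2, 3, 4, 5]
```

With `num_workers > 0`, each worker receives its own copy of the dataset, and the `get_worker_info()` branch keeps the workers from yielding duplicate samples.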

Closes pytorch#17909, pytorch#18096, pytorch#19946, and some of pytorch#13023
Pull Request resolved: pytorch#19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
ssnl authored and facebook-github-bot committed Jun 21, 2019
1 parent d4119f8 commit 058beae
Showing 15 changed files with 1,752 additions and 443 deletions.
399 changes: 398 additions & 1 deletion docs/source/data.rst


1 change: 1 addition & 0 deletions docs/source/notes/cuda.rst
@@ -262,6 +262,7 @@ also preserve :class:`torch.device` and :class:`torch.dtype` of a Tensor).
y_cpu = torch.ones_like(x_cpu)
y_gpu = torch.zeros_like(x_gpu)

.. _cuda-memory-pinning:

Use pinned memory buffers
^^^^^^^^^^^^^^^^^^^^^^^^^
4 changes: 4 additions & 0 deletions docs/source/notes/multiprocessing.rst
@@ -1,3 +1,5 @@
.. _multiprocessing-best-practices:

Multiprocessing best practices
==============================

@@ -20,6 +22,8 @@ memory and will only send a handle to another process.
This allows implementing various training methods, like Hogwild, A3C, or any
others that require asynchronous operation.

.. _multiprocessing-cuda-note:

CUDA in multiprocessing
-----------------------

