
Add length and padding keywords to DistributedSampler #28841

Commits on Oct 29, 2019

  1. Add length and padding keywords to DistributedSampler

    The current implementation of `DistributedSampler` is ideal for
    distributed training with map-style datasets, as they fit in memory and
    have a known size. However, it does not support distributed training
    with `IterableDataset` datasets, because these classes do not implement
    `__len__`. To fix that, a `length` keyword was added to
    `DistributedSampler`; when set, it takes precedence over `len(dataset)`.
    
    An extra `padding` parameter (default `True`) was also added to give
    finer control over whether the sampler pads the returned index list.
    Disabling padding prevents duplicate reads on `IterableDataset` datasets
    that do not fit in memory, or whose data reading or transformation is
    expensive.
    
    Finally, a `set_rank` method was added, similar to the existing
    `set_epoch`, to ease distributed training. When a `DataLoader` is created
    with `num_workers > 0` and `dataset` is an instance of `ChunkDataset`,
    the copy of `DistributedSampler` on each worker needs to be configured
    with its new rank.
    
    This change does not break backward compatibility.
    Thiago Crepaldi committed Oct 29, 2019
    Commit 44b631e
  2. Fix sampler __len__ method

    Thiago Crepaldi committed Oct 29, 2019
    Commit e2ea293
  3. Address comments

    Thiago Crepaldi committed Oct 29, 2019
    Commit 77c2046
  4. Improve error msg

    Thiago Crepaldi committed Oct 29, 2019
    Commit a1ce4d8
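The commit message describes three additions: a `length` keyword that overrides `len(dataset)`, a `padding` flag, and a `set_rank` method. The following is a minimal, framework-free sketch of how such a sampler could behave, not the PR's actual code. `PaddableDistributedSampler` is a hypothetical name, it does not subclass `torch.utils.data.Sampler`, and it shuffles with `random.Random(seed + epoch)` (the `seed` argument is an assumption; upstream `DistributedSampler` at the time seeded a `torch.Generator` with the epoch alone). The partitioning (pad by repeating leading indices, then give rank `r` every `num_replicas`-th index starting at `r`) follows the upstream sampler of late 2019.

```python
import math
import random
from typing import Iterator, List, Optional, Sized


class PaddableDistributedSampler:
    """Hypothetical sketch of a DistributedSampler with `length` and `padding`.

    Standalone Python (no torch): it yields the index subset for one rank.
    """

    def __init__(
        self,
        dataset: Optional[Sized],
        num_replicas: int,
        rank: int,
        shuffle: bool = True,
        length: Optional[int] = None,
        padding: bool = True,
        seed: int = 0,  # assumption: not part of the 2019 upstream signature
    ) -> None:
        if length is None:
            if dataset is None or not hasattr(dataset, "__len__"):
                raise ValueError("pass `length` when the dataset has no __len__")
            length = len(dataset)
        self.length = length  # an explicit `length` takes precedence
        self.num_replicas = num_replicas
        self.shuffle = shuffle
        self.padding = padding
        self.seed = seed
        self.epoch = 0
        self.set_rank(rank)

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def set_rank(self, rank: int) -> None:
        # e.g. reconfigure the sampler copy held by a DataLoader worker
        if not 0 <= rank < self.num_replicas:
            raise ValueError(f"rank {rank} not in [0, {self.num_replicas})")
        self.rank = rank

    def __len__(self) -> int:
        if self.padding:
            return math.ceil(self.length / self.num_replicas)
        # Unpadded: this rank owns indices rank, rank + R, rank + 2R, ... < length
        return len(range(self.rank, self.length, self.num_replicas))

    def __iter__(self) -> Iterator[int]:
        indices: List[int] = list(range(self.length))
        if self.shuffle:
            # Same seed + epoch on every rank, so all ranks share one permutation
            random.Random(self.seed + self.epoch).shuffle(indices)
        if self.padding:
            # Repeat leading indices until the total divides evenly by num_replicas
            missing = len(self) * self.num_replicas - len(indices)
            if missing > 0:
                repeats = math.ceil(missing / len(indices))
                indices += (indices * repeats)[:missing]
        own = indices[self.rank::self.num_replicas]
        assert len(own) == len(self)
        return iter(own)


# A source without __len__ (e.g. an IterableDataset): pass `length` explicitly.
sampler = PaddableDistributedSampler(
    None, num_replicas=4, rank=1, shuffle=False, length=10, padding=False
)
print(list(sampler))  # [1, 5, 9]
```

With `padding=False`, each index is visited exactly once across all ranks, but some ranks may receive one index fewer than others; with the default `padding=True`, every rank gets the same count and a few indices are visited twice.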

Commits on Nov 8, 2019

  1. Fix assert for padding scenarios

    Thiago Crepaldi committed Nov 8, 2019
    Commit 2a96465
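The Nov 8 commit does not say which assert it changed. Assuming it is a per-rank size check similar to upstream's `assert len(indices) == self.num_samples`, the per-rank counts below show why a single expected value only holds when padding is on. Here $N$ is the number of samples (`length` when given, else `len(dataset)`), $R$ is `num_replicas`, and $n_r$ is the number of indices assigned to rank $r$:

```latex
\begin{aligned}
\text{padding on:}  \quad & n_r = \left\lceil N / R \right\rceil \text{ for all } r,
  && \text{duplicates} = R \left\lceil N / R \right\rceil - N \\
\text{padding off:} \quad & n_r = \left\lceil (N - r) / R \right\rceil,\ 0 \le r < R,
  && \textstyle\sum_{r=0}^{R-1} n_r = N \\
N = 10,\ R = 4: \quad & (3, 3, 3, 3) \text{ with 2 duplicates (on)},
  && (3, 3, 2, 2) \text{ (off)}
\end{aligned}
```

So with padding disabled and $R$ not dividing $N$, ranks $r \ge N \bmod R$ receive one index fewer than $\lceil N / R \rceil$, and a check that expects $\lceil N / R \rceil$ on every rank would fail for them.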