This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add pin_device_id option to Gluon DataLoader #14136

Merged
merged 3 commits into apache:master from yuxihu:data_loader on Feb 13, 2019

Conversation

yuxihu
Member

@yuxihu yuxihu commented Feb 12, 2019

This PR adds a new option, pin_device_id, to the Gluon DataLoader. When pin_memory is True, pin_device_id determines the device on which pinned memory is allocated. This option is needed if we want to use pinned memory in the DataLoader for distributed training with MXNet and Horovod; otherwise, multiple training processes allocate memory on a single device and cause an out-of-memory error. The default value for pin_device_id is 0, which matches the current behavior.
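A minimal sketch of how the new option is meant to be used (the toy dataset is purely for illustration; with pin_memory=True, pin_device_id selects which device the page-locked host memory is associated with):

```python
import mxnet as mx
from mxnet.gluon.data import ArrayDataset, DataLoader

# toy dataset, for illustration only
dataset = ArrayDataset(mx.nd.ones((100, 3)), mx.nd.zeros(100))

loader = DataLoader(dataset, batch_size=10,
                    pin_memory=True,   # copy each batch into pinned (page-locked) host memory
                    pin_device_id=1)   # associate the pinned memory with device 1 instead of the default 0
```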

@yuxihu yuxihu requested a review from szha as a code owner February 12, 2019 21:32
@yuxihu
Member Author

yuxihu commented Feb 12, 2019

@apeforest @eric-haibin-lin @zhreshold Please review. Thanks.

@yuxihu
Member Author

yuxihu commented Feb 12, 2019

@mxnet-label-bot add [Gluon, pr-awaiting-review]

@marcoabreu marcoabreu added Gluon pr-awaiting-review PR is waiting for code review labels Feb 12, 2019
@szha szha requested a review from zhreshold February 12, 2019 21:53
Member

@roywei roywei left a comment


Thanks for the contribution. I recently had some issues with the Gluon DataLoader and found pin_memory useful. Could you share how to use the pin_memory and pin_device_id options together? Thanks!

@zhreshold
Member

@yuxihu Does it make any difference when you specify device_id for cpu_pinned context?

@yuxihu
Member Author

yuxihu commented Feb 12, 2019

@roywei If your script runs with a single process, you can just set pin_memory=True without worrying about pin_device_id, whose default value is 0. If you have multiple processes, you should set pin_device_id so that each process uses a different device, to avoid out-of-memory errors. One such use case is distributed training using MXNet with Horovod, where you can set pin_device_id=hvd.local_rank(), similar to the usage of ImageRecordIter here.
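A hedged sketch of the multi-process case described above (hvd.init() and hvd.local_rank() are the standard Horovod APIs; the dataset and loop are illustrative only):

```python
import mxnet as mx
import horovod.mxnet as hvd
from mxnet.gluon.data import ArrayDataset, DataLoader

hvd.init()

dataset = ArrayDataset(mx.nd.ones((1000, 3)), mx.nd.zeros(1000))
loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    pin_memory=True,
                    pin_device_id=hvd.local_rank())  # each process pins against its own device

for data, label in loader:
    # move the pinned batch onto this process's GPU before training
    data = data.as_in_context(mx.gpu(hvd.local_rank()))
    label = label.as_in_context(mx.gpu(hvd.local_rank()))
```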

@yuxihu
Member Author

yuxihu commented Feb 12, 2019

@zhreshold In the Horovod case, each training process is attached to a GPU. If we do not specify device_id for the cpu_pinned context, all processes will use the memory on GPU 0 (because the default device_id for the cpu_pinned context is 0) and cause an out-of-memory error. I had a similar enhancement for ImageRecordIter.
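For reference, a minimal sketch of the context behavior in question (assumes an MXNet build with CUDA support so pinned memory is actually backed by a device):

```python
import mxnet as mx

default_ctx = mx.cpu_pinned()   # device_id defaults to 0
other_ctx = mx.cpu_pinned(3)    # pin against device 3 instead

a = mx.nd.zeros((2, 2), ctx=default_ctx)
b = mx.nd.zeros((2, 2), ctx=other_ctx)
print(a.context, b.context)     # cpu_pinned(0) cpu_pinned(3)
```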

Member

@eric-haibin-lin eric-haibin-lin left a comment


Can we have a basic unit test to check the output context?

@zhreshold
Member

@yuxihu Thanks for the clarification and the reference link to the other merged feature. LGTM

@yuxihu
Member Author

yuxihu commented Feb 12, 2019

@eric-haibin-lin Let me try to add one.
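A rough sketch of the kind of test being suggested (names and shapes are illustrative, not the test that was eventually committed):

```python
import mxnet as mx
from mxnet.gluon.data import ArrayDataset, DataLoader

def test_dataloader_pin_device_id():
    dataset = ArrayDataset(mx.nd.ones((10, 3)), mx.nd.zeros(10))
    loader = DataLoader(dataset, batch_size=2, pin_memory=True, pin_device_id=0)
    for data, label in loader:
        # with pin_memory=True, batches should come back in the pinned-CPU context
        assert data.context == mx.cpu_pinned(0)
        assert label.context == mx.cpu_pinned(0)
```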

Contributor

@ChaiBapchya ChaiBapchya left a comment


Thanks for your contribution.
Once the unit test is added and passes all CI tests, LGTM!
Also, thanks for the explanation of how to use pin_device_id; I wonder if this could be documented somewhere for easy reference.

@yuxihu
Member Author

yuxihu commented Feb 13, 2019

@marcoabreu Looks like the windows-gpu run passed but hung. Do I have to retrigger the CI?

@yuxihu
Member Author

yuxihu commented Feb 13, 2019

@eric-haibin-lin Please help review and merge. Thanks.

@yuxihu
Member Author

yuxihu commented Feb 13, 2019

@mxnet-label-bot update [Gluon, pr-awaiting-merge]

@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Feb 13, 2019
@eric-haibin-lin
Member

lgtm

Contributor

@apeforest apeforest left a comment


LGTM

@zhreshold zhreshold merged commit 0b1761f into apache:master Feb 13, 2019
@yuxihu yuxihu deleted the data_loader branch February 13, 2019 21:56
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
* add pin_device_id option to DataLoader

* add unit test to check output context

* trigger CI
jessr92 pushed a commit to jessr92/incubator-mxnet that referenced this pull request Feb 19, 2019
* add pin_device_id option to DataLoader

* add unit test to check output context

* trigger CI
drivanov pushed a commit to drivanov/incubator-mxnet that referenced this pull request Mar 4, 2019
* add pin_device_id option to DataLoader

* add unit test to check output context

* trigger CI
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
* add pin_device_id option to DataLoader

* add unit test to check output context

* trigger CI
yuxihu added a commit to yuxihu/incubator-mxnet that referenced this pull request Apr 22, 2019
* add pin_device_id option to DataLoader

* add unit test to check output context

* trigger CI
szha pushed a commit that referenced this pull request Apr 23, 2019
* add pin_device_id option to DataLoader

* add unit test to check output context

* trigger CI
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* add pin_device_id option to DataLoader

* add unit test to check output context

* trigger CI