Performance improvement in ToTensor GPU Kernel #14099

sandeep-krishnamurthy · 2019-02-08T22:01:31Z

Description

Previously, the ToTensor operator used the Kernel Launch/Map approach so that the CPU and GPU paths could share one implementation. However, I observed that this launches too many threads and blocks on the GPU, which significantly hurts performance.

To overcome this, I wrote a separate CUDA kernel for the GPU path and moved the operator off Kernel Launch/Map.
Benchmarks are below.
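For readers unfamiliar with the operator, here is a minimal C++ sketch of the computation ToTensor performs on a 3D input: it takes an (H, W, C) image with values in [0, 255] and produces a (C, H, W) float tensor scaled to [0, 1]. The function name `ToTensor3D` and the flat-buffer layout are assumptions made for illustration and are not taken from this PR. A dedicated CUDA kernel would typically distribute the same index mapping across threads instead of looping.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference implementation (not the PR's code).
// Input:  HWC layout, uint8 values in [0, 255].
// Output: CHW layout, float values in [0, 1].
std::vector<float> ToTensor3D(const std::vector<std::uint8_t>& in,
                              std::size_t height, std::size_t width,
                              std::size_t channels) {
  std::vector<float> out(in.size());
  for (std::size_t c = 0; c < channels; ++c) {
    for (std::size_t h = 0; h < height; ++h) {
      for (std::size_t w = 0; w < width; ++w) {
        // Source index in HWC order, destination index in CHW order.
        const std::size_t src = (h * width + w) * channels + c;
        const std::size_t dst = (c * height + h) * width + w;
        out[dst] = static_cast<float>(in[src]) / 255.0f;
      }
    }
  }
  return out;
}
```

Each output element depends on exactly one input element, so the work is naturally parallel and the number of GPU threads only needs to cover the element count.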

Benchmarks

Ran 1000 ToTensor operations on an input of shape (512, 512, 3).

GPU
Before

('Average time per ToTensor 512,512,3 - ', 39.17948246002197)

After

('Average time per ToTensor 512,512,3 - ', 0.44632863998413086)

CPU

Before
('Average time per ToTensor 512,512,3 - ', 3.7258052825927734)
After
('Average time per ToTensor 512,512,3 - ', 1.8473007678985596)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Code is well-documented:
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Remove Kernel Launch/Map for the ToTensor operator
Make an independent kernel for CPU ToTensor
Add 2 separate CUDA kernels for the ToTensor operator (3D and 4D inputs)
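
The 4D case and the OpenMP guard mentioned in the commit list can be illustrated with a rough C++ sketch. It assumes a batched (N, H, W, C) input mapped to (N, C, H, W), and it assumes the Windows guard is needed because MSVC's default OpenMP 2.0 support lacks the `collapse` clause. The function name `ToTensor4D`, the `_MSC_VER` check, and the loop structure are hypothetical and not copied from the PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical batched sketch (not the PR's code).
// Input:  NHWC uint8 in [0, 255]. Output: NCHW float in [0, 1].
std::vector<float> ToTensor4D(const std::vector<std::uint8_t>& in,
                              int batch, int height, int width, int channels) {
  std::vector<float> out(in.size());
  const int image_size = height * width * channels;
  // Signed loop indices keep the pragmas valid under OpenMP 2.0 as well.
#if defined(_MSC_VER)
  #pragma omp parallel for
#else
  #pragma omp parallel for collapse(2)
#endif
  for (int n = 0; n < batch; ++n) {
    for (int c = 0; c < channels; ++c) {
      for (int h = 0; h < height; ++h) {
        for (int w = 0; w < width; ++w) {
          // Offset by the image index, then remap HWC to CHW inside it.
          const int src = n * image_size + (h * width + w) * channels + c;
          const int dst = n * image_size + (c * height + h) * width + w;
          out[dst] = static_cast<float>(in[src]) / 255.0f;
        }
      }
    }
  }
  return out;
}
```

When compiled without OpenMP the pragmas are ignored and the loops run serially, so the result is the same either way.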

@zhreshold @stu1130

vandanavk · 2019-02-08T22:18:13Z

@mxnet-label-bot add [pr-work-in-progress, Operator, Performance]

src/operator/image/image_random-inl.h

zhreshold · 2019-02-11T19:54:57Z

LGTM now, thanks for the efforts!

* CPU implementation without Kernel launch/map
* Optimal CUDA support for 3D ToTensor operator
* Add CUDA kernel for 4D inputs
* Fix failing CPU tests for totensor
* disable warning on windows
* try fix in instance norm windows build failure
* Guard omp parallel collapse for windows
* Remove warning supression to check if it is ok
* fix lint issues
* Address code review comments

marcoabreu added the Operator, Performance, and pr-work-in-progress labels Feb 8, 2019

sandeep-krishnamurthy changed the title ~~[WIP] Performance improvement in ToTensor GPU Kernel~~ Performance improvement in ToTensor GPU Kernel Feb 9, 2019

sandeep-krishnamurthy added the pr-awaiting-review label and removed the pr-work-in-progress label Feb 9, 2019

sandeep-krishnamurthy added 4 commits February 9, 2019 20:44

CPU implementation without Kernel launch/map

7f09cd1

Optimal CUDA support for 3D ToTensor operator

68eb95d

Add CUDA kernel for 4D inputs

d1c6faa

Fix failing CPU tests for totensor

caa489f

sandeep-krishnamurthy force-pushed the totensor_gpu_perf branch from 00d95bb to caa489f Compare February 10, 2019 04:45

sandeep-krishnamurthy added 5 commits February 9, 2019 22:59

disable warning on windows

d58f50c

try fix in instance norm windows build failure

41b7a8c

Guard omp parallel collapse for windows

5b8732a

Remove warning supression to check if it is ok

9562f57

fix lint issues

fc630eb

zhreshold suggested changes Feb 11, 2019

src/operator/image/image_random-inl.h

Address code review comments

58c6801

zhreshold approved these changes Feb 11, 2019

sandeep-krishnamurthy merged commit ab5a0cf into apache:master Feb 11, 2019

sandeep-krishnamurthy mentioned this pull request Feb 12, 2019

Performance improvement in Normalize GPU Kernel #14139

Merged

4 tasks
