[v1.7.x] cherry pick #17741 to v1.7.x #18113

ptrendx · 2020-04-20T16:12:45Z

Vectorized loads for binary elemwise kernel
More generalization
Add backwardusenone
Remove the unused _backward_add op
Add vectorized backwardusein
Extending vectorization to more binary ops, binary ops with scalar and unary ops
Handling ElementwiseSum
Get rid of half2 in mshadow
Remove backward_elemwiseaddex
Revert "Remove the unused _backward_add op"

This reverts commit f86da86.

Revert "Remove backward_elemwiseaddex"

This reverts commit 7729114.

Add back the backward_add since C++ test relies on it
Test bcast implementations
First version of vectorized bcast
Adding single side vectorized bcast kernel
Removing debug prints
Actually run the single side kernel
Move the default implementation of bcast to the vectorized one
Limit the new implementation to GPU only
Enabling vectorization when broadcast does not actually do broadcast
Cleaning
Cleaning part 2
Fix for numpy ops using stuff from broadcast
Fix
Fix lint
Try to debug pinv numpy test
Fix
Fix the vectorized broadcast implementation for misaligned input pointers
Added tests
Added docs to cuda_vectorization.cuh
Another fix for broadcast and fix INT64 compilation
Optimize for aligned=true
1 more addition to test
Reverting the change to Numpy op test
Trying mcmodel=medium to fix the failure in CMake static build
Revert "Trying mcmodel=medium to fix the failure in CMake static build"

This reverts commit 1af684c.

Limiting the PR to just elementwise ops

@ciyongch


mxnet-bot · 2020-04-20T16:12:50Z

Hey @ptrendx, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, centos-gpu, website, edge, windows-gpu, centos-cpu, windows-cpu, unix-cpu, sanity, miscellaneous, unix-gpu]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

ptrendx · 2020-04-20T20:41:21Z

@mxnet-bot run ci [windows-gpu]

mxnet-bot · 2020-04-20T20:41:30Z

Jenkins CI successfully triggered : [windows-gpu]

ciyongch · 2020-04-21T02:07:33Z

@ptrendx thanks for backporting the PR to the v1.7.x branch. I suppose the original PR is #18095 in v1.x and #17767 in master, right?

BTW, do you have any other pending PRs that you would like to have included in the 1.7.0 release?

ptrendx · 2020-04-21T02:10:12Z

I don't have anything else. I know @samskalicky wanted to include his custom graph pass PR.

ptrendx · 2020-04-21T02:11:19Z

Just noticed that I made a mistake in the PR name about which PR I'm backporting, my bad :-P

ciyongch · 2020-04-21T02:18:05Z

Thanks @ptrendx, just want to make sure we have all we need in 1.7.0 :)
If the custom graph pass PR refers to #18069, then it's already merged in both v1.x and v1.7.x.

samskalicky · 2020-04-21T04:13:19Z

> If the custom graph pass PR refers to #18069, then it's already merged in both v1.x and v1.7.x.

Almost done with #17885. As soon as CI passes and I get one more review, I'll backport it to 1.x and 1.7.x.

ChaiBapchya · 2020-09-20T03:02:15Z

@ptrendx can we please rename the PR since it's incorrect? What should be the correct name?

ptrendx merged commit 7c63f56 into apache:v1.7.x Apr 21, 2020
