This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[numpy] add op matmul #16990

Merged
merged 6 commits into from
Feb 13, 2020
Merged

[numpy] add op matmul #16990

merged 6 commits into from
Feb 13, 2020

Conversation

@JiangZhaoh (Contributor) commented Dec 6, 2019

Description

new op: matmul
Because matmul has a special signature (i.e. (n?,k),(k,m?)->(n?,m?)) in official NumPy, I commented out the following code segment in python/mxnet/numpy/multiarray.py:
if ufunc.signature is not None:
    # we don't support generalised-ufuncs (gufuncs)
    return NotImplemented
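
For reference, a minimal plain-NumPy sketch of what that gufunc signature means (illustrative shapes only; this is not code from the PR):

import numpy as np

# (n?,k),(k,m?)->(n?,m?): the last two axes are the matrix axes,
# all leading axes broadcast like an ordinary elementwise ufunc.
a = np.ones((2, 3, 4))          # a stack of two (3, 4) matrices
b = np.ones((4, 5))             # a single (4, 5) matrix
print(np.matmul(a, b).shape)    # (2, 3, 5)

# the '?' marks optional axes: 1-D operands are promoted to
# matrices and the temporary axis is dropped from the result.
v = np.ones(4)
print(np.matmul(a, v).shape)    # (2, 3)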

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

Review thread on src/operator/numpy/np_matmul_op-inl.h (outdated, resolved)
@haojin2 added this to In progress in numpy via automation Dec 6, 2019
@JiangZhaoh marked this pull request as ready for review December 9, 2019 02:08
@JiangZhaoh requested a review from szha as a code owner December 9, 2019 02:08
@sxjscience (Member) commented Dec 9, 2019 via email:

I think you may call the BatchedGEMMStrided kernel directly: https://devblogs.nvidia.com/cublas-strided-batched-matrix-multiply/

> In src/operator/numpy/np_matmul_op-inl.h:
>
> + * It is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
> + * \param out - output: insert 'value' to 'arr' according to 'index'.
> + * \param a - input: the first argument.
> + * \param b - input: the second argument.
> + * \param ndim - ndim of a, b and output. Because of broadcast, regard their ndim as equal.
> + */
> + template<typename DType>
> + MSHADOW_XINLINE static void Map(int i, DType* out,
> +                                 const DType* a, const DType* b,
> +                                 const mshadow::Shape<10> a_stride,
> +                                 const mshadow::Shape<10> b_stride,
> +                                 const mshadow::Shape<10> out_stride,
> +                                 const mshadow::Shape<10> a_shape,
> +                                 const mshadow::Shape<10> b_shape,
> +                                 const mshadow::Shape<10> out_shape,
> +                                 const size_t ndim) {
>
> [sxjscience] You may refer to the implementation of batch_dot. There is no need to store the strides and shapes as Shape<10>. You can reshape the array to a 3D array and just dot the last two dimensions: https://github.com/apache/incubator-mxnet/blob/1aa1b5a9ab53bb57a3c653793fb824d01f2d5e81/src/operator/tensor/dot-inl.h#L1348-L1409
>
> [JiangZhaoh] Thanks for your advice. But may I ask how I could broadcast the shapes if I don't store strides and shapes? E.g. for matrix A of shape (2, 1, 3, 4, 5) and matrix B of shape (3, 1, 5, 2), C = np.matmul(A, B) would broadcast A and B to shapes (2, 3, 3, 4, 5) and (2, 3, 3, 5, 2) respectively, and C's shape would be (2, 3, 3, 4, 2). If I didn't store shape and stride, I think I would first have to copy the contents of each array to get a consistent shape, and then use your method. Is this what you mean?

@JiangZhaoh (Contributor, Author) commented:

May I confirm what you mean?
First, I broadcast the input arrays to a consistent shape manually by copying their contents.
Then, I use the kernel you mentioned above to implement the forward computation.
Right?
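
(For readers following along, here is a minimal plain-NumPy sketch of the approach being discussed: broadcast the leading batch dimensions, flatten them, and run one batched matrix product. It only illustrates the idea; the PR itself implements this in C++ in np_matmul_op-inl.h.)

import numpy as np

a = np.random.rand(2, 1, 3, 4, 5)
b = np.random.rand(3, 1, 5, 2)

# 1. broadcast the leading (batch) dims to a common shape
batch = np.broadcast(a[..., 0, 0], b[..., 0, 0]).shape     # (2, 3, 3)
a_b = np.broadcast_to(a, batch + a.shape[-2:])             # (2, 3, 3, 4, 5)
b_b = np.broadcast_to(b, batch + b.shape[-2:])             # (2, 3, 3, 5, 2)

# 2. flatten the batch dims and run a batched matrix product
#    (this is the step a strided/batched GEMM kernel would do)
a3 = a_b.reshape((-1,) + a.shape[-2:])                     # (18, 4, 5)
b3 = b_b.reshape((-1,) + b.shape[-2:])                     # (18, 5, 2)
c3 = np.einsum('bij,bjk->bik', a3, b3)                     # (18, 4, 2)

c = c3.reshape(batch + (a.shape[-2], b.shape[-1]))         # (2, 3, 3, 4, 2)
assert np.allclose(c, np.matmul(a, b))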

@sxjscience (Member) left a comment:

After an offline discussion with @haojin2, we think we could add transpose flags (like in batch_dot) to matmul and make it an internal operator, so that we can implement the backward pass of matmul by constructing new matmul nodes.
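
(For context, the reason this construction works: the gradients of a matrix product are themselves matrix products, so backward can be expressed with matmul nodes carrying transpose flags, much like batch_dot's transpose_a/transpose_b. A minimal NumPy sketch of the 2-D case, illustrative only:)

import numpy as np

# For C = matmul(A, B):
#   dA = matmul(dC, B^T)    i.e. a matmul node with transpose_b=True
#   dB = matmul(A^T, dC)    i.e. a matmul node with transpose_a=True
A  = np.random.rand(4, 5)
B  = np.random.rand(5, 2)
dC = np.random.rand(4, 2)       # incoming gradient w.r.t. C

dA = np.matmul(dC, B.T)         # shape (4, 5), matches A
dB = np.matmul(A.T, dC)         # shape (5, 2), matches B

# In the broadcast (stacked) case, the batch axes of dA and dB
# additionally need to be summed over any broadcast dimensions.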

numpy automation moved this from In progress to Needs review Dec 16, 2019
@haojin2 (Contributor) left a comment:

Please fix the sanity check failures and address the open review comments.

@JiangZhaoh force-pushed the add_op_matmul branch 2 times, most recently from e2e90f4 to b4bf7b4, on December 30, 2019 04:09
@haojin2 merged commit f5a1014 into apache:master Feb 13, 2020
numpy automation moved this from Needs review to Done Feb 13, 2020
zheyuye pushed a commit to zheyuye/incubator-mxnet that referenced this pull request Feb 19, 2020
* add op matmul - for interoperability error

fix interoperability error and some unimportant details

remove needless addition

* fix website error

* use BatchedGEMM

* check req type / sanity / logic etc.

* resolve indentation mistakes

* fix doc mistakes

resolve comment
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
Projects: numpy (Done)
Linked issues: none yet
Participants: 3