This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

use mkl sparse matrix to improve performance #14492

Merged
merged 11 commits into apache:master on Apr 13, 2019

Conversation

triplekings
Contributor

Description

Use MKL sparse matrix routines to replace the original implementation and improve performance.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@triplekings triplekings requested a review from szha as a code owner March 21, 2019 14:31
Member

@wkcn wkcn left a comment


Great! Thanks for your contribution!
Is there any test case for the sparse matrix?
Could you please provide the comparison on performance?

@pinaraws

@mxnet-label-bot add[pr-awaiting-response]

@marcoabreu marcoabreu added the pr-awaiting-response PR is reviewed and waiting for contributor to respond label Mar 25, 2019
@pinaraws

@mxnet-label-bot add[Sparse, MKL]

@triplekings
Contributor Author

triplekings commented Mar 28, 2019

Great! Thanks for your contribution!
Is there any test case for the sparse matrix?
Could you please provide the comparison on performance?

Thanks for your comments.

Test cases:
tests/python/unittest/test_sparse_ndarray.py
tests/python/unittest/test_sparse_operator.py

Using MKL, sparse dot speeds up by about 1.3x.

With mkl sparse
INFO:logger:Run [7812] Batches Speed: 941008.89 samples/sec
(mxnet_venv) [root@VNNI-SDP86 scripts_zhenlin]# python myProfileParser.py
Time of each OP:
_contrib_quantized_fully_connected, 4162.701, ms, 0.1776199436763953 , ms/call, 23436, calls, 54.45 %
Concat , 870.905 , ms, 0.11148297491039426 , ms/call, 7812 , calls, 11.39 %
dot , 554.544 , ms, 0.07098617511520737 , ms/call, 7812 , calls, 7.25 %
elemwise_add , 394.928 , ms, 0.05055401945724526 , ms/call, 7812 , calls, 5.17 %
_contrib_quantize , 321.782 , ms, 0.04119073220686124 , ms/call, 7812 , calls, 4.21 %
SoftmaxOutput , 309.333 , ms, 0.03959715821812596 , ms/call, 7812 , calls, 4.05 %
ParallelEmbedding , 266.09 , ms, 0.03406169994879672 , ms/call, 7812 , calls, 3.48 %
CopyCPU2CPU , 244.173 , ms, 0.010418714797747057 , ms/call, 23436, calls, 3.19 %
broadcast_add , 177.62 , ms, 0.022736815156169994 , ms/call, 7812 , calls, 2.32 %
SliceChannel , 151.811 , ms, 0.01943305171530978 , ms/call, 7812 , calls, 1.99 %
slice , 124.944 , ms, 0.007996927803379416 , ms/call, 15624, calls, 1.63 %
DeleteVariable , 66.723 , ms, 0.0021352726574500767, ms/call, 31248, calls, 0.87 %

With mxnet sparse
INFO:logger:Run [7812] Batches Speed: 915862.99 samples/sec
(mxnet_venv) [root@VNNI-SDP86 scripts_zhenlin]# python myProfileParser.py
Time of each OP:
_contrib_quantized_fully_connected, 4235.11, ms, 0.1807095920805598 , ms/call, 23436, calls, 53.72 %
Concat , 879.878, ms, 0.112631592421915 , ms/call, 7812 , calls, 11.16 %
dot , 724.61 , ms, 0.09275601638504864 , ms/call, 7812 , calls, 9.19 %
elemwise_add , 404.946, ms, 0.05183640552995392 , ms/call, 7812 , calls, 5.14 %
_contrib_quantize , 332.088, ms, 0.0425099846390169 , ms/call, 7812 , calls, 4.21 %
SoftmaxOutput , 280.663, ms, 0.03592716333845366 , ms/call, 7812 , calls, 3.56 %
ParallelEmbedding , 271.335, ms, 0.034733102918586785 , ms/call, 7812 , calls, 3.44 %
CopyCPU2CPU , 244.757, ms, 0.01044363372589179 , ms/call, 23436, calls, 3.10 %
broadcast_add , 181.36 , ms, 0.02321556579621096 , ms/call, 7812 , calls, 2.30 %
SliceChannel , 153.897, ms, 0.019700076804915513 , ms/call, 7812 , calls, 1.95 %
slice , 124.163, ms, 0.007946940604198668 , ms/call, 15624, calls, 1.57 %
DeleteVariable , 50.809 , ms, 0.0021679894179894178, ms/call, 23436, calls, 0.64 %

Total OP Time: 7883.61600000 ms
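The `dot` rows in the profiles above correspond to MXNet's csr*dense kernel. As a rough, self-contained sketch of what is being timed, here is the equivalent CSR-times-dense product in SciPy (this is not MXNet's MKL path; the shapes follow the Wide & Deep workload discussed in this PR, and the density is an assumption for illustration):

```python
import time
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Illustrative Wide & Deep shapes: sparse (batch x features) CSR times a dense weight.
csr = sp.random(1024, 26000, density=0.001, format="csr", random_state=0)
dense = rng.standard_normal((26000, 256))

t0 = time.perf_counter()
for _ in range(10):
    out = csr @ dense  # CSR x dense -> dense, analogous to mx.nd.sparse.dot(csr_data, weight)
elapsed = (time.perf_counter() - t0) / 10
print(f"avg csr*dense time: {elapsed * 1e3:.3f} ms, out shape: {out.shape}")
```

A benchmark along these lines, run once against a stock build and once against an MKL-linked build, is how the per-op numbers above can be compared.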

Member

@wkcn wkcn left a comment


LGTM : ) Thanks for your contribution!
Don't forget to add your name into CONTRIBUTORS.md : )

@wkcn
Member

wkcn commented Mar 28, 2019

@pengzhao-intel @TaoLv
Would you mind reviewing this PR? Thank you!

@pengzhao-intel
Contributor

cc @eric-haibin-lin who is the author of the sparse operator.

@wkcn
Member

wkcn commented Apr 4, 2019

@eric-haibin-lin Hi! Would you mind reviewing this PR? Thank you!

3rdparty/sparse-matrix/sparse_matrix.cc (two outdated review threads, resolved)



bool mkl_DotCsrDnsDns(SP_INT64* rows_start, SP_INT64* col_indx,
Member


Why does it land in the 3rdparty directory, instead of src/operator/?

Member


Is there future plan to support other sparse matrix ops, like dense + csr, csr + csr, csr * csr (elemwise)?

Contributor


@eric-haibin-lin
Those can be supported if requested; for example, MKL provides mkl_sparse_spmm (s, d, c, z), which computes the product of two sparse matrices and stores the result as a sparse matrix.
This patch is merged first because Wide & Deep uses sparse dot dense matrix.
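For reference, the csr*csr case that mkl_sparse_spmm covers can be sketched with SciPy (illustrative only, not the MKL call itself; shapes and densities are made up):

```python
import scipy.sparse as sp

# Two random CSR matrices; their product stays sparse, as with mkl_sparse_spmm.
a = sp.random(100, 200, density=0.05, format="csr", random_state=1)
b = sp.random(200, 50, density=0.05, format="csr", random_state=2)
c = a @ b  # sparse x sparse -> sparse
print(type(c).__name__, c.shape)
```

This is the elemwise/spmm family the reviewer asks about; this PR only wires up the csr-times-dense (spmv/spmm with a dense right-hand side) path.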

Contributor Author


Why does it land in the 3rdparty directory, instead of src/operator/?

This is a quick solution for Wide & Deep. Moving the code into src/operator requires changes in mshadow, and more tests are needed. I will do that as the next step.

Member


Thanks for the explanation. Why does mshadow have to be changed?

Contributor


mshadow uses the 32-bit (LP64) MKL interface; see the MKL link line advisor for the ILP64 settings:
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
and the makefile needs to change accordingly:

diff --git a/make/mshadow.mk b/make/mshadow.mk
index 86155ea..d0b731e 100644
--- a/make/mshadow.mk
+++ b/make/mshadow.mk
@@ -82,7 +82,7 @@ ifneq ($(USE_INTEL_PATH), NONE)
 	MSHADOW_LDFLAGS += -L$(USE_INTEL_PATH)/compiler/lib/intel64
 	MSHADOW_LDFLAGS += -L$(USE_INTEL_PATH)/lib/intel64
 endif
-	MSHADOW_CFLAGS += -I$(USE_INTEL_PATH)/mkl/include
+	MSHADOW_CFLAGS += -I$(USE_INTEL_PATH)/mkl/include -DMKL_ILP64 -m64
 endif
 ifneq ($(USE_STATIC_MKL), NONE)
 ifeq ($(USE_INTEL_PATH), NONE)
@@ -90,7 +90,7 @@ ifeq ($(USE_INTEL_PATH), NONE)
 else
 	MKLROOT = $(USE_INTEL_PATH)/mkl
 endif
-	MSHADOW_LDFLAGS += -L${MKLROOT}/../compiler/lib/intel64 -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -liomp5 -ldl -lpthread -lm
+	MSHADOW_LDFLAGS += -L${MKLROOT}/../compiler/lib/intel64 -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -liomp5 -ldl -lpthread -lm
 else
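The LP64/ILP64 distinction above is about integer index width: an ILP64 MKL build expects 64-bit indices (the SP_INT64 type in this PR's mkl_DotCsrDnsDns signature). A small NumPy/SciPy sketch of what widening CSR index arrays looks like on the caller side (illustrative; MKL itself is not invoked here):

```python
import numpy as np
import scipy.sparse as sp

m = sp.random(10, 10, density=0.3, format="csr", random_state=0)
# SciPy builds narrow (LP64-style, 32-bit) index arrays when they fit:
print(m.indptr.dtype, m.indices.dtype)
# An ILP64 MKL build instead expects 64-bit index arrays:
indptr64 = m.indptr.astype(np.int64)
indices64 = m.indices.astype(np.int64)
print(indptr64.dtype, indices64.dtype)
```

This is why the build flags (`-DMKL_ILP64`, `libmkl_intel_ilp64.a`) and the call sites have to agree on the index type.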

Member


Looking forward to the follow-up PR.

src/operator/tensor/dot-inl.h (review thread, resolved)
@eric-haibin-lin
Member

@triplekings could you also share the data shape and nnz distribution of the input you benchmarked with (whether nonzeros are power-law or uniformly distributed)?

@pengzhao-intel pengzhao-intel requested review from eric-haibin-lin and removed request for TaoLv and pengzhao-intel April 9, 2019 03:46
@glingyan
Contributor

glingyan commented Apr 9, 2019

The model is Wide & Deep:
https://github.com/apache/incubator-mxnet/blob/master/example/sparse/wide_deep/model.py
dot = mx.symbol.sparse.dot(csr_data, weight)
dense matrix: (26000, 256)
sparse matrix: (1024, 26000)
The distribution depends on the dataset.
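The nnz-distribution question can be made concrete with a small sketch that builds CSR inputs with a uniform versus a power-law row-nnz pattern (purely illustrative data, not the actual Wide & Deep dataset; the helper and its parameters are made up for this example):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
rows, cols = 1024, 26000

def csr_with_row_nnz(nnz_per_row):
    # Build a CSR matrix with a prescribed number of nonzeros in each row,
    # placing them in random columns.
    indptr = np.concatenate([[0], np.cumsum(nnz_per_row)]).astype(np.int64)
    indices = np.concatenate(
        [rng.choice(cols, size=n, replace=False) for n in nnz_per_row]
    )
    data = np.ones(indptr[-1])
    return sp.csr_matrix((data, indices, indptr), shape=(rows, cols))

# Uniform: every row has the same nnz; power-law: a few heavy rows dominate.
uniform = csr_with_row_nnz(np.full(rows, 26))
powerlaw = csr_with_row_nnz(np.minimum(rng.zipf(2.0, size=rows), cols))
print(uniform.nnz, powerlaw.nnz)
```

Skew like this matters for a sparse dot kernel because heavy rows create load imbalance across threads, which is one reason the reviewer asks how the benchmark's nonzeros were distributed.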

Member

@eric-haibin-lin eric-haibin-lin left a comment


Is the Windows failure related?

@triplekings
Contributor Author

The Windows failure is due to unstable CI, not this PR. I have retriggered it.

@eric-haibin-lin eric-haibin-lin merged commit c437d5b into apache:master Apr 13, 2019
pengzhao-intel added a commit that referenced this pull request Apr 25, 2019
TaoLv pushed a commit that referenced this pull request Apr 30, 2019
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* use mkl sparse matrix to improve performance

* fix build fail issue

* add 3rdparty/sparse matrix  in Makefile

* add macro for variable

* fix lib not find error

* fix gpu R test error

* fix Mac build error

* add lib/libsparse_matrix.so to CI

* fix indentation

* retrigger CI
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Labels
MKL pr-awaiting-response PR is reviewed and waiting for contributor to respond Sparse

8 participants