This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

use mkl sparse matrix to improve performance #14492

Merged
merged 11 commits into apache:master on Apr 13, 2019

Conversation

triplekings
Contributor

Description

Use MKL sparse matrix routines to replace the original implementation and improve performance.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@triplekings triplekings requested a review from szha as a code owner March 21, 2019 14:31
Member

@wkcn wkcn left a comment


Great! Thanks for your contribution!
Is there any test case for the sparse matrix?
Could you please provide the comparison on performance?

@pinaraws

@mxnet-label-bot add[pr-awaiting-response]

@marcoabreu marcoabreu added the pr-awaiting-response PR is reviewed and waiting for contributor to respond label Mar 25, 2019
@pinaraws

@mxnet-label-bot add[Sparse, MKL]

@triplekings
Contributor Author

triplekings commented Mar 28, 2019

Great! Thanks for your contribution!
Is there any test case for the sparse matrix?
Could you please provide the comparison on performance?

Thanks for your comments.

Test cases:
tests/python/unittest/test_sparse_ndarray.py
tests/python/unittest/test_sparse_operator.py

Using MKL, sparse dot speeds up by about 1.3x.

With mkl sparse
INFO:logger:Run [7812] Batches Speed: 941008.89 samples/sec
(mxnet_venv) [root@VNNI-SDP86 scripts_zhenlin]# python myProfileParser.py
Time of each OP:
_contrib_quantized_fully_connected, 4162.701, ms, 0.1776199436763953 , ms/call, 23436, calls, 54.45 %
Concat , 870.905 , ms, 0.11148297491039426 , ms/call, 7812 , calls, 11.39 %
dot , 554.544 , ms, 0.07098617511520737 , ms/call, 7812 , calls, 7.25 %
elemwise_add , 394.928 , ms, 0.05055401945724526 , ms/call, 7812 , calls, 5.17 %
_contrib_quantize , 321.782 , ms, 0.04119073220686124 , ms/call, 7812 , calls, 4.21 %
SoftmaxOutput , 309.333 , ms, 0.03959715821812596 , ms/call, 7812 , calls, 4.05 %
ParallelEmbedding , 266.09 , ms, 0.03406169994879672 , ms/call, 7812 , calls, 3.48 %
CopyCPU2CPU , 244.173 , ms, 0.010418714797747057 , ms/call, 23436, calls, 3.19 %
broadcast_add , 177.62 , ms, 0.022736815156169994 , ms/call, 7812 , calls, 2.32 %
SliceChannel , 151.811 , ms, 0.01943305171530978 , ms/call, 7812 , calls, 1.99 %
slice , 124.944 , ms, 0.007996927803379416 , ms/call, 15624, calls, 1.63 %
DeleteVariable , 66.723 , ms, 0.0021352726574500767, ms/call, 31248, calls, 0.87 %

With mxnet sparse
INFO:logger:Run [7812] Batches Speed: 915862.99 samples/sec
(mxnet_venv) [root@VNNI-SDP86 scripts_zhenlin]# python myProfileParser.py
Time of each OP:
_contrib_quantized_fully_connected, 4235.11, ms, 0.1807095920805598 , ms/call, 23436, calls, 53.72 %
Concat , 879.878, ms, 0.112631592421915 , ms/call, 7812 , calls, 11.16 %
dot , 724.61 , ms, 0.09275601638504864 , ms/call, 7812 , calls, 9.19 %
elemwise_add , 404.946, ms, 0.05183640552995392 , ms/call, 7812 , calls, 5.14 %
_contrib_quantize , 332.088, ms, 0.0425099846390169 , ms/call, 7812 , calls, 4.21 %
SoftmaxOutput , 280.663, ms, 0.03592716333845366 , ms/call, 7812 , calls, 3.56 %
ParallelEmbedding , 271.335, ms, 0.034733102918586785 , ms/call, 7812 , calls, 3.44 %
CopyCPU2CPU , 244.757, ms, 0.01044363372589179 , ms/call, 23436, calls, 3.10 %
broadcast_add , 181.36 , ms, 0.02321556579621096 , ms/call, 7812 , calls, 2.30 %
SliceChannel , 153.897, ms, 0.019700076804915513 , ms/call, 7812 , calls, 1.95 %
slice , 124.163, ms, 0.007946940604198668 , ms/call, 15624, calls, 1.57 %
DeleteVariable , 50.809 , ms, 0.0021679894179894178, ms/call, 23436, calls, 0.64 %

Total OP Time: 7883.61600000 ms
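The `dot` rows in the profiles above correspond to MXNet's csr*dense kernel. As a rough, self-contained sketch of what is being timed, here is the equivalent CSR-times-dense product in SciPy (this is not MXNet's MKL path; the shapes follow the Wide & Deep workload discussed in this PR, and the density is an assumption for illustration):

```python
import time
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Illustrative Wide & Deep shapes: sparse (batch x features) CSR times a dense weight.
csr = sp.random(1024, 26000, density=0.001, format="csr", random_state=0)
dense = rng.standard_normal((26000, 256))

t0 = time.perf_counter()
for _ in range(10):
    out = csr @ dense  # CSR x dense -> dense, analogous to mx.nd.sparse.dot(csr_data, weight)
elapsed = (time.perf_counter() - t0) / 10
print(f"avg csr*dense time: {elapsed * 1e3:.3f} ms, out shape: {out.shape}")
```

A benchmark along these lines, run once against a stock build and once against an MKL-linked build, is how the per-op numbers above can be compared.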

Member

@wkcn wkcn left a comment


LGTM : ) Thanks for your contribution!
Don't forget to add your name into CONTRIBUTORS.md : )

@wkcn
Member

wkcn commented Mar 28, 2019

@pengzhao-intel @TaoLv
Would you mind reviewing this PR? Thank you!

@pengzhao-intel
Contributor

cc @eric-haibin-lin who is the author of the sparse operator.

@wkcn
Member

wkcn commented Apr 4, 2019

@eric-haibin-lin Hi! Would you mind reviewing this PR? Thank you!

3rdparty/sparse-matrix/sparse_matrix.cc (two outdated review threads, resolved)



bool mkl_DotCsrDnsDns(SP_INT64* rows_start, SP_INT64* col_indx,
Member


Why does it land in the 3rdparty directory, instead of src/operator/?

Member


Is there future plan to support other sparse matrix ops, like dense + csr, csr + csr, csr * csr (elemwise)?

Contributor


@eric-haibin-lin
Those can be supported if requested; for example, MKL provides mkl_sparse_spmm (s, d, c, z), which computes the product of two sparse matrices and stores the result as a sparse matrix.
This patch is merged first because Wide & Deep uses sparse dot dense matrix.
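For reference, the csr*csr case that mkl_sparse_spmm covers can be sketched with SciPy (illustrative only, not the MKL call itself; shapes and densities are made up):

```python
import scipy.sparse as sp

# Two random CSR matrices; their product stays sparse, as with mkl_sparse_spmm.
a = sp.random(100, 200, density=0.05, format="csr", random_state=1)
b = sp.random(200, 50, density=0.05, format="csr", random_state=2)
c = a @ b  # sparse x sparse -> sparse
print(type(c).__name__, c.shape)
```

This is the elemwise/spmm family the reviewer asks about; this PR only wires up the csr-times-dense (spmv/spmm with a dense right-hand side) path.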

Contributor Author


Why does it land in the 3rdparty directory, instead of src/operator/?

This is a quick solution for Wide & Deep. Moving the code into src/operator requires changes in mshadow, and more tests are needed. I will do that as the next step.

Member


Thanks for the explanation. Why does mshadow have to be changed?

Contributor


mshadow uses the 32-bit (LP64) MKL interface; see the MKL link line advisor for the ILP64 settings:
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
and the makefile needs to change accordingly:

diff --git a/make/mshadow.mk b/make/mshadow.mk
index 86155ea..d0b731e 100644
--- a/make/mshadow.mk
+++ b/make/mshadow.mk
@@ -82,7 +82,7 @@ ifneq ($(USE_INTEL_PATH), NONE)
 	MSHADOW_LDFLAGS += -L$(USE_INTEL_PATH)/compiler/lib/intel64
 	MSHADOW_LDFLAGS += -L$(USE_INTEL_PATH)/lib/intel64
 endif
-	MSHADOW_CFLAGS += -I$(USE_INTEL_PATH)/mkl/include
+	MSHADOW_CFLAGS += -I$(USE_INTEL_PATH)/mkl/include -DMKL_ILP64 -m64
 endif
 ifneq ($(USE_STATIC_MKL), NONE)
 ifeq ($(USE_INTEL_PATH), NONE)
@@ -90,7 +90,7 @@ ifeq ($(USE_INTEL_PATH), NONE)
 else
 	MKLROOT = $(USE_INTEL_PATH)/mkl
 endif
-	MSHADOW_LDFLAGS += -L${MKLROOT}/../compiler/lib/intel64 -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -liomp5 -ldl -lpthread -lm
+	MSHADOW_LDFLAGS += -L${MKLROOT}/../compiler/lib/intel64 -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -liomp5 -ldl -lpthread -lm
 else
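The LP64/ILP64 distinction above is about integer index width: an ILP64 MKL build expects 64-bit indices (the SP_INT64 type in this PR's mkl_DotCsrDnsDns signature). A small NumPy/SciPy sketch of what widening CSR index arrays looks like on the caller side (illustrative; MKL itself is not invoked here):

```python
import numpy as np
import scipy.sparse as sp

m = sp.random(10, 10, density=0.3, format="csr", random_state=0)
# SciPy builds narrow (LP64-style, 32-bit) index arrays when they fit:
print(m.indptr.dtype, m.indices.dtype)
# An ILP64 MKL build instead expects 64-bit index arrays:
indptr64 = m.indptr.astype(np.int64)
indices64 = m.indices.astype(np.int64)
print(indptr64.dtype, indices64.dtype)
```

This is why the build flags (`-DMKL_ILP64`, `libmkl_intel_ilp64.a`) and the call sites have to agree on the index type.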

Member


Looking forward to the follow-up PR.

src/operator/tensor/dot-inl.h (review thread, resolved)
@eric-haibin-lin
Member

@triplekings could you also share the data shape and nnz distribution of the input you benchmarked with (whether nonzeros are power-law or uniformly distributed)?

@pengzhao-intel pengzhao-intel requested review from eric-haibin-lin and removed request for TaoLv and pengzhao-intel April 9, 2019 03:46
@glingyan
Contributor

glingyan commented Apr 9, 2019

The model is Wide & Deep:
https://github.com/apache/incubator-mxnet/blob/master/example/sparse/wide_deep/model.py
dot = mx.symbol.sparse.dot(csr_data, weight)
dense matrix: (26000, 256)
sparse matrix: (1024, 26000)
The distribution depends on the dataset.
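The nnz-distribution question can be made concrete with a small sketch that builds CSR inputs with a uniform versus a power-law row-nnz pattern (purely illustrative data, not the actual Wide & Deep dataset; the helper and its parameters are made up for this example):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
rows, cols = 1024, 26000

def csr_with_row_nnz(nnz_per_row):
    # Build a CSR matrix with a prescribed number of nonzeros in each row,
    # placing them in random columns.
    indptr = np.concatenate([[0], np.cumsum(nnz_per_row)]).astype(np.int64)
    indices = np.concatenate(
        [rng.choice(cols, size=n, replace=False) for n in nnz_per_row]
    )
    data = np.ones(indptr[-1])
    return sp.csr_matrix((data, indices, indptr), shape=(rows, cols))

# Uniform: every row has the same nnz; power-law: a few heavy rows dominate.
uniform = csr_with_row_nnz(np.full(rows, 26))
powerlaw = csr_with_row_nnz(np.minimum(rng.zipf(2.0, size=rows), cols))
print(uniform.nnz, powerlaw.nnz)
```

Skew like this matters for a sparse dot kernel because heavy rows create load imbalance across threads, which is one reason the reviewer asks how the benchmark's nonzeros were distributed.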

Member

@eric-haibin-lin eric-haibin-lin left a comment


Is the Windows failure related?

@triplekings
Contributor Author

The Windows failure is due to unstable CI, not this PR. I have retriggered it.

@eric-haibin-lin eric-haibin-lin merged commit c437d5b into apache:master Apr 13, 2019
pengzhao-intel added a commit that referenced this pull request Apr 25, 2019
TaoLv pushed a commit that referenced this pull request Apr 30, 2019
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* use mkl sparse matrix to improve performance

* fix build fail issue

* add 3rdparty/sparse matrix  in Makefile

* add macro for variable

* fix lib not find error

* fix gpu R test error

* fix Mac build error

* add lib/libsparse_matrix.so to CI

* fix indentation

* retrigger CI
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Labels
MKL pr-awaiting-response PR is reviewed and waiting for contributor to respond Sparse

8 participants