Interleaved MHA for CPU path #17138

TaoLv · 2019-12-21T08:46:39Z · edited by junrushao

Description

This PR adds the CPU counterpart of PR #16408.
Only FP32 is supported so far.
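For readers unfamiliar with the "interleaved" layout, here is a minimal pure-Python sketch of the self-attention Q·K step. It is not the code in this PR. It assumes the input is shaped (seq_len, batch, num_heads * 3 * head_dim), with each head's query, key and value values stored back to back, and that the scores are scaled by 1/sqrt(head_dim). The function name and the nested-list representation are illustrative only.

```python
import math


def interleaved_selfatt_qk(qkv, num_heads):
    """Scaled Q.K^T scores from an interleaved QKV projection.

    qkv: nested lists shaped (seq_len, batch, num_heads * 3 * head_dim).
    Assumed layout per head: head_dim query values, then head_dim key
    values, then head_dim value values (the value part is unused here).
    Returns nested lists shaped (batch * num_heads, seq_len, seq_len).
    """
    seq_len, batch = len(qkv), len(qkv[0])
    head_dim = len(qkv[0][0]) // (num_heads * 3)
    scale = 1.0 / math.sqrt(head_dim)
    scores = []
    for b in range(batch):
        for h in range(num_heads):
            q_off = h * 3 * head_dim   # start of this head's query slice
            k_off = q_off + head_dim   # the key slice follows immediately
            block = []
            for i in range(seq_len):
                q = qkv[i][b][q_off:q_off + head_dim]
                row = []
                for j in range(seq_len):
                    k = qkv[j][b][k_off:k_off + head_dim]
                    row.append(scale * sum(x * y for x, y in zip(q, k)))
                block.append(row)
            scores.append(block)
    return scores


# seq_len=2, batch=1, one head with head_dim=2; the trailing 9.0s are V values.
example = [[[1.0, 0.0, 0.0, 1.0, 9.0, 9.0]],
           [[0.0, 2.0, 1.0, 0.0, 9.0, 9.0]]]
print(interleaved_selfatt_qk(example, num_heads=1))
```

Because Q and K are read directly from the interleaved buffer by offset, no separate split or transpose of the projection is needed before the dot products.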

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

TaoLv · 2019-12-21T08:49:53Z

@eric-haibin-lin @pengzhao-intel @Caenorst Please help review. I copied the UTs from test_operator_gpu.py to test_operator.py but didn't remove the original ones, since it seems they have minimal cuda version requirement. Let me know if that's not appropriate. Thanks!

eric-haibin-lin

could you elaborate "it seems they have minimal cuda version requirement" and why we duplicate the test?

src/operator/contrib/transformer.cc

TaoLv · 2019-12-23T02:07:51Z

could you elaborate "it seems they have minimal cuda version requirement" and why we duplicate the test?

Because of this line: https://github.com/apache/incubator-mxnet/blob/master/tests/python/gpu/test_operator_gpu.py#L2863
Since the oldest CUDA version in CI is 9.2, I think it's safe to remove this check and copy the tests to test_operator.py. I will remove the duplicated part in test_operator_gpu.py.

eric-haibin-lin · 2019-12-23T02:16:27Z

Thanks for the reply. Looking forward to the update.

pengzhao-intel · 2020-01-03T01:00:30Z

Is this PR good to merge? @TaoLv @eric-haibin-lin

TaoLv · 2020-01-03T08:15:29Z

@eric-haibin-lin The code is rebased. The memset is needed to pass the unit tests. We can revisit the kWriteInplace check in a follow-up PR if we notice any performance problem there. Since FInplaceOption is not enabled for these operators, I think it's safe not to check it. Let me know what you think. Thanks!
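The thread does not spell out why the memset is required. One plausible reading, stated here as an assumption, is that a backward pass for the Q·K step writes only the query and key slots of the interleaved gradient buffer, so the value slots keep stale contents unless the buffer is cleared first. The sketch below illustrates that idea in Python with a hypothetical helper. It is not the operator's C++ code.

```python
def qk_backward_fill(grad_buf, dq, dk, head_dim, zero_first):
    """Write one head's Q and K gradients into an interleaved [q | k | v] buffer.

    Hypothetical helper: the V slots are never written here, so they keep
    whatever grad_buf already held unless the buffer is cleared first
    (the Python analogue of a memset to zero).
    """
    if zero_first:
        for i in range(len(grad_buf)):
            grad_buf[i] = 0.0
    grad_buf[0:head_dim] = dq               # query gradient slots
    grad_buf[head_dim:2 * head_dim] = dk    # key gradient slots
    return grad_buf


stale = [7.0] * 6  # stands in for uninitialized output memory
print(qk_backward_fill(list(stale), [1.0, 2.0], [3.0, 4.0], 2, zero_first=False))
# prints [1.0, 2.0, 3.0, 4.0, 7.0, 7.0]
print(qk_backward_fill(list(stale), [1.0, 2.0], [3.0, 4.0], 2, zero_first=True))
# prints [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
```

Under this assumption, a unit test that compares the full gradient buffer against a reference would fail without the clear, which is consistent with the comment above.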

eric-haibin-lin

comments addressed

* init qk attention
* qk: fake backward
* cpu selfatt and encdec; move tests to test_operator.py
* coding style
* fix lint
* use random seed in tests
* remove ut in test_operator_gpu.py
* coding style
* retrigger ci

Co-authored-by: Tao Lv <[email protected]>

szha · 2020-01-13T06:29:09Z

Since the oldest CUDA version in CI is 9.2, I think it's safe to remove this check and copy the tests to test_operator.py. I will remove the duplicated part in test_operator_gpu.py.

@TaoLv This is not true for the CD pipeline for cu90. See this failure

TaoLv · 2020-01-13T09:03:53Z

@szha Any suggestion for this case? The test is shared between the CPU and GPU tests; can we add the CUDA requirement decorator to it? Also, if we're still releasing the cu90 flavor, why was it removed from CI?
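As a rough illustration of the decorator idea raised above, here is a minimal, self-contained sketch of a version gate for a test shared between CPU and GPU runs. The names `requires_cuda` and `get_cuda_version` are hypothetical and the threshold is illustrative. This is not MXNet's test helper, and it skips the test rather than asserting on an error.

```python
import functools
import unittest


def requires_cuda(min_version, get_cuda_version):
    """Skip the wrapped test on GPU builds whose CUDA version is too old.

    min_version: (major, minor) tuple, e.g. (9, 2).
    get_cuda_version: hypothetical callable returning a (major, minor)
    tuple, or None on a CPU-only build, where the test always runs.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            version = get_cuda_version()
            if version is not None and version < min_version:
                raise unittest.SkipTest(
                    "needs CUDA >= %d.%d, found %d.%d" % (min_version + version))
            return test_fn(*args, **kwargs)
        return wrapper
    return decorator


# A cu90-style build would skip; a CPU-only build would run the test.
@requires_cuda((9, 2), get_cuda_version=lambda: None)
def test_shared_cpu_gpu_op():
    return "ran"


print(test_shared_cpu_gpu_op())
```

Returning None for CPU-only builds keeps the gate from affecting the CPU path, which is the concern when a single test body serves both back ends.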

TaoLv added 6 commits November 26, 2019 17:09

init qk attention

14eb31c

Merge branch 'master' of https://github.com/apache/incubator-mxnet into mha

1ed6597

qk: fake backward

29c0623

cpu selfatt and encdec; move tests to test_operator.py

795de79

Merge branch 'master' of https://github.com/apache/incubator-mxnet into mha

eecdcfe

coding style

2f0181a

pengzhao-intel added the MKLDNN label Dec 21, 2019

pengzhao-intel added this to In progress in CPU Performance and Quantization via automation Dec 21, 2019

TaoLv added 3 commits December 22, 2019 01:04

Merge branch 'master' of https://github.com/apache/incubator-mxnet into mha

f2dc112

fix lint

544bd25

use random seed in tests

1c639eb

eric-haibin-lin reviewed Dec 22, 2019

remove ut in test_operator_gpu.py

fa9db43

Merge branch 'master' of https://github.com/apache/incubator-mxnet into mha

aa154b9

eric-haibin-lin self-assigned this Jan 3, 2020

TaoLv added 2 commits January 4, 2020 00:23

Merge branch 'master' of https://github.com/apache/incubator-mxnet into mha

8381361

coding style

0037639

CPU Performance and Quantization automation moved this from In progress to Reviewer approved Jan 3, 2020

eric-haibin-lin approved these changes Jan 3, 2020

eric-haibin-lin added the R1.6.0 label Jan 3, 2020

eric-haibin-lin merged commit 55e222b into apache:master Jan 3, 2020

CPU Performance and Quantization automation moved this from Reviewer approved to Done Jan 3, 2020

eric-haibin-lin mentioned this pull request Jan 3, 2020

[1.6.x] Cherry-pick Interleaved MHA for CPU path (#17138) #17211

Merged

retrigger ci

2f22f68

TaoLv mentioned this pull request Jan 12, 2020

Add MXNet Ops for fast multihead attention #16408

Merged
