
Tensor cores used only for fp16 in interleaved multihead attention #17994

Merged
merged 1 commit into apache:master
Apr 16, 2020

Conversation

blchu
Contributor

blchu commented Apr 7, 2020

Description

Fixed an issue where fp32 inputs used tensor cores in the interleaved multihead attention operators, resulting in lower-precision calculations and a potential reduction in accuracy.
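For context, a minimal sketch (illustrative, not part of the diff) of why this matters: Volta tensor cores multiply in fp16, so fp32 operands routed through tensor-op GEMMs effectively keep only fp16's 10 mantissa bits instead of fp32's 23. The snippet below approximates that loss by truncating the mantissa; it ignores fp16's narrower exponent range and round-to-nearest behavior.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Approximate an fp32 -> fp16 -> fp32 round trip by truncating the 23-bit
// fp32 mantissa to fp16's 10 bits. This shows the bits that are lost when
// fp32 operands are fed through fp16 tensor-core multiplies.
float truncate_to_half_mantissa(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= ~((1u << 13) - 1);  // clear the low 13 of the 23 mantissa bits
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

int main() {
    float v = 0.1f;
    std::printf("fp32 value:     %.9f\n", v);
    std::printf("fp16-truncated: %.9f\n", truncate_to_half_mantissa(v));
    return 0;
}
```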

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, a README.md is added explaining what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Set the interleaved multihead attention GEMMs to not use tensor cores by default, enabling them only when the input data type is fp16 (a rough sketch of the gating follows this list)
  • Removed the check that tensor input shapes are divisible by 8
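A rough sketch of the dtype gating described above (illustrative only; `SelectGemmAlgo` and `dtype_is_fp16` are hypothetical names, not the identifiers in this PR):

```cpp
#include <cublas_v2.h>

// Sketch only, not the code in this PR: gate tensor-core use on the input
// dtype so that only fp16 GEMMs opt in. `dtype_is_fp16` is a hypothetical
// flag derived from the operator's input tensor type.
cublasGemmAlgo_t SelectGemmAlgo(bool dtype_is_fp16) {
  // CUBLAS_GEMM_DEFAULT_TENSOR_OP lets cuBLAS use tensor cores (which, on
  // Volta, down-convert operands to fp16); CUBLAS_GEMM_DEFAULT keeps the
  // computation in the input precision.
  return dtype_is_fp16 ? CUBLAS_GEMM_DEFAULT_TENSOR_OP : CUBLAS_GEMM_DEFAULT;
}
```

With this shape of gating, fp32 inputs always take the non-tensor-op path, which is presumably why the divisibility-by-8 shape check could be dropped.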

Comments

@mxnet-bot

Hey @blchu, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-cpu, unix-cpu, edge, centos-gpu, miscellaneous, windows-gpu, clang, sanity, unix-gpu, website, windows-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu
Contributor

leezu commented Apr 8, 2020

Is this tested on CUDA architectures < 5?

@blchu
Contributor Author

blchu commented Apr 14, 2020

I've tested on a CUDA architecture < 5 (specifically, a K80); there's no issue running the operator.

@blchu
Contributor Author

blchu commented Apr 14, 2020

@mxnet-bot run ci [all]

@ptrendx
Member

ptrendx commented Apr 14, 2020

@ChaiBapchya is something happening with CI right now? The bot did not trigger the CI. Thanks!

@leezu
Contributor

leezu commented Apr 14, 2020

@mxnet-bot run ci [all]

@leezu
Contributor

leezu commented Apr 14, 2020

@blchu can you rebase on master and force push?

@ptrendx
Member

ptrendx commented Apr 15, 2020

@mxnet-bot run ci [unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered: [unix-gpu, unix-cpu]

@ptrendx
Member

ptrendx commented Apr 15, 2020

@mxnet-bot run ci [unix-cpu]

Download of CIFAR failed in the Perl test.

@mxnet-bot

Jenkins CI successfully triggered: [unix-cpu]

@ptrendx ptrendx merged commit afae030 into apache:master Apr 16, 2020
blchu added a commit to blchu/incubator-mxnet that referenced this pull request Apr 16, 2020
leezu pushed a commit that referenced this pull request Apr 16, 2020
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020