
No tensor cores for fp32 interleaved attention, remove div by 8 restriction (#17994) #18085

Merged 1 commit into apache:v1.x on Apr 16, 2020

Conversation

@blchu (Contributor) commented on Apr 16, 2020

(cherry picked from commit afae030)

Description

Fixes an issue where fp32 inputs used tensor cores for the interleaved multihead attention operators, resulting in lower-precision calculations and a potential reduction in accuracy.
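As a rough illustration (not part of the PR itself): on the GPUs these operators target, tensor-core GEMMs multiply fp16 operands, so fp32 inputs are effectively rounded to fp16 before the multiply. A minimal C++ sketch of the resulting precision gap, assuming standard IEEE-754 binary16:

```cpp
#include <cstdio>
#include <limits>

int main() {
    // fp32: 23 fraction bits -> machine epsilon 2^-23 ~= 1.19e-7
    const float fp32_eps = std::numeric_limits<float>::epsilon();
    // IEEE-754 binary16 (fp16): 10 fraction bits -> machine epsilon 2^-10 ~= 9.77e-4
    const float fp16_eps = 1.0f / 1024.0f;
    std::printf("fp32 epsilon: %.3e\n", fp32_eps);
    std::printf("fp16 epsilon: %.3e\n", fp16_eps);
    // Roughly 13 bits of significand are lost if fp32 operands are rounded
    // to fp16 before the multiply, which is the accuracy concern addressed here.
    return 0;
}
```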

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Set the interleaved multihead attention GEMM to not use tensor cores by default, enabling them only when the input data type is fp16 (see the sketch after this list)
  • No longer check that tensor input shapes are divisible by 8
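A minimal sketch of the idea behind the change, assuming the operators pick a cuBLAS GEMM algorithm per call; the helper name SelectGemmAlgo is hypothetical and this is not the actual MXNet source:

```cpp
#include <cublas_v2.h>

// Hypothetical helper (illustrative only): choose the cuBLAS GEMM algorithm
// for the interleaved multihead attention kernels from the input data type,
// so tensor cores are requested only for fp16 inputs.
inline cublasGemmAlgo_t SelectGemmAlgo(cudaDataType_t dtype) {
    // Tensor-core path only for half precision; plain GEMM otherwise.
    return dtype == CUDA_R_16F ? CUBLAS_GEMM_DEFAULT_TENSOR_OP
                               : CUBLAS_GEMM_DEFAULT;
}
```

Because tensor cores are no longer required for fp32 inputs, the shape check that forced dimensions to be multiples of 8 can also be dropped.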

Comments

@mxnet-bot

Hey @blchu, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, website, centos-gpu, edge, miscellaneous, unix-gpu, windows-gpu, clang, centos-cpu, unix-cpu, windows-cpu]


Note:
Only the following three categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu merged commit 8cfc64a into apache:v1.x on Apr 16, 2020