cpu: aarch64: conv: Update direct vs indirect conv heuristics #1948

fadara01 · 2024-06-06T14:26:15Z

Description

Update direct vs indirect conv heuristics

Remove the fall through to direct conv for low thread counts: the previous heuristic is outdated and no longer optimal.
Do not fall through to direct conv for small convolutions when the data type is BF16: indirect conv is faster when source, weights, and destination are all BF16.
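
The two changes above can be read as a before/after dispatch rule. The sketch below is only an illustration of that reading, not the code touched by this PR: every name in it (`ConvProblem`, `is_small`, `kLowThreadCutoff`, `choose_before`, `choose_after`) is hypothetical, the cutoff value is invented, and the real criterion for a "small" convolution is not stated in this description.

```cpp
// Hypothetical, simplified model of the direct vs indirect conv decision.
// None of these names are oneDNN or Arm Compute Library identifiers.
enum class DataType { f32, f16, bf16 };
enum class ConvImpl { direct, indirect };

struct ConvProblem {
    DataType src;
    DataType weights;
    DataType dst;
    // Assumed to be computed elsewhere; the actual "small convolution"
    // criterion is not given in this PR description.
    bool is_small;
};

// True only when source, weights, and destination are all BF16.
inline bool all_bf16(const ConvProblem &p) {
    return p.src == DataType::bf16 && p.weights == DataType::bf16
            && p.dst == DataType::bf16;
}

// Illustrative cutoff for a "low thread count"; the value is an assumption.
constexpr int kLowThreadCutoff = 4;

// Assumed previous behaviour: low thread counts and small convolutions
// both fell through to direct conv.
inline ConvImpl choose_before(const ConvProblem &p, int nthreads) {
    if (nthreads < kLowThreadCutoff) return ConvImpl::direct;
    if (p.is_small) return ConvImpl::direct;
    return ConvImpl::indirect;
}

// Behaviour as described in this PR: the thread count no longer matters,
// and small convolutions fall through only when not everything is BF16.
inline ConvImpl choose_after(const ConvProblem &p, int /*nthreads*/) {
    if (p.is_small && !all_bf16(p)) return ConvImpl::direct;
    return ConvImpl::indirect;
}
```

Under these assumptions, a large F32 convolution on one thread moves from direct to indirect, and a small all-BF16 convolution moves from direct to indirect, while a small F32 convolution still falls through to direct.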

Fixes # (github issue)

Checklist

General

[ YES ] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
[ YES ] Have you formatted the code using clang-format?

Performance improvements

Have you submitted performance data that demonstrates performance improvements?

New features

Have you published an RFC for the new feature?
Was the RFC approved?
Have you added relevant tests?

Bug fixes

Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
Have you added relevant regression tests?

RFC PR

Does RFC document follow the template?
Have you added a link to the rendered document?

jondea

LGTM

cpu: aarch64: conv: Remove fall through to direct conv for low thread counts

5263f9a

The "Indirect is slower than gemm for low thread counts" heuristic is outdated and no longer holds.

vpirogov added this to the v3.6 milestone Jun 6, 2024

dzarukin requested a review from jondea June 6, 2024 16:05

dzarukin approved these changes Jun 6, 2024

jondea added the platform:aarch64 label Jun 6, 2024

jondea approved these changes Jun 6, 2024

jondea mentioned this pull request Jun 6, 2024

src: cpu: conv: Use acl_indirect_gemm for bf16 convolutions #1933

Merged

3 tasks

cpu: aarch64: conv: Do not fall through to direct conv for BF16

8506d93

Indirect conv is faster than direct conv when source, weight and destination are of type BF16

fadara01 changed the title ~~cpu: aarch64: conv: Remove fall through to direct conv for low thread…~~ cpu: aarch64: conv: Update direct vs indirect conv heuristics Jun 7, 2024

fadara01 mentioned this pull request Jun 7, 2024

Backport cpu: aarch64: conv: Update direct vs indirect conv heuristics #1956

Merged

8 tasks

vpirogov merged commit 390d34c into oneapi-src:main Jun 24, 2024
8 of 10 checks passed
