cpu: aarch64: conv: Update direct vs indirect conv heuristics #1948

Merged

@fadara01 fadara01 commented Jun 6, 2024

Description

Update direct vs indirect conv heuristics

  • Remove the fall-through to direct conv for low thread counts: the previous heuristic is outdated and no longer optimal
  • Do not fall through to direct conv for small convolutions when the data type is BF16: indirect conv is faster when source, weights, and destination are all of type BF16
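
The two changes above can be sketched as a single decision function. This is an illustrative sketch only, not the code touched by this PR: `ConvDesc`, `work_amount`, `kSmallConvThreshold`, and `use_indirect_conv` are hypothetical names, and the threshold value is a placeholder rather than a measured cutoff.

```cpp
#include <cstdint>

// Hypothetical data type tag; not the oneDNN enum.
enum class DataType { f32, f16, bf16 };

// Hypothetical summary of a convolution problem.
struct ConvDesc {
    DataType src;
    DataType wei;
    DataType dst;
    std::int64_t work_amount; // assumed size measure for "small" convolutions
    int num_threads;          // kept only to show it is no longer consulted
};

// Placeholder cutoff for what counts as a "small" convolution (assumption).
constexpr std::int64_t kSmallConvThreshold = 1 << 16;

// True when source, weights, and destination are all BF16.
constexpr bool all_bf16(const ConvDesc &d) {
    return d.src == DataType::bf16 && d.wei == DataType::bf16
            && d.dst == DataType::bf16;
}

// Returns true to keep indirect conv, false to fall through to direct conv.
// 1. The thread count is not part of the decision any more.
// 2. Small convolutions fall through to direct conv, unless all tensors
//    are BF16, in which case indirect conv is kept.
constexpr bool use_indirect_conv(const ConvDesc &d) {
    return all_bf16(d) || d.work_amount >= kSmallConvThreshold;
}
```

Under these assumptions, a small all-BF16 problem stays on indirect conv regardless of thread count, while a small F32 problem still falls through to direct conv.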

Fixes # (github issue)

Checklist

General

  • [ YES ] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • [ YES ] Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements?

New features

  • Have you published an RFC for the new feature?
  • Was the RFC approved?
  • Have you added relevant tests?

Bug fixes

  • Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
  • Have you added relevant regression tests?

RFC PR

  • Does RFC document follow the template?
  • Have you added a link to the rendered document?

… counts

The "Indirect is slower than gemm for low thread counts" heuristic
is outdated and no longer holds.
@vpirogov vpirogov added this to the v3.6 milestone Jun 6, 2024
@dzarukin dzarukin requested a review from jondea June 6, 2024 16:05

@jondea jondea left a comment

LGTM


@jondea jondea left a comment

LGTM

Indirect conv is faster than direct conv when source, weight
and destination are of type BF16
@fadara01 fadara01 changed the title cpu: aarch64: conv: Remove fall through to direct conv for low thread… cpu: aarch64: conv: Update direct vs indirect conv heuristics Jun 7, 2024
@vpirogov vpirogov merged commit 390d34c into oneapi-src:main Jun 24, 2024
8 of 10 checks passed

4 participants