src: cpu: aarch64: add support for s8:s8:s8 in ACL lowp matmul #1966

Merged: 2 commits, Jun 24, 2024


michalowski-arm (Contributor)

Performance results:

| Matrix size | Speed-up |
|-------------|----------|
| 128x128     | 0.82x    |
| 256x256     | 2.63x    |
| 512x512     | 33.3x    |
| 1024x1024   | 60.7x    |
| 2048x2048   | 60.3x    |

To select the correct s8->s8 ACL kernel, we need to send all quantization information (scales and zero points) to ACL at configuration time, but oneDNN does not make these available until execution. This change works around the issue by first performing an s8->f32 matmul and then requantizing the result back to s8.
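The two-step workaround can be illustrated with a minimal plain-Python sketch. It assumes per-tensor affine quantization (real value = scale * (q - zero_point)) and round-half-to-even; the function names are hypothetical and are not part of oneDNN or ACL, whose actual implementation is in C++ and dispatches to ACL kernels.

```python
# Hypothetical sketch: emulate an s8 x s8 -> s8 matmul by computing an f32
# result first and requantizing it to s8 afterwards. Not oneDNN/ACL API.

S8_MIN, S8_MAX = -128, 127


def matmul_s8_to_f32(a, b, scale_a, zp_a, scale_b, zp_b):
    """Step 1: s8 inputs -> f32 output (assumes per-tensor scales/zero points)."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    assert len(b) == k, "inner dimensions must match"
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Integer accumulation of zero-point-adjusted products.
            acc = sum((a[i][p] - zp_a) * (b[p][j] - zp_b) for p in range(k))
            # Apply the combined input scale to obtain the real-valued result.
            out[i][j] = scale_a * scale_b * acc
    return out


def requantize_f32_to_s8(c, scale_dst, zp_dst):
    """Step 2: f32 -> s8 using the destination scale and zero point."""
    def quantize(x):
        v = round(x / scale_dst) + zp_dst  # Python round() is half-to-even
        return max(S8_MIN, min(S8_MAX, v))  # saturate to the s8 range
    return [[quantize(x) for x in row] for row in c]


if __name__ == "__main__":
    c_f32 = matmul_s8_to_f32([[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.5, 0, 0.5, 0)
    print(c_f32)                                  # [[4.75, 5.5], [10.75, 12.5]]
    print(requantize_f32_to_s8(c_f32, 1.0, 0))    # [[5, 6], [11, 12]]
```

In this sketch the destination scale and zero point are consumed only in the second step, which illustrates why splitting the computation allows the first step to be set up before those values are known.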

jondea left a comment

This looks great, thank you @michalowski-arm. This has also been reviewed internally.

vpirogov added this to the v3.6 milestone on Jun 21, 2024
jondea (Contributor) commented on Jun 24, 2024

If there are no more comments, would it be possible to get this merged, please? The CI failures look like common ones, not specific to this change.

vpirogov merged commit 5806809 into oneapi-src:main on Jun 24, 2024 (7 of 10 checks passed)
vpirogov (Member)

@jondea, the failures are caused by an MSVC bug.

Thanks for the code review!
