Implement asm gemm on vega64 for 4096x4096 fp32 matrix

aditya4d1/gemm-vega64

gemm-vega64

Implement asm gemm on vega64 for 4096x4096 fp32 matrix

Results

v10 is the highest-performing kernel.

| Card | Shader Clock | Memory Clock | TFLOPs (fp32) | TFLOPs Peak (fp32) | Efficiency |
|---|---|---|---|---|---|
| MI25 | 1500 MHz | 945 MHz | 10.6 | 12.3 | 86% |
| RXVega64 | 1630 MHz | 945 MHz | 11.2 | 13.35 | 84% |
| VegaFE | | | | | |
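
The peak and efficiency columns can be reproduced with a small calculation. The sketch below is not part of the repository. It assumes both cards expose 4096 fp32 lanes (64 CUs x 64 lanes), that one fused multiply-add counts as 2 FLOPs per lane per cycle, and that an N x N GEMM costs 2·N³ FLOPs. The function names are hypothetical.

```python
# Sketch only: reproduces the peak / efficiency figures from the results table.
# Assumptions (not stated in the README): 4096 fp32 lanes per card
# (64 CUs x 64 lanes) and 2 FLOPs per lane per cycle (one FMA).
FP32_LANES = 4096
FLOPS_PER_FMA = 2


def peak_tflops(shader_clock_mhz: float) -> float:
    """Theoretical fp32 peak in TFLOPs at the given shader clock."""
    return FP32_LANES * FLOPS_PER_FMA * shader_clock_mhz * 1e6 / 1e12


def efficiency(measured_tflops: float, shader_clock_mhz: float) -> float:
    """Fraction of the theoretical peak that the kernel reaches."""
    return measured_tflops / peak_tflops(shader_clock_mhz)


def gemm_time_ms(n: int, measured_tflops: float) -> float:
    """Time for one n x n x n GEMM (2*n^3 FLOPs) at the measured rate."""
    flops = 2 * n ** 3
    return flops / (measured_tflops * 1e12) * 1e3


if __name__ == "__main__":
    # (card, shader clock in MHz, measured TFLOPs) taken from the table above.
    for card, clock, measured in [("MI25", 1500, 10.6), ("RXVega64", 1630, 11.2)]:
        print(
            f"{card}: peak {peak_tflops(clock):.2f} TFLOPs, "
            f"efficiency {efficiency(measured, clock):.0%}, "
            f"4096^3 GEMM in {gemm_time_ms(4096, measured):.2f} ms"
        )
```

Under these assumptions the computed peaks round to the 12.3 and 13.35 TFLOPs listed in the table, and the efficiencies round to 86% and 84%.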

Because the code is not written in pure ISA, the compiler tries to move instructions around, which causes a drop in performance.
