Insights: ROCm/hipBLASLt
Overview
- 0 Active issues
- 22 Merged pull requests
- 14 Open pull requests
- 0 Closed issues
- 0 New issues
22 Pull requests merged by 15 people
- Update 35 Equality logic yaml sizes. (#1378, merged Nov 23, 2024)
- update 38 Equality logic yaml sizes (#1372, merged Nov 23, 2024)
- gfx942 38cu F8BS NN TN NT grid tune (#1345, merged Nov 22, 2024)
- gfx942 38cu HHS BBS NN TN NT grid tune (#1331, merged Nov 22, 2024)
- Fix invalid stream-k test case, make dynamic grid the default (#1359, merged Nov 22, 2024)
- update gfx942 xf32 freesize (#1375, merged Nov 22, 2024)
- Add gfx942 xf32 NN/NT/TN Equality yamls for 1105 xf32 (#1376, merged Nov 22, 2024)
- Fix: incorrect required workspace size for singleKernel GSU (#1371, merged Nov 22, 2024)
- gridbased search for batched gemm (#1362, merged Nov 22, 2024)
- Add profiling to TensileCreateLibrary (#1329, merged Nov 22, 2024)
- Set Python_ROOT virtual.env (#1344, merged Nov 21, 2024)
- Fixing and adding test for DepthU=48 (#1303, merged Nov 20, 2024)
- [Hotfix] Disable setOccupancyLimit for gfx120X (#1368, merged Nov 20, 2024)
- Remove alias for MirrorDims in logic yaml (#1361, merged Nov 20, 2024)
- GFX942 equality tuning for F8HS and F8B8HS for TN,NT,NN (#1302, merged Nov 19, 2024)
- Add setOccupancyLimit (#1364, merged Nov 19, 2024)
- Remove Min/Max/TotalVgprNumber in Common.py (#1355, merged Nov 19, 2024)
- gfx12 - change to use byte_sel modifier for v_cvt_f32_fp8 and v_cvt_f… (#1172, merged Nov 19, 2024)
- Add profile logging and standardize scaleA and scaleB datatypes (#1275, merged Nov 18, 2024)
- Fix F8/BF8 failed cases for GWVW=8 and Beta != 0 (#1333, merged Nov 18, 2024)
- adding bpl64 support to addLdsLoad (for Bias and scaleAlphaVector) (#1336, merged Nov 18, 2024)
- Add sgpr occupancy (#1349, merged Nov 18, 2024)
14 Pull requests opened by 13 people
- Modify to check if alpha is in host memory. (#1356, opened Nov 18, 2024)
- Refactor the pack scheduling for scheduleIterAlg = 3. (#1358, opened Nov 18, 2024)
- Fix F32 FMAC Perf Bugs for gfx11/12 (#1360, opened Nov 19, 2024)
- Bump rocm-docs-core from 1.8.3 to 1.8.5 in /docs/sphinx (#1363, opened Nov 19, 2024)
- Library Logic Format Simplification (#1365, opened Nov 19, 2024)
- Remove PackageLibrary option (#1367, opened Nov 20, 2024)
- [Sparse] fix sparse kernel generation failure (#1369, opened Nov 20, 2024)
- Update 12 Equality logic yamls. (#1370, opened Nov 20, 2024)
- Avoid divide by 0 when calculating predicted performance with streamk (#1373, opened Nov 21, 2024)
- Code object compression via bundling (#1374, opened Nov 22, 2024)
- [Experimental] hipBLASLt tensor swizzling integration (#1377, opened Nov 22, 2024)
- Fp8 tuning upstream (#1380, opened Nov 22, 2024)
- Find python (#1381, opened Nov 22, 2024)
- Logic fix to exclude streamk by default (#1382, opened Nov 22, 2024)
14 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Adding HostLibraryTests back to hipBLASLt (#1147, commented on Nov 20, 2024; 10 new comments)
- Add support for fallback from compute type f16 to f32 (#1263, commented on Nov 19, 2024; 8 new comments)
- feature: DTV with Swizzling (tensorA) (#1246, commented on Nov 22, 2024; 1 new comment)
- Enable variant builds via device ID and cu count (#1222, commented on Nov 18, 2024; 0 new comments)
- dot2 fp16 mac kernel for gfx942 (#1258, commented on Nov 23, 2024; 0 new comments)
- Tune Aquavanjaram942X F8F8S Equality TN 1 GEMM size (#1270, commented on Nov 20, 2024; 0 new comments)
- Static build (#1283, commented on Nov 23, 2024; 0 new comments)
- Check arguments in yaml file, abort if not recognized. (#1294, commented on Nov 19, 2024; 0 new comments)
- Tune Aldebaran BF16 NN TN NT GEMM sizes (#1323, commented on Nov 21, 2024; 0 new comments)
- Add --experimental flag to TensileCreateLibrary (#1328, commented on Nov 22, 2024; 0 new comments)
- Tune Aquavanjaram 942 20CU HHS NN and TN GEMM sizes tuning in equality library (#1330, commented on Nov 21, 2024; 0 new comments)
- [CQE only] gfx12 - change to use byte_sel modifier for v_cvt_f32_fp8 and v_cvt_f… (#1339, commented on Nov 18, 2024; 0 new comments)
- Add initial optional stream-k libraries (#1347, commented on Nov 21, 2024; 0 new comments)
- [OPT] Optimize tail loop (#1353, commented on Nov 22, 2024; 0 new comments)