Implement WMMA#871

Merged: pxl-th merged 3 commits into master from pxl-th/wmma on Dec 25, 2025

pxl-th commented Dec 22, 2025

Benchmark of a very primitive WMMA matmul kernel against rocBLAS and a naive kernel:

Size               WMMA-Ptr    rocBLAS      Naive       WMMA    rocBLAS
                       (ms)       (ms)       (ms)     TFLOPS     TFLOPS
-----------------------------------------------------------------------
256x256x256           0.043      0.056      0.178       0.78       0.60
512x512x512           0.061      0.106      0.857       4.40       2.52
1024x1024x1024        0.670      0.739      1.110       3.21       2.91
2048x2048x2048        1.384      1.128      4.206      12.41      15.23
4096x4096x4096        8.606      5.571     15.224      15.97      24.67
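
The TFLOPS columns are consistent with the usual 2·M·N·K operation count for a matrix multiply divided by the measured time. A minimal Python sketch of that conversion, assuming this convention is what the benchmark used (the helper name `gemm_tflops` is hypothetical and not part of this PR):

```python
def gemm_tflops(m: int, n: int, k: int, time_ms: float) -> float:
    """Convert a matmul timing to TFLOPS, assuming 2*m*n*k floating-point ops."""
    flops = 2 * m * n * k             # one multiply and one add per inner-product term
    seconds = time_ms * 1e-3          # the table reports milliseconds
    return flops / seconds / 1e12     # FLOP/s -> TFLOPS


# (size, WMMA-Ptr ms, rocBLAS ms) copied from the table above
timings = [
    (256, 0.043, 0.056),
    (512, 0.061, 0.106),
    (1024, 0.670, 0.739),
    (2048, 1.384, 1.128),
    (4096, 8.606, 5.571),
]

for size, wmma_ms, rocblas_ms in timings:
    wmma = gemm_tflops(size, size, size, wmma_ms)
    rocblas = gemm_tflops(size, size, size, rocblas_ms)
    print(f"{size}x{size}x{size}: WMMA {wmma:.2f} TFLOPS, rocBLAS {rocblas:.2f} TFLOPS")
```

Read this way, the WMMA TFLOPS column corresponds to the WMMA-Ptr timings, and the primitive kernel is ahead of rocBLAS up to 1024x1024x1024 and behind it from 2048x2048x2048 onward.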

Closes #560.
Closes #483.

pxl-th added labels: enhancement float16 intrinsics (Dec 25, 2025)
pxl-th merged commit 2cdc4d4 into master Dec 25, 2025
3 checks passed
pxl-th deleted the pxl-th/wmma branch December 25, 2025 20:38
Linked issues: rocWMMA support; [Question] AI accelerators