[CK_TILE][FMHA] Enable wholek_prefetch #3026

LJ-underdog · 2025-10-15T10:36:00Z

Proposed changes

Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

poyenc · 2025-10-30T03:11:25Z

@LJ-underdog are you still working on this?

LJ-underdog · 2025-10-30T04:50:28Z

> @LJ-underdog are you still working on this?

Yes, this PR is being prepared for mate.

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

Copilot

Pull request overview

This PR enables a new whole-K prefetch pipeline variant for the FMHA (Fused Multi-Head Attention) operations. It introduces a new pipeline type intended to improve performance through whole-K prefetching, and generates kernels only for the hdim=128, hdim_v=128 configuration.

Changes:

Added new pipeline enum QRKSVS_WHOLEK_PREFETCH with corresponding string mappings
Updated C++ pipeline implementation name field to match the new pipeline identifier
Extended Python codegen to generate kernels for the new pipeline with specific constraints
Added symbol mappings for the new pipeline in codegen infrastructure
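
To illustrate the symbol-mapping change listed above, here is a minimal sketch of how a codegen pipeline tag might be resolved to a C++ class and enum value. The tag `qr_wholek_prefetch` and the enum value `QRKSVS_WHOLEK_PREFETCH` appear in this PR; the dictionary names, the helper `resolve_pipeline`, and the C++ class names are assumptions made for illustration and may not match the contents of cpp_symbol_map.py.

```python
# Hypothetical sketch: everything except the tag "qr_wholek_prefetch" and the
# enum value QRKSVS_WHOLEK_PREFETCH is an assumption, not code from the PR.
PIPELINE_MAP = {
    "qr": "ck_tile::BlockFmhaPipelineQRKSVS",
    # Class name guessed from the header file name; not confirmed by the PR text.
    "qr_wholek_prefetch": "ck_tile::BlockFmhaPipelineQRKSVSWholeKPrefetch",
}

PIPELINE_ENUM_MAP = {
    "qr": "ck_tile::BlockFmhaPipelineEnum::QRKSVS",
    "qr_wholek_prefetch": "ck_tile::BlockFmhaPipelineEnum::QRKSVS_WHOLEK_PREFETCH",
}


def resolve_pipeline(tag: str) -> tuple[str, str]:
    """Return (C++ class, C++ enum) for a codegen pipeline tag."""
    if tag not in PIPELINE_MAP or tag not in PIPELINE_ENUM_MAP:
        raise KeyError(f"unknown pipeline tag: {tag!r}")
    return PIPELINE_MAP[tag], PIPELINE_ENUM_MAP[tag]
```

Keeping the class map and the enum map keyed by the same tag means a new pipeline that is added to only one of them fails loudly at lookup time instead of producing a half-configured kernel.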

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| block_fmha_pipeline_qr_ks_vs_whole_k_prefetch.hpp | Updated static name field to identify the pipeline implementation |
| block_fmha_pipeline_enum.hpp | Added new enum value and string mapping template specialization |
| fmha_fwd.py | Added pipeline tag to validation checks, tile size configuration, and generation logic with constraints |
| cpp_symbol_map.py | Added C++ class and enum mappings for the new pipeline |

include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs_whole_k_prefetch.hpp

example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py

poyenc · 2026-01-15T07:17:45Z

example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py

            for tile, pipeline in itertools.product(
                tiles, factory.get_pipelines(dtype, hdim, hdim_v, receipt, mask_impl)
            ):
+                if pipeline.tag == "qr_wholek_prefetch" and (


Please move the check to the CompatibilityRuleFactoryGfx9
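
One way to read this request is that the per-pipeline constraint should live in a rule object rather than inside the kernel-generation loop. The following is a minimal sketch, assuming rules are plain callables that accept or reject a (problem, pipeline) pair. `Problem`, `Pipeline`, `GFX9_RULES`, and `is_compatible` are hypothetical names, and the `hdim == hdim_v == 128` condition is taken from the Copilot overview rather than from the truncated check in the diff above. It does not reproduce the actual interface of CompatibilityRuleFactoryGfx9.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Problem:
    """Hypothetical problem description (head dimensions only)."""
    hdim: int
    hdim_v: int


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical pipeline description (tag only)."""
    tag: str


Rule = Callable[[Problem, Pipeline], bool]


def check_wholek_prefetch(problem: Problem, pipeline: Pipeline) -> bool:
    """Assumed constraint: whole-K prefetch is generated only for 128/128 head dims."""
    if pipeline.tag != "qr_wholek_prefetch":
        return True  # the rule does not restrict other pipelines
    return problem.hdim == 128 and problem.hdim_v == 128


# Hypothetical rule list standing in for what a rule factory would return.
GFX9_RULES: List[Rule] = [check_wholek_prefetch]


def is_compatible(problem: Problem, pipeline: Pipeline, rules: List[Rule] = GFX9_RULES) -> bool:
    """A combination is kept only if every rule accepts it."""
    return all(rule(problem, pipeline) for rule in rules)


problems = [Problem(128, 128), Problem(64, 64)]
pipelines = [Pipeline("qr"), Pipeline("qr_wholek_prefetch")]

# The generation loop no longer needs to know about individual pipeline tags.
kept = [
    (problem, pipeline)
    for problem, pipeline in itertools.product(problems, pipelines)
    if is_compatible(problem, pipeline)
]
```

With this layout the loop body stays generic, and adding or removing a constraint only touches the rule list.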

poyenc · 2026-01-15T07:25:57Z

example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py

+                                FmhaFwdPipeline(
+                                    "qr_wholek_prefetch",
+                                    "row",
+                                    "f",


Why not add padding for both the seqlen_q and seqlen_k dimensions?
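
As a rough illustration of what generating both padded and unpadded variants could look like, here is a minimal sketch. It assumes that the "t"/"f" strings in the diff act as per-dimension padding flags; `PipelineSpec`, its field names, and `padding_variants` are hypothetical and are not the actual `FmhaFwdPipeline` signature, whose remaining arguments are truncated in the snippet above.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical stand-in for a pipeline descriptor."""
    tag: str
    vlayout: str
    spad: str   # assumed meaning: "t" pads seqlen_q, "f" does not
    skpad: str  # assumed meaning: "t" pads seqlen_k, "f" does not


def padding_variants(tag: str, vlayout: str = "row") -> List[PipelineSpec]:
    """Enumerate every combination of seqlen_q / seqlen_k padding flags."""
    return [
        PipelineSpec(tag, vlayout, spad, skpad)
        for spad, skpad in itertools.product("ft", repeat=2)
    ]


variants = padding_variants("qr_wholek_prefetch")
```

Whether all four combinations are worth generating depends on kernel count and compile time, which this sketch does not address.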

illsilin · 2026-01-20T20:34:51Z

Hi @LJ-underdog, please make sure this compiles and passes tests!

enable wholek_prefetch

496c693

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

LJ-underdog and others added 6 commits October 30, 2025 13:00

Merge branch 'develop' into lj/wholek

4da3fe1

Update fmha_fwd.py

8799f16

Update block_fmha_pipeline_enum.hpp

b47650c

Merge branch 'develop' into lj/wholek

eae516c

Merge branch 'develop' into lj/wholek

cfee37f

Merge branch 'develop' into lj/wholek

dbd9727

LJ-underdog marked this pull request as ready for review January 13, 2026 02:33

LJ-underdog requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners January 13, 2026 02:33

Update fmha_fwd.py

6c7e463

LJ-underdog requested review from Snektron and vpietila-amd as code owners January 15, 2026 02:00

LJ-underdog changed the title ~~enable wholek_prefetch~~ [CK_TILE][FMHA] Enable wholek_prefetch Jan 15, 2026

LJ-underdog added 3 commits January 15, 2026 10:04

Merge branch 'develop' into lj/wholek

f8de278

replace squant to qscale

e41098e

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

Update fmha_fwd.py

d43ea4a

poyenc requested a review from Copilot January 15, 2026 06:43

poyenc assigned LJ-underdog Jan 15, 2026

Copilot started reviewing on behalf of poyenc January 15, 2026 06:44

Copilot AI reviewed Jan 15, 2026

include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs_whole_k_prefetch.hpp (outdated, resolved)

example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py (outdated, resolved)

example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py (outdated, resolved)

LJ-underdog and others added 3 commits January 15, 2026 15:14

Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py

7c44495

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs…

8ec3ca3

…_whole_k_prefetch.hpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py

ecc0d6e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

poyenc reviewed Jan 15, 2026

Improve readability of seqtune method

19a3ab1

Refactor seqtune method for better readability.

poyenc reviewed Jan 15, 2026

Merge branch 'develop' into lj/wholek

20cc3f3
