
@LJ-underdog
Contributor

Proposed changes

Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation that helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.

Signed-off-by: JL-underdog <Jun.Lin@amd.com>
@poyenc
Contributor

poyenc commented Oct 30, 2025

@LJ-underdog are you still working on this?

@LJ-underdog
Contributor Author

Yes, this PR is preparation for mate.

LJ-underdog changed the title from "enable wholek_prefetch" to "[CK_TILE][FMHA] Enable wholek_prefetch" on Jan 15, 2026

Copilot AI left a comment


Pull request overview

This PR enables a new whole-K prefetch pipeline variant for the FMHA (fused multi-head attention) operations. It introduces a new pipeline type that uses whole-K prefetching to improve performance, and kernels for it are generated only for the hdim=128, hdim_v=128 configuration.

Changes:

  • Added new pipeline enum QRKSVS_WHOLEK_PREFETCH with corresponding string mappings
  • Updated C++ pipeline implementation name field to match the new pipeline identifier
  • Extended Python codegen to generate kernels for the new pipeline with specific constraints
  • Added symbol mappings for the new pipeline in codegen infrastructure
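The changes listed above touch several places: a C++ enum value, its string mapping, and the codegen symbol maps. All of them must agree on the same pipeline identifier. The following is a minimal Python sketch of keeping those tables in sync. Only `QRKSVS_WHOLEK_PREFETCH` and the `"qr_wholek_prefetch"` tag come from this PR. The `"qr"` baseline member, the C++ class names, the `ck_tile::BlockFmhaPipelineEnum` namespace and the helper `check_symbol_maps` are hypothetical, for illustration only.

```python
from enum import Enum


class PipelineEnum(Enum):
    # Only QRKSVS_WHOLEK_PREFETCH / "qr_wholek_prefetch" come from this PR;
    # QRKSVS / "qr" is an assumed baseline member for illustration.
    QRKSVS = "qr"
    QRKSVS_WHOLEK_PREFETCH = "qr_wholek_prefetch"


# Hypothetical codegen symbol map: codegen tag -> C++ pipeline class.
# The class names are assumptions, not taken from the PR.
PIPELINE_MAP = {
    "qr": "ck_tile::BlockFmhaPipelineQRKSVS",
    "qr_wholek_prefetch": "ck_tile::BlockFmhaPipelineQRKSVSWholeKPrefetch",
}

# Deriving the enum map from the enum itself means a new member cannot be
# forgotten here (the namespace spelling is an assumption).
PIPELINE_ENUM_MAP = {
    member.value: f"ck_tile::BlockFmhaPipelineEnum::{member.name}"
    for member in PipelineEnum
}


def check_symbol_maps() -> None:
    """Raise KeyError if any pipeline tag lacks a class or enum mapping."""
    tables = (("PIPELINE_MAP", PIPELINE_MAP), ("PIPELINE_ENUM_MAP", PIPELINE_ENUM_MAP))
    for member in PipelineEnum:
        for table_name, table in tables:
            if member.value not in table:
                raise KeyError(f"{member.value!r} missing from {table_name}")


if __name__ == "__main__":
    check_symbol_maps()
    print(PIPELINE_ENUM_MAP["qr_wholek_prefetch"])
    # prints ck_tile::BlockFmhaPipelineEnum::QRKSVS_WHOLEK_PREFETCH
```

Deriving one table from the enum and validating the hand-written one catches a missing mapping at generation time rather than at C++ compile time.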

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Files changed:

  • block_fmha_pipeline_qr_ks_vs_whole_k_prefetch.hpp: updated the static name field to identify the pipeline implementation
  • block_fmha_pipeline_enum.hpp: added a new enum value and a string-mapping template specialization
  • fmha_fwd.py: added the pipeline tag to validation checks, tile size configuration, and generation logic with constraints
  • cpp_symbol_map.py: added C++ class and enum mappings for the new pipeline


LJ-underdog and others added 3 commits January 15, 2026 15:14
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…_whole_k_prefetch.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
for tile, pipeline in itertools.product(
    tiles, factory.get_pipelines(dtype, hdim, hdim_v, receipt, mask_impl)
):
    if pipeline.tag == "qr_wholek_prefetch" and (
Contributor


Please move this check into CompatibilityRuleFactoryGfx9.
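The reviewer is asking for the pipeline-specific filter to live in a compatibility-rule factory rather than inside the generation loop. Below is a minimal Python sketch of that structure. The real interface of CompatibilityRuleFactoryGfx9 is not shown in this thread, so `CompatibilityRuleFactorySketch`, `get_rules`, `Problem`, `Pipeline` and `generate_kernels` are all hypothetical. The hdim=128, hdim_v=128 constraint is assumed from the PR overview, because the original condition is truncated above.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Problem:
    dtype: str
    hdim: int
    hdim_v: int


@dataclass(frozen=True)
class Pipeline:
    tag: str


# A rule answers: may this (problem, tile, pipeline) combination be generated?
Rule = Callable[[Problem, str, Pipeline], bool]


def wholek_prefetch_rule(problem: Problem, tile: str, pipeline: Pipeline) -> bool:
    # Assumed constraint (from the PR overview): the whole-K prefetch
    # pipeline is only generated for hdim=128, hdim_v=128.
    if pipeline.tag != "qr_wholek_prefetch":
        return True  # this rule does not restrict other pipelines
    return problem.hdim == 128 and problem.hdim_v == 128


class CompatibilityRuleFactorySketch:
    """Hypothetical stand-in for CompatibilityRuleFactoryGfx9."""

    def get_rules(self) -> List[Rule]:
        return [wholek_prefetch_rule]


def generate_kernels(
    problem: Problem,
    tiles: List[str],
    pipelines: List[Pipeline],
    factory: CompatibilityRuleFactorySketch,
) -> List[Tuple[str, Pipeline]]:
    # The loop stays generic: no pipeline-specific checks live here.
    rules = factory.get_rules()
    return [
        (tile, pipeline)
        for tile, pipeline in itertools.product(tiles, pipelines)
        if all(rule(problem, tile, pipeline) for rule in rules)
    ]


if __name__ == "__main__":
    pipelines = [Pipeline("qr"), Pipeline("qr_wholek_prefetch")]
    factory = CompatibilityRuleFactorySketch()
    for hdim in (64, 128):
        combos = generate_kernels(Problem("fp16", hdim, hdim), ["tile0"], pipelines, factory)
        print(hdim, [p.tag for _, p in combos])
```

With the rule moved out of the loop, adding a constraint for another pipeline means registering one more rule instead of editing the generator.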

Refactor seqtune method for better readability.
FmhaFwdPipeline(
    "qr_wholek_prefetch",
    "row",
    "f",
Contributor

poyenc commented Jan 15, 2026


Why not add padding for both the seqlen_q and seqlen_k dimensions?
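The question is about which sequence-length dimensions the new kernel instance pads. The following minimal sketch shows the coverage side of that trade-off: an instance without padding on a dimension can only accept lengths that are whole multiples of its tile along that dimension. The tile names `bm0`/`bn0`, the 128x128 tile size and the boolean pad flags are hypothetical. The sketch does not reproduce how FmhaFwdPipeline actually encodes padding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileShape:
    bm0: int  # tile extent along seqlen_q (hypothetical name)
    bn0: int  # tile extent along seqlen_k (hypothetical name)


def needs_padding(seqlen: int, tile_extent: int) -> bool:
    """True when seqlen is not a whole multiple of the tile extent."""
    return seqlen % tile_extent != 0


def instance_accepts(
    seqlen_q: int,
    seqlen_k: int,
    tile: TileShape,
    pad_seqlen_q: bool,
    pad_seqlen_k: bool,
) -> bool:
    """An instance without padding on a dimension only accepts lengths
    that divide evenly by the tile along that dimension."""
    q_ok = pad_seqlen_q or not needs_padding(seqlen_q, tile.bm0)
    k_ok = pad_seqlen_k or not needs_padding(seqlen_k, tile.bn0)
    return q_ok and k_ok


if __name__ == "__main__":
    tile = TileShape(bm0=128, bn0=128)
    print(instance_accepts(1024, 1024, tile, False, False))  # True
    print(instance_accepts(1000, 1024, tile, False, False))  # False: 1000 % 128 == 104
    print(instance_accepts(1000, 1000, tile, True, True))    # True
```

Padding both dimensions widens the set of accepted shapes. Unpadded instances typically avoid some bounds handling but serve fewer shapes.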

@illsilin
Collaborator

Hi @LJ-underdog, please make sure this compiles and passes the tests!

