[CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD #3615

DDEle · 2026-01-20T05:35:16Z

Proposed changes

This PR fixes an integer overflow issue in the deterministic FMHA backward implementation by changing batch/nhead stride variables from index_t (32-bit) to long_index_t (64-bit) for the dq_acc. See also ROCm/aiter#1873.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Copilot

Pull request overview

This PR fixes an integer overflow issue in the deterministic FMHA (Fused Multi-Head Attention) backward pass implementation by changing stride variables from index_t (32-bit) to long_index_t (64-bit) for the dq_acc accumulator tensor.

Changes:

Changed nhead_stride_dq_acc and batch_stride_dq_acc from ck_tile::index_t to ck_tile::long_index_t across kernel definitions and argument structures
Restructured the dq_acc tensor layout to be [batch, nhead, nsplits, seqlen_q, hdim_q] instead of conditional layouts
Updated stride calculations to use proper casting and correct dimension ordering

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
include/ck_tile/ops/fmha/kernel/fmha_bwd_kernel.hpp	Changed type of `nhead_stride_dq_acc` and `batch_stride_dq_acc` from `index_t` to `long_index_t` in kernel structs and function signatures
example/ck_tile/01_fmha/fmha_bwd_runner.hpp	Simplified `dq_acc` tensor layout, corrected stride calculations with proper casting, and reordered computation of stride variables
example/ck_tile/01_fmha/fmha_bwd.hpp	Updated argument structure to use `long_index_t` for `nhead_stride_dq_acc` and `batch_stride_dq_acc`

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

example/ck_tile/01_fmha/fmha_bwd_runner.hpp

DDEle requested a review from Copilot January 20, 2026 05:35

Copilot AI reviewed Jan 20, 2026

View reviewed changes

example/ck_tile/01_fmha/fmha_bwd_runner.hpp Show resolved Hide resolved

example/ck_tile/01_fmha/fmha_bwd_runner.hpp Show resolved Hide resolved

[CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD

d321f3a

DDEle force-pushed the fix-fmha-det-overflow branch from f157fc9 to d321f3a Compare January 20, 2026 05:45

DDEle mentioned this pull request Jan 20, 2026

[CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD ROCm/aiter#1873

Open

1 task

DDEle marked this pull request as ready for review January 20, 2026 06:45

DDEle requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent, vidyasagar-amd and vpietila-amd as code owners January 20, 2026 06:45

DDEle enabled auto-merge (squash) January 20, 2026 06:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD #3615

[CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD #3615

DDEle commented Jan 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD #3615

Are you sure you want to change the base?

[CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD #3615

Conversation

DDEle commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed changes

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DDEle commented Jan 20, 2026 •

edited

Loading