[WIP] [KDA] support GVA for delta_h and fwd_o by KevinZeng08 · Pull Request #73 · inclusionAI/cuLA

KevinZeng08 · 2026-05-20T04:53:32Z

📌 Description

Add GVA support for delta_h and fwd_o on SM10X.
Based on PR #72, "[Feat] upgrade FLA to v0.5.0".

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to cuLA! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing.

⚡ Performance

Reviewer Notes

gemini-code-assist

Code Review

This pull request updates the flash-linear-attention baseline to v0.5.0 and introduces support for Grouped Value Attention (GVA) across the chunk_delta_h and fwd_o kernels. The implementation includes updated indexing logic to map value heads to QK heads and extends benchmark scripts to support configurable head counts. Documentation and benchmark results have been refreshed to reflect performance improvements on Blackwell and Hopper architectures. Feedback was provided to include explicit assertions validating that the number of value heads is a multiple of the QK heads and that head dimensions are restricted to 128, as required by the current kernel tiling logic.
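The value-head to QK-head mapping mentioned in this summary can be sketched in plain Python, using the formula `i_h = hidx // (HV // H)` quoted in the inline review comment. The helper name `qk_head_for_value_head` is hypothetical and not part of cuLA, and the sketch assumes `HV` is a positive multiple of `H`.

```python
def qk_head_for_value_head(hidx: int, H: int, HV: int) -> int:
    """Map a value-head index to the QK head it shares (hypothetical helper)."""
    group_size = HV // H  # number of value heads that share one QK head
    return hidx // group_size


# Example: 2 QK heads shared by 8 value heads, so each group has 4 value heads.
H, HV = 2, 8
mapping = [qk_head_for_value_head(h, H, HV) for h in range(HV)]

# Expanding the QK heads by repeating each one group_size times in place
# (what repeat_interleave does along the head axis) gives the same assignment.
expanded = [qk for qk in range(H) for _ in range(HV // H)]

print(mapping)              # [0, 0, 0, 0, 1, 1, 1, 1]
print(mapping == expanded)  # True
```

Indexing the QK heads this way gives the same head assignment as expanding q and k to `HV` heads up front, without materializing the repeated tensors.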

gemini-code-assist · 2026-05-20T04:57:56Z

+    HV = u.shape[2]
     V_dim = u.shape[3]
     BT = chunk_size
     is_varlen = cu_seqlens is not None

For Grouped Value Attention (GVA) to work correctly with the current head-mapping logic (i_h = hidx // (HV // H)), the number of value heads (HV) must be a multiple of the number of QK heads (H). Additionally, since the kernel tiling is hardcoded for specific dimensions, we should also validate that V_dim matches the expected 128.

Suggested change

-    is_varlen = cu_seqlens is not None
+    is_varlen = cu_seqlens is not None
+    assert HV >= H and HV % H == 0, f"HV ({HV}) must be >= H ({H}) and divisible by H"
+    assert K_dim == 128 and V_dim == 128, f"current kernel only supports head_dim=128, got K={K_dim}, V={V_dim}"
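The same checks could also live in a small standalone validator. The following is a minimal sketch, not code from this PR: the function name `check_gva_shapes` and its `head_dim` parameter are assumptions for illustration. It raises `ValueError` instead of using `assert`, because assertions are skipped when Python runs with `-O`.

```python
def check_gva_shapes(H: int, HV: int, K_dim: int, V_dim: int, head_dim: int = 128) -> None:
    """Validate GVA head counts and head dimensions (hypothetical helper)."""
    # Value heads must form whole groups over the QK heads.
    if HV < H or HV % H != 0:
        raise ValueError(f"HV ({HV}) must be >= H ({H}) and divisible by H")
    # The review comment says the kernel tiling is hardcoded for head_dim=128.
    if K_dim != head_dim or V_dim != head_dim:
        raise ValueError(
            f"current kernel only supports head_dim={head_dim}, got K={K_dim}, V={V_dim}"
        )


# Passes silently: 32 value heads grouped over 16 QK heads, both dims 128.
check_gva_shapes(H=16, HV=32, K_dim=128, V_dim=128)

# Fails: 24 is not a multiple of 16.
try:
    check_gva_shapes(H=16, HV=24, K_dim=128, V_dim=128)
except ValueError as err:
    print(err)
```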

KevinZeng08 and others added 13 commits May 9, 2026 10:15

upgrade fla and update b200 bench, update readme and fix lightning test param

ba756f4

update h200 bench result with fla bug fixed

d9dfe73

update b200 bench

3aacc99

Merge branch 'main' of https://github.com/inclusionAI/cuLA into feat/upgrade-fla

ab71168

update b200 bench

eef500e

fix readme

3ca1ffb

remove useless repeat_interleave for fla

8e65f1b

fix readme

e9e2c39

gva for delta_h

eaacb71

bench for delta_h

bd696a5

fix ref delta_h

e4013cb

add gva for fwd_o

3da1452

code lint

d019c42

gemini-code-assist Bot reviewed May 20, 2026

KevinZeng08 wants to merge 13 commits into main from feat/gva-cutedsl
