Fix LRU cache pollution causing BLOCK_SIZE_S3 KeyError in gemm_afp4wfp4 #2169

Merged
k50112113 merged 3 commits into ROCm:main from Duyi-Wang:fix_block_size_3_error
Mar 12, 2026

Conversation

@Duyi-Wang
Contributor

Motivation

Fix KeyError: 'Keyword argument BLOCK_SIZE_S3 was specified but unrecognised' crash in gemm_afp4wfp4 triggered when using quark_w4a4_mxfp4 quantization in SGLang.

Technical Details

get_gemm_config() in aiter/ops/triton/utils/gemm_config_utils.py uses @functools.lru_cache, which caches the returned (dict, bool) tuple by reference. The fused kernel fused_gemm_afp4wfp4_split_cat mutates the cached dict in-place by injecting BLOCK_SIZE_S3:

config["BLOCK_SIZE_S3"] = triton.next_power_of_2(...)

Subsequent calls to gemm_afp4wfp4 with the same (M, N, K) receive the same polluted dict from the cache. When unpacked via **config into a Triton kernel that doesn't accept BLOCK_SIZE_S3, it raises a KeyError.
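The pollution mechanism can be reproduced without Triton. The following is a minimal sketch, assuming a hypothetical `get_config` stand-in for get_gemm_config() and a plain Python function in place of the kernel. Plain Python reports the unexpected keyword as a TypeError, whereas the Triton launcher reports it as the KeyError quoted above.

```python
import functools


@functools.lru_cache(maxsize=None)
def get_config(m: int, n: int, k: int):
    # Hypothetical stand-in for get_gemm_config(): returns (config dict, flag).
    # lru_cache stores this exact tuple, so every caller shares one dict object.
    return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64}, True


def plain_kernel(BLOCK_SIZE_M, BLOCK_SIZE_N):
    # Stand-in for a kernel launch that accepts only these two keywords.
    return BLOCK_SIZE_M * BLOCK_SIZE_N


def fused_caller(m, n, k):
    config, _ = get_config(m, n, k)
    config["BLOCK_SIZE_S3"] = 128  # mutates the dict held inside the cache
    return config


def plain_caller(m, n, k):
    config, _ = get_config(m, n, k)
    return plain_kernel(**config)


first = plain_caller(16, 32, 64)  # works: the cached dict is still clean
fused_caller(16, 32, 64)          # pollutes the cached dict for (16, 32, 64)
try:
    plain_caller(16, 32, 64)      # same cache key, now with an extra keyword
    polluted_error = None
except TypeError as exc:
    polluted_error = exc
```

The failure depends on call order: the plain path only breaks after the fused path has run with the same (M, N, K).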

Fix: Added config.pop("BLOCK_SIZE_S3", None) in the three affected functions in gemm_afp4wfp4.py (gemm_afp4wfp4_, gemm_afp4wfp4_preshuffled_scales, gemm_afp4wfp4_preshuffle) immediately after obtaining the config dict. This strips the polluted key before it reaches the kernel call, regardless of whether the cache was polluted or not.
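A minimal sketch of the pop-based workaround, using a hypothetical `launch` function in place of the real kernel call. It shows why `pop` with a default is safe whether or not the key is present.

```python
def launch(BLOCK_SIZE_M, BLOCK_SIZE_N):
    # Hypothetical stand-in for a kernel that does not accept BLOCK_SIZE_S3.
    return (BLOCK_SIZE_M, BLOCK_SIZE_N)


def call_with_pop(config):
    # pop() with a default never raises, so clean and polluted dicts both work.
    config.pop("BLOCK_SIZE_S3", None)
    return launch(**config)


polluted = {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_S3": 128}
clean = {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64}

result_polluted = call_with_pop(polluted)
result_clean = call_with_pop(clean)
```

Note that `pop` itself mutates the shared dict, so this removes the symptom at the call site rather than stopping the cache from being modified.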

This is a minimal workaround. A more thorough fix would be to have get_gemm_config() return defensive copies to prevent cache mutation entirely.

Test Plan

Test Result

Submission Checklist

Contributor

Copilot AI left a comment

Pull request overview

Addresses a Triton kernel launch crash (KeyError: '... BLOCK_SIZE_S3 ... unrecognised') caused by BLOCK_SIZE_S3 being injected into (and then reused from) a cached GEMM config when using MXFP4 quantization paths.

Changes:

  • Strip BLOCK_SIZE_S3 from the GEMM config dict after config retrieval in gemm_afp4wfp4_.
  • Strip BLOCK_SIZE_S3 from the GEMM config dict after config retrieval in gemm_afp4wfp4_preshuffled_scales.
  • Strip BLOCK_SIZE_S3 from the GEMM config dict after config retrieval in gemm_afp4wfp4_preshuffle.

Comment thread aiter/ops/triton/gemm/basic/gemm_afp4wfp4.py Outdated
Comment thread aiter/ops/triton/gemm/basic/gemm_afp4wfp4.py Outdated
@Duyi-Wang Duyi-Wang changed the title Walk around "BLOCK_SIZE_S3" error Fix LRU cache pollution causing BLOCK_SIZE_S3 KeyError in gemm_afp4wfp4 Mar 4, 2026
@azaidy azaidy requested a review from k50112113 March 4, 2026 19:00
@k50112113
Contributor

Instead of popping BLOCK_SIZE_S3 (with a default of None) in the gemm_afp4wfp4 API, my suggestion would be to add .copy() in _get_config() in /root/aiter/aiter/ops/triton/_triton_kernels/gemm/basic/gemm_afp4wfp4.py, like the following:

def _get_config(
    M: int,
    N: int,
    K: int,
    shuffle: bool = False,
):
    # Note: Config files use K=2*K in their naming
    K = 2 * K
    if shuffle:
        return get_gemm_config("GEMM-AFP4WFP4_PRESHUFFLED", M, N, K).copy()
    else:
        return get_gemm_config("GEMM-AFP4WFP4", M, N, K).copy()

This copies the config dict so that gemm_afp4wfp4 and fused_gemm_afp4wfp4_split_cat each work on their own config.

Please let me know if this works
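One detail worth checking when applying this: the PR description says get_gemm_config() returns a (dict, bool) tuple, and Python tuples have no `.copy()` method. If that is the actual return type, the copy would have to be taken on the dict element. A minimal sketch, assuming a hypothetical `get_gemm_config_stub` with that return shape:

```python
import functools


@functools.lru_cache(maxsize=None)
def get_gemm_config_stub(name: str, M: int, N: int, K: int):
    # Hypothetical stand-in returning (config dict, flag), as the PR describes.
    return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64}, True


def _get_config_sketch(M: int, N: int, K: int, shuffle: bool = False):
    # Note: Config files use K=2*K in their naming
    K = 2 * K
    name = "GEMM-AFP4WFP4_PRESHUFFLED" if shuffle else "GEMM-AFP4WFP4"
    config, flag = get_gemm_config_stub(name, M, N, K)
    # Copy only the dict; the flag is an immutable bool and can be shared.
    return config.copy(), flag


fused_config, _ = _get_config_sketch(16, 32, 64)
fused_config["BLOCK_SIZE_S3"] = 128              # private to the fused path
plain_config, _ = _get_config_sketch(16, 32, 64)  # unaffected
```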

Thanks,
Shao-Chun Lee

@Duyi-Wang
Contributor Author

Returning a deep copy in _get_config() is a better solution to the cache pollution issue, and it works. I was unsure about its impact on performance and on other components, so I had used the pop as a workaround for my current workload.
The fix based on deep-copy has been pushed.
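For reference, `dict.copy()` is a shallow copy, while `copy.deepcopy` also duplicates nested values. The sketch below illustrates the difference; the nested `extra` entry is hypothetical and only there to show the effect. If the config values are all plain integers, a shallow copy should be enough to isolate an added key such as BLOCK_SIZE_S3.

```python
import copy

# Hypothetical cached config; "extra" is a made-up nested value for illustration.
cached = {"BLOCK_SIZE_M": 64, "extra": {"waves": 2}}

shallow = cached.copy()        # new top-level dict, nested objects are shared
deep = copy.deepcopy(cached)   # new top-level dict and new nested objects

shallow["BLOCK_SIZE_S3"] = 128   # adding a key does not affect `cached`
shallow["extra"]["waves"] = 4    # editing a nested object does leak into `cached`
deep["extra"]["waves"] = 8       # the deep copy stays fully independent
```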

@Duyi-Wang Duyi-Wang force-pushed the fix_block_size_3_error branch from cb05b0b to a0d87c6 Compare March 6, 2026 03:15
@Duyi-Wang
Contributor Author

A similar change was merged in #2173. However, that one is the same kind of workaround and applies only to gemm_afp4wfp4. @k50112113

Contributor

@k50112113 k50112113 left a comment

Yes, this looks good, thanks for the work

@k50112113 k50112113 merged commit 7d29fce into ROCm:main Mar 12, 2026
36 checks passed
Duyi-Wang added a commit to Duyi-Wang/aiter that referenced this pull request Mar 13, 2026
…p4 (ROCm#2169)

* Walk around "BLOCK_SIZE_S3" error

* Remove workaround for "BLOCK_SIZE_S3" key in GEMM configuration functions

* Revert "Copy config before mutate (ROCm#2173)"

This reverts commit 213b76f.
valarLip pushed a commit that referenced this pull request Mar 18, 2026
…p4 (#2169)

* Walk around "BLOCK_SIZE_S3" error

* Remove workaround for "BLOCK_SIZE_S3" key in GEMM configuration functions

* Revert "Copy config before mutate (#2173)"

This reverts commit fe41d1e.
AMD-yanfeiwang pushed a commit to AMD-yanfeiwang/aiter that referenced this pull request Mar 18, 2026
…p4 (ROCm#2169)

* Walk around "BLOCK_SIZE_S3" error

* Remove workaround for "BLOCK_SIZE_S3" key in GEMM configuration functions

* Revert "Copy config before mutate (ROCm#2173)"

This reverts commit 000ba6c.