Fix LRU cache pollution causing BLOCK_SIZE_S3 KeyError in gemm_afp4wfp4#2169
Pull request overview
Addresses a Triton kernel launch crash (KeyError: '... BLOCK_SIZE_S3 ... unrecognised') caused by BLOCK_SIZE_S3 being injected into (and then reused from) a cached GEMM config when using MXFP4 quantization paths.
Changes:
- Strip BLOCK_SIZE_S3 from the GEMM config dict after config retrieval in gemm_afp4wfp4_.
- Strip BLOCK_SIZE_S3 from the GEMM config dict after config retrieval in gemm_afp4wfp4_preshuffled_scales.
- Strip BLOCK_SIZE_S3 from the GEMM config dict after config retrieval in gemm_afp4wfp4_preshuffle.
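The failure mode and the workaround in this overview can be illustrated with a small standalone sketch. Everything below is hypothetical: the stub names, block sizes, and the value 32 are made up for illustration and are not taken from the aiter code. A plain Python function stands in for the Triton kernel, so the unexpected keyword surfaces as a TypeError here rather than the KeyError Triton raises.

```python
import functools


@functools.lru_cache(maxsize=None)
def get_config_stub(M, N, K):
    # Hypothetical stand-in for get_gemm_config(): lru_cache stores this
    # tuple once and hands the same dict object to every caller.
    return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 128}, False


def fused_caller_stub(M, N, K):
    # Mimics a fused kernel wrapper that adds an extra key in place,
    # which silently changes the cached entry as well.
    config, _ = get_config_stub(M, N, K)
    config["BLOCK_SIZE_S3"] = 32  # illustrative value only
    return config


def plain_kernel_stub(BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K):
    # Stand-in for a kernel that does not accept BLOCK_SIZE_S3.
    return BLOCK_SIZE_M * BLOCK_SIZE_N * BLOCK_SIZE_K


def plain_caller_stub(M, N, K, strip=True):
    config, _ = get_config_stub(M, N, K)
    if strip:
        # The workaround: drop the stray key if present, no error if absent.
        config.pop("BLOCK_SIZE_S3", None)
    return plain_kernel_stub(**config)
```

Note that the pop also acts on the shared cached dict, so it hides the pollution at the call site without preventing later mutation.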
Instead of popping BLOCK_SIZE_S3 with a default to None in the like the following: to copy the config dict so that the configs of both the Please let me know if this works Thanks, |
Returning a deep copy in |
cb05b0b to a0d87c6
A similar one was merged: #2173. However, this is the same workaround and is only applicable to gemm_afp4wfp4. @k50112113
k50112113 left a comment
Yes, this looks good, thanks for the work
Motivation
Fix KeyError: 'Keyword argument BLOCK_SIZE_S3 was specified but unrecognised' crash in gemm_afp4wfp4 triggered when using quark_w4a4_mxfp4 quantization in SGLang.
Technical Details
get_gemm_config() in aiter/ops/triton/utils/gemm_config_utils.py uses @functools.lru_cache, which caches the returned (dict, bool) tuple by reference. The fused kernel fused_gemm_afp4wfp4_split_cat mutates the cached dict in-place by injecting BLOCK_SIZE_S3.
Subsequent calls to gemm_afp4wfp4 with the same (M, N, K) receive the same polluted dict from the cache. When unpacked via **config into a Triton kernel that doesn't accept BLOCK_SIZE_S3, it raises a KeyError.
Fix: Added config.pop("BLOCK_SIZE_S3", None) in the three affected functions in gemm_afp4wfp4.py (gemm_afp4wfp4_, gemm_afp4wfp4_preshuffled_scales, gemm_afp4wfp4_preshuffle) immediately after obtaining the config dict. This strips the polluted key before it reaches the kernel call, regardless of whether the cache was polluted or not.
This is a minimal workaround. A more thorough fix would be to have get_gemm_config() return defensive copies to prevent cache mutation entirely.
Test Plan
Test Result
Submission Checklist