[Common] Use specialized unfused MXFP8 cast kernels by default #2958
Oleg-Goncharov wants to merge 7 commits into NVIDIA:main
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Greptile Summary
This PR makes the specialized unfused MXFP8 cast kernels the default code path by removing the
Confidence Score: 5/5
Safe to merge: the specialized kernels are now always enabled for supported type combinations, and correctness is preserved by the
The change is narrow and well-contained: env-var removal plus a runtime dispatch guard. The alignment check at
The alignment constant in
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[quantize called] --> B{hasSpec AND\nnot GEMM-swizzled?}
B -- No --> G[Generic kernel path]
B -- Yes --> C{scaling_type_has_\nspecialized_support?}
C -- No\n(COLWISE or partial row) --> G
C -- Yes --> D{ScalingType?}
D -- ROWWISE\ncols%128==0 --> E[specialized rowwise\ncast-only kernel]
D -- BIDIMENSIONAL --> F[specialized bidimensional\ncast-only kernel with TMA]
D -- default --> H[NVTE_ERROR]
E --> I[return]
F --> I
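The dispatch in the flowchart can be sketched as a small host-side selection function. This is only an illustration under stated assumptions: `KernelPath`, `select_kernel`, `kRowwiseColAlignment`, and the function signatures are hypothetical names chosen to mirror the chart labels, "partial row" is read here as a column count that is not a multiple of 128, and `std::logic_error` stands in for `NVTE_ERROR`. It is not the code from this PR.

```cpp
#include <cstddef>
#include <stdexcept>

// All names below are hypothetical and only mirror the flowchart labels.
enum class ScalingType { ROWWISE, COLWISE, BIDIMENSIONAL };
enum class KernelPath { GENERIC, SPECIALIZED_ROWWISE, SPECIALIZED_BIDIMENSIONAL };

// Taken from the "cols%128==0" edge of the flowchart.
constexpr std::size_t kRowwiseColAlignment = 128;

// Assumption: rowwise needs aligned columns, bidimensional is always
// supported, and columnwise-only always falls back to the generic path.
bool scaling_type_has_specialized_support(ScalingType type, std::size_t cols) {
  switch (type) {
    case ScalingType::ROWWISE:
      return cols % kRowwiseColAlignment == 0;
    case ScalingType::BIDIMENSIONAL:
      return true;
    case ScalingType::COLWISE:
      return false;
  }
  return false;
}

KernelPath select_kernel(bool has_spec, bool gemm_swizzled,
                         ScalingType type, std::size_t cols) {
  // First decision node: type-combination support and scale layout.
  if (!has_spec || gemm_swizzled) return KernelPath::GENERIC;
  // Second decision node: the runtime dispatch guard.
  if (!scaling_type_has_specialized_support(type, cols)) return KernelPath::GENERIC;
  switch (type) {
    case ScalingType::ROWWISE:
      return KernelPath::SPECIALIZED_ROWWISE;
    case ScalingType::BIDIMENSIONAL:
      return KernelPath::SPECIALIZED_BIDIMENSIONAL;
    default:
      // Unreachable if the guard above is correct; the chart shows NVTE_ERROR here.
      throw std::logic_error("unsupported scaling type");
  }
}
```

In this sketch every unsupported case returns the generic path rather than failing, which matches the chart: the error branch is reachable only if the guard and the switch disagree.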
/te-ci
Description
This PR enables the fast unfused MXFP8 cast kernels by default.
Previously, these kernels were gated behind an environment variable and therefore were not used unless explicitly enabled. This change makes the specialized cast-only path the default behavior.
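The behavioral change can be illustrated with a before/after pair of predicates. This is a minimal sketch, not the PR's code: the environment variable name `EXAMPLE_ENABLE_FAST_CAST` is a made-up placeholder (the real variable is not named in this description), the `"1"` opt-in value is assumed, and `supported` stands for whatever support check the dispatch performs.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical placeholder; the actual variable removed by the PR is not shown here.
constexpr const char* kHypotheticalEnvVar = "EXAMPLE_ENABLE_FAST_CAST";

// Before: the specialized path required an explicit opt-in AND support.
bool use_specialized_before(bool supported) {
  const char* value = std::getenv(kHypotheticalEnvVar);
  const bool opted_in = value != nullptr && std::strcmp(value, "1") == 0;
  return opted_in && supported;
}

// After: support alone selects the specialized path.
bool use_specialized_after(bool supported) {
  return supported;
}
```

With the gate removed, users who never set the variable now get the specialized path whenever it is supported, so the support check becomes the only safeguard.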