Open
34 commits
c7c1a76
Implemented the kernel with split dbias
Oleg-Goncharov Feb 11, 2026
7abbc7b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 11, 2026
f820b21
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 12, 2026
0c05632
Relaxed constraints on the last dimension
Oleg-Goncharov Feb 13, 2026
4a85dea
Added notes on group tensor restrictions into documentation
Oleg-Goncharov Feb 13, 2026
aedd53d
Fixes per the review
Oleg-Goncharov Feb 27, 2026
38288b1
Fixed pointer
Oleg-Goncharov Feb 27, 2026
ce3a137
More fixes
Oleg-Goncharov Feb 27, 2026
bddd804
Fixed kernel grid size
Oleg-Goncharov Mar 2, 2026
a894d1a
Merge branch 'main' into pr_split_dbias
Oleg-Goncharov Mar 2, 2026
87352bd
Enabled persistency with WorkID Query feature
Oleg-Goncharov Mar 4, 2026
e23f553
Added a struct with tunable parameters
Oleg-Goncharov Mar 4, 2026
d185299
Added persistency with static scheduling
Oleg-Goncharov Mar 4, 2026
5e15f57
Fixed test cases
Oleg-Goncharov Mar 4, 2026
98e9558
Ready for benchmarking
Oleg-Goncharov Mar 4, 2026
ab816cb
Fixed out-of-boundary error
Oleg-Goncharov Mar 4, 2026
8a429ad
Tuned kernel parameters
Oleg-Goncharov Mar 4, 2026
ab3f911
Refactoring
Oleg-Goncharov Mar 4, 2026
92720ac
Refactoring 2
Oleg-Goncharov Mar 4, 2026
46d9811
Refactoring 3
Oleg-Goncharov Mar 4, 2026
7172400
Removed the dynamic (WorkID Query) persistency
Oleg-Goncharov Mar 5, 2026
4344627
Ready for PR
Oleg-Goncharov Mar 5, 2026
ede33b4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 5, 2026
219e925
Merge branch 'main' into pr_persistent_grouped_mxfp8_kernel
Oleg-Goncharov Mar 5, 2026
325181b
Fixes per the review
Oleg-Goncharov Mar 6, 2026
04609b1
Merge branch 'main' into pr_persistent_grouped_mxfp8_kernel
Oleg-Goncharov Mar 6, 2026
5815335
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2026
0bd837c
Added the test suite
Oleg-Goncharov Mar 6, 2026
0c5849c
Initial kernel draft
Oleg-Goncharov Mar 6, 2026
178a7c4
Refactoring
Oleg-Goncharov Mar 6, 2026
b035b43
Added the kernel to the quantization dispatcher
Oleg-Goncharov Mar 6, 2026
9d72757
Isolated only the Group Quantize NVFP4 for compilation
Oleg-Goncharov Mar 6, 2026
da8da89
Fixed test suite and bug in scaling factors padding
Oleg-Goncharov Mar 6, 2026
46fdb93
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2026
3 changes: 2 additions & 1 deletion tests/cpp/CMakeLists.txt
@@ -8,7 +8,8 @@ if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
   if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL 12.8)
     set(CMAKE_CUDA_ARCHITECTURES 75 80 89 90 100 120)
   else ()
-    set(CMAKE_CUDA_ARCHITECTURES 75 80 89 90)
+    # set(CMAKE_CUDA_ARCHITECTURES 75 80 89 90)
+    set(CMAKE_CUDA_ARCHITECTURES 100)
Comment on lines +11 to +12
Development-only architecture restriction should not be merged

The old multi-architecture fallback (75 80 89 90) has been replaced with a Blackwell-only build (100), and the original line is left behind commented out. As a result, on any CI runner with CUDA < 12.8 the test binary is built only for sm_100, silently dropping the Turing, Ampere, Ada, and Hopper targets (sm_75, sm_80, sm_89, sm_90). This looks like a local development shortcut and must be reverted before merging.

Suggested change
-    # set(CMAKE_CUDA_ARCHITECTURES 75 80 89 90)
-    set(CMAKE_CUDA_ARCHITECTURES 100)
+    set(CMAKE_CUDA_ARCHITECTURES 75 80 89 90)
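
Because the defaults sit inside the `if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)` guard shown in the hunk header, a Blackwell-only local build may not need any edit to this file. The following is a minimal sketch, assuming the guard behaves as it reads. The `build` directory name in the comment is illustrative and not taken from the PR.

```cmake
# The defaults below apply only when CMAKE_CUDA_ARCHITECTURES is not already
# defined, so a single-architecture build can be requested at configure time:
#
#   cmake -S tests/cpp -B build -DCMAKE_CUDA_ARCHITECTURES=100
#
# ("build" is a hypothetical output directory.) The committed defaults then
# stay untouched for CI and for other developers.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL 12.8)
    set(CMAKE_CUDA_ARCHITECTURES 75 80 89 90 100 120)
  else ()
    set(CMAKE_CUDA_ARCHITECTURES 75 80 89 90)
  endif()
endif()
```

Passing the value on the command line keeps the development shortcut out of version control, so there is nothing to revert before merging.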

endif()
endif()

59 changes: 30 additions & 29 deletions tests/cpp/operator/CMakeLists.txt
@@ -3,35 +3,36 @@
 # See LICENSE for license information.

 add_executable(test_operator
-    test_cast.cu
-    test_cast_current_scaling.cu
-    test_cast_dbias.cu
-    test_cast_dbias_dgelu.cu
-    test_cast_gated_swiglu.cu
-    test_cast_mxfp8_gated_swiglu.cu
-    test_qdq.cu
-    test_cast_mxfp8.cu
-    test_cast_mxfp8_grouped.cu
-    test_cast_nvfp4_transpose.cu
-    test_cast_float8blockwise.cu
-    test_dequantize_mxfp8.cu
-    test_transpose.cu
-    test_cast_transpose.cu
-    test_cast_transpose_current_scaling.cu
-    test_cast_transpose_dbias.cu
-    test_cast_transpose_dbias_dgelu.cu
-    test_cast_transpose_dgeglu.cu
-    test_act.cu
-    test_normalization.cu
-    test_normalization_mxfp8.cu
-    test_memset.cu
-    test_multi_cast_transpose.cu
-    test_multi_padding.cu
-    test_multi_unpadding.cu
-    test_causal_softmax.cu
-    test_swizzle.cu
-    test_swap_first_dims.cu
-    test_grouped_gemm.cu
+    # test_cast.cu
+    # test_cast_current_scaling.cu
+    # test_cast_dbias.cu
+    # test_cast_dbias_dgelu.cu
+    # test_cast_gated_swiglu.cu
+    # test_cast_mxfp8_gated_swiglu.cu
+    # test_qdq.cu
+    # test_cast_mxfp8.cu
+    # test_cast_mxfp8_grouped.cu
+    # test_cast_nvfp4_transpose.cu
+    test_cast_nvfp4_transpose_grouped.cu
+    # test_cast_float8blockwise.cu
+    # test_dequantize_mxfp8.cu
+    # test_transpose.cu
+    # test_cast_transpose.cu
+    # test_cast_transpose_current_scaling.cu
+    # test_cast_transpose_dbias.cu
+    # test_cast_transpose_dbias_dgelu.cu
+    # test_cast_transpose_dgeglu.cu
+    # test_act.cu
+    # test_normalization.cu
+    # test_normalization_mxfp8.cu
+    # test_memset.cu
+    # test_multi_cast_transpose.cu
+    # test_multi_padding.cu
+    # test_multi_unpadding.cu
+    # test_causal_softmax.cu
+    # test_swizzle.cu
+    # test_swap_first_dims.cu
+    # test_grouped_gemm.cu
Comment on lines +6 to +35
All existing operator tests commented out

Every pre-existing test (test_cast.cu, test_cast_mxfp8.cu, test_cast_mxfp8_grouped.cu, test_normalization.cu, etc.) has been commented out, leaving only test_cast_nvfp4_transpose_grouped.cu in the build. This disables the entire existing regression suite and hides any breakage introduced by the changes in quantize.cuh and group_quantize_mxfp8.cuh. The commented-out entries appear to be a local development convenience and must be restored before merging.

The new test file should simply be added to the existing list, not substituted for it.
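
One way to read that suggestion, sketched against the file list shown in the diff above, is to keep every existing entry and add the new file next to its non-grouped counterpart. The placement is an assumption; any position in the list should build the same way.

```cmake
add_executable(test_operator
               test_cast.cu
               test_cast_current_scaling.cu
               test_cast_dbias.cu
               test_cast_dbias_dgelu.cu
               test_cast_gated_swiglu.cu
               test_cast_mxfp8_gated_swiglu.cu
               test_qdq.cu
               test_cast_mxfp8.cu
               test_cast_mxfp8_grouped.cu
               test_cast_nvfp4_transpose.cu
               # New in this PR: added alongside the existing tests.
               test_cast_nvfp4_transpose_grouped.cu
               test_cast_float8blockwise.cu
               test_dequantize_mxfp8.cu
               test_transpose.cu
               test_cast_transpose.cu
               test_cast_transpose_current_scaling.cu
               test_cast_transpose_dbias.cu
               test_cast_transpose_dbias_dgelu.cu
               test_cast_transpose_dgeglu.cu
               test_act.cu
               test_normalization.cu
               test_normalization_mxfp8.cu
               test_memset.cu
               test_multi_cast_transpose.cu
               test_multi_padding.cu
               test_multi_unpadding.cu
               test_causal_softmax.cu
               test_swizzle.cu
               test_swap_first_dims.cu
               test_grouped_gemm.cu
               ../test_common.cu)
```

With this form the hunk would shrink to a single added line, which also makes the change easier to review.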

../test_common.cu)

# Find required packages
162 changes: 88 additions & 74 deletions tests/cpp/operator/test_cast_mxfp8_grouped.cu

Large diffs are not rendered by default.
