feat: implement moe fused topk layer. #983
XuZhang99 wants to merge 1 commit into jd-opensource:main from
Conversation
Code Review
This pull request implements a fused top-k layer for Mixture of Experts, including CUDA kernels for both softmax and sigmoid scoring functions. A critical issue was identified: several CUDA kernels use 32-bit signed integers (`int`) for global memory offset calculations and for the token count. These values can overflow when processing very large batches of tokens, producing out-of-bounds memory accesses on the GPU and potentially causing crashes, information leakage, or memory corruption. The review also found issues that will prevent compilation or cause incorrect behavior, such as a wrong include path and a bug where a parameter is ignored in the softmax path. Finally, improvements are needed for const correctness, type safety, and coding style, such as missing newlines at the end of files.
```cpp
// We finally start setting up the read pointers for each thread. First, each
// thread jumps to the start of the row it will read.
const T* thread_row_ptr = input + thread_row * ELTS_PER_ROW;
```

Pointer arithmetic using a 32-bit signed integer for the offset (`thread_row * ELTS_PER_ROW`) can overflow, causing `thread_row_ptr` to point to an invalid memory location. The offset should be computed with 64-bit integers:

```cpp
const T* thread_row_ptr = input + static_cast<int64_t>(thread_row) * ELTS_PER_ROW;
```
```cpp
const int num_experts = static_cast<int>(gating_output.size(-1));
const int num_tokens = static_cast<int>(gating_output.size(0));
const int topk = static_cast<int>(topk_weights.size(-1));
```

The number of tokens is explicitly cast to a 32-bit signed integer. In large-scale inference or high-throughput batch processing, the number of tokens (`gating_output.size(0)`) can exceed `INT_MAX` (2,147,483,647). This cast will then overflow, leading to incorrect behavior and potential out-of-bounds access in the kernels that receive this value.
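If the kernels are to keep taking `int`, the dispatch code could at least guard the narrowing cast so an oversized dimension fails loudly instead of silently truncating. A minimal sketch (plain C++; `checked_int32` is a hypothetical helper, not part of the PR):

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: narrow a 64-bit tensor dimension to int,
// throwing instead of silently wrapping when it does not fit.
int checked_int32(int64_t value, const char* name) {
    if (value < 0 || value > std::numeric_limits<int>::max()) {
        throw std::runtime_error(std::string(name) + " out of int range: " +
                                 std::to_string(value));
    }
    return static_cast<int>(value);
}
```

Usage would mirror the existing casts, e.g. `checked_int32(gating_output.size(0), "num_tokens")`. The more robust fix, as noted above, is to carry the token count as `int64_t` end to end.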
```cpp
__shared__ float normalizing_factor;
__shared__ float float_max;
// ...
const int thread_row_offset = blockIdx.x * num_cols;
```

The calculation of `thread_row_offset` is vulnerable to integer overflow if the product of the token index and the number of columns exceeds `INT_MAX`. This will lead to out-of-bounds memory access during the softmax computation:

```cpp
const int64_t thread_row_offset = static_cast<int64_t>(blockIdx.x) * num_cols;
```
```cpp
    float* output,
    const int num_cols,
    const float* correction_bias) {
  const int thread_row_offset = blockIdx.x * num_cols;
```

The calculation of `thread_row_offset` using 32-bit signed integers can overflow if `blockIdx.x * num_cols` exceeds 2,147,483,647. Since `blockIdx.x` is the token index and `num_cols` is the number of experts, this is plausible in high-throughput scenarios, and the overflow results in out-of-bounds reads and writes on the GPU:

```cpp
const int64_t thread_row_offset = static_cast<int64_t>(blockIdx.x) * num_cols;
```
Force-pushed from 880e028 to 48e0763
No description provided.