feat: support Qwen3-next on npu device. by JC-ut0 · Pull Request #989 · jd-opensource/xllm

JC-ut0 · 2026-03-04T02:16:00Z

Support Qwen3-next on NPU devices and add a linear attention cache.
Add the Triton kernel API, which depends on #891 (feat: adapt for CANN 8.5 and PyTorch 2.7.1 for npu device) being merged first.
Based on #945 (feat: support qwen3-next on npu device), modified to resolve merge conflicts and fix bugs.

gemini-code-assist

Code Review

This pull request introduces support for the 'Qwen next' model, involving extensive changes across the build system, environment setup, and core C++ components, including new layers, kernels, and model arguments. A critical security vulnerability has been identified where user-supplied data in RPC requests is validated using CHECK macros, creating a Denial of Service (DoS) attack vector by allowing malformed requests to crash worker processes. It is strongly recommended to replace these CHECK macros with proper error validation and return error statuses. Furthermore, a critical issue exists in the KV cache capacity estimation logic where variable names for key and value head dimensions are swapped, potentially leading to incorrect memory allocation and runtime failures.
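
To illustrate why swapped key/value head dimensions matter for capacity estimation, here is a minimal sketch. The struct `CacheDims` and the functions `key_block_bytes`, `value_block_bytes`, and `max_blocks` are hypothetical names chosen for this example; they are not taken from the xllm code base, and the real estimation logic may account for more factors.

```cpp
#include <cstdint>

// Hypothetical description of one cache block; not the xllm data structure.
struct CacheDims {
  int64_t block_size;   // tokens per block
  int64_t n_kv_heads;   // number of key/value heads
  int64_t k_head_dim;   // per-head dimension of keys
  int64_t v_head_dim;   // per-head dimension of values
  int64_t dtype_bytes;  // bytes per element
};

// Bytes needed for the key part of one block. Uses k_head_dim only.
inline int64_t key_block_bytes(const CacheDims& d) {
  return d.block_size * d.n_kv_heads * d.k_head_dim * d.dtype_bytes;
}

// Bytes needed for the value part of one block. Uses v_head_dim only.
inline int64_t value_block_bytes(const CacheDims& d) {
  return d.block_size * d.n_kv_heads * d.v_head_dim * d.dtype_bytes;
}

// Number of whole blocks that fit into the given memory budget.
inline int64_t max_blocks(int64_t budget_bytes, const CacheDims& d) {
  const int64_t per_block = key_block_bytes(d) + value_block_bytes(d);
  return per_block > 0 ? budget_bytes / per_block : 0;
}
```

When the two head dimensions differ, mixing them up sizes the key and value buffers incorrectly even if the combined total happens to match, so keeping one function per buffer makes such a swap easier to spot.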

JC-ut0 · 2026-03-05T12:37:39Z

/gemini review

gemini-code-assist

Code Review

This pull request adds support for the "Qwen3-next" model on NPU devices. A high-severity Denial of Service (DoS) vulnerability has been identified in the RPC handlers of the "WorkerService", where "CHECK" macros used for input validation can cause the worker process to abort on invalid input, allowing remote attackers to crash the worker. Additionally, two critical bugs were found in the cache allocation logic: a typo in the "SSM" cache shape definition and a copy-paste error when handling cache shapes in the worker service. These issues need to be addressed to ensure both correctness and security, specifically by replacing "CHECK" macros with graceful error handling.
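
The suggested change, validating request data and returning an error rather than aborting, could look roughly like the sketch below. `CacheShapeRequest`, `Status`, and `validate_cache_shape` are hypothetical stand-ins invented for this example; the actual `WorkerService` handlers and their request and response types are not shown in this conversation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an RPC request that carries cache shapes.
struct CacheShapeRequest {
  std::vector<int64_t> key_shape;
  std::vector<int64_t> value_shape;
};

// Hypothetical status type; a real handler would use its framework's own.
struct Status {
  bool ok;
  std::string message;
};

// Returns true only if the shape is non-empty and every dimension is positive.
inline bool is_valid_shape(const std::vector<int64_t>& shape) {
  if (shape.empty()) return false;
  for (int64_t dim : shape) {
    if (dim <= 0) return false;
  }
  return true;
}

// Validates untrusted input and reports failure to the caller. A CHECK-style
// macro in the same place would abort the whole worker process instead.
inline Status validate_cache_shape(const CacheShapeRequest& req) {
  if (!is_valid_shape(req.key_shape)) {
    return {false, "invalid key cache shape"};
  }
  if (!is_valid_shape(req.value_shape)) {
    return {false, "invalid value cache shape"};
  }
  return {true, ""};
}
```

A handler built this way can copy the message into its response and return early, so a malformed request costs one failed call rather than a crashed worker.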

JC-ut0 · 2026-03-14T07:41:19Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for the Qwen3-next model on NPU devices, which includes adding a linear attention cache. The changes are extensive, involving new model layers, kernels, and updates to the build system and data structures. My review identified a critical compilation error related to incorrect pointer access and a couple of high-severity issues where function signatures could lead to unexpected side effects by modifying input tensors. I have provided code suggestions to address these problems.
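
As an illustration of the side-effect concern, the sketch below shows a normalization function whose signature cannot modify its input. It uses `std::vector<float>` in place of a tensor type, and `rms_norm` here is a hypothetical free function written for this example, not the layer code under review. It assumes `weight` has the same length as `input`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Takes the input by const reference and returns a new buffer, so the
// caller's data is left untouched. Assumes weight.size() == input.size().
inline std::vector<float> rms_norm(const std::vector<float>& input,
                                   const std::vector<float>& weight,
                                   float eps) {
  std::vector<float> out(input.size());
  if (input.empty()) return out;

  // Mean of squares, accumulated in double for stability.
  double sum_sq = 0.0;
  for (float x : input) sum_sq += static_cast<double>(x) * x;
  const float mean_sq = static_cast<float>(sum_sq / input.size());

  // Scale each element by 1 / sqrt(mean_sq + eps) and the per-element weight.
  const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);
  for (std::size_t i = 0; i < input.size(); ++i) {
    out[i] = input[i] * inv_rms * weight[i];
  }
  return out;
}
```

With reference-counted tensor types, passing a handle by value still shares the underlying storage, so an in-place operation inside the callee remains visible to the caller; returning a fresh result, as here, avoids that.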

yingxudeng · 2026-03-23T03:15:44Z

CI/CD jobs keep hanging then getting skipped. Re-triggering manually, hoping it runs clean this time.

yq33victor

LGTM

JC-ut0 requested review from DongheJin, JimHsiung, RobbieLeung, XuZhang99, liutongxuan, walsonyang and yq33victor as code owners March 4, 2026 02:16

gemini-code-assist bot reviewed Mar 4, 2026

yingxudeng marked this pull request as draft March 4, 2026 02:43

JC-ut0 changed the title ~~[Draft] Support Qwen next~~ [Draft] Support Qwen3-next on NPU device Mar 4, 2026

JC-ut0 commented Mar 4, 2026

Comment thread xllm/core/layers/npu/fused_moe.cpp Outdated

JC-ut0 commented Mar 4, 2026

Comment thread CMakeLists.txt Outdated

JC-ut0 commented Mar 4, 2026

Comment thread CMakeLists.txt

JC-ut0 force-pushed the qwen-next branch 4 times, most recently from a3e3901 to 0bc39a0 Compare March 5, 2026 09:11

gemini-code-assist bot reviewed Mar 5, 2026

Comment thread xllm/core/distributed_runtime/llm_engine.cpp Outdated

Comment thread xllm/core/distributed_runtime/worker_service.cpp Outdated

Comment thread xllm/core/distributed_runtime/worker_service.cpp Outdated

JC-ut0 changed the title ~~[Draft] Support Qwen3-next on NPU device~~ feat: Support Qwen3-next on NPU device Mar 10, 2026

JC-ut0 marked this pull request as ready for review March 10, 2026 01:45

XuZhang99 changed the title ~~feat: Support Qwen3-next on NPU device~~ feat: support Qwen3-next on npu device. Mar 12, 2026

JC-ut0 force-pushed the qwen-next branch from 7dc7f5c to 8515dcf Compare March 14, 2026 07:39

gemini-code-assist bot reviewed Mar 14, 2026

Comment thread xllm/core/distributed_runtime/comm_channel.cpp Outdated

Comment thread xllm/core/layers/common/qwen3_next_rms_norm.cpp

Comment thread xllm/core/layers/common/rms_norm_gated.cpp Outdated

JC-ut0 force-pushed the qwen-next branch from 37a5766 to e1f3b0c Compare March 14, 2026 10:23

yingxudeng reviewed Mar 16, 2026

Comment thread xllm/models/llm/qwen3_next.h Outdated

Comment thread xllm/models/llm/qwen3_next.h Outdated

Comment thread xllm/models/llm/qwen3_next.h Outdated

Comment thread xllm/core/kernels/CMakeLists.txt Outdated

Comment thread CMakeLists.txt Outdated

JC-ut0 force-pushed the qwen-next branch from e0e1698 to dd2d6d3 Compare March 16, 2026 08:21

yingxudeng force-pushed the qwen-next branch from 3f847ff to 3272d53 Compare March 16, 2026 08:51

XuZhang99 reviewed Mar 16, 2026

JC-ut0 and others added 12 commits March 18, 2026 17:29

BugFix: reduce device memory usage

1f2aaf5

Bugfix: qwen3-next only need to update self-attention's params

4ce2021

rebase to main

308e895

Apply suggestions from code review

628cd82

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fix compile error

ee683e8

fix format issues

ef9f1ce

refactor: simplify qwen3-next model implementation for npu torch.

2c406b9

refactor: move qwen3-next npu torch specific layers to npu_torch.

4eb31ea

bugfix: fix compile error caused by triton ops api include.

ff05c14

fix aclgraph update problem

e10efc3

refactor: clean up qwen3-next hybrid attention code.

8e70a11

need to refactor CMake file later

f82db08

JC-ut0 force-pushed the qwen-next branch from 5f0cd42 to f82db08 Compare March 18, 2026 09:30

walsonyang reviewed Mar 18, 2026

yq33victor reviewed Mar 19, 2026

Clean up qwen3-next codes

3255a54

JC-ut0 force-pushed the qwen-next branch from 3a73dd0 to 3255a54 Compare March 20, 2026 01:24

Merge branch 'main' into qwen-next

5f27d55

DragonFive previously approved these changes Mar 20, 2026

bugfix: Ensure per-sequence lengths are available for NPU kernels in all phases

6f35722

JC-ut0 dismissed DragonFive’s stale review via 6f35722 March 20, 2026 09:35

Add GDN attention log

61840dc

JC-ut0 requested review from DragonFive, walsonyang and yq33victor March 20, 2026 09:39

yq33victor approved these changes Mar 23, 2026

JimHsiung approved these changes Mar 23, 2026

yingxudeng approved these changes Mar 23, 2026

yingxudeng merged commit d135444 into jd-opensource:main Mar 24, 2026
49 of 81 checks passed

9 participants
