feat: support qwen3-next on npu device. #945

Draft
liyu119 wants to merge 7 commits into jd-opensource:main from liyu119:feat-qwen3-next-pr

Conversation

@liyu119
Contributor

@liyu119 liyu119 commented Feb 26, 2026

  1. Support the qwen3-next model on NPU.
  2. Add a linear attention cache.
  3. Add a Triton kernel API, which depends on the Triton kernel ops merged into torch_npu_ops.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the qwen3-next model on NPU devices, introducing new model architecture files, custom kernels, and updates to KV cache management for linear attention. The changes are extensive and well-structured. However, I've identified a few critical issues related to incorrect memory allocation for the new caches and a constructor signature mismatch that would lead to compilation failure. These issues need to be addressed to ensure correctness and allow the code to compile.

Comment on lines +290 to +291
```cpp
int64_t head_k_dim = args_.linear_value_head_dim();
int64_t head_v_dim = args_.linear_key_head_dim();
```
Contributor


critical

There appears to be a variable naming swap here. head_k_dim is being initialized with linear_value_head_dim, and head_v_dim with linear_key_head_dim. This is likely to cause incorrect calculations for linear_ssm_slot_size and linear_conv_slot_size, leading to memory allocation errors or incorrect behavior. Please swap the initializations to match the variable names.

Suggested change

```diff
-int64_t head_k_dim = args_.linear_value_head_dim();
-int64_t head_v_dim = args_.linear_key_head_dim();
+int64_t head_k_dim = args_.linear_key_head_dim();
+int64_t head_v_dim = args_.linear_value_head_dim();
```

```cpp
    args_.linear_key_head_dim() * n_local_linear_v_heads_, args_.linear_conv_kernel_dim() - 1});
kv_cache_shape.emplace_back(std::vector<int64_t>{
    kv_cache_cap.n_blocks, n_local_linear_v_heads_, args_.linear_key_head_dim(),
    args_.linear_key_head_dim()});
```
Contributor


critical

The shape for the SSM cache appears to be incorrect. Both the third and fourth dimensions are set to args_.linear_key_head_dim(). The SSM state typically has dimensions corresponding to key and value head dimensions (k_dim, v_dim). It should likely be args_.linear_value_head_dim() for the last dimension to correctly represent the state.

Suggested change

```diff
-    args_.linear_key_head_dim()});
+    args_.linear_value_head_dim()});
```
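To make the dimension mix-up concrete, here is a minimal sketch of the slot-size arithmetic implied by the shapes quoted above. The function names are illustrative, not identifiers from this PR; only the shape layout comes from the review snippets.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative slot sizes derived from the shapes quoted in the review:
//   conv cache slot ~ key_head_dim * n_v_heads * (conv_kernel_dim - 1)
//   ssm  cache slot ~ n_v_heads * key_head_dim * value_head_dim
// (function and parameter names are assumptions for this sketch).
int64_t linear_conv_slot_size(int64_t head_k_dim,
                              int64_t n_v_heads,
                              int64_t conv_kernel_dim) {
  return head_k_dim * n_v_heads * (conv_kernel_dim - 1);
}

int64_t linear_ssm_slot_size(int64_t n_v_heads,
                             int64_t head_k_dim,
                             int64_t head_v_dim) {
  return n_v_heads * head_k_dim * head_v_dim;
}
```

For a hypothetical config with head_k_dim = 128 and head_v_dim = 64, using key_head_dim for both SSM dimensions allocates 128 x 128 per head instead of 128 x 64, so the bug would silently double the SSM cache for that config while going unnoticed whenever the two dims happen to be equal.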

Comment on lines +34 to +37
```cpp
KVCache(torch::Tensor key_cache,
        torch::Tensor value_cache,
        torch::Tensor conv_cache,
        torch::Tensor ssm_cache);
```
Contributor


critical

This new constructor for KVCache takes four torch::Tensor arguments. However, it is being called with five arguments (key_cache, value_cache, index_cache, conv_cache, ssm_cache) in xllm/core/runtime/worker_impl.cpp on line 148. This will cause a compilation error. The constructor should be updated to accept all five tensors to correctly initialize all cache types. The implementation in kv_cache.cpp will also need to be updated to initialize all five members.

```cpp
KVCache(torch::Tensor key_cache,
        torch::Tensor value_cache,
        torch::Tensor index_cache,
        torch::Tensor conv_cache,
        torch::Tensor ssm_cache);
```
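A self-contained sketch of the five-tensor constructor the review asks for. `Tensor` below is a stand-in for `torch::Tensor` so the example compiles without libtorch; the member names mirror the snippets in this thread, but the layout is otherwise an assumption.

```cpp
#include <cassert>
#include <utility>

struct Tensor { int id = -1; };  // stand-in for torch::Tensor

class KVCache {
 public:
  // Five-tensor constructor matching the call site quoted from
  // worker_impl.cpp: all cache tensors, including the new conv and
  // ssm caches for linear attention, are stored as members.
  KVCache(Tensor key_cache, Tensor value_cache, Tensor index_cache,
          Tensor conv_cache, Tensor ssm_cache)
      : key_cache_(std::move(key_cache)),
        value_cache_(std::move(value_cache)),
        index_cache_(std::move(index_cache)),
        conv_cache_(std::move(conv_cache)),
        ssm_cache_(std::move(ssm_cache)) {}

  const Tensor& conv_cache() const { return conv_cache_; }
  const Tensor& ssm_cache() const { return ssm_cache_; }

 private:
  Tensor key_cache_, value_cache_, index_cache_, conv_cache_, ssm_cache_;
};
```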

```cpp
}
#endif
kv_caches_.emplace_back(key_cache, value_cache, index_cache);
kv_caches_.emplace_back(key_cache, value_cache, index_cache, conv_cache, ssm_cache);
```
Contributor


critical

This line attempts to construct a KVCache object with five arguments. However, there is no matching constructor defined for KVCache that accepts five tensors. The newly added constructor in kv_cache.h only takes four arguments. This will result in a compilation error. Please ensure the KVCache class has a constructor that matches this call.

@@ -0,0 +1,44 @@
/* Copyright 2025 The xLLM Authors. All Rights Reserved.
Collaborator


The copyright year should be 2026, not 2025.

Collaborator


Place this file in models/llm/npu.

Collaborator


Put this under models/llm/; it is the generic torch graph-building code, not an ATB graph.

@yingxudeng yingxudeng marked this pull request as draft February 26, 2026 12:35

```cpp
// qwen3 next
PROPERTY(bool, attn_output_gate) = true;
PROPERTY(int32_t, full_attention_interval) = 4;
```
Contributor

@JC-ut0 JC-ut0 Feb 27, 2026


The default value of full_attention_interval should be set to 1, so that other models that don't have this config still behave correctly.
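The reviewer's point can be sketched as follows. The helper name and the "every Nth decoder layer uses full attention" interpretation are assumptions inferred from the config name, not code from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Assumed meaning of full_attention_interval: every Nth decoder layer
// uses full attention, the rest use linear attention. With a default
// of 1, every layer is full attention, so models that never set this
// config keep their previous (all-full-attention) behavior.
bool is_full_attention_layer(int32_t layer_idx,
                             int32_t full_attention_interval) {
  return (layer_idx + 1) % full_attention_interval == 0;
}
```

With a default of 4, as currently written, a model that never reads this config but is routed through shared layer-selection logic would treat three out of four layers as linear attention, which is why the reviewer asks for 1.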

@yingxudeng yingxudeng changed the title feat: support qwen3-next on npu device feat: support qwen3-next on npu device. Feb 27, 2026
```cpp
  return padded_qkvz;
}
std::vector<torch::Tensor> valid_batches;
int64_t bs = attn_metadata.query_start_loc.size(0);
```
Contributor


```
qwen3_next_gated_delta_net.cpp:418:32: error: ‘const struct xllm::layer::AttentionMetadata’ has no member named ‘query_start_loc’
  418 | int64_t bs = attn_metadata.query_start_loc.size(0);
```

```cpp
torch::Tensor& weight,
bool& weight_is_loaded);

void load_merged_weight_v2(const StateDict& state_dict,
```
Contributor Author


```cpp
#define DEFINE_MERGED_WEIGHT_V2(name) \
```

```cpp
std::vector<torch::Tensor> valid_batches;
int64_t bs = attn_metadata.query_start_loc.size(0);
int64_t max_len = attn_metadata.max_query_len;
const auto& ori_seq_lens = attn_metadata.query_start_loc;
```
Contributor


```
qwen3_next_gated_delta_net.cpp:420:46: error: ‘const struct xllm::layer::AttentionMetadata’ has no member named ‘query_start_loc’
  420 | const auto& ori_seq_lens = attn_metadata.query_start_loc;
```

Comment thread on xllm/models/llm/qwen3_next.h (outdated)
```cpp
}

 private:
  layer::Qwen3NextDecoderLayer decoder_layer_{nullptr};
```
Contributor


‘Qwen3NextDecoderLayer’ in namespace ‘xllm::layer’ does not name a type

@yingxudeng
Collaborator

In the previous MoE PR, the file xllm/core/layers/npu/fused_moe.cpp was put in the wrong place; it should live at xllm/xllm/core/layers/npu_torch/fused_moe.cpp. I'll move it later.

```cpp
}
#endif
kv_caches_.emplace_back(key_cache, value_cache, index_cache);
kv_caches_.emplace_back(key_cache, value_cache, index_cache, conv_cache, ssm_cache);
```
Copy link
Copy Markdown
Contributor


Why are there five arguments here?
