
[OOT] Enable qwen3next to OOT impl #406

Open
ganyi1996ppo wants to merge 3 commits into main from ganyi/qwen3next_oot

Conversation


@ganyi1996ppo ganyi1996ppo commented Mar 25, 2026

Motivation

Technical Details

Test Plan

Test Result

qwen3next 80B fp8

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8446|±  |0.0100|
|     |       |strict-match    |     5|exact_match|↑  |0.8135|±  |0.0107|

qwen3next 80B bf16

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8613|±  |0.0095|
|     |       |strict-match    |     5|exact_match|↑  |0.8423|±  |0.0100|

Submission Checklist

Signed-off-by: ganyi <ygan@amd.com>
Copilot AI review requested due to automatic review settings March 25, 2026 07:16

Copilot AI left a comment


Pull request overview

This PR enables Qwen3Next support in the ATOM OOT (vLLM plugin) integration by registering the architecture and adding vLLM/hybrid-specific glue for Qwen3Next’s gated-delta-net (Mamba-style) components.

Changes:

  • Register Qwen3NextForCausalLM for vLLM plugin mode and map it to the appropriate ATOM implementation/wrapper.
  • Extend qwen3_next with vLLM-specific hybrid/Mamba state helpers and adjust QKVZ/BA projection handling based on dtype/quantization.
  • Update MergedColumnParallelLinear.weight_loader to accept loaded_shard_id=None to support loading fused tensors directly from checkpoints.
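The fused-checkpoint handling described in the last bullet can be sketched roughly as follows. This is a hypothetical minimal stand-in, not the actual ATOM `MergedColumnParallelLinear` (which also carries tensor-parallel and quantization logic); the class shape and helper names are assumptions for illustration only.

```python
import torch


class MergedColumnParallelLinear:
    """Minimal stand-in for the ATOM layer; real code adds TP/quant handling."""

    def __init__(self, input_size: int, output_sizes: list[int]):
        # output_sizes lists the fused sub-projection widths along dim 0.
        self.output_sizes = output_sizes
        self.weight = torch.nn.Parameter(
            torch.empty(sum(output_sizes), input_size), requires_grad=False
        )

    def weight_loader(self, param, loaded_weight, loaded_shard_id=None):
        if loaded_shard_id is None:
            # Checkpoint stores the projections already fused on disk:
            # copy directly when shapes line up, otherwise split along
            # dim 0 by the declared output sizes and load shard by shard.
            if param.data.shape == loaded_weight.shape:
                param.data.copy_(loaded_weight)
                return
            offset = 0
            for shard_id, size in enumerate(self.output_sizes):
                self.weight_loader(
                    param, loaded_weight.narrow(0, offset, size), shard_id
                )
                offset += size
            return
        # Per-shard path: place the shard into its slice of the fused param.
        start = sum(self.output_sizes[:loaded_shard_id])
        size = self.output_sizes[loaded_shard_id]
        param.data.narrow(0, start, size).copy_(loaded_weight)
```

Loading a fused tensor with `loaded_shard_id=None` and loading the individual shards should then produce the same parameter contents.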

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
atom/plugin/vllm/register.py Registers Qwen3Next architecture override to a vLLM-capable wrapper class.
atom/plugin/vllm/model_wrapper.py Adds Qwen3Next to the ATOM model class lookup for vLLM plugin mode.
atom/models/qwen3_next.py Adds vLLM hybrid/Mamba integration and projection-path changes for Qwen3Next.
atom/model_ops/linear.py Enhances merged-column weight loading to support fused-on-disk weights without shard IDs.
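The `register.py`/`model_wrapper.py` wiring in the table above boils down to mapping the HF architecture string to a vLLM-capable wrapper class. A minimal sketch of such a registry (hypothetical names throughout; the actual ATOM plugin registration may differ) might look like:

```python
# Hypothetical architecture-override registry; the real wiring lives in
# atom/plugin/vllm/register.py and atom/plugin/vllm/model_wrapper.py.
MODEL_REGISTRY: dict[str, type] = {}


def register_architecture(arch_name: str):
    """Decorator mapping an HF architecture name to a wrapper class."""
    def wrap(cls: type) -> type:
        MODEL_REGISTRY[arch_name] = cls
        return cls
    return wrap


@register_architecture("Qwen3NextForCausalLM")
class Qwen3NextVLLMWrapper:
    """Stand-in for the vLLM-capable wrapper class registered by this PR."""


def resolve(arch_name: str) -> type:
    # Lookup used at model-construction time to pick the OOT implementation.
    return MODEL_REGISTRY[arch_name]
```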


mixed_ba: torch.Tensor,
):
"""
Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.

Copilot AI Mar 25, 2026


The docstring for fix_query_key_value_ordering still refers to a single mixed_qkvzba tensor, but the function now takes mixed_qkvz and mixed_ba separately. Updating the docstring would avoid confusion for future maintenance/debugging.

Suggested change
Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.
Derives the `query`, `key`, `value`, `z`, `b`, and `a` tensors from
the projected inputs `mixed_qkvz` and `mixed_ba`.

Comment on lines +595 to +610
if loaded_shard_id is None:
# Loaded weight is already fused on disk
# Split it and load each shard individually.
param_data = param.data
# Check if this is weight or weight_scale
is_scale_param = param is getattr(
self, "weight_scale", None
) or param is getattr(self, "input_scale", None)

# For fused weight, need to match param shape
if param_data.shape == loaded_weight.shape:
# Shapes match - direct copy
param.weight_loader_process(param_data, loaded_weight)
return

# Otherwise, split the fused weight and load each output shard

Copilot AI Mar 25, 2026


In MergedColumnParallelLinear.weight_loader, the new loaded_shard_id is None path only does a direct load when param_data.shape == loaded_weight.shape. This bypasses weight_loader_process's built-in reshape logic, which can reshape when element counts match but shapes differ. For scale tensors in particular (e.g., (n,) vs (n, 1)), the shape mismatch can incorrectly fall through into the shard-splitting logic and likely crash. Consider attempting weight_loader_process when loaded_weight.numel() == param_data.numel() (or always using weight_loader_process in the "shapes match" case too) before trying to split by output_sizes.
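The guard the reviewer is suggesting can be sketched as a small helper. This is a hypothetical illustration (the function name and return convention are assumptions, not ATOM's actual weight_loader_process), showing the numel-based check that also accepts (n,) vs (n, 1) scale layouts:

```python
import torch


def copy_fused(param_data: torch.Tensor, loaded_weight: torch.Tensor) -> bool:
    """Reviewer's point: shape equality is too strict. Element-count
    equality plus a reshape handles scale tensors stored as (n,) on disk
    against an (n, 1) parameter. Returns True when the direct load was
    handled; False means the caller should split by output_sizes."""
    if loaded_weight.numel() == param_data.numel():
        param_data.copy_(loaded_weight.reshape(param_data.shape))
        return True
    return False
```

With this check, a (4,) scale vector loads cleanly into a (4, 1) parameter instead of falling through to the shard-splitting path.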

Signed-off-by: ganyi <ygan@amd.com>
valarLip
valarLip previously approved these changes Mar 25, 2026
@wuhuikx

wuhuikx commented Mar 25, 2026

Can you also add the recipe?

Signed-off-by: ganyi <ygan@amd.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:27

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment on lines 687 to 689
"""
Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.
"""

Copilot AI Mar 25, 2026


The docstring still refers to mixed_qkvzba, but this method now takes mixed_qkvz and mixed_ba separately. Please update the docstring (and any referenced tensor layout) so it matches the current arguments/behavior.
