
[OOT] Enable qwen3next to OOT impl #406

Open
ganyi1996ppo wants to merge 3 commits into main from ganyi/qwen3next_oot

Conversation


@ganyi1996ppo ganyi1996ppo commented Mar 25, 2026

Motivation

Technical Details

Test Plan

Test Result

qwen3next 80B fp8

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8446|±  |0.0100|
|     |       |strict-match    |     5|exact_match|↑  |0.8135|±  |0.0107|

qwen3next 80B bf16

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8613|±  |0.0095|
|     |       |strict-match    |     5|exact_match|↑  |0.8423|±  |0.0100|

Submission Checklist

Signed-off-by: ganyi <ygan@amd.com>
Copilot AI review requested due to automatic review settings March 25, 2026 07:16

Copilot AI left a comment


Pull request overview

This PR enables Qwen3Next support in the ATOM OOT (vLLM plugin) integration by registering the architecture and adding vLLM/hybrid-specific glue for Qwen3Next’s gated-delta-net (Mamba-style) components.

Changes:

  • Register Qwen3NextForCausalLM for vLLM plugin mode and map it to the appropriate ATOM implementation/wrapper.
  • Extend qwen3_next with vLLM-specific hybrid/Mamba state helpers and adjust QKVZ/BA projection handling based on dtype/quantization.
  • Update MergedColumnParallelLinear.weight_loader to accept loaded_shard_id=None to support loading fused tensors directly from checkpoints.
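The fused-checkpoint handling described in the last bullet can be sketched roughly as follows. This is a hypothetical minimal stand-in, not the actual ATOM `MergedColumnParallelLinear` (which also carries tensor-parallel and quantization logic); the class shape and helper names are assumptions for illustration only.

```python
import torch


class MergedColumnParallelLinear:
    """Minimal stand-in for the ATOM layer; real code adds TP/quant handling."""

    def __init__(self, input_size: int, output_sizes: list[int]):
        # output_sizes lists the fused sub-projection widths along dim 0.
        self.output_sizes = output_sizes
        self.weight = torch.nn.Parameter(
            torch.empty(sum(output_sizes), input_size), requires_grad=False
        )

    def weight_loader(self, param, loaded_weight, loaded_shard_id=None):
        if loaded_shard_id is None:
            # Checkpoint stores the projections already fused on disk:
            # copy directly when shapes line up, otherwise split along
            # dim 0 by the declared output sizes and load shard by shard.
            if param.data.shape == loaded_weight.shape:
                param.data.copy_(loaded_weight)
                return
            offset = 0
            for shard_id, size in enumerate(self.output_sizes):
                self.weight_loader(
                    param, loaded_weight.narrow(0, offset, size), shard_id
                )
                offset += size
            return
        # Per-shard path: place the shard into its slice of the fused param.
        start = sum(self.output_sizes[:loaded_shard_id])
        size = self.output_sizes[loaded_shard_id]
        param.data.narrow(0, start, size).copy_(loaded_weight)
```

Loading a fused tensor with `loaded_shard_id=None` and loading the individual shards should then produce the same parameter contents.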

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
atom/plugin/vllm/register.py Registers Qwen3Next architecture override to a vLLM-capable wrapper class.
atom/plugin/vllm/model_wrapper.py Adds Qwen3Next to the ATOM model class lookup for vLLM plugin mode.
atom/models/qwen3_next.py Adds vLLM hybrid/Mamba integration and projection-path changes for Qwen3Next.
atom/model_ops/linear.py Enhances merged-column weight loading to support fused-on-disk weights without shard IDs.
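The `register.py`/`model_wrapper.py` wiring in the table above boils down to mapping the HF architecture string to a vLLM-capable wrapper class. A minimal sketch of such a registry (hypothetical names throughout; the actual ATOM plugin registration may differ) might look like:

```python
# Hypothetical architecture-override registry; the real wiring lives in
# atom/plugin/vllm/register.py and atom/plugin/vllm/model_wrapper.py.
MODEL_REGISTRY: dict[str, type] = {}


def register_architecture(arch_name: str):
    """Decorator mapping an HF architecture name to a wrapper class."""
    def wrap(cls: type) -> type:
        MODEL_REGISTRY[arch_name] = cls
        return cls
    return wrap


@register_architecture("Qwen3NextForCausalLM")
class Qwen3NextVLLMWrapper:
    """Stand-in for the vLLM-capable wrapper class registered by this PR."""


def resolve(arch_name: str) -> type:
    # Lookup used at model-construction time to pick the OOT implementation.
    return MODEL_REGISTRY[arch_name]
```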


mixed_ba: torch.Tensor,
):
"""
Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.

Copilot AI Mar 25, 2026


The docstring for fix_query_key_value_ordering still refers to a single mixed_qkvzba tensor, but the function now takes mixed_qkvz and mixed_ba separately. Updating the docstring would avoid confusion for future maintenance/debugging.

Suggested change
Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.
Derives the `query`, `key`, `value`, `z`, `b`, and `a` tensors from
the projected inputs `mixed_qkvz` and `mixed_ba`.

Comment on lines +595 to +610
if loaded_shard_id is None:
# Loaded weight is already fused on disk
# Split it and load each shard individually.
param_data = param.data
# Check if this is weight or weight_scale
is_scale_param = param is getattr(
self, "weight_scale", None
) or param is getattr(self, "input_scale", None)

# For fused weight, need to match param shape
if param_data.shape == loaded_weight.shape:
# Shapes match - direct copy
param.weight_loader_process(param_data, loaded_weight)
return

# Otherwise, split the fused weight and load each output shard

Copilot AI Mar 25, 2026


In MergedColumnParallelLinear.weight_loader, the new loaded_shard_id is None path only does a direct load when param_data.shape == loaded_weight.shape. This bypasses weight_loader_process's built-in reshape logic, which can reshape when element counts match but shapes differ. For scale tensors in particular (e.g., (n,) vs (n, 1)), the shape mismatch can incorrectly fall through into the shard-splitting logic and likely crash. Consider attempting weight_loader_process when loaded_weight.numel() == param_data.numel() (or always using weight_loader_process in the "shapes match" case too) before trying to split by output_sizes.
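The guard the reviewer is suggesting can be sketched as a small helper. This is a hypothetical illustration (the function name and return convention are assumptions, not ATOM's actual weight_loader_process), showing the numel-based check that also accepts (n,) vs (n, 1) scale layouts:

```python
import torch


def copy_fused(param_data: torch.Tensor, loaded_weight: torch.Tensor) -> bool:
    """Reviewer's point: shape equality is too strict. Element-count
    equality plus a reshape handles scale tensors stored as (n,) on disk
    against an (n, 1) parameter. Returns True when the direct load was
    handled; False means the caller should split by output_sizes."""
    if loaded_weight.numel() == param_data.numel():
        param_data.copy_(loaded_weight.reshape(param_data.shape))
        return True
    return False
```

With this check, a (4,) scale vector loads cleanly into a (4, 1) parameter instead of falling through to the shard-splitting path.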

Signed-off-by: ganyi <ygan@amd.com>
valarLip
valarLip previously approved these changes Mar 25, 2026
@wuhuikx

wuhuikx commented Mar 25, 2026

Can you also add the recipe?

Signed-off-by: ganyi <ygan@amd.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:27

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment on lines 687 to 689
"""
Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.
"""

Copilot AI Mar 25, 2026


The docstring still refers to mixed_qkvzba, but this method now takes mixed_qkvz and mixed_ba separately. Please update the docstring (and any referenced tensor layout) so it matches the current arguments/behavior.
