Conversation
Signed-off-by: ganyi <ygan@amd.com>
Pull request overview
This PR enables Qwen3Next support in the ATOM OOT (vLLM plugin) integration by registering the architecture and adding vLLM/hybrid-specific glue for Qwen3Next’s gated-delta-net (Mamba-style) components.
Changes:
- Register `Qwen3NextForCausalLM` for vLLM plugin mode and map it to the appropriate ATOM implementation/wrapper.
- Extend `qwen3_next` with vLLM-specific hybrid/Mamba state helpers and adjust QKVZ/BA projection handling based on dtype/quantization.
- Update `MergedColumnParallelLinear.weight_loader` to accept `loaded_shard_id=None` to support loading fused tensors directly from checkpoints.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| atom/plugin/vllm/register.py | Registers Qwen3Next architecture override to a vLLM-capable wrapper class. |
| atom/plugin/vllm/model_wrapper.py | Adds Qwen3Next to the ATOM model class lookup for vLLM plugin mode. |
| atom/models/qwen3_next.py | Adds vLLM hybrid/Mamba integration and projection-path changes for Qwen3Next. |
| atom/model_ops/linear.py | Enhances merged-column weight loading to support fused-on-disk weights without shard IDs. |
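The fused-on-disk loading change means `MergedColumnParallelLinear.weight_loader` must split a single checkpoint tensor into the per-projection shards itself. A minimal pure-Python sketch of that split, under the assumption that splitting follows `output_sizes` along the output dimension (the real code operates on `torch` tensors, and the helper name here is hypothetical):

```python
def split_fused(fused_rows, output_sizes):
    """Split a fused weight (here a list of rows) into per-shard chunks,
    mirroring torch.split(fused, output_sizes, dim=0)."""
    shards, start = [], 0
    for size in output_sizes:
        shards.append(fused_rows[start:start + size])
        start += size
    return shards

fused = list(range(6))               # 6 output rows, already fused on disk
shards = split_fused(fused, [4, 2])  # e.g. two merged projections
assert shards == [[0, 1, 2, 3], [4, 5]]
```

Each resulting shard can then be handed to the existing per-shard loading path as if the checkpoint had provided a `loaded_shard_id`.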
```python
    mixed_ba: torch.Tensor,
):
    """
    Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.
```
The docstring for `fix_query_key_value_ordering` still refers to a single `mixed_qkvzba` tensor, but the function now takes `mixed_qkvz` and `mixed_ba` separately. Updating the docstring would avoid confusion during future maintenance/debugging.
```diff
-    Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.
+    Derives the `query`, `key`, `value`, `z`, `b`, and `a` tensors from
+    the projected inputs `mixed_qkvz` and `mixed_ba`.
```
```python
if loaded_shard_id is None:
    # Loaded weight is already fused on disk.
    # Split it and load each shard individually.
    param_data = param.data
    # Check if this is weight or weight_scale
    is_scale_param = param is getattr(
        self, "weight_scale", None
    ) or param is getattr(self, "input_scale", None)

    # For fused weight, need to match param shape
    if param_data.shape == loaded_weight.shape:
        # Shapes match - direct copy
        param.weight_loader_process(param_data, loaded_weight)
        return

    # Otherwise, split the fused weight and load each output shard
```
In `MergedColumnParallelLinear.weight_loader`, the new `loaded_shard_id is None` path only does a direct load when `param_data.shape == loaded_weight.shape`. This bypasses `weight_loader_process`'s built-in reshape logic (it can reshape when the numel matches but shapes differ). For scale tensors in particular (e.g., `(n,)` vs `(n, 1)`), this can incorrectly fall through into the shard-splitting logic and likely crash. Consider attempting `weight_loader_process` when `loaded_weight.numel() == param_data.numel()` (or always using `weight_loader_process` in the "shapes match" case too) before trying to split by `output_sizes`.
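The suggested guard can be sketched as follows. The `prod`-based element count stands in for `loaded_weight.numel() == param_data.numel()`; the helper name and shapes are illustrative, not the actual implementation:

```python
from math import prod

def can_direct_load(param_shape, loaded_shape):
    # True when the fused tensor can be handed to weight_loader_process
    # directly: element counts match, so a reshape (e.g. (n,) vs (n, 1))
    # suffices and we never fall through into shard splitting.
    return prod(param_shape) == prod(loaded_shape)

assert can_direct_load((4,), (4, 1))          # scale tensor, layouts differ
assert can_direct_load((8, 16), (8, 16))      # exact shape match
assert not can_direct_load((8, 16), (4, 16))  # truly fused: split by output_sizes
```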
Signed-off-by: ganyi <ygan@amd.com>
Can you also add the recipe?
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
```python
    """
    Derives `query`, `key` and `value` tensors from `mixed_qkvzba`.
    """
```
The docstring still refers to `mixed_qkvzba`, but this method now takes `mixed_qkvz` and `mixed_ba` separately. Please update the docstring (and any referenced tensor layout) so it matches the current arguments/behavior.
Motivation
Technical Details
Test Plan
Test Result
qwen3next 80B fp8
qwen3next 80B bf16
Submission Checklist