Add qwen3 model#1485

Open
pstjohn wants to merge 3 commits into NVIDIA:main from pstjohn:pstjohn/qwen3
Conversation

@pstjohn
Collaborator

@pstjohn pstjohn commented Feb 28, 2026

This PR includes the changes from #1486; once that is merged, the diff here will be limited to qwen3.

Includes a temporary fix for NVIDIA/TransformerEngine#2718, which we can remove once that fix is merged and available in the base image.

This adds the Qwen3 model (https://huggingface.co/Qwen/Qwen3-0.6B), specifically the dense variant, although MoE would be fairly easy to add with our Mixtral model recipe.

Key differences between Qwen3 and Llama3:

  • qk_norm layers, using TE's qk_norm_type and qk_norm_before_rope kwargs
  • sliding window attention (SWA), using the window_size kwarg on certain layers:
      window_size=(config.sliding_window, config.sliding_window)
          if config.layer_types[layer_idx] == "sliding_attention" else None
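
To make the per-layer selection above concrete, here is a minimal standalone sketch of how the window size could be resolved for each layer. `ToyConfig` and `window_size_for_layer` are hypothetical names used only for illustration; they are not part of this PR or of TransformerEngine, and the field values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the model config; only the fields used above."""

    sliding_window: int = 4096
    layer_types: List[str] = field(
        default_factory=lambda: ["full_attention", "sliding_attention"]
    )


def window_size_for_layer(config: ToyConfig, layer_idx: int) -> Optional[Tuple[int, int]]:
    """Return the window tuple for SWA layers, or None for full-attention layers."""
    if config.layer_types[layer_idx] == "sliding_attention":
        return (config.sliding_window, config.sliding_window)
    return None


config = ToyConfig()
window_sizes = [
    window_size_for_layer(config, i) for i in range(len(config.layer_types))
]
print(window_sizes)
```

Layers not marked "sliding_attention" get None, which is then passed as the window_size kwarg in place of a tuple.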

@copy-pr-bot

copy-pr-bot bot commented Feb 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 28, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Signed-off-by: Peter St. John <pstjohn@nvidia.com>