Add qwen3 model by pstjohn · Pull Request #1485 · NVIDIA/bionemo-framework

pstjohn · 2026-02-28T00:39:02Z

This PR includes the changes from #1486; once that is merged, the diff here will be limited to qwen3.

Includes a temporary fix for NVIDIA/TransformerEngine#2718, which we can remove when that is merged and in the base image.

This adds the Qwen3 model (https://huggingface.co/Qwen/Qwen3-0.6B), specifically the dense variant, although MoE would be fairly easy to add with our Mixtral model recipe.

Key differences between Qwen3 and Llama3:

- qk_norm layers, using TE's qk_norm_type and qk_norm_before_rope kwargs
- sliding window attention (SWA), using the window_size kwarg on certain layers:

window_size=(config.sliding_window, config.sliding_window) 
  if config.layer_types[layer_idx] == "sliding_attention" else None
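To illustrate how the expression above resolves per layer, here is a minimal standalone sketch. `ToyConfig` and `window_size_for_layer` are hypothetical names invented for this example, and the default values are assumptions; only `sliding_window`, `layer_types`, and the `"sliding_attention"` check come from the snippet in this PR. It does not call TransformerEngine.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the model config; holds only the two fields the snippet reads."""

    sliding_window: int = 4096  # assumed value, for illustration only
    layer_types: List[str] = field(
        default_factory=lambda: ["full_attention", "sliding_attention"]
    )


def window_size_for_layer(config: ToyConfig, layer_idx: int) -> Optional[Tuple[int, int]]:
    """Return the window_size value for one layer, mirroring the PR's conditional."""
    if config.layer_types[layer_idx] == "sliding_attention":
        return (config.sliding_window, config.sliding_window)
    # Any other layer type gets None, i.e. no window_size is requested.
    return None


config = ToyConfig()
window_sizes = [window_size_for_layer(config, i) for i in range(len(config.layer_types))]
print(window_sizes)  # [None, (4096, 4096)]
```

Under these assumptions, only layers marked `"sliding_attention"` receive a window tuple; all other layers pass `None`.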

copy-pr-bot · 2026-02-28T00:39:05Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai · 2026-02-28T00:39:13Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Signed-off-by: Peter St. John <pstjohn@nvidia.com>

pstjohn force-pushed the pstjohn/qwen3 branch from 9c7d191 to 70d72f8 on February 28, 2026 15:41

pstjohn marked this pull request as ready for review February 28, 2026 15:41

pstjohn requested review from cspades, dorotat-nv, jomitchellnv, jstjohn, jwilber, savitha-eng and trvachov as code owners February 28, 2026 15:41

pstjohn force-pushed the pstjohn/qwen3 branch from 70d72f8 to eeadf01 on February 28, 2026 15:49

pstjohn added 2 commits February 28, 2026 07:54

refactor autoregressive model tests

ef39689

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

add qwen3 model

ebe0f0c

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

pstjohn force-pushed the pstjohn/qwen3 branch from eeadf01 to ebe0f0c on February 28, 2026 15:55

add qwen2.5

323b8a1

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

pstjohn wants to merge 3 commits into NVIDIA:main from pstjohn:pstjohn/qwen3
