Conversation
Pull request overview
This PR adds support for directly loading GPT-OSS models quantized in the MXFP4 format by automatically detecting MXFP4 quantization and applying dequantization during model loading.
Changes:
- Updated model references in test files from local/unsloth paths to official OpenAI model identifiers
- Added MXFP4 quantization detection and automatic dequantization support in the model loading utilities (see the sketch below)
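
From the diff fragments quoted further down, the detection helper plausibly looks like this minimal sketch. The contents of `supported_model_types` and the exact config handling are assumptions, not the PR's verbatim code:

```python
from transformers import AutoConfig

# Assumption: the allowlist covers only GPT-OSS for now; the diff references
# `supported_model_types` but does not show its contents.
supported_model_types = {"gpt_oss"}


def _is_mxfp4_model(pretrained_model_name_or_path: str) -> bool:
    """Return True if the checkpoint declares MXFP4 quantization."""
    try:
        config = AutoConfig.from_pretrained(pretrained_model_name_or_path)
        # quantization_config is a plain dict when read from config.json
        quant_config = getattr(config, "quantization_config", None) or {}
        quant_method = quant_config.get("quant_method", "")
        model_type = getattr(config, "model_type", "")
        return quant_method == "mxfp4" and model_type in supported_model_types
    except Exception:
        return False
```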
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/test_cuda/models/test_moe_model.py | Updated GPT-OSS model reference from local path to OpenAI identifier |
| test/test_cpu/models/test_moe_model.py | Updated GPT-OSS model reference from unsloth path to OpenAI identifier |
| auto_round/utils/model.py | Added MXFP4 detection function and integrated dequantization config into model loading |
Signed-off-by: He, Xin3 <xin3.he@intel.com>
```python
        )
        model_type = getattr(config, "model_type", "")
        return quant_method == "mxfp4" and model_type in supported_model_types
    except Exception:
```
Agreed, this should be changed to a more efficient approach.
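
One possible "more efficient way", purely as an illustration (the function name and fallback behavior are hypothetical, not what the PR does): for local checkpoints, parse config.json directly instead of building a full AutoConfig:

```python
import json
import os


def _is_mxfp4_model_fast(pretrained_model_name_or_path: str) -> bool:
    """Read config.json directly; handles local paths only. Hub IDs would
    still need AutoConfig (or huggingface_hub) to fetch the config."""
    config_path = os.path.join(pretrained_model_name_or_path, "config.json")
    if not os.path.isfile(config_path):
        return False
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    quant_method = (config.get("quantization_config") or {}).get("quant_method", "")
    return quant_method == "mxfp4" and config.get("model_type") in {"gpt_oss"}
```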
```python
    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
        try:
```
In my opinion, checking the version is preferable to try/except here. Using too many try/except blocks can prevent bugs from being exposed.
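
A sketch of the suggested version gate. The 4.55.0 threshold is an assumption and should be verified against the transformers release that introduced `Mxfp4Config`; `load_kwargs` and `pretrained_model_name_or_path` are the names from the surrounding diff:

```python
import transformers
from packaging import version

# Gate on the library version instead of wrapping the import in try/except.
_HAS_MXFP4 = version.parse(transformers.__version__) >= version.parse("4.55.0")

if _HAS_MXFP4 and _is_mxfp4_model(pretrained_model_name_or_path):
    from transformers import Mxfp4Config

    # dequantize=True asks transformers to unpack MXFP4 weights at load time
    load_kwargs["quantization_config"] = Mxfp4Config(dequantize=True)
```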
```diff
 def setup_gpt_oss():
     """Fixture to set up the GPT-OSS model and tokenizer."""
-    model_name = "/models/gpt-oss-20b-BF16"
+    model_name = "openai/gpt-oss-20b"
```
This path is currently used to load the BF16 gpt-oss model, so please keep it as is.
You can add a new path specifically for the MXFP4 model.
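
That is, keep both identifiers side by side (the variable name for the MXFP4 path is illustrative):

```python
model_name = "/models/gpt-oss-20b-BF16"  # existing BF16 checkpoint, kept as is
mxfp4_model_name = "openai/gpt-oss-20b"  # new: MXFP4-quantized checkpoint
```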
```python
        trust_remote_code=trust_remote_code,
        device_map="auto" if use_auto_mapping else None,
    )
    model = model_cls.from_pretrained(pretrained_model_name_or_path, **load_kwargs)
```
We currently don’t have enough test coverage for HPU, so please make any changes carefully. If possible, adding more UTs would be really helpful!
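
A minimal unit-test sketch along those lines; the module path follows the file list above, while the skip condition and the negative-case model are assumptions:

```python
import os

import pytest

from auto_round.utils.model import _is_mxfp4_model


def test_is_mxfp4_model_negative():
    # A non-MXFP4 checkpoint must not trigger dequantization.
    assert not _is_mxfp4_model("facebook/opt-125m")


@pytest.mark.skipif("RUN_HUB_TESTS" not in os.environ, reason="needs hub access")
def test_is_mxfp4_model_positive():
    # Assumes the openai/gpt-oss-20b config declares quant_method == "mxfp4".
    assert _is_mxfp4_model("openai/gpt-oss-20b")
```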
```python
    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
```
I have a small concern that this check might slow down AutoRound initialization.
Could you please double-check it? Thanks!
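
One way to bound that cost, sketched here rather than taken from the PR: memoize the helper so repeated initializations with the same checkpoint path read the config only once (the wrapper name is hypothetical).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _is_mxfp4_model_cached(pretrained_model_name_or_path: str) -> bool:
    # Delegates to the helper from the diff above; the cache makes repeated
    # AutoRound initializations with the same path pay the config read once.
    return _is_mxfp4_model(pretrained_model_name_or_path)
```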
Description
Please briefly describe your main changes and the motivation.
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting