Conversation
Pull request overview
This PR adds support for directly loading GPT-OSS models quantized in the MXFP4 format by automatically detecting MXFP4 quantization and applying dequantization during model loading.
Changes:
- Updated model references in test files from local/unsloth paths to official OpenAI model identifiers
- Added MXFP4 quantization detection and automatic dequantization support in the model loading utilities (see the sketch below)
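
From the diff fragments quoted further down, the detection helper plausibly looks like this minimal sketch. The contents of `supported_model_types` and the exact config handling are assumptions, not the PR's verbatim code:

```python
from transformers import AutoConfig

# Assumption: the allowlist covers only GPT-OSS for now; the diff references
# `supported_model_types` but does not show its contents.
supported_model_types = {"gpt_oss"}


def _is_mxfp4_model(pretrained_model_name_or_path: str) -> bool:
    """Return True if the checkpoint declares MXFP4 quantization."""
    try:
        config = AutoConfig.from_pretrained(pretrained_model_name_or_path)
        # quantization_config is a plain dict when read from config.json
        quant_config = getattr(config, "quantization_config", None) or {}
        quant_method = quant_config.get("quant_method", "")
        model_type = getattr(config, "model_type", "")
        return quant_method == "mxfp4" and model_type in supported_model_types
    except Exception:
        return False
```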
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/test_cuda/models/test_moe_model.py | Updated GPT-OSS model reference from local path to OpenAI identifier |
| test/test_cpu/models/test_moe_model.py | Updated GPT-OSS model reference from unsloth path to OpenAI identifier |
| auto_round/utils/model.py | Added MXFP4 detection function and integrated dequantization config into model loading |
Signed-off-by: He, Xin3 <xin3.he@intel.com>
```python
        )
        model_type = getattr(config, "model_type", "")
        return quant_method == "mxfp4" and model_type in supported_model_types
    except Exception:
```
Agreed, this should be changed to a more efficient approach.
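
One possible "more efficient way", purely as an illustration (the function name and fallback behavior are hypothetical, not what the PR does): for local checkpoints, parse config.json directly instead of building a full AutoConfig:

```python
import json
import os


def _is_mxfp4_model_fast(pretrained_model_name_or_path: str) -> bool:
    """Read config.json directly; handles local paths only. Hub IDs would
    still need AutoConfig (or huggingface_hub) to fetch the config."""
    config_path = os.path.join(pretrained_model_name_or_path, "config.json")
    if not os.path.isfile(config_path):
        return False
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    quant_method = (config.get("quantization_config") or {}).get("quant_method", "")
    return quant_method == "mxfp4" and config.get("model_type") in {"gpt_oss"}
```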
```python
    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
        try:
```
In my opinion, checking the version is preferable to try/except here. Using too many try/except blocks can prevent bugs from being exposed.
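
A sketch of the suggested version gate. The 4.55.0 threshold is an assumption and should be verified against the transformers release that introduced `Mxfp4Config`; `load_kwargs` and `pretrained_model_name_or_path` are the names from the surrounding diff:

```python
import transformers
from packaging import version

# Gate on the library version instead of wrapping the import in try/except.
_HAS_MXFP4 = version.parse(transformers.__version__) >= version.parse("4.55.0")

if _HAS_MXFP4 and _is_mxfp4_model(pretrained_model_name_or_path):
    from transformers import Mxfp4Config

    # dequantize=True asks transformers to unpack MXFP4 weights at load time
    load_kwargs["quantization_config"] = Mxfp4Config(dequantize=True)
```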
```diff
 def setup_gpt_oss():
     """Fixture to set up the GPT-OSS model and tokenizer."""
-    model_name = "/models/gpt-oss-20b-BF16"
+    model_name = "openai/gpt-oss-20b"
```
This path is currently used to load the BF16 gpt-oss model, so please keep it as is.
You can add a new path specifically for the MXFP4 model.
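
That is, keep both identifiers side by side (the variable name for the MXFP4 path is illustrative):

```python
model_name = "/models/gpt-oss-20b-BF16"  # existing BF16 checkpoint, kept as is
mxfp4_model_name = "openai/gpt-oss-20b"  # new: MXFP4-quantized checkpoint
```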
```python
        trust_remote_code=trust_remote_code,
        device_map="auto" if use_auto_mapping else None,
    )
    model = model_cls.from_pretrained(pretrained_model_name_or_path, **load_kwargs)
```
We currently don’t have enough test coverage for HPU, so please make any changes carefully. If possible, adding more UTs would be really helpful!
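
A minimal unit-test sketch along those lines; the module path follows the file list above, while the skip condition and the negative-case model are assumptions:

```python
import os

import pytest

from auto_round.utils.model import _is_mxfp4_model


def test_is_mxfp4_model_negative():
    # A non-MXFP4 checkpoint must not trigger dequantization.
    assert not _is_mxfp4_model("facebook/opt-125m")


@pytest.mark.skipif("RUN_HUB_TESTS" not in os.environ, reason="needs hub access")
def test_is_mxfp4_model_positive():
    # Assumes the openai/gpt-oss-20b config declares quant_method == "mxfp4".
    assert _is_mxfp4_model("openai/gpt-oss-20b")
```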
```python
    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
```
I have a small concern that this check might slow down AutoRound initialization.
Could you please double-check it? Thanks!
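
One way to bound that cost, sketched here rather than taken from the PR: memoize the helper so repeated initializations with the same checkpoint path read the config only once (the wrapper name is hypothetical).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _is_mxfp4_model_cached(pretrained_model_name_or_path: str) -> bool:
    # Delegates to the helper from the diff above; the cache makes repeated
    # AutoRound initializations with the same path pay the config read once.
    return _is_mxfp4_model(pretrained_model_name_or_path)
```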
Description
Please briefly describe your main changes and the motivation.
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting