fix: auto-precision for GPU/CPU should default to fp32, not fp16 #998
Merged
DingmaomaoBJTU merged 2 commits on Jun 29, 2026
Previously _AUTO_PRECISION mapped 'gpu' and 'cpu' to 'fp16', causing resolve_quant_compile_config to trigger an unintended FP16 model conversion whenever a user ran without --precision on a GPU/CPU machine (including AMD/MIGraphX). This broke eval tests because the model was silently converted.

Fix: change the mapping to 'fp32' (no-op) for both gpu and cpu. FP16 conversion now only happens when the user explicitly passes --precision fp16.

Fixes AMD eval failure reported against PR #872.
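The effect of the mapping change can be illustrated with a minimal sketch. The names `OLD_AUTO_PRECISION`, `NEW_AUTO_PRECISION`, `effective_precision`, and `triggers_fp16_conversion` are hypothetical stand-ins, not the project's code; the sketch only assumes that `auto` is the value used when `--precision` is not passed and that `fp32` means no conversion.

```python
# Hypothetical stand-ins for the mapping before and after the fix.
# Only the "gpu" and "cpu" entries are described in the commit message.
OLD_AUTO_PRECISION = {"gpu": "fp16", "cpu": "fp16"}
NEW_AUTO_PRECISION = {"gpu": "fp32", "cpu": "fp32"}


def effective_precision(mapping: dict, device: str, precision: str = "auto") -> str:
    """Resolve the precision actually used for a run (illustrative helper)."""
    if precision != "auto":
        return precision  # an explicit --precision value always wins
    return mapping[device]  # otherwise fall back to the per-device default


def triggers_fp16_conversion(mapping: dict, device: str, precision: str = "auto") -> bool:
    """True when the resolved precision would cause an FP16 model conversion."""
    return effective_precision(mapping, device, precision) == "fp16"


if __name__ == "__main__":
    for device in ("gpu", "cpu"):
        print(
            device,
            "old:", triggers_fp16_conversion(OLD_AUTO_PRECISION, device),
            "new:", triggers_fp16_conversion(NEW_AUTO_PRECISION, device),
        )
```

With the old mapping a default run resolves to fp16 and converts the model; with the new mapping only an explicit `fp16` request does.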
Add three e2e tests in TestConfigFlagVariations to guard against regression of the auto-precision GPU/CPU bug fixed in #998:
- test_cpu_auto_precision_no_quant: device=cpu + precision=auto must resolve to fp32 (no quant config), not fp16.
- test_gpu_auto_precision_no_quant: device=gpu + precision=auto must resolve to fp32 (no quant config); the old fp16 default here is what broke AMD/MIGraphX.
- test_explicit_fp16_still_triggers_quant: --precision fp16 (explicit) must still produce an fp16 quant config, ensuring the fix didn't regress intentional FP16 conversion.

All 41 e2e config tests pass.
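The three regression tests could look roughly like the sketch below. The test names come from the commit message, but `stub_resolve_quant` and the shape of the returned config (`{"mode": ...}` or `None`) are assumptions made for illustration; the real tests exercise the project's own resolver inside `TestConfigFlagVariations`.

```python
from typing import Optional

# Stand-in for the fixed mapping; the real one lives in the project's config module.
_AUTO_PRECISION = {"gpu": "fp32", "cpu": "fp32"}


def stub_resolve_quant(device: str, precision: str = "auto") -> Optional[dict]:
    """Hypothetical resolver: returns None for fp32 (no-op), else a quant config."""
    effective = _AUTO_PRECISION[device] if precision == "auto" else precision
    if effective == "fp32":
        return None
    return {"mode": effective}


def test_cpu_auto_precision_no_quant():
    # device=cpu + precision=auto must not produce a quant config
    assert stub_resolve_quant("cpu", "auto") is None


def test_gpu_auto_precision_no_quant():
    # device=gpu + precision=auto must not produce a quant config
    assert stub_resolve_quant("gpu", "auto") is None


def test_explicit_fp16_still_triggers_quant():
    # an explicit fp16 request must still yield an fp16 quant config
    assert stub_resolve_quant("gpu", "fp16") == {"mode": "fp16"}
```

The third test matters as much as the first two: it checks that the fix narrowed the default without disabling intentional FP16 conversion.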
KayMKM approved these changes on Jun 29, 2026
DingmaomaoBJTU added a commit that referenced this pull request on Jun 30, 2026
## Problem
MIGraphX cannot compile FP16 models and hangs until timeout on AMD
machines. Two tests that explicitly pass `--precision fp16` were
triggering model compilation via MIGraphX, causing eval CI failures.
## Fix
Add `require_not_ep("migraphx")` guard to:
- `test_image_to_text_fp16`
- `test_compare_mode_image_classification`
Note: `test_precision_warning_for_prebuilt_onnx` is NOT guarded — it
passes a pre-built ONNX so `--precision fp16` is ignored and no
compilation occurs.
Companion workaround for the AMD eval failures alongside #998.
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
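A guard like `require_not_ep` might be sketched as follows. This is not the project's implementation: the environment variable `EXAMPLE_ACTIVE_EP`, the `active_ep` helper, and the call-at-start-of-test shape are all assumptions, and the standard library's `unittest.SkipTest` is used only to keep the sketch dependency-free (a pytest-based suite would more likely call `pytest.skip`).

```python
import os
import unittest


def active_ep() -> str:
    """Hypothetical lookup of the execution provider under test."""
    # Assumption: the EP is advertised through an environment variable.
    return os.environ.get("EXAMPLE_ACTIVE_EP", "cpu").lower()


def require_not_ep(ep_name: str) -> None:
    """Skip the calling test when it runs on the named execution provider."""
    if active_ep() == ep_name.lower():
        raise unittest.SkipTest(f"not supported on the {ep_name} execution provider")


def test_image_to_text_fp16():
    require_not_ep("migraphx")  # bail out before any FP16 compilation starts
    # ... the real test would run the model with --precision fp16 here ...
    return "ran"
```

Placing the guard first means the test is skipped before any compilation is attempted, which is what avoids the hang-until-timeout behaviour described above.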
## Problem
PR #872 introduced FP16 conversion as a quantization mode, but `_AUTO_PRECISION` mapped `"gpu"` and `"cpu"` to `"fp16"`. This caused silent, unintended FP16 conversion on any GPU/CPU machine whenever `--precision` was not explicitly passed (e.g. `winml eval`, `winml build` with defaults).

On AMD machines (MIGraphX EP), this broke eval tests because MIGraphX received an FP16 model it wasn't expecting.
## Fix
Changed `_AUTO_PRECISION` in `src/winml/modelkit/config/precision.py`:
- `"gpu": "fp16"` → `"gpu": "fp32"`
- `"cpu": "fp16"` → `"cpu": "fp32"`

FP16 conversion now only happens when the user explicitly passes `--precision fp16`.
## Testing
- `test_precision.py`, `test_build.py`, `test_build_onnx.py`

Fixes AMD eval failures introduced by #872.