fix: auto-precision for GPU/CPU should default to fp32, not fp16 #998

Merged
DingmaomaoBJTU merged 2 commits into main from dingmaomaobjtu-fix-auto-precision-gpu-cpu-default on Jun 29, 2026

Conversation

@DingmaomaoBJTU

Collaborator

Problem

PR #872 introduced FP16 conversion as a quantization mode, but _AUTO_PRECISION mapped "gpu" and "cpu" to "fp16". This caused silent, unintended FP16 conversion on any GPU/CPU machine whenever --precision was not explicitly passed (e.g. winml eval, winml build with defaults).

On AMD machines (MIGraphX EP), this broke eval tests because MIGraphX received an FP16 model it wasn't expecting.

Fix

Changed _AUTO_PRECISION in src/winml/modelkit/config/precision.py:

  • "gpu": "fp16" → "gpu": "fp32"
  • "cpu": "fp16" → "cpu": "fp32"

FP16 conversion now only happens when the user explicitly passes --precision fp16.
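
A minimal sketch of the behavior after this change, for illustration only. The name `_AUTO_PRECISION` and its "gpu"/"cpu" entries come from this PR; the helper `resolve_precision`, its signature, and the fp32 fallback for unlisted devices are assumptions and not the actual code in precision.py.

```python
# Illustrative sketch, not the real precision.py.
# Only _AUTO_PRECISION and its "gpu"/"cpu" values are taken from the PR.
_AUTO_PRECISION = {
    "gpu": "fp32",  # was "fp16" before this fix
    "cpu": "fp32",  # was "fp16" before this fix
}


def resolve_precision(device: str, precision: str = "auto") -> str:
    """Hypothetical helper: map (device, requested precision) to a concrete precision."""
    if precision != "auto":
        # An explicit --precision value always wins over the auto mapping.
        return precision
    # Assumption: devices missing from the table fall back to fp32 (no conversion).
    return _AUTO_PRECISION.get(device, "fp32")


print(resolve_precision("gpu"))          # auto on GPU
print(resolve_precision("gpu", "fp16"))  # explicit request
```

With this mapping, an omitted --precision on a GPU or CPU device resolves to fp32, and only an explicit fp16 request yields fp16.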

Testing

  • Updated affected unit tests in test_precision.py, test_build.py, test_build_onnx.py
  • All 331 unit tests pass

Fixes AMD eval failures introduced by #872.

Previously _AUTO_PRECISION mapped 'gpu' and 'cpu' to 'fp16', causing
resolve_quant_compile_config to trigger an unintended FP16 model
conversion whenever a user ran without --precision on a GPU/CPU machine
(including AMD/MIGraphX). This broke eval tests because the model was
silently converted.

Fix: change the mapping to 'fp32' (no-op) for both gpu and cpu.
FP16 conversion now only happens when the user explicitly passes
--precision fp16.

Fixes AMD eval failure reported against PR #872.
@DingmaomaoBJTU DingmaomaoBJTU requested a review from a team as a code owner June 29, 2026 09:37
Add three e2e tests in TestConfigFlagVariations to guard against
regression of the auto-precision GPU/CPU bug fixed in #998:

- test_cpu_auto_precision_no_quant: device=cpu + precision=auto
  must resolve to fp32 (no quant config), not fp16.
- test_gpu_auto_precision_no_quant: device=gpu + precision=auto
  must resolve to fp32 (no quant config), preserving the AMD/MIGraphX fix.
- test_explicit_fp16_still_triggers_quant: --precision fp16 (explicit)
  must still produce an fp16 quant config, ensuring the fix didn't
  regress intentional FP16 conversion.

All 41 e2e config tests pass.
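
The three regression tests named above might look roughly like the following. This is a sketch under stated assumptions: `resolve_quant_config` is a simplified, hypothetical stand-in for the project's resolve_quant_compile_config, and the returned dictionary shape is invented for illustration. Only the test names and the expected outcomes come from the commit message.

```python
from typing import Optional

# Taken from the PR: auto precision on gpu/cpu now means fp32.
_AUTO_PRECISION = {"gpu": "fp32", "cpu": "fp32"}


def resolve_quant_config(device: str, precision: str = "auto") -> Optional[dict]:
    """Hypothetical stand-in: return a quant config, or None when no conversion is needed."""
    resolved = _AUTO_PRECISION.get(device, "fp32") if precision == "auto" else precision
    if resolved == "fp32":
        return None  # fp32 is a no-op, so there is no quant config
    return {"mode": resolved}  # assumed shape, for illustration only


def test_cpu_auto_precision_no_quant():
    assert resolve_quant_config("cpu", "auto") is None


def test_gpu_auto_precision_no_quant():
    assert resolve_quant_config("gpu", "auto") is None


def test_explicit_fp16_still_triggers_quant():
    config = resolve_quant_config("gpu", "fp16")
    assert config is not None and config["mode"] == "fp16"
```

The third test matters as much as the first two: it checks that the default was changed without disabling FP16 conversion for users who ask for it.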
@DingmaomaoBJTU DingmaomaoBJTU merged commit f2a5464 into main Jun 29, 2026
9 checks passed
@DingmaomaoBJTU DingmaomaoBJTU deleted the dingmaomaobjtu-fix-auto-precision-gpu-cpu-default branch June 29, 2026 09:57
DingmaomaoBJTU added a commit that referenced this pull request Jun 30, 2026
## Problem
MIGraphX cannot compile FP16 models and hangs until timeout on AMD
machines. Two tests that explicitly pass `--precision fp16` were
triggering model compilation via MIGraphX, causing eval CI failures.

## Fix
Add `require_not_ep("migraphx")` guard to:
- `test_image_to_text_fp16`
- `test_compare_mode_image_classification`

Note: `test_precision_warning_for_prebuilt_onnx` is NOT guarded — it
passes a pre-built ONNX so `--precision fp16` is ignored and no
compilation occurs.
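
For readers unfamiliar with this kind of guard, here is one way a `require_not_ep` decorator could be written. This is a hedged sketch, not the project's implementation: the decorator form, the `active_ep` parameter, and the `get_active_ep` helper are all assumptions, and the real guard may detect the execution provider differently.

```python
import functools
import unittest


def get_active_ep() -> str:
    """Hypothetical stand-in for however the test suite detects the active EP."""
    return "cpu"


def require_not_ep(ep_name: str, active_ep=get_active_ep):
    """Skip the decorated test when the active execution provider matches ep_name."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            if active_ep().lower() == ep_name.lower():
                # Skip instead of letting the test hang until timeout.
                raise unittest.SkipTest(f"not supported on {ep_name}")
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


@require_not_ep("migraphx")
def test_image_to_text_fp16():
    return "ran"  # placeholder body; the real test builds and evaluates a model
```

Skipping at the test level keeps the explicit `--precision fp16` coverage on other execution providers while avoiding the MIGraphX compilation path.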

Companion workaround for the AMD eval failures alongside #998.

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>