fix(build): honor loader.model_class on pretrained load path by xieofxie · Pull Request #997 · microsoft/winml-cli

xieofxie · 2026-06-29T08:38:57Z

Problem

winml build -c <build_config.json> -m openai/clip-vit-base-patch32 --device cpu fails with:

Error: You have to specify pixel_values

even though the build config explicitly requests the text-only CLIP variant:

"loader": {
  "task": "feature-extraction",
  "model_class": "CLIPTextModelWithProjection",
  "model_type": "clip_text_model"
}

Root cause

Bisected to #836 (3946a01, "Enable static quantization for Qwen3-0.6B decoder").

The pretrained branch of _load_model (build/hf.py) called load_hf_model without forwarding config.loader.model_class, so class resolution depended entirely on the (model_type, task) lookup in MODEL_CLASS_MAPPING.

#836 began threading config.loader.model_type into load_hf_model as a build-variant override. For CLIP this regresses resolution:

The build config stores the variant tag model_type = "clip_text_model".
MODEL_CLASS_MAPPING is keyed on the native type: ("clip", "feature-extraction") -> CLIPTextModelWithProjection.
The override "clip_text_model" is not a key, so resolution falls through to TasksManager → AutoModel → the full CLIPModel, which requires pixel_values. Export with text-only inputs then fails.

The override scheme works for qwen3_transformer_only because that variant tag is a registered key; CLIP's is not, and the explicit model_class that would have saved it was dropped on the pretrained path (the random-init path already honors it).

Fix

Forward config.loader.model_class to load_hf_model on the pretrained path. resolve_task then takes its Stage-0 user-class path, resolving CLIPTextModelWithProjection directly and ignoring the model_type override. Safe for the qwen3 transformer-only variant since Stage 0 only activates when model_class is set.

Verification

Repro command now completes (exit 0), loading CLIPTextModelWithProjection and exporting text-only inputs through optimize + fp16.
Added regression test test_pretrained_load_threads_model_class.
tests/unit/build/, tests/unit/loader/test_load_hf_model.py, tests/unit/commands/test_build.py all pass (192 + new test).

The pretrained branch of _load_model dropped the explicit config.loader.model_class when calling load_hf_model, so class resolution fell back to the (model_type, task) lookup alone. After #836 began threading config.loader.model_type as a build-variant override, CLIP feature-extraction broke: the variant tag 'clip_text_model' is not a key in MODEL_CLASS_MAPPING (which is keyed on the native 'clip'), so resolution fell through to AutoModel and loaded the full CLIPModel. Export with text-only inputs then failed with 'You have to specify pixel_values'. Forwarding model_class lets resolve_task take its Stage-0 user-class path, which resolves CLIPTextModelWithProjection directly and ignores the model_type override. Safe for the qwen3 transformer-only variant since Stage 0 only activates when model_class is set.

xieofxie requested a review from a team as a code owner June 29, 2026 08:38

KayMKM approved these changes Jun 30, 2026

View reviewed changes

DingmaomaoBJTU approved these changes Jun 30, 2026

View reviewed changes

xieofxie merged commit 8dea215 into main Jun 30, 2026
9 checks passed

xieofxie deleted the hualxie/fix_clip branch June 30, 2026 05:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(build): honor loader.model_class on pretrained load path#997

fix(build): honor loader.model_class on pretrained load path#997
xieofxie merged 1 commit into
mainfrom
hualxie/fix_clip

xieofxie commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

xieofxie commented Jun 29, 2026

Problem

Root cause

Fix

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants