fix(build): honor loader.model_class on pretrained load path#997

Merged: xieofxie merged 1 commit into main from hualxie/fix_clip on Jun 30, 2026
@xieofxie

Contributor

Problem

winml build -c <build_config.json> -m openai/clip-vit-base-patch32 --device cpu fails with:

Error: You have to specify pixel_values

even though the build config explicitly requests the text-only CLIP variant:

"loader": {
  "task": "feature-extraction",
  "model_class": "CLIPTextModelWithProjection",
  "model_type": "clip_text_model"
}

Root cause

Bisected to #836 (3946a01, "Enable static quantization for Qwen3-0.6B decoder").

The pretrained branch of _load_model (build/hf.py) called load_hf_model without forwarding config.loader.model_class, so class resolution depended entirely on the (model_type, task) lookup in MODEL_CLASS_MAPPING.

#836 began threading config.loader.model_type into load_hf_model as a build-variant override. For CLIP this regresses resolution:

  • The build config stores the variant tag model_type = "clip_text_model".
  • MODEL_CLASS_MAPPING is keyed on the native type: ("clip", "feature-extraction") -> CLIPTextModelWithProjection.
  • The override "clip_text_model" is not a key, so resolution falls through to TasksManager → AutoModel → the full CLIPModel, which requires pixel_values. Export with text-only inputs then fails.

The override scheme works for qwen3_transformer_only because that variant tag is a registered key; CLIP's is not, and the explicit model_class that would have saved it was dropped on the pretrained path (the random-init path already honors it).
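The lookup miss described above can be illustrated with a toy model of the resolution step. This is a minimal sketch, not the project's code: `resolve_class_name`, `AUTO_FALLBACK`, and the qwen3 entry's task and class name are hypothetical stand-ins, and only the `("clip", "feature-extraction")` entry is taken from the description.

```python
from typing import Optional

# Toy stand-in for MODEL_CLASS_MAPPING. Only the CLIP entry comes from the PR
# description; the qwen3 task and class name are hypothetical placeholders.
MODEL_CLASS_MAPPING = {
    ("clip", "feature-extraction"): "CLIPTextModelWithProjection",
    ("qwen3_transformer_only", "text-generation"): "HypotheticalQwen3Variant",
}

# Hypothetical stand-in for the AutoModel fallback, keyed on the native type.
AUTO_FALLBACK = {"clip": "CLIPModel"}


def resolve_class_name(
    native_type: str, task: str, model_type_override: Optional[str] = None
) -> str:
    """Look up (model_type, task); fall back to the AutoModel class on a miss."""
    # The override, when present, replaces the native type in the lookup key.
    key_type = model_type_override or native_type
    return MODEL_CLASS_MAPPING.get((key_type, task), AUTO_FALLBACK[native_type])


# Without the override, the native key hits the text-only class.
print(resolve_class_name("clip", "feature-extraction"))
# With the variant tag as override, the key misses and the full model is chosen.
print(resolve_class_name("clip", "feature-extraction", "clip_text_model"))
```

In this sketch the second call returns the full `CLIPModel` because `("clip_text_model", "feature-extraction")` is not a registered key, which mirrors the regression.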

Fix

Forward config.loader.model_class to load_hf_model on the pretrained path. resolve_task then takes its Stage-0 user-class path, resolving CLIPTextModelWithProjection directly and ignoring the model_type override. Safe for the qwen3 transformer-only variant since Stage 0 only activates when model_class is set.
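The effect of the Stage-0 path can be sketched in the same toy style. The function name, signature, and fallback table below are assumptions for illustration and do not reproduce the real `resolve_task`; the sketch only shows why an explicit `model_class` makes the `model_type` override irrelevant.

```python
from typing import Optional

# Toy mapping; the single entry is the one quoted in the PR description.
MODEL_CLASS_MAPPING = {
    ("clip", "feature-extraction"): "CLIPTextModelWithProjection",
}

# Hypothetical stand-in for the AutoModel fallback.
AUTO_FALLBACK = {"clip": "CLIPModel"}


def resolve_class_name(
    native_type: str,
    task: str,
    model_type_override: Optional[str] = None,
    model_class: Optional[str] = None,
) -> str:
    """Toy resolution order: explicit class first, then the mapping, then fallback."""
    # Stage 0 (assumed shape): an explicit user class wins outright and the
    # model_type override is never consulted.
    if model_class is not None:
        return model_class
    # Later stages: (model_type, task) lookup, then the AutoModel fallback.
    key_type = model_type_override or native_type
    return MODEL_CLASS_MAPPING.get((key_type, task), AUTO_FALLBACK[native_type])


# Same failing inputs as the bug report, now with model_class forwarded.
resolved = resolve_class_name(
    "clip",
    "feature-extraction",
    model_type_override="clip_text_model",
    model_class="CLIPTextModelWithProjection",
)
print(resolved)
```

When `model_class` is left as `None`, the function behaves exactly as before, which is the property the description relies on for the qwen3 transformer-only variant.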

Verification

  • Repro command now completes (exit 0), loading CLIPTextModelWithProjection and exporting text-only inputs through optimize + fp16.
  • Added regression test test_pretrained_load_threads_model_class.
  • tests/unit/build/, tests/unit/loader/test_load_hf_model.py, tests/unit/commands/test_build.py all pass (192 + new test).
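A regression test of this kind might look roughly like the sketch below. The actual test is not shown in this PR page, so everything here is an assumption: `load_pretrained` is a hypothetical stand-in for the pretrained branch of `_load_model`, and the keyword names passed to the loader are illustrative rather than the real `load_hf_model` signature.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def load_pretrained(config, model_id, load_hf_model):
    """Hypothetical stand-in for the pretrained branch of _load_model."""
    return load_hf_model(
        model_id,
        task=config.loader.task,
        model_type=config.loader.model_type,
        # The kwarg the fix forwards; keyword names here are assumed.
        model_class=config.loader.model_class,
    )


def test_pretrained_load_threads_model_class():
    # Loader section mirroring the build config quoted in the description.
    loader = SimpleNamespace(
        task="feature-extraction",
        model_class="CLIPTextModelWithProjection",
        model_type="clip_text_model",
    )
    config = SimpleNamespace(loader=loader)
    fake_load = Mock(return_value="model")  # avoids downloading a checkpoint

    result = load_pretrained(config, "openai/clip-vit-base-patch32", fake_load)

    assert result == "model"
    fake_load.assert_called_once()
    # The explicit class must reach the loader unchanged.
    assert fake_load.call_args.kwargs["model_class"] == "CLIPTextModelWithProjection"


test_pretrained_load_threads_model_class()
```

Mocking the loader keeps the check focused on argument forwarding, so it stays fast and does not depend on network access.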

The pretrained branch of _load_model dropped the explicit
config.loader.model_class when calling load_hf_model, so class
resolution fell back to the (model_type, task) lookup alone.

After #836 began threading config.loader.model_type as a build-variant
override, CLIP feature-extraction broke: the variant tag
'clip_text_model' is not a key in MODEL_CLASS_MAPPING (which is keyed on
the native 'clip'), so resolution fell through to AutoModel and loaded
the full CLIPModel. Export with text-only inputs then failed with
'You have to specify pixel_values'.

Forwarding model_class lets resolve_task take its Stage-0 user-class
path, which resolves CLIPTextModelWithProjection directly and ignores
the model_type override. Safe for the qwen3 transformer-only variant
since Stage 0 only activates when model_class is set.
@xieofxie xieofxie requested a review from a team as a code owner June 29, 2026 08:38
@xieofxie xieofxie merged commit 8dea215 into main Jun 30, 2026
9 checks passed
@xieofxie xieofxie deleted the hualxie/fix_clip branch June 30, 2026 05:08