Multi Resolution Support for Qwen2.5-VL Model by quic-sanising · Pull Request #875 · quic/efficient-transformers

quic-sanising · 2026-03-18T21:39:21Z

Qwen2.5‑VL workloads often see variable image sizes; supporting multiple resolutions improves usability and benchmarking realism. This PR adds multi‑resolution support to the Qwen2.5‑VL specialization flow. The goal is to allow a single run to handle multiple image sizes more robustly while keeping specialization metadata consistent and avoiding shape/buffer mismatches.

What changed?

Qwen2.5‑VL specialization now supports multiple (width, height) pairs without requiring changes to the model ONNX:

Accept width and height as either int or List[int].
Reuse the shared smart_resize utility instead of a local implementation (from qwen_vl_utils import smart_resize).
Compute the encoder specialization for each resolution by looping over the (width, height) pairs.
Compute the decoder specialization from min_vision_size, the smallest vision size across all provided resolutions.
Allow overriding image tokenization constraints via mm_processor_kwargs, in particular min_pixels and max_pixels.
Allow overriding the prefill/decode vision_size specialization via a user-supplied vision_size.
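
For reviewers, the flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names (to_resolution_pairs, vision_tokens, build_specializations) and the dictionary keys are hypothetical, width and height lists are assumed to pair up index by index, and vision_tokens is a simplified stand-in for smart_resize that ignores min_pixels and max_pixels. The 28-pixel token side assumes 14-pixel patches merged 2x2.

```python
from typing import Dict, List, Optional, Tuple, Union

IntOrList = Union[int, List[int]]

# Assumption: 14-pixel patches merged 2x2, so one vision token covers 28x28 pixels.
PIXELS_PER_TOKEN_SIDE = 28


def to_resolution_pairs(width: IntOrList, height: IntOrList) -> List[Tuple[int, int]]:
    """Normalize scalar or list inputs into a list of (width, height) pairs."""
    widths = [width] if isinstance(width, int) else list(width)
    heights = [height] if isinstance(height, int) else list(height)
    if len(widths) != len(heights):
        raise ValueError("width and height must have the same number of entries")
    return list(zip(widths, heights))


def vision_tokens(width: int, height: int, side: int = PIXELS_PER_TOKEN_SIDE) -> int:
    """Hypothetical stand-in for the resize step: snap each side to a multiple of `side`."""
    snapped_w = max(side, round(width / side) * side)
    snapped_h = max(side, round(height / side) * side)
    return (snapped_w // side) * (snapped_h // side)


def build_specializations(
    width: IntOrList,
    height: IntOrList,
    vision_size: Optional[int] = None,
) -> Tuple[List[Dict[str, int]], int]:
    """Return one encoder entry per resolution plus the decoder vision size."""
    encoder_specs = [
        {"width": w, "height": h, "vision_size": vision_tokens(w, h)}
        for w, h in to_resolution_pairs(width, height)
    ]
    # The decoder uses the smallest vision size unless the user overrides it.
    min_vision_size = min(spec["vision_size"] for spec in encoder_specs)
    decoder_vision_size = vision_size if vision_size is not None else min_vision_size
    return encoder_specs, decoder_vision_size
```

Calling build_specializations(448, 448) and build_specializations([448], [448]) goes through the same code path, which is the property the single‑resolution test relies on.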

Testing

Single‑resolution path: pass height=int, width=int and verify behavior matches previous outputs.
Multi‑resolution path: pass height=[...], width=[...] and confirm:
- specializations are generated for each resolution,
- min_vision_size is used consistently,
- no shape mismatch at runtime.
KV‑offload run with at least two distinct pixel_values shapes that are present in vision_session.allowed_shapes.
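
A small helper along these lines could back the KV‑offload check. It is a sketch only: find_matching_shape is a hypothetical name, the allowed shapes are assumed to be available as a flat list of shape tuples for pixel_values (the actual layout of vision_session.allowed_shapes may differ), and the shapes in the usage note are made-up illustrative values.

```python
from typing import Optional, Sequence


def find_matching_shape(
    input_shape: Sequence[int],
    allowed_shapes: Sequence[Sequence[int]],
) -> Optional[int]:
    """Return the index of the allowed shape equal to input_shape, or None."""
    target = tuple(input_shape)
    for index, shape in enumerate(allowed_shapes):
        if tuple(shape) == target:
            return index
    return None
```

For example, with allowed = [(1024, 1176), (2560, 1176)], a test can assert that find_matching_shape(pixel_values_shape, allowed) is not None for each of the two inputs before running inference.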

Checklist

Support both scalar and list inputs for image resolution.
Remove duplicated resizing logic in favor of shared utility.
Select correct buffers for vision session based on input shapes.
Pad vision embeddings to match language session constraints.
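
The padding item can be pictured with the sketch below. It assumes the language session expects a fixed number of vision rows and that zero is an acceptable pad value; neither is confirmed by this description. pad_vision_embeddings is a hypothetical name, plain lists stand in for tensors, and because the description does not say how longer inputs are handled, the sketch simply raises.

```python
from typing import List


def pad_vision_embeddings(
    embeddings: List[List[float]],
    target_rows: int,
    pad_value: float = 0.0,
) -> List[List[float]]:
    """Append pad rows so the result has exactly target_rows rows."""
    if not embeddings:
        raise ValueError("embeddings must contain at least one row")
    rows = len(embeddings)
    if rows > target_rows:
        # Behavior for oversized inputs is not specified here; fail loudly.
        raise ValueError(f"got {rows} rows, expected at most {target_rows}")
    hidden = len(embeddings[0])
    padding = [[pad_value] * hidden for _ in range(target_rows - rows)]
    # Copy the input rows so the caller's data is not modified.
    return [list(row) for row in embeddings] + padding
```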

Signed-off-by: quic-sanising <sanising@qti.qualcomm.com>

quic-sanising added 5 commits March 18, 2026 14:40

Copy changes from PR quic#755

5dba890

Signed-off-by: quic-sanising <sanising@qti.qualcomm.com>

Fix logic to calculate vision tokens

3e6208f

Signed-off-by: quic-sanising <sanising@qti.qualcomm.com>

Update example to remove dimensions

58183e9

Signed-off-by: quic-sanising <sanising@qti.qualcomm.com>

Import smart_resize from qwen_vl_utils and allow user input for min and max pixels

6dede8c

Signed-off-by: quic-sanising <sanising@qti.qualcomm.com>

Reformat code

c8cf322

Signed-off-by: quic-sanising <sanising@qti.qualcomm.com>

quic-sanising force-pushed the qwenvl2_5_multi_spec branch from f2eee65 to c8cf322 Compare March 18, 2026 21:41

quic-sanising marked this pull request as ready for review March 18, 2026 21:49

quic-sanising requested review from ochougul, quic-amitraj, quic-hemagnih and quic-rishinr as code owners March 18, 2026 21:49

quic-sanising mentioned this pull request Mar 18, 2026

Onboarding Qwen3VL Dense #780

Draft

Allow user to specify vision_size for decoder specialization

bba6252

Signed-off-by: quic-sanising <sanising@qti.qualcomm.com>

quic-sanising wants to merge 6 commits into quic:release/v1.21.0 from quic-sanising:qwenvl2_5_multi_spec
