
Onboarding Qwen3VL Dense#780

Draft
qcdipankar wants to merge 31 commits into quic:main from qcdipankar:qwen3_vl

Conversation

@qcdipankar
Contributor

Adding Qwen3VL Support to QEff

requires-python = ">=3.8,<3.11"
dependencies = [
"transformers==4.55.0",
"transformers==4.57.0",
Contributor

@quic-rishinr / @quic-hemagnih: can we trigger TA?

Contributor

Yes, we should raise it and start the run of all the models with 4.57 in parallel; it typically takes one week.

attention_mask, torch.tensor(MIN_MASKED_ATTENTION_VALUE, dtype=torch.float32), attn_weights
)

attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
Contributor

Can you set this to the dtype passed from `from_pretrained()`?
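For reference, the pattern under discussion can be sketched in plain Python. This is a minimal sketch only: the real code operates on torch tensors, and `MIN_MASKED_ATTENTION_VALUE` here is a hypothetical stand-in for the library constant.

```python
import math

# Hypothetical stand-in for the library's masked-attention fill constant.
MIN_MASKED_ATTENTION_VALUE = -1e4

def masked_softmax(scores, mask):
    """Fill masked positions with a large negative value, then take a
    numerically stable softmax in high precision (the float32 step in
    the quoted diff)."""
    filled = [s if keep else MIN_MASKED_ATTENTION_VALUE
              for s, keep in zip(scores, mask)]
    m = max(filled)                      # subtract max for stability
    exps = [math.exp(s - m) for s in filled]
    total = sum(exps)
    return [e / total for e in exps]

probs = masked_softmax([2.0, 1.0, 0.5], [True, True, False])
```

In the actual model, the float32 softmax result would then be cast back to the configured dtype (e.g. the one passed at load time) rather than a hard-coded `query.dtype`, which is the change the comment asks for.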

Contributor

@quic-hemagnih left a comment

I am still reviewing the modelling file.


messages = [messages] * batch_size

inputs = processor.apply_chat_template(
Contributor

I think we can combine the code from lines 62 to 77 and 122 to 140 in one place.

The idea is to avoid code repetition.
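To illustrate the suggestion, a hypothetical shared helper could hold the repeated block; the helper name and signature are illustrative, not from the PR.

```python
def build_inputs(processor, messages, batch_size, **kwargs):
    """Hypothetical shared helper: batch the chat messages once and call
    the processor's chat template in a single place, so both call sites
    reuse the same code path instead of repeating the block."""
    batched = [messages] * batch_size
    return processor.apply_chat_template(batched, **kwargs)
```

Both code regions the comment points at could then call this helper with their own keyword arguments.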

Contributor Author

We can discuss this.


qcdipankar and others added 7 commits February 16, 2026 13:12
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
@qcdipankar qcdipankar marked this pull request as draft February 19, 2026 09:16
Contributor

Could you add QEffQwen3VLDecoderWrapper here under SamplerTransform? The on-device sampling is generic, so it can support new VLMs. Thank you.

If not, we can also raise a new patch @quic-sanising

Contributor
Yes, please add this here @qcdipankar. Thanks!

@qcdipankar qcdipankar changed the base branch from main to qwen3_vl_mainline February 24, 2026 11:43
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
@qcdipankar qcdipankar force-pushed the qwen3_vl_mainline branch 3 times, most recently from 5bc0eb7 to 19a163b Compare March 2, 2026 06:37
@qcdipankar qcdipankar force-pushed the qwen3_vl_mainline branch 2 times, most recently from 46d18ab to 47dd748 Compare March 11, 2026 03:27
@qcdipankar qcdipankar changed the base branch from qwen3_vl_mainline to main March 17, 2026 05:42
@qcdipankar qcdipankar changed the base branch from main to qwen3_vl_mainline March 17, 2026 17:20
@qcdipankar qcdipankar changed the base branch from qwen3_vl_mainline to main March 17, 2026 17:21
@qcdipankar qcdipankar changed the base branch from main to qwen3_vl_mainline March 17, 2026 17:25
vjanfaza and others added 16 commits March 17, 2026 18:09
…gated Serving (quic#776)

This PR addresses the compilation error that occurs when we enable CCL during decode QPC generation of the gpt-oss model in
Disaggregated Serving, for example with the following command:
python3 -m qaic_disagg \
     --prefill-port 9802 \
     --decode-port 9902 \
     --port 8002 \
     --decode-device-group 16,17,18,19 \
     --prefill-device-group 20,21,22,23 \
     --model openai/gpt-oss-20b \
     --prefill-max-num-seqs 1 \
     --decode-max-num-seqs 1 \
     --prefill-max-seq-len-to-capture 128 \
     --max-model-len 4096 \
--prefill-override-qaic-config "split_retained_state_io:True
mxfp6_matmul:True enable_chunking:True" \
--decode-override-qaic-config "mxfp6_matmul:True retain_full_kv:True
ccl_enabled=True comp_ctx_lengths_decode=1024,2048,4096" \
     -vvv \
     --dtype bfloat16 \
     --kv-cache-dtype mxint8 \
     --kv-handOff-port 5068 \
     --tool-call-parser openai \
     --enable-auto-tool-choice \
     --enable-log-outputs 

We are activating CCL during decoding; however, this causes the compilation
error "Error message: No input that uniquely identifies specialization".
The source of this error is recent changes in the
modeling_gpt_oss.py script that were made to support disaggregated
serving in gpt-oss but conflict with the CCL feature.

---------

Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Co-authored-by: Hem Agnihotri <hemagnih@qti.qualcomm.com>
Granitemoe export issue fixed and added to CI.

---------

Signed-off-by: Ann <akuruvil@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
…) (quic#779)

Needed for passing custom config via vllm.

---------

---------

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Co-authored-by: Mamta Singh <mamtsing@qti.qualcomm.com>
This feature adds support for exporting a proxy model, which disables
the Embedding Layer and LM Head of the model.

Set `enable_proxy = True` to export the proxy model.
Set `write_io = True` to save input/output files during the generation
stage.

Refer to the example script for implementation details.
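A minimal sketch of the proxy idea described above, assuming illustrative attribute names (`embed_tokens`, `lm_head`) rather than the PR's actual module names:

```python
class Identity:
    """Pass-through stand-in for a disabled layer."""
    def __call__(self, x):
        return x

def make_proxy(model, enable_proxy=False):
    """Hypothetical sketch of the proxy-export idea: with enable_proxy=True,
    the embedding layer and LM head are replaced by pass-throughs, so the
    exported graph keeps only the transformer trunk. Attribute names here
    are illustrative."""
    if enable_proxy:
        model.embed_tokens = Identity()
        model.lm_head = Identity()
    return model
```

With `enable_proxy=False`, the model is returned unchanged, matching the default behavior described above.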

## Testing

1. Text Models
2. Embedding Models
3. Vision Models
4. Audio Models

Note: see the example script covering each of these.

---------

Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Gemma3 NPI file update. With the new file, gemma_updated_npi.yaml, the MMMU
metric is met.

---------

Signed-off-by: Hem Agnihotri <quic_hemagnih@quicinc.com>
Minor updates for better rendering in FT docs

---------

Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Automated daily PR dashboard that generates a report of all open pull
requests and emails it to a configured list of recipients.

---------

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Updated the SMTP server

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Removed the Git workflow and email test changes as we are moving to a
Jenkins-based approach

---------

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Updating the QEff Python version to 3.12 while still keeping support for
3.10 and 3.11.

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Hem Agnihotri <hemagnih@qti.qualcomm.com>
**Adding disagg support to Qwen3Moe**

> Config used

PL = 128

CL = 128 * 3

<img width="726" height="1077" alt="image"
src="https://github.com/user-attachments/assets/7b9afa00-8505-4df5-9a91-68b55e89b416"
/>

---------

Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Summary
- Keep `use_onnx_subfunctions` disabled by default in
`QEfficient.cloud.infer`
- Provide explicit opt-in via `--use-onnx-subfunctions` only
- Remove `--no-use-onnx-subfunctions`
- Update infer unit tests for explicit-enable and default-disabled
behavior
- Update quick-start and text-generation docs to reflect explicit opt-in
behavior

Why
- Align infer behavior with reviewer feedback to keep defaults unchanged
and avoid model-specific auto-enable behavior.

Fixes
- Fixes quic#702

Validation
- `python -m py_compile QEfficient/cloud/infer.py
tests/cloud/test_infer.py`
- `ruff check QEfficient/cloud/infer.py tests/cloud/test_infer.py`
- `pytest -q tests/cloud/test_infer.py -m "not on_qaic"` (2 passed, 5
deselected)
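The explicit opt-in semantics summarized above can be sketched with argparse; this is a sketch of the behavior, not the actual infer.py code.

```python
import argparse

def build_parser():
    """Sketch of the opt-in flag: disabled by default, enabled only by
    passing --use-onnx-subfunctions explicitly; there is deliberately no
    --no-use-onnx-subfunctions variant."""
    p = argparse.ArgumentParser()
    p.add_argument(
        "--use-onnx-subfunctions",
        action="store_true",
        default=False,
        help="Explicitly enable ONNX subfunctions during export.",
    )
    return p
```

With a `store_true` action and `default=False`, omitting the flag leaves the feature off, which matches the default-disabled behavior the unit tests check.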

---------

Signed-off-by: jd316 <jd316biswas@gmail.com>
Removed the following packages from pyproject.toml:
multidict==6.0.4
urllib3<2

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Pytest unit tests designed as a preflight check before submitting a PR. They
run fully on CPU and focus on module-level testing, transformation
correctness, and accuracy comparison between HF, transformed HF, and ORT
for representative models.

---------

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Co-authored-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
@qcdipankar qcdipankar changed the base branch from qwen3_vl_mainline to main March 17, 2026 19:02
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
