```diff
 requires-python = ">=3.8,<3.11"
 dependencies = [
-    "transformers==4.55.0",
+    "transformers==4.57.0",
```
@quic-rishinr / @quic-hemagnih : can we trigger TA?
Yes, we should raise it and start the run of all the models with 4.57 in parallel; it typically takes a week.
```python
    attention_mask, torch.tensor(MIN_MASKED_ATTENTION_VALUE, dtype=torch.float32), attn_weights
)

attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
```
Can you set this to the dtype passed from `pretrained()`?
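For context, a minimal sketch of what this snippet does, with the output dtype made a parameter instead of hard-coding `query.dtype`. The helper name and the `MIN_MASKED_ATTENTION_VALUE` value are assumptions for illustration, not the actual modeling code:

```python
import torch
import torch.nn as nn

# Assumed value; the real constant lives in the modeling file under review.
MIN_MASKED_ATTENTION_VALUE = -1e4

def masked_softmax(attn_weights, attention_mask, out_dtype=torch.float32):
    """Hypothetical helper: masked-attention softmax with a configurable
    output dtype (e.g. the dtype passed at model load time)."""
    # Masked positions get a large negative value so they collapse to ~0
    # probability after softmax.
    attn_weights = torch.where(
        attention_mask,
        torch.tensor(MIN_MASKED_ATTENTION_VALUE, dtype=torch.float32),
        attn_weights,
    )
    # Softmax runs in float32 for numerical stability, then casts back.
    return nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(out_dtype)
```

Parameterizing `out_dtype` would let the cast follow whatever dtype the user requested at load time rather than the query tensor's dtype.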
quic-hemagnih left a comment:
I am still reviewing the modelling file.
```python
messages = [messages] * batch_size

inputs = processor.apply_chat_template(
```
I think we can combine the code from lines 62-77 and 122-140 in one place. The idea is to avoid the code repetition.
We can discuss this.
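As a sketch of the deduplication idea: a single helper that both call sites could share. The helper name and the keyword arguments are assumptions (they follow the usual `transformers` `apply_chat_template` signature), not the actual script:

```python
def prepare_batched_inputs(processor, messages, batch_size):
    """Hypothetical helper consolidating the duplicated preprocessing
    blocks (lines 62-77 and 122-140 of the example script)."""
    # Replicate the conversation across the batch, as both call sites do.
    batched = [messages] * batch_size
    # One shared place for the chat-template call.
    return processor.apply_chat_template(
        batched,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
```

Both code paths would then reduce to a one-line call, so any future change to the preprocessing only needs to be made once.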
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Could you add QEffQwen3VLDecoderWrapper here under SamplerTransform? The on-device sampling is generic, so it can support new VLMs. Thank you.
If not, we can also raise it in a new patch. @quic-sanising
Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
…gated Serving (quic#776)

This PR addresses a compilation error that occurs when CCL is enabled during decode QPC generation of the gpt-oss model in Disaggregated Serving. For example, with the following command:

```shell
python3 -m qaic_disagg \
    --prefill-port 9802 \
    --decode-port 9902 \
    --port 8002 \
    --decode-device-group 16,17,18,19 \
    --prefill-device-group 20,21,22,23 \
    --model openai/gpt-oss-20b \
    --prefill-max-num-seqs 1 \
    --decode-max-num-seqs 1 \
    --prefill-max-seq-len-to-capture 128 \
    --max-model-len 4096 \
    --prefill-override-qaic-config "split_retained_state_io:True mxfp6_matmul:True enable_chunking:True" \
    --decode-override-qaic-config "mxfp6_matmul:True retain_full_kv:True ccl_enabled=True comp_ctx_lengths_decode=1024,2048,4096" \
    -vvv \
    --dtype bfloat16 \
    --kv-cache-dtype mxint8 \
    --kv-handOff-port 5068 \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --enable-log-outputs
```

Activating CCL during decoding triggers the compilation error "No input that uniquely identifies specialization". The root cause is recent changes in modeling_gpt_oss.py that were made to support disaggregated serving for gpt-oss but conflict with the CCL feature.

Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Co-authored-by: Hem Agnihotri <hemagnih@qti.qualcomm.com>
Granitemoe export issue fixed and added to CI.

Signed-off-by: Ann <akuruvil@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
…) (quic#779)

Needed for passing custom config via vLLM.

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Co-authored-by: Mamta Singh <mamtsing@qti.qualcomm.com>
This feature adds support for exporting a proxy model, which disables the Embedding Layer and LM Head of the model.

- Set `enable_proxy = True` to export the proxy model.
- Set `write_io = True` to save input/output files during the generation stage.

Refer to the example script for implementation details.

## Testing
1. Text Models
2. Embedding Models
3. Vision Models
4. Audio Models

Note: check the example script for the same.

Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Gemma3 NPI file update. With the new file, gemma_updated_npi.yaml, the MMMU metric is met.

Signed-off-by: Hem Agnihotri <quic_hemagnih@quicinc.com>
Minor updates for better rendering in FT docs --------- Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Automated daily PR dashboard that generates a report of all open pull requests and emails it to a configured list of recipients.

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Updated the SMTP server.

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Removed git workflow and email test changes, as we are moving to a Jenkins-based approach.

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Updating the QEff Python version to 3.12, while still keeping support for 3.10 and 3.11.

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Hem Agnihotri <hemagnih@qti.qualcomm.com>
**Adding disagg support to Qwen3Moe**

Config used: PL = 128, CL = 128*3
Screenshot: https://github.com/user-attachments/assets/7b9afa00-8505-4df5-9a91-68b55e89b416

Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Summary
- Keep `use_onnx_subfunctions` disabled by default in `QEfficient.cloud.infer`
- Provide explicit opt-in via `--use-onnx-subfunctions` only
- Remove `--no-use-onnx-subfunctions`
- Update infer unit tests for explicit-enable and default-disabled behavior
- Update quick-start and text-generation docs to reflect explicit opt-in behavior

Why
- Align infer behavior with reviewer feedback to keep defaults unchanged and avoid model-specific auto-enable behavior.

Fixes
- Fixes quic#702

Validation
- `python -m py_compile QEfficient/cloud/infer.py tests/cloud/test_infer.py`
- `ruff check QEfficient/cloud/infer.py tests/cloud/test_infer.py`
- `pytest -q tests/cloud/test_infer.py -m "not on_qaic"` (2 passed, 5 deselected)

Signed-off-by: jd316 <jd316biswas@gmail.com>
Removed the following packages from pyproject.toml:
- multidict==6.0.4
- urllib3<2

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Pytest unit tests designed as a preflight check before submitting a PR. They run fully on CPU and focus on module-level testing, transformation correctness, and accuracy comparison between HF, transformed HF, and ORT for representative models.

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Co-authored-by: vbaddi <vbaddi@qti.qualcomm.com>
Adding Qwen3VL Support to QEff