78 commits
f81ef6e
General disagg fix for prefill-only model (#698)
ochougul Jan 6, 2026
c57392d
Adding Vae Decoder in Wan (#688)
mohiso22 Jan 9, 2026
75367b1
Evaluating the values of CCL lists for different scenarios (#710)
vjanfaza Jan 9, 2026
1e63710
Updating 2-layer instruction for Wan (#715)
tv-karthikeya Jan 12, 2026
1ef9935
Updated finetune docs for MULTI NODE Training (#717)
quic-akuruvil Jan 13, 2026
c76d5ea
Adding support for multi-node DDP training (#708)
smedhe Jan 13, 2026
7a39933
Updating MDP partition config: prioritizing dump over load (#720)
asmigosw Jan 13, 2026
08bce2c
Updated docs (#722)
quic-akuruvil Jan 13, 2026
8b00c1b
HOTFIX: changes in alpaca and grammar dataset utils (#724)
smedhe Jan 13, 2026
b074af0
Fixing the default value of CCL in infer.py (#725)
vjanfaza Jan 15, 2026
5fdde19
Adding support for multi-node PP+DDP (#726)
smedhe Jan 16, 2026
1f2ac51
Added default NPI file (#657)
quic-akuruvil Jan 19, 2026
dcbb7be
Release 1.21 docs (#718)
tv-karthikeya Jan 19, 2026
1ec3975
HOTFIX : Added support for repeat kv heads aligned Bias scaling for A…
quic-dhirajku Jan 20, 2026
e61a1a3
Removed OpenGVLab/InternVL2_5-1B and OpenGVLab/InternVL3_5-1B (#736)
quic-rishinr Jan 20, 2026
47a0fec
Qeff versioning (#741)
quic-rishinr Jan 20, 2026
3a8e5e9
Revert "Qeff versioning" (#746)
quic-rishinr Jan 21, 2026
0ffa4ea
Fix for Qwen 2.5 VL with subfunction (#733)
abhishek-singh591 Jan 21, 2026
32f30c0
Fixed torch patch for subfunction with VLMs (#750)
abhishek-singh591 Jan 22, 2026
eb74758
Added support of subfunction for VLMs (#699)
abhishek-singh591 Jan 23, 2026
742b7bd
Updated reduce sum calculation to use einsum for gpt_oss (#754)
asmigosw Jan 27, 2026
5a129c7
Updating pytest config for InternVL (#758)
tv-karthikeya Jan 28, 2026
b777e8b
Wan support to skip compilation (#734)
tv-karthikeya Jan 28, 2026
75bf976
Fixing SW issue in Gemma3 (#740)
qcdipankar Jan 28, 2026
3751f7e
Fix documentation of Multinode FT (#764)
quic-akuruvil Jan 29, 2026
27ebe8e
Adding support for gemma3 in continous batching script for CI (#763)
qcdipankar Jan 30, 2026
536e3fc
Subfunction Fix (#766)
abhishek-singh591 Feb 1, 2026
f64f703
Mainline version update (#752)
quic-rishinr Feb 2, 2026
1a3e09c
Updated compile from qaic-exec to qaic-compile (#703)
asmigosw Feb 3, 2026
e8e5c43
Fix for Diffusers subfunction (#759)
tv-karthikeya Feb 9, 2026
fc42332
Added One hot fix for MOE model with subfunction (#777)
abhishek-singh591 Feb 12, 2026
544327a
Adding support of QEFFAutoModelForSequenceClassification (#729)
quic-amitraj Feb 13, 2026
facae5f
CI test optimization (#751)
quic-rishinr Feb 13, 2026
4bd2239
Adding the support of dense models distilled from moe models with the…
vjanfaza Feb 20, 2026
a8a008d
Fix for CB incosistency for qwen2_5_vl (#765)
asmigosw Feb 24, 2026
c74b0bd
Fixing the issue of CCL support during the decoding phase of Disaggre…
vjanfaza Feb 25, 2026
a6f2dd4
Fixed Granite_moe and added to CI (#771)
quic-akuruvil Feb 26, 2026
69c83c2
removed duplication of `mdp_json_path` in compilation command (#706) …
ochougul Feb 27, 2026
471de6f
[Proxy]: Adding support for exporting proxy Model (#620)
abukhoy Mar 2, 2026
9bcab61
Gemma3 NPI File Update (#810)
quic-hemagnih Mar 3, 2026
33c8ff7
Updated FT docs (#822)
quic-akuruvil Mar 4, 2026
94f233e
Daily PR report workflow and email notification system (#824)
quic-rishinr Mar 5, 2026
ab920b2
Updated SMPT server (#830)
quic-rishinr Mar 5, 2026
300b252
Removed git workflow and email test changes (#836)
quic-rishinr Mar 9, 2026
85b0cf0
Upgrade python version from 3.10 to 3.12 (#782)
quic-rishinr Mar 9, 2026
3d0d663
Adding dissagg mode support to Qwen3Moe (#682)
qcdipankar Mar 10, 2026
815309e
fix(cloud.infer): reduce Qwen3-MoE export OOM risk (#821)
jd316 Mar 11, 2026
652351b
Removed urllib and multidict (#846)
quic-rishinr Mar 13, 2026
2f9675c
CPU pytest unit test suite (#852)
quic-rishinr Mar 17, 2026
575571f
[QEff. Finetune]: Added logger and its test cases. (#644)
quic-meetkuma Nov 28, 2025
20e5b13
[QEff. Finetune]: Added component registry and factory functionality.…
quic-meetkuma Nov 28, 2025
36044be
[QEff. Finetune]: Adding optimizer registry and its test cases (#649)
tchawada Dec 5, 2025
f736d93
[QEff. Finetune]: Added Base dataset class and SFT dataset classes al…
quic-dhirajku Dec 5, 2025
a85b687
[QEff. Finetune] Adding callback and its test cases. (#652)
tchawada Dec 8, 2025
7dcb29b
"[QEff.finetuning] Adding config_manager and its test cases." (#656)
tchawada Dec 15, 2025
86df5aa
Revert " "[QEff.finetuning] Adding config_manager and its test cases.…
quic-akuruvil Dec 15, 2025
e50ac64
"[QEff.finetuning} Rebasing: hf_config_mananger." (#667)
tchawada Dec 15, 2025
b9ce749
[QEff. Finetune]: Adding base class and HF class (#658)
quic-swatia Dec 25, 2025
f87c0a7
Added Trainer classes and tests for FT (#697)
quic-dhirajku Jan 2, 2026
400f911
[QEff.finetuning] Adding sample config and ReadMe file (#692)
tchawada Feb 5, 2026
263f152
['QEff.finetuning'] Changing some params from training config to mode…
tchawada Feb 5, 2026
529dc2c
[QEff. Finetuning] Adding text field and some other changes in datase…
quic-swatia Feb 9, 2026
b56770b
[QEff. Finetuning]: Adding FinetuningPipeline (finetune_experiemental…
quic-swatia Feb 15, 2026
72e93b5
Ft experimental rebasing with main (#793)
quic-akuruvil Feb 16, 2026
a34da25
Aligning with main (#794)
quic-akuruvil Feb 17, 2026
5b2db2c
[QEff. Finetuning]: Adding PP support in HF trainer stack (#813)
quic-swatia Feb 27, 2026
5f2d4b2
[QEff.finetuning] Hf config update (#795)
tchawada Mar 4, 2026
6dbbbfe
Restructure and added info in docs
Mar 5, 2026
5062d96
Cleanup
Mar 5, 2026
dfe8a9f
Cleanup
Mar 5, 2026
59d785a
[QEff.finetune]Test finetune (#826)
tchawada Mar 6, 2026
6002e0a
Docs Updated (#833)
quic-akuruvil Mar 8, 2026
2c51672
[QEff. Finetuning]: adding example scripts to demonstrate custom data…
smedhe Mar 9, 2026
92882be
Revert "[QEff. finetuning]: Rebasing ft_experimental into main" (#840)
quic-akuruvil Mar 10, 2026
429b39b
[QEff. Finetuning]: Fixed Data Parallel issue (#845)
quic-swatia Mar 11, 2026
56cece4
[QEff.finetune] FT logger (#851)
tchawada Mar 16, 2026
65e033f
Updated terminal logs (#862)
quic-akuruvil Mar 17, 2026
aa3203d
Merge branch 'ft_experimental' into rebase_main
smedhe Mar 18, 2026
26 changes: 20 additions & 6 deletions QEfficient/cloud/infer.py
@@ -139,6 +139,7 @@ def main(
qnn_config: Optional[str] = None,
trust_remote_code: Optional[bool] = False,
ccl_enabled: Optional[bool] = False,
use_onnx_subfunctions: bool = False,
**kwargs,
) -> None:
"""
@@ -205,6 +206,8 @@ def main(
Path of the QNN Config parameters file. Default is None.
trust_remote_code : bool, optional
If True, trusts remote code when loading models from HuggingFace. Default is False.
use_onnx_subfunctions : bool, optional
Enables ONNX subfunctions during export and compile. Default is False.
**kwargs :
Additional compiler options passed directly to `qaic-compile`. Any flag supported by
`qaic-compile` can be passed. Parameters are converted to flags as follows:
@@ -231,12 +234,14 @@
"""
cache_dir = check_and_assign_cache_dir(local_model_dir, cache_dir)

    if "--mxfp6" in sys.argv and mxfp6:
        logger.warning("mxfp6 is going to be deprecated in a future release, use -mxfp6_matmul instead.")
    if "--mxint8" in sys.argv and mxint8:
        logger.warning("mxint8 is going to be deprecated in a future release, use -mxint8_kv_cache instead.")

qaic_config = {"ccl_enabled": True} if ccl_enabled else None

@@ -280,6 +285,7 @@ def main(
allow_mxint8_mdp_io=allow_mxint8_mdp_io,
enable_qnn=enable_qnn,
qnn_config=qnn_config,
use_onnx_subfunctions=use_onnx_subfunctions,
**kwargs,
)

@@ -382,6 +388,14 @@ def main(
action="store_true",
help="Compress Present/Past KV to MXINT8 using CustomIO config, default is False",
)
parser.add_argument(
"--use-onnx-subfunctions",
"--use_onnx_subfunctions",
dest="use_onnx_subfunctions",
action="store_true",
default=False,
help="Enable ONNX subfunctions during export/compile.",
)
parser.add_argument(
"--num_cores", "--num-cores", type=int, required=True, help="Number of cores to compile on Cloud AI 100"
)
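The new `--use-onnx-subfunctions` argument registers two spellings for one destination. A minimal standalone `argparse` sketch (flag names copied from the diff) shows that the hyphenated and underscored forms both set the same boolean:

```python
import argparse

# Minimal sketch of the dual-spelling flag: both option strings share one dest.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--use-onnx-subfunctions",
    "--use_onnx_subfunctions",
    dest="use_onnx_subfunctions",
    action="store_true",
    default=False,
    help="Enable ONNX subfunctions during export/compile.",
)

print(parser.parse_args([]).use_onnx_subfunctions)                           # → False
print(parser.parse_args(["--use-onnx-subfunctions"]).use_onnx_subfunctions)  # → True
print(parser.parse_args(["--use_onnx_subfunctions"]).use_onnx_subfunctions)  # → True
```

Registering both spellings keeps backward compatibility with the underscored style used by older flags such as `--num_cores`.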
13 changes: 13 additions & 0 deletions QEfficient/proxy/__init__.py
@@ -0,0 +1,13 @@
# -----------------------------------------------------------------------------
#
# Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
# SPDX-License-Identifier: BSD-3-Clause
#
# ----------------------------------------------------------------------------

from QEfficient.proxy.proxy_transform import QeffProxyEmbedding, QeffProxyLinear

__all__ = [
"QeffProxyEmbedding",
"QeffProxyLinear",
]
27 changes: 27 additions & 0 deletions QEfficient/proxy/proxy_transform.py
@@ -0,0 +1,27 @@
# -----------------------------------------------------------------------------
#
# Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
# SPDX-License-Identifier: BSD-3-Clause
#
# ----------------------------------------------------------------------------
import torch
from torch import nn


class QeffProxyEmbedding(nn.Module):
    """Proxy stand-in for ``nn.Embedding`` that skips the real lookup table."""

    def __init__(self, num_embeddings, embedding_dim):
        super().__init__()  # required before assigning submodules/attributes on nn.Module
        self.embed_tokens = None
        self.num_embeddings = num_embeddings
        self.embedding_dim = embedding_dim

    def forward(self, hidden_states, past_key_values_length=None):
        # Broadcast the float token ids across the embedding dimension
        # instead of performing a real embedding lookup: (B, S) -> (B, S, D).
        inputs_embeds = torch.unsqueeze(hidden_states.float(), 2).expand(-1, -1, self.embedding_dim)
        return inputs_embeds


class QeffProxyLinear(nn.Module):
    """Proxy stand-in for ``nn.Linear`` that passes activations through unchanged."""

    def __init__(self, in_features, out_features, bias=False):
        # in_features/out_features/bias are kept for nn.Linear signature compatibility.
        super().__init__()  # required before assigning submodules/attributes on nn.Module
        self.lm_head = None

    def forward(self, hidden_states):
        return hidden_states
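The proxy embedding's forward pass can be exercised in isolation. A minimal sketch (tensor values and `embedding_dim` are illustrative, not from the PR) of the broadcast it performs in place of a real lookup:

```python
import torch

# Illustrative input: float "token ids" of shape (batch=1, seq=3).
hidden_states = torch.tensor([[1.0, 2.0, 3.0]])
embedding_dim = 4  # assumed value for the sketch

# Same broadcast as QeffProxyEmbedding.forward: (B, S) -> (B, S, D).
inputs_embeds = torch.unsqueeze(hidden_states, 2).expand(-1, -1, embedding_dim)
print(inputs_embeds.shape)  # → torch.Size([1, 3, 4])
```

Each token id is simply repeated along the embedding dimension, which keeps tensor shapes export-compatible without carrying the embedding weights.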
22 changes: 22 additions & 0 deletions QEfficient/proxy/pytorch_transform.py
@@ -0,0 +1,22 @@
# -----------------------------------------------------------------------------
#
# Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
# SPDX-License-Identifier: BSD-3-Clause
#
# ----------------------------------------------------------------------------

import torch.nn as nn

from QEfficient.base.pytorch_transforms import ProxyModuleMappingTransform
from QEfficient.proxy import QeffProxyEmbedding, QeffProxyLinear


class QeffProxyModuleTransform(ProxyModuleMappingTransform):
    """
    Replaces the original ``nn.Embedding`` and ``nn.Linear`` modules with their QEfficient proxy counterparts.
    """

_module_mapping = {
nn.Embedding: QeffProxyEmbedding,
nn.Linear: QeffProxyLinear,
}
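`ProxyModuleMappingTransform` lives inside QEfficient and is not shown in this diff. As a rough sketch of what a module-mapping transform does (an assumption about the mechanism, not the library's actual code), one can walk the module tree and swap any child whose type appears in the mapping:

```python
import torch.nn as nn

def apply_module_mapping(model: nn.Module, mapping: dict) -> nn.Module:
    """Generic sketch: recursively replace children whose type is in `mapping`."""
    for name, child in model.named_children():
        build = mapping.get(type(child))
        if build is not None:
            # setattr re-registers the new module under the same child name.
            setattr(model, name, build(child))
        else:
            apply_module_mapping(child, mapping)
    return model

# Example mapping: swap every nn.Linear for nn.Identity (a trivial stand-in).
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
apply_module_mapping(model, {nn.Linear: lambda m: nn.Identity()})
print(model)
```

The real transform presumably builds each proxy from the original module's attributes (e.g. `num_embeddings`, `embedding_dim`), which is why the proxy constructors mirror the `nn.Embedding`/`nn.Linear` signatures.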