
Cherry pick v1.24.0 #2439

Open

xin3he wants to merge 28 commits into master from cherry_pick_v1.24.0

Conversation

@xin3he
Contributor

@xin3he xin3he commented Mar 31, 2026

No description provided.

Yantom1 and others added 28 commits March 31, 2026 10:19
#303)

* [SW-240730] Support Compressed Tensors quantization method with fp8 weights
* [SW-240400] Fix MoE weights handling in measure
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Change-Id: I442e306714479b92935f9e1ec79e60c9096d1109

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
* Add option to specify output tensor in torch.matmul

* Fix unit tests

* Fix unit tests v2

---------

Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
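The "option to specify output tensor in torch.matmul" commit above refers to the preallocated-destination pattern. As a hedged illustration (not the repo's code), NumPy's `matmul` exposes the same `out=` argument:

```python
import numpy as np

# Illustrative sketch of the "specify output tensor" pattern;
# torch.matmul accepts the same out= keyword.
a = np.ones((2, 3), dtype=np.float32)
b = np.ones((3, 4), dtype=np.float32)

# Preallocate the destination once and reuse it across calls,
# avoiding a fresh allocation per matmul.
out = np.empty((2, 4), dtype=np.float32)
result = np.matmul(a, b, out=out)

assert result is out  # the product is written in place
```

Reusing one output buffer this way matters in tight inference loops, where per-call allocations add measurable overhead.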
* [SW-233758] Support dynamic quantization for Matmul

* [SW-233758] Unit tests for Matmul dynamic quantization

Signed-off-by: xinhe3 <xinhe3@habana.ai>
#327)

* [GAUDISW-5809] - Distinguish runtime scale patching from dynamic quantization

Signed-off-by: xinhe3 <xinhe3@habana.ai>
…330)

* [GAUDISW-228042] Add support for dynamic vLLM kv-cache quantization

* [GAUDISW-228042] Add support for dynamic KVCache with V scales on hidden dim

* use amax to calc scales on all batch dims

* fix static quantization issues

Signed-off-by: xinhe3 <xinhe3@habana.ai>
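The "use amax to calc scales on all batch dims" change can be sketched as follows. This is a minimal illustration under assumed shapes and an assumed fp8 max constant, not the repo's implementation:

```python
import numpy as np

# Assumed fp8 representable max; the exact constant depends on the
# fp8 format in use and is illustrative here.
FP8_MAX = 240.0

def per_hidden_dim_scale(v):
    # v: (batch, seq, hidden). Reduce max(|v|) over all batch dims
    # (batch and seq), leaving one dynamic scale per hidden-dim channel.
    amax = np.abs(v).max(axis=(0, 1))
    return amax / FP8_MAX

v = np.array([[[120.0, -240.0],
               [ 60.0,   30.0]]])  # shape (1, 2, 2)
scale = per_hidden_dim_scale(v)
# channel 1's scale comes from |-240| / 240 -> 1.0
```

Reducing over every batch dimension yields scales whose shape matches the hidden dimension, which is what a per-channel V-cache quantizer consumes.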
* disable autoround tests [GAUDISW-245272]

* enable autoround test, and check if the fix works [GAUDISW-245272]

Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [GAUDISW-245117] add b2b op

Signed-off-by: xinhe3 <xinhe3@habana.ai>
[GAUDISW-244752] add dynamic scale for V-Cache on Hidden dim
* Skip test with incorrect scale shapes

* Update test/3x/torch/algorithms/fp8_quant/unit_tests/test_save_load.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update test_save_load.py

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [GAUDISW-245950] disable test fp8_aware_gptq

* Update test_gptq_mixed_precision.py
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* Added dynamic quant with weight PCS POW2

* Added tests

* Rename scale method to MAXABS_PCS_POW2

Signed-off-by: xinhe3 <xinhe3@habana.ai>
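A hedged sketch of what a MAXABS per-channel-scale (PCS) POW2 method like the one named above might compute: take the per-channel amax, then snap the resulting scale to a power of two. The fp8 constant and rounding direction are assumptions for illustration only:

```python
import math
import numpy as np

FP8_MAX = 240.0  # assumed fp8 max; illustrative

def maxabs_pcs_pow2(weight):
    # weight: (out_channels, in_features); one scale per output channel.
    amax = np.abs(weight).max(axis=1)
    raw = amax / FP8_MAX
    # Round each scale up to the nearest power of two, so the scale
    # can be applied with a cheap exponent adjustment in hardware.
    return np.array([2.0 ** math.ceil(math.log2(s)) for s in raw])

w = np.array([[30.0, -480.0],
              [10.0,    5.0]])
scales = maxabs_pcs_pow2(w)
# 480/240 = 2.0 stays 2.0; 10/240 ≈ 0.0417 rounds up to 0.0625 (2**-4)
```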
…arameter assignments (#362)

* Initial plan

* [GAUDISW-246550] Remove spaces before equals in scale_method_config parameter assignments

Co-authored-by: HolyFalafel <19345135+HolyFalafel@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
)

* [GAUDISW-246083] - adjust load mode to pcs for new pytorch version

* [GAUDISW-246083] - Fix CR comments

* [GAUDISW-246083] - fix CI tests failures
* Fix all 96 Coverity CIDs across 40 files

CWE-476: Null pointer dereference fixes (null checks, variable initialization)
CWE-328: Weak hash algorithm (sha1 -> sha256)
CWE-561: Dead code removal (unreachable code, dead assignments)
CWE-398: Code quality (bare except, unused imports, resource handling)
CWE-532: Information exposure through log files (sanitize logging)
CWE-688: Function call with incorrect variable (fix parameter shadowing)
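The CWE-328 item in the list above (sha1 → sha256) is a one-line substitution in practice. A minimal sketch with an illustrative function name, not the repo's actual code:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    # Before: hashlib.sha1(data).hexdigest()  -- flagged as a weak hash
    # After: sha256 gives a 64-char hex digest instead of sha1's 40.
    return hashlib.sha256(data).hexdigest()

digest = fingerprint(b"model-config")
assert len(digest) == 64
```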

* Coverity-related fixes in 27 existing files (no new files).

* Replace coverity asserts with logger.error and revert behavior changes

- Replace 4 added asserts with logger.error in coco.py, test_pt2e_quant.py, test_pruning.py
- Revert use_cuda, recipe_cfgs default, strategy deepcopy, textual_inversion flow,
  teacher_model guard, pt2e utility var, static_quant timing, self_distillation log,
  inc_dataset_loader error handling, pruneOFA/glueOFA renames, distillation prints
- Keep all legitimate Coverity fixes (null checks, hash upgrades, resource leaks, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore min_max variable removed incorrectly by coverity fix

The variable is used on lines 147 and 154 - removing it breaks quantization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert fuse_qdq_conv to use new_match_node_name with init protection

Initialize new_match_node_name = match_node_name before the if block
so the original new_match_node_name[-1] usage is preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore config = AutoConfig.from_pretrained() in gpt-j/main.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace print with logger.error for stats None check

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix missing else branch in textual_inversion.py verify_loading

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add None guard for recipe_cfgs before .get() call

Keeps default as None (original behavior) but prevents AttributeError.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
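The None-guard fix above can be sketched like this; `recipe_cfgs` and `lookup` are illustrative names standing in for the real config object and call site:

```python
def lookup(recipe_cfgs, key):
    # Before: recipe_cfgs.get(key) raised AttributeError when
    # recipe_cfgs was None. The guard keeps None as the default
    # (original behavior) while preventing the crash.
    return recipe_cfgs.get(key) if recipe_cfgs is not None else None

assert lookup(None, "smooth_quant") is None
assert lookup({"smooth_quant": True}, "smooth_quant") is True
```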

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
@chensuyue chensuyue changed the base branch from master to v3.7.2rc March 31, 2026 08:49
@chensuyue chensuyue changed the base branch from v3.7.2rc to master March 31, 2026 08:53
@xin3he
Contributor Author

xin3he commented Mar 31, 2026

I think the failure comes from the fork repo. Let's wait for the next release of Habana.
