
Cherry pick v1.24.0 #2439

Open

xin3he wants to merge 28 commits into master from cherry_pick_v1.24.0

Conversation

@xin3he
Contributor

@xin3he xin3he commented Mar 31, 2026

No description provided.

Yantom1 and others added 28 commits March 31, 2026 10:19
#303)

* [SW-240730] Support Compressed Tensors quantization method with fp8 weights
* [SW-240400] Fix MoE weights handling in measure
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Change-Id: I442e306714479b92935f9e1ec79e60c9096d1109

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
* Add option to specify output tensor in torch.matmul

* Fix unit tests

* Fix unit tests v2

---------

Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
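The "option to specify output tensor in torch.matmul" commit above refers to the preallocated-destination pattern. As a hedged illustration (not the repo's code), NumPy's `matmul` exposes the same `out=` argument:

```python
import numpy as np

# Illustrative sketch of the "specify output tensor" pattern;
# torch.matmul accepts the same out= keyword.
a = np.ones((2, 3), dtype=np.float32)
b = np.ones((3, 4), dtype=np.float32)

# Preallocate the destination once and reuse it across calls,
# avoiding a fresh allocation per matmul.
out = np.empty((2, 4), dtype=np.float32)
result = np.matmul(a, b, out=out)

assert result is out  # the product is written in place
```

Reusing one output buffer this way matters in tight inference loops, where per-call allocations add measurable overhead.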
* [SW-233758] Support dynamic quantization for Matmul

* [SW-233758] Unit tests for Matmul dynamic quantization

Signed-off-by: xinhe3 <xinhe3@habana.ai>
#327)

* [GAUDISW-5809] - Distinguish runtime scale patching from dynamic quantization

Signed-off-by: xinhe3 <xinhe3@habana.ai>
…330)

* [GAUDISW-228042] Add support for dynamic vLLM kv-cache quantization

* [GAUDISW-228042] Add support for dynamic KVCache with V scales on hidden dim

* use amax to calc scales on all batch dims

* fix static quantization issues

Signed-off-by: xinhe3 <xinhe3@habana.ai>
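The "use amax to calc scales on all batch dims" change can be sketched as follows. This is a minimal illustration under assumed shapes and an assumed fp8 max constant, not the repo's implementation:

```python
import numpy as np

# Assumed fp8 representable max; the exact constant depends on the
# fp8 format in use and is illustrative here.
FP8_MAX = 240.0

def per_hidden_dim_scale(v):
    # v: (batch, seq, hidden). Reduce max(|v|) over all batch dims
    # (batch and seq), leaving one dynamic scale per hidden-dim channel.
    amax = np.abs(v).max(axis=(0, 1))
    return amax / FP8_MAX

v = np.array([[[120.0, -240.0],
               [ 60.0,   30.0]]])  # shape (1, 2, 2)
scale = per_hidden_dim_scale(v)
# channel 1's scale comes from |-240| / 240 -> 1.0
```

Reducing over every batch dimension yields scales whose shape matches the hidden dimension, which is what a per-channel V-cache quantizer consumes.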
* disable autoround tests [GAUDISW-245272]

* enable autoround test, and check if the fix works [GAUDISW-245272]

Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [GAUDISW-245117] add b2b op

Signed-off-by: xinhe3 <xinhe3@habana.ai>
[GAUDISW-244752] add dynamic scale for V-Cache on Hidden dim
* Skip test with incorrect scale shapes

* Update test/3x/torch/algorithms/fp8_quant/unit_tests/test_save_load.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update test_save_load.py

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [GAUDISW-245950] disable test fp8_aware_gptq

* Update test_gptq_mixed_precision.py
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* Added dynamic quant with weight PCS POW2

* Added tests

* Rename scale method to MAXABS_PCS_POW2

Signed-off-by: xinhe3 <xinhe3@habana.ai>
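A hedged sketch of what a MAXABS per-channel-scale (PCS) POW2 method like the one named above might compute: take the per-channel amax, then snap the resulting scale to a power of two. The fp8 constant and rounding direction are assumptions for illustration only:

```python
import math
import numpy as np

FP8_MAX = 240.0  # assumed fp8 max; illustrative

def maxabs_pcs_pow2(weight):
    # weight: (out_channels, in_features); one scale per output channel.
    amax = np.abs(weight).max(axis=1)
    raw = amax / FP8_MAX
    # Round each scale up to the nearest power of two, so the scale
    # can be applied with a cheap exponent adjustment in hardware.
    return np.array([2.0 ** math.ceil(math.log2(s)) for s in raw])

w = np.array([[30.0, -480.0],
              [10.0,    5.0]])
scales = maxabs_pcs_pow2(w)
# 480/240 = 2.0 stays 2.0; 10/240 ≈ 0.0417 rounds up to 0.0625 (2**-4)
```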
…arameter assignments (#362)

* Initial plan

* [GAUDISW-246550] Remove spaces before equals in scale_method_config parameter assignments

Co-authored-by: HolyFalafel <19345135+HolyFalafel@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
)

* [GAUDISW-246083] - adjust load mode to pcs for new pytorch version

* [GAUDISW-246083] - Fix CR comments

* [GAUDISW-246083] - fix CI tests failures
* Fix all 96 Coverity CIDs across 40 files

CWE-476: Null pointer dereference fixes (null checks, variable initialization)
CWE-328: Weak hash algorithm (sha1 -> sha256)
CWE-561: Dead code removal (unreachable code, dead assignments)
CWE-398: Code quality (bare except, unused imports, resource handling)
CWE-532: Information exposure through log files (sanitize logging)
CWE-688: Function call with incorrect variable (fix parameter shadowing)
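The CWE-328 item in the list above (sha1 → sha256) is a one-line substitution in practice. A minimal sketch with an illustrative function name, not the repo's actual code:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    # Before: hashlib.sha1(data).hexdigest()  -- flagged as a weak hash
    # After: sha256 gives a 64-char hex digest instead of sha1's 40.
    return hashlib.sha256(data).hexdigest()

digest = fingerprint(b"model-config")
assert len(digest) == 64
```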

* Coverity-related fixes in 27 existing files (no new files).

* Replace coverity asserts with logger.error and revert behavior changes

- Replace 4 added asserts with logger.error in coco.py, test_pt2e_quant.py, test_pruning.py
- Revert use_cuda, recipe_cfgs default, strategy deepcopy, textual_inversion flow,
  teacher_model guard, pt2e utility var, static_quant timing, self_distillation log,
  inc_dataset_loader error handling, pruneOFA/glueOFA renames, distillation prints
- Keep all legitimate Coverity fixes (null checks, hash upgrades, resource leaks, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore min_max variable removed incorrectly by coverity fix

The variable is used on lines 147 and 154 - removing it breaks quantization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert fuse_qdq_conv to use new_match_node_name with init protection

Initialize new_match_node_name = match_node_name before the if block
so the original new_match_node_name[-1] usage is preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore config = AutoConfig.from_pretrained() in gpt-j/main.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace print with logger.error for stats None check

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix missing else branch in textual_inversion.py verify_loading

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add None guard for recipe_cfgs before .get() call

Keeps default as None (original behavior) but prevents AttributeError.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
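The None-guard fix above can be sketched like this; `recipe_cfgs` and `lookup` are illustrative names standing in for the real config object and call site:

```python
def lookup(recipe_cfgs, key):
    # Before: recipe_cfgs.get(key) raised AttributeError when
    # recipe_cfgs was None. The guard keeps None as the default
    # (original behavior) while preventing the crash.
    return recipe_cfgs.get(key) if recipe_cfgs is not None else None

assert lookup(None, "smooth_quant") is None
assert lookup({"smooth_quant": True}, "smooth_quant") is True
```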

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
@chensuyue chensuyue changed the base branch from master to v3.7.2rc March 31, 2026 08:49
@chensuyue chensuyue changed the base branch from v3.7.2rc to master March 31, 2026 08:53
@xin3he
Contributor Author

xin3he commented Mar 31, 2026

I think the failure comes from the fork repo. Let's wait for the next release of Habana.
