Support transposed input for data-aware Weights Compression#3296
Support transposed input for data-aware Weights Compression #3296 — rk119 wants to merge 40 commits into openvinotoolkit:develop from
Conversation
|
@rk119, do you have any update? |
Hi @alexsu52, apologies for my late response — I was traveling. I noticed that weights-compression support for the ONNX backend is actively being implemented. @ljaljushkin, please clarify whether I am required to add this in the current PR. At the moment, my changes won't make a noticeable difference for ONNX, since data-aware weights compression and algorithms like |
Thanks for the contribution! @andrey-churkin, please cover review for onnx backend. |
…on.py Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>
Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>
Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>
| def test_compression_with_transpose(transpose_a, transpose_b, raises_error, kwargs): | ||
| dataset_size = 4 | ||
| model = LMLinearModel(transpose_a=True, transpose_b=False).ov_model | ||
| model = LMLinearModel(transpose_a=transpose_a, transpose_b=transpose_b).ov_model |
There was a problem hiding this comment.
@rk119 could you please add AWQModel to this test case.
There was a problem hiding this comment.
Sure, just to confirm, should a model like the following be added in tests/openvino/native/quantization/test_weights_compression.py:
class AWQModel(OVReferenceModel):
    """Reference OV model with a single MatMul node whose ``transpose_a`` /
    ``transpose_b`` attributes are configurable, for weights-compression tests.
    """

    OUTPUT_DIM = 32
    HIDDEN_DIM = 16
    INPUT_SHAPE = [1, 24, HIDDEN_DIM]  # [B, SeqLen, HiddenDim]

    def _create_ov_model(
        self,
        transpose_b: bool = True,
        transpose_a: bool = False,
        input_shape: Optional[list[int]] = None,
        is_int8=False,
    ):
        """Build ``Input -> MatMul(weights) -> Result``.

        When ``transpose_a`` is set, the hidden dimension of the activation
        is the second-to-last axis instead of the last one.
        """
        self._input_shape = input_shape if input_shape is not None else self.INPUT_SHAPE
        self._hidden_dim = self._input_shape[-2 if transpose_a else -1]

        activation = opset.parameter(self._input_shape, name="Input")

        raw_weights = self._rng.random(self.get_weight_shape(transpose_b)).astype(np.float32)
        weights = AWQMatmulModel.get_weights(raw_weights, is_int8=is_int8, name="weights_1")

        matmul = opset.matmul(
            activation, weights, transpose_a=transpose_a, transpose_b=transpose_b, name="MatMul"
        )
        result = opset.result(matmul, name="Result")
        result.get_output_tensor(0).set_names({"Result"})
        return ov.Model([result], [activation])

    @property
    def hidden_dim(self):
        # Hidden dimension resolved from the input shape in _create_ov_model.
        return self._hidden_dim

    def get_weight_shape(self, transpose_b: bool = True):
        """Weight shape matching the MatMul's ``transpose_b`` attribute."""
        if transpose_b:
            return [self.OUTPUT_DIM, self.hidden_dim]
        return [self.hidden_dim, self.OUTPUT_DIM]
and the test to be updated like this:
@pytest.mark.parametrize(
    "model",
    [LMLinearModel, AWQModel],
    ids=["lm_linear", "awq_model"],
)
@pytest.mark.parametrize(
    ("transpose_a", "transpose_b", "raises_error"),
    [
        (False, True, False),
        (True, True, False),
        (False, False, True),
        (True, False, True),
    ],
    ids=["tb_nota", "ta_tb", "nota_notb", "ta_notb"],
)
@pytest.mark.parametrize(
    "kwargs",
    [
        dict(scale_estimation=True),
        dict(lora_correction=True),
        dict(
            gptq=True,
            awq=True,
            scale_estimation=True,
            advanced_parameters=CompressionParams(gptq_params=GPTQParams(subset_size=2)),
        ),
    ],
    ids=["se", "lora", "gptq_se_awq"],
)
def test_compression_with_transpose(model, transpose_a, transpose_b, raises_error, kwargs):
    """Data-aware compression of a MatMul with every transpose_a/transpose_b combo.

    Unsupported configurations must raise UnsupportedModelError; lora_correction
    is exempt from the failure expectation.
    """
    dataset_size = 4
    ov_model = model(transpose_a=transpose_a, transpose_b=transpose_b).ov_model
    calibration_samples = [np.ones(inp.shape) for inp in ov_model.inputs] * dataset_size

    expect_failure = raises_error and not kwargs.get("lora_correction", False)
    expectation = pytest.raises(nncf.UnsupportedModelError) if expect_failure else nullcontext()

    with expectation:
        compress_weights(
            ov_model,
            mode=CompressWeightsMode.INT4_SYM,
            ratio=1.0,
            group_size=8,
            subset_size=2,
            dataset=Dataset(calibration_samples),
            all_layers=True,
            **kwargs,
        )
|
@daniil-lyakhov should it be closed since you're working on many related PRs? |
Changes
Added support for transposed activations in the
matmul operation in Weights Compression. While collecting statistics, the reduction axes are set according to whether the input is transposed. If the input is transposed, the second-to-last dimension is the hidden dimension; otherwise the last dimension is.
Pass the input transpose value to the
matmul operation in insert_adapters for the OV backend to match the inner dimensions. Implemented a common backend function
get_activation_channel_axis to support transposed input (transpose_a=True). Closes Issue
#3230