
fix gptq quantization condition #2416

Open
jiqing-feng wants to merge 2 commits into huggingface:main from jiqing-feng:main

Conversation

@jiqing-feng
Contributor

Same as huggingface/transformers#44588. Quantization only works for the original nn.Linear module; a subclass has a custom forward, so the quantized layer cannot handle it.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
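The condition can be sketched in plain Python. This is a hypothetical illustration with stand-in classes (the real check lives in the GPTQ quantizer and operates on torch.nn.Linear); `is_plain_linear` is an invented helper name, not the actual function in the PR.

```python
class Linear:
    """Stand-in for torch.nn.Linear."""
    def forward(self, x):
        return [2 * v for v in x]  # placeholder for the real matmul

class CustomLinear(Linear):
    """Subclass with a custom forward that a quantized replacement would drop."""
    def forward(self, x):
        out = super().forward(x)
        return [v + 1 for v in out]  # extra step the quantized layer cannot reproduce

def is_plain_linear(module) -> bool:
    # Strict type check: isinstance() would also accept subclasses whose
    # custom forward() the quantized layer cannot handle.
    return type(module) is Linear

print(is_plain_linear(Linear()))        # True  -> quantize
print(is_plain_linear(CustomLinear()))  # False -> skip
```

The key point is `type(module) is Linear` rather than `isinstance(module, Linear)`: the former rejects any subclass, even one that only trivially wraps forward.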
@jiqing-feng
Contributor Author

jiqing-feng commented Mar 25, 2026

Hi @SunMarc, please also review this PR. Thanks!

cc @Qubitium

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@Qubitium
Contributor

@jiqing-feng @SunMarc Looks good to me for the short-term.

Long term, since we lack information, we really don't know whether an object that inherits from nn.Linear but is not exactly nn.Linear is truly unquantizable. For example, a module that subclasses nn.Linear and only wraps forward, where the wrapping code just moves tensors from disk to GPU before the forward pass and back to disk afterwards, would be blacklisted by this logic, yet it is actually eligible for quantization.

As I said in my comment on huggingface/transformers#44588 (comment), in the future we need much more information to make a better decision.

Given the current lack of information, any decision we make is going to be incomplete and will either target too widely or too narrowly.
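The false-negative case described above can be sketched as follows. These are hypothetical stand-in classes (the real modules would subclass torch.nn.Linear); the disk-offload wrapper produces identical outputs to the plain layer, yet the strict type check still blacklists it.

```python
class Linear:
    """Stand-in for torch.nn.Linear."""
    def forward(self, x):
        return [2 * v for v in x]  # placeholder for the real matmul

class DiskOffloadLinear(Linear):
    """Only wraps forward to shuttle weights between disk and GPU.
    The math itself is unchanged, so it is safe to quantize."""
    def forward(self, x):
        # (would load weights from disk to GPU here)
        out = super().forward(x)
        # (would move weights back to disk here)
        return out

def is_plain_linear(module) -> bool:
    return type(module) is Linear

# Identical outputs, yet the strict check rejects the wrapper:
print(Linear().forward([1, 2]))             # [2, 4]
print(DiskOffloadLinear().forward([1, 2]))  # [2, 4]
print(is_plain_linear(DiskOffloadLinear())) # False -> blacklisted anyway
```

Distinguishing such a harmless wrapper from a subclass that actually changes the computation would require inspecting what the override does, which is exactly the extra information the check does not have today.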
