fix gptq quantization condition #2416
Conversation
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng @SunMarc Looks good to me for the short term. Long term, since we lack information, we really don't know whether an object that inherits from nn.Linear but is not exactly nn.Linear is truly non-quantizable. For example, a module that subclasses nn.Linear and only wraps forward, where the wrapper just moves tensors from disk to GPU before the forward pass and then moves them back to disk afterwards, would be black-listed by this logic even though it is actually eligible for quantization. As in my comment in huggingface/transformers#44588 (comment), in the future we need much more information to decide better. Given the current lack of information, any decision we make will be incomplete and will either target too widely or too narrowly.
Same as huggingface/transformers#44588. Quantization only works for the original nn.Linear module; a subclass has a custom forward, so the quantized layer cannot handle it.
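The distinction the fix relies on can be illustrated with a minimal sketch. The class names below are stand-ins, not the actual transformers or GPTQ code: `Linear` plays the role of `nn.Linear`, and an exact-type check excludes subclasses with a custom forward, whereas `isinstance` would not.

```python
# Illustrative sketch only: Linear stands in for torch.nn.Linear,
# CustomLinear for a subclass that overrides forward.
class Linear:
    def forward(self, x):
        return x

class CustomLinear(Linear):
    # Custom forward logic that a quantized replacement layer
    # would not reproduce.
    def forward(self, x):
        return super().forward(x) * 2

def is_quantizable(module):
    # Exact-type check: only plain Linear qualifies. Subclasses are
    # black-listed because their forward may differ.
    return type(module) is Linear

print(is_quantizable(Linear()))        # True
print(is_quantizable(CustomLinear()))  # False (isinstance would say True)
```

As the reviewer notes, this check can be too conservative: a subclass whose forward override is behaviorally harmless (e.g. only shuffles tensors between devices) is still excluded.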