Description
I'm working with a RecurrentGemma model (alpindale/recurrentgemma-9b-it).
My desired result is quantizing the model to int8, but keeping the recurrent blocks (model.layers.X.temporal_block) unquantized.
Unfortunately setting llm_int8_skip_modules=["temporal_block"] doesn't work.
On loading the model, I get AttributeError: 'Parameter' object has no attribute 'SCB'.
You can reproduce it with this Colab: https://colab.research.google.com/drive/1NkgcXmuYJg0XqFWMuZA-ZwWDjgdsuCjr
I've also tried llm_int8_skip_modules=["recurrent"], which yields the same error.
The library itself isn't broken, though: llm_int8_skip_modules=["lm_head"] works fine.
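For reference, here is a minimal sketch of how I'm setting the option (standard transformers/bitsandbytes usage; the actual model load happens in the Colab above):

```python
from transformers import BitsAndBytesConfig

# Int8 quantization, but skip the recurrent blocks
# (model.layers.X.temporal_block) so they stay in full precision.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["temporal_block"],  # also tried ["recurrent"]
)

# Passing this config to
#   AutoModelForCausalLM.from_pretrained(
#       "alpindale/recurrentgemma-9b-it",
#       quantization_config=quant_config,
#   )
# raises: AttributeError: 'Parameter' object has no attribute 'SCB'
```

With llm_int8_skip_modules=["lm_head"] instead, the same load succeeds.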
My initial thought is that I'm setting the option incorrectly, but I'm not even sure how to find the right setting.
Anyone have any thoughts on how to fix this?