Hey guys!
Your work on Moe-Quant is awesome; I am using it as my primary reference for building 4Bit-Forge.
I was wondering if you could help me clear up a doubt.
Is there a reason you apply this step only for tied_gptq_handle "owners" (i.e., layers whose tied_gptq_handle is None)?
Since you broadcast the Hessians to the tied_gptq_handles, doesn't this mean the up_proj layer's activations get pruned based on its parent gate_proj "owner"? (Unless H is not broadcast and only self.H is, which would still make chol(Hinv) different for non-"owners".)
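For context on why I'd expect the tie to matter: in a SwiGLU MLP, gate_proj and up_proj consume the exact same input activations, so a Hessian accumulated from inputs alone is identical for the pair, which I assume is the motivation for tying the handles in the first place. A toy sketch of my own (not Moe-Quant code; all names and shapes are made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_ff, n_tokens = 16, 32, 64

# Toy stand-ins for the two tied projections of a SwiGLU MLP
gate_proj = nn.Linear(d_model, d_ff, bias=False)
up_proj = nn.Linear(d_model, d_ff, bias=False)

# Both layers consume the same input X, so a GPTQ-style Hessian
# built purely from input activations is identical for the pair.
X = torch.randn(n_tokens, d_model)
H_gate = X.T @ X
H_up = X.T @ X
assert torch.equal(H_gate, H_up)  # hence one "owner" can serve both
```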
Here are the lines I'm referring to:
```python
zero_cols = torch.nonzero(w.eq(0).all(dim=0))  # <- gate_proj only
H = self.H
# Regularize Hessian before quantization
if not self.tied_gptq_handle:
    # Mask rows with zero input channels
    H[zero_cols, :] = 0
    H[:, zero_cols] = 0
    H[zero_cols, zero_cols] = 1
```

Here is how I currently do it in 4Bit-Forge; please let me know if there is a reason that I am missing! Looking forward to learning from you!
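For what it's worth, here is a toy sketch of my own (not from either repo; sizes and indices invented, and I use a 1D index rather than the [k, 1] shape torch.nonzero returns) of why I understand the H[zero_cols, zero_cols] = 1 fill to matter: zeroing a row/column of H without it makes H singular, so the chol(Hinv) that GPTQ relies on cannot be computed.

```python
import torch

torch.manual_seed(0)
d = 8
X = torch.randn(128, d)
H = X.T @ X + 1e-2 * torch.eye(d)  # positive-definite Hessian proxy

zero_cols = torch.tensor([2, 5])  # pretend these input channels are all-zero

# Without the diagonal fill, the masked Hessian is singular,
# so inverting it (and hence chol(H^-1)) fails.
H_bad = H.clone()
H_bad[zero_cols, :] = 0
H_bad[:, zero_cols] = 0
# torch.linalg.cholesky(torch.linalg.inv(H_bad))  # raises: H_bad is singular

# With the fill, the masked channels decouple into unit diagonal entries,
# H stays positive-definite, and chol(H^-1) is well defined.
H_ok = H_bad.clone()
H_ok[zero_cols, zero_cols] = 1
L = torch.linalg.cholesky(torch.linalg.inv(H_ok))
print(L.shape)  # torch.Size([8, 8])
```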