
(Doubt) H pruning based off tied_gptq_handle owner's weight #9

@Pranshu-Bahadur

Description


Hey guys!

Your work on MoE-Quant is awesome; I am using it as my primary reference for building 4Bit-Forge.

Just wondering if you could help me clear up a doubt.
Is there a reason you only apply this step at tied_gptq_handle "owners" (i.e. where self.tied_gptq_handle is None)?
Since you broadcast the Hessians to the tied_gptq_handles, doesn't this mean the up_proj layer's activations get pruned based off their parent gate_proj / "owner"? (Unless H is not broadcast and only self.H is, which would still make chol(Hinv) differ for non-"owners".)

Here are the lines I'm referring to:

```python
zero_cols = torch.nonzero(w.eq(0).all(dim=0))  # <- gate_proj only
H = self.H
# Regularize Hessian before quantization
if not self.tied_gptq_handle:
    # Mask rows with zero input channels
    H[zero_cols, :] = 0
    H[:, zero_cols] = 0
    H[zero_cols, zero_cols] = 1
```
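My understanding of why this matters (an assumption about the wiring, not something I can see from these lines alone): if a tied handle holds a reference to the owner's H, the in-place masking above is visible to the tied layer too, so both end up with the same chol(Hinv). A minimal sketch of that sharing behavior:

```python
import torch

# Illustrative only: this mirrors the sharing question, not MoE-Quant's
# actual wiring. "owner" plays the gate_proj role, "tied" the up_proj role.
d = 4
X = torch.rand(d, 16)
owner_H = X @ X.T                    # Hessian proxy: sum of x x^T over inputs
tied_H = owner_H                     # "broadcast" by reference, no copy

zero_cols = torch.tensor([2])        # a dead input channel in the owner's weight
owner_H[zero_cols, :] = 0            # mask row (in place)
owner_H[:, zero_cols] = 0            # mask column (in place)
owner_H[zero_cols, zero_cols] = 1    # keep H invertible on the diagonal

# The tied layer sees the exact same masked H, so chol(Hinv) matches the owner's
print(torch.equal(tied_H, owner_H))  # True
```

If instead each handle copied H before this step, non-owners would keep the unmasked H, and chol(Hinv) would indeed differ, which is the second case in my question above.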

Please let me know if there is a reason that I am missing; looking forward to learning from you! Here is roughly how I currently do it in 4Bit-Forge:
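(The snippet I pasted didn't come through here, so below is a reconstructed sketch of the idea with a hypothetical helper name, not the verbatim 4Bit-Forge code: every layer, owner or tied, masks H using its own weight's dead input channels, working on a clone so the broadcast Hessian is never mutated in place.)

```python
import torch

# Reconstructed sketch, not verbatim 4Bit-Forge code; regularize_hessian is a
# hypothetical helper name. Each layer derives zero_cols from its OWN weight.
def regularize_hessian(H: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    zero_cols = torch.nonzero(w.eq(0).all(dim=0)).flatten()
    H = H.clone()                  # keep the shared/broadcast H intact
    H[zero_cols, :] = 0            # mask rows of dead input channels
    H[:, zero_cols] = 0            # mask the matching columns
    H[zero_cols, zero_cols] = 1    # unit diagonal keeps H invertible
    return H
```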
