
(Doubt) H pruning based off tied_gptq_handle owner's weight #9

@Pranshu-Bahadur

Description


Hey guys!

Your work on MoE-Quant is awesome; I am using it as my primary reference for building 4Bit-Forge.

Just wondering if you could help me clear up a doubt.
Is there a reason you only apply this step at tied_gptq_handle "owners" (i.e. where self.tied_gptq_handle is None)?
Since you broadcast the Hessians to the tied_gptq_handles, doesn't this mean the up_proj layer's activations get pruned based off their parent gate_proj / "owner"? (Unless H is not broadcast and only self.H is, which would still make chol(Hinv) differ for non-"owners".)

Here are the lines I'm referring to:

```python
zero_cols = torch.nonzero(w.eq(0).all(dim=0))  # <- gate_proj only
H = self.H
# Regularize Hessian before quantization
if not self.tied_gptq_handle:
    # Mask rows with zero input channels
    H[zero_cols, :] = 0
    H[:, zero_cols] = 0
    H[zero_cols, zero_cols] = 1
```
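My understanding of why this matters (an assumption about the wiring, not something I can see from these lines alone): if a tied handle holds a reference to the owner's H, the in-place masking above is visible to the tied layer too, so both end up with the same chol(Hinv). A minimal sketch of that sharing behavior:

```python
import torch

# Illustrative only: this mirrors the sharing question, not MoE-Quant's
# actual wiring. "owner" plays the gate_proj role, "tied" the up_proj role.
d = 4
X = torch.rand(d, 16)
owner_H = X @ X.T                    # Hessian proxy: sum of x x^T over inputs
tied_H = owner_H                     # "broadcast" by reference, no copy

zero_cols = torch.tensor([2])        # a dead input channel in the owner's weight
owner_H[zero_cols, :] = 0            # mask row (in place)
owner_H[:, zero_cols] = 0            # mask column (in place)
owner_H[zero_cols, zero_cols] = 1    # keep H invertible on the diagonal

# The tied layer sees the exact same masked H, so chol(Hinv) matches the owner's
print(torch.equal(tied_H, owner_H))  # True
```

If instead each handle copied H before this step, non-owners would keep the unmasked H, and chol(Hinv) would indeed differ, which is the second case in my question above.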

Please let me know if there is a reason that I am missing; looking forward to learning from you! Here is roughly how I currently do it in 4Bit-Forge:
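(The snippet I pasted didn't come through here, so below is a reconstructed sketch of the idea with a hypothetical helper name, not the verbatim 4Bit-Forge code: every layer, owner or tied, masks H using its own weight's dead input channels, working on a clone so the broadcast Hessian is never mutated in place.)

```python
import torch

# Reconstructed sketch, not verbatim 4Bit-Forge code; regularize_hessian is a
# hypothetical helper name. Each layer derives zero_cols from its OWN weight.
def regularize_hessian(H: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    zero_cols = torch.nonzero(w.eq(0).all(dim=0)).flatten()
    H = H.clone()                  # keep the shared/broadcast H intact
    H[zero_cols, :] = 0            # mask rows of dead input channels
    H[:, zero_cols] = 0            # mask the matching columns
    H[zero_cols, zero_cols] = 1    # unit diagonal keeps H invertible
    return H
```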
