Description:
I'm using the Wanda pruning method on Llama 3.2-1b, but the process gets stuck at "loading calibration data" and doesn't proceed further. However, pruning with Llama 2 works fine and runs without issues. Below are the command and relevant information.
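For context on what the calibration step feeds into: Wanda scores each weight by the product of its magnitude and the L2 norm of the corresponding input activation channel, then zeroes the lowest-scoring weights per output row. The sketch below is only an illustration of that metric on dummy tensors (the function name and shapes are mine, not the repo's code), so it runs without the model or the calibration data:

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.5):
    """Illustrative Wanda-style pruning: score_ij = |W_ij| * ||X_j||_2,
    with the lowest-scoring fraction zeroed within each output row."""
    norms = np.linalg.norm(X, axis=0)            # per-input-channel L2 norm over calibration samples
    scores = np.abs(W) * norms                   # broadcast norms across output rows
    k = int(W.shape[1] * sparsity)               # weights to drop per row
    idx = np.argsort(scores, axis=1)[:, :k]      # indices of the k smallest scores in each row
    mask = np.ones_like(W, dtype=bool)
    np.put_along_axis(mask, idx, False, axis=1)  # zero out the low-score positions
    return W * mask

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))    # toy weight matrix (out_features x in_features)
X = rng.standard_normal((16, 8))   # toy calibration activations (n_samples x in_features)
Wp = wanda_prune(W, X, 0.5)        # each row keeps its 4 highest-scoring weights
```

In the actual repo the activations come from the calibration set loaded at the "loading calibration data" step, which is where my run hangs.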
Command:
python main_new.py --model /Data/llama3.2-1b --prune_method wanda --sparsity_ratio 0.5 --sparsity_type unstructured --save out/llama3.2-1b/unstructured/wanda/ --save_model out/llama3.2-1b/unstructured/wanda/pruned_model
Environment:
- torch 1.10.1
- transformers 4.28.0
- accelerate 0.18.0
- Number of GPUs: 4
Debug Output:
Arguments:
sparsity_type: unstructured
sparsity_ratio: 0.5
prune_method: wanda
------------------------------
loading llm model /Data/llama3.2-1b
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /Data/llama3.2-1b and are newly initialized:
['model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', ...]
You should probably TRAIN this model on a downstream task to be able to use it for predictions and inference.
use device cuda:0
pruning starts
loading calibration data
Problem:
- The pruning process stops at "loading calibration data" and never proceeds; there is no error message.
- The same command works fine on Llama 2; the hang occurs only with Llama 3.2-1b.
Steps Taken:
I ran the identical command with both Llama 3.2-1b and Llama 2. Only Llama 3.2-1b hangs, so the issue appears specific to that model.
Expected Behavior:
The pruning process should proceed without getting stuck at the "loading calibration data" step.
Any advice or insights would be greatly appreciated.