Describe the bug
Running QLoRA finetuning of granite-34b-code-base-gptq model fails with error:
TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'cu_seq_lens_q'
Finetuning configuration:
{
"model_name_or_path": "/mnt/model/model/granite-34b-code-base-gptq-20241001T150701",
"training_data_path": "/mnt/scratch/dataset/alpaca_data.json",
"output_dir": "/mnt/output/model",
"save_model_dir": "/mnt/output/model",
"num_train_epochs": 1.0,
"per_device_train_batch_size": 1,
"per_device_eval_batch_size": 4,
"gradient_accumulation_steps": 4,
"save_strategy": "no",
"learning_rate": 1e-5,
"weight_decay": 0.0,
"lr_scheduler_type": "cosine",
"include_tokens_per_second": true,
"response_template": "\n### Response:",
"dataset_text_field": "output",
"use_flash_attn": true,
"peft_method": "lora",
"target_modules": ["all-linear"],
"auto_gptq": ["triton_v2"],
"torch_dtype": "float16",
"fp16": true,
"fast_kernels": [true, true, true],
"fused_lora": ["auto_gptq", true],
"padding_free": ["huggingface"]
}
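For reference, a rough sketch (illustrative only; `config.json` is assumed to be a local copy of the JSON above) that separates the standard transformers `TrainingArguments` keys in this configuration from the fms-hf-tuning / fms-acceleration specific ones (`peft_method`, `auto_gptq`, `fast_kernels`, `fused_lora`, `padding_free`, ...), the latter group being the acceleration features involved in this failure:

```python
import json
from dataclasses import fields
from transformers import TrainingArguments

# "config.json" is a hypothetical local copy of the finetuning configuration above.
with open("config.json") as f:
    cfg = json.load(f)

# Keys that TrainingArguments itself understands vs. tuning/acceleration extensions.
hf_keys = {f.name for f in fields(TrainingArguments)}
standard = sorted(k for k in cfg if k in hf_keys)
extensions = sorted(k for k in cfg if k not in hf_keys)

print("TrainingArguments keys:", standard)
print("Tuning/acceleration keys:", extensions)
```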
Full Pod log:
INFO - Ignoring unknown parameter in the quantization configuration: is_marlin_format.
INFO - `checkpoint_format` is missing from the quantization configuration and is automatically inferred to gptq
INFO - Ignoring unknown parameter in the quantization configuration: is_marlin_format.
INFO - `checkpoint_format` is missing from the quantization configuration and is automatically inferred to gptq
WARNING:modeling.py:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
WARNING:sft_trainer.py:PAD token set to default, to make it different from eos token
WARNING:modeling.py:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
WARNING:modeling.py:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
INFO - Compatibility: converting `checkpoint_format` from `gptq` to `gptq_v2`.
WARNING:sft_trainer.py:PAD token set to default, to make it different from eos token
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 520 examples [00:00, 110142.31 examples/s]
Map (num_proc=80): 0%| | 0/520 [00:00<?, ? examples/s]
Map (num_proc=80): 1%|▏ | 7/520 [00:00<00:13, 39.24 examples/s]
Map (num_proc=80): 8%|▊ | 42/520 [00:00<00:02, 170.18 examples/s]
Map (num_proc=80): 15%|█▍ | 77/520 [00:00<00:01, 229.55 examples/s]
Map (num_proc=80): 22%|██▏ | 112/520 [00:00<00:01, 269.21 examples/s]
Map (num_proc=80): 28%|██▊ | 147/520 [00:00<00:01, 243.08 examples/s]
Map (num_proc=80): 34%|███▎ | 175/520 [00:00<00:01, 240.79 examples/s]
Map (num_proc=80): 42%|████▏ | 217/520 [00:00<00:01, 275.45 examples/s]
Map (num_proc=80): 48%|████▊ | 252/520 [00:01<00:00, 279.71 examples/s]
Map (num_proc=80): 55%|█████▌ | 286/520 [00:01<00:00, 284.01 examples/s]
Map (num_proc=80): 61%|██████ | 316/520 [00:01<00:00, 268.06 examples/s]
Map (num_proc=80): 68%|██████▊ | 352/520 [00:01<00:00, 278.88 examples/s]
Map (num_proc=80): 73%|███████▎ | 382/520 [00:01<00:00, 276.99 examples/s]
Map (num_proc=80): 79%|███████▉ | 412/520 [00:01<00:00, 274.43 examples/s]
Map (num_proc=80): 85%|████████▌ | 442/520 [00:01<00:00, 273.51 examples/s]
Map (num_proc=80): 91%|█████████ | 472/520 [00:01<00:00, 258.29 examples/s]
Map (num_proc=80): 97%|█████████▋| 502/520 [00:01<00:00, 269.09 examples/s]
Map (num_proc=80): 100%|██████████| 520/520 [00:02<00:00, 246.16 examples/s]
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/granite.py:172: UserWarning: Granite Rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/granite.py:172: UserWarning: Granite Rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/llama.py:167: UserWarning: LLamaRules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/mistral.py:160: UserWarning: Mistral rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/llama.py:167: UserWarning: LLamaRules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/mistral.py:160: UserWarning: Mistral rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_aadp/framework_plugin_padding_free.py:132: UserWarning: transformers version supports padding free natively in various models.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_aadp/framework_plugin_padding_free.py:132: UserWarning: transformers version supports padding free natively in various models.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/transformers/training_args.py:2058: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/transformers/training_args.py:2058: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py:364: FutureWarning: `tokenizer` is deprecated and removed starting from version 0.16.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
trainer = SFTTrainer(
/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py:364: FutureWarning: `tokenizer` is deprecated and removed starting from version 0.16.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
trainer = SFTTrainer(
Map: 0%| | 0/520 [00:00<?, ? examples/s]
Map: 100%|██████████| 520/520 [00:00<00:00, 3944.25 examples/s]
Map: 100%|██████████| 520/520 [00:00<00:00, 3873.46 examples/s]
/home/tuning/.local/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.
warnings.warn(
0%| | 0/65 [00:00<?, ?it/s]ERROR:sft_trainer.py:Traceback (most recent call last):
File "/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py", line 676, in main
trainer, additional_train_info = train(
^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py", line 420, in train
trainer.train(resume_from_checkpoint)
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 3731, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/peft/peft_model.py", line 1644, in forward
return self.base_model(
^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
return self.model.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'cu_seq_lens_q'
Platform
fms-hf-tuning image: quay.io/modh/fms-hf-tuning:v2.6.0
Trained model: granite-34b-code-base-gptq-20241001T150701
Sample Code
Expected behavior
Training of the model passes successfully.
Observed behavior
Training failed, see description for logs.
Additional context
Add any other context about the problem here.
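My reading of the failure (an assumption on my part, not an official diagnosis): the padding-free plugin's data collation forwards FlashAttention kwargs such as `cu_seq_lens_q`/`cu_seq_lens_k` to the model, while `GPTBigCodeForCausalLM.forward()` in this transformers version neither declares those parameters nor accepts a `**kwargs` catch-all. A signature-only sketch that checks this without loading any weights:

```python
import inspect
from transformers import GPTBigCodeForCausalLM

# Extra kwargs typically produced by padding-free / FlashAttention data collation.
PADDING_FREE_KWARGS = ("cu_seq_lens_q", "cu_seq_lens_k", "max_length_q", "max_length_k")

sig = inspect.signature(GPTBigCodeForCausalLM.forward)
has_var_kwargs = any(
    p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
)
unsupported = [k for k in PADDING_FREE_KWARGS if k not in sig.parameters]

if unsupported and not has_var_kwargs:
    # Passing any of these to forward() raises the TypeError seen in the log above.
    print("GPTBigCodeForCausalLM.forward() does not accept:", unsupported)
```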