🐞 Describe the Bug
I'm trying to freeze specific layers of a pretrained model (for example only layer 0).
The problem is that loading a pretrained model like Apriel-Thinker loads its decoder config as a FixedBlockSequenceConfig, whereas I need to pass a block-pattern config to freeze only certain layers, e.g.:
```yaml
decoder:
  type: pattern
  pattern:
    - train_block
    - freeze_block
    - freeze_block
  blocks:
    train_block:
      mlp:
        lr_scale: 1.e-12
    freeze_block:
      lr_scale: 1.e-12
```

We currently cannot reconcile these two configs.
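For context, the intent of the pattern config can be sketched in plain Python. This is not Fast-LLM code; the cyclic-repetition semantics and the lr_scale values here are assumptions for illustration only:

```python
# Hypothetical sketch of how a pattern block sequence assigns per-block
# settings. Names and values are illustrative, not Fast-LLM internals.
pattern = ["train_block", "freeze_block", "freeze_block"]
blocks = {
    "train_block": {"lr_scale": 1.0},     # hypothetical: trainable
    "freeze_block": {"lr_scale": 1e-12},  # lr scaled to ~0, effectively frozen
}

def lr_scale_for(layer_index: int) -> float:
    # Assumption: the pattern repeats cyclically over the decoder's layers.
    name = pattern[layer_index % len(pattern)]
    return blocks[name]["lr_scale"]
```

With a near-zero `lr_scale`, the frozen blocks receive negligible updates while the remaining blocks train normally.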
So the solution would be to prevent loading the pretrained config with load_config: none, and re-pass the entire block config.
However, this does not currently work because some type parameters from the pretrained checkpoint creep into the decoder config:
```yaml
'!!! block':
  type: decoder
  mixer:
    type: attention
    rotary:
      type: none
  mlp:
    type: mlp
  normalization:
    type: layer_norm
```

🔄 Steps to Reproduce
Steps to reproduce the behavior:
- Fast-LLM version: https://github.com/ServiceNow/Fast-LLM/tree/b7c0de61662c61e83c617bd8157d0bf9426e3d52
- Train with the following config:

```yaml
pretrained:
  format: mistral
  path: /mnt/checkpoints/upstream/Apriel-Nemotron-15b-Thinker-reinit-attn-layer-0
  load_config: none
model:
  base_model:
    decoder:
      type: pattern
      pattern:
        - train_block
        - freeze_block
        - freeze_block
        - freeze_block
      blocks:
        train_block:
          mlp:
            lr_scale: 1.e-12
        freeze_block:
          lr_scale: 1.e-12
type: gpt
```

- Fails during config validation with
```
fast_llm.config.NestedValidationError: Validation failed for field `model` of type `fast_llm.models.gpt.config.GPTModelConfig` in class fast_llm.models.gpt.config.GPTTrainerConfig:
Validation failed for field `base_model` of type `fast_llm.models.gpt.config.GPTBaseModelConfig` in class fast_llm.models.gpt.config.GPTModelConfig:
Validation failed for field `decoder` of type `fast_llm.layers.block.config.BlockSequenceConfig` in class fast_llm.models.gpt.config.GPTBaseModelConfig:
Unknown field `block` in class fast_llm.layers.block.config.PatternBlockSequenceConfig
```
The decoder config would look like:
```yaml
decoder:
  type: pattern
  blocks:
    train_block:
      [...]
    freeze_block:
      [...]
  pattern:
    - train_block
    - freeze_block
    - freeze_block
    - freeze_block
  num_blocks: 50
  '!!! block':  # <-- undesired entry coming from the pretrained checkpoint
    type: decoder
    mixer:
      type: attention
      rotary:
        type: none
    mlp:
      type: mlp
    normalization:
      type: layer_norm
```
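The failure mode can be illustrated generically. This is a minimal sketch, not Fast-LLM's actual validation code; the field set and class name are assumptions based on the error message above:

```python
# Hypothetical sketch of strict config validation rejecting a leftover field.
# A 'block' entry merged in from the pretrained checkpoint is not a known
# field of the pattern config, so validation raises.
PATTERN_CONFIG_FIELDS = {"type", "blocks", "pattern", "num_blocks"}

def validate_decoder(config: dict) -> None:
    for key in config:
        if key not in PATTERN_CONFIG_FIELDS:
            raise ValueError(
                f"Unknown field `{key}` in class PatternBlockSequenceConfig"
            )

# The merged decoder config carries the checkpoint's 'block' entry:
merged = {
    "type": "pattern",
    "blocks": {},
    "pattern": [],
    "num_blocks": 50,
    "block": {"type": "decoder"},  # leaked from the pretrained checkpoint
}

try:
    validate_decoder(merged)
except ValueError as e:
    print(e)  # Unknown field `block` in class PatternBlockSequenceConfig
```

The point is that strict validation is correct to reject the unknown field; the bug is that the field is present at all despite `load_config: none`.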
🎯 Expected Behavior
With `load_config: none`, only the current (user-provided) config should be used; no entries from the pretrained checkpoint's config should leak into the decoder config.