[bug] Unable to freeze specific layers of a pretrained model #394

@RaymondLi0

🐞 Describe the Bug

I'm trying to freeze specific layers of a pretrained model (for example, only layer 0).
The problem is that loading a pretrained model like Apriel-Thinker loads its decoder config as a FixedBlockSequenceConfig, whereas freezing only certain layers requires a block-pattern config, e.g.:

decoder:
  type: pattern
  pattern:
    - train_block
    - freeze_block
    - freeze_block
  blocks:
    train_block:
      mlp:
        lr_scale: 1.e-12
    freeze_block:
      lr_scale: 1.e-12
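For intuition, here is a minimal plain-Python sketch (not Fast-LLM code) of how such a pattern config could expand into per-layer learning-rate scales, assuming the pattern repeats cyclically over the layers and considering only top-level `lr_scale` entries:

```python
def expand_pattern(pattern, blocks, num_blocks):
    """Map each layer index to the lr_scale of its pattern entry.

    Assumes the pattern repeats cyclically; a layer with an effectively
    zero lr_scale such as 1e-12 is "frozen".
    """
    scales = []
    for layer in range(num_blocks):
        block_name = pattern[layer % len(pattern)]
        scales.append(blocks[block_name].get("lr_scale", 1.0))
    return scales

pattern = ["train_block", "freeze_block", "freeze_block"]
blocks = {"train_block": {}, "freeze_block": {"lr_scale": 1e-12}}
print(expand_pattern(pattern, blocks, 6))
# [1.0, 1e-12, 1e-12, 1.0, 1e-12, 1e-12]
```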

We currently cannot reconcile these two configs.
The workaround would therefore be to skip loading the pretrained config with load_config: none and re-pass the entire block config.
However, this does not work either, because some type parameters creep into the decoder config:

'!!! block':
  type: decoder
  mixer:
    type: attention
    rotary:
      type: none
  mlp:
    type: mlp
  normalization:
    type: layer_norm
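As a hypothetical illustration (the helper name and logic below are assumptions, not Fast-LLM API), one way to work around this would be to strip such leaked keys from the serialized config dict before re-validation; the '!!! block' key name is taken from the dump above:

```python
def strip_leaked_keys(config, unwanted=("!!! block",)):
    """Recursively drop unwanted keys from a nested config dict."""
    if not isinstance(config, dict):
        return config
    return {
        key: strip_leaked_keys(value, unwanted)
        for key, value in config.items()
        if key not in unwanted
    }

decoder = {"type": "pattern", "!!! block": {"type": "decoder"}}
print(strip_leaked_keys(decoder))
# {'type': 'pattern'}
```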

🔄 Steps to Reproduce

Steps to reproduce the behavior:

  1. Fast-LLM version: https://github.com/ServiceNow/Fast-LLM/tree/b7c0de61662c61e83c617bd8157d0bf9426e3d52
  2. Train with the following config
pretrained:
  format: mistral
  path: /mnt/checkpoints/upstream/Apriel-Nemotron-15b-Thinker-reinit-attn-layer-0
  load_config: none
model:
  base_model:
    decoder:
        type: pattern
        pattern:
          - train_block
          - freeze_block
          - freeze_block
          - freeze_block
        blocks:
          train_block:
            mlp:
              lr_scale: 1.e-12
          freeze_block:
            lr_scale: 1.e-12
  type: gpt
  3. Fails during config validation with
fast_llm.config.NestedValidationError: Validation failed for field `model` of type `fast_llm.models.gpt.config.GPTModelConfig` in class fast_llm.models.gpt.config.GPTTrainerConfig:
  Validation failed for field `base_model` of type `fast_llm.models.gpt.config.GPTBaseModelConfig` in class fast_llm.models.gpt.config.GPTModelConfig:
    Validation failed for field `decoder` of type `fast_llm.layers.block.config.BlockSequenceConfig` in class fast_llm.models.gpt.config.GPTBaseModelConfig:
      Unknown field `block` in class fast_llm.layers.block.config.PatternBlockSequenceConfig

The decoder config would look like:

    decoder:
      type: pattern
      blocks:
        train_block:
          [...]
        freeze_block:
          [...]
      pattern:
      - train_block
      - freeze_block
      - freeze_block
      - freeze_block
      num_blocks: 50
      '!!! block':   # <-- undesired entry coming from the pretrained checkpoint
        type: decoder
        mixer:
          type: attention
          rotary:
            type: none
        mlp:
          type: mlp
        normalization:
          type: layer_norm

🎯 Expected Behavior

With load_config: none, only the current config should be loaded; nothing from the pretrained checkpoint's decoder config should leak in.
