Start a v3 ladder with zero-centered RMSNorm #636

Draft
YashasSamaga wants to merge 7 commits into allenai:main from YashasSamaga:ladder_v3
@YashasSamaga YashasSamaga commented Mar 11, 2026

A new v3 ladder.

Changes:

  • adds embedding norm to v3

Other changes which are not in the v3 ladder:

  • adds a zero-centered RMSNorm, whose scale is parameterized as (1 + weight) so that weight decay pulls it toward no scaling rather than toward zero output

Pending:

  • some restructuring is required for versioning correctly; parts of the code are hardcoded to v2

More v3 changes will be added in separate PRs.

Adds a zero-centered option to RMSNorm and
LayerNormConfig that reparameterizes the scale as
(1 + weight), with weight initialized to zero
instead of the standard initialization to one.
This gives weight decay a more natural effect on
the scale parameter: weight decays towards zero,
which corresponds to an effective scale of one
(no scaling).
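
The reparameterization described in this commit message can be illustrated with a minimal pure-Python sketch. This is not the code from src/olmo_core/nn/layer_norm.py; the function names `rms_norm` and `decay`, the `zero_centered` flag, and the learning-rate and decay values are hypothetical and chosen only for illustration. It assumes decoupled weight decay and ignores gradients, to show where each parameterization comes to rest.

```python
import math


def rms_norm(x, weight, eps=1e-6, zero_centered=False):
    """Normalize x by its root mean square, then apply a per-feature scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    # Standard: scale = weight (weight initialized to 1).
    # Zero-centered: scale = 1 + weight (weight initialized to 0).
    offset = 1.0 if zero_centered else 0.0
    return [(offset + w) * v / rms for w, v in zip(weight, x)]


def decay(weight, lr=0.1, weight_decay=0.1, steps=1000):
    """Apply decoupled weight decay only (no gradient), shrinking weight toward 0."""
    factor = (1.0 - lr * weight_decay) ** steps
    return [w * factor for w in weight]


x = [1.0, -2.0, 3.0, -4.0]

# At initialization both parameterizations compute the same output.
standard_init = [1.0] * len(x)
zero_centered_init = [0.0] * len(x)
out_standard = rms_norm(x, standard_init)
out_zero_centered = rms_norm(x, zero_centered_init, zero_centered=True)

# Under weight decay alone, the standard scale shrinks toward 0 and the
# output vanishes, while the zero-centered weight is already at its
# resting point, so the effective scale stays at 1.
out_standard_decayed = rms_norm(x, decay(standard_init))
out_zero_centered_decayed = rms_norm(
    x, decay(zero_centered_init), zero_centered=True
)
```

In this sketch the two variants agree at initialization, and only the standard parameterization drifts away from the normalized output when weight decay acts without an opposing gradient.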
@YashasSamaga YashasSamaga requested a review from dirkgr March 11, 2026 00:13
Comment thread src/scripts/train/ladder/gemma_like_ladder.py Outdated
Contributor

@dirkgr dirkgr left a comment

Looks great, just this one change!

Comment thread src/scripts/train/ladder/gemma_like_ladder.py Outdated
Add a WIP note indicating that more changes are to come before v3 is declared done.

Co-authored-by: Dirk Groeneveld <groeneveld@gmail.com>
Comment thread src/olmo_core/nn/layer_norm.py
@YashasSamaga YashasSamaga marked this pull request as ready for review March 11, 2026 22:48
@YashasSamaga YashasSamaga marked this pull request as draft March 26, 2026 01:12
2 participants