merge changes from upstream #550

Draft: lhphanto wants to merge 36 commits into intrinsic-dev:main from lhphanto:main

google-cla Bot commented May 14, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

lhphanto added 29 commits May 14, 2026 20:30
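  - Takes conditioning c of shape (B, d) → projects to 6×d parameters (shift, scale, gate for each of the SA and FFN sub-layers) via SiLU + Linear
  - Zero-initialized output linear → all shifts, scales, and gates start at 0, so every gated residual branch contributes nothing and each layer is an identity map at init (training stability)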
  - Reuses the same self.norm (no learnable affine) for both SA and FFN pre-norms
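
The modulation module described above could look roughly like the following PyTorch sketch. The class name `AdaLNModulation`, the attribute name `proj`, and the returned `(B, 1, d)` shape are assumptions made for illustration; only the SiLU + Linear projection, the 6×d output, the zero init, and the affine-free shared `self.norm` come from the notes.

```python
import torch
import torch.nn as nn


class AdaLNModulation(nn.Module):
    """Hypothetical name: maps conditioning c (B, d) to six modulation tensors."""

    def __init__(self, d: int):
        super().__init__()
        # Shared LayerNorm without learnable affine, reused for both pre-norms.
        self.norm = nn.LayerNorm(d, elementwise_affine=False)
        self.proj = nn.Sequential(nn.SiLU(), nn.Linear(d, 6 * d))
        # Zero-init the output linear so every shift, scale and gate starts at 0.
        nn.init.zeros_(self.proj[1].weight)
        nn.init.zeros_(self.proj[1].bias)

    def forward(self, c: torch.Tensor):
        # Split (B, 6d) into six (B, d) chunks, then add a token axis so each
        # chunk broadcasts over a (B, T, d) sequence.
        return tuple(p.unsqueeze(1) for p in self.proj(c).chunk(6, dim=-1))


mod = AdaLNModulation(d=8)
params = mod(torch.randn(2, 8))
```

Because the projection is zero-initialized, all six tensors are exactly zero before training, which is what makes the gated residual branches vanish at init.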

  TransformerLayer:
  - Removed norm1, norm2 — replaced by self.adaLN
  - forward now takes cond and uses the adaptive shift/scale/gate for both sub-layers
  - SA residual: x = x + gate_sa * attn_out, where the gate is predicted from cond per feature and per layer
  - FFN pre-norm: adaLN.norm(x) * (1 + scale_ff) + shift_ff, using the same shared norm instance as the SA pre-norm
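
A minimal sketch of a layer wired this way, not the PR's actual code. The helper class `AdaLN`, the order of the six chunks, the use of `nn.MultiheadAttention`, the 4×d FFN width, and the `gate_ff` residual are assumptions; the pre-norm and SA residual formulas follow the notes.

```python
import torch
import torch.nn as nn


class AdaLN(nn.Module):
    """Hypothetical helper: conditioning (B, d) -> six (B, 1, d) tensors."""

    def __init__(self, d: int):
        super().__init__()
        self.norm = nn.LayerNorm(d, elementwise_affine=False)
        self.proj = nn.Sequential(nn.SiLU(), nn.Linear(d, 6 * d))
        nn.init.zeros_(self.proj[1].weight)
        nn.init.zeros_(self.proj[1].bias)

    def forward(self, c):
        return tuple(p.unsqueeze(1) for p in self.proj(c).chunk(6, dim=-1))


class TransformerLayer(nn.Module):
    def __init__(self, d: int, n_heads: int):
        super().__init__()
        self.adaLN = AdaLN(d)  # replaces the former norm1 / norm2
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x, cond):
        # Chunk order is an assumption; the notes only name the quantities.
        shift_sa, scale_sa, gate_sa, shift_ff, scale_ff, gate_ff = self.adaLN(cond)
        # Self-attention sub-layer with adaptive pre-norm and gated residual.
        h = self.adaLN.norm(x) * (1 + scale_sa) + shift_sa
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + gate_sa * attn_out
        # Feed-forward sub-layer reuses the same norm instance.
        h = self.adaLN.norm(x) * (1 + scale_ff) + shift_ff
        return x + gate_ff * self.ffn(h)


layer = TransformerLayer(d=8, n_heads=2)
x, cond = torch.randn(2, 5, 8), torch.randn(2, 8)
out = layer(x, cond)
```

With the zero-initialized projection, both gates are 0 at init, so `out` equals `x` until training moves the gates away from zero.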

  VectorFieldTransformer:
  - Removed norm_task from __init__ (task no longer a sequence token)
  - forward computes cond = timestep_emb + task_token.squeeze(1); both terms are (B, d) after the squeeze and are summed elementwise
  - Prefix is now just [img_tokens, state_tokens] — shorter sequence, cheaper attention
  - cond passed to every layer so each layer independently adapts its normalization
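
The conditioning and prefix assembly can be illustrated with a small shape check. The function name `build_inputs` and the token counts (5 image tokens, 3 state tokens) are hypothetical; the tensor names and the sum/concatenation follow the notes.

```python
import torch


def build_inputs(timestep_emb, task_token, img_tokens, state_tokens):
    """Hypothetical helper mirroring the conditioning step described above."""
    # task_token is (B, 1, d): drop the sequence axis so it matches timestep_emb (B, d).
    cond = timestep_emb + task_token.squeeze(1)
    # The task is no longer a sequence token, so the prefix holds only
    # image and state tokens, concatenated along the sequence axis.
    prefix = torch.cat([img_tokens, state_tokens], dim=1)
    return cond, prefix


B, d = 2, 8
cond, prefix = build_inputs(
    torch.randn(B, d),      # timestep_emb
    torch.randn(B, 1, d),   # task_token
    torch.randn(B, 5, d),   # img_tokens
    torch.randn(B, 3, d),   # state_tokens
)
```

Here `cond` has shape (B, d) and is the tensor that would be handed to every layer, while `prefix` is one token shorter than it would be with the task token included.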