
pi06: align prompt template + add future byte-identity tests against pi07 planners #230

@shuheng-liu


Motivation

Issue #226 established a byte-identity check between pi05 (continuous-state mode) and pi07_paligemma's low-level planner: with the same backbone weights and all optional middle blocks (response / subgoal / metadata) dropped, embed_prefix should produce a byte-equivalent prefix to pi05. This was a nice anchor for refactor safety: any future divergence in the prefix layout shows up immediately as non-zero MSE / CE deltas.

The analogous test for pi07 (Gemma 3 backbone) on feat/pi07 — verifying pi07.embed_prefix == pi06.embed_prefix when the pi07-specific features are all dropped — is not currently feasible without first aligning pi06's prompt template (or introducing a pi06_mem policy as the analog of pi05_mem):

  • pi07_low_level uses a continuous-state token block ("State: " + state_proj(state) + state-end) with separate "Action: " indicator tokens. pi06 inlines a discretized state into the language prompt ("Task: {task}<eos>State: {state}<eos>Actions:") and has no State:/Action: indicators in embed_prefix. The two architectures simply do not produce the same prefix.

  • pi07_high_level uses discrete state inlined in the language prompt — like pi06 — but the prompt template differs:

    • pi07_high_level: "Task: {task}, Past Memory: {past_mem}, State: {state}, "
    • pi06: "Task: {task}<eos>State: {state}<eos>Actions:" / "<eos>Response:"
    Additionally, pi07_high_level keeps an unconditional "Updated Memory: " AR anchor that has no counterpart in pi06. The anchor must stay unconditional because inference relies on it: memory_tokens is None at inference by design.
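
The template mismatch described above can be made concrete with a small sketch. The two template strings are copied from the bullets above; the helper names (`format_state`, `render_pi06`, `render_pi07_high_level`, `first_divergence`) and the space-separated formatting of the discretized state are assumptions for illustration only, not the repository's API.

```python
# Template strings as quoted in this issue; every helper below is hypothetical.
PI06_TEMPLATE = "Task: {task}<eos>State: {state}<eos>Actions:"
PI07_HIGH_LEVEL_TEMPLATE = "Task: {task}, Past Memory: {past_mem}, State: {state}, "


def format_state(bins):
    """Assumed formatting: discretized state bins joined by single spaces."""
    return " ".join(str(b) for b in bins)


def render_pi06(task, bins):
    return PI06_TEMPLATE.format(task=task, state=format_state(bins))


def render_pi07_high_level(task, past_mem, bins):
    return PI07_HIGH_LEVEL_TEMPLATE.format(
        task=task, past_mem=past_mem, state=format_state(bins)
    )


def first_divergence(a, b):
    """Index of the first differing character, or None if the strings are equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


task = "pick up the cup"
bins = [12, 200, 57]
pi06_prompt = render_pi06(task, bins)
pi07_prompt = render_pi07_high_level(task, "", bins)  # empty past memory

# Show the shared head and where the two prompts split.
split = first_divergence(pi06_prompt, pi07_prompt)
print("shared head:", repr(pi06_prompt[:split]))
print("pi06 continues with:", repr(pi06_prompt[split:split + 5]))
print("pi07_high_level continues with:", repr(pi07_prompt[split:split + 2]))
```

Even with an empty past memory, the two strings diverge immediately after the task text, so token-level identity is only reachable by changing one of the templates.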

PR #229 ports the #226-equivalent gating fixes to both pi07 planners but explicitly defers the byte-identity test to this follow-up.

Proposed sub-tasks

  1. Update / extend pi06's prompt template so that — given equivalent inputs — its language token stream can match pi07_high_level's. Two options to consider:

    • (a) Add an alternate prompt mode in pi06 that uses the Past Memory: + comma-separated layout used by pi07_high_level.
    • (b) Add a pi06_mem policy that mirrors how pi05_mem extends pi05, exposing both continuous-state (for pi07_low_level parity) and discrete-state (for pi07_high_level parity) modes.
  2. Add a byte-identity test for pi07_low_level vs. pi06_mem (continuous mode) with metadata, subgoal, and historical observations dropped (T=1, single-step state), following the same recipe as the pi05 vs. pi07_paligemma test in issue #226 ("pi07 low-level planner forward is not byte-equivalent to pi05 when all optional inputs are dropped"). Expectations:

    • Same backbone seed → copy gemma3_with_expert.state_dict() + projection layers from the reference policy into pi07_low_level.
    • Same noise / time / fixed inputs → losses (MSE + CE) and full prefix tensor are byte-identical.
  3. Add a byte-identity test pi07_high_level vs pi06 (discrete mode) when metadata is dropped — requires resolving the "Updated Memory: " AR anchor:

    • Either run pi06 with an analogous anchor in non-AR mode (so the comparison is fair),
    • Or add a test-only gate in pi07_high_level that suppresses the anchor (less faithful to the deployed model).
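
The recipe in sub-task 2 (copy weights, fix the noise, compare bit-for-bit) can be sketched with a toy stand-in. This is a minimal illustration under stated assumptions: `ToyPolicy` and its single `state_proj` weight vector are hypothetical and only mimic the shape of the real test, which would copy `gemma3_with_expert.state_dict()` plus the projection layers and compare actual tensors.

```python
import copy
import random
import struct


class ToyPolicy:
    """Hypothetical stand-in for a policy; not the real pi06 / pi07 classes."""

    def __init__(self, seed, dim=4):
        rng = random.Random(seed)
        # One weight vector plays the role of backbone + projection layers.
        self.weights = {"state_proj": [rng.uniform(-1.0, 1.0) for _ in range(dim)]}

    def state_dict(self):
        return copy.deepcopy(self.weights)

    def load_state_dict(self, state):
        self.weights = copy.deepcopy(state)

    def embed_prefix(self, state, noise):
        # Toy prefix: elementwise weight * state + noise.
        w = self.weights["state_proj"]
        return [wi * si + ni for wi, si, ni in zip(w, state, noise)]


def as_bytes(values):
    """Serialize floats as little-endian float64 so equality is bit-exact."""
    return struct.pack(f"<{len(values)}d", *values)


def assert_byte_identical(reference, candidate):
    assert as_bytes(reference) == as_bytes(candidate), "prefix diverged"


reference = ToyPolicy(seed=0)
candidate = ToyPolicy(seed=1)  # deliberately different initialization

# Step 1: copy the reference weights into the candidate.
candidate.load_state_dict(reference.state_dict())

# Step 2: feed both policies the same fixed state and the same seeded noise.
noise_rng = random.Random(1234)
state = [0.5, -0.25, 0.0, 1.0]
noise = [noise_rng.gauss(0.0, 1.0) for _ in state]

# Step 3: require bit-exact equality, not closeness within a tolerance.
ref_prefix = reference.embed_prefix(state, noise)
cand_prefix = candidate.embed_prefix(state, noise)
assert_byte_identical(ref_prefix, cand_prefix)
```

Comparing serialized bytes rather than using a tolerance is the point of the check: any layout or ordering change that perturbs even one bit fails the test, whereas a tolerance-based comparison could let small divergences through.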
