Motivation
Issue #226 established a byte-identity check between pi05 (continuous-state mode) and pi07_paligemma's low-level planner: with the same backbone weights and all optional middle blocks (response / subgoal / metadata) dropped, embed_prefix should produce a byte-equivalent prefix to pi05. This was a nice anchor for refactor safety: any future divergence in the prefix layout shows up immediately as non-zero MSE / CE deltas.
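A minimal sketch of what such a byte-identity assertion could look like, assuming the two prefixes have already been computed and converted to NumPy arrays. The helper name `assert_byte_identical` and the random stand-in arrays are hypothetical and are not taken from the repository; the real test would compare the outputs of the two policies' embed_prefix calls.

```python
import numpy as np


def assert_byte_identical(reference: np.ndarray, candidate: np.ndarray) -> None:
    """Raise AssertionError unless both arrays match in shape, dtype, and raw bytes.

    Hypothetical helper: a tolerance-based check (e.g. np.allclose) would hide
    the small drifts that a byte-identity anchor is meant to surface.
    """
    assert reference.shape == candidate.shape, (reference.shape, candidate.shape)
    assert reference.dtype == candidate.dtype, (reference.dtype, candidate.dtype)

    # Compare the C-ordered raw buffers byte by byte.
    ref_bytes = np.frombuffer(reference.tobytes(), dtype=np.uint8)
    cand_bytes = np.frombuffer(candidate.tobytes(), dtype=np.uint8)
    mismatch = np.flatnonzero(ref_bytes != cand_bytes)
    if mismatch.size:
        # Map the first differing byte back to an element index for debugging.
        element = int(mismatch[0]) // reference.dtype.itemsize
        index = tuple(int(i) for i in np.unravel_index(element, reference.shape))
        raise AssertionError(f"prefixes diverge; first mismatch at index {index}")


# Stand-in data (assumption): random float32 tensors in place of real prefixes.
rng = np.random.default_rng(0)
prefix_ref = rng.standard_normal((1, 8, 16)).astype(np.float32)
prefix_new = prefix_ref.copy()
assert_byte_identical(prefix_ref, prefix_new)  # identical buffers pass

# A single-ULP change in one element is enough to fail the check.
prefix_new[0, 3, 5] = np.nextafter(prefix_ref[0, 3, 5], np.float32(np.inf))
try:
    assert_byte_identical(prefix_ref, prefix_new)
except AssertionError as err:
    print(err)
```

Comparing raw bytes rather than values also flags differences such as -0.0 versus 0.0, which is stricter than a zero MSE delta.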
The analogous test for pi07 (Gemma 3 backbone) on feat/pi07 — verifying pi07.embed_prefix == pi06.embed_prefix when the pi07-specific features are all dropped — is not currently feasible without first aligning pi06's prompt template (or introducing a pi06_mem policy as the analog of pi05_mem):
pi07_low_level uses a continuous-state token block ("State: " + state_proj(state) + state-end) with separate "Action: " indicator tokens. pi06 inlines a discretized state into the language prompt ("Task: {task}<eos>State: {state}<eos>Actions:") and has no State:/Action: indicators in embed_prefix. The two architectures simply do not produce the same prefix.
pi07_high_level uses discrete state inlined in the language prompt — like pi06 — but the prompt template differs:
pi07_high_level: "Task: {task}, Past Memory: {past_mem}, State: {state}, "
pi06: "Task: {task}<eos>State: {state}<eos>Actions:" / "<eos>Response:"
Additionally, pi07_high_level keeps an unconditional "Updated Memory: " AR anchor that pi06 has no counterpart for (it must stay unconditional because inference relies on it — memory_tokens is None at inference by design).
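To make the template mismatch concrete, the sketch below renders both prompt templates quoted above for the same inputs and reports where the strings diverge. The template strings are copied from this issue; the function names, the treatment of `<eos>` as a literal string, and the space-separated state encoding are assumptions for illustration only.

```python
import os

# Templates as quoted in this issue. "<eos>" is kept as a literal placeholder
# here; in the real policies it would be the tokenizer's EOS token (assumption).
PI07_HIGH_LEVEL_TEMPLATE = "Task: {task}, Past Memory: {past_mem}, State: {state}, "
PI06_TEMPLATE = "Task: {task}<eos>State: {state}<eos>Actions:"


def render_pi07_high_level(task: str, state: str, past_mem: str = "") -> str:
    """Hypothetical renderer for the pi07_high_level language prompt."""
    return PI07_HIGH_LEVEL_TEMPLATE.format(task=task, past_mem=past_mem, state=state)


def render_pi06(task: str, state: str) -> str:
    """Hypothetical renderer for the pi06 prompt (the "Actions:" variant only)."""
    return PI06_TEMPLATE.format(task=task, state=state)


task = "pick up the cup"
state = "12 200 87"  # hypothetical discretized state bins

high_level_prompt = render_pi07_high_level(task, state)
pi06_prompt = render_pi06(task, state)

# The prompts agree only up to the end of the task text, so the token streams
# (and therefore the embedded prefixes) cannot match without a template change.
shared = os.path.commonprefix([high_level_prompt, pi06_prompt])
print(repr(shared))
print(repr(high_level_prompt[len(shared):]))
print(repr(pi06_prompt[len(shared):]))
```

Even with an empty past memory, the separators ("," versus `<eos>`) and the trailing "Actions:" indicator differ, which is why a template alignment step has to come before any byte-identity test.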
PR #229 ports the #226-equivalent gating fixes to both pi07 planners but explicitly defers the byte-identity test to this follow-up.
Proposed sub-tasks
Update / extend pi06's prompt template so that — given equivalent inputs — its language token stream can match pi07_high_level's. Two options to consider:
(a) Add an alternate prompt mode in pi06 that uses the Past Memory: + comma-separated layout used by pi07_high_level.
(b) Add a pi06_mem policy that mirrors how pi05_mem extends pi05, exposing both continuous-state (for pi07_low_level parity) and discrete-state (for pi07_high_level parity) modes.
Add a byte-identity test pi07_low_level vs pi06_mem (continuous mode) when metadata, subgoal, and historical observations are dropped (T=1, single-step state), following the same recipe as the pi05 ↔ pi07_paligemma test in issue #226. Expectations:
[…] gemma3_with_expert.state_dict() + projection layers from the reference policy into pi07_low_level.
Add a byte-identity test pi07_high_level vs pi06 (discrete mode) when metadata is dropped. This requires resolving the "Updated Memory: " AR anchor:
[…] pi06 with an analogous anchor in non-AR mode (so the comparison is fair),
[…] pi07_high_level that suppresses the anchor (less faithful to the deployed model).
References
pi05 ↔ pi07_paligemma byte-identity issue and fix.
pi07 follow-up that this issue extends.