pi06: align prompt template + add future byte-identity tests against pi07 planners #230

## Motivation

Issue #226 established a byte-identity check between `pi05` (continuous-state mode) and `pi07_paligemma`'s low-level planner: with the same backbone weights and all optional middle blocks (response / subgoal / metadata) dropped, `embed_prefix` should produce a prefix that is byte-identical to `pi05`'s. The check is a useful anchor for refactor safety: any future divergence in the prefix layout shows up immediately as non-zero MSE / CE deltas.

The analogous test for `pi07` (Gemma 3 backbone) on [`feat/pi07`](https://github.com/TensorAuto/OpenTau/tree/feat/pi07) — verifying `pi07.embed_prefix == pi06.embed_prefix` when the pi07-specific features are all dropped — is **not currently feasible** without first aligning `pi06`'s prompt template (or introducing a `pi06_mem` policy as the analog of `pi05_mem`):

- **`pi07_low_level`** uses a continuous-state token block (`"State: " + state_proj(state) + state-end`) with separate `"Action: "` indicator tokens. `pi06` inlines a discretized state into the language prompt (`"Task: {task}<eos>State: {state}<eos>Actions:"`) and has no `State:`/`Action:` indicators in `embed_prefix`. The two architectures simply do not produce the same prefix.

- **`pi07_high_level`** uses discrete state inlined in the language prompt — like `pi06` — but the prompt template differs:
    - `pi07_high_level`: `"Task: {task}, Past Memory: {past_mem}, State: {state}, "`
    - `pi06`: `"Task: {task}<eos>State: {state}<eos>Actions:"` / `"<eos>Response:"`
  Additionally, `pi07_high_level` keeps an unconditional `"Updated Memory: "` AR anchor for which `pi06` has no counterpart. The anchor must stay unconditional because inference relies on it: `memory_tokens` is `None` at inference by design.
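
To make the discrete-state mismatch concrete, the sketch below renders both templates quoted above on the same inputs and reports where the strings first diverge. The template strings are copied from this issue; the helper `first_divergence`, the example task and state values, and the empty `past_mem` are illustrative assumptions, not code from the repository.

```python
# Templates as quoted in this issue; everything else here is hypothetical.
PI06_TEMPLATE = "Task: {task}<eos>State: {state}<eos>Actions:"
PI07_HIGH_LEVEL_TEMPLATE = "Task: {task}, Past Memory: {past_mem}, State: {state}, "


def first_divergence(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # No mismatch in the shared range: either equal, or one is a strict prefix.
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Illustrative inputs only; real state strings come from the policy's discretizer.
task, state = "pick up the cup", "12 200 7"
pi06_prompt = PI06_TEMPLATE.format(task=task, state=state)
pi07_prompt = PI07_HIGH_LEVEL_TEMPLATE.format(task=task, past_mem="", state=state)

idx = first_divergence(pi06_prompt, pi07_prompt)
print("shared prefix:", repr(pi06_prompt[:idx]))
print("pi06 continues with:", repr(pi06_prompt[idx:idx + 12]))
print("pi07_high_level continues with:", repr(pi07_prompt[idx:idx + 12]))
```

The strings already differ right after the task text, before any tokenization, so the token streams cannot match until the templates are aligned.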

PR #229 ports the #226-equivalent gating fixes to both `pi07` planners but explicitly defers the byte-identity test to this follow-up.

## Proposed sub-tasks

1. **Update / extend `pi06`'s prompt template** so that — given equivalent inputs — its language token stream can match `pi07_high_level`'s. Two options to consider:
   - (a) Add an alternate prompt mode in `pi06` that uses the `Past Memory:` + comma-separated layout used by `pi07_high_level`.
   - (b) Add a `pi06_mem` policy that mirrors how [`pi05_mem`](https://github.com/TensorAuto/OpenTau/tree/main/src/opentau/policies/pi05_mem) extends `pi05`, exposing both continuous-state (for `pi07_low_level` parity) and discrete-state (for `pi07_high_level` parity) modes.

2. **Add a byte-identity test `pi07_low_level` vs `pi06_mem` (continuous mode)** when metadata, subgoal, and historical observations are dropped (T=1, single-step state) — same recipe as the `pi05` ↔ `pi07_paligemma` test in issue #226. Expectations:
   - Same backbone seed → copy `gemma3_with_expert.state_dict()` + projection layers from the reference policy into `pi07_low_level`.
   - Same noise / time / fixed inputs → losses (MSE + CE) and full prefix tensor are byte-identical.

3. **Add a byte-identity test `pi07_high_level` vs `pi06` (discrete mode)** when metadata is dropped — requires resolving the `"Updated Memory: "` AR anchor:
   - either run `pi06` with an analogous anchor in non-AR mode, so that the comparison is fair,
   - or add a test-only gate in `pi07_high_level` that suppresses the anchor, which is less faithful to the deployed model.
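
As a rough illustration of why sub-tasks 2 and 3 ask for byte identity rather than a tolerance check, the sketch below compares two float sequences both ways. It uses only the standard library and plain lists in place of tensors; `to_bytes` and `is_byte_identical` are hypothetical helpers. In the actual tests the comparison would presumably be done on tensors, for example with `torch.equal` or on their raw buffers.

```python
import struct


def to_bytes(values):
    """Pack floats as little-endian float64, standing in for a tensor's raw buffer."""
    return struct.pack(f"<{len(values)}d", *values)


def is_byte_identical(reference, candidate):
    """True only if both sequences have the same length and the same bits."""
    return to_bytes(reference) == to_bytes(candidate)


reference = [0.3, 0.0]
# Numerically close to the reference, but rounding (0.1 + 0.2) and the sign
# of zero (-0.0) change the underlying bits.
drifted = [0.1 + 0.2, -0.0]

close = all(abs(r - c) <= 1e-9 for r, c in zip(reference, drifted))
identical = is_byte_identical(reference, drifted)
print("within tolerance:", close)
print("byte-identical:", identical)
```

A tolerance-based assertion would accept `drifted`, while the byte-level comparison rejects it, which is the stricter property the #226 recipe relies on to catch silent changes in prefix layout or operation order.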

## References

- #226 — the analogous `pi05` ↔ `pi07_paligemma` byte-identity issue and fix.
- #229 — the `pi07` follow-up that this issue extends.
