refactor(policies): deduplicate copy-pasted modules across pi0/pi05/pi06/pi07 (~7-8k lines of overlap)

## Summary

`src/opentau/policies/` has accumulated significant copy-paste between policy families. Each new policy (pi0 → pi05 → pi05_mem → pi06 → pi07 → pi07_paligemma) has been added as a fresh subdirectory rather than by extending shared modules, so utilities like `video_encoder.py`, `paligemma_with_expert.py`, and `gemma3_with_expert.py` exist as near-identical forks. There are no shared base classes; each `modeling_*.py` re-implements the same set of `prepare_*` / `embed_*` / `sample_actions` / `denoise_step` methods.

This issue audits the duplication, quantifies it, and proposes concrete deduplication targets. Roughly **7-8k lines** could be unified.

## Findings (quantified)

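Numbers are `wc -l` of each file and `diff A B | wc -l` (raw diff line count; lower = more similar). The raw count includes diff's own change markers and counts both sides of every change, so it overstates the number of edited lines. The "% identical" figures in the Notes column are consistent with 1 − diff / (A + B).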

| # | Files | Lines (A / B) | Diff lines | Notes |
|---|-------|---------------|-----------|-------|
| 1 | `pi05_mem/video_encoder.py` vs `pi07/low_level_planner/video_encoder.py` | 460 / 505 | ~115 | ~88% identical. Diff is mostly docstring (`PI05Mem*` vs `PI07*`), one `contextmanager` import, and the `suppress_spacetime_temporal` helper that pi07 adds. |
| 2 | `pi06/gemma3_with_expert.py` vs `pi07/gemma3_with_expert.py` | 861 / 919 | ~100 | ~94% identical. Diff is one new config flag (`load_pretrained_gemma3`), updated docstrings (π0.6 → π0.7), and a `_multi_modal_projector` lookup helper. |
| 3 | `pi0/paligemma_with_expert.py` vs `pi05/paligemma_with_expert.py` | 691 / 786 | ~250 | ~83% identical. pi05 adds discrete-action vocab support, AdaRMS config, validation. |
| 4 | `pi07/low_level_planner/modeling_pi07_low_level.py` vs `pi07_paligemma/low_level_planner/modeling_pi07_low_level.py` | 1879 / 1744 | ~636 | Vision-encoder swap (SpaceTimeSiglip ↔ V-JEPA2) + class-name changes. See also #210, #211, #192. |
| 5 | `pi07/high_level_planner/modeling_pi07_high_level.py` vs `pi07_paligemma/high_level_planner/modeling_pi07_high_level.py` | 1487 / 1440 | ~391 | Same situation as row 4 but for the high-level planner. |
| 6 | `pi05/modeling_pi05.py` vs `pi05_mem/modeling_pi05.py` | 1733 / 1194 | ~1570 | pi05_mem is an intentional memory variant — biggest divergence of the six, but the shared scaffolding (forward, prepare_*, sample_actions) is still copy-paste. |
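
For anyone who wants to re-check the table without shelling out to `diff`, here is a minimal Python sketch using the standard-library `difflib`. The function name `similarity_report` is made up for illustration, and `difflib` counts changed lines differently from raw `diff | wc -l`, so expect numbers in the same ballpark rather than identical ones.

```python
import difflib


def similarity_report(text_a: str, text_b: str) -> dict:
    """Line counts plus a rough similarity measure for two source files."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    # Sum the lines on either side of every non-"equal" opcode.
    changed = sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return {
        "lines_a": len(lines_a),
        "lines_b": len(lines_b),
        "changed_lines": changed,
        "ratio": round(matcher.ratio(), 3),
    }


# Tiny stand-ins for two forked files (not real repo content).
fork_a = "import torch\n\nclass PI05MemEncoder:\n    pass\n"
fork_b = (
    "import torch\n"
    "from contextlib import contextmanager\n"
    "\n"
    "class PI07Encoder:\n"
    "    pass\n"
)
report = similarity_report(fork_a, fork_b)
```

In practice the two strings would come from `pathlib.Path(...).read_text()` on each pair in the table.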

## Method-level duplication in `modeling_*.py`

Every flow-matching policy reimplements the same surface. Grepping method signatures across `pi0`, `pi05`, `pi06`, `pi07/low_level_planner`:

```
predict_action_chunk(batch)
select_action(batch, noise=None)
sample_actions(batch, noise=None)
forward(batch)
prepare_images(batch)            # pi07 has prepare_videos + prepare_subgoal_images instead
prepare_language(batch)
prepare_state(batch)             # pi0, pi05, pi07
prepare_discrete_state(batch)    # pi05, pi06, pi07_high
prepare_discrete_actions(batch)  # pi05, pi06, pi07_low
prepare_response(batch)          # pi05, pi06, pi07
sample_noise(shape, device)
sample_time(bsize, device)
```

And inside each `*FlowMatching` submodule:

```
embed_prefix(...)
embed_suffix(noisy_actions, timestep)
forward(...)
sample_actions(...)
denoise_step(...)
```

The bodies of `sample_noise`, `sample_time`, `select_action`, `predict_action_chunk`, and `denoise_step` differ only trivially across policies.
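
To make the shape of a shared base concrete, here is a rough, dependency-free sketch. It is not the repo's code: it uses plain Python lists instead of tensors, the `predict_velocity` hook and the `ConstantVelocityExpert` subclass are hypothetical, the uniform `sample_time` is a placeholder, and the Euler loop is only an assumed form of the sampler. The point is just that the near-identical methods can live in one place while subclasses supply the policy-specific hooks.

```python
import random
from abc import ABC, abstractmethod
from typing import Optional

Chunk = list[list[float]]  # [batch][action_dim]; real code would use tensors


class BaseFlowMatchingExpert(ABC):
    """Hypothetical shared base for the *FlowMatching submodules."""

    def __init__(self, action_dim: int, num_steps: int = 10, seed: int = 0):
        self.action_dim = action_dim
        self.num_steps = num_steps
        self._rng = random.Random(seed)

    def sample_noise(self, batch_size: int) -> Chunk:
        # Standard-normal noise, written once instead of once per policy.
        return [
            [self._rng.gauss(0.0, 1.0) for _ in range(self.action_dim)]
            for _ in range(batch_size)
        ]

    def sample_time(self, batch_size: int) -> list[float]:
        # Placeholder distribution; a real base must match the existing policies.
        return [self._rng.random() for _ in range(batch_size)]

    def denoise_step(self, batch: dict, noisy_actions: Chunk, timestep: float) -> Chunk:
        # Shared glue: embed the suffix, then ask the subclass for a velocity.
        suffix = self.embed_suffix(noisy_actions, timestep)
        return self.predict_velocity(batch, suffix)

    def sample_actions(self, batch: dict, noise: Optional[Chunk] = None) -> Chunk:
        # Assumed Euler integration from t=1 (noise) towards t=0 (actions).
        x = noise if noise is not None else self.sample_noise(batch["batch_size"])
        dt = -1.0 / self.num_steps
        for step in range(self.num_steps):
            t = 1.0 + step * dt
            v = self.denoise_step(batch, x, t)
            x = [[xi + dt * vi for xi, vi in zip(rx, rv)] for rx, rv in zip(x, v)]
        return x

    @abstractmethod
    def embed_suffix(self, noisy_actions: Chunk, timestep: float):
        """Policy-specific suffix layout."""

    @abstractmethod
    def predict_velocity(self, batch: dict, suffix) -> Chunk:
        """Policy-specific model call (hypothetical hook name)."""


class ConstantVelocityExpert(BaseFlowMatchingExpert):
    """Toy subclass: overrides only the two policy-specific hooks."""

    def embed_suffix(self, noisy_actions: Chunk, timestep: float):
        return noisy_actions

    def predict_velocity(self, batch: dict, suffix) -> Chunk:
        return [[1.0, 2.0] for _ in suffix]


expert = ConstantVelocityExpert(action_dim=2, num_steps=4)
actions = expert.sample_actions({"batch_size": 1}, noise=[[1.0, 2.0]])
```

With a constant velocity equal to the starting noise, four Euler steps of size 0.25 walk the sample to zero, which is an easy way to sanity-check the loop.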

## Bonus finding: class-name typo

`pi07_paligemma/low_level_planner/configuration_pi07_low_level.py:38` defines `PI07lowlevelPlannerConfig` (lowercase "lowlevel"), while `pi07/low_level_planner/configuration_pi07_low_level.py:40` defines `PI07LowLevelPlannerConfig`. `factory.py:50` papers over this with an `as` alias. This is a downstream symptom of fork-and-edit duplication.
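
A cheap guard against this kind of drift could be a lint-style check that flags class names differing only by letter case. The sketch below is an assumption about how such a check might look; `class_names_in` and `find_case_variants` are illustrative names, and the sources are inline stand-ins rather than the real config files.

```python
import ast
from collections import defaultdict


def class_names_in(source: str) -> list[str]:
    """Collect every class name defined in a piece of Python source."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ClassDef)
    ]


def find_case_variants(names: list[str]) -> list[list[str]]:
    """Group names that are equal when lowercased but spelled differently."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Inline stand-ins for the two configuration modules.
fork_a = "class PI07lowlevelPlannerConfig:\n    pass\n"
fork_b = "class PI07LowLevelPlannerConfig:\n    pass\n"
variants = find_case_variants(class_names_in(fork_a) + class_names_in(fork_b))
```

Run over all `configuration_*.py` files, a non-empty result would fail CI before an alias in `factory.py` becomes necessary.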

## Cross-import map

`grep -rn "from opentau.policies\." src/opentau/policies/` shows that each policy directory imports only from itself and from `policies/{pretrained,normalize,utils,factory}`. There is **zero** sharing between sibling policy folders: the only import that crosses a directory boundary, `pi07/gemma3_with_expert.py` lazily importing `pi07/low_level_planner/video_encoder.py`, stays inside `pi07`. This silo structure is what causes the duplication to grow with each new policy.
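
The same map can be produced structurally with `ast`, which also catches lazy imports inside functions. This is a sketch under assumptions: the helper name `cross_policy_imports` is invented, the shared-module set is copied from the paragraph above, and the sample source (including the `PI05Policy` and `Encoder` names) is illustrative.

```python
import ast

PREFIX = "opentau.policies."
SHARED = {"pretrained", "normalize", "utils", "factory"}


def cross_policy_imports(source: str, owner: str) -> list[str]:
    """Return imports that reach into a sibling policy package.

    `owner` is the top-level policy directory the module lives in,
    e.g. "pi07". Imports of shared helpers and of `owner` are ignored.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if not module.startswith(PREFIX):
                continue
            top = module[len(PREFIX):].split(".")[0]
            if top != owner and top not in SHARED:
                hits.append(module)
    return sorted(hits)


# Illustrative module body, not real repo content.
sample = (
    "from opentau.policies.utils import helper\n"
    "from opentau.policies.pi05.modeling_pi05 import PI05Policy\n"
    "def lazy():\n"
    "    from opentau.policies.pi07.low_level_planner.video_encoder import Encoder\n"
)
found = cross_policy_imports(sample, owner="pi07")
```

Once the `shared/` modules exist, the same check could be inverted to assert that policies import from `shared/` rather than from each other.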

## Proposed deduplication targets

Ordered by ROI (lines saved / behavior risk):

1. **`shared/video_encoder.py`** — extract the SpaceTime-SigLIP wrapper used by both `pi05_mem` and `pi07/low_level_planner`. Pi07's `suppress_spacetime_temporal` context manager is a strict superset of pi05_mem's behavior, so pi05_mem can adopt it for free. Smallest, lowest-risk win (~460 lines deleted).

2. **`shared/gemma3_with_expert.py`** — fold pi06 and pi07 versions together. The pi07 superset is `load_pretrained_gemma3` flag + a vision/projector-locator helper; both are safe additions for pi06. ~860 lines deleted.

3. **`shared/paligemma_with_expert.py`** — fold pi0 and pi05 versions together; pi05's discrete-action and AdaRMS additions are gated by config flags that pi0 simply doesn't set. ~690 lines deleted.

4. **`pi07_paligemma` removal/merge** — already tracked in #211. Extracting (1)–(3) makes that merge trivial: pi07_paligemma becomes a config variant (vision encoder choice) of pi07, not a forked codebase.

5. **`BaseFlowMatchingPolicy` / `BaseFlowMatchingExpert` mixins** — pull `sample_noise`, `sample_time`, `select_action`, `predict_action_chunk`, `denoise_step` and the standard `prepare_*` skeleton into base classes. Subclasses override only what genuinely differs (vision tower, action head, prefix/suffix layout). Largest payoff but highest risk — should land after the byte-equivalence regression tests from #226 and #230 are in place so we can prove the refactor is identity-preserving.

6. **`BaseVLMPolicyConfig`** — `configuration_pi0.py`, `configuration_pi05.py`, `configuration_pi06.py` share most fields (vision/state/action shapes, optimizer block, normalization mapping). A shared dataclass base with policy-specific subclasses overriding only divergent fields would shrink each config to <50 lines.
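
As a rough illustration of target 6, a shared dataclass base could look like the sketch below. Every field name and default here is a placeholder chosen for the example, not taken from the existing `configuration_*.py` files, and `ExamplePI05Config` is a hypothetical subclass.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BaseVLMPolicyConfig:
    # Placeholder fields standing in for the shared shape/optimizer/normalization block.
    max_state_dim: int = 32
    max_action_dim: int = 32
    chunk_size: int = 50
    optimizer_lr: float = 1e-4
    normalization_mapping: dict[str, str] = field(
        default_factory=lambda: {"STATE": "MEAN_STD", "ACTION": "MEAN_STD"}
    )


@dataclass
class ExamplePI05Config(BaseVLMPolicyConfig):
    # Only fields that genuinely diverge are declared in the subclass.
    chunk_size: int = 10  # overrides the inherited default
    discrete_action_vocab_size: int = 256  # policy-specific addition


cfg = ExamplePI05Config()
field_names = [f.name for f in fields(cfg)]
```

Because every base field has a default, subclasses can add or override fields without running into dataclass ordering errors.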

## Suggested rollout

Land 1 → 2 → 3 as independent PRs, each gated on byte-identical loss/forward output for at least one smoke config per affected policy (pi0/pi05/pi05_mem/pi06/pi07). Defer 5 until #226 and #230 give us a regression net. 4 happens naturally as a follow-up to 1–3.
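
The "byte-identical" gate could be expressed along these lines. This is a standard-library sketch only: `fingerprint` and `assert_byte_identical` are hypothetical helpers, and it hashes Python floats as float64, whereas a real test would hash the raw bytes of the model's output tensors for a fixed seed and smoke config.

```python
import hashlib
import struct


def fingerprint(values: list[float]) -> str:
    """SHA-256 over the float64 bit patterns of a flat list of numbers."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def assert_byte_identical(old_fn, new_fn, inputs) -> None:
    """Fail if the refactored function changes even one bit of output."""
    for x in inputs:
        if fingerprint(old_fn(x)) != fingerprint(new_fn(x)):
            raise AssertionError(f"outputs differ for input {x!r}")


def forward_before(x: list[float]) -> list[float]:
    # Stand-in for a pre-refactor forward pass.
    return [(x[0] + x[1]) + x[2]]


def forward_after(x: list[float]) -> list[float]:
    # Same math on paper, but the additions are re-associated.
    return [x[0] + (x[1] + x[2])]


try:
    assert_byte_identical(forward_before, forward_after, [[0.1, 0.2, 0.3]])
    refactor_is_identity = True
except AssertionError:
    refactor_is_identity = False
```

The toy pair shows why the gate is strict: re-associating a sum is mathematically harmless yet changes the low bits, which an `allclose`-style tolerance would hide.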

## Related

- #211 — `pi07_paligemma`: re-implement V-JEPA2 or delete the policy
- #210 — `pi07_paligemma` low-level planner broken import
- #192 — replace V-JEPA2 with MEM space-time SigLIP in pi07_paligemma
- #226 — pi07 byte-equivalence regression test
- #230 — pi06 prompt alignment + future byte-identity tests against pi07
- #247 — rename `pi07/low_level_planner` → `low_level`
