refactor(policies): deduplicate copy-pasted modules across pi0/pi05/pi06/pi07 (~7-8k lines of overlap) #249

@shuheng-liu

Summary

src/opentau/policies/ has accumulated significant copy-paste between policy families. Each new policy (pi0 → pi05 → pi05_mem → pi06 → pi07 → pi07_paligemma) has been added as a fresh subdirectory rather than by extending shared modules, so utilities like video_encoder.py, paligemma_with_expert.py, and gemma3_with_expert.py exist as near-identical forks. There are no shared base classes; each modeling_*.py re-implements the same set of prepare_* / embed_* / sample_actions / denoise_step methods.

This issue audits the duplication, quantifies it, and proposes concrete deduplication targets. Roughly 7-8k lines could be unified.

Findings (quantified)

Numbers are wc -l of each file and diff A B | wc -l (raw diff line count, lower = more similar).
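For anyone who wants to reproduce comparable numbers without shell tools, here is a minimal sketch using Python's standard `difflib`. It is not what produced the table below: `difflib` aligns lines differently from `diff`, and it does not count hunk headers, so the "changed" figure will not match `diff A B | wc -l` exactly. The helper names `diff_stats` and `diff_stats_for_files` are hypothetical.

```python
import difflib


def diff_stats(a_lines, b_lines):
    """Return line counts, a changed-line count, and a similarity ratio."""
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Lines removed from A plus lines added from B in this block.
            changed += (i2 - i1) + (j2 - j1)
    return {
        "lines_a": len(a_lines),
        "lines_b": len(b_lines),
        "changed": changed,
        "ratio": matcher.ratio(),  # 2 * matching lines / total lines
    }


def diff_stats_for_files(path_a, path_b):
    """Convenience wrapper that reads two files and compares them by line."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return diff_stats(fa.read().splitlines(), fb.read().splitlines())
```

The ratio is only a rough proxy for the "% identical" figures quoted in the table, which were estimated from the raw diff size.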

| # | Files | Lines (A / B) | Diff lines | Notes |
|---|-------|---------------|------------|-------|
| 1 | pi05_mem/video_encoder.py vs pi07/low_level_planner/video_encoder.py | 460 / 505 | ~115 | ~88% identical. Diff is mostly docstring (PI05Mem* vs PI07*), one contextmanager import, and the suppress_spacetime_temporal helper that pi07 adds. |
| 2 | pi06/gemma3_with_expert.py vs pi07/gemma3_with_expert.py | 861 / 919 | ~100 | ~94% identical. Diff is one new config flag (load_pretrained_gemma3), updated docstrings (π0.6 → π0.7), and a _multi_modal_projector lookup helper. |
| 3 | pi0/paligemma_with_expert.py vs pi05/paligemma_with_expert.py | 691 / 786 | ~250 | ~83% identical. pi05 adds discrete-action vocab support, AdaRMS config, validation. |
| 4 | pi07/low_level_planner/modeling_pi07_low_level.py vs pi07_paligemma/low_level_planner/modeling_pi07_low_level.py | 1879 / 1744 | ~636 | Vision-encoder swap (SpaceTimeSiglip ↔ V-JEPA2) + class-name changes. See also #210, #211, #192. |
| 5 | pi07/high_level_planner/modeling_pi07_high_level.py vs pi07_paligemma/high_level_planner/modeling_pi07_high_level.py | 1487 / 1440 | ~391 | Same situation as row 4 but for the high-level planner. |
| 6 | pi05/modeling_pi05.py vs pi05_mem/modeling_pi05.py | 1733 / 1194 | ~1570 | pi05_mem is an intentional memory variant and the biggest divergence of the six, but the shared scaffolding (forward, prepare_*, sample_actions) is still copy-paste. |

Method-level duplication in modeling_*.py

Every flow-matching policy reimplements the same surface. Grepping method signatures across pi0, pi05, pi06, pi07/low_level_planner:

predict_action_chunk(batch)
select_action(batch, noise=None)
sample_actions(batch, noise=None)
forward(batch)
prepare_images(batch)            # pi07 has prepare_videos + prepare_subgoal_images instead
prepare_language(batch)
prepare_state(batch)             # pi0, pi05, pi07
prepare_discrete_state(batch)    # pi05, pi06, pi07_high
prepare_discrete_actions(batch)  # pi05, pi06, pi07_low
prepare_response(batch)          # pi05, pi06, pi07
sample_noise(shape, device)
sample_time(bsize, device)

And inside each *FlowMatching submodule:

embed_prefix(...)
embed_suffix(noisy_actions, timestep)
forward(...)
sample_actions(...)
denoise_step(...)

The bodies of sample_noise, sample_time, select_action, predict_action_chunk, and denoise_step differ only trivially across policies.
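One way to picture the shared surface is a base class that owns the helpers whose bodies barely differ and leaves one hook abstract. The sketch below is hypothetical and deliberately torch-free: nested lists stand in for tensors, the `device` argument is dropped, and the optional `noise` parameter on `predict_action_chunk` is an assumption. `ToyPolicy` is not a real policy in the repo.

```python
import random
from abc import ABC, abstractmethod


class BaseFlowMatchingPolicy(ABC):
    """Hypothetical base class; nested lists stand in for tensors."""

    def __init__(self, chunk_size, action_dim, seed=0):
        self.chunk_size = chunk_size
        self.action_dim = action_dim
        self._rng = random.Random(seed)

    # Shared helpers: written once instead of once per policy.
    def sample_noise(self, shape):
        rows, cols = shape
        return [[self._rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    def predict_action_chunk(self, batch, noise=None):
        if noise is None:
            noise = self.sample_noise((self.chunk_size, self.action_dim))
        return self.sample_actions(batch, noise)

    # Policy-specific hook: each subclass supplies its own model call.
    @abstractmethod
    def sample_actions(self, batch, noise):
        ...


class ToyPolicy(BaseFlowMatchingPolicy):
    """Stand-in subclass; a real policy would run its denoising loop here."""

    def sample_actions(self, batch, noise):
        scale = batch["scale"]
        return [[scale * x for x in row] for row in noise]
```

With this shape, a new policy only has to implement `sample_actions` (and whatever `prepare_*` methods genuinely differ); the rest is inherited.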

Bonus finding: class-name typo

pi07_paligemma/low_level_planner/configuration_pi07_low_level.py:38 defines PI07lowlevelPlannerConfig (lowercase "lowlevel"), while pi07/low_level_planner/configuration_pi07_low_level.py:40 defines PI07LowLevelPlannerConfig. factory.py:50 papers over the mismatch by importing the class under an "as" alias. This is a downstream symptom of fork-and-edit duplication.

Cross-import map

grep -rn "from opentau.policies\." src/opentau/policies/ shows that each policy directory imports only from itself and from policies/{pretrained,normalize,utils,factory}. There is zero sharing between sibling policy folders: the only import that reaches across modules is pi07/gemma3_with_expert.py lazily importing pi07/low_level_planner/video_encoder.py, and that one stays inside pi07. This silo structure is what causes the duplication to grow with each new policy.
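The same check can be scripted so it is repeatable after each refactor PR. This is a rough sketch, not the command that was actually run: it only catches absolute `opentau.policies.<name>` imports (relative imports are missed), and the function name `cross_policy_imports` and the `SHARED` set are illustrative.

```python
import re

# Captures <name> in "from opentau.policies.<name>..." or "import opentau.policies.<name>...".
# Leading whitespace is allowed so lazy imports inside functions are found too.
_IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+opentau\.policies\.(\w+)", re.MULTILINE)

# Modules every policy is expected to import from.
SHARED = {"pretrained", "normalize", "utils", "factory"}


def cross_policy_imports(sources):
    """Find imports that cross from one policy folder into a sibling.

    sources maps a path relative to src/opentau/policies/ to that file's text.
    Returns sorted (path, imported_policy) pairs.
    """
    hits = set()
    for path, text in sources.items():
        owner = path.split("/", 1)[0]  # top-level policy folder of this file
        for target in _IMPORT_RE.findall(text):
            if target != owner and target not in SHARED:
                hits.add((path, target))
    return sorted(hits)
```

An empty result corresponds to the silo structure described above; once shared modules exist, the expected result is a list of imports pointing at the shared package only.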

Proposed deduplication targets

Ordered by ROI (lines saved / behavior risk):

  1. shared/video_encoder.py — extract the SpaceTime-SigLIP wrapper used by both pi05_mem and pi07/low_level_planner. pi07's suppress_spacetime_temporal context manager is a strict superset of pi05_mem's behavior, so pi05_mem can adopt it for free. Smallest, lowest-risk win (~460 lines deleted).

  2. shared/gemma3_with_expert.py — fold pi06 and pi07 versions together. The pi07 version is a superset: it adds the load_pretrained_gemma3 flag and a vision/projector-locator helper, both of which are safe additions for pi06. ~860 lines deleted.

  3. shared/paligemma_with_expert.py — fold pi0 and pi05 versions together; pi05's discrete-action and AdaRMS additions are gated by config flags that pi0 simply doesn't set. ~690 lines deleted.

  4. pi07_paligemma removal/merge — already tracked in #211 ("pi07_paligemma: re-implement V-JEPA2 video encoder, or delete the policy"). Extracting (1)–(3) makes that merge trivial: pi07_paligemma becomes a config variant (vision encoder choice) of pi07, not a forked codebase.

  5. BaseFlowMatchingPolicy / BaseFlowMatchingExpert mixins — pull sample_noise, sample_time, select_action, predict_action_chunk, denoise_step and the standard prepare_* skeleton into base classes. Subclasses override only what genuinely differs (vision tower, action head, prefix/suffix layout). Largest payoff but highest risk, so it should land after the byte-equivalence regression tests from #226 ("pi07 low-level planner forward is not byte-equivalent to pi05 when all optional inputs are dropped") and #230 ("pi06: align prompt template + add future byte-identity tests against pi07 planners") are in place so we can prove the refactor is identity-preserving.

  6. BaseVLMPolicyConfig: configuration_pi0.py, configuration_pi05.py, configuration_pi06.py share most fields (vision/state/action shapes, optimizer block, normalization mapping). A shared dataclass base with policy-specific subclasses overriding only divergent fields would shrink each config to <50 lines.
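To make target 6 concrete, here is a minimal sketch of a shared dataclass base with thin subclasses. Every field name, default value, and the `Example*` class names are invented for illustration; the real configs in configuration_pi0.py and its siblings will have different fields.

```python
from dataclasses import dataclass, field


@dataclass
class BaseVLMPolicyConfig:
    """Hypothetical shared fields; names and defaults are illustrative only."""

    state_dim: int = 32
    action_dim: int = 32
    chunk_size: int = 50
    optimizer_lr: float = 1e-4
    # Mutable defaults need default_factory so instances do not share one dict.
    normalization_mapping: dict = field(
        default_factory=lambda: {"STATE": "MEAN_STD", "ACTION": "MEAN_STD"}
    )


@dataclass
class ExamplePI0Config(BaseVLMPolicyConfig):
    """Inherits every field unchanged."""


@dataclass
class ExamplePI05Config(BaseVLMPolicyConfig):
    """Overrides one default and adds one policy-specific field."""

    chunk_size: int = 10
    discrete_action_vocab_size: int = 256
```

Because every base field has a default, subclasses can add their own defaulted fields without running into dataclass field-ordering errors.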

Suggested rollout

Land 1 → 2 → 3 as independent PRs, each gated on byte-identical loss/forward output for at least one smoke config per affected policy (pi0/pi05/pi06/pi07). Defer 5 until #226 and #230 give us a regression net. 4 happens naturally as a follow-up to 1–3.
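The byte-identity gate could look roughly like the sketch below. It is an assumption about how such a check might be written, not the tests planned in #226 or #230: it uses only the standard library, treats outputs as flat sequences of numbers, and assumes float32 precision. The names `fingerprint` and `assert_byte_identical` are hypothetical.

```python
import hashlib
from array import array


def fingerprint(values):
    """SHA-256 over the float32 bytes of a flat sequence of numbers."""
    return hashlib.sha256(array("f", values).tobytes()).hexdigest()


def assert_byte_identical(before, after):
    """Fail if the pre-refactor and post-refactor outputs differ in any bit."""
    if fingerprint(before) != fingerprint(after):
        raise AssertionError("refactor changed the output bytes")
```

Comparing bytes is stricter than comparing values within a tolerance: even 0.0 and -0.0 produce different fingerprints, which is the point of an identity-preserving gate. Storing the fingerprint from the pre-refactor commit lets each PR be checked without keeping both code paths alive.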

Related
