pi07_paligemma low-level planner: import is broken — needs migration to SpaceTimeSiglipVideoEncoder (or restoration of VJEPA2VideoEncoder) #210

@shuheng-liu

Summary

After #171 renamed VJEPA2VideoEncoder to SpaceTimeSiglipVideoEncoder in pi05_mem/video_encoder.py, the pi07 paligemma low-level planner was left with stale references to the now-removed name. The module fails at import time:

ImportError: cannot import name 'VJEPA2VideoEncoder'
    from 'opentau.policies.pi05_mem.video_encoder'

This affects two sites in src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py:

  • L59: from opentau.policies.pi05_mem.video_encoder import VJEPA2VideoEncoder
  • L1053: self.video_encoder = VJEPA2VideoEncoder(...) with 8 V-JEPA2-specific kwargs

PR #209 adds --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py to cpu_test.yml as a tactical CI unblock — but the production module is still broken. Anyone trying to instantiate PI07LowLevelPlannerPolicy today gets the same ImportError. This issue tracks the proper fix.
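A quick way to confirm whether a name is still importable is a small smoke check like the one below. This is a minimal sketch, not code from the repository: `can_import` is a hypothetical helper, and it uses stdlib modules as stand-ins so that it runs anywhere. In the repo, the pair of interest would be `opentau.policies.pi05_mem.video_encoder` and `VJEPA2VideoEncoder`.

```python
import importlib


def can_import(module_name: str, attr: str) -> bool:
    """Roughly mirror `from module_name import attr` without raising.

    Hypothetical helper for illustration; returns False if the module
    itself cannot be imported or does not expose `attr`.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


# Stdlib stand-ins: one name that exists, one that does not.
print(can_import("json", "dumps"))
print(can_import("json", "no_such_name"))
```

Run against the real module path, a `False` result for the old class name would match the ImportError above.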

Why it's not a one-line rename

Old ctor — self-contained:

VJEPA2VideoEncoder(
    vjepa2_model_name=...,
    num_frames=...,
    crop_size=...,
    num_video_tokens=...,
    vlm_hidden_size=...,
    perceiver_heads=...,
    freeze_encoder=...,
    encoder_dtype=...,
)

New ctor — caller-owned vision components:

SpaceTimeSiglipVideoEncoder(
    vision_tower=...,           # passed by reference from caller's PaliGemma
    multi_modal_projector=...,  # ditto
    spacetime_layer_stride=...,
    num_video_tokens=...,
    # ... different config surface entirely
)
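To make the mismatch concrete, the two keyword sets can be compared with `inspect.signature`. The sketch below uses stand-in classes, not the real encoders: `OldEncoderStandIn` copies the eight kwargs from the old call, and `NewEncoderStandIn` lists only the four parameters named in this issue (the real constructor has more, so this understates the difference).

```python
import inspect


class OldEncoderStandIn:
    # Parameter names copied from the old ctor call shown in this issue.
    def __init__(self, vjepa2_model_name, num_frames, crop_size,
                 num_video_tokens, vlm_hidden_size, perceiver_heads,
                 freeze_encoder, encoder_dtype):
        pass


class NewEncoderStandIn:
    # Only the parameters this issue names; the real signature is larger.
    def __init__(self, vision_tower, multi_modal_projector,
                 spacetime_layer_stride, num_video_tokens):
        pass


def param_names(cls):
    """Return the constructor's parameter names, excluding `self`."""
    params = inspect.signature(cls.__init__).parameters
    return {name for name in params if name != "self"}


old = param_names(OldEncoderStandIn)
new = param_names(NewEncoderStandIn)

shared = sorted(old & new)        # kwargs that carry over unchanged
dropped = sorted(old - new)       # kwargs the new ctor would reject
required_new = sorted(new - old)  # kwargs the caller must now supply

print(shared)        # ['num_video_tokens']
print(len(dropped))  # 7
print(required_new)
```

Under these assumptions only `num_video_tokens` survives the rename; the other seven old kwargs have no counterpart, and the caller has to supply vision components it did not previously pass.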

The new class does not construct its own backbone; it wraps and mutates the caller's PaliGemma vision tower in place. pi07 currently builds its own V-JEPA2 model independently of the PaliGemma it owns. To migrate, you have to decide:

  1. Use PaliGemma's SigLIP as the video encoder (matches what pi05_mem does post-#171). Drop the V-JEPA2 path entirely. pi07's vjepa2_* config fields become inactive (or get renamed to spacetime_* / siglip_* equivalents). Major behavioural change for any in-flight pi07 experiments.
  2. Keep V-JEPA2 in pi07: re-introduce a VJEPA2VideoEncoder class somewhere (e.g. pi07_paligemma/video_encoder.py, owned by pi07 since pi05_mem no longer wants it). This lifts the now-deleted code from before #171's rename and keeps pi07 architecturally independent of pi05_mem's encoder choice.

Both paths need the pi07 owner's design input. Option 2 is the smaller / less risky choice if pi07 was deliberately using V-JEPA2 over SigLIP for a reason.
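The ownership difference behind the two options can be sketched with toy classes. Everything here is hypothetical (`Tower`, `SelfContainedEncoder`, `WrappingEncoder`, and the `patched` flag are illustration-only names, not the opentau API). The point is only that a wrapping encoder's changes are visible on the caller's object, while a self-contained encoder leaves it untouched.

```python
class Tower:
    """Hypothetical stand-in for a vision backbone."""

    def __init__(self, name):
        self.name = name
        self.patched = False


class SelfContainedEncoder:
    """Option 2 style: builds and owns its own backbone."""

    def __init__(self, model_name):
        self.backbone = Tower(model_name)


class WrappingEncoder:
    """Option 1 style: receives the caller's tower and modifies it in place."""

    def __init__(self, vision_tower):
        vision_tower.patched = True  # mutation is visible to the caller
        self.backbone = vision_tower


policy_tower = Tower("siglip")  # the tower the policy already owns

own = SelfContainedEncoder("vjepa2")
untouched_after_own = policy_tower.patched is False

wrapped = WrappingEncoder(policy_tower)
shares_object = wrapped.backbone is policy_tower
touched_after_wrap = policy_tower.patched is True

print(untouched_after_own, shares_object, touched_after_wrap)
```

With option 1, anything else in pi07 that reads the PaliGemma vision tower would see the modified tower, which is part of why the choice needs the owner's input.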

Owner / scope

Touches:

  • src/opentau/policies/pi07_paligemma/low_level_planner/modeling_pi07_low_level.py — primary site (import + ctor at L59, L1053).
  • Possibly src/opentau/policies/pi07_paligemma/low_level_planner/configuration_pi07_low_level.py — the vjepa2_* config fields it uses today.
  • New src/opentau/policies/pi07_paligemma/video_encoder.py if option 2 is chosen.
  • Possibly tests/policies/test_pi07_paligemma_low_level_planner.py — assertions about token counts / hidden sizes will likely shift.
  • Once fixed: drop the --ignore line added by #209 from .github/workflows/cpu_test.yml.

CI implication: until this lands, --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py (PR #209) keeps CPU CI green on every PR.

Refs: #167 (added pi07), #171 (renamed encoder), #182 (precision fix; affected by the broken CI), #209 (tactical CI unblock).

Labels: bug
