RoboCasa: wire training-data co-training + 12-DoF projection head for meaningful eval #379

@shuheng-liu

Context

The initial RoboCasa365 sim-eval integration (src/opentau/envs/robocasa.py, RoboCasaEnv config, factory dispatch) makes the eval half of the loop work: parallel vec envs, success-rate aggregation, and grid_summary wandb videos all run against the real sim. But RoboCasa is currently eval-only: there is no RoboCasa training data in the dataset mixture and no projection head sized for its action/state dimensions, so there is no RoboCasa-trained policy for eval to run meaningfully.

LIBERO, by contrast, is a closed loop: TensorAuto/libero (20 fps v2.1) is co-trained, then eval on the LIBERO sim yields benchmark-comparable success rates.

Problem

  • No RoboCasa LeRobot dataset is wired into DatasetMixtureConfig.
  • RoboCasa's robot (PandaOmron) has a 12-D action, a 16-D state, and 3 cameras, distinct from LIBERO's 7-D action / 8-D state. There is no validated per-(robot_type, control_mode) projection head for it, and no norm stats.
  • The example eval config (configs/examples/pi05_robocasa_eval_config.json) loads a LIBERO checkpoint as a plumbing smoke test, so rollouts are effectively random and the reported "success rate" is not yet meaningful for RoboCasa.
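
To make the dimension mismatch and the missing norm stats concrete, here is a minimal sketch in plain Python. It is not the OpenTau API: `EmbodimentSpec`, `SPECS`, `check_head`, `norm_stats`, `normalize`, and the key strings are all hypothetical names chosen for illustration. Only the 7-D/8-D and 12-D/16-D sizes come from this issue.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class EmbodimentSpec:
    """Action/state sizes a projection head must match (hypothetical container)."""
    action_dim: int
    state_dim: int


# Keyed by (robot_type, control_mode). The key strings are placeholders,
# not identifiers taken from the codebase.
SPECS: Dict[Tuple[str, str], EmbodimentSpec] = {
    ("panda", "libero"): EmbodimentSpec(action_dim=7, state_dim=8),
    ("panda_omron", "robocasa"): EmbodimentSpec(action_dim=12, state_dim=16),
}


def check_head(key: Tuple[str, str], head_action_dim: int, head_state_dim: int) -> None:
    """Raise if a head's sizes do not match the spec registered for `key`."""
    spec = SPECS[key]
    if (head_action_dim, head_state_dim) != (spec.action_dim, spec.state_dim):
        raise ValueError(
            f"head is {head_action_dim}-D action / {head_state_dim}-D state, "
            f"but {key} needs {spec.action_dim}-D / {spec.state_dim}-D"
        )


def norm_stats(rows: Sequence[Sequence[float]], eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population std over equal-length vectors."""
    n = len(rows)
    dim = len(rows[0])
    mean = [sum(r[d] for r in rows) / n for d in range(dim)]
    std = [sqrt(sum((r[d] - mean[d]) ** 2 for r in rows) / n) for d in range(dim)]
    # Clamp std so constant dimensions do not cause division by zero.
    return mean, [max(s, eps) for s in std]


def normalize(vec: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Standardize one vector with precomputed stats."""
    return [(v - m) / s for v, m, s in zip(vec, mean, std)]
```

With this sketch, `check_head(("panda_omron", "robocasa"), 7, 8)` raises a `ValueError`, which is the kind of guard that would flag a LIBERO-sized head being used for RoboCasa rollouts. The stats functions show the per-dimension mean/std that would need to be computed over RoboCasa's 12-D actions and 16-D states before training.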

Why it matters

This is the single thing that turns RoboCasa from "validated plumbing" into a real benchmark sibling of LIBERO. Until a RoboCasa-trained policy exists, the eval metrics are not interpretable.

Suggested approach

References

Follow-up to the initial RoboCasa env integration (branch claude/lucid-albattani-b33067).

Metadata

Assignees

No one assigned

    Labels

    feature (New feature or request), model (new model or model request)
