
feat(robocasa): integrate RoboCasa kitchen sim alongside LIBERO #289

Open
akshay18iitg wants to merge 1 commit into main from feat/robocasa_eval_script

Conversation

@akshay18iitg
Collaborator

What this does

Adds a RoboCasa env wrapper mirroring the LIBERO layout so policies can train/eval against the kitchen benchmark with the same TrainPipelineConfig plumbing (env.type=robocasa, task_ids fan out per accelerator rank). Bundles an opentau-eval-compatible entrypoint plus a pyproject robocasa extra mutually exclusive with libero (numpy 1.x vs 2.x).

Why: opentau-train so far only had LIBERO. RoboCasa unlocks 25 atomic + 301 composite kitchen tasks for the same pi0.5/pi0.7 checkpoints with no eval loop changes.

What:

  • envs/configs.py: register robocasa as a draccus EnvConfig; resolve a flat task (class names) and/or task_ids (int indices) into a fanned list per rank.
  • envs/robocasa.py: gym.Env wrapper around robocasa.utils.env_utils.create_env; emits positional camera{i} obs keys (matching LiberoEnv) and inlines _build_proprio_vector / _get_task_prompt so an upstream robocasa install is sufficient (no robocasa.scripts.client dep). Adds has_wrapper_attr / get_wrapper_attr shims for gymnasium 0.29 envs.
  • envs/factory.py: route make_envs / make_env_config to the new wrapper.
  • envs/robocasa_tasks.py: append-only registry covering 25 atomic skills (indices 0-24, matching upstream class names PickPlace*, OpenCabinet, OpenFridge, StartCoffeeMachine, ...) and 301 composite multi-stage tasks (25-325) grouped by category.
  • configs/robocasa.py: RoboCasaEvalConfig + TrainConfigWithRoboCasaEval mirroring configs/libero.py.
  • scripts/robocasa_eval/: standalone eval entrypoint reusing eval_policy_all/consolidate_eval_info. Directory deliberately not named robocasa to avoid shadowing the upstream robocasa package on accelerate's sys.path[0].
  • configs/examples/pi05_robocasa_eval_config.json: demo config with three cameras and a mix of atomic + composite tasks (task_ids=[0,12,267]).
  • pyproject.toml: new robocasa extra, [tool.uv.sources] pin for robocasa + scoped robosuite-master, conflicts vs libero/urdf, and override-dependencies (numba, scipy, protobuf, draccus) so uv sync resolves.
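As a rough illustration of the per-rank fan-out described for envs/configs.py, the sketch below merges class names and integer indices into one ordered list and gives each rank a strided slice. The helper names (`resolve_tasks`, `fan_out`), the three-entry stand-in registry with its indices, and the round-robin strategy are assumptions made for illustration; the PR's actual resolution logic may differ.

```python
from typing import Sequence

# Hypothetical stand-in for the registry; the real one has 326 entries
# and different indices.
REGISTRY = ["PickPlaceCounterToCabinet", "OpenDrawer", "CerealAndBowl"]


def resolve_tasks(task: Sequence[str] = (), task_ids: Sequence[int] = ()) -> list[str]:
    """Merge class names and integer indices into one de-duplicated, ordered list."""
    names = list(task) + [REGISTRY[i] for i in task_ids]
    return list(dict.fromkeys(names))  # dict keys preserve first-seen order


def fan_out(names: Sequence[str], rank: int, world_size: int) -> list[str]:
    """Give each rank every world_size-th task, starting at its own rank."""
    return list(names[rank::world_size])


tasks = resolve_tasks(task=["OpenDrawer"], task_ids=[0, 2])
per_rank = [fan_out(tasks, r, world_size=2) for r in range(2)]
```

With two ranks and three tasks, rank 0 receives two tasks and rank 1 receives one, so uneven splits are something the eval aggregation has to tolerate.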

Verified by running opentau-train --config_path=.../pi05_robocasa_eval_config.json end-to-end on a 2-GPU box with MUJOCO_GL=osmesa: env construction, preprocess_observation key contract, policy.select_action, AsyncVectorEnv worker rollouts (150 steps × PickPlaceCounterToCabinet / OpenDrawer / CerealAndBowl), and per-group eval aggregation all succeed; eval_info.json + per-task MP4s land in cfg.output_dir.

How it was tested

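Ran the pi05 RoboCasa config (configs/examples/pi05_robocasa_eval_config.json) through opentau-train end-to-end, as described in the verification note above.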

Checklist

  • I have added Google-style docstrings to important functions and ensured function parameters are typed.
  • My PR includes policy-related changes.
    • If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Note: Before submitting this PR, please read the contributor guideline.


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
"dataset_mixture": {
"datasets": [
{
"repo_id": "physical-intelligence/libero"
Member

@shuheng-liu shuheng-liu May 12, 2026


Do we have an example robocasa dataset?

Member

@shuheng-liu shuheng-liu left a comment


Thanks for putting this together — the LIBERO-shaped wrapper, per-rank fan-out, and pyproject conflict declarations all look reasonable. A few findings before merge; the first two will fail at runtime.

Bugs

1. OrganizeMetalicUtensils in the registry is a typo and will fail create_env.
src/opentau/envs/robocasa_tasks.py:270 lists "OrganizeMetalicUtensils" (one l). Upstream robocasa registers the class as OrganizeMetallicUtensils (two ls) — the filename organize_metalic_utensils.py is robocasa's own filename misspelling, but the class inside is class OrganizeMetallicUtensils(Kitchen). robosuite.make(env_name="OrganizeMetalicUtensils") will raise. The pyproject.toml typo override that whitelists "Metalic" is masking the bug rather than fixing it — please fix the spelling in the registry and drop the override.

2. The 301-entry composite list hasn't been cross-checked against upstream class names.
The atomic block looks right (matches ATOMIC_TASK_DATASETS), but I spot-checked the composite block and at least one name is wrong (#1 above). Given the size and the append-only contract, please regenerate this list programmatically from robocasa.environments.ALL_KITCHEN_ENVIRONMENTS (which is keyed by __name__ of registered classes) and commit the script that produces it, so the registry can be re-verified when upstream adds tasks. Without that, every bad entry is a silent landmine — the wrapper resolves task_ids long before create_env runs.
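One possible shape for the regeneration script requested in point 2 is sketched below. It keeps existing indices stable (the append-only contract), appends upstream names that are not yet registered, and reports entries that upstream does not know about. The function name `regenerate_registry` and the stand-in dictionary are hypothetical; in a real script the upstream names would come from the keys of `robocasa.environments.ALL_KITCHEN_ENVIRONMENTS`, assumed here to be a dict keyed by class name as the review describes.

```python
from typing import Iterable


def regenerate_registry(
    existing: list[str], upstream: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (new_registry, unknown_names).

    Existing entries keep their positions; upstream names not yet present are
    appended in sorted order. Entries missing upstream are reported, not dropped.
    """
    upstream_set = set(upstream)
    unknown = [name for name in existing if name not in upstream_set]
    additions = sorted(upstream_set - set(existing))
    return existing + additions, unknown


# Stand-in for the upstream mapping; the values are irrelevant for this sketch.
fake_upstream = {
    "OpenDrawer": object,
    "OrganizeMetallicUtensils": object,
    "CerealAndBowl": object,
}
current = ["OpenDrawer", "OrganizeMetalicUtensils"]

new_registry, unknown = regenerate_registry(current, fake_upstream)
```

Here `unknown` flags the misspelled entry for a human to correct in place, which keeps its index fixed instead of shifting every later task.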

Significant

3. No tests. 2,682 lines of new code, zero tests. At minimum:

  • An import-smoke test for opentau.envs.robocasa_tasks (no robocasa install needed) that asserts len(ROBOCASA_TASKS) == 326, no duplicates, and round-trips task_index(task_name(i)) == i.
  • A gated test (skip unless robocasa is installed) that imports robocasa.environments.ALL_KITCHEN_ENVIRONMENTS and asserts every name in ROBOCASA_TASKS is registered upstream — this would have caught #1 and any other typos.

4. configs/robocasa.py is dead code as wired.
scripts/robocasa_eval/eval.py:69 annotates cfg: TrainPipelineConfig, not TrainConfigWithRoboCasaEval, so RoboCasaEvalConfig / TrainConfigWithRoboCasaEval are never instantiated, the chunk_usage validator never runs, and cfg.robocasa.* would AttributeError if anyone tried to use it. Either delete the module or switch the eval entrypoint's annotation to TrainConfigWithRoboCasaEval and have eval.py use cfg.robocasa.*.

5. scripts/robocasa_eval/eval.py duplicates scripts/eval.py almost verbatim — the only deltas are an isinstance(cfg.env, RoboCasaEnv) guard and return_episode_data=False. Since make_envs(cfg.env, ...) already dispatches by env type, opentau-eval should just work for RoboCasa today. Either:

  • delete the duplicate and document that opentau-eval handles both, or
  • if you want a separate console command, add opentau-robocasa-eval = "opentau.scripts.launch:robocasa_eval" in [project.scripts] and a launch.py wrapper. As-is the new file is only reachable via direct accelerate launch and the PR description's "opentau-eval-compatible entrypoint" doesn't actually exist.

6. Naming collision. RoboCasaEnv is both the draccus config (envs/configs.py:189) and the gym.Env wrapper (envs/robocasa.py:112). LIBERO avoids this: LiberoEnv exists only as the config, and the env wrapper has a different class name. Suggest renaming the gym.Env subclass to RoboCasaGymEnv or RoboCasaSimEnv to match the LIBERO module layout and avoid from ... import RoboCasaEnv ambiguity.

7. reset() tears down and rebuilds the MuJoCo env every call (envs/robocasa.py:295-300). The comment says robosuite "doesn't expose a re-seed API" but env.reset() in current robosuite accepts a new layout via hard_reset=True, and the unconditional rebuild adds seconds-to-tens-of-seconds of XML compilation per episode. At n_episodes=50 × 326 tasks this is a measurable chunk of eval wall-clock. At minimum, only rebuild when seed actually changes.

8. Two upstream-private calls. self._env._check_success() (step) and self._env._get_observations() (render) are private API. RoboCasa has public equivalents — env.is_success() / env.observe() — please prefer them, or pin a specific robocasa commit ({ git = ..., rev = "<sha>" } instead of branch = "main") so the wrapper doesn't silently break when upstream renames either.

9. Pinning branch = "main" for robocasa and robosuite in [tool.uv.sources] is fragile — every uv sync resolves a different commit. Please pin a rev = "<sha>" for both so this branch is reproducible.

Minor

  • RoboCasaEvalConfig.task: str = "PnPCounterToCab" (configs/robocasa.py:39) — PnPCounterToCab is the legacy robosuite alias; the registry uses PickPlaceCounterToCabinet. Pick one convention everywhere (I'd suggest the dataset-registry form, since that's what the integer index resolves to).
  • pyproject.toml:151-152 comments still say scripts/robocasa/server.py; the dir was renamed to scripts/robocasa_eval/.
  • src/opentau/scripts/robocasa_eval/server.py:39 Run block says python -m opentau.scripts.robocasa_server_async — should be opentau.scripts.robocasa_eval.server.
  • configs/robocasa.py:69 robocasa: RoboCasaEvalConfig = None — annotate as RoboCasaEvalConfig | None.
  • RoboCasaEvalConfig.task_names is set in __post_init__ but isn't a declared field, so asdict(cfg) won't include it (consistency surprise vs. other configs).
  • is_atomic/is_composite silently accept bool (since isinstance(True, int)); task_name correctly rejects it — make them consistent.
  • f"Chunk usage must be between 1 and {self.action_chunk=}" (configs/robocasa.py:88) — the = debug spec yields self.action_chunk=10 in the message; drop the =.
  • PR checklist: this PR adds new env plumbing only, so the "policy-related changes" box being unchecked is fine — just confirm you don't need the nightly regression test signal before merge.
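Two of the minor points above are easy to reproduce in isolation. The snippet shows what the `=` specifier does inside an f-string and why a plain `isinstance(x, int)` check lets booleans through. The variable `action_chunk` and the helper `is_plain_int` are illustrative names, not code from the PR.

```python
action_chunk = 10

# The "=" specifier echoes the expression text as well as its value.
with_equals = f"Chunk usage must be between 1 and {action_chunk=}"
without_equals = f"Chunk usage must be between 1 and {action_chunk}"


def is_plain_int(value: object) -> bool:
    """True for ints but not bools, since bool is a subclass of int."""
    return isinstance(value, int) and not isinstance(value, bool)
```

Using one shared check like `is_plain_int` in `is_atomic`, `is_composite`, and `task_name` would be one way to make their handling of `bool` consistent.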

Happy to re-review once the registry is regenerated and the duplicate eval script / dead config module are resolved.


Generated by Claude Code

@akshay18iitg
Collaborator Author

@claude fix all points except 5 and 8.

@akshay18iitg akshay18iitg self-assigned this May 27, 2026