feat(robocasa): integrate RoboCasa kitchen sim alongside LIBERO #289
akshay18iitg wants to merge 1 commit
Adds a RoboCasa env wrapper mirroring the LIBERO layout so policies can
train/eval against the kitchen benchmark with the same TrainPipelineConfig
plumbing (env.type=robocasa, task_ids fan out per accelerator rank). Bundles
an opentau-eval-compatible entrypoint plus a pyproject `robocasa` extra
mutually exclusive with `libero` (numpy 1.x vs 2.x).
Why: opentau-train so far only had LIBERO. RoboCasa unlocks 25 atomic + 301
composite kitchen tasks for the same pi0.5/pi0.7 checkpoints with no eval
loop changes.
What:
- envs/configs.py: register `robocasa` as a draccus EnvConfig; resolve a
flat `task` (class names) and/or `task_ids` (int indices) into a fanned
list per rank.
- envs/robocasa.py: gym.Env wrapper around robocasa.utils.env_utils.create_env;
emits positional camera{i} obs keys (matching LiberoEnv) and inlines
_build_proprio_vector / _get_task_prompt so an upstream robocasa install
is sufficient (no robocasa.scripts.client dep). Adds has_wrapper_attr /
get_wrapper_attr shims for gymnasium 0.29 envs.
- envs/factory.py: route make_envs / make_env_config to the new wrapper.
- envs/robocasa_tasks.py: append-only registry covering 25 atomic skills
(indices 0-24, matching upstream class names PickPlace*, OpenCabinet,
OpenFridge, StartCoffeeMachine, ...) and 301 composite multi-stage tasks
(25-325) grouped by category.
- configs/robocasa.py: RoboCasaEvalConfig + TrainConfigWithRoboCasaEval
mirroring configs/libero.py.
- scripts/robocasa_eval/: standalone eval entrypoint reusing
eval_policy_all/consolidate_eval_info. Directory deliberately not named
`robocasa` to avoid shadowing the upstream `robocasa` package on
accelerate's sys.path[0].
- configs/examples/pi05_robocasa_eval_config.json: demo config with three
cameras and a mix of atomic + composite tasks (task_ids=[0,12,267]).
- pyproject.toml: new `robocasa` extra, [tool.uv.sources] pin for robocasa
+ scoped robosuite-master, conflicts vs libero/urdf, and override-
dependencies (numba, scipy, protobuf, draccus) so uv sync resolves.
Verified by running `opentau-train --config_path=.../pi05_robocasa_eval_config.json`
end-to-end on a 2-GPU box with `MUJOCO_GL=osmesa`: env construction,
preprocess_observation key contract, policy.select_action, AsyncVectorEnv
worker rollouts (150 steps × PickPlaceCounterToCabinet / OpenDrawer /
CerealAndBowl), and per-group eval aggregation all succeed; eval_info.json
+ per-task MP4s land in cfg.output_dir.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    "dataset_mixture": {
        "datasets": [
            {
                "repo_id": "physical-intelligence/libero"
Do we have an example robocasa dataset?
shuheng-liu left a comment
Thanks for putting this together — the LIBERO-shaped wrapper, per-rank fan-out, and pyproject conflict declarations all look reasonable. A few findings before merge; the first two will fail at runtime.
Bugs
1. OrganizeMetalicUtensils in the registry is a typo and will fail create_env.
src/opentau/envs/robocasa_tasks.py:270 lists "OrganizeMetalicUtensils" (one l). Upstream robocasa registers the class as OrganizeMetallicUtensils (two ls) — the filename organize_metalic_utensils.py is robocasa's own filename misspelling, but the class inside is class OrganizeMetallicUtensils(Kitchen). robosuite.make(env_name="OrganizeMetalicUtensils") will raise. The pyproject.toml typo override that whitelists "Metalic" is masking the bug rather than fixing it — please fix the spelling in the registry and drop the override.
2. The 301-entry composite list hasn't been cross-checked against upstream class names.
The atomic block looks right (matches ATOMIC_TASK_DATASETS), but I spot-checked the composite block and at least one name is wrong (#1 above). Given the size and the append-only contract, please regenerate this list programmatically from robocasa.environments.ALL_KITCHEN_ENVIRONMENTS (which is keyed by __name__ of registered classes) and commit the script that produces it, so the registry can be re-verified when upstream adds tasks. Without that, every bad entry is a silent landmine — the wrapper resolves task_ids long before create_env runs.
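A minimal sketch of the cross-check described in item 2, assuming the upstream names are available as an iterable of strings (in practice the keys of `robocasa.environments.ALL_KITCHEN_ENVIRONMENTS`, as cited above). The helper name `audit_registry` and the stand-in lists are hypothetical, for illustration only; this is not the script the PR should necessarily commit.

```python
import difflib
from typing import Dict, Iterable, List, Optional, Tuple


def audit_registry(
    registry: List[str], upstream: Iterable[str]
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Compare a local task registry against upstream class names.

    Returns (unknown, missing):
      unknown maps each registry name that is absent upstream to the closest
      upstream spelling (or None if nothing is similar);
      missing lists upstream names that the registry does not cover.
    """
    upstream_sorted = sorted(set(upstream))
    upstream_set = set(upstream_sorted)
    unknown: Dict[str, Optional[str]] = {}
    for name in registry:
        if name not in upstream_set:
            close = difflib.get_close_matches(name, upstream_sorted, n=1)
            unknown[name] = close[0] if close else None
    missing = sorted(upstream_set - set(registry))
    return unknown, missing


# Hypothetical stand-ins; the real inputs would be ROBOCASA_TASKS and the
# keys of robocasa.environments.ALL_KITCHEN_ENVIRONMENTS.
upstream_names = ["OpenDrawer", "OrganizeMetallicUtensils", "CerealAndBowl"]
registry = ["OpenDrawer", "OrganizeMetalicUtensils"]

unknown, missing = audit_registry(registry, upstream_names)
for bad, suggestion in unknown.items():
    print(f"not registered upstream: {bad} (did you mean {suggestion}?)")
```

Because the registry is append-only, a misspelled entry would be corrected in place at its existing index, while names reported in `missing` would be appended at the end.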
Significant
3. No tests. 2,682 lines of new code, zero tests. At minimum:
- An import-smoke test for `opentau.envs.robocasa_tasks` (no robocasa install needed) that asserts `len(ROBOCASA_TASKS) == 326`, no duplicates, and round-trips `task_index(task_name(i)) == i`.
- A gated test (skip unless robocasa is installed) that imports `robocasa.environments.ALL_KITCHEN_ENVIRONMENTS` and asserts every name in `ROBOCASA_TASKS` is registered upstream; this would have caught #1 and any other typos.
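The two tests requested in item 3 could look roughly like the sketch below. To keep it self-contained, `ROBOCASA_TASKS`, `task_name`, and `task_index` are small stand-ins defined inline; their real behavior in `opentau.envs.robocasa_tasks` is assumed, not confirmed, and the real length check would be `== 326`.

```python
# Stand-in registry; the real one would be imported from
# opentau.envs.robocasa_tasks and contain 326 entries.
ROBOCASA_TASKS = ["PickPlaceCounterToCabinet", "OpenDrawer", "CerealAndBowl"]
EXPECTED_LEN = 3  # 326 for the registry in this PR


def task_name(index: int) -> str:
    """Stand-in lookup: integer index -> class name (rejects bool)."""
    if isinstance(index, bool) or not isinstance(index, int):
        raise TypeError(f"task index must be an int, got {type(index).__name__}")
    return ROBOCASA_TASKS[index]


def task_index(name: str) -> int:
    """Stand-in lookup: class name -> integer index."""
    return ROBOCASA_TASKS.index(name)


def test_registry_invariants():
    """Import-smoke test: needs no robocasa install."""
    assert len(ROBOCASA_TASKS) == EXPECTED_LEN
    assert len(set(ROBOCASA_TASKS)) == len(ROBOCASA_TASKS)  # no duplicates
    for i in range(len(ROBOCASA_TASKS)):
        assert task_index(task_name(i)) == i  # round trip


def test_names_registered_upstream():
    """Gated test: skipped automatically when robocasa is not installed."""
    import pytest

    envs = pytest.importorskip("robocasa.environments")
    unregistered = [
        name for name in ROBOCASA_TASKS
        if name not in envs.ALL_KITCHEN_ENVIRONMENTS
    ]
    assert not unregistered, f"not registered upstream: {unregistered}"
```

Keeping the first test free of any robocasa import means it can run in the default CI environment, while the second only runs where the `robocasa` extra is installed.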
4. configs/robocasa.py is dead code as wired.
scripts/robocasa_eval/eval.py:69 annotates cfg: TrainPipelineConfig, not TrainConfigWithRoboCasaEval, so RoboCasaEvalConfig / TrainConfigWithRoboCasaEval are never instantiated, the chunk_usage validator never runs, and cfg.robocasa.* would AttributeError if anyone tried to use it. Either delete the module or switch the eval entrypoint's annotation to TrainConfigWithRoboCasaEval and have eval.py use cfg.robocasa.*.
5. scripts/robocasa_eval/eval.py duplicates scripts/eval.py almost verbatim — the only deltas are an isinstance(cfg.env, RoboCasaEnv) guard and return_episode_data=False. Since make_envs(cfg.env, ...) already dispatches by env type, opentau-eval should just work for RoboCasa today. Either:
- delete the duplicate and document that `opentau-eval` handles both, or
- if you want a separate console command, add `opentau-robocasa-eval = "opentau.scripts.launch:robocasa_eval"` in `[project.scripts]` and a `launch.py` wrapper.

As-is the new file is only reachable via direct `accelerate launch` and the PR description's "opentau-eval-compatible entrypoint" doesn't actually exist.
6. Naming collision. RoboCasaEnv is both the draccus config (envs/configs.py:189) and the gym.Env wrapper (envs/robocasa.py:112). LIBERO avoids this: LiberoEnv exists only as the config, and its env wrapper has a different class name. Suggest renaming the gym.Env subclass to RoboCasaGymEnv or RoboCasaSimEnv to match the LIBERO module layout and avoid from ... import RoboCasaEnv ambiguity.
7. reset() tears down and rebuilds the MuJoCo env every call (envs/robocasa.py:295-300). The comment says robosuite "doesn't expose a re-seed API" but env.reset() in current robosuite accepts a new layout via hard_reset=True, and the unconditional rebuild adds seconds-to-tens-of-seconds of XML compilation per episode. At n_episodes=50 × 326 tasks this is a measurable chunk of eval wall-clock. At minimum, only rebuild when seed actually changes.
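One way to implement "only rebuild when seed actually changes" is sketched below. `SeedCachedEnv`, `FakeEnv`, and the `make_env(seed)` factory are hypothetical names used for illustration; the sketch also assumes that calling the inner env's own `reset()` is sufficient between episodes with the same seed, which would need to be confirmed against robosuite's behavior.

```python
from typing import Any, Callable, Optional


class SeedCachedEnv:
    """Rebuild the underlying env only on first use or when the seed changes."""

    def __init__(self, make_env: Callable[[Optional[int]], Any]):
        self._make_env = make_env
        self._env: Any = None
        self._seed: Optional[int] = None
        self.builds = 0  # counts expensive constructions, for inspection

    def reset(self, seed: Optional[int] = None) -> Any:
        seed_changed = seed is not None and seed != self._seed
        if self._env is None or seed_changed:
            if self._env is not None:
                self._env.close()  # release the old simulator before rebuilding
            self._env = self._make_env(seed)
            self._seed = seed
            self.builds += 1
        # Cheap path: reuse the already-compiled env.
        return self._env.reset()


class FakeEnv:
    """Tiny stand-in for the expensive simulator, used only in this sketch."""

    def __init__(self, seed: Optional[int]):
        self.seed = seed

    def reset(self) -> dict:
        return {"seed": self.seed}

    def close(self) -> None:
        pass


wrapper = SeedCachedEnv(FakeEnv)
wrapper.reset(seed=0)  # first call: builds
wrapper.reset(seed=0)  # same seed: reuses
wrapper.reset()        # no seed given: reuses
wrapper.reset(seed=1)  # new seed: rebuilds
print(wrapper.builds)
```

With this shape, the rebuild cost is paid once per distinct seed rather than once per episode.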
8. Two upstream-private calls. self._env._check_success() (step) and self._env._get_observations() (render) are private API. RoboCasa has public equivalents — env.is_success() / env.observe() — please prefer them, or pin a specific robocasa commit ({ git = ..., rev = "<sha>" } instead of branch = "main") so the wrapper doesn't silently break when upstream renames either.
9. Pinning branch = "main" for robocasa and robosuite in [tool.uv.sources] is fragile — every uv sync resolves a different commit. Please pin a rev = "<sha>" for both so this branch is reproducible.
Minor
- `RoboCasaEvalConfig.task: str = "PnPCounterToCab"` (`configs/robocasa.py:39`): `PnPCounterToCab` is the legacy robosuite alias; the registry uses `PickPlaceCounterToCabinet`. Pick one convention everywhere (I'd suggest the dataset-registry form, since that's what the integer index resolves to).
- `pyproject.toml:151-152` comments still say `scripts/robocasa/server.py`; the dir was renamed to `scripts/robocasa_eval/`.
- `src/opentau/scripts/robocasa_eval/server.py:39` Run block says `python -m opentau.scripts.robocasa_server_async`; it should be `opentau.scripts.robocasa_eval.server`.
- `configs/robocasa.py:69` `robocasa: RoboCasaEvalConfig = None`: annotate as `RoboCasaEvalConfig | None`.
- `RoboCasaEvalConfig.task_names` is set in `__post_init__` but isn't a declared field, so `asdict(cfg)` won't include it (consistency surprise vs. other configs).
- `is_atomic` / `is_composite` silently accept `bool` (since `isinstance(True, int)`); `task_name` correctly rejects it. Make them consistent.
- `f"Chunk usage must be between 1 and {self.action_chunk=}"` (`configs/robocasa.py:88`): the `=` debug spec yields `self.action_chunk=10` in the message; drop the `=`.
- PR checklist: this PR adds new env plumbing only, so the "policy-related changes" box being unchecked is fine; just confirm you don't need the nightly regression test signal before merge.
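Two of the minor points above rest on standard Python behavior that is easy to check in isolation. The sketch below uses a plain `action_chunk` variable and a hypothetical `is_task_index` helper rather than the PR's actual config class or registry functions.

```python
action_chunk = 10

# The `=` specifier (Python 3.8+) echoes the expression text before the value.
with_spec = f"Chunk usage must be between 1 and {action_chunk=}"
without_spec = f"Chunk usage must be between 1 and {action_chunk}"
print(with_spec)     # Chunk usage must be between 1 and action_chunk=10
print(without_spec)  # Chunk usage must be between 1 and 10


def is_task_index(value: object) -> bool:
    """Accept real ints only; bool is a subclass of int, so exclude it explicitly."""
    return isinstance(value, int) and not isinstance(value, bool)


print(isinstance(True, int))  # True
print(is_task_index(True))    # False
print(is_task_index(12))      # True
```

Using one shared check like `is_task_index` in `is_atomic`, `is_composite`, and `task_name` would be one way to make the three consistent.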
Happy to re-review once the registry is regenerated and the duplicate eval script / dead config module are resolved.
Generated by Claude Code
@claude fix all points except 5 and 8.
How it was tested
running pi05 robocasa run