
Bad performance in RobotWin evaluation #11

@justimyhxu

Description

Hi, thanks for the excellent work and for releasing the codebase!

I followed the README instructions to set up the evaluation environment for RobotWin with Motus, and tested the rollout trajectories on the task “put the object in the cabinet.” However, I observed a noticeable discrepancy between the reported and reproduced performance.

Test results:

  • Reproduced success rate: ~45%
  • Reported in the paper (supplementary):
    • ~88% in the clean setting
    • ~71% in the randomized setting

I would like to ask:

  1. Are there any additional evaluation details (e.g., environment version, random seed handling, number of rollouts, or success criteria) that might affect this result?
  2. Were the reported numbers averaged over multiple runs or random seeds?
  3. Have you re-run this task multiple times in a fresh environment to confirm that the reported performance is reproducible?

I would really appreciate any guidance on what might cause this gap, or pointers to specific evaluation settings to double-check. I’m happy to provide more details (logs, configs, seeds) if helpful.
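For context, here is a quick back-of-envelope check I did, assuming roughly 100 rollouts per evaluation run (my setting; the exact rollout count behind the paper numbers is one of the things I'd like to confirm). It just computes a binomial confidence interval for the reproduced rate, which suggests the gap is too large to be explained by rollout noise alone:

```python
# Back-of-envelope check: is 45% vs. 88% explainable by rollout noise alone?
# Assumes 100 rollouts per evaluation run (my setting; adjust N as needed).
import math

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial success rate."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

N = 100                     # assumed number of rollouts per run
lo, hi = wilson_ci(45, N)   # my reproduced result: ~45/100 successes
print(f"95% CI for reproduced rate: [{lo:.2f}, {hi:.2f}]")
# -> roughly [0.36, 0.55]; the reported 0.88 (clean) / 0.71 (randomized)
#    lie well outside this interval, so rollout-count variance alone seems
#    unlikely to explain the gap.
```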

Thanks again for sharing such a great project!

