---
layout: project_page
permalink: /
title: "MOPS: Multi-Object Photoreal Simulation Dataset for Computer Vision in Robot Manipulation"
authors: Maximilian X. Li, Paul Mattes, Nils Blank, Rudolf Lioutikov
affiliations: Intuitive Robots Lab, Karlsruhe Institute of Technology, Germany
paper: ./static/Li2026_MOPS.pdf
code: https://github.com/LiXiling/mops-data
---

Abstract

Datasets that bridge computer vision and robotics by providing high-quality visual annotations of manipulation-relevant scenes remain scarce. This work introduces the Multi-Object Photoreal Simulation (MOPS) dataset, which provides comprehensive ground truth annotations for photorealistic simulated environments. MOPS employs a zero-shot asset augmentation pipeline based on large language models (LLMs) to automatically normalize 3D object scale and generate part-level affordances. The dataset features pixel-level segmentations for tasks crucial to robotic perception, including fine-grained part segmentation and affordance prediction (e.g., "graspable" or "pushable"). By combining detailed annotations with photorealistic simulation, MOPS generates a vast, diverse collection of scenes to accelerate progress in robot perception and manipulation. We validate MOPS through vision and robot learning benchmarks.

Early Alpha Release — MOPS is under active development. The public API may change and some features are still in progress.

⚙️ mops-data — Image generation in ManiSkill3 (Available)
🤖 mops-il — Full robot trajectories in RoboCasa (v0.1, Coming Soon)

Annotation Modalities

MOPS provides rich, multi-modal ground truth for every scene

RGB Render
Depth Map
Surface Normals
Part / Affordance Segmentation
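The part and affordance modalities are related: each part ID in the segmentation map carries a set of affordance labels, so a per-pixel affordance mask can be derived from the part map. A minimal sketch of that derivation, with part IDs and the part-to-affordance lookup invented for illustration (they are not the actual MOPS label schema):

```python
# Hypothetical mapping from part ID to affordance labels (0 = background).
PART_AFFORDANCES = {
    1: {"graspable"},             # e.g. a mug handle
    2: {"pushable"},              # e.g. a drawer front
    3: {"graspable", "pushable"}, # e.g. a cabinet knob
}

def affordance_mask(seg, affordance):
    """Return a binary mask (same shape as seg) for one affordance label."""
    return [
        [1 if affordance in PART_AFFORDANCES.get(pid, set()) else 0 for pid in row]
        for row in seg
    ]

# A tiny 2x3 "image" of part IDs.
seg = [[0, 1, 1],
       [2, 3, 0]]

print(affordance_mask(seg, "graspable"))  # [[0, 1, 1], [0, 1, 0]]
```

In practice the segmentation would be a full-resolution integer array from the renderer; the lookup-then-mask pattern stays the same.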

Key Features

🎨

Photorealistic Simulation

High-quality visual rendering via ManiSkill3 and SAPIEN, optimized for computer vision tasks in robotic manipulation.

🤖

LLM-Powered Annotation

Zero-shot asset augmentation pipeline using large language models for automatic part-level labeling and semantic understanding.

🏷️

Pixel-Level Segmentation

Detailed ground truth for fine-grained part segmentation and affordance prediction (e.g., graspable, pushable).

🏠

Diverse Environments

Rich indoor scenes including kitchen environments, cluttered tabletops, and isolated object scenarios at scale.
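The LLM-powered annotation step produces structured labels (canonical scale, part names, affordances) that must be validated before entering the dataset. The page does not show the pipeline's actual prompt or output format, so the JSON schema and field names below are assumptions; the sketch only illustrates the parse-and-sanity-check pattern such a zero-shot pipeline needs:

```python
import json

# Hypothetical LLM response for one asset; the real MOPS schema may differ.
RAW_RESPONSE = """
{
  "object": "mug",
  "canonical_height_m": 0.10,
  "parts": [
    {"name": "handle", "affordances": ["graspable"]},
    {"name": "body",   "affordances": ["graspable", "pushable"]}
  ]
}
"""

# Closed vocabulary keeps free-form LLM output from leaking into the labels.
ALLOWED_AFFORDANCES = {"graspable", "pushable", "openable", "pourable"}

def parse_annotation(raw):
    """Parse and sanity-check one LLM annotation before accepting it."""
    ann = json.loads(raw)
    assert 0.001 < ann["canonical_height_m"] < 10.0, "implausible object scale"
    for part in ann["parts"]:
        unknown = set(part["affordances"]) - ALLOWED_AFFORDANCES
        assert not unknown, f"unknown affordance labels: {unknown}"
    return ann

ann = parse_annotation(RAW_RESPONSE)
print(ann["object"], len(ann["parts"]))  # mug 2
```

Rejecting implausible scales and out-of-vocabulary labels at this stage is what makes a zero-shot pipeline usable at dataset scale: bad generations are dropped or retried instead of silently corrupting the ground truth.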


Technical Overview

Asset Pipeline

Normalized asset management across multiple 3D libraries with automatic part-level annotation and semantic scene understanding.

Multi-Modal Ground Truth

Comprehensive annotations including RGB, depth, surface normals, segmentation masks, affordance maps, and 6D pose information.
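A 6D pose annotation is a rotation plus a translation; consumers typically apply it to map points from the object frame into the world frame. A small self-contained sketch (the pose values are illustrative, not taken from the dataset):

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0],
            [s,  c, 0],
            [0,  0, 1]]

def transform(R, t, p):
    """Apply pose (R, t) to point p: world = R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

pose_R = rot_z(math.pi / 2)   # object rotated 90 degrees about z
pose_t = [1.0, 0.0, 0.5]      # object origin in the world frame

p_obj = [1.0, 0.0, 0.0]       # a point on the object, in the object frame
p_world = transform(pose_R, pose_t, p_obj)
print([round(v, 6) for v in p_world])  # [1.0, 1.0, 0.5]
```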

Simulation Framework

Built on ManiSkill3 and SAPIEN for physics-accurate simulation with photorealistic rendering and programmable scene generation.


Dataset Comparison

MOPS provides significantly broader taxonomic coverage than existing datasets

| Dataset       | Level  | Aff. Labels | Obj. Cat. | Objects |
|---------------|--------|-------------|-----------|---------|
| RGB-D Part    | Part   | 7           | 17        | 105     |
| 3D-AffNet     | Part   | 16          | 23        | 22,949  |
| MOPS-PartNet  | Part   | 24          | 46        | 2,345   |
| MOPS-RoboCasa | Object | 44          | 101       | 1,008   |
| MOPS (Total)  | Mixed  | 56          | 137       | 3,353   |

While 3D-AffNet has more instances, MOPS provides significantly higher taxonomic coverage across object categories and affordance types.
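The table's totals are internally consistent: the MOPS object count is exactly the sum of its two subsets, while the category and label totals are smaller than the sums of the subsets, which suggests (an inference here, not stated on the page) that overlapping taxonomy entries are deduplicated:

```python
# Per-subset counts copied from the comparison table above.
partnet  = {"labels": 24, "categories": 46,  "objects": 2345}
robocasa = {"labels": 44, "categories": 101, "objects": 1008}
total    = {"labels": 56, "categories": 137, "objects": 3353}

# Objects add up exactly across the two subsets.
assert partnet["objects"] + robocasa["objects"] == total["objects"]

# Categories and labels shared between the subsets, if totals are deduplicated.
overlap_categories = partnet["categories"] + robocasa["categories"] - total["categories"]
overlap_labels = partnet["labels"] + robocasa["labels"] - total["labels"]
print(overlap_categories, overlap_labels)  # 10 12
```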


Robot Manipulation Results

Imitation learning on 24 RoboCasa tasks, evaluated over 10 environment seeds each

21.25% Success Rate (RGB + MOPS Affordances)
+7.92 pp Absolute Gain (over RGB-only baseline)

| Policy Inputs           | Success Rate | Gain     |
|-------------------------|--------------|----------|
| RGB only                | 13.33%       |          |
| RGB + MOPS Affordances  | 21.25%       | +7.92 pp |

MOPS affordance annotations provide a consistent boost to imitation learning performance across 24 RoboCasa manipulation tasks.
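The percentage-point gain follows directly from the two overall success rates; a relative-improvement figure (computed here for context, not stated on the page) shows the baseline is beaten by more than half its own rate:

```python
# Overall success rates from the results table, in percent.
baseline_rate = 13.33   # RGB only
mops_rate = 21.25       # RGB + MOPS affordances

gain_pp = round(mops_rate - baseline_rate, 2)       # absolute percentage points
gain_rel = round(gain_pp / baseline_rate * 100, 1)  # relative improvement

print(f"+{gain_pp} pp absolute, ~{gain_rel}% relative")  # +7.92 pp absolute, ~59.4% relative
```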


Getting Started

Prerequisites: Python 3.10  ·  CUDA-compatible GPU  ·  16 GB+ RAM

conda create -n mops python=3.10
conda activate mops

pip install mani_skill
git clone https://github.com/LiXiling/mops-data
cd mops-data
pip install -e .

📖 Full Installation Guide →


Citation

If you use MOPS in your research, please cite our work

@article{li2026mops,
  title   = {Multi-Object Photoreal Simulation (MOPS) Dataset
             for Computer Vision in Robot Manipulation},
  author  = {Maximilian Xiling Li and Paul Mattes and
             Nils Blank and Rudolf Lioutikov},
  year    = {2026}
}

This work is supported by the Intuitive Robots Lab at Karlsruhe Institute of Technology, Germany.