---
layout: project_page
permalink: /
title: "MOPS: Multi-Object Photoreal Simulation Dataset for Computer Vision in Robot Manipulation"
authors: Maximilian X. Li, Paul Mattes, Nils Blank, Rudolf Lioutikov
affiliations: Intuitive Robots Lab, Karlsruhe Institute of Technology, Germany
paper: ./static/Li2026_MOPS.pdf
code: https://github.com/LiXiling/mops-data
---

Abstract

Datasets that bridge computer vision and robotics by providing high-quality visual annotations of manipulation-relevant scenes remain scarce. This work introduces the Multi-Object Photoreal Simulation (MOPS) dataset, which provides comprehensive ground truth annotations for photorealistic simulated environments. MOPS employs a zero-shot asset augmentation pipeline based on large language models (LLMs) to automatically normalize 3D object scale and generate part-level affordances. The dataset features pixel-level segmentations for tasks crucial to robotic perception, including fine-grained part segmentation and affordance prediction (e.g., "graspable" or "pushable"). By combining detailed annotations with photorealistic simulation, MOPS generates a vast, diverse collection of scenes to accelerate progress in robot perception and manipulation. We validate MOPS through vision and robot learning benchmarks.

Early Alpha Release — MOPS is under active development. The public API may change and some features are still in progress.

⚙️ mops-data — Image generation in ManiSkill3 (Available)
🤖 mops-il — Full robot trajectories in RoboCasa (v0.1, Coming Soon)

Annotation Modalities

MOPS provides rich, multi-modal ground truth for every scene

RGB Render
Depth Map
Surface Normals
Part / Affordance Segmentation
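The part and affordance modalities are related: each part ID in the segmentation map carries a set of affordance labels, so a per-pixel affordance mask can be derived from the part map. A minimal sketch of that derivation, with part IDs and the part-to-affordance lookup invented for illustration (they are not the actual MOPS label schema):

```python
# Hypothetical mapping from part ID to affordance labels (0 = background).
PART_AFFORDANCES = {
    1: {"graspable"},             # e.g. a mug handle
    2: {"pushable"},              # e.g. a drawer front
    3: {"graspable", "pushable"}, # e.g. a cabinet knob
}

def affordance_mask(seg, affordance):
    """Return a binary mask (same shape as seg) for one affordance label."""
    return [
        [1 if affordance in PART_AFFORDANCES.get(pid, set()) else 0 for pid in row]
        for row in seg
    ]

# A tiny 2x3 "image" of part IDs.
seg = [[0, 1, 1],
       [2, 3, 0]]

print(affordance_mask(seg, "graspable"))  # [[0, 1, 1], [0, 1, 0]]
```

In practice the segmentation would be a full-resolution integer array from the renderer; the lookup-then-mask pattern stays the same.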

Key Features

🎨

Photorealistic Simulation

High-quality visual rendering via ManiSkill3 and SAPIEN, optimized for computer vision tasks in robotic manipulation.

🤖

LLM-Powered Annotation

Zero-shot asset augmentation pipeline using large language models for automatic part-level labeling and semantic understanding.

🏷️

Pixel-Level Segmentation

Detailed ground truth for fine-grained part segmentation and affordance prediction (e.g., graspable, pushable).

🏠

Diverse Environments

Rich indoor scenes including kitchen environments, cluttered tabletops, and isolated object scenarios at scale.
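The LLM-powered annotation step produces structured labels (canonical scale, part names, affordances) that must be validated before entering the dataset. The page does not show the pipeline's actual prompt or output format, so the JSON schema and field names below are assumptions; the sketch only illustrates the parse-and-sanity-check pattern such a zero-shot pipeline needs:

```python
import json

# Hypothetical LLM response for one asset; the real MOPS schema may differ.
RAW_RESPONSE = """
{
  "object": "mug",
  "canonical_height_m": 0.10,
  "parts": [
    {"name": "handle", "affordances": ["graspable"]},
    {"name": "body",   "affordances": ["graspable", "pushable"]}
  ]
}
"""

# Closed vocabulary keeps free-form LLM output from leaking into the labels.
ALLOWED_AFFORDANCES = {"graspable", "pushable", "openable", "pourable"}

def parse_annotation(raw):
    """Parse and sanity-check one LLM annotation before accepting it."""
    ann = json.loads(raw)
    assert 0.001 < ann["canonical_height_m"] < 10.0, "implausible object scale"
    for part in ann["parts"]:
        unknown = set(part["affordances"]) - ALLOWED_AFFORDANCES
        assert not unknown, f"unknown affordance labels: {unknown}"
    return ann

ann = parse_annotation(RAW_RESPONSE)
print(ann["object"], len(ann["parts"]))  # mug 2
```

Rejecting implausible scales and out-of-vocabulary labels at this stage is what makes a zero-shot pipeline usable at dataset scale: bad generations are dropped or retried instead of silently corrupting the ground truth.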


Technical Overview

Asset Pipeline

Normalized asset management across multiple 3D libraries with automatic part-level annotation and semantic scene understanding.

Multi-Modal Ground Truth

Comprehensive annotations including RGB, depth, surface normals, segmentation masks, affordance maps, and 6D pose information.
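A 6D pose annotation is a rotation plus a translation; consumers typically apply it to map points from the object frame into the world frame. A small self-contained sketch (the pose values are illustrative, not taken from the dataset):

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0],
            [s,  c, 0],
            [0,  0, 1]]

def transform(R, t, p):
    """Apply pose (R, t) to point p: world = R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

pose_R = rot_z(math.pi / 2)   # object rotated 90 degrees about z
pose_t = [1.0, 0.0, 0.5]      # object origin in the world frame

p_obj = [1.0, 0.0, 0.0]       # a point on the object, in the object frame
p_world = transform(pose_R, pose_t, p_obj)
print([round(v, 6) for v in p_world])  # [1.0, 1.0, 0.5]
```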

Simulation Framework

Built on ManiSkill3 and SAPIEN for physics-accurate simulation with photorealistic rendering and programmable scene generation.


Dataset Comparison

MOPS provides significantly broader taxonomic coverage than existing datasets

| Dataset       | Level  | Aff. Labels | Obj. Cat. | Objects |
|---------------|--------|-------------|-----------|---------|
| RGB-D Part    | Part   | 7           | 17        | 105     |
| 3D-AffNet     | Part   | 16          | 23        | 22,949  |
| MOPS-PartNet  | Part   | 24          | 46        | 2,345   |
| MOPS-RoboCasa | Object | 44          | 101       | 1,008   |
| MOPS (Total)  | Mixed  | 56          | 137       | 3,353   |

While 3D-AffNet has more instances, MOPS provides significantly higher taxonomic coverage across object categories and affordance types.
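The table's totals are internally consistent: the MOPS object count is exactly the sum of its two subsets, while the category and label totals are smaller than the sums of the subsets, which suggests (an inference here, not stated on the page) that overlapping taxonomy entries are deduplicated:

```python
# Per-subset counts copied from the comparison table above.
partnet  = {"labels": 24, "categories": 46,  "objects": 2345}
robocasa = {"labels": 44, "categories": 101, "objects": 1008}
total    = {"labels": 56, "categories": 137, "objects": 3353}

# Objects add up exactly across the two subsets.
assert partnet["objects"] + robocasa["objects"] == total["objects"]

# Categories and labels shared between the subsets, if totals are deduplicated.
overlap_categories = partnet["categories"] + robocasa["categories"] - total["categories"]
overlap_labels = partnet["labels"] + robocasa["labels"] - total["labels"]
print(overlap_categories, overlap_labels)  # 10 12
```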


Robot Manipulation Results

Imitation learning on 24 RoboCasa tasks, evaluated over 10 environment seeds each

21.25% Success Rate (RGB + MOPS Affordances)
+7.92 pp Absolute Gain (over RGB-only baseline)

| Policy Inputs           | Success Rate | Gain     |
|-------------------------|--------------|----------|
| RGB only                | 13.33%       |          |
| RGB + MOPS Affordances  | 21.25%       | +7.92 pp |

MOPS affordance annotations provide a consistent boost to imitation learning performance across 24 RoboCasa manipulation tasks.
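The percentage-point gain follows directly from the two overall success rates; a relative-improvement figure (computed here for context, not stated on the page) shows the baseline is beaten by more than half its own rate:

```python
# Overall success rates from the results table, in percent.
baseline_rate = 13.33   # RGB only
mops_rate = 21.25       # RGB + MOPS affordances

gain_pp = round(mops_rate - baseline_rate, 2)       # absolute percentage points
gain_rel = round(gain_pp / baseline_rate * 100, 1)  # relative improvement

print(f"+{gain_pp} pp absolute, ~{gain_rel}% relative")  # +7.92 pp absolute, ~59.4% relative
```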


Getting Started

Prerequisites: Python 3.10  ·  CUDA-compatible GPU  ·  16 GB+ RAM

conda create -n mops python=3.10
conda activate mops

pip install mani_skill
git clone https://github.com/LiXiling/mops-data
cd mops-data
pip install -e .

📖 Full Installation Guide →


Citation

If you use MOPS in your research, please cite our work

@article{li2026mops,
  title   = {Multi-Object Photoreal Simulation (MOPS) Dataset
             for Computer Vision in Robot Manipulation},
  author  = {Maximilian Xiling Li and Paul Mattes and
             Nils Blank and Rudolf Lioutikov},
  year    = {2026}
}

This work is supported by the Intuitive Robots Lab at Karlsruhe Institute of Technology, Germany.