Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning

Official implementation of SBP (Seeing the Bigger Picture) (ICRA 2026).

Installation

git clone --recursive https://github.com/ExistentialRobotics/SBP.git
bash setup.sh
conda activate sbp

All dependencies, including PyTorch Geometric and xformers, are installed automatically by setup.sh.

Mapping Dataset Generation

1. Generate RGB-D Dataset

Render RGB-D data from ManiSkill environments using the camera poses in dataset/camera_params/. The output is written in HDF5 format.

python dataset/render_from_camera_poses.py \
    --task set_table --build_config_idx 37 --task_plan_idx 0 --output_dir data/mapping

See the mshab repository for details on task parameters.
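To render several task plans for the same scene, the command above can be generated in a loop. The sketch below only assembles the command lines and does not run them. The helper name `render_command` and the plan indices 0 to 2 are assumptions for illustration; check the mshab repository for the indices that are valid for your task.

```python
import shlex


def render_command(task, build_config_idx, task_plan_idx, output_dir="data/mapping"):
    """Return the argv list for one render_from_camera_poses.py call.

    Only the flags shown in this README are used.
    """
    return [
        "python", "dataset/render_from_camera_poses.py",
        "--task", task,
        "--build_config_idx", str(build_config_idx),
        "--task_plan_idx", str(task_plan_idx),
        "--output_dir", output_dir,
    ]


# Hypothetical plan indices; valid values depend on the task.
commands = [render_command("set_table", 37, idx) for idx in (0, 1, 2)]

for argv in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(argv))
```

Each list can also be passed to `subprocess.run(argv, check=True)` from the repository root to execute the renders one after another.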

2. Generate Vision Embeddings

Extract DINOv3 or EVA-CLIP embeddings and write them back to the HDF5 file.

python dataset/generate_embedding.py --model eva_clip \
    --input_path data/mapping/set_table/<episode_name>.hdf5

Latent Mapping

Train the latent map on the generated HDF5 dataset:

python mapping/train_latent_map.py --config mapping/config/config.yaml

Override paths with --dataset_dir and --output_dir. Visualize results at localhost:8080 via Viser.

To train on multiple episodes, place all episode_*.hdf5 files in the same --dataset_dir directory. They are loaded automatically.
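Before starting a multi-episode run, it can help to check which files match the episode_*.hdf5 pattern. The following is a minimal sketch of that check, not the loader used by train_latent_map.py; the helper name `find_episode_files` is hypothetical, and the demo uses empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def find_episode_files(dataset_dir):
    """Return episode_*.hdf5 files directly inside dataset_dir, sorted by name."""
    return sorted(Path(dataset_dir).glob("episode_*.hdf5"))


# Demo on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("episode_0.hdf5", "episode_1.hdf5", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in find_episode_files(tmp)]
    print(found)
```

The sort is lexicographic, so episode_10.hdf5 is listed before episode_2.hdf5. Files that do not match the pattern, such as notes.txt in the demo, are ignored.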

Map-Conditioned Policy Learning

Training a map-conditioned BC policy requires two prerequisites:

Latent maps: train your own by following the Latent Mapping section above, or download pre-trained maps:
```
huggingface-cli download suk063/SBP models --repo-type dataset --local-dir data/
```
Pre-trained maps are available at: https://huggingface.co/datasets/suk063/SBP/tree/main/models
Expert demonstrations: generated by PPO RL policies from the mshab repository. Our pre-generated demonstrations can be downloaded from HuggingFace:
```
huggingface-cli download suk063/SBP demonstrations --repo-type dataset --local-dir data/
```
The full dataset is also available at: https://huggingface.co/datasets/suk063/SBP

Training

Train a map-conditioned policy (e.g., set_table task):

python policy/train_bc.py policy/configs/set_table.yml \
    algo.data_dir_fp=<path_to_demo_data>

Task-specific configs are available under policy/configs/ (e.g., set_table.yml, prepare_groceries.yml, tidy_house.yml).
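The `algo.data_dir_fp=<path_to_demo_data>` argument is a dotted key=value override: the part before `=` names a nested config entry and the part after it is the new value. The sketch below illustrates that idea on a plain dictionary. It is not the parser used by train_bc.py; the helper `apply_override` and the example path are hypothetical, and values are kept as strings without type conversion.

```python
def apply_override(config, override):
    """Set a nested config entry from a 'dotted.key=value' string.

    Intermediate dictionaries are created if they are missing.
    The value is stored as a string.
    """
    key_path, sep, raw_value = override.partition("=")
    if not sep or not key_path:
        raise ValueError(f"expected 'key=value', got {override!r}")
    keys = key_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = raw_value
    return config


# Stand-in for a config loaded from a task YAML file.
config = {"algo": {"data_dir_fp": None}}

# Hypothetical demonstration path, for illustration only.
apply_override(config, "algo.data_dir_fp=data/demonstrations/set_table")
print(config["algo"]["data_dir_fp"])
```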

Alternatively, use the provided training script which handles path setup, resumption, and environment configuration automatically:

bash scripts/run_train.sh set_table

Evaluation

Evaluate a trained policy checkpoint:

python policy/eval.py policy/configs/set_table.yml \
    ckpt_path=<path_to_checkpoint>

Alternatively, use the provided evaluation script:

bash scripts/run_eval.sh set_table

Acknowledgement

We thank the authors of ManiSkill3 and mshab for their open-source contributions!

Citation

@inproceedings{kim2025seeing,
  title={Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning},
  author={Kim, Sunghwan and Chung, Woojeh and Dai, Zhirui and Bhatt, Dwait and Shukla, Arth and Su, Hao and Tian, Yulun and Atanasov, Nikolay},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026}
}
