This document describes how to run the LabelAny3D pipeline on the COCO dataset to generate 3D bounding box annotations.
From the repository root directory:
```bash
# Download COCO images + COCONUT annotations
# Note: the COCONUT annotations for the COCO training set take more than an hour to process.
bash src/download_coco.sh

# Or download separately:
bash src/download_coco.sh --coco     # Only COCO images
bash src/download_coco.sh --coconut  # Only COCONUT annotations
```

This creates:
```
dataset/coco/
├── images/
│   ├── train2017/
│   └── val2017/
└── annotations/
    ├── instances_train2017.json
    ├── instances_val2017.json
    ├── coconut_train.json
    └── coconut_val.json
```
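Before kicking off the pipeline, it is worth verifying that the download produced the layout above. The helper below is a hypothetical sketch (not part of the repo); it only checks for the paths listed in the tree:

```python
import os

# Expected layout produced by download_coco.sh (paths taken from the tree above)
EXPECTED = [
    "images/train2017",
    "images/val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/coconut_train.json",
    "annotations/coconut_val.json",
]

def check_dataset(root: str) -> list[str]:
    """Return the expected paths that are missing under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

missing = check_dataset("dataset/coco")
if missing:
    print("Missing:", *missing, sep="\n  ")
else:
    print("Dataset layout looks complete.")
```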
All pipeline commands should be run from the `src/` directory:

```bash
cd src
```

**Depth estimation.** Combines MoGe (scale-invariant) and DepthPro (metric) depth with RANSAC alignment.
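The idea behind the alignment can be illustrated with a toy version: a scale-invariant depth map is only defined up to an unknown scale and shift, so a scale/shift pair is fit against the metric depth with RANSAC to reject outliers. A minimal NumPy sketch, assuming a simple per-pixel `d_met ≈ s * d_rel + t` model (the function name and thresholds are illustrative; `batch_scripts/depth.py` may differ):

```python
import numpy as np

def ransac_scale_shift(d_rel, d_met, iters=200, thresh=0.05, seed=0):
    """Robustly fit d_met ~ s * d_rel + t from flattened depth samples."""
    rng = np.random.default_rng(seed)
    best, best_inliers = (1.0, 0.0), 0
    for _ in range(iters):
        i, j = rng.choice(len(d_rel), size=2, replace=False)
        if d_rel[i] == d_rel[j]:
            continue
        # Exact fit through two sampled points
        s = (d_met[i] - d_met[j]) / (d_rel[i] - d_rel[j])
        t = d_met[i] - s * d_rel[i]
        # Inliers: relative residual below `thresh`
        inliers = np.abs(s * d_rel + t - d_met) < thresh * d_met
        n = int(inliers.sum())
        if n > best_inliers:
            best_inliers = n
            # Refine with least squares on the inlier set
            A = np.stack([d_rel[inliers], np.ones(n)], axis=1)
            s_ref, t_ref = np.linalg.lstsq(A, d_met[inliers], rcond=None)[0]
            best = (float(s_ref), float(t_ref))
    return best
```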
```bash
python batch_scripts/depth.py --start_index=0 --end_index=1000 --split=val
```

**Super-resolution.** Upscales images using InvSR.
```bash
python batch_scripts/enhance.py --start_index=0 --end_index=1000 --split=val
```

**Object crops.** Extracts individual object crops using COCONUT segmentation masks.
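Conceptually, a crop is the mask's bounding box cut out of the image, with the mask as the alpha channel so background pixels become transparent. A toy NumPy sketch (the actual script also handles the super-resolved images and COCONUT mask decoding; names here are illustrative):

```python
import numpy as np

def crop_rgba(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Cut the mask's bounding box out of an HxWx3 uint8 image as an RGBA crop."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    rgb = image[y0:y1, x0:x1]
    # Binary mask -> alpha channel: 255 inside the object, 0 outside
    alpha = (mask[y0:y1, x0:x1] * 255).astype(np.uint8)
    return np.dstack([rgb, alpha])
```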
```bash
python batch_scripts/get_crops_enhanced.py --start_index=0 --end_index=1000 --split=val
```

**Amodal completion.** Completes occluded regions of object crops.
```bash
python batch_scripts/completion.py --start_index=0 --end_index=1000 --split=val
```

**Elevation estimation.** Estimates the viewing angle for each object.
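As a toy illustration of what the viewing (elevation) angle means geometrically: for an object at camera-frame coordinates (x, y, z), with y pointing down and z forward (an assumed convention; the actual script estimates the angle from images), the elevation is the angle between the optical-axis plane and the camera-to-object ray:

```python
import math

def elevation_deg(x: float, y: float, z: float) -> float:
    """Elevation of the camera-to-object ray, in degrees.
    Positive when the camera looks down at the object (y points down)."""
    return math.degrees(math.atan2(y, math.hypot(x, z)))

# Object 1 m below the optical axis and 1 m ahead -> 45-degree downward view
print(elevation_deg(0.0, 1.0, 1.0))
```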
```bash
python batch_scripts/elevation.py --start_index=0 --end_index=1000 --split=val
```

**3D reconstruction.** Reconstructs each object in 3D using TRELLIS.
```bash
# Set compiler paths (required for TRELLIS)
export CC=$(which gcc)
export CXX=$(which g++)

python batch_scripts/reconstruction.py --start_index=0 --end_index=1000 --split=val
```

**Scene alignment.** Aligns reconstructed objects into the scene using depth-guided placement.
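Depth-guided placement rests on pinhole backprojection: given the camera intrinsics and the aligned depth at an object's 2D location, the object's position in camera space follows directly. A minimal sketch, using the mask's median depth as a robust placement depth (the helper names and intrinsics parameters are illustrative; see `cam_params.json` for the actual format):

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def object_center(depth_map, mask, fx, fy, cx, cy):
    """Place an object at the median depth of its mask (robust to mask noise)."""
    ys, xs = np.nonzero(mask)
    z = float(np.median(depth_map[ys, xs]))
    return backproject(xs.mean(), ys.mean(), z, fx, fy, cx, cy)
```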
```bash
python batch_scripts/whole.py --start_index=0 --end_index=1000 --split=val
```

**Combine results.** Merges all scene results into a single Omni3D-format JSON.
```bash
python tools/combine_results.py --split=val
```

Output: `../experimental_results/COCO/COCO3D_val.json`
All batch scripts support the following arguments:

| Argument | Description | Default |
|---|---|---|
| `--start_index` | Start image index | `0` |
| `--end_index` | End image index | `-1` (all) |
| `--split` | Dataset split | `val` |
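Since every batch script takes the same `--start_index`/`--end_index` pair, splitting a run into chunks is just a matter of generating index ranges. A small hypothetical helper (not part of the repo):

```python
def index_chunks(n_images: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, n_images) into (start_index, end_index) pairs of at most chunk_size."""
    return [(s, min(s + chunk_size, n_images)) for s in range(0, n_images, chunk_size)]

# Print one depth.py invocation per chunk
for start, end in index_chunks(1000, 300):
    print(f"python batch_scripts/depth.py --start_index={start} --end_index={end} --split=val")
```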
The pipeline writes its results under `experimental_results/COCO/`:

```
experimental_results/COCO/
├── val/
│   ├── 000000000139/                 # Scene folder (image ID)
│   │   ├── input.png                 # Original image
│   │   ├── cam_params.json           # Camera intrinsics
│   │   ├── depth_map.npy             # Aligned depth map
│   │   ├── depth_scene.ply           # Scene point cloud
│   │   ├── depth_scene_no_edge.ply   # Point cloud (no edge artifacts)
│   │   ├── bboxes.json               # 2D bboxes from COCONUT
│   │   ├── 3dbbox.json               # 3D bounding boxes
│   │   ├── enhanced/                 # Super-resolved images
│   │   ├── crops/                    # Object crops
│   │   │   ├── 0_chair_reproj.png    # Original object crop
│   │   │   ├── 0_chair_rgba.png      # Amodal-completed crop
│   │   │   └── 0_chair_crop_params.npy
│   │   ├── object_space/             # Per-object intermediate results
│   │   ├── reconstruction/           # 3D meshes
│   │   │   ├── 0_chair.glb
│   │   │   └── full_scene.glb
│   │   └── scene_bbox.mp4            # Rendered video with bboxes
│   └── ...
└── COCO3D_val.json                   # Combined annotations (Omni3D format)
```
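The machine-readable per-scene artifacts can be consumed with standard `json`/NumPy loading. The snippet below just walks scene folders and reads the three files named above; the internal structure of `3dbbox.json` is not documented here, so the sketch stops at loading:

```python
import json
import os
import numpy as np

def load_scene(scene_dir: str):
    """Load the camera intrinsics, aligned depth map, and 3D boxes for one scene."""
    with open(os.path.join(scene_dir, "cam_params.json")) as f:
        cam = json.load(f)
    depth = np.load(os.path.join(scene_dir, "depth_map.npy"))
    with open(os.path.join(scene_dir, "3dbbox.json")) as f:
        boxes = json.load(f)
    return cam, depth, boxes

root = "../experimental_results/COCO/val"
if os.path.isdir(root):
    for scene_id in sorted(os.listdir(root)):
        cam, depth, boxes = load_scene(os.path.join(root, scene_id))
        print(scene_id, depth.shape)
```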
Render 3D scenes with bounding box overlays using Blender.
```bash
# Install trimesh into Blender's Python
blender --background --python-expr "import subprocess, sys; subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'trimesh'])"

blender --background --python bpy_render/bpy_load_blender_pointmap_plot.py -- \
    --root ../experimental_results/COCO/val [--start_idx 0] [--end_idx 10] [--verbose]
```

Arguments:
| Argument | Description | Default |
|---|---|---|
| `--root` | Root directory containing scene folders | (required) |
| `--start_idx` | Start directory index | `0` |
| `--end_idx` | End directory index | last |
| `--verbose` | Show rendering progress | `false` |
Output: a `scene_bbox.mp4` in each scene folder, a camera-trajectory video with the 3D bounding boxes overlaid.
For HPC clusters, you can process images in parallel using SLURM array jobs:

```bash
#!/bin/bash
#SBATCH --array=0-9
#SBATCH --gres=gpu:1

# Each array task handles a contiguous block of 100 image indices
START=$((SLURM_ARRAY_TASK_ID * 100))
END=$((START + 100))

python batch_scripts/depth.py --start_index=$START --end_index=$END --split=val
```