PaveBench is a public benchmark harness for aerial pavement segmentation and pavement-to-polygon extraction.
The benchmark is built around a practical contractor workflow: given a top-down aerial image, return a valid paved-surface mask or polygon with interior cutouts for buildings, islands, grass, trees, and other non-paved holes.
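To make the "polygon with interior cutouts" shape concrete, here is a minimal sketch that computes the net paved area of an outer boundary minus its holes using the shoelace formula. The function names `ring_area` and `net_paved_area` are hypothetical and not part of the harness. The sketch assumes pixel `[x, y]` coordinates, simple (non-self-intersecting) rings, and cutouts that lie inside the boundary without overlapping each other. It does not check validity.

```python
from typing import List, Sequence

Point = Sequence[float]


def ring_area(ring: List[Point]) -> float:
    """Unsigned area of a simple polygon ring via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def net_paved_area(boundary: List[Point], cutouts: List[List[Point]]) -> float:
    """Outer area minus hole areas (assumes disjoint holes inside the boundary)."""
    return ring_area(boundary) - sum(ring_area(c) for c in cutouts)


# A 10x10 paved lot with one 2x2 non-paved island.
boundary = [[0, 0], [10, 0], [10, 10], [0, 10]]
cutouts = [[[2, 2], [4, 2], [4, 4], [2, 4]]]
print(net_paved_area(boundary, cutouts))  # 96.0
```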
Tracks:

- pure_segmentation: image segmentation models that output masks or polygons.
- vlm_polygon: vision-language models that directly emit JSON polygon coordinates.
- image_generation_mask: image-generation/editing models that paint a mask image.
- hybrid_production: production-style pipelines using declared aids such as clicks, user boxes, parcel vectors, image cleaning, ensembles, or post-processing.
The tracks are intentionally separate. Direct VLM polygon generation, true mask segmentation, and hybrid post-processed systems test different capabilities.
Tasks:

- semantic_mask: segment visible pavement in the image.
- click_connected_polygon: return the paved component containing the target click.
- scope_polygon: future task for matching a specific commercial project scope.
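The idea behind click_connected_polygon (keep only the paved component that contains the click) can be sketched on a binary mask with a breadth-first flood fill. This is an illustration under stated assumptions, not the harness's implementation: `connected_component` is a hypothetical helper, the mask is a list of rows with 1 for paved pixels, the click is given as `(row, col)`, and 4-connectivity is an arbitrary choice. Converting the resulting pixel set into a polygon is left out.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def connected_component(mask: List[List[int]], click: Pixel) -> Set[Pixel]:
    """Return the (row, col) pixels 4-connected to the click.

    Returns an empty set when the click is out of bounds or not on pavement.
    """
    rows, cols = len(mask), len(mask[0])
    r0, c0 = click
    if not (0 <= r0 < rows and 0 <= c0 < cols) or not mask[r0][c0]:
        return set()

    seen = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and mask[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


# Two separate paved regions; the click at (0, 0) selects only the first.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
component = connected_component(mask, (0, 0))
print(sorted(component))  # [(0, 0), (0, 1), (1, 0)]
```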
Do not use Google Maps Static API, Map Tiles API, or cached Google imagery in public PaveBench datasets.
Maps Static API is a Google Maps Platform Core Service, and Google Maps Platform terms restrict caching, extracting, and creating content from Google Maps Content. The terms also prohibit using Google Maps Content to train, test, validate, or fine-tune ML/AI models. See docs/google-static-maps-use.md.
Saved ProPaving human-traced paved surfaces are useful as guide material. They can seed candidate cases, rough boundaries, or reviewer starting points.
They are not automatically benchmark truth. A public PaveBench gold label must be reviewed against redistributable imagery and marked as reviewed_gold in metadata.
python3 -m pytest
python3 -m pavebench.cli oracle \
--allow-guide \
--manifest dataset/v0/manifest.example.jsonl \
--out results/oracle.example.jsonl
python3 -m pavebench.cli empty \
--allow-guide \
--manifest dataset/v0/manifest.example.jsonl \
--out results/empty.example.jsonl
python3 -m pavebench.cli score-manifest \
--allow-guide \
--manifest dataset/v0/manifest.example.jsonl \
--predictions results/oracle.example.jsonl \
--out results/oracle-score.json
python3 -m pavebench.cli score \
--allow-guide \
--case dataset/v0/cases/demo_human_trace_guided/metadata.json \
--predictions dataset/v0/cases/demo_human_trace_guided/predictions.example.jsonl \
--out results/demo-score.json

Create a review-guide case from a saved human trace:
python3 -m pavebench.cli case-from-trace \
--trace path/to/human-trace.geojson \
--case-id pb_us_example_001 \
--image-width 1024 \
--image-height 1024 \
--out-dir dataset/v0/cases/pb_us_example_001

The generated case is marked role: guide and reviewStatus: needs_gold_review. It must be reviewed against redistributable public imagery before becoming benchmark truth. Scoring guide cases requires --allow-guide; reviewed benchmark cases should not use that flag.
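The guide-versus-gold rule described above can be sketched as a small gate over case metadata. This is a hedged illustration, not how pavebench.cli is implemented: `may_score` is a hypothetical function, and storing `reviewed_gold` as a value of `reviewStatus` is an assumption, since the text only says the label is recorded in metadata.

```python
from typing import Mapping


def may_score(metadata: Mapping[str, str], allow_guide: bool) -> bool:
    """Decide whether a case may be scored.

    Assumption: reviewed cases carry reviewStatus == "reviewed_gold".
    Guide cases are scored only when the caller opts in, mirroring --allow-guide.
    """
    if metadata.get("reviewStatus") == "reviewed_gold":
        return True
    return allow_guide and metadata.get("role") == "guide"


guide_case = {"role": "guide", "reviewStatus": "needs_gold_review"}
print(may_score(guide_case, allow_guide=False))  # False
print(may_score(guide_case, allow_guide=True))   # True
```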
Predictions are JSONL, with one JSON object per line. The examples here are pretty-printed for readability:
{
"caseId": "demo_human_trace_guided",
"task": "click_connected_polygon",
"track": "vlm_polygon",
"boundary": [[1, 1], [9, 1], [9, 9], [1, 9]],
"cutouts": [],
"latencyMs": 1200,
"costUsd": 0.01,
"metadata": {"model": "example"}
}

Mask predictions are also supported:
{
"caseId": "demo_human_trace_guided",
"task": "semantic_mask",
"track": "pure_segmentation",
"maskPath": "relative/or/absolute/mask.png",
"metadata": {"model": "example"}
}

See docs/submission-format.md for the full contract.
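Because JSONL stores one JSON object per line, a predictions file should not contain pretty-printed, multi-line objects. The following minimal sketch serializes records using the field names from the examples above; `write_jsonl` is a hypothetical helper, not part of the harness, and docs/submission-format.md remains the authority on required fields.

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO


def write_jsonl(records: Iterable[Dict[str, Any]], stream: TextIO) -> None:
    """Write each record as a single compact JSON line."""
    for record in records:
        stream.write(json.dumps(record, separators=(",", ":")) + "\n")


records = [
    {
        "caseId": "demo_human_trace_guided",
        "task": "click_connected_polygon",
        "track": "vlm_polygon",
        "boundary": [[1, 1], [9, 1], [9, 9], [1, 9]],
        "cutouts": [],
        "latencyMs": 1200,
        "costUsd": 0.01,
        "metadata": {"model": "example"},
    },
    {
        "caseId": "demo_human_trace_guided",
        "task": "semantic_mask",
        "track": "pure_segmentation",
        "maskPath": "relative/or/absolute/mask.png",
        "metadata": {"model": "example"},
    },
]

# An in-memory buffer stands in for a real predictions file.
buffer = io.StringIO()
write_jsonl(records, buffer)
lines = buffer.getvalue().splitlines()
print(len(lines))  # 2
```

To write to disk instead, pass a file opened with `open(path, "w", encoding="utf-8")` in place of the buffer.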
This repository is a v0 harness scaffold. It includes evaluator code, manifest scoring, oracle and empty baselines, human-trace case scaffolding, documentation, and a toy synthetic guide case. It does not yet include real public-domain aerial imagery.