Identify unmapped waterways in the Philippines by comparing terrain-modeled stream networks (derived from a 30m DEM) against existing OpenStreetMap data. Output is a prioritized grid of cells where OSM coverage lags the terrain model — ready for import into MapRoulette or the HOT Tasking Manager.
- Download — fetch Copernicus DEM 30m tiles, OSM waterways, and named lake polygons for the target area via Overpass API (or a local file via
--osm-file). - Model streams — run a hydrological analysis in GRASS GIS using single-flow-direction (D8) routing (flow accumulation → stream extraction → Strahler order). Named lakes (≥ 1 km²) are masked out before watershed analysis so DEM streams are not routed through large water bodies. Optionally, OSM waterways are burned into the DEM (
--carve) before watershed analysis so the modeled network follows known channels. - Overlay — intersect modeled streams and OSM waterways with a 200 m grid; compute total mapped length per cell.
- Score gaps — compute
delta_m = modeled_length − osm_lengthper cell; assign priority (low / medium / high) based on density and stream order. - Export — write
gap_analysis.geojson(WGS84) for use in tasking platforms.
| Tool | Install |
|---|---|
| GRASS GIS 8.4 | grass.osgeo.org — macOS app at /Applications/GRASS-8.4.app |
| Python 3 | system or pyenv |
| geopandas, pandas | pip3 install geopandas pandas |
| GDAL / ogr2ogr | bundled with GRASS or brew install gdal |
| osmium-tool | arch -arm64 brew install osmium-tool (Apple Silicon) |
Run the one-time setup script to install everything:
bash 00_setup.shbash run_pipeline.sh --name <slug> --bbox "<S> <W> <N> <E>" [options]| Option | Default | Description |
|---|---|---|
--name |
required | Short slug used for output paths (e.g. siquijor) |
--bbox |
required | Bounding box as "S W N E" in decimal degrees |
--threshold |
200 | Flow accumulation threshold (cells) for stream extraction |
--cell-size |
200 | Grid cell size in metres |
--skip-download |
off | Skip download step if data already present |
--osm-file |
off | Path to local OSM file (.pbf, .gpkg, .shp, .geojson); skips Overpass |
--carve |
off | Burn OSM waterways into DEM before watershed analysis; improves alignment of modeled streams with mapped channels |
--carve-width |
90 | Width of carved channel in metres (should be ≥ 30 — one DEM pixel) |
--carve-depth |
5.0 | Depth of carved channel in metres |
--mexp |
0 | Montgomery–Foufoula–Georgiou exponent for r.stream.extract; 0 disables (standard flow accumulation threshold) |
# First run — downloads everything
bash run_pipeline.sh --name siquijor --bbox "9.0 123.4 9.4 123.8"
# Re-run after tweaking threshold, skip re-download
bash run_pipeline.sh --name siquijor --bbox "9.0 123.4 9.4 123.8" --threshold 500 --skip-download
# Different island
bash run_pipeline.sh --name catanduanes --bbox "13.4791 123.9807 14.1459 124.4971"
# Large area — use Geofabrik PBF instead of Overpass (avoids timeout)
bash run_pipeline.sh --name luzon --bbox "13.5 119.5 18.7 122.5" \
--osm-file /path/to/philippines-latest.osm.pbf
# PBF + skip DEM re-download if tiles already cached
bash run_pipeline.sh --name luzon --bbox "13.5 119.5 18.7 122.5" \
--osm-file /path/to/philippines-latest.osm.pbf --skip-download
# Recommended for Philippines runs: OSM stream burning
bash run_pipeline.sh --name ilocos-norte --bbox "17.675 120.339 18.680 120.987" \
--osm-file philippines-latest.osm.pbf --skip-download --cell-size 500 \
--carveOutput is written to output/<name>/gap_analysis.geojson.
| Field | Description |
|---|---|
cat |
GRASS grid cell ID |
modeled_length_m |
Total DEM-modeled stream length in cell (m) |
osm_length_m |
Total OSM-mapped waterway length in cell (m) |
delta_m |
Gap in metres (modeled − osm, clipped to 0) |
delta_density |
delta_m / cell_side_m — scale-invariant gap density (same value in 200 m or 500 m cells for equal gap length) |
cell_area_m2 |
Cell area (m²) — partial for edge cells |
max_strahler |
Highest Strahler stream order in cell |
priority |
low / medium / high |
delta_density |
Priority |
|---|---|
| 0 – 0.5 | low |
| 0.5 – 1.0 | medium |
| > 1.0 | high |
Cells with max_strahler ≥ 3 and delta_m > 0 are promoted to high regardless of density — significant tributaries are never buried in low/medium.
river, stream, canal, drain, ditch, tidal_channel
.
├── run_pipeline.sh entry point
├── 00_setup.sh one-time dependency install
├── 00_download.sh download DEM tiles + OSM data for any bbox
├── 01_grass_hydro.sh GRASS GIS spatial pipeline
├── 02_grid_analysis.py gap scoring and GeoJSON export
├── data/
│ ├── srtm/ Copernicus DEM 30m tiles (shared cache, gitignored)
│ ├── osm/ OSM waterway files per area (gitignored)
│ └── boundary/ Admin boundaries (gitignored)
└── output/
└── <name>/ Per-area outputs including gap_analysis.geojson
The grass/ GRASS database is created at runtime and is gitignored.
- DEM tiles are cached in
data/srtm/and reused across runs. Multiple areas in the same UTM zone share tiles without conflict. - When invoked via
run_pipeline.sh, only the tiles that cover the target bbox are imported. The GRASS computational region is then clipped to the bbox + 5 km buffer, so grid cells and stream analysis are confined to the target area rather than the full tile extent. - Stream threshold of 200 cells ≈ 0.18 km² contributing area. Increase to 500–1000 for noisier/flatter terrain.
- Copernicus DEM 30m is available from a public AWS S3 bucket — no authentication required.
- For large areas (e.g. whole Philippines), Overpass times out. Download
philippines-latest.osm.pbffrom Geofabrik and pass it via--osm-file; the pipeline filters and clips it automatically. - Named lake polygons (
lakes_<name>.gpkg) are cached indata/osm/and reused on re-runs. If no named lakes exist in the bbox the step is silently skipped. --carveburns OSM waterways into the DEM before watershed analysis usingr.carve. This improves main-stem alignment significantly on flat/coastal terrain. Width must be ≥ 30 m (one DEM pixel) to have any effect; the default 90 m (3 pixels) gives reliable results.--carveis off by default so existing single-island runs reproduce unchanged results.