Objective: Semantic segmentation of Retrogressive Thaw Slumps (RTS) in Arctic satellite imagery for pan-Arctic mapping (60–74°N).
Data versioning: Use semantic versioning (major.minor)
- Major: Added new training data or significant changes to processing
- Minor: No new training data but changes in existing labels
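The bump rule above can be sketched as a small helper. This is only an illustration: the function name `bump_data_version` and its boolean flags are hypothetical, and resetting the minor number to 0 on a major bump is an assumption rather than something the rule states.

```python
def bump_data_version(version: str, *, new_training_data: bool = False,
                      processing_changed: bool = False,
                      labels_changed: bool = False) -> str:
    """Return the next 'major.minor' version string under the rule above."""
    major, minor = (int(part) for part in version.split("."))
    if new_training_data or processing_changed:
        # Major bump: new training data or significant processing changes.
        # Assumption: the minor number restarts at 0.
        return f"{major + 1}.0"
    if labels_changed:
        # Minor bump: same tiles, edited labels.
        return f"{major}.{minor + 1}"
    return version  # nothing changed, keep the current version


print(bump_data_version("2.0", labels_changed=True))
print(bump_data_version("2.3", new_training_data=True))
```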
| Attribute | Value |
|---|---|
| Product | Global Quarterly PlanetScope Basemap |
| Temporal window | July–September (growing season) |
| Training year | 2024 composites |
| Inference year | 2025 composites |
| Bands | RGB (3 channels) |
| Resolution | 1.3–3.0 m (varies by latitude; ~3 m in study region) |
| Effective GSD | ~10 m (due to mosaic processing) |
| Coverage | Below 74°N only |
| Notes | Proprietary color-correction optimized for CV analytics |
| Source | Resolution | Channels/Derivatives | Purpose |
|---|---|---|---|
| Sentinel-2 | 10 m | NDVI, NIR | Differentiate vegetation from background |
| ArcticDEM | 2 m | Relative Elevation (RE), Shaded Relief (SR) | Terrain context |
| Source | Resolution | Volume | Use Case |
|---|---|---|---|
| Maxar (Yang et al. 2023) | 4 m | 900+ images | Cross-sensor generalisation experiments |
Note: Sentinel-2, Maxar, and other sensors exhibit domain shift from PlanetScope. Cross-sensor models require separate experimentation.
- ARTS (Arctic RTS compilation dataset) provides initial polygon locations
- Polygons were manually refined on 2024 PlanetScope imagery
| Category | Count | Notes |
|---|---|---|
| Positive tiles | TBD | From ARTS polygon refinement |
| Negative tiles | TBD | From ARTS confirmed negatives + hard negatives |
Include in label (AND):
- Visible headwall with cast shadow (visually concave landscape)
- Barren slump floor (indicates active RTS)
- Clear morphological distinction from surrounding terrain
Exclude from label (OR):
- Features too small to show clear diagnostic characteristics at PlanetScope resolution
- Ambiguous features lacking obvious headwall shadow
- Inactive/stabilized RTS without barren floor
- Long debris tongue or mudflow that is distant from the slump floor (no thawing)
HOW TO DEFINE BOUNDARY: Enforcing perfect consistency would be exhausting and arguably impossible for gradational geomorphic boundaries. The label boundary should follow the visible colour/texture contrast between disturbed and undisturbed ground; where no clear contrast exists, connect the endpoints of the headwall arc with a natural curve.
PATTERN: Make sure the overall morphology reads as a collapsed (concave) landscape.
This is critical for training data quality:
| Scenario | Action |
|---|---|
| Complete RTS fully within tile | Label as RTS |
| Partial RTS with both headwall and floor visible | Label as RTS |
| Partial RTS with only floor visible (no headwall in tile) | Ignore Index:255 |
| Partial RTS with only headwall visible (no floor in tile) | Ignore Index:255 |
Rationale: The model learns that "only a barren floor associated with a shadowed headwall is RTS." Overlapping inference tiles ensure partial targets are detected where both features are visible. Use the ignore index (255) for pixels that are part of an RTS but lack a diagnostic feature (headwall or floor) in that specific tile. This prevents the model from learning conflicting information while maintaining strict detection criteria. This is feature engineering with domain knowledge, which is especially important when training data is limited.
How to decide whether a partial object should be trained on: view the partial object in isolation (mask the adjacent tiles when labelling). If it can be confirmed as RTS without context from neighbouring tiles, label it as RTS; otherwise assign the ignore index.
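The scenario table can be read as a two-flag decision. The sketch below is one way to encode it; the function name `rts_pixel_value` and its arguments are hypothetical, and the flags are assumed to come from the labeller's judgement of the tile viewed in isolation.

```python
RTS = 1
IGNORE_INDEX = 255


def rts_pixel_value(headwall_in_tile: bool, floor_in_tile: bool) -> int:
    """Value to write for the RTS pixels that fall inside one tile."""
    if headwall_in_tile and floor_in_tile:
        # Complete RTS, or a partial one showing both diagnostic features.
        return RTS
    if headwall_in_tile or floor_in_tile:
        # Only one diagnostic feature is visible in this tile.
        return IGNORE_INDEX
    # The scenario table does not cover this case.
    raise ValueError("no diagnostic RTS feature visible in this tile")


print(rts_pixel_value(True, True))    # both features visible
print(rts_pixel_value(False, True))   # floor only
```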
| Value | Meaning |
|---|---|
| 0 | Background (no RTS) |
| 1 | RTS (positive class) |
| 255 | Ignore |
The ignore value can be applied in several situations, for example:
- Partial objects that cannot be confirmed without neighbouring-tile context, even if they are obvious with that context
- RTS-like features that cannot be confirmed at the Planet image quality/resolution, even if they are clear in the Esri basemap
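Ignore pixels should be excluded from both the loss and the metrics. As a minimal sketch of the metric side, assuming NumPy arrays for the prediction and the label (the helper name `rts_iou` is hypothetical), IoU for the RTS class can be computed over valid pixels only. Most training frameworks expose a comparable ignore-index option for the loss.

```python
import numpy as np

IGNORE_INDEX = 255


def rts_iou(pred: np.ndarray, label: np.ndarray) -> float:
    """IoU of class 1, computed only where the label is not the ignore value."""
    valid = label != IGNORE_INDEX
    pred_rts = (pred == 1) & valid
    true_rts = (label == 1) & valid
    union = np.logical_or(pred_rts, true_rts).sum()
    if union == 0:
        return float("nan")  # no RTS predicted or labelled in the valid area
    return float(np.logical_and(pred_rts, true_rts).sum() / union)


# Toy 2 x 3 tile: the two 255 pixels do not count for or against the model.
label = np.array([[0, 1, 1], [0, 255, 255]], dtype=np.uint8)
pred = np.array([[0, 1, 0], [1, 1, 0]], dtype=np.uint8)
print(rts_iou(pred, label))
```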
| Parameter | Value |
|---|---|
| Tile size | 512 × 512 pixels |
| Spatial coverage | ~1.5 km × 1.5 km (at 3 m resolution) |
| CRS | EPSG:3857 (Pseudo-Mercator -- Spherical Mercator, Google Maps, OpenStreetMap, Bing, ArcGIS, ESRI) |
| Format | GeoTIFF |
| Grid alignment | Planet tile grid (same grid used for polygon refinement) |
{tile_id}.tif
Data lives in the GCS bucket, mounted via gcsfuse at training time. All paths are configured in the YAML config — no hardcoded paths in code:
data:
  root: "gs://abruptthawmapping/training/v2.0"  # configure per environment
  rgb_dir: "PLANET-RGB"
  extra_dir: "EXTRA"
  labels_dir: "labels"
  metadata_file: "metadata.csv"
  splits_file: "splits.yaml"

GCS directory layout:
{data.root}/
├── PLANET-RGB/
│   ├── 000001.tif
│   ├── 000002.tif
│   └── ...
├── EXTRA/
│   ├── 000001.tif          ← multi-band, channel order per §9
│   ├── 000002.tif
│   └── ...
├── labels/
│   ├── 000001.tif
│   └── ...
├── metadata.csv
├── splits.yaml             ← lists region names per split (see below)
└── splits_summary.json     ← generated by scripts/create_splits.py; reports tile counts
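Given this layout, the three files belonging to one tile can be derived from the `data` config block alone. The following is a minimal sketch: the helper name `tile_paths` and the mount point `/mnt/gcs/training/v2.0` are assumptions for illustration, with the root taken to be the local gcsfuse mount rather than the `gs://` URL.

```python
from pathlib import PurePosixPath


def tile_paths(data_cfg: dict, tile_id: str) -> dict:
    """Build the RGB, EXTRA and label paths for one tile from the config."""
    root = PurePosixPath(data_cfg["root"])
    name = f"{tile_id}.tif"
    return {
        "rgb": root / data_cfg["rgb_dir"] / name,
        "extra": root / data_cfg["extra_dir"] / name,
        "label": root / data_cfg["labels_dir"] / name,
    }


cfg = {
    "root": "/mnt/gcs/training/v2.0",  # hypothetical gcsfuse mount point
    "rgb_dir": "PLANET-RGB",
    "extra_dir": "EXTRA",
    "labels_dir": "labels",
}
paths = tile_paths(cfg, "000001")
print(paths["rgb"])
```

Keeping the tile ID as a string preserves its leading zeros, which matter because the ID is also the file name.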
Example splits.yaml format. It lists Arctic region names per split (not individual tile IDs); for more details see domain/inference_domain.md.
train:
  - elias range tundra
val:
  - arctic foothills tundra
test:
  - beringia lowland tundra

The DataLoader cross-references metadata.csv (which has RegionName per tile) to resolve tile IDs for each split. scripts/create_splits.py generates both splits.yaml and splits_summary.json (with per-split tile counts); both are committed to the repo for reproducibility.
metadata.csv:
| Tile_id | centroid_lat | centroid_lon | TrainClass | RegionName | UIDs |
|---|---|---|---|---|---|
| 0001 | xx | xx | Negative | yakutia | |
| 0002 | xx | xx | Positive | yakutia | xxx, xxx |
Note: TrainClass values are Positive or Negative only. Hard negatives, if any exist (e.g. from Lingcao Huang's model false positives), are stored as Negative; no separate class is needed.
UIDs are the RTS UIDs contained within the tile (used for tracking individual RTS); the field is empty for Negative tiles.
RegionName is the Arctic subregion defined by ecology/permafrost extent (boundaries provided by Heidi Rodenhizer, see files in '/domain').
splits.yaml (e.g.):
train:
  - elias range tundra
  -
  -
val:
  - beringia lowland tundra
  -
test:
  - arctic foothills tundra
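The region-to-tile lookup described above can be sketched with the standard library alone. The function name `tiles_per_split` is hypothetical, the metadata rows are toy placeholders rather than real tiles, and the splits are written as a plain dict standing in for the parsed splits.yaml.

```python
import csv
import io


def tiles_per_split(metadata_rows, splits):
    """Map each split name to the Tile_id values whose region belongs to it."""
    region_to_split = {}
    for split, regions in splits.items():
        for region in regions:
            if region in region_to_split:
                raise ValueError(f"region {region!r} is listed in two splits")
            region_to_split[region] = split

    resolved = {split: [] for split in splits}
    for row in metadata_rows:
        split = region_to_split.get(row["RegionName"])
        if split is not None:  # tiles from unlisted regions are skipped
            resolved[split].append(row["Tile_id"])
    return resolved


# Toy metadata; coordinates and UIDs are placeholders, not real values.
METADATA_CSV = """Tile_id,centroid_lat,centroid_lon,TrainClass,RegionName,UIDs
000001,0,0,Negative,elias range tundra,
000002,0,0,Positive,elias range tundra,uid-a
000003,0,0,Positive,arctic foothills tundra,uid-b
000004,0,0,Negative,beringia lowland tundra,
"""

# Stand-in for the parsed splits.yaml.
SPLITS = {
    "train": ["elias range tundra"],
    "val": ["beringia lowland tundra"],
    "test": ["arctic foothills tundra"],
}

rows = list(csv.DictReader(io.StringIO(METADATA_CSV)))
resolved = tiles_per_split(rows, SPLITS)
print(resolved)
```

Raising on a region listed twice keeps the "no region spans multiple splits" rule enforced at load time as well as at split-creation time.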
- PLANET-RGB: derived from PlanetScope Basemap
  - Image: (512, 512, 3), RGB
  - Label: (512, 512, 1), uint8, values {0, 1, 255}
- EXTRA: derived from other sources, resampled to match the RGB resolution
  - Image: (512, 512, N), multi-band GeoTIFF, band count N and band order chosen per experiment
  - Label: (512, 512, 1), uint8, values {0, 1, 255}
Example bands currently in use: NDVI, NIR (from Sentinel-2), RE (Relative Elevation), SR (Shaded Relief) from ArcticDEM. These are examples, not a fixed contract — any multi-band EXTRA raster works (e.g. slope, aspect from ArcticDEM, NDMI, NBR, SAR, GEE satellite embeddings).
Channel selection at training time: Specified in the YAML config (see configs/baseline.yaml §channels) as a list of {name, band} entries. name is an arbitrary label used in normalization_stats.json and logs; band is the 0-indexed position inside the EXTRA GeoTIFF. Example:
channels:
  extra:
    - {name: ndvi, band: 0}
    - {name: nir, band: 1}
    - {name: re, band: 2}
    - {name: sr, band: 3}

Changing the stacked EXTRA set means editing the YAML only; no code change is needed. §9 below shows one concrete layout but is not authoritative.
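A loader might turn the `channels.extra` list into band indices as sketched below. The function name `resolve_extra_bands` is hypothetical and the config is shown as an already-parsed dict. Because `band` is 0-indexed in the config while some raster readers (rasterio and GDAL, for example) number bands from 1, the sketch also shows the shift.

```python
def resolve_extra_bands(channels_cfg: dict, band_count: int):
    """Return (names, zero_based_indices) for the configured EXTRA channels."""
    names, indices = [], []
    for entry in channels_cfg.get("extra", []):
        band = entry["band"]
        if not 0 <= band < band_count:
            raise IndexError(
                f"channel {entry['name']!r} wants band {band}, "
                f"but the raster has {band_count} bands"
            )
        names.append(entry["name"])
        indices.append(band)
    return names, indices


# Parsed form of a config that selects two of the four example bands.
channels_cfg = {"extra": [{"name": "ndvi", "band": 0}, {"name": "sr", "band": 3}]}
names, indices = resolve_extra_bands(channels_cfg, band_count=4)

# Shift to 1-based numbering for readers that count bands from 1.
one_based = [i + 1 for i in indices]
print(names, indices, one_based)
```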
Build order: Generate the PLANET-RGB tiles first for positive and negative samples, then derive EXTRA by extracting the auxiliary channels over each PLANET-RGB tile's extent (footprint).
All auxiliary data must be:
- Reprojected to EPSG:3857
- Resampled to match PlanetScope nominal resolution (~3 m) using bilinear interpolation for all channels
- Co-registered with RGB using GeoTIFF bounding box information
- Stacked as channels in an order you keep stable across the dataset (that same order is what you reference by band index in the YAML config). §9 shows one example layout.
Both PLANET-RGB and EXTRA store raw values (no normalization applied to stored files).
Before computing statistics, apply percentile clipping to remove outliers. This is a one-off step during statistics computation (not applied per image at load time):
- Run scripts/check_data.py first to visualise per-channel histograms and choose appropriate percentile bounds.
- PlanetScope pre-processing may already have handled normalisation and outliers; the normalisation in this step is mostly for value alignment with the pretrained backbone rather than for improving image quality.
- Pass the clipping percentiles as arguments to scripts/compute_normalization_stats.py.
- Choose the clipping percentiles by inspecting the histogram: compute a histogram over all available positive and negative tiles, and save both the figure and the raw data for manual assessment.
- The computed mean/std (on clipped data) are saved in normalization_stats.json; the DataLoader uses only those stored values.
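One way to turn a dataset-wide histogram into clipping bounds is sketched below. It is not the logic of scripts/check_data.py or scripts/compute_normalization_stats.py, whose internals are not described here; the helper names, the fixed-width binning and the toy values are all assumptions. A single histogram is filled tile by tile, then the bin edges enclosing the requested percentile range are read off the cumulative counts.

```python
def add_to_histogram(counts, values, vmin, vmax):
    """Accumulate values into fixed-width bins spanning [vmin, vmax]."""
    n_bins = len(counts)
    width = (vmax - vmin) / n_bins
    for value in values:
        index = int((value - vmin) / width)
        counts[min(max(index, 0), n_bins - 1)] += 1


def percentile_bounds(counts, vmin, vmax, lower_pct, upper_pct):
    """Bin edges that keep roughly the [lower_pct, upper_pct] share of pixels."""
    width = (vmax - vmin) / len(counts)
    total = sum(counts)
    lower_target = total * lower_pct / 100.0
    upper_target = total * upper_pct / 100.0
    lower, upper = vmin, vmax
    lower_found = False
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        if not lower_found and cumulative > lower_target:
            lower = vmin + i * width          # left edge of this bin
            lower_found = True
        if cumulative >= upper_target:
            upper = vmin + (i + 1) * width    # right edge of this bin
            break
    return lower, upper


# One shared histogram for a single channel, filled from two toy tiles.
counts = [0] * 5
add_to_histogram(counts, [3] + [120] * 49, vmin=0, vmax=250)
add_to_histogram(counts, [110] * 49 + [249], vmin=0, vmax=250)
lower, upper = percentile_bounds(counts, 0, 250, lower_pct=1, upper_pct=99)
print(counts, lower, upper)
```

The resolution of the bounds is limited by the bin width, so a real run would use many more bins than this toy example.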
Normalisation should be calculated per dataset rather than per image, because:
- Inference behavior stays consistent regardless of batch composition
- Satellite imagery has consistent acquisition conditions within a sensor
Normalisation for EXTRA should be channel-specific to respect the physical meaning of each signal.
Use per-dataset statistics computed once over the entire training set. This preserves absolute radiometric information critical for distinguishing RTS features.
Compute mean and standard deviation for each channel across all training tiles:
- For RGB: compute over all training images (both positive and negative)
- For EXTRA: compute separately for each channel respecting physical meaning
Store normalization statistics in a JSON file that travels with the model:
models/
├── experiment_name/
│ ├── normalization_stats.json
normalization_stats.json structure:
| Field | Description |
|---|---|
| dataset_version | Version string from data/version.json (e.g. "2.0"). This file is created as part of the data pipeline and committed to the repo. |
| computed_date | ISO timestamp of computation |
| n_tiles_used | Number of tiles used in computation |
| rgb.channel_names | Fixed: ["R", "G", "B"] |
| rgb.mean | List of 3 values, order matches rgb.channel_names |
| rgb.std | List of 3 values, order matches rgb.channel_names |
| extra.channel_names | List of N names declared in the config (e.g. ["ndvi", "nir", "re", "sr"]). Omit the whole extra block when training RGB-only. |
| extra.mean | List of N values, order matches extra.channel_names |
| extra.std | List of N values, order matches extra.channel_names |
Note: the fields above hold the mean and std for z-score standardisation; per-channel minimum and maximum values can be stored in the same way if 0–1 normalisation is preferred.
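A file with these fields could be assembled as follows. This sketch assumes the dotted field names (for example `rgb.mean`) map to nested JSON objects, which the table implies but does not state; the function name `build_normalization_stats` is hypothetical and the numbers are placeholders.

```python
import json
from datetime import datetime, timezone


def build_normalization_stats(dataset_version, n_tiles_used, rgb_mean, rgb_std,
                              extra_names=None, extra_mean=None, extra_std=None):
    """Assemble the dict that is serialised to normalization_stats.json."""
    stats = {
        "dataset_version": dataset_version,
        "computed_date": datetime.now(timezone.utc).isoformat(),
        "n_tiles_used": n_tiles_used,
        "rgb": {
            "channel_names": ["R", "G", "B"],
            "mean": list(rgb_mean),
            "std": list(rgb_std),
        },
    }
    if extra_names:  # the whole block is omitted for RGB-only training
        if not (len(extra_names) == len(extra_mean) == len(extra_std)):
            raise ValueError("extra names, mean and std must have equal length")
        stats["extra"] = {
            "channel_names": list(extra_names),
            "mean": list(extra_mean),
            "std": list(extra_std),
        }
    return stats


# Placeholder numbers for illustration only; real values come from the data.
stats = build_normalization_stats(
    "2.0", 1000, rgb_mean=[100.0, 110.0, 90.0], rgb_std=[30.0, 28.0, 25.0],
    extra_names=["ndvi", "nir"], extra_mean=[0.3, 0.2], extra_std=[0.1, 0.05],
)
print(json.dumps(stats, indent=2))
```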
Compute: Loading terabytes of GeoTIFFs into memory to calculate mean/std can be challenging. Suggestion: use Welford's online algorithm to compute the mean and variance in a single pass without holding all data in memory.
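For reference, Welford's update keeps only a count, a running mean and a running sum of squared deviations per channel. The class below is a plain-Python sketch of that update (the class name `RunningChannelStats` is hypothetical). A per-pixel Python loop would be far too slow for real tiles, so a production version would more likely merge per-tile statistics in vectorised form; clipping, if used, would be applied to each value before it is folded in.

```python
import math


class RunningChannelStats:
    """Welford's online mean/variance, tracked independently per channel."""

    def __init__(self, n_channels: int):
        self.count = 0
        self.mean = [0.0] * n_channels
        self.m2 = [0.0] * n_channels  # running sum of squared deviations

    def update(self, pixel):
        """Fold in one pixel, given as a sequence of per-channel values."""
        self.count += 1
        for c, value in enumerate(pixel):
            delta = value - self.mean[c]
            self.mean[c] += delta / self.count
            self.m2[c] += delta * (value - self.mean[c])

    def std(self):
        """Population standard deviation per channel."""
        return [math.sqrt(m2 / self.count) for m2 in self.m2]


stats = RunningChannelStats(n_channels=2)
for tile in ([(1, 10), (2, 20)], [(3, 30), (4, 40)]):  # two toy "tiles"
    for pixel in tile:
        stats.update(pixel)
print(stats.mean, stats.std())
```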
| Estimation | Value |
|---|---|
| Within Positive tiles | 5–70% of tile area |
| Real Arctic prevalence | 0.1–0.5% |
| Split | Positive | Negative | Purpose |
|---|---|---|---|
| Train | 80% of N | 80% * M / f1 | Model training by curriculum learning |
| Val-Balanced | 10% of N | 10% * M / f2a | quick val during training |
| Val-Realistic | 10% of N | 10% * M / f2b | Early stopping, full val during training |
| Test-Realistic | 10% of N | 10% * M / f3 | Final test score to report |
| Factor | Value |
|---|---|
| N | Number of positive tiles |
| M | Number of negative tiles |
| f1 (training) | Start at 1:1, warm up to 1:20 |
| f2a (val-balanced) | 1:20 (physical pool size; subsampled to 1:1 at evaluation time) |
| f2b (val-realistic) | 1:200, 1:1000 |
| f3 (test-realistic) | 1:200, 1:1000 |
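Reading each factor as a positive:negative tile ratio (an interpretation, though 1:200 and 1:1000 do correspond to roughly 0.5% and 0.1% prevalence), the arithmetic can be sketched as below. The function names are hypothetical, and the linear ramp is an assumption: the table only says training starts at 1:1 and warms up to 1:20.

```python
def negatives_needed(n_positive: int, ratio: int) -> int:
    """Negative tiles required for a 1:ratio positive:negative mix."""
    return n_positive * ratio


def warmup_ratio(epoch: int, warmup_epochs: int, start: int = 1, end: int = 20) -> int:
    """Negative ratio for this epoch, ramping linearly from 1:start to 1:end."""
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return end
    fraction = epoch / warmup_epochs
    return round(start + fraction * (end - start))


# Hypothetical schedule: 10 warm-up epochs, 500 positive training tiles.
for epoch in (0, 4, 10):
    ratio = warmup_ratio(epoch, warmup_epochs=10)
    print(epoch, f"1:{ratio}", negatives_needed(500, ratio))
```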
Prevent geographic data leakage between train/val/test splits. RTS in nearby tiles may share visual characteristics due to:
- Similar geology and permafrost conditions
- Similar vegetation patterns
- Correlated image acquisition conditions
Group tiles by Arctic subregion based on ecology/permafrost extent. Entire regions are assigned to train, val, or test — no region spans multiple splits.
Region definitions: Provided by Heidi Rodenhizer (boundaries complete). Storage format is GeoJSON; see files in '/domain'.
Implementation: scripts/create_splits.py reads metadata.csv (which has RegionName per tile) and assigns each region to train/val/test according to the target ratios. The output splits.yaml is committed to the repo for reproducibility.
Whole-region assignment and 80/10/10 tile-count targets cannot both be satisfied exactly when regions vary in size and RTS density. scripts/create_splits.py resolves conflicts using the following priority order. Constraints earlier in the list take precedence; when a hard constraint fails, the script exits with an error rather than silently producing a degenerate split.
1. Test set minimum positives: the test set must contain at least 100 positive tiles to enable statistically meaningful PR-AUC reporting at 1:1000 prevalence. If no whole-region assignment achieves this, fail loudly.
2. Validation set ecoregion diversity: the val set must span at least 2 distinct ecoregions, so that early-stopping decisions are not tied to a single regional artifact. If only one region can be assigned to val without breaking constraint 1, fail loudly.
3. Train set positive coverage: the train set should hold at least 70% of total positive tiles. Below this, emit a warning; do not fail.
4. Tile-count ratio drift tolerance: once constraints 1–3 are satisfied, accept up to ±10% drift from the 80/10/10 target (train: 70–90% of total tiles). Beyond this, fail loudly.
5. Geographic priority: assign the largest, most RTS-dense regions to train, the most diverse subset to val, and morphologically representative regions to test. The script logs its assignment reasoning.
Outputs:
- splits.yaml: region assignments per split (committed)
- splits_summary.json: per-split tile counts, positive counts, region list, observed vs. target drift, and the constraint-resolution log (committed)
If the available regions cannot satisfy constraints 1–2, the project needs more labelled regions before splitting — not a softer split rule.
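The check side of these rules can be sketched as follows. This is not the code of scripts/create_splits.py: the function name `check_split`, the shape of the `summary` dict and the toy numbers are assumptions, and only the train-share bound is checked for drift because that is the only range the text spells out. Hard constraints raise, while the train-coverage rule only produces a warning.

```python
def check_split(summary, total_positive, total_tiles):
    """Raise on a hard-constraint failure; otherwise return a list of warnings."""
    warnings = []
    if summary["test"]["positives"] < 100:
        raise ValueError("constraint 1: test set has fewer than 100 positive tiles")
    if len(set(summary["val"]["regions"])) < 2:
        raise ValueError("constraint 2: val set spans fewer than 2 ecoregions")
    if summary["train"]["positives"] < 0.70 * total_positive:
        warnings.append("constraint 3: train holds under 70% of positive tiles")
    train_share = summary["train"]["tiles"] / total_tiles
    if not 0.70 <= train_share <= 0.90:
        raise ValueError(
            f"constraint 4: train share {train_share:.0%} is outside 70-90%"
        )
    return warnings


# Toy candidate assignment; region names and counts are placeholders.
summary = {
    "train": {"tiles": 8000, "positives": 650, "regions": ["region A", "region B"]},
    "val": {"tiles": 1000, "positives": 200, "regions": ["region C", "region D"]},
    "test": {"tiles": 1000, "positives": 150, "regions": ["region E"]},
}
warnings = check_split(summary, total_positive=1000, total_tiles=10000)
print(warnings)
```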
- ARTS confirmed negatives: Known non-RTS locations
- Hard negatives: False positive locations from Lingcao Huang's model (a possible source, but there is no plan to implement this now)
Both are stored with TrainClass = Negative in metadata.csv. No separate class distinction is needed in code — treated identically during sampling.
Negative samples can be inflated on-the-fly through augmentation to achieve desired imbalance ratios. See Training Guide for augmentation strategy.
Run before training:
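- All positive tiles contain RTS pixels (at least one pixel with value 1; a raw label sum > 0 is not sufficient because ignore pixels are stored as 255)
- All negative tiles contain no RTS pixels (no pixel with value 1)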
- RGB values are in valid range (0-255 for uint8)
- All tiles have matching image and label dimensions
- No NaN or infinite values in EXTRA channels
- GeoTIFF metadata (CRS, bounds) is consistent across tiles
- metadata.csv has entries for all tiles
- Spatial blocking is respected (no region spans train/val/test)
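A few of these checks need no raster library and can be sketched on plain Python values. The helper names below are hypothetical, and label pixels are passed as a flat iterable for simplicity. The class check tests for the presence of value 1 rather than summing the label, so that ignore pixels (255) cannot make a tile look positive.

```python
VALID_LABEL_VALUES = {0, 1, 255}


def check_label(train_class, label_pixels):
    """Return a list of problems for one tile's label."""
    problems = []
    values = set(label_pixels)
    unexpected = values - VALID_LABEL_VALUES
    if unexpected:
        problems.append(f"unexpected label values: {sorted(unexpected)}")
    has_rts = 1 in values
    if train_class == "Positive" and not has_rts:
        problems.append("Positive tile has no RTS pixels")
    if train_class == "Negative" and has_rts:
        problems.append("Negative tile contains RTS pixels")
    return problems


def regions_spanning_splits(splits):
    """Regions listed under more than one split (should be empty)."""
    seen = {}
    for split, regions in splits.items():
        for region in regions:
            seen.setdefault(region, set()).add(split)
    return sorted(region for region, found in seen.items() if len(found) > 1)


def tiles_missing_metadata(tile_ids_on_disk, metadata_tile_ids):
    """Tile IDs that exist as files but have no row in metadata.csv."""
    return sorted(set(tile_ids_on_disk) - set(metadata_tile_ids))


print(check_label("Positive", [0, 255, 255]))
print(regions_spanning_splits({"train": ["a", "b"], "val": ["b"], "test": ["c"]}))
print(tiles_missing_metadata(["000001", "000002"], ["000001"]))
```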
RGB band order is fixed. EXTRA is declared per-experiment in the YAML config — the table below is one example, not a contract. Keep whatever layout you write to disk consistent across the dataset and referenced correctly in configs/*.yaml §channels.extra.
| Index | Channel |
|---|---|
| 0 | Red |
| 1 | Green |
| 2 | Blue |
| Index | Channel | Source |
|---|---|---|
| 0 | NDVI | Sentinel-2 |
| 1 | NIR | Sentinel-2 |
| 2 | Relative Elevation (RE) | ArcticDEM |
| 3 | Shaded Relief (SR) | ArcticDEM |
Other examples: slope, aspect, NDMI, NBR, SAR backscatter, GEE satellite embeddings. Any combination works as long as the YAML points at the right band indices.
| Value | Meaning |
|---|---|
| 0 | Background |
| 1 | RTS |
| 255 | ignore |