SDG6 Tracker

This repository contains the code for the paper:

Monitoring access to piped water and sanitation infrastructure in Africa at disaggregated scales using satellite imagery and self-supervised learning https://arxiv.org/abs/2411.19093

It provides a unified pipeline for:

pretraining DINO/DINOv2 backbones
extracting embeddings
k‑NN evaluation/classification
large‑scale inference on new imagery

The core design is a model‑agnostic CLI and adapter layer, so you can switch encoders without touching the data/metric code.

Repository layout

src/models: single‑file adapters (dino, dinov2, dinov3, prithvi, galileo)
src/sdg6: data loader, embedding extraction, k‑NN logic, CLI, inference
scripts/configs: YAML configs for train/eval/inference
scripts/slurm: SLURM launchers
scripts/training: Python wrappers for pretraining
scripts/analysis: Python analysis workflows (including converted notebooks)
outputs: generated artifacts (figures/, tables/, reports/)

External Sources

DINO adapter: https://github.com/facebookresearch/dino (Caron et al., 2021, https://arxiv.org/abs/2104.14294)
DINOv2 adapter: https://github.com/facebookresearch/dinov2 (Oquab et al., 2023, https://arxiv.org/abs/2304.07193)
DINOv3 adapter: https://github.com/facebookresearch/dinov3 and HF model cards under https://huggingface.co/facebook
Prithvi adapter: TerraTorch registry (https://github.com/IBM/terratorch) + IBM/NASA Geospatial checkpoints (https://huggingface.co/ibm-nasa-geospatial)
Galileo adapter: adapted from https://github.com/nasaharvest/galileo

Data and Weights

Galileo model weights: hosted on Hugging Face (base model used here), e.g. https://huggingface.co/nasaharvest/galileo
DINO and DINOv2 weights: Zenodo, https://zenodo.org/records/19156085
Inference results: Zenodo, https://zenodo.org/records/19156085
Population density patches: Zenodo, https://zenodo.org/records/19156085
Afrobarometer imagery tiles: Zenodo, https://zenodo.org/records/14740420

Environment

Use uv to sync dependencies in this repo:

uv sync

Pretraining (DINOv2)

Pretraining is driven by a YAML config and a wrapper that launches torchrun.

Edit the pretraining config:

scripts/configs/dinov2_pt.yaml

Key fields:

dinov2_repo: local clone of the DINOv2 repo
config_file: DINOv2 training config YAML (satellite config)
output_dir: output directory for checkpoints and logs
gpus_per_node, master_port: distributed launch settings (GPUs per node and the master port)
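
As a rough illustration of how these fields could map onto a torchrun launch, here is a minimal sketch. It is not the repository's wrapper: the function name, the example values, and the assumption that the DINOv2 clone is started through dinov2/train/train.py with --config-file and --output-dir are all illustrative.

```python
from pathlib import Path


def build_torchrun_command(cfg):
    """Assemble a torchrun invocation from the pretraining config fields."""
    # Assumption: the local DINOv2 clone exposes its training entry point at
    # dinov2/train/train.py and accepts --config-file and --output-dir.
    train_script = Path(cfg["dinov2_repo"]) / "dinov2" / "train" / "train.py"
    return [
        "torchrun",
        f"--nproc_per_node={cfg['gpus_per_node']}",
        f"--master_port={cfg['master_port']}",
        str(train_script),
        "--config-file", str(cfg["config_file"]),
        "--output-dir", str(cfg["output_dir"]),
    ]


# Hypothetical values standing in for scripts/configs/dinov2_pt.yaml.
example_cfg = {
    "dinov2_repo": "external/dinov2",
    "config_file": "configs/satellite.yaml",
    "output_dir": "runs/dinov2_pt",
    "gpus_per_node": 4,
    "master_port": 29500,
}

print(" ".join(build_torchrun_command(example_cfg)))
```

Returning the command as a list rather than a string keeps it safe to pass to subprocess.run without shell quoting.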

Launch pretraining:

sbatch scripts/slurm/dinov2_pt.sbatch

k‑NN evaluation/classification

Evaluation uses the unified CLI and the DINOv2‑aligned k‑NN logic (cosine similarity + softmax voting).

Configure the eval:

scripts/configs/dinov2.yaml

Key fields:

data_dir: dataset with train/val/test class folders
weights: DINOv2 checkpoint (teacher_checkpoint.pth)
dinov2_config: DINOv2 training config
knn_classifier_path: where to save the k‑NN classifier artifact
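
Before launching an evaluation it can help to confirm that data_dir has the expected split and class folders. The helper below is a hypothetical sanity check, not part of the repository. It assumes .tif tiles and assigns class indices in sorted order, following the ImageFolder convention.

```python
from pathlib import Path


def list_split(data_dir, split):
    """Return ([(path, class_index), ...], class_to_idx) for one split."""
    split_dir = Path(data_dir) / split
    # Each sub-directory of the split is treated as one class.
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: index for index, name in enumerate(classes)}
    samples = [
        (path, class_to_idx[name])
        for name in classes
        for path in sorted((split_dir / name).glob("*.tif"))
    ]
    return samples, class_to_idx


def summarize(data_dir, splits=("train", "val", "test")):
    """Count images per split, skipping splits that do not exist."""
    counts = {}
    for split in splits:
        if (Path(data_dir) / split).is_dir():
            samples, _ = list_split(data_dir, split)
            counts[split] = len(samples)
    return counts
```

Calling summarize on your data_dir gives a quick per-split image count; a missing split is simply left out of the result.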

Run k‑NN evaluation:

sbatch scripts/slurm/dinov2.sbatch

Outputs:

Embeddings (optional): output_dir/embeddings
Confusion reports: output_dir/confusion
k‑NN classifier artifact: path set in knn_classifier_path

Inference on arbitrary images

Inference uses the saved k‑NN classifier artifact and a DINOv2 encoder to produce predictions on new images.

Configure inference:

scripts/configs/dinov2_infer.yaml

Key fields:

weights, dinov2_config: DINOv2 encoder settings
knn_classifier_path: saved artifact from evaluation
input_dir or input_list: images to score
output_csv: where predictions are written
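
The sketch below shows one way the input and output fields could be handled. It is illustrative only. It assumes that exactly one of input_dir or input_list is set, that input_list is a text file with one image path per line, and that the CSV has two hypothetical columns, path and prediction.

```python
import csv
from pathlib import Path


def resolve_inputs(input_dir=None, input_list=None):
    """Return the image paths to score from a directory or a list file."""
    if (input_dir is None) == (input_list is None):
        raise ValueError("set exactly one of input_dir or input_list")
    if input_dir is not None:
        # Assumption: tiles are .tif files, searched recursively.
        return sorted(str(p) for p in Path(input_dir).rglob("*.tif"))
    with open(input_list, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_predictions(output_csv, rows):
    """Write (path, prediction) pairs to a CSV file with a header row."""
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "prediction"])
        writer.writerows(rows)
```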

Run inference on many countries:

sbatch scripts/slurm/dinov2_infer.sbatch

This sbatch script loops over data/countries and writes SW/PW predictions under data/inference/.

Adding a new model

Create a new adapter in src/models/.py that returns a ModelAdapter:
- transform(image, path=None)
- reader(path)
- collate_fn
- encode(batch) returning L2‑normalized features
Register it in src/models/__init__.py
Run the unified CLI:

python -m sdg6.cli --model <name> ...
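
To make the adapter contract above concrete, here is a toy sketch. The real ModelAdapter is defined in the repository and may differ. The dataclass fields, the REGISTRY dictionary, and the stand-in "encoder" below are assumptions that only mirror the four members listed above.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ModelAdapter:
    """Hypothetical container mirroring the four adapter members."""
    transform: Callable[..., Any]        # transform(image, path=None)
    reader: Callable[[str], Any]         # reader(path)
    collate_fn: Callable[[List[Any]], Any]
    encode: Callable[[Any], List[List[float]]]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_toy_adapter() -> ModelAdapter:
    def reader(path):
        # Stand-in for image loading: derive a fake 2-D "image" from the path.
        return [float(len(path)), 1.0]

    def transform(image, path=None):
        return image  # a real adapter would resize and normalize here

    def collate_fn(samples):
        return list(samples)  # a real adapter would stack tensors

    def encode(batch):
        # A real adapter would run the backbone, then L2-normalize.
        return [l2_normalize(sample) for sample in batch]

    return ModelAdapter(transform, reader, collate_fn, encode)


# Hypothetical registry keyed by the name passed to --model.
REGISTRY: Dict[str, Callable[[], ModelAdapter]] = {"toy": build_toy_adapter}

adapter = REGISTRY["toy"]()
paths = ["a.tif", "bb.tif"]
batch = adapter.collate_fn([adapter.transform(adapter.reader(p), path=p) for p in paths])
features = adapter.encode(batch)
```

Because encode returns unit-norm vectors, downstream code can treat a plain dot product as cosine similarity.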

Notes

The unified dataloader expects ImageFolder structure for classification: data_dir/train|val|test/<class_name>/*.tif
The k‑NN logic is aligned to DINOv2: cosine similarity + softmax voting (temperature configurable).
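
For readers unfamiliar with this voting scheme, the following dependency-free sketch illustrates cosine similarity with softmax-weighted votes over the top k neighbours. It is not the repository's implementation. The function names and the default values of k and temperature are assumptions, and the training set is assumed to be non-empty.

```python
import math
from collections import defaultdict


def unit(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def knn_predict(query, train_feats, train_labels, k=3, temperature=0.07):
    """Predict a label by softmax-weighted voting among the k nearest neighbours."""
    q = unit(query)
    # Cosine similarity is the dot product of unit-norm vectors.
    sims = [
        (sum(a * b for a, b in zip(q, unit(feat))), label)
        for feat, label in zip(train_feats, train_labels)
    ]
    top = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    # Softmax over similarity / temperature; subtract the max for stability.
    best = top[0][0]
    weights = [(math.exp((sim - best) / temperature), label) for sim, label in top]
    total = sum(weight for weight, _ in weights)
    votes = defaultdict(float)
    for weight, label in weights:
        votes[label] += weight / total
    prediction = max(votes, key=votes.get)
    return prediction, dict(votes)


train_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["a", "a", "b", "b"]
prediction, votes = knn_predict([1.0, 0.05], train_feats, train_labels)
```

A lower temperature concentrates the vote on the closest neighbours, while a higher one moves the result toward an unweighted majority vote.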
