Backend service for Hiilikartta / climate-map that calculates vegetation + soil carbon estimates for zoning-plan areas.
The service exposes a FastAPI HTTP API that accepts a zipped vector dataset (polygons), runs a PostGIS-backed calculation asynchronously, and stores results to a “state” Postgres database. The heavy spatial work (rasters, segment aggregation) is done against a separate PostGIS GIS database that already contains the required datasets.
- API (`app/main.py`, FastAPI): HTTP endpoints, plan persistence, serves results.
- Worker (`app/saq_worker.py`, SAQ): background jobs; calculates per-feature results and updates the state DB.
- Redis: SAQ job queue + distributed GIS throttling semaphore.
- State DB (Postgres): stores uploaded plans + calculation outputs (JSONB).
- GIS DB (external PostGIS): provides rasters/segments/regions needed by the calculation.
docker-compose.*.yml spins up everything except the GIS DB.
- Web/API: FastAPI, Uvicorn (dev) / Gunicorn+UvicornWorker (prod)
- Async DB: SQLAlchemy 2.x (async) + `asyncpg`
- Migrations: Alembic (+ `alembic-postgresql-enum`)
- Geo: GeoPandas + Shapely + Rasterio stack
- Async jobs: SAQ + Redis
- Auth: Zitadel token introspection via Authlib + Requests
Large payload endpoints (`GET /calculation`, `GET /plan`, `GET /plan/external`) return gzip-compressed bodies (`Content-Encoding: gzip`). Many HTTP clients handle this automatically; with curl use `--compressed`.
`POST /calculation?id=<uuid>&visible_id=<string>&name=<string>&forestry_scenario=<1|2|3>`
- `multipart/form-data` with field `file` (a zipped dataset readable by GeoPandas).
- Creates/updates a plan and enqueues a background job (`calculate_piece`).
- `forestry_scenario` is optional and defaults to `1`.
- Auth is optional; if a valid token is provided, the plan is associated with that user.
`GET /calculation?id=<uuid>`
- Returns `202` while processing, `200` when finished, `206` if the plan ended in an error state.
- Returns the stored `forestry_scenario`; when finished it also returns `data.totals` and `data.areas` (GeoJSON stored in the DB).
`GET /plan/external?id=<uuid>`
- Public “share” endpoint: returns `{id, name, forestry_scenario, report_data?}`.
Authenticated (Zitadel) endpoints:
- `PUT /plan?id=<uuid>&visible_id=<string>&name=<string>` (upload/replace plan data)
- `GET /plan?id=<uuid>` (fetch a user’s plan)
- `DELETE /plan?id=<uuid>`
- `GET /user/plans`
FastAPI docs: `GET /docs`
- Upload a plan to `POST /calculation ...`
- Poll `GET /calculation?id=...` until `calculation_status == FINISHED`
- Parse `data.totals` + `data.areas`
Example (dev):

```sh
PLAN_ID=$(python -c 'import uuid; print(uuid.uuid4())')
curl --compressed -F "file=@tests/data/test-data-small-polygon.zip" \
  "http://localhost:8000/calculation?id=$PLAN_ID&visible_id=demo&name=Demo&forestry_scenario=1"
curl --compressed "http://localhost:8000/calculation?id=$PLAN_ID"
```

The latest implementation is documented in `documentation/calculation_2026_03.md`. Historical snapshots are in `documentation/calculation_2025.md` and `documentation/calculation_2024.md`. In short, for each polygon the calculator produces:
- biomass base stock from segment `Carbon` in `luke_mvmisegmentit_muuttujat_kokomaa` (2021),
- soil base stock from the weighted soil raster `hiilikartta_maaperanhiili_2023_tcha`,
- future existing-land stocks by scaling those source stocks with the final 2026 curve tables keyed by `Scen`,
- future deltas on changed land from annual sequestration coefficients (CSV),
- outputs for `nochange` vs `planned` scenarios for `current_year` and 2030..2080 (5-year steps),
- the stored plan-level `forestry_scenario` in frontend-facing responses and report metadata.
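Schematically, the existing-land step scales a source base stock by a scenario/year multiplier from the curve tables. A toy sketch with entirely made-up multiplier values (the real multipliers come from the 2026 curve tables keyed by `Scen`; `future_existing_stock` is a hypothetical name):

```python
# Illustrative-only curve table: (scenario, year) -> multiplier vs. the base year.
CURVE = {(1, 2030): 1.05, (1, 2035): 1.08}


def future_existing_stock(base_stock_tc: float, scen: int, year: int) -> float:
    # Future existing-land stock = source base stock * curve multiplier.
    return base_stock_tc * CURVE[(scen, year)]
```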
From documentation/calculation_2026_03.md:
- `geometry`: polygon/multipolygon (input assumed EPSG:4326; reprojected for area math)
- `zoning_code`: land-use code used for coefficient lookup
- optional land-cover shares (percentages) and soil-change factor; see the doc for defaults and accepted aliases
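For illustration, a minimal input dataset can be assembled as a zipped GeoJSON with stdlib tools only (the `zoning_code` value `"AK"` and the coordinates are placeholders; consult the spec for valid codes and optional fields):

```python
import io
import json
import zipfile

# One EPSG:4326 polygon carrying the required attributes (placeholder values).
feature = {
    "type": "Feature",
    "properties": {"zoning_code": "AK"},  # placeholder land-use code
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [24.90, 60.10], [24.91, 60.10], [24.91, 60.11],
            [24.90, 60.11], [24.90, 60.10],
        ]],
    },
}
collection = {"type": "FeatureCollection", "features": [feature]}

# GeoPandas can read zipped vector datasets, so pack the GeoJSON into a zip
# held in memory; `payload` is what you would upload as the `file` form field.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plan.geojson", json.dumps(collection))
payload = buf.getvalue()
```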
GIS DB (PostGIS) tables/rasters (see documentation/calculation_2026_03.md for details):
- `hiilikartta_kasvillisuudenhiili_2021_tcha`
- `hiilikartta_maaperanhiili_2023_tcha`
- `luke_mvmisegmentit_id_kokomaa`
- `luke_mvmisegmentit_muuttujat_kokomaa`
- `maakunta` (`geom`, `natcode`)
Notes:
- the latest biomass calculation uses `luke_mvmisegmentit_muuttujat_kokomaa.Carbon` as the actual biomass stock source and for scenario-1 cut detection
- the latest soil calculation uses the 2023 soil raster as the actual soil stock source
- the vegetation raster remains available as a 2021 GIS source dataset, but it is not used directly in the latest biomass stock scaling path
Runtime data files (loaded from `/app/data` by `app/utils/data_loader.py` and warmed into curve caches on API + worker startup). In production, `/app/data` is the private `${DATA_PATH}` bind mount configured in Dokploy:
- `data/Hiilikartta_Veg_20260415.csv`
- `data/Hiilikartta_Soil_20260415.csv`
- `data/Hiilikartta_Kasvillisuuden_ja_maaperan_hiilensidonta_kayttotarkoitusluokittain_20260420.csv`
GIS operations are intentionally throttled to protect the GIS DB:
- local (per-process) semaphore: `GIS_LOCAL_MAX_CONCURRENT`
- distributed (cross-process) semaphore via Redis: `GIS_DISTRIBUTED_MAX_CONCURRENT`, `GIS_SLOT_TTL`
- Postgres `statement_timeout`: `GIS_STATEMENT_TIMEOUT_SECONDS`
When the GIS DB is at capacity, jobs are re-enqueued later (`GisRetryLaterError`). If a single feature times out, the worker skips that feature and continues.
Large plans switch to a simplified GIS aggregation path. When editing raw SQL in `app/db/gis.py`, prefer `CAST(:param AS type)` over `:param::type` so SQLAlchemy + asyncpg bind parameters correctly in worker queries.
The soil-by-segment soil-raster lookup is performance-sensitive: keep the explicit raster bbox predicate (`ST_ConvexHull(r.rast) && sample_point`) so PostgreSQL can use the existing raster GiST index.
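Putting both rules together, a query following these guidelines might look like the sketch below. This is an illustrative query string only (the `:pt` parameter name and column alias are made up; it is not the query the service actually runs):

```python
# Illustrative raw-SQL sketch following both guidelines:
#  - CAST(:param AS type) instead of :param::type, so SQLAlchemy + asyncpg
#    parse the bind parameter correctly;
#  - an explicit bbox predicate (ST_ConvexHull(r.rast) && ...) so PostgreSQL
#    can use the existing raster GiST index.
SOIL_SAMPLE_SQL = """
SELECT ST_Value(r.rast, 1, CAST(:pt AS geometry)) AS soil_tcha
FROM hiilikartta_maaperanhiili_2023_tcha AS r
WHERE ST_ConvexHull(r.rast) && CAST(:pt AS geometry)
"""
```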
Each calculation logs one phase timing summary line with natcode, segment, soil-raster, curve-prep, segment-loop, and final-assembly timings.
- Docker Engine / Docker Desktop + Compose v2 (`docker compose`)
- Access to a PostGIS GIS DB with the required datasets
One-time: create the external Docker network used by `docker-compose.dev.yml`:

```sh
docker network create climate-map-network
```

Create your local env file:

```sh
cp .env.template .env
```

At minimum you must set the GIS connection values: `GIS_PG_USER`, `GIS_PG_PASSWORD`, `GIS_PG_DB`, `GIS_PG_HOST`, `GIS_PG_PORT`.
Optional tuning:
- `SAQ_WORKERS_COUNT` (worker process count; dev default is 3)
Safety rails:
- Dev containers refuse to start unless `STATE_PG_DB` contains `dev`.
- Tests refuse to run unless `STATE_PG_TEST_DB` contains `test` (tests run Alembic downgrade/upgrade against the test DB).
For authenticated endpoints, also set:
`ZITADEL_DOMAIN`, `ZITADEL_CLIENT_ID`, `ZITADEL_CLIENT_SECRET`
```sh
docker compose up --build
```

Default URLs (from `.env.template`):
- API: `http://localhost:${APP_PORT}` (docs at `/docs`)
- Jupyter: `http://localhost:${NOTEBOOK_PORT}` (token: `NOTEBOOK_TOKEN`)
- SAQ Web UI: `http://localhost:${SAQ_WEB_PORT}`
The state DB schema is managed via Alembic (`alembic/`). Migrations will attempt to create the required `pgcrypto` extension (for `gen_random_uuid()`). If your DB role cannot create extensions, enable it once manually:
```sh
docker compose exec state-db-dev sh -lc \
  'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "CREATE EXTENSION IF NOT EXISTS \"pgcrypto\";"'
```

Then run migrations:

```sh
docker compose exec app-dev poetry run alembic upgrade head
```

`docker-compose.prod.yml` runs the API + worker + Redis and is designed to be attached to an existing reverse-proxy network (`proxy-net`) with Traefik.
To support running multiple stacks on the same Docker host (e.g. prod + test) without Redis cross-talk, the worker + Redis live on an internal per-stack network (`app-net`), and only the API is attached to `proxy-net`.
The prod API and worker run from the built image, not from a bind-mounted repository checkout. Dokploy should rebuild the image on deploy (`pull_policy: build`, `build.no_cache: true`), while private CSV inputs are still supplied through `${DATA_PATH}:/app/data`.
By default, prod containers refuse to start if the state DB is not at the latest Alembic revision. To run migrations automatically on API startup, set `STATE_DB_MIGRATION_MODE=upgrade` (the worker is check-only and never runs migrations).
Key env vars:
- `DOMAIN` (Traefik host rule)
- `APP_PORT` (host port for the API container)
- `DATA_PATH` (private host directory mounted to `/app/data`; must contain the active Hiilikartta CSV inputs)
- `REDIS_DATA_PATH` (Redis persistence path for prod)
- `STATE_DB_MIGRATION_MODE` (`check` to refuse start; `upgrade` to run `alembic upgrade head` on API startup)
- `app/`: application code
- `app/main.py`: FastAPI app + routes
- `app/saq_worker.py`: SAQ queue + worker functions
- `app/calculator/`: calculation implementation
- `app/db/`: async DB engines, GIS queries, state DB access, throttling
- `app/auth/`: Zitadel token introspection
- `alembic/`: Alembic migrations for the state DB
- `data/`: lookup tables + curve inputs used by the calculator
- `documentation/calculation_2026_03.md`: authoritative latest calculation spec
- `documentation/calculation_2025.md`: 2025 calculation snapshot
- `documentation/calculation_2024.md`: legacy calculation snapshot
- `docker-compose.*.yml`, `docker-entrypoint*.sh`: local/prod wiring
- `tests/`: integration/smoke tests
- `sql/`: reference SQL snippets (not the migration source of truth)
- Python: 3.11 + Poetry (`pyproject.toml`)
- Formatting: `poetry run black .`
- Types: keep/extend existing type hints; avoid introducing untyped public APIs where practical
- GIS DB safety: use the throttled helpers in `app/db/gis.py` (don’t open raw GIS sessions without a good reason)
- Logging: module loggers in `app/utils/logger.py` use a single non-propagating `RichHandler`; keep propagation disabled to avoid duplicate lines under Gunicorn/SAQ/root logging
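The non-propagating-logger convention can be sketched as below. This is a hypothetical reconstruction of the pattern, not the contents of `app/utils/logger.py`; it substitutes `logging.StreamHandler` so the snippet runs without the `rich` dependency:

```python
import logging


def get_module_logger(name: str) -> logging.Logger:
    # One handler per logger, attached only once; propagation off so the
    # Gunicorn/SAQ/root handlers don't emit the same record a second time.
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()  # the real code uses rich.logging.RichHandler
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```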
`.devcontainer/devcontainer.json` uses `docker-compose.dev.yml` and starts:
`app-dev`, `worker-dev`, `redis-dev`, `state-db-dev`, `state-db-test`.
It also configures common VS Code extensions for Python, Jupyter, Docker, and formatting.
Tests are few but cover the most important API flows:
- `tests/api/main_test.py`: calculation lifecycle + output checks + retry/timeout behaviors
- `tests/modules/db/test_gis.py`: smoke tests for GIS query helpers
Running tests requires:
- a running `state-db-test` (started by `docker-compose.dev.yml`), and
- a reachable GIS DB containing the required datasets (tests execute real GIS queries).
Run:

```sh
docker compose exec app-dev poetry run pytest
```