diff --git a/CONTEXT.md b/CONTEXT.md new file mode 100644 index 000000000..bc747efd8 --- /dev/null +++ b/CONTEXT.md @@ -0,0 +1,119 @@ +# ModelSkill + +Internal domain glossary for ModelSkill — a Python package for evaluating model skill by comparing simulation output with observations, primarily for MIKE 21/3 (Flexible Mesh) hydrodynamic and oceanographic models. The user-facing version lives in `docs/user-guide/terminology.qmd`; this file is the dev-facing canonical reference and additionally documents internal/alpha vocabulary that is not yet stable enough for the public docs. + +Upstream vocabulary (file formats, geometry concepts, layered meshes) lives in [mikeio's CONTEXT.md](../mikeio/CONTEXT.md). Where the two glossaries overlap, mikeio is authoritative — modelskill aligns rather than diverges. + +## Language + +### Core entities + +**Observation**: +Measured data from a real instrument or station, used as the reference truth. Concrete types: `PointObservation`, `TrackObservation` (plus `VerticalObservation`, alpha). +_Avoid_: ground truth, target, label. + +**ModelResult**: +Simulation output to be evaluated against an Observation. Concrete types: `PointModelResult`, `TrackModelResult`, `GridModelResult`, `DfsuModelResult`, `DummyModelResult` (plus `VerticalModelResult`, alpha). +_Avoid_: prediction, forecast, model output. + +**Comparer**: +The pairing of one Observation with one or more ModelResults after spatial and temporal matching. Holds the matched data alongside the raw model data. +_Avoid_: comparison, matched dataset. + +**ComparerCollection**: +A collection of Comparers, typically one per station, used for cross-station skill assessment. +_Avoid_: multi-comparer, comparer set. + +**Baseline**: +A reference ModelResult representing a naive prediction (mean, persistence, climatology). Realized as `DummyModelResult`. Anchors skill scores like NSE that quantify "is the real model better than predicting nothing clever". 
+_Avoid_: null model, dummy model (use "Baseline" in prose; `DummyModelResult` is the class). + +### Geometry types (gtype) + +**Geometry type (gtype)**: +The label describing the spatial structure of an Observation or ModelResult. Determines match-compatibility. Values: `point`, `track`, `grid`, `unstructured` (flexible mesh), `vertical` (alpha), `node` / `reach` (future). Accessible via `.gtype`. + +**Point**: +A fixed (x, y) location, single z, timeseries of values. e.g. tide gauge, single-depth mooring. + +**Track**: +A moving (x, y) location, single z, timeseries of values. e.g. satellite altimeter, ship transect. + +**Grid**: +A regular axis-aligned spatial extent with a value at each (x, y) per timestep. ModelResult-only (no GridObservation). Backed by `GridModelResult` (xarray / nc / dfs2). + +**Flexible Mesh** (FM): +An unstructured spatial extent of Nodes + Elements with a value per Element per timestep. ModelResult-only. Backed by `DfsuModelResult` (dfsu files). Use **flexible mesh** for the geometry concept and **dfsu** for the file format — never "unstructured grid". Aligns with mikeio's vocabulary. + +**Vertical** (alpha): +A fixed (x, y) location with values varying along z and time — a water column at one station. e.g. CTD cast, moored profiler, modeled column extracted from a 3D dfsu. +_Avoid_: profile, water column, column, depth profile (all overloaded). + +### Skill assessment + +**Skill**: +The ability of a model to reproduce observations. As an API, `Comparer.skill()` returns a `SkillTable` of metrics grouped by (observation, model, variable). Conceptually a Comparer-level concept; also available on ComparerCollection. + +**Score**: +A single numerical value summarizing model performance for one metric, computed as a weighted average across all time-steps, observations and variables. Conceptually a ComparerCollection-level concept; also available on `Comparer.score` as the degenerate single-observation case. 
+ +**Metric**: +A mathematical expression evaluated on matched (obs, model) pairs (bias, RMSE, R², NSE, …). Defined in `modelskill.metrics`. Both circular and scalar metrics are supported. + +**SkillTable**: +Skill metrics aggregated by categorical groupings (model, observation, variable). DataFrame-like. + +**SkillGrid**: +Skill metrics binned in horizontal space (x, y). + +**SkillProfile** (alpha): +Skill metrics binned in z. Single-station today; will gain an `observation` dim for multi-station before release. + +### Physical quantities + +**Quantity**: +A physical quantity (name + unit, e.g. *Water Level [m]*) attached to an Observation or ModelResult. Compatibility gates matching — water level cannot be matched against wind speed. When reading dfs files via mikeio, derived from `EUMType` + `EUMUnit`. Class: `Quantity`. + +**Field**: +Data defined over a spatial extent (Grid or Flexible Mesh) with a value at each location and timestep — i.e. the gtype=grid or gtype=unstructured shape. Contrasts with **timeseries** (point/track). Not directly comparable to observations; `match()` extracts a timeseries per observation location before producing a Comparer. + +**Timeseries**: +A sequence of values in time at a single location (point) or a single moving location (track). Univariate by convention in ModelSkill; multivariate timeseries are assessed one variable at a time. + +## Relationships + +- An **Observation** and a **ModelResult** of compatible **gtype** match into a **Comparer** (or **ComparerCollection** for multiple observations). +- A **Field** (Grid / FM ModelResult) is *not* directly comparable to an Observation; `match()` extracts a **Timeseries** at each observation location. +- **Skill** is naturally per-Comparer (per-(obs, model) breakdown); **Score** is naturally per-ComparerCollection (cross-observation weighted average). Both methods exist on both classes for convenience. 
+- A **Comparer** of `gtype="vertical"` exposes a `.vertical` accessor for column-specific operations (`skill`, `plot.profile`, `plot.hovmoller`). *Alpha.* +- A **ComparerCollection** of vertical members will expose `cc.vertical.skill()` (release blocker, follow-up PR) returning a **SkillProfile** with `(z, observation, model)` dims. *Alpha.* + +## Conventions + +**z-axis direction**: +Default `positive="up"` (MIKE 3 convention — z=0 at datum, below-surface negative). Read from `z.attrs["positive"]` if set; user can override via the `positive` constructor kwarg. Plotter inverts the y-axis iff `positive="down"`. mikeio-sourced datasets do not currently carry this attribute. + +**Flexible Mesh vs dfsu**: +*Flexible mesh* (or FM) is the geometry/engine concept; *dfsu* is the binary file format (extension `.dfsu`). Class names anchored to the file use `Dfsu*`; prose about the geometry uses "flexible mesh". Aligns with mikeio. + +## Flagged ambiguities + +- **"performance" vs "skill"**: Treated as aliases in user docs (terminology.qmd folds "performance" into Skill). Internally, prefer "skill". +- **"unstructured" vs "flexible mesh"**: The code's `GeometryType.UNSTRUCTURED` enum value is a legacy name. Prose and docs say "flexible mesh"; the enum rename was deferred (flag in API reviews, do not change unprompted). +- **(x, y) on a VerticalObservation vs VerticalModelResult**: legitimately differ. Obs holds the true CTD/instrument position; model holds the nearest mesh element center. No equality check is performed at match time — the mismatch is expected, not a bug. *Alpha.* +- **"profile"**: Rejected as an entity name — overloaded across `SkillProfile`, `plot.profile()`, Taylor profiles, and (informally) CTD output. The entity is **Vertical**; "profile" survives only when describing the *output* (e.g., `SkillProfile` reads as "skill as a function of depth"). *Alpha.* +- **3D dfsu → VerticalModelResult extraction**: not provided in-package. 
Extraction is an offline preprocessing step because it is slow (mesh navigation, sigma-z layer reconstruction). Users extract one column with mikeio, write a dfs0, and pass that to `VerticalModelResult`. *Alpha.* + +## Example dialogue + +> **Dev:** "I have a tide-gauge timeseries and a dfsu of surface elevation. What's the workflow?" +> +> **Maintainer:** "Wrap the gauge as a `PointObservation` and the dfsu as a `DfsuModelResult` (gtype `unstructured`, i.e. flexible mesh). Call `match()` — it'll extract a timeseries from the mesh at the gauge location and hand you back a `Comparer`. Then `cmp.skill()` gives you a `SkillTable` of metrics, or `cmp.score(metric=...)` for a single number." +> +> **Dev:** "And if I have ten gauges?" +> +> **Maintainer:** "`match()` with a list of observations returns a `ComparerCollection`. `cc.score(...)` is the natural one-number summary — it weights across observations. `cc.skill()` still gives you the per-observation breakdown." +> +> **Dev:** "Is my model better than just predicting the mean?" +> +> **Maintainer:** "Add a `DummyModelResult` as a baseline alongside your real model, run `match()`, and compare scores. NSE is the metric that's literally defined as 'how much better than the mean'." diff --git a/docs/user-guide/terminology.qmd b/docs/user-guide/terminology.qmd index a9a804942..e8dd0daca 100644 --- a/docs/user-guide/terminology.qmd +++ b/docs/user-guide/terminology.qmd @@ -8,10 +8,6 @@ format-links: false ModelSkill is a library for assessing the skill of numerical models. It provides tools for comparing model results with observations, plotting the results and calculating validation metrics. This page defines some of the key terms used in the documentation. -## {{< fa ruler >}} Skill -**Skill** refers to the ability of a numerical model to accurately represent the real-world phenomenon it aims to simulate. It is a measure of how well the model performs in reproducing the observed system. 
Skill can be assessed using various metrics, such as accuracy, precision, and reliability, depending on the specific goals of the model and the nature of the data. In ModelSkill, [`skill`](`modelskill.Comparer.skill`) is also a specific method on [Comparer](`modelskill.Comparer`) objects that returns a [`SkillTable`](`modelskill.SkillTable`) with aggregated skill scores per observation and model for a list of selected [metrics](`modelskill.metrics`). - - ## {{< fa check >}} Validation **Validation** is the process of assessing the model's performance by comparing its output to real-world observations or data collected from the system being modeled. It helps ensure that the model accurately represents the system it simulates. Validation is typically performed before the model is used for prediction or decision-making. @@ -20,16 +16,20 @@ ModelSkill is a library for assessing the skill of numerical models. It provides **Calibration** is the process of adjusting the model's parameters or settings to improve its performance. It involves fine-tuning the model to better match observed data. Calibration aims to reduce discrepancies between model predictions and actual measurements. At the end of the calibration process, the calibrated model should be validated with independent data. -## {{< fa ruler >}} Performance -**Performance** is a measure of how well a numerical model operates in reproducing the observed system. It can be assessed using various metrics, such as accuracy, precision, and reliability, depending on the specific goals of the model and the nature of the data. In this context, **performance** is synonymous with **skill**. +## {{< fa chart-line >}} Timeseries +A **timeseries** is a sequence of data points in time at a single location (or a single moving location). In ModelSkill, the data can either be from [observations](#observation) or [model results](#model-result). 
Timeseries can be univariate or multivariate; ModelSkill primarily supports univariate timeseries — multivariate timeseries can be assessed one variable at a time. Spatially, a timeseries is either *point* (fixed location) or *track* (location varies with time). Gridded model results ([`GridModelResult`](`modelskill.GridModelResult`)) and flexible-mesh model results ([`DfsuModelResult`](`modelskill.DfsuModelResult`)) are not themselves timeseries; they are [fields](#field) that yield a timeseries at each location after extraction. -## {{< fa chart-line >}} Timeseries -A **timeseries** is a sequence of data points in time. In ModelSkill, The data can either be from [observations](#observation) or [model results](#model-result). Timeseries can univariate or multivariate; ModelSkill primarily supports univariate timeseries. Multivariate timeseries can be assessed one variable at a time. Timeseries can also have different spatial dimensions, such as point, track, line, or area. +## {{< fa globe >}} Field +A **field** is data defined over a spatial extent — a regular [grid](#model-result) or a flexible mesh — with a value at each location and timestep. ModelSkill supports gridded fields ([`GridModelResult`](`modelskill.GridModelResult`)) and flexible-mesh fields ([`DfsuModelResult`](`modelskill.DfsuModelResult`)). Fields are not directly comparable to observations; the [`match`](`modelskill.match`) function extracts a [timeseries](#timeseries) at each observation location and produces a [Comparer](#comparer) from the result. + + +## {{< fa shapes >}} Geometry type (gtype) +The **geometry type** (`gtype`) is a label describing the spatial structure of an [observation](#observation) or [model result](#model-result). The values used in ModelSkill are `point`, `track`, `grid`, and `unstructured` (i.e. flexible mesh). The geometry type determines match-compatibility: an observation and a model result can be matched only when their geometry types are compatible. 
Accessible via the `.gtype` attribute on observations, model results, and [Comparers](#comparer). ## {{< fa temperature-half >}} Observation -An **observation** refers to real-world data or measurements collected from the system you are modeling. Observations serve as a reference for assessing the model's performance. These data points are used to compare with the model's predictions during validation and calibration. Observations are usually based on field measurements or laboratory experiments, but for the purposes of model validation, they can also be derived from other models (e.g. a reference model). ModelSkill supports [point](`modelskill.PointModelResult`) and [track](`modelskill.TrackModelResult`) observation types. +An **observation** refers to real-world data or measurements collected from the system you are modeling. Observations serve as a reference for assessing the model's performance. These data points are used to compare with the model's predictions during validation and calibration. Observations are usually based on field measurements or laboratory experiments, but for the purposes of model validation, they can also be derived from other models (e.g. a reference model). ModelSkill supports [point](`modelskill.PointObservation`) and [track](`modelskill.TrackObservation`) observation types. ## {{< fa temperature-half >}} Measurement @@ -37,31 +37,43 @@ A **measurement** is called [observation](#observation) in ModelSkill. ## {{< fa database >}} Model result -A **model result** is the output of any type of numerical model. It is the data generated by the model during a simulation. Model results can be compared with observations to assess the model's performance. In the context of validation, the term "model result" is often used interchangeably with "model output" or "model prediction". 
ModelSkill supports [point](`modelskill.PointModelResult`), [track](`modelskill.TrackModelResult`), [dfsu](`modelskill.DfsuModelResult`) and [grid](`modelskill.GridModelResult`) model result types. +A **model result** is the output of any type of numerical model. It is the data generated by the model during a simulation. Model results can be compared with observations to assess the model's performance. In the context of validation, the term "model result" is often used interchangeably with "model output" or "model prediction". ModelSkill supports [point](`modelskill.PointModelResult`), [track](`modelskill.TrackModelResult`), flexible-mesh ([dfsu](`modelskill.DfsuModelResult`)) and [grid](`modelskill.GridModelResult`) model result types, plus [`DummyModelResult`](`modelskill.DummyModelResult`) for [baselines](#baseline). -## {{< fa ruler >}} Metric -A **metric** is a quantitative measure (a mathematical expression) used to evaluate the performance of a numerical model. Metrics provide a standardized way to assess the model's accuracy, precision, and other attributes. A metric aggregates the skill of a model into a single number. See list of [metrics](`modelskill.metrics`) supported by ModelSkill. +## {{< fa chart-simple >}} Baseline +A **baseline** is a reference [model result](#model-result) representing a naive or trivial prediction — for example, the observation mean, persistence, or climatology. Baselines anchor skill scores such as the [Nash–Sutcliffe efficiency](`modelskill.metrics.nse`) that quantify how much better a real model is than doing nothing clever. ModelSkill provides [`DummyModelResult`](`modelskill.DummyModelResult`) for this purpose. -## {{< fa ruler >}} Score -A **score** is a numerical value that summarizes the model's performance based on chosen metrics. Scores can be used to rank or compare different models or model configurations. 
In the context of validation, the "skill score" or "validation score" often quantifies the model's overall performance. The score of a model is a single number, calculated as a weighted average for all time-steps, observations and variables. If you want to perform automated calibration, you can use the score as the objective function. In ModelSkill, [`score`](`modelskill.ComparerCollection.score`) is also a specific method on [Comparer](`modelskill.Comparer`) objects that returns a single number aggregated score using a specific [metric](#metric). - - -## Matched data -In ModelSkill, observations and model results are *matched* when they refer to the same positions in space and time. If the [observations](#observation) and [model results](#model-result) are already matched, the [`from_matched`](`modelskill.from_matched`) function can be used to create a [Comparer](#comparer) directly. Otherwise, the [match](#match) function can be used to match the observations and model results in space and time. +## {{< fa scale-balanced >}} Quantity +A **[Quantity](`modelskill.Quantity`)** is a physical quantity (a name plus a unit, e.g. *Water Level [m]*, *Wind Speed [m/s]*) attached to an [observation](#observation) or [model result](#model-result). Observations and model results must share a compatible Quantity to be matched — you cannot validate water level against wind speed. When reading dfs files via [mikeio](https://github.com/DHI/mikeio), Quantity is derived from `EUMType` and `EUMUnit`. ## {{< fa link >}} match() The function [`match`](`modelskill.match`) is used to match a model result with observations. It returns a [`Comparer`](`modelskill.Comparer`) object or a [`ComparerCollection`](`modelskill.ComparerCollection`) object. +## Matched data +In ModelSkill, observations and model results are *matched* when they refer to the same positions in space and time. 
If the [observations](#observation) and [model results](#model-result) are already matched, the [`from_matched`](`modelskill.from_matched`) function can be used to create a [Comparer](#comparer) directly. Otherwise, the [match](#match) function can be used to match the observations and model results in space and time. + + ## Comparer -A [**Comparer**](`modelskill.Comparer`) is an object that stores the matched observation and model result data for a *single* observation. It is used to calculate validation metrics and generate plots. A Comparer can be created using the [`match`](`modelskill.match`) function. +A [**Comparer**](`modelskill.Comparer`) is an object that stores the matched observation and model result data for a *single* observation. It is used to calculate validation metrics and generate plots. A Comparer can be created using the [`match`](`modelskill.match`) function. ## ComparerCollection -A [**ComparerCollection**](`modelskill.ComparerCollection`) is a collection of [Comparer](#comparer)s. It is used to compare *multiple* observations with one or more model results. A ComparerCollection can be created using the [`match`](`modelskill.match`) function or by passing a list of Comparers to the [`ComparerCollection`](`modelskill.ComparerCollection`) constructor. +A [**ComparerCollection**](`modelskill.ComparerCollection`) is a collection of [Comparer](#comparer)s. It is used to compare *multiple* observations with one or more model results. A ComparerCollection can be created using the [`match`](`modelskill.match`) function or by passing a list of Comparers to the [`ComparerCollection`](`modelskill.ComparerCollection`) constructor. + + +## {{< fa ruler >}} Skill +**Skill** refers to the ability of a numerical model to accurately represent the real-world phenomenon it aims to simulate. It is a measure of how well the model performs in reproducing the observed system. 
Skill can be assessed using various metrics, such as accuracy, precision, and reliability, depending on the specific goals of the model and the nature of the data. In ModelSkill, [`skill`](`modelskill.Comparer.skill`) is also a specific method on [Comparer](`modelskill.Comparer`) objects that returns a [`SkillTable`](`modelskill.SkillTable`) with aggregated skill scores per observation and model for a list of selected [metrics](`modelskill.metrics`). Also called *performance*. + + +## {{< fa ruler >}} Metric +A **metric** is a quantitative measure (a mathematical expression) used to evaluate the performance of a numerical model. Metrics provide a standardized way to assess the model's accuracy, precision, and other attributes. A metric aggregates the skill of a model into a single number. See list of [metrics](`modelskill.metrics`) supported by ModelSkill. + + +## {{< fa ruler >}} Score +A **score** is a numerical value that summarizes the model's performance based on a chosen [metric](#metric). Scores can be used to rank or compare different models or model configurations, and are well suited as the objective function in automated calibration. A score is a single number, calculated as a weighted average over all time-steps, observations and variables. In ModelSkill, [`score`](`modelskill.ComparerCollection.score`) is naturally a method on [ComparerCollection](`modelskill.ComparerCollection`) — the cross-observation weighting is what makes the result a single representative number. It is also available on [Comparer](`modelskill.Comparer`) as the degenerate single-observation case. ## Abbreviations @@ -76,4 +88,3 @@ A [**ComparerCollection**](`modelskill.ComparerCollection`) is a collection of [ | `sk` | `SkillTable` | | `mtr` | Metric | | `q` | `Quantity` | -
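Note on the **Baseline** entry and the closing dialogue in `CONTEXT.md`: both describe NSE as "how much better than the mean". The block below is a minimal, standalone sketch of that statement in plain Python. It is not part of the patch and not ModelSkill's own implementation. The helper names `nse` and `mean_baseline` and the sample values are hypothetical, and the baseline is assumed to predict the observation mean.

```python
from statistics import fmean


def nse(obs, mod):
    """Nash-Sutcliffe efficiency: 1 - SSE(model) / SSE(observation-mean baseline)."""
    if len(obs) != len(mod):
        raise ValueError("obs and mod must be matched pairs of equal length")
    obs_mean = fmean(obs)
    # Squared error of the model against the observations
    sse_model = sum((o - m) ** 2 for o, m in zip(obs, mod))
    # Squared error of a constant prediction equal to the observation mean
    sse_baseline = sum((o - obs_mean) ** 2 for o in obs)
    if sse_baseline == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - sse_model / sse_baseline


def mean_baseline(obs):
    """Hypothetical stand-in for a mean-strategy Baseline: always predict the observed mean."""
    return [fmean(obs)] * len(obs)


# Made-up water levels [m], assumed to be already matched in time
obs = [0.10, 0.40, 0.80, 0.50, 0.20]
model = [0.15, 0.35, 0.70, 0.55, 0.25]

nse_model = nse(obs, model)                  # close to 1 for a good model
nse_baseline = nse(obs, mean_baseline(obs))  # 0 by construction

print(f"model NSE: {nse_model:.3f}, baseline NSE: {nse_baseline:.3f}")
```

Under these assumptions the mean baseline scores exactly 0, so any NSE above 0 means the model beats "predicting the mean", and a negative NSE means it does worse. In the package itself, the same comparison would go through `DummyModelResult`, `match()`, and `score(...)` as the dialogue describes.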