Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
120 changes: 120 additions & 0 deletions docs/source/Quadratic_cost_function.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
=======================
Quadratic cost function
=======================

The quadratic cost function of a simulation is used to evaluate how well a
calibration performs. OpenDA searches for a set of calibration parameters for
which this cost function becomes minimal.

For every observation location and every time step, we compute the difference
between the model result and the observation, square this, and divide it by the
square of the observation's standard deviations. Summing over all locations
and times yields:

.. math::
L(x) = \sum_{i=1}^I \sum_{t=1}^T \frac{(y_{t,i} - f_{t,i}(x))^2}{\sigma_i^2},

where :math:`y` is the set of observations, and :math:`f(x)` the corresponding
set of model results, and :math:`\sigma_i` the user-defined standard deviation
of the observations at location :math:`i`.

In the Java implementation, this cost function is defined in the class
``SimulationKwadraticCostFunction``. The default calculation is activated
using the following XML input::

<costFunction weakParameterConstraint="false"
class="org.openda.algorithms.SimulationKwadraticCostFunction"
stdRemoval="false" biasRemoval="false" factor="1"/>

If the attribute ``factor`` is omitted, the default value 0.5 is used.
The computed cost is multiplied by this factor.

OpenDA supports several special options that modify how the cost function is
calculated.

#. ``stdRemoval="true"``

With this option, calibration is performed *using only the bias*, meaning
the standard deviation of the differences between the model results and the
observations is removed from the calculation. This is not the same as the
standard deviation of the observations :math:`\sigma_i`.

For each station, the bias is computed as the average difference between the
model results and observations over all times:

.. math::
\mathrm{bias}_i = \frac{\sum_{t=1}^T (y_{t,i} - f_{t,i}(x))}{T}.

In the cost function, the model-observation difference is replaced by this
bias:

.. math::
L_{\mathrm{bias}}(x) = \sum_{i=1}^I \sum_{t=1}^T
\frac{\mathrm{bias}_i^2}{\sigma_i^2} = \sum_{i=1}^I T
\frac{\mathrm{bias}^2}{\sigma_i^2}.

The final value is multiplied by the chosen factor.

Below is an example with five observation stations (:math:`I=5`).

========================== ==================== ===================== ===================== =========== ==================== ====================
ObsID RMS Bias STD Num of data Min Max
Pannerdense-kop.waterlevel 8.765000563570873E-6 -8.765000563570881E-6 1.705030836774512E-21 78 -8.76500056357088E-6 -8.76500056357088E-6
Zaltbommel.waterlevel 2.553417424167047E-6 2.553417424167031E-6 5.511568187681813E-21 471 2.55341742416703E-6 2.55341742416703E-6
Vuren.waterlevel 4.557510343339312E-6 -4.557510343339312E-6 2.374213988539862E-20 471 -4.55751034333928E-6 -4.55751034333928E-6
Nijmegen-haven.waterlevel 2.171564102226854E-5 2.171564102226849E-5 3.068557404185452E-20 80 2.17156410222685E-5 2.17156410222685E-5
Tiel-Waal.waterlevel 4.144963131675898E-4 -4.144963131675885E-4 8.729040419079085E-19 79 -4.14496313167589E-4 -4.14496313167589E-4
========================== ==================== ===================== ===================== =========== ==================== ====================

All values in the table are computed per observation location, based on the
differences between model results and observations over time. The table
columns have the following meaning:

- *ObsID*: the observation location;
- *RMS*: the root-mean-square error (cost) of the difference;
- *Bias*: the bias (kind of mean difference);
- *STD*: the standard deviation of the difference;
- *Num of data*: the number of data points at the observation location;
that are included in the cost calculation (:math:`T`);
- *Min*: the minimum value of the difference at the location;
- *Max*: the maximum value of the difference at the location.

OpenDA uses a data-assimilation algorithm to minimize the cost function. In
this example, the algorithm `Dud
<https://www.tandfonline.com/doi/abs/10.1080/00401706.1978.10489610>`__ has
been used. It needs 9 iteration to arrive at a minimum cost, generating the
following output::

% costObserved{9} =6.814670088199292E-6;
% costTotal{9} =6.814670088199292E-6;
% SimulationKwadraticCostFunction: evaluation 9 : cost = 6.815E-6

#. ``biasRemoval="true"``

With this option, calibration uses only the *standard deviation* (not the
bias). First, the bias is computed for each station using the same formula
as above. Then, for each time step, this bias is subtracted from the
model-observation difference:

.. math::
L_{\mathrm{std}}(x) = \sum_{i=1}^I \sum_{t=1}^T \frac{(y_{t,i} -
f_{t,i}(x) - \mathrm{bias}_i)^2}{\sigma_i^2},

again multiplied by the chosen factor.


#. ``weakParameterConstraint="true"``

This option adds an extra term to the cost function to penalize deviations
between the current and initial estimates of the calibration parameters.
This helps stabilize calibration when parameters drift too far from their
initial values.

This resulting cost function is:

.. math::
L_{\mathrm{weakParam}}(x) = \sum_{i=1}^I \sum_{t=1}^T \frac{( y_{t,i} -
f_{t,i}(x))^2}{\sigma_i^2} + \sum_{x=1}^X \frac{(\mathrm{ln}(x) -
\mathrm{ln}(x_0))^2}{\sigma_x^2},

and this expression is also multiplied by the chosen factor.
1 change: 1 addition & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ Index
estimate_missing_observations.rst
DFlowFM_info.rst
calibration_twin.rst
Quadratic_cost_function.rst

.. toctree::
:caption: Contributing to the source
Expand Down
Loading