While reviewing the data assimilation configuration, I noticed that the current implementation uses fixed absolute noise values corresponding to the diagonal elements of the observation covariance matrix (R). This raises questions regarding the treatment and justification of observation uncertainties, especially when comparing data types with inherently different units and scales.
Current Noise Values:
ERT (Electrical Resistivity Tomography): 1e4
SMC (Soil Moisture Content): 1e-2
Points for Discussion:
These values are treated as absolute noise levels, but should we instead (or additionally) consider relative noise, e.g., a percentage of the measurement value?
In many data assimilation frameworks, noise is modeled as a fraction of the observed value (relative error), which may provide a more balanced treatment across variables with different units or magnitudes.
For example, an absolute noise level of 1e4 in ERT may be reasonable for resistivities near the top of a typical range (e.g., 10^2–10^5 ohm.m), but it dwarfs observations at the lower end of that range, so whether it is excessive or insufficient depends on the context.
Similarly, 1e-2 for SMC corresponds to an absolute error of 0.01 (roughly 1 vol.% for volumetric soil moisture), which might be realistic but should be reviewed against sensor accuracy and the expected variability.
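To make the scale mismatch concrete, the fixed ERT noise can be compared against representative resistivity values. The sample resistivities below are illustrative picks from the quoted 10^2–10^5 ohm.m range, not actual observations:

```python
import numpy as np

# Fixed absolute noise currently used for ERT
noise = 1e4

# Illustrative resistivities spanning the quoted range (ohm.m)
resistivities = np.array([1e2, 1e3, 1e5])

# Noise expressed as a fraction of each observation:
# 100.0, 10.0, 0.1 -> 10000%, 1000%, and 10% relative error
rel = noise / resistivities
```

The same fixed value is thus negligible for some observations and overwhelming for others, which is the core argument for considering relative noise.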
Suggestions:
import numpy as np
import warnings


def build_observation_covariance(obs_values: np.ndarray,
                                 noise_level: float,
                                 relative: bool = False,
                                 warn_threshold: float = 0.5) -> np.ndarray:
    """
    Build a diagonal observation covariance matrix R, with an optional warning
    when absolute noise seems too high relative to the observations.

    Parameters
    ----------
    obs_values : np.ndarray
        Array of observed values.
    noise_level : float
        Noise standard deviation. If relative=False, treated as absolute noise.
        If relative=True, treated as a fraction of the observation
        (e.g., 0.05 for 5%).
    relative : bool
        Whether to treat noise_level as relative to the observation value.
    warn_threshold : float
        Warn when absolute noise exceeds this fraction of the observations.

    Returns
    -------
    R : np.ndarray
        Diagonal observation covariance matrix.
    """
    # Cast to float so an integer observation array does not truncate the noise
    obs_values = np.asarray(obs_values, dtype=float)
    if relative:
        noise_std = noise_level * np.abs(obs_values)
    else:
        noise_std = np.full_like(obs_values, noise_level)
        # This check is only meaningful for a fixed absolute noise level;
        # in relative mode the ratio is constant by construction.
        ratio = noise_std / (np.abs(obs_values) + 1e-12)  # avoid division by zero
        if np.any(ratio > warn_threshold):
            warnings.warn(
                f"Absolute noise level {noise_level} is greater than "
                f"{warn_threshold * 100:.0f}% of some observed values. "
                f"Consider using relative noise instead."
            )
    variance = noise_std ** 2
    R = np.diag(variance)
    return R
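As a sketch of how the two noise treatments compare in practice, the snippet below builds R both ways for a small ERT example. The observation values and the 5% relative fraction are made up for illustration, and the construction mirrors the helper above without depending on it:

```python
import numpy as np

# Illustrative ERT observations in ohm.m (not actual survey data)
ert_obs = np.array([150.0, 2.3e3, 4.7e4])

# Current configuration: fixed absolute standard deviation of 1e4
abs_std = np.full_like(ert_obs, 1e4)
R_abs = np.diag(abs_std ** 2)

# Alternative: relative noise, e.g. 5% of each observation (example fraction)
rel_std = 0.05 * np.abs(ert_obs)
R_rel = np.diag(rel_std ** 2)

# The absolute setting swamps the small observations (noise-to-signal >> 1),
# while the relative setting keeps the uncertainty proportionate.
abs_ratio = abs_std / ert_obs   # varies by orders of magnitude
rel_ratio = rel_std / ert_obs   # constant 0.05 for every observation
```

A per-data-type choice (relative for ERT, absolute for SMC, or vice versa) would also be easy to support by calling the helper once per observation group and assembling R block-diagonally.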