While reviewing the data assimilation configuration, I noticed that the current implementation uses fixed absolute noise values corresponding to the diagonal elements of the observation covariance matrix (R). This raises questions regarding the treatment and justification of observation uncertainties, especially when comparing data types with inherently different units and scales.
Current Noise Values:
ERT (Electrical Resistivity Tomography): 1e4
SMC (Soil Moisture Content): 1e-2
Points for Discussion:
These values are treated as absolute noise levels, but should we instead (or additionally) consider relative noise, e.g., a percentage of the measurement value?
In many data assimilation frameworks, noise is modeled as a fraction of the observed value (relative error), which may provide a more balanced treatment across variables with different units or magnitudes.
For example, an absolute noise level of 1e4 in ERT may be reasonable for resistivities near the top of a typical range (e.g., 10^2–10^5 ohm.m), but it dwarfs observations at the lower end of that range, so whether it is excessive or insufficient depends on the context.
Similarly, 1e-2 for SMC corresponds to an absolute error of 0.01 (roughly 1 vol.% for volumetric soil moisture), which might be realistic but should be reviewed against sensor accuracy and the expected variability.
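To make the scale mismatch concrete, the fixed ERT noise can be compared against representative resistivity values. The sample resistivities below are illustrative picks from the quoted 10^2–10^5 ohm.m range, not actual observations:

```python
import numpy as np

# Fixed absolute noise currently used for ERT
noise = 1e4

# Illustrative resistivities spanning the quoted range (ohm.m)
resistivities = np.array([1e2, 1e3, 1e5])

# Noise expressed as a fraction of each observation:
# 100.0, 10.0, 0.1 -> 10000%, 1000%, and 10% relative error
rel = noise / resistivities
```

The same fixed value is thus negligible for some observations and overwhelming for others, which is the core argument for considering relative noise.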
Suggestions:
import numpy as np
import warnings


def build_observation_covariance(obs_values: np.ndarray,
                                 noise_level: float,
                                 relative: bool = False,
                                 warn_threshold: float = 0.5) -> np.ndarray:
    """
    Build a diagonal observation covariance matrix R, with an optional warning
    when absolute noise seems too high relative to the observations.

    Parameters
    ----------
    obs_values : np.ndarray
        Array of observed values.
    noise_level : float
        Noise standard deviation. If relative=False, treated as absolute noise.
        If relative=True, treated as a fraction of the observation
        (e.g., 0.05 for 5%).
    relative : bool
        Whether to treat noise_level as relative to the observation value.
    warn_threshold : float
        Warn when absolute noise exceeds this fraction of the observations.

    Returns
    -------
    R : np.ndarray
        Diagonal observation covariance matrix.
    """
    # Cast to float so an integer observation array does not truncate the noise
    obs_values = np.asarray(obs_values, dtype=float)
    if relative:
        noise_std = noise_level * np.abs(obs_values)
    else:
        noise_std = np.full_like(obs_values, noise_level)
        # This check is only meaningful for a fixed absolute noise level;
        # in relative mode the ratio is constant by construction.
        ratio = noise_std / (np.abs(obs_values) + 1e-12)  # avoid division by zero
        if np.any(ratio > warn_threshold):
            warnings.warn(
                f"Absolute noise level {noise_level} is greater than "
                f"{warn_threshold * 100:.0f}% of some observed values. "
                f"Consider using relative noise instead."
            )
    variance = noise_std ** 2
    R = np.diag(variance)
    return R
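As a sketch of how the two noise treatments compare in practice, the snippet below builds R both ways for a small ERT example. The observation values and the 5% relative fraction are made up for illustration, and the construction mirrors the helper above without depending on it:

```python
import numpy as np

# Illustrative ERT observations in ohm.m (not actual survey data)
ert_obs = np.array([150.0, 2.3e3, 4.7e4])

# Current configuration: fixed absolute standard deviation of 1e4
abs_std = np.full_like(ert_obs, 1e4)
R_abs = np.diag(abs_std ** 2)

# Alternative: relative noise, e.g. 5% of each observation (example fraction)
rel_std = 0.05 * np.abs(ert_obs)
R_rel = np.diag(rel_std ** 2)

# The absolute setting swamps the small observations (noise-to-signal >> 1),
# while the relative setting keeps the uncertainty proportionate.
abs_ratio = abs_std / ert_obs   # varies by orders of magnitude
rel_ratio = rel_std / ert_obs   # constant 0.05 for every observation
```

A per-data-type choice (relative for ERT, absolute for SMC, or vice versa) would also be easy to support by calling the helper once per observation group and assembling R block-diagonally.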