Commit cb53022

Address PR review feedback

- Fix MAXIMUM reduction to return NaN (not 0.0) for all-empty bins (CodeRabbit)
- Enhance docstrings with a "Why Calibration Matters" section explaining that predicted probabilities should match observed accuracy
- Add paper references: Guo et al. 2017 (ICML primary source) and Barfoot et al. MICCAI 2024
- Add Sphinx autodoc entries to metrics.rst and handlers.rst
- Improve parameter documentation and usage examples
1 parent 202b25f commit cb53022
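The first bullet of the commit message (return NaN rather than 0.0 when every bin is empty) can be sketched as follows. This is an illustrative standalone function, not MONAI's actual code; the name `max_calibration_error` and its per-bin arguments are hypothetical.

```python
import math

def max_calibration_error(bin_confidence, bin_accuracy, bin_counts):
    """Illustrative MAXIMUM reduction: the worst-case gap between mean
    confidence and observed accuracy over the populated bins only."""
    gaps = [
        abs(conf - acc)
        for conf, acc, n in zip(bin_confidence, bin_accuracy, bin_counts)
        if n > 0  # skip empty bins entirely
    ]
    if not gaps:
        # Per the fix: with no populated bins the error is undefined,
        # so report NaN instead of a misleading 0.0.
        return math.nan
    return max(gaps)
```

Returning NaN lets downstream reductions that skip non-finite values (e.g. `"mean"` over non-NaN entries) ignore degenerate samples instead of dragging the aggregate toward zero.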

File tree

6 files changed: +232 additions, −154 deletions

docs/source/handlers.rst
Lines changed: 6 additions & 0 deletions

@@ -83,6 +83,12 @@ Panoptic Quality metrics handler
     :members:
 
 
+Calibration Error metrics handler
+---------------------------------
+.. autoclass:: CalibrationError
+    :members:
+
+
 Mean squared error metrics handler
 ----------------------------------
 .. autoclass:: MeanSquaredError

docs/source/metrics.rst
Lines changed: 9 additions & 0 deletions

@@ -180,6 +180,15 @@ Metrics
 .. autoclass:: MetricsReloadedCategorical
     :members:
 
+`Calibration Error`
+-------------------
+.. autofunction:: calibration_binning
+
+.. autoclass:: CalibrationReduction
+    :members:
+
+.. autoclass:: CalibrationErrorMetric
+    :members:
 
 
 Utilities

monai/handlers/calibration.py
Lines changed: 64 additions & 26 deletions

@@ -22,30 +22,72 @@
 
 class CalibrationError(IgniteMetricHandler):
     """
-    Computes Calibration Error and reports the aggregated value according to `metric_reduction`
-    over all accumulated iterations. Can return the expected, average, or maximum calibration error.
+    Ignite handler to compute Calibration Error during training or evaluation.
+
+    **Why Calibration Matters:**
+
+    A well-calibrated model produces probability estimates that match the true likelihood of correctness.
+    For example, predictions with 80% confidence should be correct approximately 80% of the time.
+    Modern neural networks often exhibit poor calibration (typically overconfident), which can be
+    problematic in medical imaging where probability estimates may inform clinical decisions.
+
+    This handler wraps :py:class:`~monai.metrics.CalibrationErrorMetric` for use with PyTorch Ignite
+    engines, automatically computing and aggregating calibration errors across iterations.
+
+    **Supported Calibration Metrics:**
+
+    - **Expected Calibration Error (ECE)**: Weighted average of per-bin errors (most common).
+    - **Average Calibration Error (ACE)**: Unweighted average across bins.
+    - **Maximum Calibration Error (MCE)**: Worst-case calibration error.
 
     Args:
-        num_bins: number of bins to calculate calibration. Defaults to 20.
-        include_background: whether to include calibration error computation on the first channel of
-            the predicted output. Defaults to True.
-        calibration_reduction: Method for calculating calibration error values from binned data.
-            Available modes are `"expected"`, `"average"`, and `"maximum"`. Defaults to `"expected"`.
-        metric_reduction: Mode of reduction to apply to the metrics.
-            Reduction is only applied to non-NaN values.
-            Available reduction modes are `"none"`, `"mean"`, `"sum"`, `"mean_batch"`,
-            `"sum_batch"`, `"mean_channel"`, and `"sum_channel"`.
-            Defaults to `"mean"`. If set to `"none"`, no reduction will be performed.
-        output_transform: callable to extract `y_pred` and `y` from `ignite.engine.state.output` then
-            construct `(y_pred, y)` pair, where `y_pred` and `y` can be `batch-first` Tensors or
-            lists of `channel-first` Tensors. the form of `(y_pred, y)` is required by the `update()`.
-            `engine.state` and `output_transform` inherit from the ignite concept:
-            https://pytorch.org/ignite/concepts.html#state, explanation and usage example are in the tutorial:
-            https://github.com/Project-MONAI/tutorials/blob/master/modules/batch_output_transform.ipynb.
-        save_details: whether to save metric computation details per image, for example: calibration error
-            of every image. default to True, will save to `engine.state.metric_details` dict with the
-            metric name as key.
+        num_bins: Number of equally-spaced bins for calibration computation. Defaults to 20.
+        include_background: Whether to include the first channel (index 0) in computation.
+            Set to ``False`` to exclude background in segmentation tasks. Defaults to ``True``.
+        calibration_reduction: Calibration error reduction mode. Options: ``"expected"`` (ECE),
+            ``"average"`` (ACE), ``"maximum"`` (MCE). Defaults to ``"expected"``.
+        metric_reduction: Reduction across batch/channel after computing per-sample errors.
+            Options: ``"none"``, ``"mean"``, ``"sum"``, ``"mean_batch"``, ``"sum_batch"``,
+            ``"mean_channel"``, ``"sum_channel"``. Defaults to ``"mean"``.
+        output_transform: Callable to extract ``(y_pred, y)`` from ``engine.state.output``.
+            See `Ignite concepts <https://pytorch.org/ignite/concepts.html#state>`_ and
+            the batch output transform tutorial in the MONAI tutorials repository.
+        save_details: If ``True``, saves per-sample/per-channel metric values to
+            ``engine.state.metric_details[name]``. Defaults to ``True``.
+
+    References:
+        - Guo, C., et al. "On Calibration of Modern Neural Networks." ICML 2017.
+          https://proceedings.mlr.press/v70/guo17a.html
+        - Barfoot, T., et al. "Average Calibration Error: A Differentiable Loss for Improved
+          Reliability in Image Segmentation." MICCAI 2024.
+          https://papers.miccai.org/miccai-2024/091-Paper3075.html
 
+    See Also:
+        - :py:class:`~monai.metrics.CalibrationErrorMetric`: The underlying metric class.
+        - :py:func:`~monai.metrics.calibration_binning`: Low-level binning for reliability diagrams.
+
+    Example:
+        >>> from monai.handlers import CalibrationError, from_engine
+        >>> from ignite.engine import Engine
+        >>>
+        >>> def evaluation_step(engine, batch):
+        ...     # Returns dict with "pred" (probabilities) and "label" (one-hot)
+        ...     return {"pred": model(batch["image"]), "label": batch["label"]}
+        >>>
+        >>> evaluator = Engine(evaluation_step)
+        >>>
+        >>> # Attach calibration error handler
+        >>> CalibrationError(
+        ...     num_bins=15,
+        ...     include_background=False,
+        ...     calibration_reduction="expected",
+        ...     output_transform=from_engine(["pred", "label"]),
+        ... ).attach(evaluator, name="ECE")
+        >>>
+        >>> # After evaluation, access results
+        >>> evaluator.run(val_loader)
+        >>> ece = evaluator.state.metrics["ECE"]
+        >>> print(f"Expected Calibration Error: {ece:.4f}")
     """
 
     def __init__(
@@ -64,8 +106,4 @@ def __init__(
             metric_reduction=metric_reduction,
         )
 
-        super().__init__(
-            metric_fn=metric_fn,
-            output_transform=output_transform,
-            save_details=save_details,
-        )
+        super().__init__(metric_fn=metric_fn, output_transform=output_transform, save_details=save_details)
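The three reduction modes documented in the new docstring (ECE, ACE, MCE) differ only in how per-bin confidence/accuracy gaps are aggregated. The following is a minimal pure-Python sketch of that recipe (Guo et al. 2017), not MONAI's actual implementation; `calibration_errors` is a hypothetical name for illustration.

```python
import math

def calibration_errors(probs, correct, num_bins=20):
    """Illustrative ECE/ACE/MCE over equally-spaced confidence bins."""
    bins = [[] for _ in range(num_bins)]
    for p, c in zip(probs, correct):
        # Assign each prediction to a bin by confidence; p == 1.0 goes to the last bin.
        b = min(int(p * num_bins), num_bins - 1)
        bins[b].append((p, c))
    gaps, weights = [], []
    total = len(probs)
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing to any reduction
        conf = sum(p for p, _ in members) / len(members)  # mean confidence in bin
        acc = sum(c for _, c in members) / len(members)   # observed accuracy in bin
        gaps.append(abs(conf - acc))
        weights.append(len(members) / total)
    if not gaps:
        # All bins empty: the error is undefined, so return NaN (not 0.0),
        # matching the fix described in this commit.
        return math.nan, math.nan, math.nan
    ece = sum(w * g for w, g in zip(weights, gaps))  # population-weighted average
    ace = sum(gaps) / len(gaps)                      # unweighted average across bins
    mce = max(gaps)                                  # worst-case bin
    return ece, ace, mce
```

ECE and ACE diverge exactly when bin populations are uneven: a sparsely populated but badly calibrated bin barely moves ECE, dominates MCE, and is weighted equally with every other bin by ACE.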
