 
 class CalibrationError(IgniteMetricHandler):
     """
-    Computes Calibration Error and reports the aggregated value according to `metric_reduction`
-    over all accumulated iterations. Can return the expected, average, or maximum calibration error.
+    Ignite handler to compute Calibration Error during training or evaluation.
+
+    **Why Calibration Matters:**
+
+    A well-calibrated model produces probability estimates that match the true likelihood of correctness.
+    For example, predictions with 80% confidence should be correct approximately 80% of the time.
+    Modern neural networks often exhibit poor calibration (typically overconfident), which can be
+    problematic in medical imaging where probability estimates may inform clinical decisions.
+
+    This handler wraps :py:class:`~monai.metrics.CalibrationErrorMetric` for use with PyTorch Ignite
+    engines, automatically computing and aggregating calibration errors across iterations.
+
+    **Supported Calibration Metrics:**
+
+    - **Expected Calibration Error (ECE)**: Weighted average of per-bin errors (most common).
+    - **Average Calibration Error (ACE)**: Unweighted average across bins.
+    - **Maximum Calibration Error (MCE)**: Worst-case calibration error.
 
     Args:
-        num_bins: number of bins to calculate calibration. Defaults to 20.
-        include_background: whether to include calibration error computation on the first channel of
-            the predicted output. Defaults to True.
-        calibration_reduction: Method for calculating calibration error values from binned data.
-            Available modes are `"expected"`, `"average"`, and `"maximum"`. Defaults to `"expected"`.
-        metric_reduction: Mode of reduction to apply to the metrics.
-            Reduction is only applied to non-NaN values.
-            Available reduction modes are `"none"`, `"mean"`, `"sum"`, `"mean_batch"`,
-            `"sum_batch"`, `"mean_channel"`, and `"sum_channel"`.
-            Defaults to `"mean"`. If set to `"none"`, no reduction will be performed.
-        output_transform: callable to extract `y_pred` and `y` from `ignite.engine.state.output` then
-            construct `(y_pred, y)` pair, where `y_pred` and `y` can be `batch-first` Tensors or
-            lists of `channel-first` Tensors. the form of `(y_pred, y)` is required by the `update()`.
-            `engine.state` and `output_transform` inherit from the ignite concept:
-            https://pytorch.org/ignite/concepts.html#state, explanation and usage example are in the tutorial:
-            https://github.com/Project-MONAI/tutorials/blob/master/modules/batch_output_transform.ipynb.
-        save_details: whether to save metric computation details per image, for example: calibration error
-            of every image. default to True, will save to `engine.state.metric_details` dict with the
-            metric name as key.
+        num_bins: Number of equally-spaced bins for calibration computation. Defaults to 20.
+        include_background: Whether to include the first channel (index 0) in computation.
+            Set to ``False`` to exclude background in segmentation tasks. Defaults to ``True``.
+        calibration_reduction: Calibration error reduction mode. Options: ``"expected"`` (ECE),
+            ``"average"`` (ACE), ``"maximum"`` (MCE). Defaults to ``"expected"``.
+        metric_reduction: Reduction across batch/channel after computing per-sample errors.
+            Options: ``"none"``, ``"mean"``, ``"sum"``, ``"mean_batch"``, ``"sum_batch"``,
+            ``"mean_channel"``, ``"sum_channel"``. Defaults to ``"mean"``.
+        output_transform: Callable to extract ``(y_pred, y)`` from ``engine.state.output``.
+            See `Ignite concepts <https://pytorch.org/ignite/concepts.html#state>`_ and
+            the batch output transform tutorial in the MONAI tutorials repository.
+        save_details: If ``True``, saves per-sample/per-channel metric values to
+            ``engine.state.metric_details[name]``. Defaults to ``True``.
+
+    References:
+        - Guo, C., et al. "On Calibration of Modern Neural Networks." ICML 2017.
+          https://proceedings.mlr.press/v70/guo17a.html
+        - Barfoot, T., et al. "Average Calibration Error: A Differentiable Loss for Improved
+          Reliability in Image Segmentation." MICCAI 2024.
+          https://papers.miccai.org/miccai-2024/091-Paper3075.html
 
+    See Also:
+        - :py:class:`~monai.metrics.CalibrationErrorMetric`: The underlying metric class.
+        - :py:func:`~monai.metrics.calibration_binning`: Low-level binning for reliability diagrams.
+
+    Example:
+        >>> from monai.handlers import CalibrationError, from_engine
+        >>> from ignite.engine import Engine
+        >>>
+        >>> def evaluation_step(engine, batch):
+        ...     # Returns dict with "pred" (probabilities) and "label" (one-hot)
+        ...     return {"pred": model(batch["image"]), "label": batch["label"]}
+        >>>
+        >>> evaluator = Engine(evaluation_step)
+        >>>
+        >>> # Attach calibration error handler
+        >>> CalibrationError(
+        ...     num_bins=15,
+        ...     include_background=False,
+        ...     calibration_reduction="expected",
+        ...     output_transform=from_engine(["pred", "label"]),
+        ... ).attach(evaluator, name="ECE")
+        >>>
+        >>> # After evaluation, access results
+        >>> evaluator.run(val_loader)
+        >>> ece = evaluator.state.metrics["ECE"]
+        >>> print(f"Expected Calibration Error: {ece:.4f}")
     """
 
     def __init__(
@@ -64,8 +106,4 @@ def __init__(
             metric_reduction=metric_reduction,
         )
 
-        super().__init__(
-            metric_fn=metric_fn,
-            output_transform=output_transform,
-            save_details=save_details,
-        )
+        super().__init__(metric_fn=metric_fn, output_transform=output_transform, save_details=save_details)
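
For intuition, the ``"expected"`` mode described in the docstring can be sketched standalone with NumPy. This is a minimal illustration, not the handler's actual implementation: the function name `expected_calibration_error` and the equal-width binning of confidences are assumptions made here for clarity; the real computation lives in `monai.metrics.CalibrationErrorMetric`.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=20):
    """Sketch of ECE: sum over bins of (bin fraction) * |bin accuracy - bin mean confidence|.

    confidences: predicted probability of the chosen class for each sample.
    correct: 1 if the prediction matched the label, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Assign each prediction to one of `num_bins` equal-width bins on [0, 1]
    # (assumption: equal-width binning, with confidence 1.0 clipped into the last bin).
    bin_ids = np.minimum((confidences * num_bins).astype(int), num_bins - 1)
    ece = 0.0
    for b in range(num_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece
```

A perfectly calibrated batch (90% confidence, 9 of 10 correct) yields an ECE of 0, while systematic overconfidence (80% confidence, all wrong) yields 0.8.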