
Pre-ranking for multivariate calibration assessment #1064

@sbfnk

Description


Summary

Implement pre-ranking methods from Gneiting et al. (2008) to assess calibration of multivariate forecasts via univariate rank histograms.

Motivation

When forecasting multiple correlated quantities (e.g., multiple locations, age groups, or forecast horizons), standard univariate calibration checks miss dependency structure. A forecast could be perfectly calibrated marginally (each location's PIT is uniform) but miss the correlation structure entirely. Gneiting et al. (2008) propose "pre-ranking" approaches that reduce multivariate calibration assessment to univariate ranks, which can then be visualised via rank histograms. This could complement the new forecast_multivariate_sample class by providing calibration diagnostics.

Pre-ranking methods

The paper describes two main pre-ranking approaches:

  1. Multivariate rank (Section 2.1): Based on pre-ranks using an orthant semi-ordering. Pool the observation with the samples and, for each point, count how many points (including itself) lie at or below it in every dimension; this count is the point's pre-rank. The observation's pre-rank is then ranked among all pre-ranks, which serves as the calibration metric. Ties are resolved at random.

  2. Minimum spanning tree (MST) rank (Section 2.2): Remove each point in turn and compute the MST length of the remaining points. The observation is then ranked by where its leave-one-out MST length falls among those of the samples.
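For concreteness, the two pre-ranking methods above can be sketched in a few lines. This is an illustrative Python sketch of the algorithms (function names are ours, not a proposed API), assuming `obs` is a length-d vector and `samples` an m x d matrix:

```python
# Illustrative sketch of the two pre-ranking methods in Gneiting et al. (2008).
# Function names are hypothetical; this is not the scoringutils API.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def multivariate_rank(obs, samples, rng=None):
    """Orthant pre-rank: rank of `obs` among the m + 1 pooled points (1..m+1)."""
    rng = np.random.default_rng() if rng is None else rng
    pts = np.vstack([obs, samples])                          # (m + 1, d)
    # le[i, j] is True iff pts[i] <= pts[j] in every dimension
    le = np.all(pts[:, None, :] <= pts[None, :, :], axis=2)
    prerank = le.sum(axis=0)                                 # counts include the point itself
    below = np.sum(prerank < prerank[0])
    ties = np.sum(prerank == prerank[0])
    return below + rng.integers(1, ties + 1)                 # ties resolved at random


def mst_rank(obs, samples, rng=None):
    """MST pre-rank: leave each point out, compare leave-one-out MST lengths."""
    rng = np.random.default_rng() if rng is None else rng
    pts = np.vstack([obs, samples])
    dist = squareform(pdist(pts))
    lengths = np.array([
        minimum_spanning_tree(np.delete(np.delete(dist, i, 0), i, 1)).sum()
        for i in range(len(pts))
    ])
    below = np.sum(lengths < lengths[0])
    ties = np.sum(lengths == lengths[0])
    return below + rng.integers(1, ties + 1)
```

Under calibration, either rank should be uniform on 1..m+1 across forecast units, which is what the rank histogram checks.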

Design options

This could be implemented in two ways:

Option A: Include in score()

Add coverage columns to score.forecast_multivariate_sample() output:

  • multivariate_coverage_50, multivariate_coverage_90, etc.
  • Each is 0/1 per forecast unit (like univariate interval_coverage_*)
  • Aggregation via summarise_scores() gives coverage proportions
  • Existing visualisation tools can display coverage deviations

This parallels how univariate interval coverage works and feels most natural.
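One plausible way to derive such 0/1 indicators from a multivariate rank (a sketch under our own assumptions; the function and column names are hypothetical, not part of scoringutils):

```python
# Sketch: turn a multivariate rank into central-coverage indicators.
# `rank` is the observation's rank among n = m + 1 pooled points (1..n),
# as produced by a pre-ranking method. Names here are hypothetical.
def multivariate_coverage(rank, n, level):
    """Return 1 if the rank falls in the central `level` region, else 0."""
    lower = n * (1 - level) / 2
    upper = n - lower
    return int(lower < rank <= upper)


# Example: per-unit 0/1 columns for a rank of 5 among 20 points
indicators = {
    f"multivariate_coverage_{int(100 * level)}": multivariate_coverage(5, 20, level)
    for level in (0.5, 0.9)
}
```

Averaging these indicators across forecast units (as summarise_scores() does for interval_coverage_*) would then give coverage proportions to compare against the nominal levels.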

Option B: Standalone function

get_multivariate_ranks(forecast_multivariate, method = "average_rank")

Returns the observation's rank among samples (1 to M+1). This can be passed to get_pit_histogram() for visualisation. Useful for detailed calibration diagnostics beyond coverage.
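To illustrate the downstream use, ranks collected across forecast units can be binned into a rank histogram; this is a plain numpy sketch of that step, not the package's get_pit_histogram():

```python
# Sketch: bin ranks from many forecast units into a rank histogram.
# Under calibration the relative frequencies should be roughly uniform.
import numpy as np


def rank_histogram(ranks, n_ranks):
    """Relative frequency of each rank 1..n_ranks across forecast units."""
    counts = np.bincount(np.asarray(ranks), minlength=n_ranks + 1)[1:]
    return counts / counts.sum()


# e.g. three forecast units, ranks among m + 1 = 4 pooled points
freq = rank_histogram([1, 4, 4], n_ranks=4)
```

A U-shaped histogram would indicate underdispersion (or missed correlation), a hump-shaped one overdispersion, mirroring the univariate PIT interpretation.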

Reference

Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L., & Johnson, N. A. (2008). Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test, 17(2), 211-235. https://doi.org/10.1007/s11749-008-0114-x -- available at https://stat.uw.edu/sites/default/files/files/reports/2008/tr537.pdf
