Summary
Implement pre-ranking methods from Gneiting et al. (2008) to assess calibration of multivariate forecasts via univariate rank histograms.
Motivation
When forecasting multiple correlated quantities (e.g., multiple locations, age groups, or forecast horizons), standard univariate calibration checks miss dependency structure. A forecast could be perfectly calibrated marginally (each location's PIT is uniform) but miss the correlation structure entirely. Gneiting et al. (2008) propose "pre-ranking" approaches that reduce multivariate calibration assessment to univariate ranks, which can then be visualised via rank histograms. This could complement the new forecast_multivariate_sample class by providing calibration diagnostics.
Pre-ranking methods
The paper describes two main pre-ranking approaches:
- Multivariate rank (Section 2.1): Based on pre-ranks using an orthant semi-ordering. For each point, count how many other points lie "to the lower left" in all dimensions. The observation's rank among the samples is then interpreted as a calibration metric. Ties are resolved randomly.
- Minimum spanning tree (MST) rank (Section 2.2): Remove each point in turn and compute the MST length of the remaining points. The observation is ranked by where its MST length falls among the sample MST lengths.
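As a sketch of the two approaches (in Python rather than R, purely for illustration), assume the observation is pooled with the m samples so the returned rank lies in 1 to m+1. The function names here (multivariate_rank, mst_rank, etc.) are illustrative, not the package's API, and this is one reading of Section 2 of Gneiting et al. (2008):

```python
import math
import random

def prerank_orthant(points):
    """Pre-rank of each point: number of points <= it in every dimension
    (each point counts itself, so pre-ranks start at 1)."""
    return [
        sum(all(a <= b for a, b in zip(other, p)) for other in points)
        for p in points
    ]

def mst_length(points):
    """Total edge length of the minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    dist = [math.inf] * n
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Attach the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=dist.__getitem__)
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v]:
                dist[v] = min(dist[v], math.dist(points[u], points[v]))
    return total

def rank_with_random_ties(values, index=0):
    """Rank of values[index] among all values, ties resolved randomly."""
    target = values[index]
    below = sum(v < target for v in values)
    ties = sum(v == target for v in values)  # includes the target itself
    return below + random.randint(1, ties)

def multivariate_rank(observation, samples):
    # Pool observation (index 0) with samples, pre-rank via the orthant
    # semi-ordering, then rank the observation's pre-rank.
    points = [tuple(observation)] + [tuple(s) for s in samples]
    return rank_with_random_ties(prerank_orthant(points))

def mst_rank(observation, samples):
    # Pre-rank of point i: MST length of the set with point i removed.
    points = [tuple(observation)] + [tuple(s) for s in samples]
    lengths = [
        mst_length(points[:i] + points[i + 1:]) for i in range(len(points))
    ]
    return rank_with_random_ties(lengths)
```

An observation that is componentwise below every sample gets pre-rank 1 and therefore rank 1; under a calibrated forecast, either rank should be uniform on 1 to m+1.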
Design options
This could be implemented in two ways:
Option A: Include in score()
Add coverage columns to score.forecast_multivariate_sample() output: multivariate_coverage_50, multivariate_coverage_90, etc.
- Each is 0/1 per forecast unit (like univariate interval_coverage_*)
- Aggregation via summarise_scores() gives coverage proportions
- Existing visualisation tools can display coverage deviations
This parallels how univariate interval coverage works and feels most natural.
Option B: Standalone function
get_multivariate_ranks(forecast_multivariate, method = "average_rank")
Returns the observation's rank among samples (1 to M+1). This can be passed to get_pit_histogram() for visualisation. Useful for detailed calibration diagnostics beyond coverage.
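The uniformity claim behind the rank histogram can be sanity-checked with a toy simulation (Python, illustrative only): in one dimension the pre-rank reduces to the ordinary rank of the observation among the samples, so drawing observation and samples from the same distribution should fill all M+1 bins roughly equally. A function like get_pit_histogram() would display such counts:

```python
import random

random.seed(0)
M = 9                    # ensemble size; ranks fall in 1..M+1
counts = [0] * (M + 1)   # rank histogram bins
for _ in range(20000):
    obs = random.gauss(0, 1)
    samples = [random.gauss(0, 1) for _ in range(M)]
    # Ties have probability zero for continuous draws, so no
    # randomisation is needed in this toy case.
    rank = 1 + sum(s < obs for s in samples)
    counts[rank - 1] += 1

# Under calibration, each of the M+1 bins holds roughly 20000 / 10 = 2000.
print(counts)
```

A miscalibrated forecast (e.g. sampling with the wrong variance) would instead produce a U-shaped or hump-shaped histogram, which is exactly the diagnostic Option B enables for the multivariate pre-ranks.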
Reference
Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L., & Johnson, N. A. (2008). Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test, 17(2), 211-235. https://doi.org/10.1007/s11749-008-0114-x -- available at https://stat.uw.edu/sites/default/files/files/reports/2008/tr537.pdf