Summary
Before any release, every scoring function should be cross-validated against the Python `scoringrules` package on shared reference datasets. This confirms that the Julia and Python implementations agree numerically and helps catch implementation bugs.
Approach
- Generate reference datasets in Python (`scoringrules` + `numpy`):
  - 5 synthetic ensemble forecast scenarios (well-calibrated, biased, overdispersed, underdispersed, multivariate)
  - Save as JSON or CSV
- Compute all scores with `scoringrules` and save the expected values
- Julia tests load the same datasets and assert `≈` (`atol`/`rtol`) agreement
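A minimal Python sketch of the generation step, under several assumptions: the scenario parameters, seed, ensemble size, and JSON field names (`obs`, `fct`, `estimator`, `expected`) are hypothetical. The expected values here come from hand-written energy-form estimators of the ensemble CRPS and the energy score, so the example runs with the standard library alone. In the real generator those values would come from the corresponding `scoringrules` functions. Because `scoringrules` offers more than one ensemble estimator, recording which one produced each expected value lets the Julia side match it.

```python
import json
import math
import random

# Hypothetical scenario parameters: observations are always drawn from N(0, 1);
# the ensemble mean offset and spread control how the forecast is miscalibrated.
SCENARIOS = {
    "well_calibrated": (0.0, 1.0),
    "biased": (1.0, 1.0),
    "overdispersed": (0.0, 2.0),
    "underdispersed": (0.0, 0.5),
}


def crps_ensemble_nrg(obs, ens):
    """Energy-form ensemble CRPS: mean|X_i - y| - sum|X_i - X_j| / (2 m^2).

    Stand-in for the scoringrules value so the sketch needs only the stdlib.
    """
    m = len(ens)
    spread = sum(abs(a - b) for a in ens for b in ens)
    return sum(abs(x - obs) for x in ens) / m - spread / (2 * m * m)


def energy_score_nrg(obs, ens):
    """Multivariate analogue: Euclidean distances replace absolute differences."""
    m = len(ens)
    spread = sum(math.dist(a, b) for a in ens for b in ens)
    return sum(math.dist(x, obs) for x in ens) / m - spread / (2 * m * m)


def make_reference(seed=42, n_cases=50, m=20, dim=3):
    """Build all five scenarios; n_cases, m (members) and dim are illustrative."""
    rng = random.Random(seed)  # fixed seed keeps the datasets reproducible
    out = {}
    for name, (offset, spread) in SCENARIOS.items():
        obs = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
        fct = [[rng.gauss(offset, spread) for _ in range(m)] for _ in range(n_cases)]
        out[name] = {
            "obs": obs,
            "fct": fct,
            "estimator": "energy_form",  # records which estimator produced "expected"
            "expected": {"crps_ensemble": [crps_ensemble_nrg(y, f) for y, f in zip(obs, fct)]},
        }
    # Multivariate scenario: independent N(0, 1) components for obs and members.
    obs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_cases)]
    fct = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
           for _ in range(n_cases)]
    out["multivariate"] = {
        "obs": obs,
        "fct": fct,
        "estimator": "energy_form",
        "expected": {"energy_score": [energy_score_nrg(y, f) for y, f in zip(obs, fct)]},
    }
    return out


if __name__ == "__main__":
    # Write one JSON file per scenario into the current directory.
    for name, data in make_reference().items():
        with open(f"{name}.json", "w") as fh:
            json.dump(data, fh)
```

Fixing the seed makes the generator reproducible for a given Python version, so the committed files can be regenerated and compared when the reference data needs to change.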
Reference Data Location
Store in `test/reference/` as JSON files that CI can consume without Python.
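On the Julia side the check would typically be written as `@test got ≈ want atol=... rtol=...`. The Python sketch below only illustrates the same loading and tolerance logic, for example to spot-check a reference file locally. The field names, the helper names `load_reference` and `mismatches`, and the default tolerances are hypothetical placeholders, not agreed values. `math.isclose` applies the same scalar rule as Julia's `isapprox`: `abs(a - b) <= max(atol, rtol * max(abs(a), abs(b)))`. Comparing element by element is stricter than applying `≈` to whole arrays, which compares norms.

```python
import json
import math


def load_reference(path):
    """Read one reference file (the layout under test/reference/ is up to the project)."""
    with open(path) as fh:
        return json.load(fh)


def mismatches(ref, score_name, score_fn, atol=1e-10, rtol=1e-8):
    """Return (index, expected, got) for every case outside tolerance.

    math.isclose accepts when |got - want| <= max(rtol * max(|got|, |want|), atol),
    the same rule as Julia's scalar isapprox(got, want; atol, rtol).
    The default atol/rtol values here are placeholders, not agreed tolerances.
    """
    bad = []
    cases = zip(ref["obs"], ref["fct"], ref["expected"][score_name])
    for i, (y, f, want) in enumerate(cases):
        got = score_fn(y, f)
        if not math.isclose(got, want, rel_tol=rtol, abs_tol=atol):
            bad.append((i, want, got))
    return bad
```

An empty list means every case agrees; a non-empty list pinpoints which cases diverge, which is more useful for debugging than a single pass/fail.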
Acceptance Criteria