Cross-validation against Python scoringrules #8

@jc-macdonald

Description

Summary

Before any release, every scoring function should be cross-validated against the Python scoringrules package on shared reference datasets. This ensures numerical agreement and catches implementation bugs.

Approach

  1. Generate reference datasets in Python (scoringrules + numpy):
    • 5 synthetic ensemble forecast scenarios (well-calibrated, biased, overdispersed, underdispersed, multivariate)
    • Save as JSON (or CSV; the storage location below assumes JSON)
  2. Compute all scores with scoringrules and save expected values
  3. Julia tests load the same datasets and assert agreement with the expected values within absolute and relative tolerances (atol/rtol)
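
Steps 1 and 2 could be sketched roughly as follows. This is a minimal illustration, not the proposed generator: the scenario parameters, the record layout, and the names `SCENARIOS`, `make_scenario`, and `crps_ensemble_nrg` are all hypothetical. To keep the sketch free of dependencies it draws samples with the standard library instead of numpy and uses a hand-written energy-form CRPS as a stand-in scorer. In the real script the `score_fn` argument would be replaced by the corresponding scoringrules function (for example `scoringrules.crps_ensemble`, with argument names checked against the installed version). The multivariate scenario is omitted.

```python
import random

# Hypothetical scenario table: (forecast bias, forecast spread).
# Observations are drawn with unit spread around a random signal, so
# spread 1.0 with zero bias is the well-calibrated case.
SCENARIOS = {
    "well_calibrated": (0.0, 1.0),
    "biased": (1.0, 1.0),
    "overdispersed": (0.0, 2.0),
    "underdispersed": (0.0, 0.5),
}


def crps_ensemble_nrg(obs, ens):
    """Stand-in scorer: energy-form CRPS, E|X - y| - 0.5 * E|X - X'|."""
    m = len(ens)
    term1 = sum(abs(x - obs) for x in ens) / m
    term2 = sum(abs(a - b) for a in ens for b in ens) / (2 * m * m)
    return term1 - term2


def make_scenario(name, n_obs=20, n_members=10, seed=0,
                  score_fn=crps_ensemble_nrg):
    """Build one reference record: inputs plus expected scores."""
    bias, spread = SCENARIOS[name]
    rng = random.Random(seed)  # fixed seed so the data can be regenerated
    observations, forecasts = [], []
    for _ in range(n_obs):
        signal = rng.gauss(0.0, 1.0)
        observations.append(rng.gauss(signal, 1.0))
        forecasts.append(
            [rng.gauss(signal + bias, spread) for _ in range(n_members)]
        )
    scores = [score_fn(y, ens) for y, ens in zip(observations, forecasts)]
    return {
        "scenario": name,
        "seed": seed,
        "observations": observations,
        "forecasts": forecasts,
        "expected": {"crps": scores},
    }


records = [make_scenario(name) for name in SCENARIOS]
```

Passing the scorer in as a callable keeps data generation separate from the library under test, so the same records can be re-scored if the reference package is upgraded.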

Reference Data Location

Store the reference data in test/reference/ as JSON files so that CI can consume it without a Python installation.
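
One way the files might be written is sketched below. The helper names `write_reference` and `read_reference`, the one-file-per-scenario layout, and the record fields are assumptions for illustration. The sketch relies on two properties of Python's `json` module: finite floats are serialized with their shortest round-trip representation, so values read back are bit-identical, and `allow_nan=False` rejects NaN and infinity, which are not valid JSON and could trip up a strict parser on the Julia side.

```python
import json
import tempfile
from pathlib import Path


def write_reference(record, directory):
    """Write one scenario record to <directory>/<scenario>.json."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{record['scenario']}.json"
    # sort_keys and a fixed indent keep diffs small when data is regenerated;
    # allow_nan=False raises ValueError instead of emitting invalid JSON.
    text = json.dumps(record, indent=2, sort_keys=True, allow_nan=False)
    path.write_text(text + "\n", encoding="utf-8")
    return path


def read_reference(path):
    """Load a scenario record back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical record with values that are not exactly representable in binary.
record = {
    "scenario": "example",
    "observations": [0.1 + 0.2, 1.0 / 3.0],
    "expected": {"crps": [0.123456789012345678]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = write_reference(record, Path(tmp) / "test" / "reference")
    restored = read_reference(path)
    file_name = path.name
```

Because the round trip is exact, the storage format itself should not consume any of the ~1e-12 tolerance budget.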

Acceptance Criteria

  • Reference data generated and committed
  • Every scoring function tested against Python reference
  • Tolerances documented (expected: ~1e-12 for non-stochastic scores, ~1e-6 for sample-based scores)
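
The tolerance check could be prototyped on the Python side before the Julia tests are written. The sketch below is illustrative only: the category names in `TOLERANCES` and the helper names are hypothetical, and applying the same number as both the absolute and the relative tolerance is an assumption, since the issue does not say which one the figures refer to.

```python
import math

# Tolerances from the acceptance criteria; the category names are hypothetical.
TOLERANCES = {"non_stochastic": 1e-12, "sample_based": 1e-6}


def agrees(actual, expected, kind):
    """True if |actual - expected| <= max(rtol * max(|a|, |e|), atol)."""
    tol = TOLERANCES[kind]
    return math.isclose(actual, expected, rel_tol=tol, abs_tol=tol)


def first_disagreement(actual, expected, kind):
    """Index of the first pair outside tolerance, or None if all agree."""
    if len(actual) != len(expected):
        raise ValueError("score vectors differ in length")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not agrees(a, e, kind):
            return i
    return None
```

`math.isclose` takes the larger of the absolute and the scaled relative bound, which is the same rule Julia's `isapprox` uses when both `atol` and `rtol` are given. `numpy.isclose` instead adds the two bounds (`atol + rtol * |b|`), so a tolerance tuned with numpy is slightly looser than the same numbers applied in Julia.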
