Cross-validation against Python scoringrules #8

## Summary

Before any release, every scoring function should be cross-validated against the Python `scoringrules` package on shared reference datasets. This checks numerical agreement between the two implementations and helps catch implementation bugs.

## Approach

1. Generate reference datasets in Python (`scoringrules` + numpy):
   - Five synthetic ensemble forecast scenarios (well-calibrated, biased, overdispersed, underdispersed, multivariate)
   - Save as JSON or CSV (JSON is the format expected under `test/reference/`)
2. Compute all scores with `scoringrules` and save the expected values alongside the inputs
3. Load the same datasets in the Julia tests and assert `≈` agreement with explicit `atol`/`rtol`
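
The generation steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual generator: the scenario parameters, the JSON layout, and the names `make_scenario` and `write_reference` are all assumptions, and the multivariate scenario is left out. The hand-written `crps_ensemble_nrg` (the energy-form CRPS estimator) is only a stand-in so the sketch runs without dependencies; in the real script the expected values would come from `scoringrules`, using whichever estimator the Julia implementation uses.

```python
import json
import random
from pathlib import Path


def crps_ensemble_nrg(obs, members):
    """Energy-form CRPS estimator for one observation and one ensemble.

    Stand-in only: the real generator would take this value from
    `scoringrules`, with the estimator matched to the Julia code.
    """
    m = len(members)
    term_obs = sum(abs(x - obs) for x in members) / m
    term_spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return term_obs - term_spread


# Hypothetical scenario definitions: (forecast bias, forecast spread),
# relative to observations drawn from N(0, 1).
SCENARIOS = {
    "well_calibrated": (0.0, 1.0),
    "biased": (1.0, 1.0),
    "overdispersed": (0.0, 2.0),
    "underdispersed": (0.0, 0.5),
}


def make_scenario(name, n_obs=20, n_members=10, seed=0):
    """Build one dataset: inputs plus the expected scores for those inputs."""
    bias, spread = SCENARIOS[name]
    rng = random.Random(seed)  # fixed seed keeps the committed files reproducible
    obs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    fct = [[rng.gauss(bias, spread) for _ in range(n_members)] for _ in range(n_obs)]
    expected = [crps_ensemble_nrg(o, f) for o, f in zip(obs, fct)]
    return {
        "scenario": name,
        "observations": obs,
        "forecasts": fct,
        "expected": {"crps_ensemble": expected},
    }


def write_reference(out_dir, seed=0):
    """Write one JSON file per scenario and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in SCENARIOS:
        path = out_dir / f"{name}.json"
        path.write_text(json.dumps(make_scenario(name, seed=seed), indent=2))
        paths.append(path)
    return paths
```

Calling `write_reference("test/reference")` would produce one file per scenario. Python's `json` module writes floats in a form that round-trips exactly, so the stored expected values lose no precision before the Julia tests read them.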

## Reference Data Location

Store the reference data in `test/reference/` as JSON files, so that CI can run the Julia tests without a Python installation.

## Acceptance Criteria

- [ ] Reference data generated and committed
- [ ] Every scoring function tested against Python reference
- [ ] Tolerances documented (expected: ~1e-12 for non-stochastic scores, ~1e-6 for sample-based scores)
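
To make the tolerance criterion concrete, here is a small Python sketch of the comparison. For scalars, `math.isclose` tests `abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)`, which has the same form as Julia's `isapprox` with `atol` and `rtol`. The category names and the choice to use one value for both the absolute and the relative tolerance are assumptions made for illustration, not decisions recorded in this issue.

```python
import math

# Expected tolerances from the acceptance criteria; the keys are illustrative.
TOLERANCES = {
    "non_stochastic": 1e-12,
    "sample_based": 1e-6,
}


def agrees(actual, expected, kind):
    """Return True if two scalar scores agree within the tolerance for `kind`."""
    tol = TOLERANCES[kind]
    # Assumption: the same value serves as absolute and relative tolerance.
    return math.isclose(actual, expected, rel_tol=tol, abs_tol=tol)


def max_abs_error(actual, expected):
    """Largest absolute deviation between two equal-length score sequences."""
    return max(abs(a - e) for a, e in zip(actual, expected))
```

Reporting `max_abs_error` per scoring function when the reference data is first generated would give the observed numbers needed to document the final tolerances.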
