Regenerate forecast_outputs.rda: contains stale data with negative sample values #62

@annakrystalli

Summary

forecast_outputs.rda contains a negative sample value (-2) that no longer exists in the source CSV. The example-complex-forecast-hub source data was fixed in commit 4a75b82 ("Regenerate model output with sample values >= 0"), but the .rda in hubExamples was never regenerated afterwards.

Reproduction

# Packaged data has a negative value:
data <- hubExamples::forecast_outputs
neg <- data[data$output_type == "sample" & data$value < 0, ]
neg
#>            model_id reference_date          target horizon location
#> 73 Flusight-baseline     2022-11-19 wk inc flu hosp       0       25
#>    target_end_date output_type output_type_id value
#> 73      2022-11-19      sample           2101    -2

# But regenerating from source produces no negatives:
hub_path <- "../example-complex-forecast-hub"
fresh <- hubData::connect_hub(hub_path) |>
  dplyr::filter(
    location %in% c("25", "48"),
    output_type == "sample",
    reference_date == "2022-11-19"
  ) |>
  hubData::collect_hub()
any(fresh$value < 0)
#> [1] FALSE

Impact

The hub's tasks.json specifies "minimum": 0 for sample values, so the negative value shouldn't be present in the packaged data. Downstream, it breaks scale transformations that expect non-negative inputs (e.g., sqrt, log_shift): in R, sqrt(-2) returns NaN with a warning rather than a usable value.
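
To illustrate the failure mode, here is a minimal base R sketch using the offending value from the reproduction above. It does not call any scoring package; the shifted log with an offset of 1 is an assumption standing in for whatever log_shift configuration is actually used downstream.

```r
# The negative sample value found in the packaged data.
bad_value <- -2

# sqrt() of a negative number yields NaN (R also emits a warning,
# suppressed here to keep the output quiet).
sqrt_result <- suppressWarnings(sqrt(bad_value))

# Shifted log, log(x + offset). The offset of 1 is an assumption for
# illustration; with it, the argument is still negative (-1).
assumed_offset <- 1
log_shift_result <- suppressWarnings(log(bad_value + assumed_offset))

# Both transformed values are NaN and would propagate into any score.
is.nan(sqrt_result)
is.nan(log_shift_result)
```

Any larger offset would hide the problem for log-type transforms, but sqrt has no such escape hatch, so fixing the data is the safer route.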

Fix

Rerun data-raw/generate_example_forecast_data.R to regenerate the .rda file from the current source data.
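
After regenerating, a small guard along these lines could confirm that the problem is gone. This is only a sketch: find_negative_samples() is a hypothetical helper, not something that exists in hubExamples, and the toy data frame stands in for the regenerated forecast_outputs.

```r
# Hypothetical helper: return the rows whose output_type is "sample"
# and whose value is below zero. which() drops NA comparisons so that
# missing values do not produce spurious all-NA rows.
find_negative_samples <- function(df) {
  idx <- which(df$output_type == "sample" & df$value < 0)
  df[idx, , drop = FALSE]
}

# Toy stand-in for the regenerated data (values are illustrative only).
toy <- data.frame(
  output_type = c("sample", "sample", "quantile"),
  value = c(3, 0, 12.5)
)

# Fails loudly if any negative sample value slipped through.
bad <- find_negative_samples(toy)
stopifnot(nrow(bad) == 0)
```

In practice the same call would be applied to hubExamples::forecast_outputs once the .rda has been rebuilt.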

Context

Discovered while implementing sample scoring in hubverse-org/hubEvals#94.
