179 changes: 179 additions & 0 deletions docs/user-guide/advanced-forecasting-guide.md
@@ -0,0 +1,179 @@
# Advanced Forecasting Guide

This guide explains the interactive controls landed by **PRP-37 — Forecast
Intelligence C** (the operator-facing surface for the V2 feature contract and
the model zoo introduced by PRP-35 and PRP-36). It is RAG-indexable: ask the
Chat agent any question about model families, feature packs, horizon buckets,
or champion/challenger workflows and it will cite this document.

## Model families

ForecastLabAI groups its models into three families. A model's family is a
fixed property of its model code, not a label you pick. The segmented
**Family** tabs on `/visualize/forecast` and `/visualize/backtest` use it to
filter the options in the **Model** Select.

| Family | Members | When it shines |
|----------|----------------------------------------------------------------------------------------------------|----------------|
| Baseline | `naive`, `seasonal_naive`, `moving_average`, `weighted_moving_average`, `seasonal_average` | Sanity check, target-only history, very short windows |
| Tree | `regression` (HistGBR), `lightgbm`, `xgboost`, `random_forest` | Mid-to-long horizons with rich feature signal |
| Additive | `prophet_like` (Ridge additive), `trend_regression_baseline` | Strong yearly seasonality, interpretable coefficients |

Baselines do **not consume features**. Tree and additive models do, and only
those families offer the V2 feature-frame option.

## Feature frame: V1 vs V2

The **Feature frame** Select is the second control in the Train-a-new-model
row. It chooses how the model sees the past.

- **V1 — target-only.** The classic lags + same-DOW mean. Every model in
every family can train on V1.
- **V2 — feature-aware.** The PRP-35 contract. Adds eleven optional
*feature packs* (see below). Available for tree and additive families only;
for baselines the option is disabled and a tooltip explains why.

The backend default is V1; the UI only sends `feature_frame_version=2` when
the operator explicitly picks V2. A V1 train with `feature_groups` is
rejected by the backend with a 422.
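The rules above can be sketched as a small payload builder. This is a minimal sketch, not the frontend's actual code: `TrainRequest`, `buildTrainRequest`, and the `model_type` field are hypothetical; only `feature_frame_version`, `feature_groups`, and the rejection of a V1 train that carries `feature_groups` come from this guide.

```typescript
// Hypothetical payload shape. Only feature_frame_version and feature_groups
// are field names documented in this guide.
interface TrainRequest {
  model_type: string
  feature_frame_version?: 2
  feature_groups?: string[]
}

type FeatureFrame = 1 | 2

function buildTrainRequest(
  modelType: string,
  frame: FeatureFrame,
  featureGroups: string[] = [],
): TrainRequest {
  if (frame === 1) {
    // The backend answers a V1 train that carries feature_groups with a 422,
    // so refuse to build that payload on the client.
    if (featureGroups.length > 0) {
      throw new Error('feature_groups requires feature_frame_version=2')
    }
    // V1 is the backend default: omit feature_frame_version entirely.
    return { model_type: modelType }
  }
  return {
    model_type: modelType,
    feature_frame_version: 2,
    // An empty selection is forwarded as undefined; the server applies its default set.
    feature_groups: featureGroups.length > 0 ? featureGroups : undefined,
  }
}
```

Failing fast on the V1-with-packs case mirrors the backend's 422 without a round trip.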

## Feature packs (V2 only)

When V2 is picked, the **Feature packs** toggle row appears. Each pack is a
named subset of the V2 feature columns:

| Pack ID | What it carries |
|----------------------|------------------|
| `target_history` | Lag features and same-day-of-week mean |
| `rolling` | Rolling means over multiple windows |
| `trend` | 30-day and 90-day trend |
| `calendar` | Day-of-week, month, sin/cos calendar signals |
| `price_promo` | Price level and promotion indicators |
| `inventory` | On-hand stock and stockout flags |
| `lifecycle` | Product lifecycle stage |
| `replenishment` | Inbound stock cadence |
| `returns` | Return intensity |
| `exogenous_weather` | Weather signals (when seeded) |
| `exogenous_macro` | Macro signals (when seeded) |

Use the **Use defaults** button to load the six packs the V2 contract uses by
default (`target_history`, `calendar`, `rolling`, `trend`, `price_promo`,
`lifecycle`). The **Clear** button removes every pack; submitting with an
empty selection forwards `feature_groups: undefined` to the backend (treated
as the default set on the server).
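A sketch of the pack-selection state behind **Use defaults** and **Clear**, assuming the selection is held as a `Set` of pack ids. `togglePack`, `defaultSelection`, `clearPacks`, and `toFeatureGroups` are hypothetical helper names; the pack ids and the six defaults are taken from the table and paragraph above.

```typescript
// Pack ids, in the order of the table above.
const ALL_PACKS = [
  'target_history', 'rolling', 'trend', 'calendar', 'price_promo', 'inventory',
  'lifecycle', 'replenishment', 'returns', 'exogenous_weather', 'exogenous_macro',
] as const
type PackId = (typeof ALL_PACKS)[number]

// The six packs loaded by "Use defaults".
const DEFAULT_PACKS: readonly PackId[] = [
  'target_history', 'calendar', 'rolling', 'trend', 'price_promo', 'lifecycle',
]

// Flip one pack on or off, returning a new Set (state stays immutable).
function togglePack(selected: ReadonlySet<PackId>, pack: PackId): Set<PackId> {
  const next = new Set(selected)
  if (next.has(pack)) next.delete(pack)
  else next.add(pack)
  return next
}

const defaultSelection = (): Set<PackId> => new Set(DEFAULT_PACKS)
const clearPacks = (): Set<PackId> => new Set()

// Serialize in table order; an empty selection becomes undefined (server default set).
function toFeatureGroups(selected: ReadonlySet<PackId>): PackId[] | undefined {
  const ordered = ALL_PACKS.filter((p) => selected.has(p))
  return ordered.length > 0 ? ordered : undefined
}
```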

A pack may carry a per-row safety chip (`Safe`, `Conditionally safe`,
`Requires supplied data`). The chip is rendered when the server returns a
`feature_safety_classes` map for the run. A `Requires supplied data` chip
means the pack reads a column the production pipeline must supply (e.g.
inventory or replenishment) — promote a run that uses it only if your
production pipeline can keep that column populated.
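One way a client could turn the `feature_safety_classes` map into a per-pack chip. This sketch makes two assumptions that this guide does not confirm: the map is keyed by column name with snake_case class values (`safe`, `conditionally_safe`, `requires_supplied_data`), and a pack's chip shows the most restrictive class among its columns. The caller supplies the pack's column list.

```typescript
// Hypothetical backend spellings for the three safety classes.
type SafetyClass = 'safe' | 'conditionally_safe' | 'requires_supplied_data'

const CHIP_LABEL: Record<SafetyClass, string> = {
  safe: 'Safe',
  conditionally_safe: 'Conditionally safe',
  requires_supplied_data: 'Requires supplied data',
}

// Higher rank means more restrictive.
const RANK: Record<SafetyClass, number> = {
  safe: 0,
  conditionally_safe: 1,
  requires_supplied_data: 2,
}

// Returns the chip label for a pack, or undefined when no chip should render
// (the server sent no safety map, or none of the pack's columns is classified).
function packSafetyChip(
  packColumns: readonly string[],
  safetyClasses: Record<string, SafetyClass> | undefined,
): string | undefined {
  if (!safetyClasses) return undefined
  let worst: SafetyClass | undefined
  for (const col of packColumns) {
    const cls = safetyClasses[col]
    if (cls !== undefined && (worst === undefined || RANK[cls] > RANK[worst])) {
      worst = cls
    }
  }
  return worst === undefined ? undefined : CHIP_LABEL[worst]
}
```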

## Per-horizon-bucket metrics

The backtest visualization now surfaces a **Per-horizon-bucket** card under
the existing fold-metric chart, rendered only when the response carries
`bucketed_aggregated_metrics`. It splits the forecast error by horizon
distance:

| Bucket id | Horizon range |
|-------------|----------------|
| `h_1_7` | Days 1-7 |
| `h_8_14` | Days 8-14 |
| `h_15_28` | Days 15-28 |
| `h_29_plus` | Days 29+ |

Empty buckets are dropped from the response. Unknown bucket ids (for example,
a bucket introduced by a newer backend) are appended after the known buckets,
in alphabetical order.
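The ordering rule above can be sketched as follows. The chart component imports `sortBuckets` and `labelForBucket` from `@/lib/horizon-bucket-utils`, whose source is not part of this diff; this is a hypothetical reimplementation of the described behavior, not that module. Falling back to the raw id as the label for unknown buckets is an assumption.

```typescript
// Known buckets in display order, with their table labels.
const KNOWN_BUCKETS: ReadonlyArray<readonly [string, string]> = [
  ['h_1_7', 'Days 1-7'],
  ['h_8_14', 'Days 8-14'],
  ['h_15_28', 'Days 15-28'],
  ['h_29_plus', 'Days 29+'],
]
const KNOWN_ORDER = new Map(KNOWN_BUCKETS.map(([id], i) => [id, i] as const))
const KNOWN_LABEL = new Map(KNOWN_BUCKETS)

// Known ids first, in horizon order; unknown ids after them, alphabetically.
function sortBuckets(ids: readonly string[]): string[] {
  const known = ids
    .filter((id) => KNOWN_ORDER.has(id))
    .sort((a, b) => KNOWN_ORDER.get(a)! - KNOWN_ORDER.get(b)!)
  const unknown = ids.filter((id) => !KNOWN_ORDER.has(id)).sort()
  return [...known, ...unknown]
}

// Fall back to the raw id for buckets this client does not know yet.
function labelForBucket(id: string): string {
  return KNOWN_LABEL.get(id) ?? id
}
```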

Pick the displayed metric (MAE / sMAPE / WAPE / Bias / RMSE) with the
Select to the right of the card title. **RMSE** is read from the `rmse` key
of the `aggregated_metrics` dict and appears as a fourth tile on the
Aggregated Metrics card when the backend emits it.
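A minimal sketch of the RMSE tile guard, assuming `aggregated_metrics` arrives as a plain string-to-number dict. `MetricTile`, `readRmse`, and `withRmseTile` are hypothetical names. The point is to treat a missing `rmse` key as absent rather than as 0.

```typescript
interface MetricTile {
  key: string
  label: string
  value: number
}

// Returns RMSE only when the backend actually emitted a finite value.
// Pre-PRP-36 jobs have no "rmse" key, so this yields undefined rather than 0.
function readRmse(aggregated: Record<string, number | undefined>): number | undefined {
  const v = aggregated['rmse']
  return typeof v === 'number' && Number.isFinite(v) ? v : undefined
}

// Append the RMSE tile to the existing tiles only when it is available.
function withRmseTile(
  baseTiles: readonly MetricTile[],
  aggregated: Record<string, number | undefined>,
): MetricTile[] {
  const rmse = readRmse(aggregated)
  return rmse === undefined
    ? [...baseTiles]
    : [...baseTiles, { key: 'rmse', label: 'RMSE', value: rmse }]
}
```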

## Baseline vs feature-aware comparison

When the backtest response carries `baseline_results` (a non-empty list of
`ModelBacktestResult` rows), a **Baseline vs feature-aware** table renders
below the bucket card. Every baseline is evaluated on the same folds and
splits as the main model, so MAE / sMAPE / WAPE / RMSE comparisons are
apples-to-apples. Lower wins.
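A sketch of the "lower wins" comparison for one metric. The `BacktestRow` shape is hypothetical (this guide names `ModelBacktestResult` but not its fields), and breaking ties in favor of the main model is a choice made here, not documented behavior.

```typescript
type ComparedMetric = 'mae' | 'smape' | 'wape' | 'rmse'

// Hypothetical row shape; the real ModelBacktestResult fields are not listed here.
interface BacktestRow {
  model_type: string
  aggregated_metrics: Partial<Record<ComparedMetric, number>>
}

// Returns the model_type with the lowest value for `metric` across the main
// model and all baselines. Rows missing the metric are skipped. Ties keep the
// earlier row, so the main model wins a tie.
function winnerFor(
  metric: ComparedMetric,
  main: BacktestRow,
  baselines: readonly BacktestRow[],
): string | undefined {
  let best: { model: string; value: number } | undefined
  for (const row of [main, ...baselines]) {
    const v = row.aggregated_metrics[metric]
    if (v === undefined) continue
    if (best === undefined || v < best.value) best = { model: row.model_type, value: v }
  }
  return best?.model
}
```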

## Champion compatibility

Two runs are **comparable** for champion/challenger evaluation iff
ALL three hold:

1. Same grain (`store_id`, `product_id`).
2. Overlapping data windows.
3. Same `feature_frame_version` (legacy runs without the field default to V1).

The Compare runs page renders a **Champion compatibility** badge that
surfaces the verdict, and the metrics diff table adds a **Feature frame
version** row when at least one of the two runs declares it.
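The three compatibility conditions can be sketched as a predicate. The window field names (`data_window_start`, `data_window_end`), their ISO `YYYY-MM-DD` format, inclusive endpoints, and numeric ids are assumptions; `store_id`, `product_id`, and `feature_frame_version` come from the list above.

```typescript
// Hypothetical run summary; the window field names are assumptions.
interface RunSummary {
  store_id: number
  product_id: number
  data_window_start: string // ISO date, inclusive, e.g. "2024-01-01"
  data_window_end: string // ISO date, inclusive
  feature_frame_version?: 1 | 2
}

// Closed intervals [a0, a1] and [b0, b1] overlap iff a0 <= b1 and b0 <= a1.
// ISO YYYY-MM-DD strings compare correctly as plain strings.
function windowsOverlap(a: RunSummary, b: RunSummary): boolean {
  return a.data_window_start <= b.data_window_end && b.data_window_start <= a.data_window_end
}

function isComparable(a: RunSummary, b: RunSummary): boolean {
  const sameGrain = a.store_id === b.store_id && a.product_id === b.product_id
  // Legacy runs without the field are treated as V1.
  const sameVersion = (a.feature_frame_version ?? 1) === (b.feature_frame_version ?? 1)
  return sameGrain && windowsOverlap(a, b) && sameVersion
}
```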

## Stale aliases

The Control Center page now surfaces stale aliases as their own card with a
**Reason** chip per row:

| Reason chip | What it means |
|-----------------------------------|-----------------------------------------------------------------------|
| `newer success run` | A newer successful run for this grain has landed. |
| `artifact not verified` | The alias's run artifact failed SHA-256 verification. |
| `run not success` | The alias is pointing at a non-success run (failed or archived). |
| `V mismatch` | The newest comparable run uses a different `feature_frame_version`. |

Alongside each chip, the row shows the **Alias V** and **Comparable V**
columns so the operator can read the version drift at a glance.
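A sketch of how the four reason chips could be derived for one alias row. Every field name here is hypothetical, and applying the "missing version means V1" rule from the champion-compatibility section to aliases is an assumption.

```typescript
type StaleReason =
  | 'newer success run'
  | 'artifact not verified'
  | 'run not success'
  | 'V mismatch'

// Hypothetical inputs; the Control Center's real data model is not shown in this guide.
interface AliasRow {
  runStatus: string // e.g. 'success', 'failed', 'archived'
  runCreatedAt: string // ISO-8601 UTC timestamp of the aliased run
  artifactVerified: boolean // result of the SHA-256 check
  aliasV?: 1 | 2 // the "Alias V" column
}

interface GrainContext {
  newestSuccessAt?: string // newest successful run for the same grain (ISO-8601 UTC)
  comparableV?: 1 | 2 // the "Comparable V" column
}

function staleReasons(alias: AliasRow, ctx: GrainContext): StaleReason[] {
  const reasons: StaleReason[] = []
  // Same-format UTC ISO strings sort chronologically as plain strings.
  if (ctx.newestSuccessAt !== undefined && ctx.newestSuccessAt > alias.runCreatedAt) {
    reasons.push('newer success run')
  }
  if (!alias.artifactVerified) reasons.push('artifact not verified')
  if (alias.runStatus !== 'success') reasons.push('run not success')
  // An alias run without a version is treated as V1.
  if (ctx.comparableV !== undefined && ctx.comparableV !== (alias.aliasV ?? 1)) {
    reasons.push('V mismatch')
  }
  return reasons
}
```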

## Safer Promote dialog

The Control Center's **Promote** action now opens a confirmation dialog that
gates the promotion on three conditions:

1. **Artifact verifies.** The dialog auto-fetches the candidate run's
SHA-256 verification result. A failure renders a red callout and the
Promote button stays disabled — no operator override.
2. **Worse-WAPE acknowledgement.** When the candidate's latest WAPE is
HIGHER than the current champion's, a red callout appears with the
exact deltas and a checkbox the operator must explicitly tick.
3. **Feature-frame-version mismatch acknowledgement.** When the candidate's
`feature_frame_version` differs from the champion's, an amber callout
warns that the alias's feature contract will silently change. A
checkbox the operator must tick releases the Promote button.

The alias name input remains and defaults to `production`. Cancelling
discards the dialog state, so both acknowledgements reset.
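The three gates can be sketched as a single predicate that enables the Promote button. The `PromoteDialogState` fields are hypothetical, and two behaviors are assumptions rather than documented: Promote stays disabled while the artifact check is still pending, and the WAPE and version gates are skipped when the alias has no champion yet.

```typescript
// Hypothetical dialog state; field names are illustrative.
interface PromoteDialogState {
  artifactVerified: boolean | undefined // undefined while the check is in flight
  candidateWape: number
  championWape?: number // undefined when the alias has no champion yet
  candidateVersion: 1 | 2
  championVersion?: 1 | 2
  worseWapeAcknowledged: boolean
  versionMismatchAcknowledged: boolean
}

function canPromote(s: PromoteDialogState): boolean {
  // Gate 1: the artifact must verify; there is no override.
  if (s.artifactVerified !== true) return false
  // Gate 2: a higher WAPE than the champion's needs an explicit acknowledgement.
  const worseWape = s.championWape !== undefined && s.candidateWape > s.championWape
  if (worseWape && !s.worseWapeAcknowledged) return false
  // Gate 3: a different feature_frame_version needs an explicit acknowledgement.
  const versionMismatch =
    s.championVersion !== undefined && s.candidateVersion !== s.championVersion
  if (versionMismatch && !s.versionMismatchAcknowledged) return false
  return true
}
```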

## Batch sweep presets

The Batch Runner page now hosts a **Sweep preset** Select with five built-in
presets. Picking a preset overwrites the matrix; the matrix can still be
hand-edited afterward.

| Preset | What it loads |
|---------------------------------|---------------|
| Quick baseline sweep | All five baseline models on V1 |
| Feature-aware comparison | Regression / LightGBM / XGBoost / RandomForest / Prophet-like on V2 with default packs |
| Champion/challenger refresh | Champion + strongest challenger from the registry (supplied by the page) |
| Stockout-sensitive products | Regression on V2 with the inventory + replenishment + returns packs |
| High-WAPE recovery | Every feature-aware model on V2 with default packs |

Below the preset Select is the **Sweep matrix** picker — a checkbox grid of
model × V1/V2. Toggling a V2 cell adds a per-row feature-packs editor below
the grid. The matrix caps at 24 rows by default (configurable on the
picker).
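A sketch of how the presets and the row cap could map onto the matrix. Model codes and pack ids come from the tables in this guide; `SweepRow`, `PRESETS`, and `addRow` are hypothetical. Two points are assumptions: the stockout preset carries only its three named packs (not the defaults too), and the cap refuses new rows rather than truncating. The champion/challenger preset is omitted because it depends on registry data supplied by the page.

```typescript
type FeatureFrame = 1 | 2

interface SweepRow {
  model: string
  frame: FeatureFrame
  featureGroups?: string[] // only meaningful for V2 rows
}

const BASELINES = ['naive', 'seasonal_naive', 'moving_average', 'weighted_moving_average', 'seasonal_average']
const FEATURE_AWARE = ['regression', 'lightgbm', 'xgboost', 'random_forest', 'prophet_like', 'trend_regression_baseline']
const DEFAULT_PACKS = ['target_history', 'calendar', 'rolling', 'trend', 'price_promo', 'lifecycle']

type PresetId = 'quick-baseline' | 'feature-aware-comparison' | 'stockout-sensitive' | 'high-wape-recovery'

// Build a V2 row with its own copy of the pack list.
const v2 = (model: string, packs: readonly string[]): SweepRow => ({ model, frame: 2, featureGroups: [...packs] })

// Picking a preset replaces the whole matrix with these rows.
const PRESETS: Record<PresetId, () => SweepRow[]> = {
  'quick-baseline': () => BASELINES.map((model): SweepRow => ({ model, frame: 1 })),
  'feature-aware-comparison': () =>
    ['regression', 'lightgbm', 'xgboost', 'random_forest', 'prophet_like'].map((m) => v2(m, DEFAULT_PACKS)),
  'stockout-sensitive': () => [v2('regression', ['inventory', 'replenishment', 'returns'])],
  'high-wape-recovery': () => FEATURE_AWARE.map((m) => v2(m, DEFAULT_PACKS)),
}

// Add a model x frame cell unless it already exists or the matrix is full.
function addRow(rows: readonly SweepRow[], row: SweepRow, maxRows = 24): SweepRow[] {
  const exists = rows.some((r) => r.model === row.model && r.frame === row.frame)
  if (exists || rows.length >= maxRows) return [...rows]
  return [...rows, row]
}
```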

## Anti-patterns

- **Do not** pick V2 for a baseline model — V2 has no effect on a model that
ignores features. The UI disables this combination with a tooltip.
- **Do not** tick the worse-WAPE acknowledgement reflexively. The gate exists
so that promoting a worse run is a deliberate decision.
- **Do not** promote across a feature-frame-version boundary without
verifying that your production pipeline supplies the columns the new
feature-frame version requires.
- **Do not** assume `aggregated_metrics["rmse"]` exists for old jobs. RMSE
landed in PRP-36, so pre-PRP-36 backtest jobs in the registry do not carry
it; the UI omits the RMSE tile in that case.
29 changes: 25 additions & 4 deletions docs/user-guide/dashboard-guide.md
@@ -43,9 +43,13 @@ row opens a detail page.
 and (for non-baseline runs) the canonical feature columns plus a feature
 importance panel — see
 [Advanced Model Metadata](./feature-reference.md#advanced-model-metadata) in the
-Feature Reference for the data model and error semantics. Two runs can be
-compared side by side (config diff, metrics diff with deltas, and same-family
-feature importance side-by-side).
+Feature Reference for the data model and error semantics. The detail page also
+hosts a **Feature frame** panel that renders V1/V2 + per-group columns +
+per-column safety classes when the run carries that metadata (PRP-35/36).
+Two runs can be compared side by side: a **Champion compatibility** badge
+surfaces the comparable-run verdict (same grain + overlapping data windows +
+same feature_frame_version), and the metrics-diff table now includes a
+**Feature frame version** row.
 - **Jobs** (`/explorer/jobs`) — submitted train/predict/backtest jobs. A job
 **detail page** shows parameters, result JSON, error details, the linked run, a
 cancel action, and live status polling.
@@ -59,8 +63,25 @@ The Visualize menu holds the analytical, chart-heavy pages.
 inventory required to cover it. Includes a lead-time selector and a single-SKU
 drill-in. Answers "how much will this SKU sell, and do I have enough stock?"
 - **Forecast** (`/visualize/forecast`) — visualizes a model's horizon predictions.
+The top of the page now also hosts a **Train a new model** card: a segmented
+family picker (Baseline / Tree / Additive), a model-type Select filtered by the
+picked family, a Feature frame V1/V2 Select, and (when V2 is picked) a feature-
+pack toggle group. See [Advanced Forecasting Guide](./advanced-forecasting-guide.md).
 - **Backtest Results** (`/visualize/backtest`) — charts backtest folds and the
-accuracy metrics (MAE, sMAPE, WAPE, bias, stability) for a model run.
+accuracy metrics (MAE, sMAPE, WAPE, bias, stability) for a model run. When the
+backtest response carries per-horizon-bucket metrics, a separate **Per-horizon-
+bucket** card surfaces those (`Days 1-7 / 8-14 / 15-28 / 29+`) and a metric
+switcher (MAE / sMAPE / WAPE / Bias / RMSE). When the response carries
+baseline competitors, a **Baseline vs feature-aware** comparison table renders.
+- **What-If Planner** (`/visualize/planner`) — the existing scenario simulation
+view; impact card now carries a **method badge**
+(`model-driven re-forecast` vs `heuristic adjustment`) so the planner
+always sees how the scenario was produced.
+- **Batch Runner** (`/visualize/batch`) — the existing batch runner now hosts a
+**Sweep preset** Select (5 presets — quick baseline sweep, feature-aware
+comparison, champion/challenger refresh, stockout-sensitive products, high-WAPE
+recovery) and a **Sweep matrix** picker (multi-model × V1/V2). Picking a preset
+prefills the matrix; rows can still be hand-edited.
 
 ## Knowledge (`/knowledge`)
 
@@ -0,0 +1,43 @@
import { afterEach, beforeAll, describe, expect, it } from 'vitest'
import { cleanup, render, screen } from '@testing-library/react'
import { BacktestHorizonBucketsChart } from './backtest-horizon-buckets-chart'

// Recharts' ResponsiveContainer requires ResizeObserver; jsdom doesn't ship it.
beforeAll(() => {
  if (typeof globalThis.ResizeObserver === 'undefined') {
    globalThis.ResizeObserver = class {
      observe() {}
      unobserve() {}
      disconnect() {}
    } as unknown as typeof globalThis.ResizeObserver
  }
})

afterEach(cleanup)

describe('BacktestHorizonBucketsChart', () => {
  it('renders empty state when bucketed is undefined', () => {
    render(
      <BacktestHorizonBucketsChart bucketed={undefined} metric="wape" />,
    )
    expect(screen.getByTestId('horizon-buckets-chart-empty')).toBeTruthy()
  })

  it('renders empty state for an empty bucketed dict', () => {
    render(<BacktestHorizonBucketsChart bucketed={{}} metric="wape" />)
    expect(screen.getByTestId('horizon-buckets-chart-empty')).toBeTruthy()
  })

  it('renders the chart container when bucketed has data', () => {
    render(
      <BacktestHorizonBucketsChart
        bucketed={{
          h_1_7: { wape: 0.12 },
          h_29_plus: { wape: 0.41 },
        }}
        metric="wape"
      />,
    )
    expect(screen.getByTestId('horizon-buckets-chart')).toBeTruthy()
  })
})
127 changes: 127 additions & 0 deletions frontend/src/components/charts/backtest-horizon-buckets-chart.tsx
@@ -0,0 +1,127 @@
import { Bar, BarChart, CartesianGrid, XAxis, YAxis } from 'recharts'
import {
  type ChartConfig,
  ChartContainer,
  ChartTooltip,
  ChartTooltipContent,
} from '@/components/ui/chart'
import {
  Card,
  CardContent,
  CardDescription,
  CardHeader,
  CardTitle,
} from '@/components/ui/card'
import { labelForBucket, sortBuckets } from '@/lib/horizon-bucket-utils'

/**
 * PRP-37 Slice C — per-horizon-bucket bar chart. Sibling to BacktestFoldsChart
 * (the data shape is different — bucket-aggregate vs per-fold — so this is
 * NOT a metricKey toggle on the existing component). Empty state matches the
 * HorizonBucketTable empty state.
 */

export type HorizonBucketChartMetric =
  | 'mae'
  | 'smape'
  | 'wape'
  | 'bias'
  | 'rmse'

interface BacktestHorizonBucketsChartProps {
  bucketed:
    | Record<string, Record<string, number>>
    | null
    | undefined
  metric: HorizonBucketChartMetric
  height?: number
  title?: string
  description?: string
}

const METRIC_COLOR: Record<HorizonBucketChartMetric, string> = {
  mae: 'var(--chart-1)',
  smape: 'var(--chart-2)',
  wape: 'var(--chart-3)',
  bias: 'var(--chart-4)',
  rmse: 'var(--chart-5)',
}

const METRIC_LABEL: Record<HorizonBucketChartMetric, string> = {
  mae: 'MAE',
  smape: 'sMAPE',
  wape: 'WAPE',
  bias: 'Bias',
  rmse: 'RMSE',
}

export function BacktestHorizonBucketsChart({
  bucketed,
  metric,
  height = 240,
  title = 'Metric by horizon bucket',
  description,
}: BacktestHorizonBucketsChartProps) {
  if (!bucketed || Object.keys(bucketed).length === 0) {
    return (
      <Card>
        <CardHeader>
          <CardTitle>{title}</CardTitle>
          {description && <CardDescription>{description}</CardDescription>}
        </CardHeader>
        <CardContent>
          <p
            className="text-muted-foreground text-sm"
            data-testid="horizon-buckets-chart-empty"
          >
            No horizon-bucket metrics available.
          </p>
        </CardContent>
      </Card>
    )
  }

  const sortedIds = sortBuckets(Object.keys(bucketed))
  const data = sortedIds.map((id) => ({
    bucket: id,
    label: labelForBucket(id),
    // A missing metric (e.g. RMSE on a pre-PRP-36 job) becomes null so that
    // Recharts draws no bar for it, rather than plotting it as a real 0.
    value: bucketed[id]?.[metric] ?? null,
  }))

  const chartConfig: ChartConfig = {
    value: {
      label: METRIC_LABEL[metric],
      color: METRIC_COLOR[metric],
    },
  }

  return (
    <Card>
      <CardHeader>
        <CardTitle>{title}</CardTitle>
        {description && <CardDescription>{description}</CardDescription>}
      </CardHeader>
      <CardContent>
        <ChartContainer
          config={chartConfig}
          className="w-full"
          style={{ height: `${height}px` }}
          data-testid="horizon-buckets-chart"
        >
          <BarChart data={data} accessibilityLayer>
            <CartesianGrid strokeDasharray="3 3" />
            <XAxis dataKey="label" tickLine={false} axisLine={false} />
            <YAxis tickLine={false} axisLine={false} />
            <ChartTooltip content={<ChartTooltipContent />} />
            <Bar
              dataKey="value"
              name={METRIC_LABEL[metric]}
              fill={METRIC_COLOR[metric]}
              radius={[4, 4, 0, 0]}
            />
          </BarChart>
        </ChartContainer>
      </CardContent>
    </Card>
  )
}