From 0e091c7f1bea7b6942a230d06724c1dd4a237b45 Mon Sep 17 00:00:00 2001
From: Gabor Szabo <shellsnake@icloud.com>
Date: Tue, 26 May 2026 07:22:31 +0200
Subject: [PATCH] docs(prp): refresh prp36 after feature frame v2 (#295)

Contract Refresh gate (PRP-36 Task 1) executed against dev @ f2bf7c8.
Patches PRP-36 to match what PRP-35 actually shipped vs what it assumed.

- Add PRPs/ai_docs/prp-35-final-contract-snapshot.md (one-off, authoritative)
- Item 7: backtesting V2 dispatch deferred to #299 (not in PRP-35 final scope)
- Task 8: re-scoped to V1 fold-path bucket metrics only; V2 lands with #299
- Item 10: DEFAULT_V2_GROUPS order corrected (CALENDAR at position 2)

No implementation code touched. PRP-36 execution remains gated until this lands.
---
 ...st-intelligence-B-model-zoo-backtesting.md |  33 +-
 .../ai_docs/prp-35-final-contract-snapshot.md | 286 ++++++++++++++++++
 2 files changed, 310 insertions(+), 9 deletions(-)
 create mode 100644 PRPs/ai_docs/prp-35-final-contract-snapshot.md

diff --git a/PRPs/PRP-36-forecast-intelligence-B-model-zoo-backtesting.md b/PRPs/PRP-36-forecast-intelligence-B-model-zoo-backtesting.md
index 443b3891..289ba42a 100644
--- a/PRPs/PRP-36-forecast-intelligence-B-model-zoo-backtesting.md
+++ b/PRPs/PRP-36-forecast-intelligence-B-model-zoo-backtesting.md
@@ -769,15 +769,15 @@ Task 7 — EXTEND backtesting metrics with RMSE + per-horizon-bucket helper:
   - EXTEND MetricsCalculator.calculate_all to include rmse alongside mae/smape/wape/bias.
   - DO NOT change aggregate_fold_metrics signature; ADD a sibling aggregate_bucket_metrics(fold_bucket_metrics: list[dict[str, dict[str, float]]]) -> dict[str, dict[str, float]] that returns per-bucket means across folds, skipping NaN.
 
-Task 8 — WIRE backtesting service to emit per-fold horizon_bucket_metrics:
+Task 8 — WIRE backtesting service to emit per-fold horizon_bucket_metrics (V1 fold path only):
+  - SCOPE: this Task lands bucket metrics on the EXISTING V1 fold path only. V2 backtesting dispatch was DEFERRED in PRP-35 (tracked at #299), so there is no V1/V2 fold-loop dispatch to preserve in `backtesting/service.py` today. V2 bucket-metric emission lands jointly with the V2 dispatch follow-up under #299 — DO NOT attempt to add it here.
   - app/features/backtesting/service.py:
     - For each fold, compute `horizon_offsets = [(test_dates[i] - test_dates[0]).days + 1 for i in range(len(test_dates))]` (test_dates[0] is horizon day 1).
     - After computing the existing per-fold metrics, call compute_bucket_metrics(actuals, predictions, horizon_offsets) and attach to FoldResult.horizon_bucket_metrics.
     - After the fold loop, compute aggregate_bucket_metrics across all fold_bucket_metric dicts → main_model_results.bucketed_aggregate_metrics.
     - Mirror for baseline_results when baselines are run alongside.
-  - PRESERVE the V1/V2 dispatch PRP-35 Task 13 added — no change to it.
   - PRESERVE leakage_check_passed flow.
-  - LOG: per-fold metric log lines now include feature_frame_version (already added by PRP-35) AND the bucket count.
+  - LOG: per-fold metric log lines include the bucket count. `feature_frame_version` will be added to the log line when V2 dispatch ships under #299; do NOT add it here.
 
 Task 9 — EXTEND backtesting schemas:
   - FoldResult: add horizon_bucket_metrics: dict[str, dict[str, float]] = Field(default_factory=dict, ...).
@@ -1267,10 +1267,20 @@ patch the relevant Task in this PRP file BEFORE writing any new code.
    `TrainRequest.feature_groups: list[str] | None = None` exist on the
    schema with the V1-rejects-feature_groups validator (the post-patch
    wording from this conversation). PRP-35 Task 7 promises this.
-7. `backtesting/service.py` already reads
-   `bundle.metadata.get("feature_frame_version", 1)` BEFORE the fold loop
-   AND dispatches the build_*_feature_rows_v2 calls at the V1 call sites
-   (lines 493 / 553 in the V1 codebase). PRP-35 Task 13 promises this.
+7. ⚠️ **VERIFIED FAILED — backtesting V2 dispatch was DEFERRED.** PRP-35
+   Task 13 ("read `feature_frame_version` from the fitted bundle BEFORE the
+   fold loop") was NOT part of PRP-35's final shipped scope — see
+   `PRPs/ai_docs/prp-35-final-contract-snapshot.md` § 10. The literal
+   wording is incompatible with `backtesting/service.py`'s architecture
+   (trains fresh per fold from `BacktestConfig.model_config_main`, never
+   loads a bundle); the re-designed surface — additive
+   `feature_frame_version` + `feature_groups` on `BacktestConfig` plus V2
+   fold feature construction — is tracked at issue **#299**. On `dev` today,
+   `backtesting/service.py` calls only the V1 `build_historical_feature_rows`
+   / `build_future_feature_rows` builders at lines 493 / 553; it does not
+   read `bundle.metadata` and has zero mentions of `feature_frame_version`.
+   Task 8 has been re-scoped to the V1 fold path only; V2 bucket-metric
+   emission lands jointly with #299.
 8. `forecasting/service.py` already writes `feature_frame_version` AND
    `feature_groups` into `extra_metadata` (and thence `model_run.runtime_info`
    via the registry create_run path). PRP-35 Task 9 promises this.
@@ -1280,8 +1290,13 @@ patch the relevant Task in this PRP file BEFORE writing any new code.
    `assemble_v2_historical_sidecar`, `assemble_v2_future_sidecar`. PRP-35
    Task 8 promises this. The model_zoo backtest path reuses them.
 10. `FeatureGroup` enum names match the values used in
-    `DEFAULT_V2_GROUPS = (TARGET_HISTORY, ROLLING, TREND, CALENDAR,
-    PRICE_PROMO, LIFECYCLE)`. PRP-35 Task 1 promises this.
+    `DEFAULT_V2_GROUPS = (TARGET_HISTORY, CALENDAR, ROLLING, TREND,
+    PRICE_PROMO, LIFECYCLE)`. Verified on `dev @ f2bf7c8`: membership
+    matches; canonical ordering has CALENDAR at position 2 (corrected
+    here — the original assumption had CALENDAR at position 4). Group
+    ordering is documentation-only; the canonical column manifest in
+    `app/shared/feature_frames/contract_v2.py` drives output order, not
+    the enum tuple. See snapshot § 3.
 
 If ANY assumption above fails Task 1 verification: open a `chore(docs):
 refresh PRP-36 against PRP-35 final contract (#<this-issue>)` PR that
diff --git a/PRPs/ai_docs/prp-35-final-contract-snapshot.md b/PRPs/ai_docs/prp-35-final-contract-snapshot.md
new file mode 100644
index 00000000..0633b424
--- /dev/null
+++ b/PRPs/ai_docs/prp-35-final-contract-snapshot.md
@@ -0,0 +1,286 @@
+# PRP-35 Final Contract Snapshot
+> Captured for PRP-36 Task 1 (Contract Refresh) and PRP-37 Task 1 (Contract Probe).
+> Source: `dev @ f2bf7c8` (PR #300 merge commit). Probed 2026-05-26.
+> Read this file before writing any PRP-36 / PRP-37 code — it is the
+> authoritative record of what PRP-35 actually shipped, not what the PRPs
+> assumed it would ship.
+
+---
+
+## 1. Import-probe (verbatim from PRP-36 Task 1)
+
+```
+$ uv run python -c "from app.shared.feature_frames import \
+    FEATURE_FRAME_VERSION_V2, FeatureGroup, \
+    build_historical_feature_rows_v2, build_future_feature_rows_v2, \
+    v2_feature_groups_dict, v2_feature_safety_classes; \
+    print('PRP-35 surface OK')"
+PRP-35 surface OK
+```
+
+`FEATURE_FRAME_VERSION_V2 = 2`. All six symbols resolve.
+
+---
+
+## 2. `FeatureGroup` enum — 11 members
+
+`app/shared/feature_frames/contract_v2.py`. Declaration order:
+
+```
+TARGET_HISTORY, CALENDAR, ROLLING, TREND, PRICE_PROMO, INVENTORY,
+LIFECYCLE, REPLENISHMENT, RETURNS, EXOGENOUS_WEATHER, EXOGENOUS_MACRO
+```
+
+---
+
+## 3. `DEFAULT_V2_GROUPS` — actual ordering
+
+```python
+DEFAULT_V2_GROUPS = (
+    TARGET_HISTORY, CALENDAR, ROLLING, TREND, PRICE_PROMO, LIFECYCLE,
+)  # 6 groups
+```
+
+⚠️ **Drift vs PRP-36 § "Unresolved Contract Assumptions" item 10.** PRP-36
+cites `(TARGET_HISTORY, ROLLING, TREND, CALENDAR, PRICE_PROMO, LIFECYCLE)`
+— CALENDAR at position 4. Shipped order has CALENDAR at position 2.
+Membership is identical; only ordering differs. Documentation-only drift
+(canonical column ordering is driven by the manifest, not by enum order),
+but PRP-36 item 10 should be patched to match.
+
+---
+
+## 4. Column counts (actual, probed at runtime)
+
+| View | Count |
+|------|-------|
+| `canonical_feature_columns_v2()` (default 6 groups) | **38** |
+| `canonical_feature_columns_v2(groups=tuple(g for g in FeatureGroup))` (all 11) | **51** |
+
+⚠️ HANDOFF.md and the PRP-35 PR body both cite "max 53 columns". Actual
+runtime is **51**. Doc-only drift; the running code is authoritative.
+
+### Default 38 columns (in canonical manifest order)
+
+```
+lag_1, lag_7, lag_14, lag_28, lag_56, lag_364,
+same_dow_mean_4, same_dow_mean_8,
+dow_sin, dow_cos, month_sin, month_cos,
+is_weekend, is_month_end,
+week_of_year_sin, week_of_year_cos,
+day_of_month_sin, day_of_month_cos,
+is_holiday,
+rolling_mean_7, rolling_mean_28, rolling_mean_90,
+rolling_median_28, rolling_std_28,
+trend_30, trend_90,
+rolling_mean_7_vs_28, rolling_mean_28_vs_prev_28,
+price_factor, promo_active, promo_discount_pct,
+promo_kind_markdown_active, promo_kind_bundle_active,
+days_since_launch, is_new_product, is_mature_product,
+is_discontinued, days_until_discontinue
+```
+
+---
+
+## 5. Pinned constants — `v2_pinned_constants()` actual output
+
+```python
+{
+    "exogenous_lags":               [1, 7, 14, 28, 56, 364],
+    "same_dow_mean_lookbacks":      [4, 8],
+    "rolling_windows":              [7, 28, 90],
+    "trend_windows":                [30, 90],
+    "stockout_windows":             [7, 28],
+    "replenishment_window":         [14],
+    "replenishment_qty_window":     [28],
+    "returns_windows":              [7, 28],
+    "returns_rate_window":          [28],
+    "inventory_availability_window":[28],
+    "history_tail_days":            [400],
+}
+```
+
+⚠️ PRP-36 cites only `EXOGENOUS_LAGS_V2`, `ROLLING_WINDOWS_V2`,
+`TREND_WINDOWS_V2`, `HISTORY_TAIL_DAYS_V2` in the HANDOFF / PRP. The
+shipped dict has **11 keys** (Phase-2 sidecar groups carry their own
+window constants). No PRP-36 task depends on the missing keys directly;
+flagged for completeness.
+
+---
+
+## 6. `TrainRequest` V2 fields — `app/features/forecasting/schemas.py:324-365`
+
+```python
+class TrainRequest(BaseModel):
+    model_config = ConfigDict(strict=True)
+    ...
+    # PRP-35 — opt-in, additive. NOT on ModelConfigBase (would orphan hashes).
+    feature_frame_version: int = Field(default=1, ge=1, le=2, ...)
+    feature_groups: list[str] | None = Field(default=None, ...)
+
+    @model_validator(mode="after")
+    def validate_feature_frame_version_and_groups(self) -> TrainRequest:
+        # V1 + feature_groups → ValueError → FastAPI 422
+        # V2 + unknown FeatureGroup name → ValueError → FastAPI 422
+        ...
+```
+
+✅ Matches PRP-36 Assumption 6.
+
+---
+
+## 7. Bundle `metadata` — V1 vs V2 shape
+
+Written in `app/features/forecasting/service.py:312-321` (V2) and `:335-341` (V1).
+
+### V1 bundles (`feature_frame_version=1`)
+
+```python
+extra_metadata = {
+    "feature_columns":       list[str],
+    "history_tail":          list[float],
+    "history_tail_dates":    list[str],   # ISO YYYY-MM-DD
+    "launch_date":           str | None,  # ISO
+    "feature_frame_version": 1,
+}
+```
+
+✅ Matches PRP-36 Assumptions 1, 2, 6, 8 for V1.
+
+### V2 bundles (`feature_frame_version=2`)
+
+V1 keys above **PLUS**:
+
+```python
+"feature_groups":           dict[str, list[str]],   # v2_feature_groups_dict(...)
+"feature_safety_classes":   dict[str, str],         # v2_feature_safety_classes(...)
+"feature_pinned_constants": dict[str, list[int]],   # v2_pinned_constants()
+```
+
+`feature_frame_version` becomes `2`.
+
+✅ Matches PRP-36 Assumptions 1, 2, 3, 4, 5.
+
+Back-compat read pattern is `bundle.metadata.get("feature_frame_version", 1)`
+— legacy bundles without the key default to V1.
+
+---
+
+## 8. `app/features/forecasting/v2_loaders.py` exports
+
+Verified via `grep '^async def \|^def '`:
+
+```
+async def load_lifecycle_attrs        (db, product_id) -> tuple[date|None, date|None, str|None]
+async def load_inventory_history      (db, store_id, product_id, start_date, end_date) -> dict
+async def load_replenishment_history  (db, store_id, product_id, start_date, end_date) -> tuple[list[date], list[int]]
+async def load_returns_history        (db, store_id, product_id, start_date, end_date) -> dict
+async def load_promotion_history      (db, store_id, product_id, start_date, end_date) -> dict
+async def load_exogenous_history      (db, store_id, start_date, end_date, signal_names=None) -> dict
+def       assemble_v2_historical_sidecar (**inputs) -> V2HistoricalSidecar
+def       assemble_v2_future_sidecar     (**inputs) -> V2FutureSidecar
+```
+
+✅ Matches PRP-36 Assumption 9 (all 8 names exposed). Module-level
+`__all__` re-exports them.
+
+---
+
+## 9. Scenarios slice V2 dispatch — `app/features/scenarios/`
+
+`feature_frame.py:235-341` — `build_future_frame(...)` now takes
+`feature_frame_version: int = 1` + `history_tail_dates: list[date] | None = None`
++ `feature_groups: dict[str, list[str]] | None = None`. Branches:
+- `version == 1` → preserved V1 `assemble_future_frame` path (byte-stable).
+- `version == 2` → new `_build_future_frame_v2` helper at `:344-471`,
+  delegating to `build_future_feature_rows_v2`.
+
+`service.py:231-262` — `_simulate_model_exogenous` reads
+`feature_frame_version` + `history_tail_dates` + `feature_groups` from
+`bundle.metadata`, threads them through `build_future_frame`. V1 bundles
+default to 1 via `.get(..., 1)`.
+
+✅ Matches PRP-35 acceptance criteria; PRP-36 has no Task that depends on
+this directly, but PRP-37 (UI) will.
+
+---
+
+## 10. ⚠️ Backtesting slice V2 dispatch — **NOT SHIPPED** (deferred)
+
+This is the **only failing assumption** from PRP-36 Task 1.
+
+### What PRP-36 Assumption 7 expects
+
+> `backtesting/service.py` already reads
+> `bundle.metadata.get("feature_frame_version", 1)` BEFORE the fold loop AND
+> dispatches the `build_*_feature_rows_v2` calls at the V1 call sites
+> (lines 493 / 553 in the V1 codebase). PRP-35 Task 13 promises this.
+
+### What `dev @ f2bf7c8` actually has
+
+Probed `app/features/backtesting/service.py` with `grep`:
+- **Zero** mentions of `feature_frame_version`.
+- **Zero** mentions of `build_*_feature_rows_v2`.
+- **Zero** reads of `bundle.metadata`.
+- Lines 493 / 553 still hard-call V1 `build_historical_feature_rows` /
+  `build_future_feature_rows`.
+
+### Why
+
+PRP-35 Task 13 was **DEFERRED** in PR #300. Reason: the literal Task 13
+wording ("read from fitted bundle BEFORE the fold loop") is incompatible
+with backtesting's architecture, which trains fresh per fold from
+`BacktestConfig.model_config_main` and never loads a bundle. The correct
+opt-in surface is a request-time field on `BacktestConfig` itself — a
+re-design Task 13 did not spec. Tracked at **GitHub issue #299** with full
+follow-up scope (additive `feature_frame_version` + `feature_groups` on
+`BacktestConfig`, V2 fold feature construction, leakage-focused V2
+backtest tests, loader-sharing design decision).
+
+### Impact on PRP-36
+
+- **Task 8** ("WIRE backtesting service to emit per-fold
+  `horizon_bucket_metrics`") instructs: *"PRESERVE the V1/V2 dispatch
+  PRP-35 Task 13 added — no change to it."* — there is no V1/V2 dispatch to
+  preserve. Task 8 needs to be either (a) reworded to "V2 dispatch lives in
+  follow-up #299; this Task only adds bucket metrics on the existing V1
+  path; the V2 bucket-metric emission lands when #299 lands" or (b)
+  deferred jointly with #299.
+- **Task 1** (Contract Refresh) — this very gate — must record the
+  failure here and, per PRP-36's own instruction (line 1286), open a
+  `chore(docs): refresh PRP-36 against PRP-35 final contract (#<this-issue>)`
+  PR before Task 2 starts.
+
+---
+
+## 11. Verdict — PRP-36 readiness
+
+| Assumption | Status | Notes |
+|------------|--------|-------|
+| 1. `bundle.metadata["feature_frame_version"]: int` defaults to 1 absent | ✅ | V2=2, V1=1 written; consumer pattern `.get(..., 1)` honoured |
+| 2. `bundle.metadata["feature_columns"]: list[str]` for V1 and V2 | ✅ | Both write the key |
+| 3. `bundle.metadata["feature_groups"]` for V2 only | ✅ | V2 writes `v2_feature_groups_dict(...)`; V1 omits |
+| 4. `bundle.metadata["feature_safety_classes"]` for V2 only | ✅ | V2 writes `v2_feature_safety_classes(...)`; V1 omits |
+| 5. `bundle.metadata["feature_pinned_constants"]` for V2 only | ✅ | V2 writes `v2_pinned_constants()`; V1 omits |
+| 6. `TrainRequest.feature_frame_version` + `feature_groups` + validator | ✅ | Exactly as spec'd |
+| 7. **Backtesting V2 dispatch lines 493/553** | ❌ | **DEFERRED** to issue #299; PRP-36 Task 8 wording needs patch |
+| 8. `forecasting/service.py` writes V2 metadata via `extra_metadata` | ✅ | Both V1 and V2 metadata blocks confirmed |
+| 9. `v2_loaders.py` exports 8 named functions | ✅ | All 8 exposed; module-level `__all__` re-exports |
+| 10. `DEFAULT_V2_GROUPS = (TARGET_HISTORY, ROLLING, TREND, CALENDAR, PRICE_PROMO, LIFECYCLE)` | ⚠️ | Membership correct; ordering differs (CALENDAR is 2nd, not 4th) — doc-only drift |
+
+### Required PRP-36 patches before Task 2
+
+1. **§ Unresolved Contract Assumptions item 7** — replace with the
+   "DEFERRED to #299" wording above; cite the architectural mismatch.
+2. **Task 8** — drop "PRESERVE the V1/V2 dispatch PRP-35 Task 13 added —
+   no change to it." Either: (a) scope Task 8 to the V1 fold path only and
+   defer V2 bucket-metric emission to #299, or (b) defer Task 8 itself
+   jointly with #299. Recommend (a) — bucket metrics on the V1 path are
+   independently valuable.
+3. **§ Unresolved Contract Assumptions item 10** — patch the ordering to
+   `(TARGET_HISTORY, CALENDAR, ROLLING, TREND, PRICE_PROMO, LIFECYCLE)`.
+4. **(Optional) HANDOFF / PR-body cite** of "53 max columns" — actual is
+   **51**. Cosmetic; PRP-36 does not assert this number.
+
+Once those patches land in a `chore(docs): refresh PRP-36 against PRP-35
+final contract` commit, Task 2 onward is unblocked.