Draft
107 commits
3ee6aac
feat(evaluations): add @agenta/evaluations package
ardaerzin Jun 7, 2026
cf33dac
refactor(entities): move evaluationRun/queue API to Fern client
ardaerzin Jun 7, 2026
4283b7f
refactor(frontend): route eval creation through @agenta/evaluations
ardaerzin Jun 7, 2026
f7c5f87
fix(entities): stop silently stripping unknown fields in eval schemas…
ardaerzin Jun 7, 2026
a507c0b
refactor(entities): pass projectId into eval run molecule instead of …
ardaerzin Jun 7, 2026
37f9c36
test(evaluations): add gated backend integration tests for eval atoms…
ardaerzin Jun 7, 2026
e4b8c7c
refactor(frontend): route eval per-run batcher through the package Fe…
ardaerzin Jun 8, 2026
0e9280a
refactor: delete previewRunBatcher, read eval runs through the packag…
ardaerzin Jun 8, 2026
8bad3fa
refactor(frontend): dedup queryStepResults onto the package Fern query
ardaerzin Jun 8, 2026
ecf30a9
fix(frontend): register @agenta/evaluations in Next transpilePackages
ardaerzin Jun 8, 2026
c6f6d6e
fix(frontend): register @agenta/evaluations for the EE app
ardaerzin Jun 8, 2026
d8c35a6
fix(frontend): stop eval-run Zod schema from nuking runs on unknown m…
ardaerzin Jun 8, 2026
2181e58
test(frontend): make eval-run integration test representative of real…
ardaerzin Jun 8, 2026
8248415
test(frontend): add read-only drift smoke test for existing eval runs
ardaerzin Jun 8, 2026
69b1c72
test(frontend): cover eval molecules against populated real backend data
ardaerzin Jun 8, 2026
0774155
fix(frontend): persist evaluator-revision write-back via Fern editRun
ardaerzin Jun 8, 2026
63b0214
refactor(frontend): Fern-migrate eval result mutations to the real ba…
ardaerzin Jun 8, 2026
49a0aa3
refactor(frontend): Fern-migrate the eval runs LIST fetch to the real…
ardaerzin Jun 8, 2026
3d1aaf1
refactor(frontend): Fern-migrate eval scenario + run-status service
ardaerzin Jun 8, 2026
d9f573d
refactor(frontend): Fern-migrate eval invocations persistence helpers
ardaerzin Jun 8, 2026
2a43765
refactor(frontend): Fern-migrate the live table run-delete
ardaerzin Jun 8, 2026
1da72fb
refactor(frontend): Fern-migrate eval metrics query + delete-modal ru…
ardaerzin Jun 8, 2026
ab452ef
docs(frontend): add evaluations→packages migration architecture plan
ardaerzin Jun 8, 2026
bcd26df
docs(frontend): add zero-OSS-residue cleanup gate to evaluations migr…
ardaerzin Jun 8, 2026
af1d3df
docs(frontend): require real-API/real-atom integration tests per work…
ardaerzin Jun 8, 2026
ec747be
feat(frontend): scaffold @agenta/evaluations-ui package (WP-0)
ardaerzin Jun 8, 2026
c1abc61
refactor(frontend): promote evaluationScenario to a first-class entit…
ardaerzin Jun 8, 2026
6e98274
docs(frontend): move eval-run ETL into the evaluations packages (plan…
ardaerzin Jun 8, 2026
1eb36b9
docs(frontend): re-scope WP-1 — session engine takes an injected scen…
ardaerzin Jun 9, 2026
155582a
feat(frontend): add generic evaluation session engine (WP-1, additive)
ardaerzin Jun 9, 2026
4cf5c2f
feat(frontend): reactive scenario-source injection for the session en…
ardaerzin Jun 9, 2026
cdd10f7
refactor(frontend): re-point annotationSessionController onto the eva…
ardaerzin Jun 9, 2026
715d19a
fix(frontend): sort annotation queues table newest-first by created_at
ardaerzin Jun 9, 2026
addb711
refactor(frontend): extract generic scenario-data/evaluator/metrics s…
ardaerzin Jun 9, 2026
9709261
refactor(frontend): move list-column tier to @agenta/evaluations, re-…
ardaerzin Jun 9, 2026
8f43d45
test(frontend): integration test driving shipped evaluations scenario…
ardaerzin Jun 9, 2026
5040bd2
docs(frontend): track batch-add-to-queue time-window bug as a migrati…
ardaerzin Jun 9, 2026
49e6d2b
refactor(frontend): extract metric/schema extraction to @agenta/evalu…
ardaerzin Jun 9, 2026
68e675d
test(frontend): integration test driving shipped evaluations metricSc…
ardaerzin Jun 9, 2026
9da4f0d
refactor(frontend): move run-list store + generic table to evaluation…
ardaerzin Jun 9, 2026
2e3543f
test(frontend): integration test driving shipped evaluations run-list…
ardaerzin Jun 9, 2026
083819f
refactor(frontend): move headless eval-run ETL primitives to @agenta/…
ardaerzin Jun 9, 2026
b0787eb
refactor(frontend): move clean ETL filtering hooks OSS→@agenta/evalua…
ardaerzin Jun 10, 2026
fa197a2
refactor(frontend): move buildRunIndex + evaluationKind to @agenta/ev…
ardaerzin Jun 10, 2026
e7c4d8e
refactor(frontend): promote eval-needed shared types/utils to package…
ardaerzin Jun 10, 2026
bc39420
refactor(frontend): move active eval mutation-service APIs → @agenta/…
ardaerzin Jun 10, 2026
3061a60
refactor(frontend): move usePreviewEvaluations → @agenta/evaluations/…
ardaerzin Jun 10, 2026
6f29fc4
docs(frontend): persist entity-state consolidation plan; record WP-4 …
ardaerzin Jun 10, 2026
5fa85a1
refactor(frontend): eval-run injection seam module + move eval types/…
ardaerzin Jun 10, 2026
7f99580
fix(frontend): type-check EvalRunDetails atom layer in place (WP-4e-2…
ardaerzin Jun 10, 2026
cdaee91
refactor(frontend): relocate EvalRunDetails atom layer → @agenta/eval…
ardaerzin Jun 10, 2026
19ec8cd
refactor(frontend): move EvalRunDetails ETL hooks/UI/tableRows out of…
ardaerzin Jun 10, 2026
98eaac7
refactor(frontend): move evaluationPreviewTableStore + useScenarioLiv…
ardaerzin Jun 10, 2026
83169f0
refactor(frontend): move EvaluationRunsTablePOC data layer → @agenta/…
ardaerzin Jun 10, 2026
608173c
Merge origin/main (v0.103.1) into fe-chore/move-evals-to-packages
ardaerzin Jun 10, 2026
b6d610d
refactor(frontend): clear eval metrics residue from OSS (WP-4 residue A)
ardaerzin Jun 10, 2026
ad9f050
refactor(frontend): clear remaining eval ledger residue from OSS (WP-…
ardaerzin Jun 10, 2026
4827610
docs(frontend): §11.1 batch-add root cause falsified by inspection — …
ardaerzin Jun 10, 2026
43523a6
fix(api,frontend): order annotation queues by created_at with correct…
ardaerzin Jun 11, 2026
e6289a8
docs(frontend): close §11.1 — transport verified correct, over-add wa…
ardaerzin Jun 11, 2026
5ab8fa0
test(frontend): restore combined paginatedStore+molecule leak test in…
ardaerzin Jun 11, 2026
96165a7
refactor(frontend): consolidate EvaluationRunsTablePOC components + a…
ardaerzin Jun 11, 2026
4fdb03a
docs(frontend): track §11.6 — eval render trees still on the OSS Infi…
ardaerzin Jun 11, 2026
c2a420b
refactor(frontend): switch eval render trees onto @agenta/ui Infinite…
ardaerzin Jun 11, 2026
c7baf6d
refactor(frontend): delete the stale OSS InfiniteVirtualTable copy (§…
ardaerzin Jun 11, 2026
ec390b0
docs(frontend): close §11.6 — OSS InfiniteVirtualTable copy deleted, …
ardaerzin Jun 11, 2026
e31529d
refactor(frontend): move MetricDetails popover/charts OSS→@agenta/eva…
ardaerzin Jun 11, 2026
554954b
feat(frontend): add eval-view host registry seam infra (WP-4h-2)
ardaerzin Jun 11, 2026
329aa64
refactor(frontend): relocate eval run-list view OSS→@agenta/evaluatio…
ardaerzin Jun 12, 2026
0f09fb9
docs(frontend): track WP-4h progress — seam infra + RunsTable relocat…
ardaerzin Jun 12, 2026
c179f3b
docs(frontend): bank WP-4h-5 RunDetails execution recipe (atomic whol…
ardaerzin Jun 12, 2026
55639c3
refactor(frontend): relocate RunDetails OSS→@agenta/evaluations-ui (W…
ardaerzin Jun 12, 2026
e52578c
fix(frontend): re-point OSS References to @agenta/shared/utils for re…
ardaerzin Jun 12, 2026
901195b
fix(frontend): wrap function-valued eval-run injection seams to avoid…
ardaerzin Jun 13, 2026
7eb5fc6
fix(frontend): make onboarding-widget injection atoms writable primit…
ardaerzin Jun 13, 2026
4081b15
refactor(frontend): de-globalize eval focus-drawer mount (WP-4h follo…
ardaerzin Jun 13, 2026
dfac71b
docs(frontend): record eval focus-drawer de-globalization + lesson (W…
ardaerzin Jun 13, 2026
ebf7c08
fix(frontend): enable Immer MapSet for @agenta/ui table column-visibi…
ardaerzin Jun 13, 2026
53f7bf4
fix(frontend): don't app-scope the project-level evaluation runs list
ardaerzin Jun 13, 2026
728f9d5
fix(frontend): make dark-mode compare-row tints opaque so sticky colu…
ardaerzin Jun 13, 2026
e7faf72
fix(frontend): align @agenta/evaluations-ui recharts to ^3.1.0 (chart…
ardaerzin Jun 13, 2026
f7ebfab
fix(frontend): add @agenta/evaluations(-ui) to Tailwind content globs
ardaerzin Jun 13, 2026
03cde32
fix(frontend): unwrap testcase entity .data for eval scenario drawer …
ardaerzin Jun 13, 2026
7964e0a
chore(frontend): dedupe eval slop — drop dead deprecated facades, uni…
ardaerzin Jun 13, 2026
70d9a1c
refactor(entities): collapse result/metric scenario-cache molecules i…
ardaerzin Jun 13, 2026
c1fd68a
refactor(annotation): dedupe column walkers and base-row indexers
ardaerzin Jun 13, 2026
ba93170
refactor(evaluations): route scenario-steps fetch through the typed r…
ardaerzin Jun 13, 2026
bed79bf
refactor(evaluations): delete orphaned runList paginated store
ardaerzin Jun 13, 2026
0c81a0c
refactor(evaluations): move run-view injection seams from headless pk…
ardaerzin Jun 14, 2026
7f67493
refactor(evaluations): split oversized metrics + scenarioColumnValues…
ardaerzin Jun 14, 2026
6825ab8
fix(evaluations): delete dead run-level metrics path (applyAggregates…
ardaerzin Jun 14, 2026
8a64e5f
refactor(annotation): route eval axios calls through entities Fern wr…
ardaerzin Jun 14, 2026
2cdb04e
merge: integrate main (eval base shipped in v0.103.x) into move-evals…
ardaerzin Jun 14, 2026
321c557
refactor(entities): type eval Fern request bodies against generated r…
ardaerzin Jun 14, 2026
67e7b51
Merge remote-tracking branch 'origin/main' into fe-chore/move-evals-t…
ardaerzin Jun 15, 2026
606660b
fix(evaluations): remove dead metricProcessor run-level-gap branch (R…
ardaerzin Jun 15, 2026
83e33ba
refactor(entities,annotation): remove dead eval surface + dedup queue…
ardaerzin Jun 15, 2026
ebcfba7
refactor(annotation): extract add-to-testset/sync export out of sessi…
ardaerzin Jun 15, 2026
1e82dce
refactor(annotation): delegate metric persistence to shared upsertSce…
ardaerzin Jun 15, 2026
ef3754a
refactor(evaluations): derive PreviewEvaluationType from canonical Ev…
ardaerzin Jun 15, 2026
a0c6a7c
refactor(evaluations): route scenarioData metric query through typed …
ardaerzin Jun 15, 2026
578386e
refactor(entities): rename simpleQueue EvaluationStatus type -> Simpl…
ardaerzin Jun 15, 2026
53cff0c
refactor(evaluations,entities,annotation): remove dead atoms/helpers …
ardaerzin Jun 15, 2026
9262cd0
refactor(evaluations): remove dead etl scaffolding (~723 LOC)
ardaerzin Jun 15, 2026
33c8aff
merge: integrate main (release v0.103.5) into move-evals-to-packages
ardaerzin Jun 15, 2026
c86eeb9
chore(evaluations): remove stray ungated console logging (Q10)
ardaerzin Jun 15, 2026
7adb510
fix(frontend): mirror function-valued injected atoms without invoking…
ardaerzin Jun 16, 2026
24 changes: 24 additions & 0 deletions TODOS.md
@@ -0,0 +1,24 @@
# TODOS

## Backend: atomic create-evaluation-run endpoint

- **What:** Add a transactional backend endpoint that creates an evaluation run plus its
scenarios and step results in a single operation (`createEvaluationRunAtomic` or
equivalent), instead of the current separate `createRuns` → `createScenarios` →
`setResults`/steps calls.
- **Why:** The frontend evaluations migration (branch `fe-chore/move-evals-to-packages`)
has to build a client-side orchestration controller with rollback (`deleteRuns` on
partial failure) purely because no atomic create exists. An atomic endpoint deletes the
entire FE rollback path and the orphaned-scenario / rollback-failure edge cases.
- **Pros:** FE `createEvaluationRun` controller collapses to one call; no orphan runs; no
rollback-failure reconciliation story; transactional integrity owned where it belongs
(the DB), per "systems over heroes."
- **Cons:** Backend work + a new endpoint contract; FE must then migrate off the
multi-call path (small follow-up).
- **Context:** During `/plan-eng-review` (2026-06-07) the FE chose controller-owned
rollback as the pragmatic FE-only solution. This TODO is the documented path to remove
that complexity later. See design doc
`~/.gstack/projects/Agenta-AI-agenta/ardaerzin-fe-chore-move-evals-to-packages-design-20260607-192109.md`
(Eng Review Decisions → run-creation orchestration).
- **Depends on / blocked by:** Backend team; relates to the FE evaluations migration
landing first (FE rollback is the interim state).
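The controller-owned rollback described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the frontend's TypeScript, and every name in it (`create_evaluation_run`, `PartialCreateError`, the four injected callables) is hypothetical; it only mirrors the `createRuns` → `createScenarios` → `setResults` sequence with `deleteRuns` on partial failure, not the actual controller.

```python
from typing import Callable, List


class PartialCreateError(Exception):
    """Creation failed after the run existed; records whether rollback worked."""

    def __init__(self, cause: Exception, rollback_ok: bool):
        super().__init__(f"create failed: {cause!r}; rollback_ok={rollback_ok}")
        self.cause = cause
        self.rollback_ok = rollback_ok


def create_evaluation_run(
    create_runs: Callable[[], str],
    create_scenarios: Callable[[str], List[str]],
    set_results: Callable[[str, List[str]], None],
    delete_runs: Callable[[List[str]], None],
) -> str:
    """Create a run, its scenarios, then its results; delete the run on failure."""
    # Step 1: if this raises, nothing exists yet, so there is nothing to undo.
    run_id = create_runs()
    try:
        # Steps 2 and 3: a failure here leaves a run that must be cleaned up.
        scenario_ids = create_scenarios(run_id)
        set_results(run_id, scenario_ids)
    except Exception as exc:
        rollback_ok = True
        try:
            delete_runs([run_id])
        except Exception:
            # Rollback itself failed: the run is now an orphan.
            rollback_ok = False
        raise PartialCreateError(exc, rollback_ok) from exc
    return run_id
```

The `rollback_ok=False` branch is the rollback-failure case the TODO refers to: the client cannot guarantee cleanup, which is the argument for moving the whole sequence into one backend transaction.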
3 changes: 2 additions & 1 deletion api/oss/src/apis/fastapi/evaluations/router.py
@@ -2842,8 +2842,9 @@ async def query_simple_queues(

         windowing = compute_next_windowing(
             entities=queues,
-            attribute="id",
+            attribute="created_at",
             windowing=queue_query_request.windowing,
+            order="descending",
         )

         return SimpleQueuesResponse(
4 changes: 3 additions & 1 deletion api/oss/src/dbs/postgres/evaluations/dao.py
@@ -2829,7 +2829,9 @@ async def query_queues(
         stmt = apply_windowing(
             stmt=stmt,
             DBE=EvaluationQueueDBE,
-            attribute="id",  # UUID7
+            # created_at, not id: backfilled queues carry back-dated
+            # timestamps, so UUID7 id order diverges from created_at.
+            attribute="created_at",
             order="descending",  # jobs-style
             windowing=windowing,
         )
8 changes: 7 additions & 1 deletion api/oss/src/dbs/postgres/shared/utils.py
@@ -93,7 +93,13 @@ def apply_windowing(
     if order_attribute is id_attribute:
         stmt = stmt.order_by(windowing_order)
     else:
-        stmt = stmt.order_by(windowing_order, id_attribute)
+        # The id tie-break must follow the primary direction: the descending
+        # cursor filters `id < next` on equal timestamps, so ties must be
+        # emitted in descending id order (and ascending for `id > next`).
+        if windowing_order is descending_order:
+            stmt = stmt.order_by(windowing_order, id_attribute.desc())
+        else:
+            stmt = stmt.order_by(windowing_order, id_attribute.asc())

     if windowing.limit:
         stmt = stmt.limit(windowing.limit)
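Why the tie-break direction matters can be seen with a small in-memory simulation. This is a sketch under stated assumptions, not the repository's code: rows are `(created_at, id)` integer pairs, the cursor is assumed to be the last row of the previous page, and the assumed filter is `created_at < ts OR (created_at = ts AND id < last_id)`, matching the "`id < next` on equal timestamps" behavior described in the hunk comment. `fetch_page` and `paginate` are hypothetical names.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (created_at, id), both simplified to ints


def fetch_page(
    rows: List[Row], cursor: Optional[Row], limit: int, tie_break_desc: bool
) -> List[Row]:
    """One descending window over created_at, with a configurable id tie-break."""
    if cursor is not None:
        ts, last_id = cursor
        # Assumed descending cursor filter: strictly older rows, or same
        # timestamp with a smaller id.
        rows = [r for r in rows if r[0] < ts or (r[0] == ts and r[1] < last_id)]
    # ORDER BY created_at DESC, then id DESC (fixed) or id ASC (old behavior).
    ordered = sorted(
        rows, key=lambda r: (-r[0], -r[1] if tie_break_desc else r[1])
    )
    return ordered[:limit]


def paginate(rows: List[Row], limit: int, tie_break_desc: bool) -> List[Row]:
    """Walk every page and return the rows in the order a client would see them."""
    seen: List[Row] = []
    cursor: Optional[Row] = None
    while True:
        page = fetch_page(rows, cursor, limit, tie_break_desc)
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1]  # the next window starts after the last emitted row


if __name__ == "__main__":
    # Three rows share created_at=100, as back-dated (backfilled) queues might.
    rows = [(100, 1), (100, 2), (100, 3), (90, 4)]
    print(paginate(rows, limit=2, tie_break_desc=True))
    print(paginate(rows, limit=2, tie_break_desc=False))
```

With the descending tie-break every row appears exactly once. With the ascending tie-break, the first page ends at `(100, 2)`, so the `id < 2` filter drops `(100, 3)` entirely and re-emits `(100, 1)`.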
147 changes: 147 additions & 0 deletions docs/designs/entity-state-consolidation-plan.md
@@ -0,0 +1,147 @@
# OSS entity-state → `@agenta/entities` molecules consolidation

Status: **PLAN — not started.** A standalone platform initiative, surfaced while executing
WP-4 of the [evaluations→packages migration](./evaluations-packages-migration-plan.md). It is a
**prerequisite for WP-4e** (moving the eval-run atoms to `@agenta/evaluations`), but it is much
larger than the eval migration and must be run as its own deliberate, human-in-the-loop effort.

Discovered on branch `fe-chore/move-evals-to-packages`, 2026-06-10.

---

## 0. Why this exists (the trigger)

WP-4e (move `EvalRunDetails/atoms` → `@agenta/evaluations`) is blocked: ~18 of those atoms import
OSS entity-state (`@/oss/state/entities/{testcase,testset,shared}`). That OSS entity-state is a
**separate, older, DIVERGENT implementation that parallels the modern `@agenta/entities` molecules
that already exist** — not the same code awaiting a move. So WP-4e cannot "promote" it without
either (a) duplicating the package molecules, or (b) re-platforming OSS consumers onto the existing
molecules. (b) is the right end-state and is what this plan covers.

**Two ways out of the WP-4e block:**
1. **Injection seams** (recommended for the eval migration in isolation): the eval atoms receive
testcase/testset/References/workspace data as injected inputs from the OSS `-ui` provider; the
OSS entity layer is untouched. Unblocks WP-4e without this consolidation.
2. **This consolidation** (the broader platform goal): kill the divergent OSS entity-state, standardize
the whole app on the `@agenta/entities` molecules. Worthwhile debt-reduction, but app-wide.

This doc captures (2).

---

## 1. The core hazard (read first)

**`tsc` will NOT catch the biggest regression risk.** The OSS testcase entity uses a *flattened*
shape (`FlattenedTestcase` — user fields hoisted to the row root); the package `testcaseMolecule`
uses a *nested* shape (`data: { ...fields }`). Re-pointing an importer from the OSS flat shape to the
package nested shape **compiles cleanly but silently breaks rendering at runtime** (cells read
`row.country`; package gives `row.data.country`). ~273 importers across **playground, testsets,
annotation, eval, drawers, settings** consume this. Therefore:

- **No step of this plan is "done" on `tsc`/`lint` green alone** — each importer-touching step needs
**runtime/behavioral QA** of the affected feature.
- The OSS-deletion steps (C7) are **irreversible** and gated on that QA across all feature areas.

This is precisely why it must be human-in-the-loop, not an autonomous grind.
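The failure mode can be illustrated with plain dictionaries. This is a language-neutral sketch in Python, not the TypeScript involved: the row contents and the `render_cell*` helpers are hypothetical, and only the `row.country` versus `row.data.country` distinction comes from the description above.

```python
from typing import Any, Dict

# Flat shape: user fields hoisted to the row root (the OSS convention above).
flat_row: Dict[str, Any] = {"id": "tc-1", "country": "France"}

# Nested shape: user fields live under "data" (the package convention above).
nested_row: Dict[str, Any] = {"id": "tc-1", "data": {"country": "France"}}


def render_cell(row: Dict[str, Any], column: str) -> str:
    """Cell reader written against the flat shape."""
    value = row.get(column)
    # A missing key is indistinguishable from an empty cell, so nothing fails.
    return "" if value is None else str(value)


def render_cell_nested(row: Dict[str, Any], column: str) -> str:
    """The same reader re-pointed at the nested shape."""
    value = row.get("data", {}).get(column)
    return "" if value is None else str(value)


if __name__ == "__main__":
    print(repr(render_cell(flat_row, "country")))
    print(repr(render_cell(nested_row, "country")))
    print(repr(render_cell_nested(nested_row, "country")))
```

The flat reader handed a nested row raises nothing and returns an empty string, so the cell simply renders blank. A loosely typed row behaves the same way under a type checker, which is why only runtime QA of the rendered feature exposes the regression.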

---

## 2. Scope (verified)

| | OSS (to retire) | Package (target) |
|---|---|---|
| shared infra | `state/entities/shared/` — `createEntityController` (743), `createEntityDraftState` (341), `createPaginatedEntityStore` (562), `createStatefulEntityAtomFamily` (168), utils — **~1,553 LOC** | `@agenta/entities/src/shared/` — `molecule/*`, `paginated/*` (createPaginatedEntityStore 680, createInfiniteTableStore 464), utils |
| testset | `state/entities/testset/` — revisionEntity (567), store (455), controller (650), testsetController (245), paginatedStore (411), mutations (387), revisionSchema (166), dirtyState (222) — **~2,790 LOC** | `@agenta/entities/src/testset/state/` — revisionMolecule (1,110), testsetMolecule (786), store (769), mutations (914), revisionTableState (511), paginatedStore (234) |
| testcase | `state/entities/testcase/` — 15 files incl. testcaseEntity (949), schema (482), columnState (661), paginatedStore (350), controller (370), queries (255), mutations (269), columnPathUtils (169) — **~5,292 LOC** | `@agenta/entities/src/testcase/state/` — molecule (1,008), store (1,005), paginatedStore (349), dataController (253), prefetch (138) |

**Totals:** ~9,573 LOC OSS to delete · ~273 importer files to re-point · ~331 files touched ·
**est. 14–18 engineering days.**

**Coverage verdict:** the package molecules are a **genuine superset** capability-wise; the gap is
mostly *organizational* (where things live) + the **data-format** and **API-shape** divergences below.

---

## 3. Gap details + divergences

### 3.1 shared infra — **coverage ~100%, risk LOW**
Every OSS export has a package equivalent (`createEntityController`, `createEntityDraftState`,
`createPaginatedEntityStore`, `EntityController*`/`DrillIn*`/`PathItem` types). Package uses a
`createMolecule` + `withController` composition layer over the same primitives; the OSS controller-only
API maps onto `molecule.controller(id)`. There are no OSS-only symbols. The package additionally has
entity-relations, which OSS lacks; this is additive and does not conflict.

### 3.2 testset — **coverage ~95%, risk LOW–MEDIUM**
`revision`/`testset` controllers → `revisionMolecule`/`testsetMolecule` (molecule exposes
`atoms/selectors/actions/get/set`; controller-style use still works). Column dirty-state →
`revisionMolecule.tableReducers`. OSS-only **thin helpers to port** (~50 LOC): `getVersionDisplay`,
`isV0Revision`, `normalizeRevision` (package likely has normalization already).

### 3.3 testcase — **coverage ~80%, risk HIGH**
The hard one. Divergences:
- **Data format:** `FlattenedTestcase` (flat) vs package nested `data` — see §1. **Decision required.**
- **Column ops:** OSS has *testcase-level* column atoms (`currentColumnsAtom`, `addColumnAtom`,
`renameColumnAtom`, `deleteColumnAtom`, `expandedColumnsAtom`); package moved these to *revision
level* (`revisionMolecule.tableReducers.*`, `revisionMolecule.atoms.effectiveColumns`). Re-points
must thread `revisionId` and may change read-only-vs-driven semantics.
- **OSS-only utils to port/refactor** (~300 LOC): `flattenTestcase`, `extractTestcaseUserData`,
`deriveTestcaseColumnKeys` (package has `extractColumnsFromData`), `columnPathUtils` (package has
`DataPath`/`getValueAtPath` in `@agenta/shared/utils`).
- Package adds `testcaseDataController` + `prefetchTestcasesByIds` (additive).

**The data-format decision (make first):**
- **Option A** — keep `FlattenedTestcase`; add flat↔nested converters at the boundary. Lower importer
churn, but perpetuates two shapes + conversion cost.
- **Option B (recommended)** — refactor importers to the package nested shape; delete the flat shape.
Cleaner long-term; higher one-time churn; **this is the §1 silent-regression surface** — gate on QA.
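For comparison, the boundary converters that Option A implies might look roughly like this. It is a hedged sketch in Python, not the existing `flattenTestcase` or `extractTestcaseUserData` utilities: `to_flat`, `to_nested`, and the `SYSTEM_KEYS` set are assumptions made for illustration, and the real system fields are likely to differ.

```python
from typing import Any, Dict

# Assumption: the keys that belong to the row itself rather than to user data.
SYSTEM_KEYS = frozenset({"id"})


def to_flat(nested: Dict[str, Any]) -> Dict[str, Any]:
    """Hoist user fields from nested["data"] to the row root."""
    data = nested.get("data") or {}
    clash = SYSTEM_KEYS.intersection(data)
    if clash:
        # A user column named like a system key cannot survive flattening.
        raise ValueError(f"user fields collide with system keys: {sorted(clash)}")
    flat = {k: v for k, v in nested.items() if k in SYSTEM_KEYS}
    flat.update(data)
    return flat


def to_nested(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Push every non-system key back under "data"."""
    system = {k: v for k, v in flat.items() if k in SYSTEM_KEYS}
    data = {k: v for k, v in flat.items() if k not in SYSTEM_KEYS}
    return {**system, "data": data}
```

The collision check shows one cost of keeping both shapes: the flat form cannot represent a user column that shares a name with a system key, so the round trip is lossless only under that restriction.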

---

## 4. Leaves-first execution plan (C1–C7)

Internal cascade (leaf → root): `shared` → `testcase` → `testset` → importers. Each step: reconcile/port,
re-point, build+lint, **and behavioral-QA the touched features**; commit; only then proceed.

- **C1 — shared controller infra.** Reconcile OSS consumers onto `@agenta/entities/shared` molecule
primitives. Mostly direct re-point (+ thin adapters if an API differs). ~1 day, LOW risk. No OSS delete yet.
- **C2 — testset schema + state.** Re-point onto `revisionMolecule`/`testsetMolecule`; port the 3 thin
version helpers. ~1 day, LOW–MED. Blocks on C1.
- **C3 — testcase schema + state + DATA FORMAT.** The crux. Execute the §3.3 data-format decision; port
`flatten`/`extract` utils or refactor importers; verify query/entity/draft/cell families map to
`testcaseMolecule`. ~2–3 days, **HIGH**. Blocks on C1 (+ C2 schema). Prototype the EvalRunDetails ETL
re-point first as the canary.
- **C4 — testcase column ops → revision level.** Re-point `currentColumnsAtom`/`add|rename|deleteColumnAtom`
→ `revisionMolecule.tableReducers`/`effectiveColumns(revisionId)`. ~1 day, MED. Blocks on C2,C3.
- **C5 — mutations.** Reconcile save/clear/batch onto molecule actions + package mutation APIs. ~0.5 day, LOW.
- **C6 — re-point all ~273 importers**, phased by feature area (testsets ~60 → testcases ~60 → shared
~60 → cross-feature ~90). Run feature QA after EACH phase. ~5–7 days, MED (large surface).
- **C7 — delete OSS `state/entities/{testcase,testset,shared}`** (~9.5k LOC). Irreversible; gated on
full-app QA passing. ~0.5 day.
- **Integration testing** across testsets UI, playground, eval details, annotations. ~2–3 days.

---

## 5. Risks (and why QA — not tsc — is the gate)

1. **Flat vs nested testcase data (HIGH, tsc-invisible)** — §1. Mitigate: Option B + ETL canary +
per-feature runtime QA + before/after screenshots; consider a temporary parallel-render check.
2. **Column ops moved to revision level (MED)** — audit every column-atom importer; thread `revisionId`;
QA column add/rename/delete in testsets UI.
3. **Molecule vs controller API (MED)** — both valid; controller-style use maps onto the molecule;
spot-check direct-controller consumers.
4. **273-file re-point surface (MED)** — phase by feature; full test run + manual QA per phase; rely on
strict TS to catch *structural* misses (but NOT the data-format ones).
5. **Missing testcase utils (LOW–MED)** — port `flatten`/`extract` or eliminate via Option B.

---

## 6. Relationship to the evaluations migration (WP-4)

- WP-4e (eval atom move) is **blocked** on this consolidation **only if** we choose to move the eval
atoms onto the package molecules directly. The **injection-seam** alternative (§0 option 1) unblocks
WP-4e *without* this consolidation and is the recommended path for completing the eval migration in
isolation.
- If this consolidation lands first, WP-4e becomes a clean re-point (eval atoms use the package
molecules like every other consumer).
- Either way, this is **not** part of WP-4's scope and should not be grafted into it; it gets its own
branch, review, and QA matrix.