[FE / chore] Move evals to packages by ardaerzin · Pull Request #4753 · Agenta-AI/agenta

ardaerzin · 2026-06-19T13:40:10Z

Summary

Testing

QA follow-up

evaluations & annotation queues testing

Checklist

Relevant tests pass locally
Relevant linting and formatting pass locally
I have signed the CLA, or I will sign it when the bot prompts me


New state+logic package for evaluations, mirroring the @agenta/annotation split (headless here; React UI will follow in @agenta/evaluations-ui). Run/queue/result/metric data molecules stay in @agenta/entities; this package owns run-config construction and the run-creation controller. Registered as an @agenta/oss dep.
- core/buildRunConfig: PURE, headless port of OSS createEvaluationRunConfig. The four playground/workflow atoms it used to read via getDefaultStore are now passed in as a flat plain-data DTO (schemaContextByRevisionId), so the package imports zero jotai/playground/getDefaultStore. Unit tested without a store.
- controllers/createEvaluationRun: orchestrates createRuns -> createScenarios -> setResults via Fern, with deleteRuns rollback on partial failure (backend cascade-deletes scenarios/results). The client is injectable, so all branches (success, scenario-fail, results-fail, rollback-fail) are unit tested with a fake, no backend.
- vendored slugify + extractEvaluatorMetricKeys with TODOs to consolidate onto @agenta/shared and entities extractMetrics in a later slice.
22 unit tests pass; types + lint clean. TODOS.md notes a backend atomic-create endpoint that would remove the FE rollback entirely.
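The create-then-roll-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the package's controller: the `EvalClient` interface, the `RunConfig` shape, and the `CreateOutcome` return type are all hypothetical stand-ins, since the PR does not show the real Fern client signatures.

```typescript
// Hypothetical client surface; the real Fern client methods may differ.
interface EvalClient {
  createRuns(configs: RunConfig[]): Promise<{ id: string }[]>;
  createScenarios(runId: string, count: number): Promise<{ id: string }[]>;
  setResults(runId: string, scenarioIds: string[]): Promise<void>;
  deleteRuns(runIds: string[]): Promise<void>;
}

// Hypothetical, heavily simplified run config.
interface RunConfig {
  name: string;
  scenarioCount: number;
}

type Stage = "runs" | "scenarios" | "results";

type CreateOutcome =
  | { ok: true; runId: string }
  | { ok: false; stage: Stage; rolledBack: boolean; error: unknown };

// Orchestrates run -> scenarios -> results. If a later stage fails after the
// run exists, the run is deleted again (the backend is described as
// cascade-deleting its scenarios and results).
async function createEvaluationRun(
  client: EvalClient,
  config: RunConfig,
): Promise<CreateOutcome> {
  let runId: string | undefined;
  let stage: Stage = "runs";
  try {
    const [run] = await client.createRuns([config]);
    if (!run) throw new Error("backend returned no run");
    runId = run.id;

    stage = "scenarios";
    const scenarios = await client.createScenarios(runId, config.scenarioCount);

    stage = "results";
    await client.setResults(runId, scenarios.map((s) => s.id));
    return { ok: true, runId };
  } catch (error) {
    let rolledBack = false;
    if (runId !== undefined) {
      try {
        await client.deleteRuns([runId]);
        rolledBack = true;
      } catch {
        // Rollback itself failed; report it instead of masking the original error.
      }
    }
    return { ok: false, stage, rolledBack, error };
  }
}
```

Because the client is a parameter, each branch can be driven by a fake that throws at a chosen stage, which is the testing approach the commit describes.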

Rewrite the evaluationRun and evaluationQueue API functions from raw axios (@agenta/shared/api) to the Fern-generated @agentaai/api-client via @agenta/sdk, matching the secret/gatewayTool precedent. project_id is injected through Fern's queryParams (projectScopedRequest); the Zod boundary is preserved unchanged — it now narrows Fern's all-optional generated types and remains the independent drift check. The SDK client is imported lazily (dynamic import) rather than statically: @agentaai/api-client is ESM-only (no require export), and a static import would break the tsx --test molecule/ETL suites the moment a molecule is imported. Lazy import keeps those suites green and resolves correctly via the ESM loader at call time. Existing node:test molecule (15) + ETL (9) + leak (5) suites pass.
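A lazy, memoized client loader along the lines described above might look like this. It is a sketch under assumptions: `ApiClient` and `createLazyClient` are hypothetical names, and the loader is passed in so the example stays self-contained; in real code the loader body would be a dynamic `import()` of the ESM-only package.

```typescript
// Hypothetical client type standing in for whatever the SDK module exports.
interface ApiClient {
  queryRuns(ids: string[]): Promise<unknown>;
}

type ClientLoader = () => Promise<ApiClient>;

// Wraps a loader so the module is resolved on first use and the same
// promise is shared by every later caller.
function createLazyClient(load: ClientLoader): () => Promise<ApiClient> {
  let pending: Promise<ApiClient> | null = null;
  return () => {
    if (pending === null) {
      pending = load().catch((error) => {
        // Do not cache a failed load; the next call retries.
        pending = null;
        throw error;
      });
    }
    return pending;
  };
}

// In application code the loader would wrap a dynamic import, for example:
//   createLazyClient(async () => (await import("some-esm-only-package")).client)
// (package and export names here are placeholders, not the real SDK surface).
```

Nothing is imported when the module containing this helper is evaluated, which is what keeps a CommonJS-style test runner from touching the ESM-only dependency until a request is actually made.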

Wire the OSS creation path to the new package and delete the duplicated config + orchestration:
- usePreviewEvaluations.createNewRun now resolves per-revision schema context from the playground/workflow atoms (the app supplies inputs), calls the package's pure buildRunConfig, then the headless createEvaluationRun controller (run -> scenarios -> results with rollback). No bridge: OSS only reads atoms and hands plain data in.
- Delete services/evaluationRuns/api/index.ts (createEvaluationRunConfig), the inline createScenarios helper, and the hand-rolled run/scenario/step orchestration. Drops the now-orphaned slugify/uuid/useSWRConfig/SCENARIOS_ENDPOINT usages.
- NewEvaluationModalInner reads the controller's clean {runId} return shape.
The rewrite also removed 8 pre-existing type errors that lived in the old orchestration (oss tsc: 593 -> 589); the migrated files are type- and lint-clean.

…; log parse failures in prod
Two correctness/reliability fixes to the evaluationRun-family Zod schemas and the shared validation helper, de-risking the upcoming run-fetch consolidation (T6):
- Add .passthrough() to evaluationRun/data/step/mapping/reference/result/metric schemas. The backend mounts these payloads with extra="allow", and downstream consumers (notably the OSS EvalRunDetails run enrichment: buildRunIndex, evaluator-ref patching) read fields beyond what the schema declares. The default z.object() was silently stripping them: a data-loss bug, and the specific blocker to routing the OSS per-run fetch through the package molecule. Known fields are still strictly validated; this makes the schema a validator, not a field filter.
- safeParseWithLogging now logs validation failures in production too, not just dev. A Zod failure is always real signal (backend drift / a bug), never normal control flow, so it should be visible in prod logs instead of silently swallowed. The null return is preserved, so no caller's control flow changes.
- Add a schema-contract test (real-response-shaped fixtures) pinning passthrough of unknown top-level/nested/ref fields and that a missing required id still fails.
entities: types + lint clean; schema (6) + molecule (15) + ETL (9) + leak (5) + vitest unit (589) suites pass. oss tsc error count unchanged.
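The two behaviours above (keep unknown fields, log every failure but still return null) can be illustrated without Zod. The sketch below uses a hand-written `SchemaLike` interface that only mimics the shape of a `safeParse` result; the helper's parameter list is an assumption, since the PR does not show the real `safeParseWithLogging` signature.

```typescript
// Structural stand-in for a Zod-style schema (hypothetical, not the Zod types).
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } };

interface SchemaLike<T> {
  safeParse(input: unknown): ParseResult<T>;
}

type Logger = (message: string, detail: unknown) => void;

// Logs on every failure (no dev-only guard) and preserves the null return,
// so callers keep their existing control flow.
function safeParseWithLogging<T>(
  schema: SchemaLike<T>,
  input: unknown,
  label: string,
  log: Logger = console.error,
): T | null {
  const result = schema.safeParse(input);
  if (result.success) return result.data;
  log(`[${label}] validation failed`, result.error.message);
  return null;
}

// A passthrough-style schema: `id` is validated, unknown fields are kept.
const runSchema: SchemaLike<{ id: string } & Record<string, unknown>> = {
  safeParse(input) {
    if (typeof input !== "object" || input === null) {
      return { success: false, error: { message: "expected an object" } };
    }
    const record = input as Record<string, unknown>;
    const id = record.id;
    if (typeof id !== "string") {
      return { success: false, error: { message: "id: expected a string" } };
    }
    return { success: true, data: { ...record, id } };
  },
};
```

With this shape, `safeParseWithLogging(runSchema, {id: "r1", extra_field: 7}, "run")` returns an object that still carries `extra_field`, while a payload without `id` logs once and yields `null`.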

…reading app-global state
The evaluationRun molecule imported projectIdAtom from @agenta/shared/state and read it from the default store inside its query atoms (with a "projectId not yet available" retry hack): the package reaching into app-global state, and an assumption that a project is always ambient in a global store. Decouple it: callers pass projectId.
- Re-key every run atom family from (runId) to ({projectId, runId}) and the scenario families to ({projectId, runId, scenarioId}), with projectId-aware areEqual. The query atoms take projectId straight from the family key: no store read, no projectIdAtom import, no retry hack (projectId is part of the key, captured at subscription, which also removes the atomWithQuery-cant-react-to-deps workaround).
- Public surface threads projectId: selectors.x({projectId, runId}), get.x(projectId, runId, ...), invalidateEvaluationRunCache({projectId, runId}).
- Consumers that use the changed surface are the annotation controllers / annotation-ui (already app-state-aware), updated to pass projectId. The result/metric molecules already took projectId from callers and are unchanged. OSS does NOT consume this surface (its local evaluationRunQueryAtomFamily is a name collision, not the package export), so no OSS changes.
entities + annotation + annotation-ui types + lint clean; molecule (15) / ETL (9) / schema (6) suites pass; oss tsc unchanged at baseline.
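The re-keying idea can be shown with a small store-free family. This is a sketch only: `createFamily` is a hypothetical helper written for the example (similar in spirit to an atom family that accepts an equality function), and the `queryKey` layout is an assumption.

```typescript
interface RunKey {
  projectId: string;
  runId: string;
}

// Object keys are compared structurally, so two separately constructed
// {projectId, runId} literals resolve to the same family member.
const areRunKeysEqual = (a: RunKey, b: RunKey): boolean =>
  a.projectId === b.projectId && a.runId === b.runId;

// Hypothetical family helper: creates a value per distinct key and reuses
// it for keys the equality function considers the same.
function createFamily<K, V>(
  create: (key: K) => V,
  areEqual: (a: K, b: K) => boolean,
): (key: K) => V {
  const entries: { key: K; value: V }[] = [];
  return (key) => {
    const hit = entries.find((entry) => areEqual(entry.key, key));
    if (hit) return hit.value;
    const value = create(key);
    entries.push({ key, value });
    return value;
  };
}

// The project id comes from the key itself, never from a global store.
const runQueryFamily = createFamily(
  (key: RunKey) => ({ queryKey: ["evaluationRun", key.projectId, key.runId] }),
  areRunKeysEqual,
);
```

The same run id under two different projects now produces two distinct entries, which is the property the (runId)-only key could not express.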

… + createEvaluationRun Fulfils the eng-review commitment (D5 / "table store testable with actual API integration"): real-backend integration tests, skipped unless AGENTA_API_URL + AGENTA_AUTH_KEY are set (globalSetup mints an ephemeral account + API key). - @agenta/entities: extend the integration worker to also authenticate the Fern client (sets AGENTA_API_KEY/AGENTA_HOST) — the eval api goes through @agentaai/api-client, not axios, so the existing axios-only auth didn't cover it. New evaluationRun integration test exercises the atoms' data layer against a real backend: queryEvaluationRuns / fetchEvaluationRun / queryEvaluationResults / queryEvaluationMetrics / queryEvaluationQueues return well-formed, Zod-valid empty results on a fresh project, and the decoupled {projectId, runId} molecule atom fetches and resolves an absent run to null. Pins Fern auth + endpoint reachability + the Zod boundary (passthrough) + the projectId wiring against real responses. - @agenta/evaluations: stand up the integration harness (config + ephemeral-account setup, Fern-auth worker) and a createEvaluationRun controller test that covers the DIFFERENT evaluation TYPES this controller produces — a matrix over human-origin, auto-origin, and no-evaluator runs — each create→fetch (asserting the meta.evaluation_kind type marker + annotation-step origin + step shape round-trip)→delete, plus deleteRuns (the rollback cleanup primitive) removing a run. Online evals use a separate endpoint (out of scope). The orchestration branches stay unit-covered by the faked client. Both suites compile and skip cleanly with no backend (6 + 4 tests). New files lint clean.

…rn query (T6) previewRunBatcher reimplemented the package evaluationRun molecule's batch fetch — the same POST /evaluations/runs/query {run:{ids}} via raw axios. Delegate its network/query layer to the shared Fern-backed queryEvaluationRuns from @agenta/entities/evaluationRun, removing the duplicate axios query (and the last raw /runs/query call in the per-run path). The batcher keeps its own in-memory cache + the list→detail priming; only the fetch is shared now. Behavior-preserving: identical query, same snake_case run shape (the eval schemas passthrough unknown fields as of the T2 slice, so nothing the downstream enrichment reads is stripped). queryEvaluationRuns is verified against a live backend by the entities integration suite. oss tsc unchanged at baseline; file lints clean. Remaining T6 (not a dedup — no package equivalent yet): the LIST fetch (fetchPreviewRunsShared) still uses axios because its run.search / run.evaluation_kinds filters aren't modelled in Fern's generated EvaluationRunQuery. Routing it through Fern needs the OpenAPI spec extended (or a documented cast). The deeper consolidation — delete previewRunBatcher entirely and read through the package molecule — is a follow-on (touches the OSS enriched run atom + list-priming + ~6 consumers).

…e molecule (T6)
Completes the run-fetch consolidation: the OSS previewRunBatcher (a per-run batched fetch + Map cache + list→detail priming, duplicating the package molecule's batcher) is deleted. Its consumers now use the package's shared batched fetch.
- @agenta/entities: expose fetchEvaluationRunBatched({projectId, runId}), the molecule's existing createBatchFetcher exposed imperatively, so async non-jotai call sites get the same batched POST /evaluations/runs/query without a second batcher.
- OSS enriched run atom (EvalRunDetails/atoms/table/run.ts) + EvaluationRunsTablePOC runSummaries: fetch the raw run via fetchEvaluationRunBatched instead of getPreviewRunBatcher.
- Drop the previewRunBatcher Map cache + its prime (from the list fetch + usePreviewEvaluations) + its invalidate calls (editEvaluation, PreviewEvalRunHeader, scenarios/api). These were side-cache clears; the real detail/list refetch is triggered separately (queryClient invalidate / refetchRunQueries), and with no Map every fetch is now always-fresh-but-still-batched.
Behavior-preserving (a minor cross-query cache is the only thing lost). Concurrent run reads still collapse into one batched query.
oss tsc unchanged at baseline (589; the 5 remaining table/run.ts errors are pre-existing: unimported axios, the ensureEvaluatorRevisions return type, snakeToCamelCaseKeys typing). Package molecule (15) / ETL (9) / schema (6) suites pass; entities + changed-file lint clean. The package query is verified against the live backend by the integration suite.
NOTE: the OSS enriched-atom path has no automated view tests and wasn't UI-smoke-tested; the change is type-neutral + behavior-preserving by construction, but a manual pass over the evaluations list + run detail is worth doing before merge.
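How concurrent per-run reads can collapse into one batched query, with no result cache, is sketched below. `createMicrotaskBatcher` is a hypothetical helper written for this illustration; it is not the package's createBatchFetcher, whose signature the PR does not show.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<Map<K, V>>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V | null) => void;
  reject: (error: unknown) => void;
}

// Collects every key requested in the same tick and issues one batch call.
// Nothing is cached afterwards, so each new tick triggers a fresh fetch.
function createMicrotaskBatcher<K, V>(
  fetchBatch: BatchFn<K, V>,
): (key: K) => Promise<V | null> {
  let queue: Pending<K, V>[] = [];
  let scheduled = false;

  const flush = async (): Promise<void> => {
    const batch = queue;
    queue = [];
    scheduled = false;
    try {
      const uniqueKeys = Array.from(new Set(batch.map((item) => item.key)));
      const found = await fetchBatch(uniqueKeys);
      // Keys missing from the response resolve to null (an absent run).
      for (const item of batch) item.resolve(found.get(item.key) ?? null);
    } catch (error) {
      for (const item of batch) item.reject(error);
    }
  };

  return (key) =>
    new Promise<V | null>((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (!scheduled) {
        scheduled = true;
        // Defer to a microtask so all synchronous callers join this batch.
        void Promise.resolve().then(flush);
      }
    });
}
```

A caller that requests three run ids in the same tick produces a single `fetchBatch` call with all three ids, which matches the "always fresh but still batched" behaviour the commit describes.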

queryStepResults reimplemented POST /evaluations/results/query via raw axios — the same query the package's Fern-backed queryEvaluationResults already does. Delegate to it (behavior-preserving: same request, same snake_case rows via schema passthrough; returns [] when no project, as the package query does). Removes a duplicate axios read. The result MUTATIONS in this file stay on axios for now and are NOT migrated: Fern's generated EvaluationResultCreate under-declares fields the backend accepts (no span_id, references, or data), so routing the annotation write-back through Fern would silently drop span_id and break trace/span linking. Documented inline; unblock by extending the backend OpenAPI spec + regenerating the client. oss tsc unchanged at baseline; lint clean.

The new @agenta/evaluations workspace package wasn't added to oss/next.config.ts, so Next didn't transpile it — the OSS imports of it (buildRunConfig / createEvaluationRun) failed to resolve and the app wouldn't load (404 on the chunk). Add it to both transpilePackages and experimental.optimizePackageImports, alongside the other @agenta/* workspace packages.
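The relevant part of the Next.js config would look roughly like this. It is a fragment, not the repository's actual file: the neighbouring package names are taken from elsewhere in this PR as examples, and the real lists are likely longer.

```typescript
// Sketch of the oss/next.config.ts fragment (other config keys omitted).
const nextConfig = {
  // Workspace packages ship untranspiled source, so Next must compile them.
  transpilePackages: [
    "@agenta/entities",
    "@agenta/annotation",
    "@agenta/evaluations", // newly added
  ],
  experimental: {
    // Lets Next rewrite barrel imports from these packages into direct imports.
    optimizePackageImports: [
      "@agenta/entities",
      "@agenta/annotation",
      "@agenta/evaluations", // newly added
    ],
  },
};

export default nextConfig;
```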

EE renders OSS pages that import @agenta/evaluations, but ee/package.json didn't declare the workspace dep, so pnpm never linked it into ee/node_modules → module resolution failed and the EE app 404'd on load. Add the dependency (and the optimizePackageImports entry); transpilePackages is inherited via `{...ossConfig}` so the earlier oss/next.config fix already covers EE's transpile step.

…apping kinds The evaluations table rendered blank "Created by" and metric cells after the axios->Fern migration. Root cause: `evaluationRunMappingKindSchema` was `z.enum(["input","ground_truth", "application","evaluator","annotation"])`, but the backend emits `data.mappings[].column.kind` values of "testset"/"invocation"/"annotation". Because that field sits deep inside the optional `data` tree, a single unrecognized enum value failed the entire run parse, which failed the whole `runs: z.array(evaluationRunSchema)` envelope -> `safeParseWithLogging` returned null -> `queryEvaluationRuns` returned no runs -> the per-run summary atom resolved to null, blanking `created_by_id` and the step-reference-derived metric columns. The old axios list path did no Zod validation, so it tolerated these values. Fix: validate the three string-union "kind" fields (mapping kind, step type, step origin) as permissive `z.string()` instead of `z.enum`, keeping the known values as documented unions for autocomplete. Backend payloads use extra="allow" and the taxonomy drifts; a strict enum on a deeply-nested optional field is a catastrophic failure mode. Adds a regression test that parses a real (UUID- and key-scrubbed) /evaluations/runs/query payload.
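The failure cascade above can be reproduced with hand-rolled validators. This is an illustration only, not the package's Zod schemas: `Check`, `parseRun`, and `parseEnvelope` are hypothetical, and they model just the one property that matters here, namely that a failing leaf fails its parent.

```typescript
// A check returns the parsed value, or null on validation failure.
type Check<T> = (input: unknown) => T | null;

const KNOWN_KINDS = ["input", "ground_truth", "application", "evaluator", "annotation"] as const;

// Strict: behaves like an enum over the known values.
const strictKind: Check<string> = (value) =>
  typeof value === "string" && KNOWN_KINDS.some((kind) => kind === value) ? value : null;

// Permissive: any string is accepted; the known values are documentation only.
const permissiveKind: Check<string> = (value) => (typeof value === "string" ? value : null);

interface Run {
  id: string;
  kinds: string[];
}

type RawRun = { id?: unknown; data?: { mappings?: { column?: { kind?: unknown } }[] } };

function parseRun(input: unknown, kind: Check<string>): Run | null {
  const raw = (input ?? {}) as RawRun;
  if (typeof raw.id !== "string") return null;
  const kinds: string[] = [];
  for (const mapping of raw.data?.mappings ?? []) {
    const parsed = kind(mapping.column?.kind);
    if (parsed === null) return null; // one bad leaf fails the whole run
    kinds.push(parsed);
  }
  return { id: raw.id, kinds };
}

function parseEnvelope(input: { runs: unknown[] }, kind: Check<string>): Run[] | null {
  const runs: Run[] = [];
  for (const item of input.runs) {
    const run = parseRun(item, kind);
    if (run === null) return null; // one bad run fails the whole array
    runs.push(run);
  }
  return runs;
}
```

Feeding a payload whose mappings use "testset" and "invocation" through `parseEnvelope` with `strictKind` yields `null` for the entire list, even for runs that have no mappings at all; with `permissiveKind` every run survives.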

… payloads
The integration test built run configs with `data.mappings: []` and never went through the read-back/parse path the run table uses, so it could not catch the mapping-kind enum regression that blanked the table: it passed against both the broken and fixed schema. Two fixes:
- Populate mappings with the real `column.kind` values the package's buildRunConfig emits ("testset"/"invocation"/"evaluator"), so the created run actually exercises schema kind validation on read-back.
- Round-trip each created run through queryEvaluationRuns (the batched path the table uses) and assert the run survives the parse and its mapping kinds are preserved.
Verified: this now FAILS against the old `z.enum` mapping-kind schema and passes against the fixed `z.string()` one. Note these tests are gated behind AGENTA_API_URL + AGENTA_AUTH_KEY and skip (showing as green) when unset, so they must be run with a backend.

Parses a real project's EXISTING runs through the production evaluationRunSchema, per-run, so schema drift against production-shaped payloads (the class of bug that blanked the run table) is caught with the offending run id + field path. Read-only (query only), safe against a real project with a read-scoped key. Gated on AGENTA_API_URL + AGENTA_REAL_API_KEY + AGENTA_REAL_PROJECT_ID; skips when unset.

The entities eval integration suite only asserted empty-envelope/absent cases against a fresh ephemeral project, so it could never exercise run-data parsing or the molecule's derived selectors, which is exactly why the mapping-kind regression slipped through. Add:
- A populated-run block: create a representative run via the raw Fern client (entities cannot depend on @agenta/evaluations) with testset/invocation/evaluator mappings, then assert queryEvaluationRuns + fetchEvaluationRun parse it and evaluationRunMolecule selectors (data/steps/annotationSteps/mappings/evaluatorIds) derive real values.
- An evaluationQueue CRUD round-trip: create a run + queue, verify queryEvaluationQueues / fetchEvaluationQueue parse the populated queue and the molecule entity atoms resolve its name/run id. Cleans up runs + queue in afterAll.
Verified: the populated-run block FAILS against the old z.enum mapping-kind schema (3 failures) and passes against the fix; 11/11 green against the live local stack.

ensureEvaluatorRevisions called `axios.patch('/evaluations/runs/{id}')` but axios was never imported in that file, so the call threw ReferenceError, was swallowed by the surrounding try/catch, and the evaluator-revision write-back silently never persisted (pre-existing). Add a Fern-backed `editEvaluationRun` to @agenta/entities/evaluationRun (PATCH /evaluations/runs/{run_id} via client.editRun, Zod-validated at the boundary) and route the OSS enrichment through it. EvaluationRunEdit accepts id + data.steps, so this is not blocked by the Fern under-declaration affecting result mutations. Adds an integration test that patches a real run's annotation-step references and re-fetches to assert the change persists. oss tsc 589 -> 588 (removes the latent `Cannot find name 'axios'`). entities: 591 unit + 12 eval integration green against the live stack.

…ckend contract
Investigation showed the result-mutation "blocker" was a false premise: evaluation_results has no span_id/references/data columns (only trace_id et al.), so those FE-sent fields were silently dropped by the backend, not "accepted". The result↔trace link is trace_id.
- Add Fern-backed `setEvaluationResults` to @agenta/entities/evaluationRun (POST /evaluations/results/, the upsert-on-natural-key setter) carrying only real columns.
- Route OSS `upsertStepResultWithAnnotation` through it, dropping the vestigial span_id (behavior-preserving: the backend never persisted it). Removes the last axios usage from services/evaluations/results/api.ts.
- Delete dead `createStepResults` + `updateStepResults` (zero callers).
- Integration test: create run + scenario, upsert a result, read it back, assert trace_id persists.
13/13 eval integration green against the live stack; 591 unit; oss tsc 588.

… contract fetchPreviewRunsShared was the last axios eval read. Add a Fern-backed `queryEvaluationRunsList` to @agenta/entities (POST /evaluations/runs/query with the filters query_runs actually supports — references/flags/statuses + windowing) and route the OSS list fetch through it, keeping the OSS request-dedup cache + camelCasing wrapper. Drops `search` and `evaluation_kinds` from the request: the backend has no such filters (silently dropped), and free-text/kind filtering is client-side per the eval-filtering RFC — so this is behavior-preserving. windowing is read off the raw envelope (the Zod envelope doesn't model it) and returned for the paginating consumer (fetchAutoEvaluationRuns). Integration test: create runs, list them through the parse, assert presence + windowing cursor + limit. 15/15 eval integration green; 591 unit; oss tsc 588.

Add Fern-backed scenario primitives to @agenta/entities/evaluationRun: a minimal evaluationScenario schema (passthrough) + `queryEvaluationScenarios` (POST /evaluations/scenarios/query) and `setEvaluationScenarioStatuses` (PATCH /evaluations/scenarios/, id+status only). Route OSS services/evaluations/scenarios/api.ts through them; the run-status rollup (checkAndUpdateRunStatus) now reuses queryEvaluationRuns + editEvaluationRun. Removes the last axios from that file (and the bespoke SSRF id-guard — Fern encodes path params). Integration tests: query a run's scenarios, edit a scenario status, re-query and assert it persists. 17/17 eval integration green against the live stack; 591 unit; oss tsc 588.

Route services/evaluations/invocations/api.ts through the Fern package functions: upsertStepResultWithInvocation -> setEvaluationResults (drops the vestigial span_id / references / outputs that have no columns; keeps trace_id + error, both real columns); updateScenarioStatus -> setEvaluationScenarioStatuses (deduped onto the same primitive as services/evaluations/scenarios). Extends EvaluationResultSetInput with the real `error` column. Removes the last axios from the file. Behavior covered by the existing setEvaluationResults + setEvaluationScenarioStatuses integration tests. oss tsc 588; 591 unit green.

The EvaluationRunsTablePOC delete action used raw axios.delete('/evaluations/runs/'). Add Fern-backed `deleteEvaluationRuns` to @agenta/entities (DELETE /evaluations/runs/; backend cascade-deletes scenarios/results/metrics) and route deletePreviewRuns through it. Integration test: create a run, delete via the package fn, assert fetch returns null. 18/18 eval integration green; 591 unit; oss tsc 588.

…n delete
- Add Fern `queryEvaluationMetricsBatch` to @agenta/entities (POST /evaluations/metrics/query with the backend projection flags run_ids / scenario_ids / timestamps) and route the EvalRunDetails runMetrics batcher through it (run-level + temporal). Behavior-preserving: identical payload, and the metric schema is passthrough (only id/run_id required, both real columns) so no field stripping.
- Route DeleteEvaluationModalContent's run delete onto deleteEvaluationRuns (dedupes its private axios copy).
Both files now axios-free. Metrics are worker-computed (can't be made in the ephemeral harness), so the populated path was verified against the real project via the read-only smoke test: every existing metric parses through evaluationMetricSchema with the exact batch payload. entities 591 unit + 18 eval integration; evaluations 22 unit; oss tsc 588.

Locks the structure for relocating the evaluation-run engine into a layered package architecture (entities ← evaluations ← annotations, + -ui mirrors), with annotation queue and human eval as presets over one evaluation engine. Key decisions captured: extract the generic engine FROM @agenta/annotation (source of truth) into @agenta/evaluations, keep annotation green throughout, prove parity vs the OSS EvalRunDetails/EvaluationRunsTablePOC baseline before deleting OSS dups, move (not rewrite) the single configurable run table from AnnotationQueuesView, keep etl in entities. Includes §0 guardrails (anti-stray), the unified entity model, the controller generic-vs-annotation decomposition map, sequenced Work Packages each keeping annotation green, the regression methodology, and definition of done.

…ation plan Adds an enforceable "clean up after yourself" requirement so agents can't leave eval services/utils/data-layer atoms behind in OSS: - §0 cardinal rule 7: each WP deletes its OSS counterpart in the same WP; migration is not done until the cleanup ledger is checked off. - §7 cleanup ledger: explicit list of every OSS eval service/lib/atom path that must be deleted, mapped to the WP that deletes it; legacy bridge + onlineEvaluations tracked as terminal WPs (never silently left). - §7.2 verification gate: concrete grep/find commands that must return empty at final DoD. - §9 Definition of done now requires the zero-residue gate to pass.

… package Closes a testing gap in the migration plan: WP-2 had only unit tests and WP-3 had none. Now every WP that moves state/logic must ship a real-API integration test that drives the SHIPPED atoms/molecules/controllers — not a test-local replica. - §5: testing is part of every WP's DoD; adds an "Integration test (real API, real atoms)" line to WP-0..4, each naming the exact shipped surface to drive. - §8: hard rule — import and exercise the real surface (if you delete the package code the test must fail to compile), run against the real backend, seed via raw client but assert through the package; bans the hand-built-payload anti-pattern that caused the mapping-kind bug; adds a per-WP coverage table; clarifies "tests green" means ran-with-backend not skipped.

Empty React UI package mirroring @agenta/annotation-ui, registered in OSS+EE (package.json deps, next.config transpilePackages + optimizePackageImports). Will receive the run list table, run detail view, scenario table, and metric cells in later work packages (see docs/designs/evaluations-packages-migration-plan.md). No behavior change.

…y (WP-0) Moves the scenario schema + queryEvaluationScenarios/setEvaluationScenarioStatuses out of evaluationRun into a standalone @agenta/entities/evaluationScenario module (core/api/state), adds a reactive {projectId, runId}-keyed molecule (list/ids/statuses selectors), and a subpath export. evaluationRun no longer owns scenario code; OSS consumers (services/evaluations/{scenarios,invocations}) re-point to the new module. Integration test (real API, real atoms): drives the shipped evaluationScenario api + molecule selectors against a real run's scenarios (the WP-0 DoD). entities 591 unit + 19 eval integration (run 16 + scenario 3) green against the live stack; oss tsc 588.

… update) Reverses the earlier "etl stays in entities" decision. The ETL filtering is a feature where OSS EvalRunDetails is ahead of annotation (annotation has no filtering — verified, it imports none of the etl filtering), so: - entities keeps only entity definitions; the eval-run ETL (hydration, mapping/column resolution, client-side filtering) moves to @agenta/evaluations (+ filter bar / column headers / resolved cells to @agenta/evaluations-ui). - §4 source-of-truth exception: the ETL is extracted from OSS EvalRunDetails/etl, NOT from annotation; annotation gains filtering by depending on evaluations. - New WP-3.5 (move the ETL, sourced from OSS) with its own real-API/real-atom integration test (hydrate real scenarios + apply a real rowPredicateFilter). - Cleanup ledger + §7.2 gate now require OSS EvalRunDetails/etl gone and the entities evaluationRun/etl subpath removed; §10 records the reversal.

…ario source Verified from code (no assumptions): the annotation session engine is founded on simpleQueueMolecule, and the two consumers source the scenario LIST from different endpoints — annotation via POST /simple/queues/{id}/scenarios/query (queue-scoped, optional user_id annotator filter → may be a subset) and EvalRunDetails via POST /evaluations/scenarios/query by run_id (run-scoped). Both return EvaluationScenario rows; scenario DATA is derived by {projectId, runId, scenarioId} from the entities molecules in both. Therefore the generic evaluations session engine must NOT hardcode a scenario molecule — it takes an INJECTED source {projectId, runId, scenarios[], scenariosQuery} and owns navigation/progress/current/focus/view. Annotation keeps feeding the QUEUE source (user-scoped — do not swap to run-scoped); only the engine code is shared. §3.1 decomposition + WP-1 Move updated; the truly-shared core is the scenario-data selectors keyed by {projectId,runId,scenarioId}.

Extract the scenario navigation/progress/focus/view engine from @agenta/annotation's annotationSessionController into @agenta/evaluations/state (the navigation logic is moved verbatim) with two genericizing changes: - the scenario LIST + query state are INJECTED via actions.setScenarios (no scenario molecule imported), so annotation can inject its queue-scoped source and the eval-run view a run-scoped one; - run/project context comes from openSession({projectId, runId}), decoupled from any store. This is additive — @agenta/annotation is untouched (re-pointing it is the next WP-1 slice, which needs annotation-route QA). Integration test drives the SHIPPED engine atoms over a real run's scenarios (navigate next/prev, markCompleted → progress/status, hideCompletedInFocus → navigable filtering). 22 unit + 3 session-engine integration green vs the live stack.
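A store-free sketch of an engine with an injected scenario list is shown below. It is an assumption-heavy illustration: `SessionEngine` and its method names are hypothetical, and the rule that the current scenario stays navigable after it is completed is a choice made for this example, not something the commit states.

```typescript
interface Scenario {
  id: string;
}

interface SessionContext {
  projectId: string;
  runId: string;
}

// The engine never fetches scenarios; whoever hosts it injects the list
// (queue-scoped for annotation, run-scoped for the eval-run view).
class SessionEngine {
  readonly context: SessionContext;
  hideCompletedInFocus = false;
  private scenarios: Scenario[] = [];
  private completed = new Set<string>();
  private currentId: string | null = null;

  constructor(context: SessionContext) {
    this.context = context;
  }

  setScenarios(scenarios: Scenario[]): void {
    this.scenarios = scenarios;
    const stillPresent = scenarios.some((s) => s.id === this.currentId);
    if (!stillPresent) this.currentId = scenarios[0]?.id ?? null;
  }

  current(): string | null {
    return this.currentId;
  }

  // Scenarios reachable via next/prev under the current view settings.
  navigable(): Scenario[] {
    if (!this.hideCompletedInFocus) return this.scenarios;
    return this.scenarios.filter(
      (s) => s.id === this.currentId || !this.completed.has(s.id),
    );
  }

  move(delta: 1 | -1): void {
    const list = this.navigable();
    const index = list.findIndex((s) => s.id === this.currentId);
    const target = list[index + delta];
    if (index !== -1 && target) this.currentId = target.id;
  }

  markCompleted(id: string): void {
    this.completed.add(id);
  }

  progress(): { done: number; total: number } {
    const done = this.scenarios.filter((s) => this.completed.has(s.id)).length;
    return { done, total: this.scenarios.length };
  }
}
```

Two hosts can construct the same engine and differ only in what they pass to `setScenarios`, which is the separation the commit relies on.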

…s blank in Overview) The WP-4h-5 relocation pinned @agenta/evaluations-ui to recharts ^2.13.0 (resolved 2.15.4), but the eval chart components are recharts-3 code (OSS/EE/main use ^3.1.0 → 3.8.1). Run under recharts 2.x, the Overview spider chart + per-evaluator distribution charts rendered nothing while numeric stats showed — the chart APIs differ across the major. It typechecked green under 2.x because the used API subset overlaps. Bump to ^3.1.0 (resolves the shared 3.8.1, same as main) and fix the recharts-3 Tooltip/formatter callback signatures the stricter v3 types surfaced. oss tsc 363 (unchanged).

WP-4h moved the eval views into @agenta/evaluations-ui, but the Tailwind content globs (oss/tailwind.config.ts, reused by ee via createConfig) were never updated to scan it. So Tailwind didn't generate the package's utility classes — only ones that also appear in already-scanned packages survived. Package-unique classes were dropped: the run-overview spider's lg:flex-row + lg:w-7/12|w-5/12 (so it stacked under the table instead of beside it) and its h-[480px]/h-full container (so the chart collapsed to 0 height and recharts rendered nothing — spider + per-evaluator distribution charts blank while text showed). Add agenta-evaluations + agenta-evaluations-ui to the content array.

…inputs The scenario focus drawer fed the whole testcase ENTITY ({id, created_at, data:{...}, testset_id, ...}) to TestcaseDataEditor, but the editor addresses values by bare column key (valueKey, e.g. 'country') while the user columns live nested under .data. So every input rendered empty when the testcase-entity branch was taken (row click resolves sourceTestcaseId immediately); reload appeared to work because it rendered via the flat embedded-steps fallback first. Unwrap to the inner .data record so the testcase-entity branch matches the editor's bare keys, consistent with the embedded-steps fallback (also flat). Diagnostic logging removed.
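The mismatch and the fix can be shown in a few lines. This is a sketch: `TestcaseEntity` lists only the fields named above, and `toEditorValues` / `readValue` are hypothetical helper names standing in for the drawer's unwrap step and the editor's lookup by bare column key.

```typescript
// Entity shape as described in the commit; user columns live under `data`.
interface TestcaseEntity {
  id: string;
  created_at?: string;
  testset_id?: string;
  data: Record<string, unknown>;
}

// Unwrap to the flat record the editor expects.
function toEditorValues(entity: TestcaseEntity): Record<string, unknown> {
  return entity.data ?? {};
}

// Stand-in for how the editor addresses a value: by bare column key.
function readValue(values: Record<string, unknown>, valueKey: string): unknown {
  return values[valueKey];
}
```

Looking up `country` on the whole entity finds nothing because the key only exists one level down; looking it up on `toEditorValues(entity)` returns the stored value.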

@deprecated

…fy casing/run-kind, cut debug log
- delete unused @deprecated facades getEvaluationKindWithFallback and CACHE_AWARE_HYDRATE_FETCHERS (+ their barrel re-exports); zero consumers
- collapse the duplicate snakeToCamelCaseKeys: delete the usePreviewEvaluations copy, re-point its sole importer at the canonical evalRun/utils/casing
- derive runsTable EvaluationRunKind from core (CoreEvaluationRunKind | "all") instead of restating the literal union
- remove the unconditional [runInvocationAction] Starting invocation debug log

…nto one factory
evaluationResultMolecule and evaluationMetricMolecule were ~95% identical cache machinery (byScenario read, cache-aware prefetchByScenarioIds, invalidate, evictByRunId, evictByScenarioIds, cacheKey). Extract the shared logic into createScenarioCacheMolecule<T, K>; the two molecules now just bind their element type, fetcher, cache-key prefix, and outcome list-key. Metrics opts into skipItemsWithoutScenarioId for run-level aggregates (null scenario_id). Public surface unchanged: same exported molecules, the Prefetch{Results,Metrics}{Args,Outcome} types, the results/metrics outcome fields, and _internal.cacheKey all preserved. Entities unit suite green (658 tests).
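The shape of such a factory can be sketched as below. This is a reduced, synchronous illustration under assumptions: `createScenarioCache` is a hypothetical name, it omits the fetcher and prefetch logic, and throwing on an item without a scenario id when the flag is off is a choice made for the sketch, since the commit does not say what the real factory does in that case.

```typescript
interface ScenarioScoped {
  run_id: string;
  scenario_id: string | null;
}

interface ScenarioCacheOptions {
  prefix: string; // cache-key prefix, e.g. "results" or "metrics"
  skipItemsWithoutScenarioId?: boolean;
}

// One factory, bound twice: each caller supplies its element type and prefix.
function createScenarioCache<T extends ScenarioScoped>(options: ScenarioCacheOptions) {
  const store = new Map<string, T[]>();
  const cacheKey = (runId: string, scenarioId: string): string =>
    `${options.prefix}:${runId}:${scenarioId}`;

  return {
    cacheKey,
    put(items: T[]): void {
      for (const item of items) {
        if (item.scenario_id === null) {
          // Run-level aggregates carry no scenario id.
          if (options.skipItemsWithoutScenarioId) continue;
          throw new Error(`${options.prefix}: item without scenario_id`);
        }
        const key = cacheKey(item.run_id, item.scenario_id);
        store.set(key, [...(store.get(key) ?? []), item]);
      }
    },
    byScenario(runId: string, scenarioId: string): T[] {
      return store.get(cacheKey(runId, scenarioId)) ?? [];
    },
    evictByRunId(runId: string): void {
      const runPrefix = `${options.prefix}:${runId}:`;
      for (const key of Array.from(store.keys())) {
        if (key.startsWith(runPrefix)) store.delete(key);
      }
    },
  };
}
```

The results cache and the metrics cache then differ only in their type argument, their prefix, and whether they opt into skipping items with a null scenario id.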

- annotationSessionController: collectColumnPathValues and collectDataColumnKeys were the same depth-first leaf traversal differing only in accumulator; both now delegate to a single walkLeafColumns(data, visit) visitor.
- testsetSync: buildAddToTestsetOperations and remapTargetRowsToBaseRevision both built baseRowIds + baseRowIdByDedup from baseRows; extract a shared indexBaseRows(baseRows, {guardAmbiguous}) parameterized to preserve each caller's exact behavior. guardAmbiguous=true keeps the add-to-testset ambiguous-dedup guard; =false keeps the sync path's legacy last-writer-wins (the missing guard there is a documented latent gap, left unchanged given the AGE-3761 write-back sensitivity).
Behavior-preserving; annotation unit suite green (90 tests).
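The visitor refactor in the first bullet can be sketched as follows. The function names match the commit, but the bodies are assumptions: the dotted path format, treating arrays as leaves, and the return types are choices made for this example rather than the package's actual behaviour.

```typescript
type LeafVisitor = (path: string, value: unknown) => void;

// Depth-first walk over plain objects; anything that is not a plain object
// (primitives, null, arrays) is treated as a leaf and handed to the visitor.
function walkLeafColumns(data: unknown, visit: LeafVisitor, prefix = ""): void {
  if (data !== null && typeof data === "object" && !Array.isArray(data)) {
    for (const [key, value] of Object.entries(data as Record<string, unknown>)) {
      walkLeafColumns(value, visit, prefix ? `${prefix}.${key}` : key);
    }
    return;
  }
  if (prefix) visit(prefix, data);
}

// The two former traversals now differ only in what they accumulate.
function collectDataColumnKeys(data: unknown): string[] {
  const keys: string[] = [];
  walkLeafColumns(data, (path) => {
    keys.push(path);
  });
  return keys;
}

function collectColumnPathValues(data: unknown): Record<string, unknown> {
  const values: Record<string, unknown> = {};
  walkLeafColumns(data, (path, value) => {
    values[path] = value;
  });
  return values;
}
```

Keeping the traversal in one place means a change to what counts as a leaf applies to both collectors at once.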

…esults fetcher scenarioStepsBatcherFamily re-implemented POST /evaluations/results/query with raw axios + manual envelope parsing (results ?? steps) — a duplicate of the canonical typed/zod queryEvaluationResults the entities layer already owns. Delegate the network call to queryEvaluationResults; the atomWithQuery shell keeps caching + live 5s polling and the ScenarioStepsBatchResult/camelCase output shape is unchanged, so consumers and polling behavior are preserved. Note: the TanStack caches of the live-polling path and the cache-first evaluationResultMolecule remain separate by design — the run-details poll needs a fresh fetch each tick, which the cache-first molecule prefetch would skip. Full single-cache unification would need a molecule cache-bypass mode + QA; out of scope here. evaluations unit suite green (133 tests).

evaluationRunPaginatedStore (state/runList) had ZERO production consumers — only its barrel re-export and one integration test referenced it. The live run-list is the feature-rich runsTable engine (fetchAutoEvaluationRuns + previewRunSummary, with subject-filter / fillToLimit / references); the generic EvaluationListView takes its store as a prop and its sole renderer (AnnotationQueuesView) passes simpleQueuePaginatedStore, not this one. Its EvaluationRunTableRow type was a separate same-named shape; the ~35 live consumers use the runsTable/types.ts EvaluationRunTableRow via @agenta/evaluations/state/runsTable, unaffected. Removed: state/runList/ (store + filter atoms), its top-barrel re-export, and runListStore.integration.test.ts. ~190 LOC. evaluations suite green (133).

…g to -ui The headless @agenta/evaluations package carried 16 injection seams that only the relocated VIEWS (run-list + run-details, in @agenta/evaluations-ui) ever read — URL/route/app-state, saved-queries, current-workflow, metric-blueprint / resolved-label / evaluator-reference families, workspace-member-by-id, navigation-request, and the onboarding-widget seams. Pure view/routing concerns do not belong in the framework-agnostic state package. Moved those 16 seams + their types into a new @agenta/evaluations-ui/src/host/runViewInjection.ts with its own registerRunViewInjections write-atom. The 6 seams the headless runtime atoms actually read (workspaceMembers, testcaseQueryFamily, referenceResolver, runInvalidate, clearMetricSelection, annotationTransform) plus the shared ReferenceQueryResult and Query*Payload types stay in evalRunInjection.ts. OSS hosts now split registration: register(...) for headless seams + registerView(...) for view seams. 17 -ui consumers re-pointed to the local module. evaluations + evaluations-ui green (tsc/lint/133 tests); oss tsc at its pre-existing 363-error baseline with zero new host/seam errors. Manual QA: run-list + run-details views (onboarding widget, navigation, URL focus drawer, metric columns, online-eval start/stop).

… atoms files

Verbatim extraction of pure helpers into sibling files; no logic changes.
- metrics.ts (973 -> 421): pure metric compute/lookup block + the 3 metric types moved to metricsCompute.ts (560). metrics.ts keeps the caches, status helpers, resolveProjectId/resolveEffectiveRunId atom-getters, and all atoms; re-exports the public ScenarioMetricData / RunLevelMetricData types so the API is unchanged.
- scenarioColumnValues.ts (1231 -> 968): pure step/value helpers (getStepKind, pickStep, extractStepsByKind, extractStepError, findStepWithError, resolveAnnotationValue, …) moved to scenarioColumnValuesHelpers.ts (273). The 727-line scenarioColumnValueBaseAtomFamily and all public exports stay.

Public API preserved; evaluations tsc+lint+133 unit tests green.

Deferred: runMetrics.ts / metricProcessor.ts splits (owned by the spun-off metricProcessor-ReferenceError task; would collide).

Note: the moved metrics compute block carries a pre-existing latent `declare const applyAggregatesToRaw` ReferenceError (sibling of the runMetrics one), preserved verbatim; needs its own fix.

…ToRaw ReferenceError)

buildRunLevelMetricData referenced an undefined applyAggregatesToRaw (a declare-const masking a pre-existing, unconditional ReferenceError; migration-plan §11.3 bug #1). Its only transitive caller, runLevelMetricQueryAtomFamily, was unused (not exported from any barrel, referenced nowhere) and superseded by runMetrics.ts's own run-level engine (flattenRunLevelMetricData). Rather than implement a never-called function, remove the dead path:
- metrics.ts: delete runLevelMetricQueryAtomFamily + its buildRunLevelMetricData / RunLevelMetricData imports and re-export.
- metricsCompute.ts: delete buildRunLevelMetricData, applyAggregatesToRaw, and the RunLevelMetricData type.

KEPT (live, used by buildGroupedMetrics → scenario metrics): computeAggregatedMetrics, extractStatTotal, asNumber.

Zero runtime change (dead code); evaluations tsc+lint+133 unit tests green.

…appers

Migrate 4 of the annotationFormController raw-axios /evaluations/* calls onto the typed, zod-validated entities wrappers (Fern under the hood), per web/CLAUDE.md:
- PATCH /evaluations/scenarios/ -> setEvaluationScenarioStatuses
- POST /evaluations/scenarios/query -> queryEvaluationScenarios
- POST /evaluations/runs/query -> queryEvaluationRuns
- PATCH /evaluations/runs/{id} -> editEvaluationRun

Removed the now-orphaned getAgentaApiUrl()/apiUrl local in checkAndUpdateRunStatus.

Left on raw axios deliberately (documented inline):
- POST /evaluations/results/: also sends span_id, which the wrapper's typed input omits (no backend column); migrating would drop span_id + cascade a param removal through the submit-entry flow.
- POST /evaluations/metrics/query + /evaluations/metrics/: duplicate the (also-axios) upsertScenarioMetricData service; no Fern metrics-set wrapper exists. Left for a separate consolidation.
- POST /testsets/revisions/query (annotationSessionController): intentionally reads raw, un-normalized rows to preserve testcase_dedup_id (AGE-3761); a normalizing wrapper would reintroduce the dedup duplication bug.

annotation tsc+lint+90 unit tests green.

…-to-packages

main now contains the merged fe-feat/add-evaluators-to-existing-eval base + eval fixes that landed since this branch diverged. Integrated via merge (not rebase) to resolve the relocation conflict set once.

Conflict resolutions (main's eval fixes ported onto the relocated package files):
- OverviewView/utils/evaluatorMetrics.ts: took main's id-OR-slug evaluator 'definition' match ('evaluator name instead of default' fix); widened the local EvaluatorDefinitionLike with name?.
- evalRun/atoms/table/columns.ts: kept package eslint header + canonicalizeMetricKey, added main's extractMetrics import (schemaless-evaluator type-from-step-schema fix).
- evalRun/atoms/mutations/editEvaluation.ts: full-ported main's reliably-refresh improvements (key.includes(runId) surface match, authoritative run-status read, settle double-invalidation) and relocated previewRunBatcher (getPreviewRunBatcher / invalidatePreviewRunCache) into @agenta/evaluations; kept the injected-seam clearMetricSelectionCache to avoid a runsTable<->evalRun cycle.
- agenta-ui/package.json: union (immer ^10.1.3 + main's jotai ^2.16.1); lockfile regenerated via pnpm install.

Silent type-breaks from main's Fern api-client regen, fixed:
- createEvaluationRun.ts: EvaluationRunData -> EvaluationRunCreate['data'].

Rename detection paired all moved eval files; no old-OSS eval dirs resurrected. Eval packages green (tsc+lint; entities 663 / evaluations 133 / annotation 90 tests).

…equest types

The eval wrappers passed request bodies through opaque `as never` casts. Replace each with a named cast onto the Fern-generated request type (via `as unknown as AgentaApi.X`), keeping the wrappers' intentionally-loose inputs and the Zod response boundary unchanged (per web/CLAUDE.md: Fern under-declares extra="allow", so the local Zod schema stays the drift check):
- editRun -> AgentaApi.EvaluationRunEdit
- queryRuns -> AgentaApi.EvaluationRunQueryRequest (both call sites)
- setResults -> AgentaApi.EvaluationResultsSetRequest["results"]
- queryMetrics -> AgentaApi.EvaluationMetricsQueryRequest
- editScenarios -> AgentaApi.EvaluationScenarioEdit[]

Benefit: names the real request type (readability/intent) and gives a compile-time drift signal if Fern renames/removes it, which is useful given the eval request surface is actively changing. No response/entity types touched (those stay Zod by design). entities tsc+lint+663 unit tests green.
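A rough sketch of the difference between the two cast styles, under stated assumptions: `GeneratedEvaluationRunEdit` stands in for the Fern-generated AgentaApi.EvaluationRunEdit (its fields are invented here), and `sendEdit`, `LooseRunEdit`, `editRunOpaque`, and `editRunNamed` are hypothetical names used only for illustration.

```typescript
// Stand-in for the generated request type; the real fields are not shown in this PR text.
interface GeneratedEvaluationRunEdit {
    id?: string
    name?: string
}

// Intentionally loose wrapper input, in the spirit the commit describes.
interface LooseRunEdit {
    id: string
    [key: string]: unknown
}

// Stand-in for the generated client method that receives the request body.
function sendEdit(body: GeneratedEvaluationRunEdit): string {
    return JSON.stringify(body)
}

// Before: `as never` is assignable to any parameter type, so it says nothing
// about which request type is intended and never fails to compile.
function editRunOpaque(input: LooseRunEdit): string {
    return sendEdit(input as never)
}

// After: the cast names the request type. If that type were renamed or removed,
// this line would stop compiling, which is the drift signal.
function editRunNamed(input: LooseRunEdit): string {
    return sendEdit(input as unknown as GeneratedEvaluationRunEdit)
}
```

Both versions behave identically at runtime; the change only affects what the type checker can report.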

…o-packages

…eferenceError) runMetrics.ts run-metric-stats queryFn referenced metricProcessor at the run-level-gap branch, but no such binding exists in that scope — the real processor is local to the inner processMetrics helper (which already flushed). A declare-const masked it at type-check; at runtime the branch threw a ReferenceError whenever a run-level gap existed (no run-level entry + scenario-less fetched metrics), failing the whole run-metrics query. Even resolved, it would push a flag onto a throwaway processor never flushed there (no-op). The legitimate gap-marking already happens inside processMetrics on the flushed processor. Removed the misplaced branch + the declare-const + the unused MetricProcessor import. Restores the query from throwing; preserves real behavior. evaluations tsc+lint+133 tests green.
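The failure mode described here, a `declare const` that satisfies the type checker while no binding exists at runtime, can be reproduced in isolation. The sketch below is illustrative only: `metricProcessorLike`, `markGap`, `queryRunMetrics`, and `tryQuery` are hypothetical names, not the code in runMetrics.ts.

```typescript
// `declare const` asserts to the type checker that a binding exists,
// but it emits no JavaScript, so nothing is defined at runtime.
declare const metricProcessorLike: {markGap(key: string): void}

function queryRunMetrics(hasRunLevelGap: boolean): string {
    if (hasRunLevelGap) {
        // Type-checks, yet throws a ReferenceError when this branch executes.
        metricProcessorLike.markGap("run-level")
    }
    return "ok"
}

// Helper that reports which error, if any, the query raised.
function tryQuery(hasRunLevelGap: boolean): string {
    try {
        return queryRunMetrics(hasRunLevelGap)
    } catch (error) {
        return String((error as {name?: unknown}).name)
    }
}
```

Because the error only surfaces when the gap branch runs, the happy path keeps working and the type check stays green, which is consistent with the bug going unnoticed until a run-level gap occurred.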

… api client

Dead public surface (all verified zero external consumers, tsc/lint/663+90 tests green):
- evaluationRunMolecule: drop the 3 step-reference atomFamilies left behind when that logic moved to @agenta/evaluations (stepReferencesByEvaluatorId, stepKeysByEvaluatorSlug, scenarioInvocationStepKey; def+selector+get each) + the orphaned StepEvaluatorRefs interface; de-export invalidateEvaluationRunCache (kept as internal cache.invalidateDetail) + drop its barrel re-exports.
- evaluationScenarioMolecule: drop the unused selector + imperative get.* block (only list/ids/statuses + atoms.query are consumed); kept the query family.
- annotation: drop dead getOutputsSchema/getMetricFieldsFromEvaluator/getMetricsFromAnnotation re-exports (real consumers import from @agenta/evaluations; re-pointed the one in-package test); drop the duplicate syncToTestset alias.

Dedup: evaluationQueue/api/client.ts was byte-identical to evaluationRun's; re-point the sole importer at the run client and delete the dup.

~180 LOC removed.

Note: canSyncToTestset/canSyncToTestsetAtom also look orphaned; left pending UI confirmation.

…on god-file annotationSessionController.ts was 2526 LOC mixing session/queue/scenario state with ~1100 LOC of add-to-testset + sync-to-testset export orchestration. Move the export machinery verbatim into a new sibling controllers/addToTestset.ts (modal/job atoms, export-prep helpers, column-remap family, prepare*ExportRows, addScenariosToTestsetAtom, sync preview + syncToTestsetsAtom). Pure relocation, no logic change. Session controller now 1447 LOC, focused on session state. Shared session atoms it still owns are exported and imported into addToTestset.ts; the moved atoms/actions are imported back so the public annotationSessionController object + barrels are byte-identical. Benign ES-module cycle (refs only inside getters/setters). annotation tsc+lint+90 tests green.

…narioMetricData annotationFormController.upsertAnnotationMetrics hand-rolled the same query-existing -> merge -> upsert flow that @agenta/evaluations services/metrics.ts upsertScenarioMetricData already ships (and which the eval run-details annotate flow uses). Keep the annotation-specific value shaping (buildMetricDataFromValue -> attributes.ag.data.outputs.* under the step key) and delegate persistence. Added an optional projectId param to upsertScenarioMetricData so annotation keeps passing its explicit project id (existing callers fall back to the store read, unchanged). ~55 LOC of duplicated query/merge/POST removed. Behavior delta: existing metrics are now PATCHed by id (vs POST upsert) — same end state, slightly more correct. QA: annotation submit (metric write-back) smoke test. evaluations + evaluations-ui + annotation tsc/lint green; 90 annotation tests pass.

…aluationRunKind evalRun/state/evalType.ts hand-declared PreviewEvaluationType = auto|human|online|null, a near-duplicate of core's EvaluationRunKind (auto|human|online|custom). The detection logic was already shared (derivedEvalTypeAtomFamily delegates to deriveEvaluationKind); only the type literal was duplicated. Redefine it as Exclude<EvaluationRunKind, "custom"> | null so the union has a single source of truth in core — identical narrow set (the run-details preview never surfaces the custom/SDK kind), zero behavior/type change. evaluations + evaluations-ui tsc/lint green; 133 tests. Note: a separate, unrelated PreviewEvaluationType (human|online|automatic| single_model_test) lives in hooks/usePreviewEvaluations — different domain (legacy API filter), left untouched (same-name footgun worth a future rename).
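The type derivation in this commit can be illustrated in a few lines. The `EvaluationRunKind` union below is copied from the commit text; `toPreviewType` is a hypothetical helper added only to show the narrowing, and mapping "custom" to null is an assumption, not confirmed behavior.

```typescript
// Core union as listed in the commit message.
type EvaluationRunKind = "auto" | "human" | "online" | "custom"

// Derived rather than hand-declared: resolves to "auto" | "human" | "online" | null.
type PreviewEvaluationType = Exclude<EvaluationRunKind, "custom"> | null

// Hypothetical helper: after the "custom" check, the compiler narrows `kind`
// to exactly the members of PreviewEvaluationType.
function toPreviewType(kind: EvaluationRunKind | null): PreviewEvaluationType {
    return kind === "custom" ? null : kind
}
```

If a new kind is later added to the core union, the derived type picks it up automatically, whereas a hand-written literal would drift silently.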

…fetcher scenarioData/metrics.ts queried per-scenario metrics with raw axios, bypassing the entities queryEvaluationMetrics (typed + zod). A single scenario belongs to exactly one run, so adding the fetcher's run_ids constraint is a redundant, behavior-equivalent narrowing — swap to queryEvaluationMetrics, dropping the raw axios path (closes the spun-off scenarioData-metrics chip). Scope note: the OTHER metric raw-axios paths are intentionally left: - evalRun/atoms/metrics.ts batcher deliberately omits run_ids for scenario-scoped (cross-run comparison) queries to avoid over-filtering — queryEvaluationMetrics forces run_ids, so routing it there would regress. - the /evaluations/metrics/refresh calls have no entities wrapper. evaluations tsc+lint+133 tests green. QA: scenario metric display in run-details.

…eQueueStatus Two unrelated types shared the name EvaluationStatus across subpaths: the canonical run/scenario enum in evaluationRun/core/status.ts (EVALUATION_* + failed/incomplete, used across OSS) and a different 7-value queue status (pending/queued/running/...) in simpleQueue/core/schema.ts whose comment falsely claimed it was shared with EvaluationRun. Same name, different shapes — a real footgun. Rename the simpleQueue type to SimpleQueueStatus (kept the evaluationStatusSchema Zod value name) and update its re-exports (simpleQueue + evaluationQueue barrels) and the 3 annotation-ui consumers. The run enum and its OSS consumers + Fern's generated AgentaApi.EvaluationStatus are untouched. entities (663 tests) + annotation-ui tsc/lint green.

…+ redundancy fixes

Focused dead-code sweep follow-up. Removed ~767 LOC of exported-but-zero-consumer symbols (each re-verified across packages+oss+ee before deletion; tsc is the gate):

@agenta/evaluations:
- deleted whole files table/testcases.ts (superseded by molecule path) + services/workerUtils.ts
- dead atoms/helpers: serializeRunIndex/deserializeRunIndex, normalizeEvaluationKindString, evaluationMetricBatcherAtom, scenarioStepsBatcherAtom, clearScenarioStatusCache, the runDerived app/variant-id cluster, isInvocationRunningAtom, scenarioHasEmbeddedInputsAtomFamily, scenarioRowHeightPxAtom, tableScenario{Ids,Offset}AtomFamily, traceUtils extractRootSpanIdFromTraceData/findTraceForStep, clearAllBootstrapAttempts, evaluatorOutputTypes get/visibility helpers + dead version counter, invalidateMetricSelectionCache, FLAG_LABELS, primePreviewRunCache, paginationAtom, plus their barrel re-exports.

@agenta/entities: deleteEvaluationQueues (plural) + queryEvaluationQueueScenarios chain (schema/type), unused *Molecule typeof-exports, Prefetch{Results,Metrics}{Args,Outcome} aliases.

Redundancy fixes: isEmptyMetrics now uses isEmptyValue; annotationFormController's private getStore() dropped for the shared one; renamed the colliding hooks PreviewEvaluationType -> PreviewEvaluationFilterType.

Re-verification SAVED 5 false-positives that the broad sweep flagged but that are actually used (extractEvaluatorMetricKeys, getPreviewRunBatcher, invalidatePreviewRunCache, evaluator{ColumnDefs,StepRefs}AtomFamily via object-map, searchQueryAtom); kept.

Untouched: etl/ scaffolding (separate audit), annotation sync-to-testset (kept, pending UI), evaluationQueue module.

evaluations/entities/evaluations-ui/annotation tsc+lint+tests green.

etl/ audit: per-symbol consumer analysis (external + internal-non-barrel + test) across web/. Removed only symbols dead on all three axes; kept everything with any consumer.

Deleted:
- realScenarioSource.ts (whole file): makeRealScenarioSource + types, 0/0/0
- cacheAwareFetchers.ts (whole file): buildMoleculeBackedFetchers / MOLECULE_BACKED_HYDRATE_FETCHERS / cacheAwareFetchTestcases, 0/0/0
- hydrateScenariosTransform.ts: the makeHydrateScenariosTransform + DEFAULT_HYDRATE_FETCHERS cluster (kept the 3 live shared type exports)
- cacheDiagnostics.ts: inspectMemory + MemorySnapshot (kept inspectCache/clearCacheByPrefix)
- etl/index.ts: dropped the @agenta/entities/shared passthrough re-export block (every consumer imports those directly from entities, none via the etl barrel)

KEPT (verified live via evaluations-ui / internal etl / package state / tests): resolveMappings + resolvers, rowPredicateFilter, runReferenceFilter, filterSchema, hitRatioMeter, predicateToEntitySlices, all filtering/* hooks, inspectCache.

Same-named RunStep/RunMapping/ColumnGroup competing decls confirmed distinct.

evaluations + evaluations-ui + entities tsc/lint green; 133 evaluations tests pass.

Third main integration on this branch (80 commits, incl. release/v0.103.5, OSS invite hardening, single-project batch fetchers, cascade evaluator selector, table header-scroll-sync).

Conflicts resolved, porting main's landed fixes onto the relocated package files (main still edits the OLD oss eval paths this branch moved):
- Playground/PlaygroundHeader: kept our re-point to openWorkflowRevisionDrawerAtom (the evaluatorDrawerStore compat bridge was deleted in WP-4 residue B; 3 consumers call the underlying playground-ui atom directly). Took main's isolated-playground evaluator-create feature (currentAppSelection, handleCreatedEvaluator) and threaded its new params (isolatedPlayground/initialAppSelection/postCreateNavigation/onWorkflowCreated) through context: "evaluator-create" instead of the bridge's mode: "create".
- agenta-ui InfiniteVirtualTableInner: ported main's .ant-table-header / .ant-table-body scroll-sync useEffect (#4697) into the relocated package copy; removed the leftover deleted oss original.
- state/evaluator/evaluatorDrawerStore: accepted our deletion. main's added drawer params already merged into the underlying @agenta/playground-ui workflow-revision-drawer store, so no porting needed.

Auto-merged eval package files verified to carry main's changes:
- evaluations-ui RunDetails/Page: "SDK Evals" / kind:"custom" typeMap entry (no regression against the Q6 PreviewEvaluationType narrowing).
- entities evaluationRun molecule: single-project runBatchFetcher rewrite.
- annotation-ui CreateQueueDrawer: multi-select evaluator picker props.

Gates: package tsc 0 errors (entities/evaluations/evaluations-ui/annotation/annotation-ui/ui/playground-ui); lint 0; tests evaluations 133, entities 669, annotation 90. OSS tsc 355 (≤ pre-merge baseline ~363), no new signatures from touched files.

Drop 4 ungated/untagged console.log leftovers and 2 dead commented-out console blocks. Behaviour-only logging cleanup, no logic change.

Removed:
- metrics.ts triggerMetricsRefresh success log (kept the failure warn)
- useAnnotationState baseline-change + remaining-edits debug logs
- export/referenceResolvers stray console.log("slot")
- runMetrics dead "entry.needsTemporal" comment
- metricProcessor dead "flush called" comment block

Deliberately kept (guarded dev diagnostics / facilities, not noise):
- metricProcessorDebug isDev-gated logger; process.env.NODE_ENV-guarded [HUMAN_EVAL_REFRESH_LOG] / [EvalRunDetails2] diagnostics; buildRunIndex shouldLogDetails-gated debug; traces.ts debug facility; logExportAction helper; NEXT_PUBLIC_EVAL_RUN_DEBUG-parked blocks; catch-block error logs.

Gates: evaluations + evaluations-ui types=0, lint clean, 133 tests pass.

… them EvaluationRunsTableStoreProvider mirrors injected eval-view seam atoms from the parent store into its scoped store via store.set(atom, parentValue). Several of those atoms hold a FUNCTION value (the query/metric/member families, factories, the online-evaluations api). jotai's primitive set() treats a function argument as a state updater and CALLS it, so the mirror ran e.g. queriesQueryFamily(null) — crashing in the family's {payload} destructure on the apps overview page — and silently corrupted every other function-valued seam (storing factory(prev) instead of the factory). Wrap function values in a constant updater when mirroring (both the initial seed and the live sync), matching how the host registers them via set(atom, () => v).
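The updater pitfall can be shown without jotai at all. The sketch below uses a tiny stand-in store that only mimics the rule that a function passed to set() is called with the previous value; `TinyStore`, `mirrorNaive`, `mirrorSafe`, and the shape of `queriesQueryFamily` are assumptions made for illustration, not the provider's actual code.

```typescript
// Minimal stand-in for a store holding one primitive value. This is NOT jotai;
// it only mimics the "function argument is an updater" rule of primitive atoms.
type Updater<T> = (prev: T) => T

class TinyStore<T> {
    private value: T

    constructor(initial: T) {
        this.value = initial
    }

    get(): T {
        return this.value
    }

    set(next: T | Updater<T>): void {
        // A function argument is CALLED with the previous value, never stored as-is.
        this.value =
            typeof next === "function" ? (next as Updater<T>)(this.value) : (next as T)
    }
}

// Hypothetical function-valued seam that destructures its argument,
// like the family described in the commit.
type QueryFamily = (params: {payload: string}) => string
const queriesQueryFamily: QueryFamily = ({payload}) => `query:${payload}`

// Naive mirror: the store calls value(prev); with prev === null the destructure throws.
function mirrorNaive(store: TinyStore<QueryFamily | null>, value: QueryFamily): void {
    store.set(value)
}

// Safe mirror: the constant updater is what gets called, and its return value
// (the original function) is what gets stored.
function mirrorSafe(store: TinyStore<QueryFamily | null>, value: QueryFamily): void {
    store.set(() => value)
}
```

With `mirrorSafe`, reading the store afterwards returns the same function reference that was mirrored, so consumers of the scoped store see the seam unchanged.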

vercel · 2026-06-19T13:40:17Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 19, 2026 1:40pm

coderabbitai · 2026-06-19T13:40:21Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a57d4c9b-1422-4f36-aae6-0c84adc2bb0a


ardaerzin added 30 commits June 8, 2026 00:15

ardaerzin added 27 commits June 13, 2026 15:19

Merge remote-tracking branch 'origin/main' into fe-chore/move-evals-t…

67e7b51

…o-packages

ardaerzin requested a review from ashrafchowdury June 19, 2026 13:40

ardaerzin wants to merge 107 commits into main from fe-chore/move-evals-to-packages
