-
Notifications
You must be signed in to change notification settings - Fork 0
Optimization flow UI, schema relaxation, and in-memory KV #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add EPICS_LIMIT env var to cap loaded epics for fast test runs - Lower acceptance criteria minimum from 2 to 1 - Make ADO fields optional throughout schema, types, and prompts - Add SCHEMA_GUARDRAILS to generator for compact, valid output - Update champion prompts to reflect relaxed constraints Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove persistent Deno KV dependency; use in-memory Maps for task, session, and checkpoint storage. All async interfaces preserved for future swap to any persistent backend. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add flow diagram components (playground and optimization flows) - Add OptimizationKickoff panel with real-time step/iteration tracking - Visibility-aware polling (pause when tab is hidden) - Null-safety for optional ADO fields in story cards - Add motion, shadcn, noto-sans UI dependencies Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite e2e test from mocked to live-server validation - Add env-based playwright config with .env file parsing - Make poll judges configurable via env vars (model, count, temps) - Add null-safety checks in scorer for missing analysis results Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds EPICS_LIMIT config and .env loading in Playwright, relaxes story schema/ADO requirements, adds prompt schema guardrails, replaces Deno KV with an in-memory Map store, introduces dynamic judge generation and scorer label guards, and implements UI flow visualizations plus an optimization e2e test. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as UI Client
participant Server as API Server
participant Store as In-Memory Store
participant Generator as LM/Generator
Client->>Server: POST /optimize (iterations, replicates, candidates)
Server->>Store: createTask(payload)
Store-->>Server: taskId (PENDING)
Server-->>Client: 202 Accepted (taskId)
loop Poll interval
Client->>Server: GET /task/{taskId}
Server->>Store: getTask(taskId)
Store-->>Server: task state + progress
Server-->>Client: task status + progress
alt status = RUNNING
Server->>Generator: request evaluation / candidate generation
Generator-->>Server: results / updates
Server->>Store: updateTaskProgress / updateTaskStatus
else status = COMPLETED or FAILED
Server->>Store: finalize task
Client->>Client: stop polling
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @venikman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refines the system's flexibility and user experience. It relaxes schema constraints for user stories, making them more adaptable, and replaces the persistent Deno KV store with a lightweight in-memory solution for reduced runtime dependencies. The user interface for the optimization flow has been greatly enhanced with visual progress tracking and real-time updates, improving observability. Additionally, the changes include robust E2E testing and configurable polling mechanisms, contributing to a more stable and testable application. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant set of improvements across the application, focusing on UI, state management, and schema flexibility. The replacement of Deno KV with an in-memory key-value store is a major architectural change that is well-implemented to remove runtime dependencies, with the async API contract thoughtfully preserved. The UI for the optimization flow has been substantially enhanced with new diagrams and a real-time progress panel, and the underlying polling logic has been made more efficient and robust by using the Page Visibility API and preventing concurrent requests. Schema relaxation for user stories and the addition of a new end-to-end test for the optimization flow are also welcome changes that improve flexibility and test coverage. The overall code quality is high, with notable improvements in null-safety and configurability throughout the codebase. I have a couple of suggestions to further improve maintainability.
playwright.config.ts
Outdated
| const readEnvFile = () => { | ||
| try { | ||
| const raw = Deno.readTextFileSync(".env"); | ||
| const entries: Record<string, string> = {}; | ||
| for (const line of raw.split(/\r?\n/)) { | ||
| const trimmed = line.trim(); | ||
| if (!trimmed || trimmed.startsWith("#")) continue; | ||
| const eq = trimmed.indexOf("="); | ||
| if (eq === -1) continue; | ||
| const key = trimmed.slice(0, eq).trim(); | ||
| let value = trimmed.slice(eq + 1).trim(); | ||
| if ( | ||
| (value.startsWith('"') && value.endsWith('"')) || | ||
| (value.startsWith("'") && value.endsWith("'")) | ||
| ) { | ||
| value = value.slice(1, -1); | ||
| } | ||
| entries[key] = value; | ||
| } | ||
| return entries; | ||
| } catch { | ||
| return {}; | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This custom readEnvFile function re-implements logic that is already available in the Deno standard library's dotenv module, which is used elsewhere in the project. To reduce code duplication and improve maintainability, you can replace this implementation with a call to the parse function from @std/dotenv. You will need to add import { parse } from "@std/dotenv"; to the top of the file.
This also improves error handling by only catching NotFound errors and re-throwing others, which is safer than the current implementation that swallows all errors.
const readEnvFile = () => {
try {
const raw = Deno.readTextFileSync(".env");
return parse(raw);
} catch (e) {
if (e instanceof Deno.errors.NotFound) {
return {};
}
throw e;
}
};| /** | ||
| * Deno KV State Store | ||
| * In-memory State Store | ||
| * | ||
| * Persistent storage for tasks and session checkpoints. | ||
| * Replaces in-memory Maps for production reliability. | ||
| * | ||
| * Key schema: | ||
| * - ["tasks", taskId] → TaskRecord | ||
| * - ["tasks:by-status", status, taskId] → taskId (index) | ||
| * - ["optimization-tasks", taskId] → OptimizationTask | ||
| * - ["checkpoints", checkpointId] → SessionCheckpoint | ||
| * - ["checkpoints:by-session", sessionId, checkpointId] → checkpointId (index) | ||
| * Simple, non-persistent storage for tasks and session checkpoints. | ||
| * This replaces the Deno KV-backed store to avoid runtime dependencies. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this module provides an in-memory store instead of a Deno KV-backed one, the filename kv-store.ts and the main export kvStore have become misnomers. For better clarity and long-term maintainability, consider renaming the file to state-store.ts and the kvStore export to stateStore to more accurately reflect its purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR updates the prompt-optimization product across schema validation, server/runtime infrastructure, UI observability, and E2E coverage—aiming to make runs faster to test locally while improving the optimization UX and removing the Deno KV dependency.
Changes:
- Relaxed story schema requirements (acceptance criteria min 2→1; ADO metadata optional) and updated prompt guardrails accordingly.
- Replaced the orchestrator’s Deno KV-backed state store with an in-memory Map-based implementation (async API preserved).
- Expanded the UI optimization flow experience (step diagrams + richer progress panel + visibility-aware polling) and added a live-server Playwright E2E for optimization.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/schema.test.ts | Updates schema tests for the new acceptance-criteria minimum. |
| tests/e2e/optimization-flow.spec.ts | Adds a UI-driven optimization E2E test against the live server. |
| src/ui/src/types.ts | Makes ado and key ADO fields optional to match relaxed schema. |
| src/ui/src/App.tsx | Adds flow diagrams, richer optimization progress UI, and visibility-aware polling / null-safety. |
| src/ui/package.json | Adds UI dependencies needed by styling/animation updates. |
| src/server/handler.ts | Adds EPICS_LIMIT slicing for faster local/E2E runs. |
| src/scorer.ts | Makes scorer resilient to missing analysis step output (null-safe scoring + labeling). |
| src/schema.ts | Relaxes user story schema (min acceptance criteria 1; ADO optional; ADO fields partial). |
| src/orchestrator/state/kv-store.ts | Replaces Deno KV persistence with in-memory Maps for task/checkpoint storage. |
| src/generator.ts | Adds explicit schema guardrails to generation instructions and surfaces them in traces/results. |
| src/fpf/poll.ts | Generates default PoLL judges from env-configured count/temps instead of hardcoding. |
| src/config.ts | Adds EPICS_LIMIT env var validation. |
| prompts/champion.md | Updates prompt rules to match schema relaxation + compactness guidance. |
| prompts/champion.base.md | Same prompt rule updates as champion.md. |
| playwright.config.ts | Loads env (including from .env) and configures webServer env for faster E2E runs. |
| .env.example | Documents EPICS_LIMIT for fast test runs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/ui/src/App.tsx
Outdated
| const cycleStepStyle = (index: number, total: number, radius: number) => { | ||
| const angle = (Math.PI * 2 * index) / total - Math.PI / 2; | ||
| const x = Math.cos(angle) * radius; | ||
| const y = Math.sin(angle) * radius; | ||
| return { | ||
| transform: `translate(-50%, -50%) translate(${x}px, ${y}px)`, | ||
| }; | ||
| }; | ||
|
|
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cycleStepStyle is defined but never used, which leaves dead code in App.tsx. Either remove it or wire it into the cycle layout rendering so it has a purpose.
| const cycleStepStyle = (index: number, total: number, radius: number) => { | |
| const angle = (Math.PI * 2 * index) / total - Math.PI / 2; | |
| const x = Math.cos(angle) * radius; | |
| const y = Math.sin(angle) * radius; | |
| return { | |
| transform: `translate(-50%, -50%) translate(${x}px, ${y}px)`, | |
| }; | |
| }; |
src/ui/src/App.tsx
Outdated
| const FlowExplanationPanel = ( | ||
| { explanation }: { explanation: FlowExplanation }, | ||
| ) => ( | ||
| <Artifact className="rounded-none border border-border bg-card shadow-sm"> | ||
| <ArtifactHeader className="items-start gap-3 border-b border-border bg-muted/40"> | ||
| <div className="space-y-1"> | ||
| <ArtifactTitle className="text-base font-semibold text-foreground"> | ||
| {index + 1}. {story.title} | ||
| </ArtifactTitle> | ||
| <ArtifactDescription className="text-sm leading-relaxed text-muted-foreground"> | ||
| As a {story.asA}, I want {story.iWant} so that {story.soThat}. | ||
| </ArtifactDescription> | ||
| <div className="rounded-none border border-border bg-muted/30 p-4 text-sm"> | ||
| <p className="text-xs uppercase tracking-[0.3em] text-muted-foreground"> |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FlowExplanationPanel is declared here but never referenced. If flow explanations are intended, render this from FlowDiagramCard when diagram.explanation is present; otherwise remove the unused component/types to keep App.tsx smaller.
| Deno.test("userStorySchema - rejects story with empty acceptance criteria", () => { | ||
| const story = { | ||
| title: "Add login functionality", | ||
| asA: "user", | ||
| iWant: "to login", |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema behavior was relaxed (acceptanceCriteria min 2→1, ado optional, ADO fields optional), but the tests only cover the new rejection case (empty acceptanceCriteria). Add positive coverage for a story with exactly 1 acceptance criterion and for omitting ado / providing partial ado.fields so the relaxed contract can’t regress silently.
playwright.config.ts
Outdated
| const readEnvFile = () => { | ||
| try { | ||
| const raw = Deno.readTextFileSync(".env"); | ||
| const entries: Record<string, string> = {}; |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config reads .env and environment variables via Deno.* APIs. If someone runs the E2E tests via npx playwright test (Node runtime, as suggested in the PR description), this will throw because Deno is undefined—either update the documented command to the Deno-based runner (deno task test:e2e) or refactor env loading to be Node-compatible.
| /** | ||
| * Save a task to the store. | ||
| */ | ||
| export async function saveTask(task: TaskRecord): Promise<void> { |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saveTask() always adds the task id to the status index, but it never removes a prior index entry if the same task id is saved again with a different status. That can leave stale ids in tasksByStatus (and make listTasksByStatus() return tasks whose status no longer matches the queried status). Consider de-indexing any existing task record for the same id before re-indexing, or make saveTask private and force all status changes through updateTaskStatus.
| export async function saveTask(task: TaskRecord): Promise<void> { | |
| export async function saveTask(task: TaskRecord): Promise<void> { | |
| const existing = tasks.get(task.id); | |
| if (existing) { | |
| deindexTask(existing); | |
| } |
Without --include=src/ui/dist, deployctl excludes the built UI and the app returns 503 at runtime. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f94fdfc145
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const llmBaseUrl = getEnv("LLM_BASE_URL", "https://openrouter.ai/api/v1"); | ||
| const llmApiKey = getEnv("LLM_API_KEY"); | ||
| const llmModel = getEnv("LLM_MODEL", "openai/gpt-4o-mini"); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid defaulting e2e LLM to OpenRouter without a key
The Playwright webServer now defaults LLM_BASE_URL to OpenRouter while leaving LLM_API_KEY empty when no .env/env vars are set. In a fresh checkout this makes the server pick an external base URL with no credentials, so any LLM call in the optimization flow will fail with 401s and the new e2e test will hang/fail. Consider falling back to the previous local LM Studio defaults unless LLM_* is explicitly provided, or require a non-empty key before using OpenRouter.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@playwright.config.ts`:
- Around line 29-34: The code uses getEnv to populate llmApiKey which currently
defaults to an empty string when unset, causing silent failures; update the
startup logic after llmApiKey is read to detect an empty or missing key and
either throw an error or log a clear warning and exit when llmBaseUrl is not
localhost (e.g., check that llmBaseUrl does not start with "http://localhost" or
"http://127.0.0.1"); reference the symbols getEnv, llmApiKey, and llmBaseUrl and
implement the guard immediately after those constants are set so remote
providers fail fast or emit a visible warning instead of passing an empty
LLM_API_KEY to the server.
In `@src/orchestrator/state/kv-store.ts`:
- Around line 88-90: getTask currently returns the live TaskRecord stored in the
tasks Map so callers can mutate it and silently corrupt the tasksByStatus index;
change getTask to return a shallow copy of the stored record (e.g., clone the
object before returning) so callers cannot mutate the internal map entry, and
apply the same defensive-return pattern to analogous getters
(getOptimizationTask, getCheckpoint, getLatestCheckpoint) if they also return
objects that callers might mutate; ensure listTasksByStatus and tasksByStatus
keep using the internal objects while external callers always get copies.
- Around line 83-90: CI fails because saveTask and getTask are declared async
but contain no await; add a Deno lint suppression to satisfy the require-await
rule while keeping the API async: either add a file-level directive (e.g., a
single "// deno-lint-ignore-file require-await" near the top of the file) or add
per-function ignores immediately above the async functions (e.g., "//
deno-lint-ignore require-await" before the saveTask and getTask declarations) so
saveTask and getTask remain async but the linter stops erroring.
- Around line 1-6: Add unit tests for the in-memory KV store
(src/orchestrator/state/kv-store.ts) exercising task lifecycle and index
bookkeeping: create tasks, transition their status via whatever API exists
(e.g., createTask/updateTaskStatus or TaskStore methods), and assert
tasksByStatus indices and listTasksByStatus return correct sets after each
transition; add tests for checkpoints (checkpointsBySession) including creating
multiple checkpoints with different createdAt values and asserting retrieval is
sorted by createdAt and cleanupOldCheckpoints removes only the expected entries;
add tests for cleanupOldTasks to remove tasks older than threshold while leaving
current ones and handle edge cases such as empty stores and missing
cross-references (e.g., tasks referenced in indices but not in main store);
include assertions that indices are kept in sync after deletions and status
changes and cover boundary conditions (no-op cleanups).
In `@src/ui/src/App.tsx`:
- Around line 386-393: Remove the unused helper function cycleStepStyle from
App.tsx: locate the function declaration "const cycleStepStyle = (index: number,
total: number, radius: number) => { ... }" and delete it along with its return
block since FlowDiagramCard uses CSS grid for cycle layout and this trig-based
positioning is dead code; ensure no other references to cycleStepStyle remain
and run a quick compile to confirm no usages are left.
🧹 Nitpick comments (15)
src/scorer.ts (3)
187-190: Consider using optional chaining instead of an intermediate boolean for null-safety.TypeScript cannot narrow
athrough theanalyzeAvailableboolean variable, soa.investinanalyzeAvailable ? a.invest : 0won't type-narrowato non-nullish. While this works at runtime, optional chaining is both TS-friendly and more concise:♻️ Suggested simplification
- const analyzeAvailable = !!a; - const investScore = analyzeAvailable ? a.invest : 0; - const criteriaScore = analyzeAvailable ? a.acceptanceCriteria : 0; - const duplicationScore = analyzeAvailable ? a.duplication : 0; + const investScore = a?.invest ?? 0; + const criteriaScore = a?.acceptanceCriteria ?? 0; + const duplicationScore = a?.duplication ?? 0;You'd still need
analyzeAvailableingenerateReasonfor the labels, but there you could similarly usea?.invest?.toFixed(3) ?? "n/a"etc.
211-216: Silent score penalty when analysis is unavailable.When
analyzeStepResultis absent, 65% of the heuristic weight (invest + criteria + duplication) contributes zero, capping the heuristic score at ~0.35. This is reasonable as a degraded fallback, but be aware that intermittent analyze-step failures during optimization iterations could significantly skew scores downward without an obvious signal beyond the reason string.If this is intentional, no action needed. If not, consider either re-normalizing the weights across available dimensions or returning a sentinel/skip to avoid polluting optimization with artificially low scores.
241-248: Same TS narrowing caveat applies here.Same pattern as
generateScore— the intermediate boolean doesn't narrowafor TypeScript. Optional chaining would be cleaner:- const investLabel = analyzeAvailable ? a.invest.toFixed(3) : "n/a"; - const criteriaLabel = analyzeAvailable - ? a.acceptanceCriteria.toFixed(3) - : "n/a"; - const duplicationLabel = analyzeAvailable - ? a.duplication.toFixed(3) - : "n/a"; + const investLabel = a?.invest?.toFixed(3) ?? "n/a"; + const criteriaLabel = a?.acceptanceCriteria?.toFixed(3) ?? "n/a"; + const duplicationLabel = a?.duplication?.toFixed(3) ?? "n/a";src/fpf/poll.ts (1)
1-5: Stale doc comment: "3 diverse judges" is no longer accurate.Line 5 still says "3 diverse judges" but judge count is now runtime-configurable via
POLL_NUM_JUDGES. Consider updating to reflect the dynamic nature.src/orchestrator/state/kv-store.ts (2)
44-48: No reset mechanism for testing or between runs.These module-level Maps accumulate state for the lifetime of the process. Consider exporting a
resetStore()helper (even if only used in tests) to clear all five Maps, so tests can run in isolation without leaking state across suites.♻️ Proposed helper
+/** Clear all in-memory state – intended for tests. */ +export function resetStore(): void { + tasks.clear(); + tasksByStatus.clear(); + optimizationTasks.clear(); + checkpoints.clear(); + checkpointsBySession.clear(); +}
206-244: Minor DRY opportunity:getLatestCheckpointandlistCheckpointsshare identical filter-and-sort logic.Consider extracting a shared helper, or have
getLatestCheckpointdelegate tolistCheckpoints:♻️ Example
export async function getLatestCheckpoint( sessionId: string, ): Promise<SessionCheckpoint | null> { - const ids = checkpointsBySession.get(sessionId); - if (!ids || ids.size === 0) return null; - - const list = Array.from(ids) - .map((id) => checkpoints.get(id)) - .filter((checkpoint): checkpoint is SessionCheckpoint => - Boolean(checkpoint) - ); - - if (list.length === 0) return null; - - list.sort((a, b) => - new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime() - ); - - return list[0] ?? null; + const sorted = await listCheckpoints(sessionId); + return sorted[0] ?? null; }tests/e2e/optimization-flow.spec.ts (1)
1-24: Live e2e test looks solid for the happy path.The test correctly exercises the optimization flow lifecycle from the UI perspective, and the Playwright locators match the App.tsx label/text structure.
Two minor observations:
- No failure-path assertion: If the optimization fails server-side, the test silently waits the full 6 minutes before timing out. Consider adding a race with
page.getByText("failed")to surface server errors sooner:💡 Suggested improvement
- await expect( - page.getByText("completed", { exact: true }).first(), - ).toBeVisible({ timeout: 360_000 }); + const completed = page.getByText("completed", { exact: true }).first(); + const failed = page.getByText("failed", { exact: true }).first(); + const result = await Promise.race([ + completed.waitFor({ state: "visible", timeout: 360_000 }).then(() => "completed"), + failed.waitFor({ state: "visible", timeout: 360_000 }).then(() => "failed"), + ]); + expect(result).toBe("completed");
- 360s timeout appears both as test-level and assertion-level: The
test.setTimeout(360_000)at line 4 sets the overall test timeout, but the assertion at line 23 also hastimeout: 360_000. If the 30s wait at line 16 succeeds, only ~330s remain for the test, while the assertion still waits up to 360s — the test-level timeout will fire first, producing a less helpful error. Consider setting the test timeout slightly higher than the sum of all assertion timeouts.src/ui/src/App.tsx (3)
1079-1123: Visibility-aware telemetry polling is well implemented, but lacks aninFlightguard.The optimization poller (lines 1203-1234) properly guards against overlapping polls with
inFlight, but this telemetry poller does not. If a/telemetryfetch takes longer thanTELEMETRY_POLL_MS(30s), polls will overlap. This is low-risk since telemetry is read-only and the last response wins viasetTelemetry, but for consistency you could add the sameinFlightguard.
62-306: Large amount of types, constants, and mapping logic defined inline in App.tsx.This file is approaching ~1700 lines. The flow diagram types, step constants (
PLAYGROUND_STEPS,OPTIMIZATION_STEPS,FLOW_DIAGRAMS), stage-resolution logic, and activity descriptions are all domain data that could live in a separate module (e.g.,flow-config.ts). This would improve readability and testability of the mapping functions likeresolveOptimizationStage.
1574-1579:event.currentTarget?.value— optional chaining is unnecessary here.In a React
onChangehandler for an<input>,event.currentTargetis always defined. The?.is harmless but misleading — it suggestscurrentTargetcould be null, which isn't the case in this context.tests/schema.test.ts (1)
132-153: Add a boundary test for exactly 1 acceptance criterion.The test correctly validates that an empty array is rejected, but there's no positive test for the new lower bound — a story with exactly 1 acceptance criterion should now pass. This boundary case was previously implicitly covered (it was invalid at min 2), so it deserves explicit validation.
🧪 Suggested boundary test
+Deno.test("userStorySchema - accepts story with exactly 1 acceptance criterion", () => { + const story = { + title: "Single AC story", + asA: "user", + iWant: "to do something", + soThat: "I get value", + acceptanceCriteria: ["Given X, When Y, Then Z"], + }; + + const result = userStorySchema.safeParse(story); + assert(result.success, "Story with 1 acceptance criterion should be valid"); +});prompts/champion.base.md (1)
1-26: Content is identical toprompts/champion.md.If one is meant to be the immutable base and the other a working copy that evolves during optimization, consider documenting that convention. Otherwise, having two identical files risks silent drift over time.
src/schema.ts (2)
52-57: Schema relaxation looks correct; consider adding test coverage for the new optionality.
acceptanceCriteria: min(1)andado: .optional()align with the prompt and config changes. However, the test file still provides ADO fields in every test case. Adding tests foradoomitted entirely and for a story with exactly 1 acceptance criterion (as mentioned in the test file review) would solidify coverage of these new boundaries.🧪 Suggested test for optional ado
+Deno.test("userStorySchema - accepts story without ado field", () => { + const story = { + title: "No ADO story", + asA: "user", + iWant: "to do something", + soThat: "I get value", + acceptanceCriteria: ["Given X, When Y, Then Z"], + }; + + const result = userStorySchema.safeParse(story); + assert(result.success, "Story without ado should be valid"); + assertEquals(result.data?.ado, undefined); +});
26-39: Test coverage for optionaladoscenarios would clarify intended behavior.The
.partial()design is intentional: theadoblock is optional (line 57), and when included, fields within can be partial. Downstream code insrc/ui/src/App.tsxsafely handles this via optional chaining (story.ado?.fields?.["key"]), so emptyfields: {}poses no safety risk. However, adding tests for edge cases —adoomitted entirely,adowith empty fields, andadowith partial fields — would explicitly document this flexibility and prevent future regressions.playwright.config.ts (1)
3-26: Replace custom.envparser with@std/dotenv'sloadSyncto eliminate duplication and improve robustness.The custom parser duplicates functionality already available in
@std/dotenv, which is imported bysrc/config.ts.@std/dotenv'sloadSynchandles edge cases the current implementation misses: escape sequences in quoted values, variable expansion ($KEY,${KEY:-default}), andexportprefix support. Sinceplaywright.config.tsoperates independently and needs its own.envloader, migrate to the standard library function.♻️ Suggested approach
-const readEnvFile = () => { - try { - const raw = Deno.readTextFileSync(".env"); - const entries: Record<string, string> = {}; - for (const line of raw.split(/\r?\n/)) { - const trimmed = line.trim(); - if (!trimmed || trimmed.startsWith("#")) continue; - const eq = trimmed.indexOf("="); - if (eq === -1) continue; - const key = trimmed.slice(0, eq).trim(); - let value = trimmed.slice(eq + 1).trim(); - if ( - (value.startsWith('"') && value.endsWith('"')) || - (value.startsWith("'") && value.endsWith("'")) - ) { - value = value.slice(1, -1); - } - entries[key] = value; - } - return entries; - } catch { - return {}; - } -}; +import { loadSync } from "@std/dotenv"; + +const readEnvFile = (): Record<string, string> => { + try { + return loadSync({ export: false }); + } catch { + return {}; + } +};
| /** | ||
| * Deno KV State Store | ||
| * In-memory State Store | ||
| * | ||
| * Persistent storage for tasks and session checkpoints. | ||
| * Replaces in-memory Maps for production reliability. | ||
| * | ||
| * Key schema: | ||
| * - ["tasks", taskId] → TaskRecord | ||
| * - ["tasks:by-status", status, taskId] → taskId (index) | ||
| * - ["optimization-tasks", taskId] → OptimizationTask | ||
| * - ["checkpoints", checkpointId] → SessionCheckpoint | ||
| * - ["checkpoints:by-session", sessionId, checkpointId] → checkpointId (index) | ||
| * Simple, non-persistent storage for tasks and session checkpoints. | ||
| * This replaces the Deno KV-backed store to avoid runtime dependencies. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for test files related to kv-store
echo "=== Searching for test files mentioning kv-store ==="
fd -type f \( -name '*test*' -o -name '*spec*' \) | xargs grep -l 'kv-store' 2>/dev/null || echo "No test files found mentioning kv-store"
echo ""
echo "=== Searching all files mentioning kv-store ==="
rg 'kv-store' --type ts --type tsx --type js --type jsx -l
echo ""
echo "=== Checking test directory structure ==="
find . -type d -name '__tests__' -o -name 'tests' -o -name 'test' 2>/dev/null | head -20
echo ""
echo "=== Listing all test files in repo ==="
fd -type f \( -name '*.test.ts' -o -name '*.test.tsx' -o -name '*.spec.ts' -o -name '*.spec.tsx' \) | head -30Repository: venikman/PromptAgent
Length of output: 683
🏁 Script executed:
#!/bin/bash
# Search for kv-store.ts file
echo "=== Locating kv-store.ts ==="
fd 'kv-store' --extension ts
echo ""
echo "=== Test files in repo ==="
fd '\.test\.' --extension ts
fd '\.spec\.' --extension ts
echo ""
echo "=== Full content of kv-store.ts ==="
find . -name 'kv-store.ts' -exec cat {} \;
echo ""
echo "=== List tests directory ==="
ls -la ./tests/ 2>/dev/null || echo "Tests directory not accessible"Repository: venikman/PromptAgent
Length of output: 9668
Add unit tests for the in-memory KV store implementation.
This file contains non-trivial logic with index bookkeeping (tasksByStatus, checkpointsBySession), status transition handling, and cleanup operations. Tests should cover:
- Task lifecycle and status transitions (ensuring indices stay in sync)
listTasksByStatuscorrectness after status changes- Cleanup operations (
cleanupOldTasks,cleanupOldCheckpoints) with edge cases (empty sets, missing references) - Checkpoint retrieval and sorting by
createdAt
Per coding guidelines, unit tests are required for logic changes in TypeScript files.
🤖 Prompt for AI Agents
In `@src/orchestrator/state/kv-store.ts` around lines 1 - 6, Add unit tests for
the in-memory KV store (src/orchestrator/state/kv-store.ts) exercising task
lifecycle and index bookkeeping: create tasks, transition their status via
whatever API exists (e.g., createTask/updateTaskStatus or TaskStore methods),
and assert tasksByStatus indices and listTasksByStatus return correct sets after
each transition; add tests for checkpoints (checkpointsBySession) including
creating multiple checkpoints with different createdAt values and asserting
retrieval is sorted by createdAt and cleanupOldCheckpoints removes only the
expected entries; add tests for cleanupOldTasks to remove tasks older than
threshold while leaving current ones and handle edge cases such as empty stores
and missing cross-references (e.g., tasks referenced in indices but not in main
store); include assertions that indices are kept in sync after deletions and
status changes and cover boundary conditions (no-op cleanups).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/deploy-deno.yml (2)
9-14: 🛠️ Refactor suggestion | 🟠 MajorProduction deployment lacks an
environmentapproval gate.This workflow deploys to production (
--prod) on every push tomain, but the job has noenvironment:key, so GitHub Environment protection rules (required reviewers, wait timers) cannot be enforced. Adding one is straightforward:Proposed fix
deploy: runs-on: ubuntu-latest + environment: production permissions: contents: read id-token: writeAs per coding guidelines, "Require environment approvals for production deployments".
31-31:⚠️ Potential issue | 🟡 MinorPin
@deno/deployctlto a specific version for reproducible deployments.
jsr:@deno/deployctl`` without a version pin will resolve to the latest version at deploy time, potentially introducing breaking changes silently.Example
- - name: Install deployctl - run: deno install -A --global jsr:`@deno/deployctl` + - name: Install deployctl + run: deno install -A --global jsr:`@deno/deployctl`@1.13.1
- Fix CI: add require-await lint suppression in kv-store.ts - Fix saveTask stale index: de-index existing task before re-indexing - Return shallow copies from getTask and other getters to prevent callers from corrupting internal state indices - Remove unused cycleStepStyle and FlowExplanationPanel from App.tsx - Use optional chaining in scorer.ts instead of intermediate boolean - Replace custom .env parser with @std/dotenv parse in playwright config - Add LLM API key guard warning for remote providers without key - Add boundary tests for schema relaxation (1 AC, no ado, partial ado) - Update stale doc comment in poll.ts (N judges, not 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/orchestrator/state/kv-store.ts`:
- Around line 148-156: listTasksByStatus currently returns live TaskRecord
objects from the internal tasks Map, letting callers mutate fields (e.g.,
.status) and corrupt tasksByStatus; fix listTasksByStatus to return defensive
copies (shallow clones) of each task like getTask does instead of the original
references—when mapping ids to records use a clone (e.g., spread {...task}) for
each non-null entry so callers cannot mutate internal state; keep filters/type
guard the same and reference the tasks and tasksByStatus collections and the
listTasksByStatus function when making the change.
- Around line 237-254: The listCheckpoints function is returning live references
(unlike getCheckpoint/getLatestCheckpoint); update listCheckpoints to return
defensive copies of each SessionCheckpoint instead of the original objects
retrieved from the checkpoints map — locate the code that builds list using
checkpointsBySession and checkpoints, and when mapping/filtering the values (the
Array.from(ids).map(...).filter(...)), replace the object references with
shallow clones (e.g., create a new object per checkpoint) so callers receive
copies rather than the original entries from the checkpoints map.
In `@src/ui/src/App.tsx`:
- Around line 1182-1193: The current polling loop marks done = true and calls
stop() in the catch of the optimization fetch, which permanently halts polling
on any error; change this to implement a retry budget: add a retry counter
(e.g., retryCount or retryBudget) scoped alongside inFlight/cancelled,
decrement/increment on each catch, setOptimizationError with the error but only
set done = true and call stop() when the retry budget is exhausted or the error
is a terminal error type; reset retry budget on a successful fetch and ensure
inFlight is cleared in finally as before so transient network hiccups won't
permanently kill polling in functions/variables like setOptimizationError, done,
stop, inFlight, and cancelled.
🧹 Nitpick comments (7)
src/scorer.ts (1)
187-189: Score silently degrades when analysis is unavailable — consider documenting or adjusting weights.When
analyzeStepResultisundefined,investScore,criteriaScore, andduplicationScoreall fall back to0, which zeroes out 65% ofHEURISTIC_WEIGHTS(0.25 + 0.25 + 0.15). The final heuristic score is then driven only bycoverageandcount(35%). This is a valid graceful degradation, but it could surprise callers who don't inspect the reason string.A brief inline comment near Line 187 noting this trade-off (e.g., "analysis unavailable → 65% of heuristic weight zeroed") would help future maintainers understand the intent.
Also applies to: 211-215
playwright.config.ts (1)
49-59: Spreading the full process environment intowebServer.env.
Deno.env.toObject()forwards every variable from the parent process. This is fine for local dev/CI, but be mindful that it could leak sensitive variables (e.g., unrelated API keys, tokens) into the spawned test server process. Since this is a Playwright test config and not a production deployment path, the risk is low.src/ui/src/App.tsx (4)
1036-1080: Telemetry polling lacks an in-flight guard, unlike optimization polling.The optimization polling effect (Line 1160) has
inFlightprotection to prevent overlapping requests. This telemetry poller doesn't, so if a/telemetryfetch takes longer thanTELEMETRY_POLL_MS, concurrent requests can pile up. Low risk for a GET endpoint, but worth aligning for consistency.♻️ Add in-flight guard to telemetry polling
useEffect(() => { let cancelled = false; let intervalId: number | null = null; + let inFlight = false; const poll = async () => { + if (inFlight) return; + inFlight = true; try { const res = await fetch("/telemetry"); if (!res.ok) return; const data = await readJson<TelemetrySnapshot>(res); if (!cancelled) { setTelemetry(data); } } catch { return; + } finally { + inFlight = false; } };
1299-1305: Filtering a static array for a known single item.Both sections filter
FLOW_DIAGRAMSby a knownidand then.map(). Since you know the exact ids at compile time, you could reference the diagrams directly or use.find(). This is a very minor readability nit.Also applies to: 1684-1690
62-327: Consider extracting flow diagram types and constants to a separate module.This file now contains ~270 lines of type definitions, constant step arrays, lookup maps, and helper functions for flow diagrams before any React component code begins. Extracting these into a dedicated file (e.g.,
flow-diagrams.ts) would improve readability and make the types reusable.
613-797:OptimizationKickoffis a substantial presentational component (~185 lines).This component, along with
FlowStepCard,FlowDiagramCard, andStoryCard, could live in their own files under acomponents/directory. This would reduceApp.tsxfrom its current size and make individual components easier to test.src/orchestrator/state/kv-store.ts (1)
85-92:saveTaskstores a live reference — consider storing a defensive copy.
getTaskcorrectly returns{ ...task }, butsaveTaskstores the caller's object directly. If the caller retains and later mutates it (e.g., changes.status), the in-memory record changes without updatingtasksByStatus, silently corrupting the index.🛡️ Proposed fix
export async function saveTask(task: TaskRecord): Promise<void> { const existing = tasks.get(task.id); if (existing) { deindexTask(existing); } - tasks.set(task.id, task); - indexTask(task); + const copy = { ...task }; + tasks.set(copy.id, copy); + indexTask(copy); }Same concern applies to
saveOptimizationTaskon Line 176.
| export async function listTasksByStatus( | ||
| status: TaskStatus, | ||
| ): Promise<TaskRecord[]> { | ||
| const kv = await getKv(); | ||
| const tasks: TaskRecord[] = []; | ||
|
|
||
| const entries = kv.list<string>({ prefix: ["tasks:by-status", status] }); | ||
| for await (const entry of entries) { | ||
| const task = await getTask(entry.value); | ||
| if (task) tasks.push(task); | ||
| } | ||
|
|
||
| return tasks; | ||
| const ids = tasksByStatus.get(status); | ||
| if (!ids) return []; | ||
| return Array.from(ids) | ||
| .map((id) => tasks.get(id)) | ||
| .filter((task): task is TaskRecord => Boolean(task)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listTasksByStatus returns live references — callers can corrupt indices.
getTask was fixed to return { ...task }, but this list function hands out the actual objects from the internal Map. A caller mutating .status on any returned record silently desyncs tasksByStatus.
🛡️ Proposed fix
return Array.from(ids)
.map((id) => tasks.get(id))
- .filter((task): task is TaskRecord => Boolean(task));
+ .filter((task): task is TaskRecord => Boolean(task))
+ .map((task) => ({ ...task }));
}🤖 Prompt for AI Agents
In `@src/orchestrator/state/kv-store.ts` around lines 148 - 156, listTasksByStatus
currently returns live TaskRecord objects from the internal tasks Map, letting
callers mutate fields (e.g., .status) and corrupt tasksByStatus; fix
listTasksByStatus to return defensive copies (shallow clones) of each task like
getTask does instead of the original references—when mapping ids to records use
a clone (e.g., spread {...task}) for each non-null entry so callers cannot
mutate internal state; keep filters/type guard the same and reference the tasks
and tasksByStatus collections and the listTasksByStatus function when making the
change.
| export async function listCheckpoints( | ||
| sessionId: string, | ||
| ): Promise<SessionCheckpoint[]> { | ||
| const kv = await getKv(); | ||
| const checkpoints: SessionCheckpoint[] = []; | ||
| const ids = checkpointsBySession.get(sessionId); | ||
| if (!ids || ids.size === 0) return []; | ||
|
|
||
| const entries = kv.list<string>({ | ||
| prefix: ["checkpoints:by-session", sessionId], | ||
| }); | ||
|
|
||
| for await (const entry of entries) { | ||
| const checkpoint = await getCheckpoint(entry.value); | ||
| if (checkpoint) checkpoints.push(checkpoint); | ||
| } | ||
| const list = Array.from(ids) | ||
| .map((id) => checkpoints.get(id)) | ||
| .filter((checkpoint): checkpoint is SessionCheckpoint => | ||
| Boolean(checkpoint) | ||
| ); | ||
|
|
||
| // Sort by creation time (newest first) | ||
| checkpoints.sort((a, b) => | ||
| list.sort((a, b) => | ||
| new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime() | ||
| ); | ||
|
|
||
| return checkpoints; | ||
| return list; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listCheckpoints also returns live references — inconsistent with getCheckpoint/getLatestCheckpoint.
Same pattern gap as listTasksByStatus. Apply the defensive copy for consistency.
🛡️ Proposed fix
list.sort((a, b) =>
new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime()
);
- return list;
+ return list.map((cp) => ({ ...cp }));
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export async function listCheckpoints( | |
| sessionId: string, | |
| ): Promise<SessionCheckpoint[]> { | |
| const kv = await getKv(); | |
| const checkpoints: SessionCheckpoint[] = []; | |
| const ids = checkpointsBySession.get(sessionId); | |
| if (!ids || ids.size === 0) return []; | |
| const entries = kv.list<string>({ | |
| prefix: ["checkpoints:by-session", sessionId], | |
| }); | |
| for await (const entry of entries) { | |
| const checkpoint = await getCheckpoint(entry.value); | |
| if (checkpoint) checkpoints.push(checkpoint); | |
| } | |
| const list = Array.from(ids) | |
| .map((id) => checkpoints.get(id)) | |
| .filter((checkpoint): checkpoint is SessionCheckpoint => | |
| Boolean(checkpoint) | |
| ); | |
| // Sort by creation time (newest first) | |
| checkpoints.sort((a, b) => | |
| list.sort((a, b) => | |
| new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime() | |
| ); | |
| return checkpoints; | |
| return list; | |
| } | |
| export async function listCheckpoints( | |
| sessionId: string, | |
| ): Promise<SessionCheckpoint[]> { | |
| const ids = checkpointsBySession.get(sessionId); | |
| if (!ids || ids.size === 0) return []; | |
| const list = Array.from(ids) | |
| .map((id) => checkpoints.get(id)) | |
| .filter((checkpoint): checkpoint is SessionCheckpoint => | |
| Boolean(checkpoint) | |
| ); | |
| list.sort((a, b) => | |
| new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime() | |
| ); | |
| return list.map((cp) => ({ ...cp })); | |
| } |
🤖 Prompt for AI Agents
In `@src/orchestrator/state/kv-store.ts` around lines 237 - 254, The
listCheckpoints function is returning live references (unlike
getCheckpoint/getLatestCheckpoint); update listCheckpoints to return defensive
copies of each SessionCheckpoint instead of the original objects retrieved from
the checkpoints map — locate the code that builds list using
checkpointsBySession and checkpoints, and when mapping/filtering the values (the
Array.from(ids).map(...).filter(...)), replace the object references with
shallow clones (e.g., create a new object per checkpoint) so callers receive
copies rather than the original entries from the checkpoints map.
| } catch (err) { | ||
| if (!cancelled) { | ||
| setOptimizationError( | ||
| err instanceof Error ? err.message : String(err), | ||
| ); | ||
| } | ||
| if (intervalId) clearInterval(intervalId); | ||
| done = true; | ||
| stop(); | ||
| } finally { | ||
| inFlight = false; | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimization polling stops permanently on any fetch error.
Setting done = true in the catch block (Line 1188) means a single transient network hiccup (e.g., brief connectivity loss) permanently kills polling. The user would have to re-trigger the optimization to see updates again.
Consider adding a retry budget or only treating certain errors as terminal.
🛡️ Suggested approach: retry budget
let inFlight = false;
let done = false;
+ let errorCount = 0;
+ const MAX_POLL_ERRORS = 3;
const poll = async () => {
if (inFlight || done) return;
inFlight = true;
try {
const res = await fetch(`/v3/optimize/${optimizationTask.taskId}`);
const data = await readJson<OptimizationTask & { error?: string }>(res);
if (!res.ok) {
throw new Error(data.error || `HTTP ${res.status}`);
}
if (!cancelled) {
setOptimizationTask(data);
}
+ errorCount = 0;
if (data.status === "completed" || data.status === "failed") {
done = true;
stop();
}
} catch (err) {
- if (!cancelled) {
- setOptimizationError(
- err instanceof Error ? err.message : String(err),
- );
+ errorCount += 1;
+ if (errorCount >= MAX_POLL_ERRORS) {
+ if (!cancelled) {
+ setOptimizationError(
+ err instanceof Error ? err.message : String(err),
+ );
+ }
+ done = true;
+ stop();
}
- done = true;
- stop();
} finally {
inFlight = false;
}🤖 Prompt for AI Agents
In `@src/ui/src/App.tsx` around lines 1182 - 1193, The current polling loop marks
done = true and calls stop() in the catch of the optimization fetch, which
permanently halts polling on any error; change this to implement a retry budget:
add a retry counter (e.g., retryCount or retryBudget) scoped alongside
inFlight/cancelled, decrement/increment on each catch, setOptimizationError with
the error but only set done = true and call stop() when the retry budget is
exhausted or the error is a terminal error type; reset retry budget on a
successful fetch and ensure inFlight is cleared in finally as before so
transient network hiccups won't permanently kill polling in functions/variables
like setOptimizationError, done, stop, inFlight, and cancelled.
Summary
Test plan
deno task dev— verify UI loads, flow diagrams render, optimization starts/tracksnpx playwright test tests/e2e/optimization-flow.spec.ts— live e2e passes🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Breaking / API-affecting UI
Bug Fixes
Tests / CI
Chores