Skip to content

Conversation

@venikman
Copy link
Owner

@venikman venikman commented Feb 9, 2026

Summary

  • Schema relaxation: acceptance criteria min 2→1, ADO fields optional, EPICS_LIMIT for fast test runs, generation guardrails
  • In-memory KV store: replace Deno KV with Maps — removes runtime dependency, async interfaces preserved
  • Optimization flow UI: step tracking diagrams, real-time progress panel, visibility-aware polling, null-safety
  • E2e + poll: live-server e2e test (replaces mocks), configurable poll judges via env vars, scorer null-safety

Test plan

  • deno task dev — verify UI loads, flow diagrams render, optimization starts/tracks
  • npx playwright test tests/e2e/optimization-flow.spec.ts — live e2e passes
  • Verify production defaults unchanged (EVAL_REPLICATES=5, OPT_PATCH_CANDIDATES=10, OPT_ITERATIONS=10)
  • Confirm stories with 1 acceptance criterion and no ADO fields pass schema validation

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Flow visualizations for playground and optimization with live progress
    • Runtime .env loading and environment-driven LLM/config options
    • Optional EPICS_LIMIT to limit epics loaded
  • Breaking / API-affecting UI

    • ADO/story fields made optional; acceptance criteria minimum reduced to 1
  • Bug Fixes

    • Defensive handling when analysis data is unavailable in scoring
  • Tests / CI

    • End-to-end optimization flow test; deployment includes built UI artifacts
  • Chores

    • Server state switched to in-memory store; added UI font/motion packages

Stanislau Niadbailau and others added 4 commits February 8, 2026 19:50
- Add EPICS_LIMIT env var to cap loaded epics for fast test runs
- Lower acceptance criteria minimum from 2 to 1
- Make ADO fields optional throughout schema, types, and prompts
- Add SCHEMA_GUARDRAILS to generator for compact, valid output
- Update champion prompts to reflect relaxed constraints

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove persistent Deno KV dependency; use in-memory Maps for task,
session, and checkpoint storage. All async interfaces preserved for
future swap to any persistent backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add flow diagram components (playground and optimization flows)
- Add OptimizationKickoff panel with real-time step/iteration tracking
- Visibility-aware polling (pause when tab is hidden)
- Null-safety for optional ADO fields in story cards
- Add motion, shadcn, noto-sans UI dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite e2e test from mocked to live-server validation
- Add env-based playwright config with .env file parsing
- Make poll judges configurable via env vars (model, count, temps)
- Add null-safety checks in scorer for missing analysis results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 9, 2026 00:55
@coderabbitai
Copy link

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Walkthrough

Adds EPICS_LIMIT config and .env loading in Playwright, relaxes story schema/ADO requirements, adds prompt schema guardrails, replaces Deno KV with an in-memory Map store, introduces dynamic judge generation and scorer label guards, and implements UI flow visualizations plus an optimization e2e test.

Changes

Cohort / File(s) Summary
Env & Playwright
\.env.example, playwright.config.ts
Add EPICS_LIMIT example; load .env at runtime and compute LLM_* env vars and feature flags injected into webServer.env.
Config & Server
src/config.ts, src/server/handler.ts
Add EPICS_LIMIT to EnvSchema (coerced int 1–100, optional); server now slices loaded epics when limit set.
Schema & Types & Tests
src/schema.ts, src/ui/src/types.ts, tests/schema.test.ts
Make ADO fields and ado optional; reduce userStory.acceptanceCriteria min from 2 to 1; update/extend schema tests for new boundaries.
Prompts & Generator
prompts/champion.base.md, prompts/champion.md, src/generator.ts
Loosen acceptanceCriteria requirement; add compact-output guidance and Azure DevOps mapping; introduce SCHEMA_GUARDRAILS and use guardedInstructions for LM calls.
Judges & Scoring
src/fpf/poll.ts, src/scorer.ts
Replace hardcoded DEFAULT_JUDGES with env-driven buildDefaultJudges (model, base temp, spread, count); add clamped temps and defensive scoring/labels when analysis absent.
State Store
src/orchestrator/state/kv-store.ts
Migrate from Deno KV to in-memory Map-backed store with indices (tasksByStatus, checkpointsBySession); rewrite CRUD/cleanup to operate on Maps while preserving public API.
UI & Frontend
src/ui/package.json, src/ui/src/App.tsx
Add UI deps (@fontsource/noto-sans, motion, dev shadcn); add flow visualization types/components (playground & optimization), OptimizationKickoff, FlowDiagram, telemetry/polling improvements, and various UI state changes.
E2E Test
tests/e2e/optimization-flow.spec.ts
Add Playwright test for optimization flow: fills form, starts optimization, asserts running state and completion within timeouts.
CI / Deploy
.github/workflows/deploy-deno.yml, .github/workflows/deploy-staging.yml
Add --include=src/ui/dist to deployctl deploy so UI distribution is included.
Misc / Imports
deno.json, src/ui/package.json
Add import map entry for @std/dotenv; add UI dependencies in src/ui/package.json.

Sequence Diagram(s)

sequenceDiagram
    participant Client as UI Client
    participant Server as API Server
    participant Store as In-Memory Store
    participant Generator as LM/Generator

    Client->>Server: POST /optimize (iterations, replicates, candidates)
    Server->>Store: createTask(payload)
    Store-->>Server: taskId (PENDING)
    Server-->>Client: 202 Accepted (taskId)

    loop Poll interval
        Client->>Server: GET /task/{taskId}
        Server->>Store: getTask(taskId)
        Store-->>Server: task state + progress
        Server-->>Client: task status + progress

        alt status = RUNNING
            Server->>Generator: request evaluation / candidate generation
            Generator-->>Server: results / updates
            Server->>Store: updateTaskProgress / updateTaskStatus
        else status = COMPLETED or FAILED
            Server->>Store: finalize task
            Client->>Client: stop polling
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 A hop through configs, a quick little cheer—
EPICS now counted, a limit appears.
Maps hold the tasks where KV once stood,
Guardrails keep JSON tidy and good.
Flows bloom in the UI, optimization near! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures three primary changes in the changeset: optimization flow UI, schema relaxation, and in-memory KV store migration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/include-ui-dist-deploy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link

Summary of Changes

Hello @venikman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the system's flexibility and user experience. It relaxes schema constraints for user stories, making them more adaptable, and replaces the persistent Deno KV store with a lightweight in-memory solution for reduced runtime dependencies. The user interface for the optimization flow has been greatly enhanced with visual progress tracking and real-time updates, improving observability. Additionally, the changes include robust E2E testing and configurable polling mechanisms, contributing to a more stable and testable application.

Highlights

  • Schema Relaxation: Reduced the minimum acceptance criteria for user stories from 2 to 1, made Azure DevOps (ADO) fields optional, and introduced an EPICS_LIMIT environment variable for faster test runs.
  • In-Memory KV Store: Replaced the Deno KV-based state store with in-memory Maps, removing a runtime dependency while preserving async interfaces.
  • Optimization Flow UI Enhancements: Introduced step tracking diagrams, a real-time progress panel, and visibility-aware polling for optimization tasks, along with null-safety improvements.
  • E2E Testing & Polling: Added a live-server end-to-end test for the optimization flow, made poll judges configurable via environment variables, and improved null-safety in the scorer.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • .env.example
    • Added EPICS_LIMIT for controlling the number of epics loaded.
  • playwright.config.ts
    • Introduced environment variable parsing for Playwright tests, allowing configuration of LLM settings and optimization parameters (e.g., EPICS_LIMIT, POLL_ENABLED, OPT_ITERATIONS).
  • prompts/champion.base.md
    • Updated prompt rules to require at least 1 acceptance criterion (down from 2).
    • Explicitly noted Azure DevOps mapping as optional.
    • Added a rule to keep output compact.
  • prompts/champion.md
    • Updated prompt rules to require at least 1 acceptance criterion (down from 2).
    • Explicitly noted Azure DevOps mapping as optional.
    • Added a rule to keep output compact.
  • src/config.ts
    • Added EPICS_LIMIT to the environment variable schema with validation.
  • src/fpf/poll.ts
    • Refactored judge configuration to be dynamic, allowing the number of judges, model, and temperature spread to be controlled via environment variables.
  • src/generator.ts
    • Enhanced story generation with SCHEMA_GUARDRAILS to enforce schema compliance, compactness, minimum acceptance criteria, and brief ADO field values.
  • src/orchestrator/state/kv-store.ts
    • Migrated the state management from Deno KV to in-memory Maps, simplifying the storage layer and removing the Deno KV runtime dependency.
  • src/schema.ts
    • Modified adoFieldsSchema to make all fields partial (optional).
    • Made the ado object in userStorySchema optional.
    • Reduced acceptanceCriteria minimum length to 1.
  • src/scorer.ts
    • Implemented null-safety checks for analyzeStepResult to prevent errors when analysis data might be unavailable, affecting heuristic score calculation and reason string generation.
  • src/server/handler.ts
    • Applied the EPICS_LIMIT environment variable to restrict the number of epics loaded for evaluation.
  • src/ui/package.json
    • Added new UI dependencies: @fontsource/noto-sans, motion, and shadcn.
  • src/ui/src/App.tsx
    • Introduced comprehensive UI components for visualizing optimization flow steps, including icons, detailed cards, and a real-time progress panel.
    • Implemented visibility-aware polling for both telemetry and optimization tasks to reduce unnecessary background activity.
    • Updated StoryCard to gracefully handle optional ADO fields and acceptance criteria.
  • src/ui/src/types.ts
    • Adjusted types to reflect optional AdoFields properties and an optional ado object within UserStory.
  • tests/e2e/optimization-flow.spec.ts
    • Added a new end-to-end test to verify the optimization flow UI completes successfully with minimal iterations.
  • tests/schema.test.ts
    • Updated the userStorySchema test to assert rejection for empty acceptance criteria, aligning with the new minimum of 1.
Activity
  • The pull request was generated using Claude Code, indicating an AI-assisted development process.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant set of improvements across the application, focusing on UI, state management, and schema flexibility. The replacement of Deno KV with an in-memory key-value store is a major architectural change that is well-implemented to remove runtime dependencies, with the async API contract thoughtfully preserved. The UI for the optimization flow has been substantially enhanced with new diagrams and a real-time progress panel, and the underlying polling logic has been made more efficient and robust by using the Page Visibility API and preventing concurrent requests. Schema relaxation for user stories and the addition of a new end-to-end test for the optimization flow are also welcome changes that improve flexibility and test coverage. The overall code quality is high, with notable improvements in null-safety and configurability throughout the codebase. I have a couple of suggestions to further improve maintainability.

Comment on lines 3 to 26
const readEnvFile = () => {
try {
const raw = Deno.readTextFileSync(".env");
const entries: Record<string, string> = {};
for (const line of raw.split(/\r?\n/)) {
const trimmed = line.trim();
if (!trimmed || trimmed.startsWith("#")) continue;
const eq = trimmed.indexOf("=");
if (eq === -1) continue;
const key = trimmed.slice(0, eq).trim();
let value = trimmed.slice(eq + 1).trim();
if (
(value.startsWith('"') && value.endsWith('"')) ||
(value.startsWith("'") && value.endsWith("'"))
) {
value = value.slice(1, -1);
}
entries[key] = value;
}
return entries;
} catch {
return {};
}
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This custom readEnvFile function re-implements logic that is already available in the Deno standard library's dotenv module, which is used elsewhere in the project. To reduce code duplication and improve maintainability, you can replace this implementation with a call to the parse function from @std/dotenv. You will need to add import { parse } from "@std/dotenv"; to the top of the file.

This also improves error handling by only catching NotFound errors and re-throwing others, which is safer than the current implementation that swallows all errors.

const readEnvFile = () => {
  try {
    const raw = Deno.readTextFileSync(".env");
    return parse(raw);
  } catch (e) {
    if (e instanceof Deno.errors.NotFound) {
      return {};
    }
    throw e;
  }
};

Comment on lines 1 to 6
/**
* Deno KV State Store
* In-memory State Store
*
* Persistent storage for tasks and session checkpoints.
* Replaces in-memory Maps for production reliability.
*
* Key schema:
* - ["tasks", taskId] → TaskRecord
* - ["tasks:by-status", status, taskId] → taskId (index)
* - ["optimization-tasks", taskId] → OptimizationTask
* - ["checkpoints", checkpointId] → SessionCheckpoint
* - ["checkpoints:by-session", sessionId, checkpointId] → checkpointId (index)
* Simple, non-persistent storage for tasks and session checkpoints.
* This replaces the Deno KV-backed store to avoid runtime dependencies.
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Now that this module provides an in-memory store instead of a Deno KV-backed one, the filename kv-store.ts and the main export kvStore have become misnomers. For better clarity and long-term maintainability, consider renaming the file to state-store.ts and the kvStore export to stateStore to more accurately reflect its purpose.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the prompt-optimization product across schema validation, server/runtime infrastructure, UI observability, and E2E coverage—aiming to make runs faster to test locally while improving the optimization UX and removing the Deno KV dependency.

Changes:

  • Relaxed story schema requirements (acceptance criteria min 2→1; ADO metadata optional) and updated prompt guardrails accordingly.
  • Replaced the orchestrator’s Deno KV-backed state store with an in-memory Map-based implementation (async API preserved).
  • Expanded the UI optimization flow experience (step diagrams + richer progress panel + visibility-aware polling) and added a live-server Playwright E2E for optimization.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/schema.test.ts Updates schema tests for the new acceptance-criteria minimum.
tests/e2e/optimization-flow.spec.ts Adds a UI-driven optimization E2E test against the live server.
src/ui/src/types.ts Makes ado and key ADO fields optional to match relaxed schema.
src/ui/src/App.tsx Adds flow diagrams, richer optimization progress UI, and visibility-aware polling / null-safety.
src/ui/package.json Adds UI dependencies needed by styling/animation updates.
src/server/handler.ts Adds EPICS_LIMIT slicing for faster local/E2E runs.
src/scorer.ts Makes scorer resilient to missing analysis step output (null-safe scoring + labeling).
src/schema.ts Relaxes user story schema (min acceptance criteria 1; ADO optional; ADO fields partial).
src/orchestrator/state/kv-store.ts Replaces Deno KV persistence with in-memory Maps for task/checkpoint storage.
src/generator.ts Adds explicit schema guardrails to generation instructions and surfaces them in traces/results.
src/fpf/poll.ts Generates default PoLL judges from env-configured count/temps instead of hardcoding.
src/config.ts Adds EPICS_LIMIT env var validation.
prompts/champion.md Updates prompt rules to match schema relaxation + compactness guidance.
prompts/champion.base.md Same prompt rule updates as champion.md.
playwright.config.ts Loads env (including from .env) and configures webServer env for faster E2E runs.
.env.example Documents EPICS_LIMIT for fast test runs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 386 to 394
const cycleStepStyle = (index: number, total: number, radius: number) => {
const angle = (Math.PI * 2 * index) / total - Math.PI / 2;
const x = Math.cos(angle) * radius;
const y = Math.sin(angle) * radius;
return {
transform: `translate(-50%, -50%) translate(${x}px, ${y}px)`,
};
};

Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cycleStepStyle is defined but never used, which leaves dead code in App.tsx. Either remove it or wire it into the cycle layout rendering so it has a purpose.

Suggested change
const cycleStepStyle = (index: number, total: number, radius: number) => {
const angle = (Math.PI * 2 * index) / total - Math.PI / 2;
const x = Math.cos(angle) * radius;
const y = Math.sin(angle) * radius;
return {
transform: `translate(-50%, -50%) translate(${x}px, ${y}px)`,
};
};

Copilot uses AI. Check for mistakes.
Comment on lines 441 to 445
const FlowExplanationPanel = (
{ explanation }: { explanation: FlowExplanation },
) => (
<Artifact className="rounded-none border border-border bg-card shadow-sm">
<ArtifactHeader className="items-start gap-3 border-b border-border bg-muted/40">
<div className="space-y-1">
<ArtifactTitle className="text-base font-semibold text-foreground">
{index + 1}. {story.title}
</ArtifactTitle>
<ArtifactDescription className="text-sm leading-relaxed text-muted-foreground">
As a {story.asA}, I want {story.iWant} so that {story.soThat}.
</ArtifactDescription>
<div className="rounded-none border border-border bg-muted/30 p-4 text-sm">
<p className="text-xs uppercase tracking-[0.3em] text-muted-foreground">
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FlowExplanationPanel is declared here but never referenced. If flow explanations are intended, render this from FlowDiagramCard when diagram.explanation is present; otherwise remove the unused component/types to keep App.tsx smaller.

Copilot uses AI. Check for mistakes.
Comment on lines +132 to 136
Deno.test("userStorySchema - rejects story with empty acceptance criteria", () => {
const story = {
title: "Add login functionality",
asA: "user",
iWant: "to login",
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema behavior was relaxed (acceptanceCriteria min 2→1, ado optional, ADO fields optional), but the tests only cover the new rejection case (empty acceptanceCriteria). Add positive coverage for a story with exactly 1 acceptance criterion and for omitting ado / providing partial ado.fields so the relaxed contract can’t regress silently.

Copilot uses AI. Check for mistakes.
Comment on lines 3 to 6
const readEnvFile = () => {
try {
const raw = Deno.readTextFileSync(".env");
const entries: Record<string, string> = {};
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config reads .env and environment variables via Deno.* APIs. If someone runs the E2E tests via npx playwright test (Node runtime, as suggested in the PR description), this will throw because Deno is undefined—either update the documented command to the Deno-based runner (deno task test:e2e) or refactor env loading to be Node-compatible.

Copilot uses AI. Check for mistakes.
/**
* Save a task to the store.
*/
export async function saveTask(task: TaskRecord): Promise<void> {
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saveTask() always adds the task id to the status index, but it never removes a prior index entry if the same task id is saved again with a different status. That can leave stale ids in tasksByStatus (and make listTasksByStatus() return tasks whose status no longer matches the queried status). Consider de-indexing any existing task record for the same id before re-indexing, or make saveTask private and force all status changes through updateTaskStatus.

Suggested change
export async function saveTask(task: TaskRecord): Promise<void> {
export async function saveTask(task: TaskRecord): Promise<void> {
const existing = tasks.get(task.id);
if (existing) {
deindexTask(existing);
}

Copilot uses AI. Check for mistakes.
Without --include=src/ui/dist, deployctl excludes the built UI
and the app returns 503 at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f94fdfc145

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +32 to +35
const llmBaseUrl = getEnv("LLM_BASE_URL", "https://openrouter.ai/api/v1");
const llmApiKey = getEnv("LLM_API_KEY");
const llmModel = getEnv("LLM_MODEL", "openai/gpt-4o-mini");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid defaulting e2e LLM to OpenRouter without a key

The Playwright webServer now defaults LLM_BASE_URL to OpenRouter while leaving LLM_API_KEY empty when no .env/env vars are set. In a fresh checkout this makes the server pick an external base URL with no credentials, so any LLM call in the optimization flow will fail with 401s and the new e2e test will hang/fail. Consider falling back to the previous local LM Studio defaults unless LLM_* is explicitly provided, or require a non-empty key before using OpenRouter.

Useful? React with 👍 / 👎.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@playwright.config.ts`:
- Around line 29-34: The code uses getEnv to populate llmApiKey which currently
defaults to an empty string when unset, causing silent failures; update the
startup logic after llmApiKey is read to detect an empty or missing key and
either throw an error or log a clear warning and exit when llmBaseUrl is not
localhost (e.g., check that llmBaseUrl does not start with "http://localhost" or
"http://127.0.0.1"); reference the symbols getEnv, llmApiKey, and llmBaseUrl and
implement the guard immediately after those constants are set so remote
providers fail fast or emit a visible warning instead of passing an empty
LLM_API_KEY to the server.

In `@src/orchestrator/state/kv-store.ts`:
- Around line 88-90: getTask currently returns the live TaskRecord stored in the
tasks Map so callers can mutate it and silently corrupt the tasksByStatus index;
change getTask to return a shallow copy of the stored record (e.g., clone the
object before returning) so callers cannot mutate the internal map entry, and
apply the same defensive-return pattern to analogous getters
(getOptimizationTask, getCheckpoint, getLatestCheckpoint) if they also return
objects that callers might mutate; ensure listTasksByStatus and tasksByStatus
keep using the internal objects while external callers always get copies.
- Around line 83-90: CI fails because saveTask and getTask are declared async
but contain no await; add a Deno lint suppression to satisfy the require-await
rule while keeping the API async: either add a file-level directive (e.g., a
single "// deno-lint-ignore-file require-await" near the top of the file) or add
per-function ignores immediately above the async functions (e.g., "//
deno-lint-ignore require-await" before the saveTask and getTask declarations) so
saveTask and getTask remain async but the linter stops erroring.
- Around line 1-6: Add unit tests for the in-memory KV store
(src/orchestrator/state/kv-store.ts) exercising task lifecycle and index
bookkeeping: create tasks, transition their status via whatever API exists
(e.g., createTask/updateTaskStatus or TaskStore methods), and assert
tasksByStatus indices and listTasksByStatus return correct sets after each
transition; add tests for checkpoints (checkpointsBySession) including creating
multiple checkpoints with different createdAt values and asserting retrieval is
sorted by createdAt and cleanupOldCheckpoints removes only the expected entries;
add tests for cleanupOldTasks to remove tasks older than threshold while leaving
current ones and handle edge cases such as empty stores and missing
cross-references (e.g., tasks referenced in indices but not in main store);
include assertions that indices are kept in sync after deletions and status
changes and cover boundary conditions (no-op cleanups).

In `@src/ui/src/App.tsx`:
- Around line 386-393: Remove the unused helper function cycleStepStyle from
App.tsx: locate the function declaration "const cycleStepStyle = (index: number,
total: number, radius: number) => { ... }" and delete it along with its return
block since FlowDiagramCard uses CSS grid for cycle layout and this trig-based
positioning is dead code; ensure no other references to cycleStepStyle remain
and run a quick compile to confirm no usages are left.
🧹 Nitpick comments (15)
src/scorer.ts (3)

187-190: Consider using optional chaining instead of an intermediate boolean for null-safety.

TypeScript cannot narrow a through the analyzeAvailable boolean variable, so a.invest in analyzeAvailable ? a.invest : 0 won't type-narrow a to non-nullish. While this works at runtime, optional chaining is both TS-friendly and more concise:

♻️ Suggested simplification
-      const analyzeAvailable = !!a;
-      const investScore = analyzeAvailable ? a.invest : 0;
-      const criteriaScore = analyzeAvailable ? a.acceptanceCriteria : 0;
-      const duplicationScore = analyzeAvailable ? a.duplication : 0;
+      const investScore = a?.invest ?? 0;
+      const criteriaScore = a?.acceptanceCriteria ?? 0;
+      const duplicationScore = a?.duplication ?? 0;

You'd still need analyzeAvailable in generateReason for the labels, but there you could similarly use a?.invest?.toFixed(3) ?? "n/a" etc.


211-216: Silent score penalty when analysis is unavailable.

When analyzeStepResult is absent, 65% of the heuristic weight (invest + criteria + duplication) contributes zero, capping the heuristic score at ~0.35. This is reasonable as a degraded fallback, but be aware that intermittent analyze-step failures during optimization iterations could significantly skew scores downward without an obvious signal beyond the reason string.

If this is intentional, no action needed. If not, consider either re-normalizing the weights across available dimensions or returning a sentinel/skip to avoid polluting optimization with artificially low scores.


241-248: Same TS narrowing caveat applies here.

Same pattern as generateScore — the intermediate boolean doesn't narrow a for TypeScript. Optional chaining would be cleaner:

-      const investLabel = analyzeAvailable ? a.invest.toFixed(3) : "n/a";
-      const criteriaLabel = analyzeAvailable
-        ? a.acceptanceCriteria.toFixed(3)
-        : "n/a";
-      const duplicationLabel = analyzeAvailable
-        ? a.duplication.toFixed(3)
-        : "n/a";
+      const investLabel = a?.invest?.toFixed(3) ?? "n/a";
+      const criteriaLabel = a?.acceptanceCriteria?.toFixed(3) ?? "n/a";
+      const duplicationLabel = a?.duplication?.toFixed(3) ?? "n/a";
src/fpf/poll.ts (1)

1-5: Stale doc comment: "3 diverse judges" is no longer accurate.

Line 5 still says "3 diverse judges" but judge count is now runtime-configurable via POLL_NUM_JUDGES. Consider updating to reflect the dynamic nature.

src/orchestrator/state/kv-store.ts (2)

44-48: No reset mechanism for testing or between runs.

These module-level Maps accumulate state for the lifetime of the process. Consider exporting a resetStore() helper (even if only used in tests) to clear all five Maps, so tests can run in isolation without leaking state across suites.

♻️ Proposed helper
+/** Clear all in-memory state – intended for tests. */
+export function resetStore(): void {
+  tasks.clear();
+  tasksByStatus.clear();
+  optimizationTasks.clear();
+  checkpoints.clear();
+  checkpointsBySession.clear();
+}

206-244: Minor DRY opportunity: getLatestCheckpoint and listCheckpoints share identical filter-and-sort logic.

Consider extracting a shared helper, or have getLatestCheckpoint delegate to listCheckpoints:

♻️ Example
 export async function getLatestCheckpoint(
   sessionId: string,
 ): Promise<SessionCheckpoint | null> {
-  const ids = checkpointsBySession.get(sessionId);
-  if (!ids || ids.size === 0) return null;
-
-  const list = Array.from(ids)
-    .map((id) => checkpoints.get(id))
-    .filter((checkpoint): checkpoint is SessionCheckpoint =>
-      Boolean(checkpoint)
-    );
-
-  if (list.length === 0) return null;
-
-  list.sort((a, b) =>
-    new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime()
-  );
-
-  return list[0] ?? null;
+  const sorted = await listCheckpoints(sessionId);
+  return sorted[0] ?? null;
 }
tests/e2e/optimization-flow.spec.ts (1)

1-24: Live e2e test looks solid for the happy path.

The test correctly exercises the optimization flow lifecycle from the UI perspective, and the Playwright locators match the App.tsx label/text structure.

Two minor observations:

  1. No failure-path assertion: If the optimization fails server-side, the test silently waits the full 6 minutes before timing out. Consider adding a race with page.getByText("failed") to surface server errors sooner:
💡 Suggested improvement
- await expect(
-   page.getByText("completed", { exact: true }).first(),
- ).toBeVisible({ timeout: 360_000 });
+ const completed = page.getByText("completed", { exact: true }).first();
+ const failed = page.getByText("failed", { exact: true }).first();
+ const result = await Promise.race([
+   completed.waitFor({ state: "visible", timeout: 360_000 }).then(() => "completed"),
+   failed.waitFor({ state: "visible", timeout: 360_000 }).then(() => "failed"),
+ ]);
+ expect(result).toBe("completed");
  1. 360s timeout appears both as test-level and assertion-level: The test.setTimeout(360_000) at line 4 sets the overall test timeout, but the assertion at line 23 also has timeout: 360_000. If the 30s wait at line 16 succeeds, only ~330s remain for the test, while the assertion still waits up to 360s — the test-level timeout will fire first, producing a less helpful error. Consider setting the test timeout slightly higher than the sum of all assertion timeouts.
src/ui/src/App.tsx (3)

1079-1123: Visibility-aware telemetry polling is well implemented, but lacks an inFlight guard.

The optimization poller (lines 1203-1234) properly guards against overlapping polls with inFlight, but this telemetry poller does not. If a /telemetry fetch takes longer than TELEMETRY_POLL_MS (30s), polls will overlap. This is low-risk since telemetry is read-only and the last response wins via setTelemetry, but for consistency you could add the same inFlight guard.


62-306: Large amount of types, constants, and mapping logic defined inline in App.tsx.

This file is approaching ~1700 lines. The flow diagram types, step constants (PLAYGROUND_STEPS, OPTIMIZATION_STEPS, FLOW_DIAGRAMS), stage-resolution logic, and activity descriptions are all domain data that could live in a separate module (e.g., flow-config.ts). This would improve readability and testability of the mapping functions like resolveOptimizationStage.


1574-1579: event.currentTarget?.value — optional chaining is unnecessary here.

In a React onChange handler for an <input>, event.currentTarget is always defined. The ?. is harmless but misleading — it suggests currentTarget could be null, which isn't the case in this context.

tests/schema.test.ts (1)

132-153: Add a boundary test for exactly 1 acceptance criterion.

The test correctly validates that an empty array is rejected, but there's no positive test for the new lower bound — a story with exactly 1 acceptance criterion should now pass. This boundary case was previously implicitly covered (it was invalid at min 2), so it deserves explicit validation.

🧪 Suggested boundary test
+Deno.test("userStorySchema - accepts story with exactly 1 acceptance criterion", () => {
+  const story = {
+    title: "Single AC story",
+    asA: "user",
+    iWant: "to do something",
+    soThat: "I get value",
+    acceptanceCriteria: ["Given X, When Y, Then Z"],
+  };
+
+  const result = userStorySchema.safeParse(story);
+  assert(result.success, "Story with 1 acceptance criterion should be valid");
+});
prompts/champion.base.md (1)

1-26: Content is identical to prompts/champion.md.

If one is meant to be the immutable base and the other a working copy that evolves during optimization, consider documenting that convention. Otherwise, having two identical files risks silent drift over time.

src/schema.ts (2)

52-57: Schema relaxation looks correct; consider adding test coverage for the new optionality.

acceptanceCriteria: min(1) and ado: .optional() align with the prompt and config changes. However, the test file still provides ADO fields in every test case. Adding tests for ado omitted entirely and for a story with exactly 1 acceptance criterion (as mentioned in the test file review) would solidify coverage of these new boundaries.

🧪 Suggested test for optional ado
+Deno.test("userStorySchema - accepts story without ado field", () => {
+  const story = {
+    title: "No ADO story",
+    asA: "user",
+    iWant: "to do something",
+    soThat: "I get value",
+    acceptanceCriteria: ["Given X, When Y, Then Z"],
+  };
+
+  const result = userStorySchema.safeParse(story);
+  assert(result.success, "Story without ado should be valid");
+  assertEquals(result.data?.ado, undefined);
+});

26-39: Test coverage for optional ado scenarios would clarify intended behavior.

The .partial() design is intentional: the ado block is optional (line 57), and when included, fields within can be partial. Downstream code in src/ui/src/App.tsx safely handles this via optional chaining (story.ado?.fields?.["key"]), so empty fields: {} poses no safety risk. However, adding tests for edge cases — ado omitted entirely, ado with empty fields, and ado with partial fields — would explicitly document this flexibility and prevent future regressions.

playwright.config.ts (1)

3-26: Replace custom .env parser with @std/dotenv's loadSync to eliminate duplication and improve robustness.

The custom parser duplicates functionality already available in @std/dotenv, which is imported by src/config.ts. @std/dotenv's loadSync handles edge cases the current implementation misses: escape sequences in quoted values, variable expansion ($KEY, ${KEY:-default}), and export prefix support. Since playwright.config.ts operates independently and needs its own .env loader, migrate to the standard library function.

♻️ Suggested approach
-const readEnvFile = () => {
-  try {
-    const raw = Deno.readTextFileSync(".env");
-    const entries: Record<string, string> = {};
-    for (const line of raw.split(/\r?\n/)) {
-      const trimmed = line.trim();
-      if (!trimmed || trimmed.startsWith("#")) continue;
-      const eq = trimmed.indexOf("=");
-      if (eq === -1) continue;
-      const key = trimmed.slice(0, eq).trim();
-      let value = trimmed.slice(eq + 1).trim();
-      if (
-        (value.startsWith('"') && value.endsWith('"')) ||
-        (value.startsWith("'") && value.endsWith("'"))
-      ) {
-        value = value.slice(1, -1);
-      }
-      entries[key] = value;
-    }
-    return entries;
-  } catch {
-    return {};
-  }
-};
+import { loadSync } from "@std/dotenv";
+
+const readEnvFile = (): Record<string, string> => {
+  try {
+    return loadSync({ export: false });
+  } catch {
+    return {};
+  }
+};

Comment on lines 1 to 6
/**
* Deno KV State Store
* In-memory State Store
*
* Persistent storage for tasks and session checkpoints.
* Replaces in-memory Maps for production reliability.
*
* Key schema:
* - ["tasks", taskId] → TaskRecord
* - ["tasks:by-status", status, taskId] → taskId (index)
* - ["optimization-tasks", taskId] → OptimizationTask
* - ["checkpoints", checkpointId] → SessionCheckpoint
* - ["checkpoints:by-session", sessionId, checkpointId] → checkpointId (index)
* Simple, non-persistent storage for tasks and session checkpoints.
* This replaces the Deno KV-backed store to avoid runtime dependencies.
*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for test files related to kv-store
echo "=== Searching for test files mentioning kv-store ==="
fd -type f \( -name '*test*' -o -name '*spec*' \) | xargs grep -l 'kv-store' 2>/dev/null || echo "No test files found mentioning kv-store"

echo ""
echo "=== Searching all files mentioning kv-store ==="
rg 'kv-store' --type ts --type tsx --type js --type jsx -l

echo ""
echo "=== Checking test directory structure ==="
find . -type d -name '__tests__' -o -name 'tests' -o -name 'test' 2>/dev/null | head -20

echo ""
echo "=== Listing all test files in repo ==="
fd -type f \( -name '*.test.ts' -o -name '*.test.tsx' -o -name '*.spec.ts' -o -name '*.spec.tsx' \) | head -30

Repository: venikman/PromptAgent

Length of output: 683


🏁 Script executed:

#!/bin/bash
# Search for kv-store.ts file
echo "=== Locating kv-store.ts ==="
fd 'kv-store' --extension ts

echo ""
echo "=== Test files in repo ==="
fd '\.test\.' --extension ts
fd '\.spec\.' --extension ts

echo ""
echo "=== Full content of kv-store.ts ==="
find . -name 'kv-store.ts' -exec cat {} \;

echo ""
echo "=== List tests directory ==="
ls -la ./tests/ 2>/dev/null || echo "Tests directory not accessible"

Repository: venikman/PromptAgent

Length of output: 9668


Add unit tests for the in-memory KV store implementation.

This file contains non-trivial logic with index bookkeeping (tasksByStatus, checkpointsBySession), status transition handling, and cleanup operations. Tests should cover:

  • Task lifecycle and status transitions (ensuring indices stay in sync)
  • listTasksByStatus correctness after status changes
  • Cleanup operations (cleanupOldTasks, cleanupOldCheckpoints) with edge cases (empty sets, missing references)
  • Checkpoint retrieval and sorting by createdAt

Per coding guidelines, unit tests are required for logic changes in TypeScript files.

🤖 Prompt for AI Agents
In `@src/orchestrator/state/kv-store.ts` around lines 1 - 6, Add unit tests for
the in-memory KV store (src/orchestrator/state/kv-store.ts) exercising task
lifecycle and index bookkeeping: create tasks, transition their status via
whatever API exists (e.g., createTask/updateTaskStatus or TaskStore methods),
and assert tasksByStatus indices and listTasksByStatus return correct sets after
each transition; add tests for checkpoints (checkpointsBySession) including
creating multiple checkpoints with different createdAt values and asserting
retrieval is sorted by createdAt and cleanupOldCheckpoints removes only the
expected entries; add tests for cleanupOldTasks to remove tasks older than
threshold while leaving current ones and handle edge cases such as empty stores
and missing cross-references (e.g., tasks referenced in indices but not in main
store); include assertions that indices are kept in sync after deletions and
status changes and cover boundary conditions (no-op cleanups).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/deploy-deno.yml (2)

9-14: 🛠️ Refactor suggestion | 🟠 Major

Production deployment lacks an environment approval gate.

This workflow deploys to production (--prod) on every push to main, but the job has no environment: key, so GitHub Environment protection rules (required reviewers, wait timers) cannot be enforced. Adding one is straightforward:

Proposed fix
   deploy:
     runs-on: ubuntu-latest
+    environment: production
     permissions:
       contents: read
       id-token: write

As per coding guidelines, "Require environment approvals for production deployments".


31-31: ⚠️ Potential issue | 🟡 Minor

Pin @deno/deployctl to a specific version for reproducible deployments.

jsr:@deno/deployctl`` without a version pin will resolve to the latest version at deploy time, potentially introducing breaking changes silently.

Example
-      - name: Install deployctl
-        run: deno install -A --global jsr:`@deno/deployctl`
+      - name: Install deployctl
+        run: deno install -A --global jsr:`@deno/deployctl`@1.13.1

- Fix CI: add require-await lint suppression in kv-store.ts
- Fix saveTask stale index: de-index existing task before re-indexing
- Return shallow copies from getTask and other getters to prevent
  callers from corrupting internal state indices
- Remove unused cycleStepStyle and FlowExplanationPanel from App.tsx
- Use optional chaining in scorer.ts instead of intermediate boolean
- Replace custom .env parser with @std/dotenv parse in playwright config
- Add LLM API key guard warning for remote providers without key
- Add boundary tests for schema relaxation (1 AC, no ado, partial ado)
- Update stale doc comment in poll.ts (N judges, not 3)
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/orchestrator/state/kv-store.ts`:
- Around line 148-156: listTasksByStatus currently returns live TaskRecord
objects from the internal tasks Map, letting callers mutate fields (e.g.,
.status) and corrupt tasksByStatus; fix listTasksByStatus to return defensive
copies (shallow clones) of each task like getTask does instead of the original
references—when mapping ids to records use a clone (e.g., spread {...task}) for
each non-null entry so callers cannot mutate internal state; keep filters/type
guard the same and reference the tasks and tasksByStatus collections and the
listTasksByStatus function when making the change.
- Around line 237-254: The listCheckpoints function is returning live references
(unlike getCheckpoint/getLatestCheckpoint); update listCheckpoints to return
defensive copies of each SessionCheckpoint instead of the original objects
retrieved from the checkpoints map — locate the code that builds list using
checkpointsBySession and checkpoints, and when mapping/filtering the values (the
Array.from(ids).map(...).filter(...)), replace the object references with
shallow clones (e.g., create a new object per checkpoint) so callers receive
copies rather than the original entries from the checkpoints map.

In `@src/ui/src/App.tsx`:
- Around line 1182-1193: The current polling loop marks done = true and calls
stop() in the catch of the optimization fetch, which permanently halts polling
on any error; change this to implement a retry budget: add a retry counter
(e.g., retryCount or retryBudget) scoped alongside inFlight/cancelled,
decrement/increment on each catch, setOptimizationError with the error but only
set done = true and call stop() when the retry budget is exhausted or the error
is a terminal error type; reset retry budget on a successful fetch and ensure
inFlight is cleared in finally as before so transient network hiccups won't
permanently kill polling in functions/variables like setOptimizationError, done,
stop, inFlight, and cancelled.
🧹 Nitpick comments (7)
src/scorer.ts (1)

187-189: Score silently degrades when analysis is unavailable — consider documenting or adjusting weights.

When analyzeStepResult is undefined, investScore, criteriaScore, and duplicationScore all fall back to 0, which zeroes out 65% of HEURISTIC_WEIGHTS (0.25 + 0.25 + 0.15). The final heuristic score is then driven only by coverage and count (35%). This is a valid graceful degradation, but it could surprise callers who don't inspect the reason string.

A brief inline comment near Line 187 noting this trade-off (e.g., "analysis unavailable → 65% of heuristic weight zeroed") would help future maintainers understand the intent.

Also applies to: 211-215

playwright.config.ts (1)

49-59: Spreading the full process environment into webServer.env.

Deno.env.toObject() forwards every variable from the parent process. This is fine for local dev/CI, but be mindful that it could leak sensitive variables (e.g., unrelated API keys, tokens) into the spawned test server process. Since this is a Playwright test config and not a production deployment path, the risk is low.

src/ui/src/App.tsx (4)

1036-1080: Telemetry polling lacks an in-flight guard, unlike optimization polling.

The optimization polling effect (Line 1160) has inFlight protection to prevent overlapping requests. This telemetry poller doesn't, so if a /telemetry fetch takes longer than TELEMETRY_POLL_MS, concurrent requests can pile up. Low risk for a GET endpoint, but worth aligning for consistency.

♻️ Add in-flight guard to telemetry polling
   useEffect(() => {
     let cancelled = false;
     let intervalId: number | null = null;
+    let inFlight = false;
     const poll = async () => {
+      if (inFlight) return;
+      inFlight = true;
       try {
         const res = await fetch("/telemetry");
         if (!res.ok) return;
         const data = await readJson<TelemetrySnapshot>(res);
         if (!cancelled) {
           setTelemetry(data);
         }
       } catch {
         return;
+      } finally {
+        inFlight = false;
       }
     };

1299-1305: Filtering a static array for a known single item.

Both sections filter FLOW_DIAGRAMS by a known id and then .map(). Since you know the exact ids at compile time, you could reference the diagrams directly or use .find(). This is a very minor readability nit.

Also applies to: 1684-1690


62-327: Consider extracting flow diagram types and constants to a separate module.

This file now contains ~270 lines of type definitions, constant step arrays, lookup maps, and helper functions for flow diagrams before any React component code begins. Extracting these into a dedicated file (e.g., flow-diagrams.ts) would improve readability and make the types reusable.


613-797: OptimizationKickoff is a substantial presentational component (~185 lines).

This component, along with FlowStepCard, FlowDiagramCard, and StoryCard, could live in their own files under a components/ directory. This would reduce App.tsx from its current size and make individual components easier to test.

src/orchestrator/state/kv-store.ts (1)

85-92: saveTask stores a live reference — consider storing a defensive copy.

getTask correctly returns { ...task }, but saveTask stores the caller's object directly. If the caller retains and later mutates it (e.g., changes .status), the in-memory record changes without updating tasksByStatus, silently corrupting the index.

🛡️ Proposed fix
 export async function saveTask(task: TaskRecord): Promise<void> {
   const existing = tasks.get(task.id);
   if (existing) {
     deindexTask(existing);
   }
-  tasks.set(task.id, task);
-  indexTask(task);
+  const copy = { ...task };
+  tasks.set(copy.id, copy);
+  indexTask(copy);
 }

Same concern applies to saveOptimizationTask on Line 176.

Comment on lines 148 to 156
export async function listTasksByStatus(
status: TaskStatus,
): Promise<TaskRecord[]> {
const kv = await getKv();
const tasks: TaskRecord[] = [];

const entries = kv.list<string>({ prefix: ["tasks:by-status", status] });
for await (const entry of entries) {
const task = await getTask(entry.value);
if (task) tasks.push(task);
}

return tasks;
const ids = tasksByStatus.get(status);
if (!ids) return [];
return Array.from(ids)
.map((id) => tasks.get(id))
.filter((task): task is TaskRecord => Boolean(task));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

listTasksByStatus returns live references — callers can corrupt indices.

getTask was fixed to return { ...task }, but this list function hands out the actual objects from the internal Map. A caller mutating .status on any returned record silently desyncs tasksByStatus.

🛡️ Proposed fix
   return Array.from(ids)
     .map((id) => tasks.get(id))
-    .filter((task): task is TaskRecord => Boolean(task));
+    .filter((task): task is TaskRecord => Boolean(task))
+    .map((task) => ({ ...task }));
 }
🤖 Prompt for AI Agents
In `@src/orchestrator/state/kv-store.ts` around lines 148 - 156, listTasksByStatus
currently returns live TaskRecord objects from the internal tasks Map, letting
callers mutate fields (e.g., .status) and corrupt tasksByStatus; fix
listTasksByStatus to return defensive copies (shallow clones) of each task like
getTask does instead of the original references—when mapping ids to records use
a clone (e.g., spread {...task}) for each non-null entry so callers cannot
mutate internal state; keep filters/type guard the same and reference the tasks
and tasksByStatus collections and the listTasksByStatus function when making the
change.

Comment on lines 237 to 254
export async function listCheckpoints(
sessionId: string,
): Promise<SessionCheckpoint[]> {
const kv = await getKv();
const checkpoints: SessionCheckpoint[] = [];
const ids = checkpointsBySession.get(sessionId);
if (!ids || ids.size === 0) return [];

const entries = kv.list<string>({
prefix: ["checkpoints:by-session", sessionId],
});

for await (const entry of entries) {
const checkpoint = await getCheckpoint(entry.value);
if (checkpoint) checkpoints.push(checkpoint);
}
const list = Array.from(ids)
.map((id) => checkpoints.get(id))
.filter((checkpoint): checkpoint is SessionCheckpoint =>
Boolean(checkpoint)
);

// Sort by creation time (newest first)
checkpoints.sort((a, b) =>
list.sort((a, b) =>
new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime()
);

return checkpoints;
return list;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

listCheckpoints also returns live references — inconsistent with getCheckpoint/getLatestCheckpoint.

Same pattern gap as listTasksByStatus. Apply the defensive copy for consistency.

🛡️ Proposed fix
   list.sort((a, b) =>
     new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime()
   );

-  return list;
+  return list.map((cp) => ({ ...cp }));
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function listCheckpoints(
sessionId: string,
): Promise<SessionCheckpoint[]> {
const kv = await getKv();
const checkpoints: SessionCheckpoint[] = [];
const ids = checkpointsBySession.get(sessionId);
if (!ids || ids.size === 0) return [];
const entries = kv.list<string>({
prefix: ["checkpoints:by-session", sessionId],
});
for await (const entry of entries) {
const checkpoint = await getCheckpoint(entry.value);
if (checkpoint) checkpoints.push(checkpoint);
}
const list = Array.from(ids)
.map((id) => checkpoints.get(id))
.filter((checkpoint): checkpoint is SessionCheckpoint =>
Boolean(checkpoint)
);
// Sort by creation time (newest first)
checkpoints.sort((a, b) =>
list.sort((a, b) =>
new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime()
);
return checkpoints;
return list;
}
export async function listCheckpoints(
sessionId: string,
): Promise<SessionCheckpoint[]> {
const ids = checkpointsBySession.get(sessionId);
if (!ids || ids.size === 0) return [];
const list = Array.from(ids)
.map((id) => checkpoints.get(id))
.filter((checkpoint): checkpoint is SessionCheckpoint =>
Boolean(checkpoint)
);
list.sort((a, b) =>
new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime()
);
return list.map((cp) => ({ ...cp }));
}
🤖 Prompt for AI Agents
In `@src/orchestrator/state/kv-store.ts` around lines 237 - 254, The
listCheckpoints function is returning live references (unlike
getCheckpoint/getLatestCheckpoint); update listCheckpoints to return defensive
copies of each SessionCheckpoint instead of the original objects retrieved from
the checkpoints map — locate the code that builds list using
checkpointsBySession and checkpoints, and when mapping/filtering the values (the
Array.from(ids).map(...).filter(...)), replace the object references with
shallow clones (e.g., create a new object per checkpoint) so callers receive
copies rather than the original entries from the checkpoints map.

Comment on lines 1182 to +1193
} catch (err) {
if (!cancelled) {
setOptimizationError(
err instanceof Error ? err.message : String(err),
);
}
if (intervalId) clearInterval(intervalId);
done = true;
stop();
} finally {
inFlight = false;
}
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Optimization polling stops permanently on any fetch error.

Setting done = true in the catch block (Line 1188) means a single transient network hiccup (e.g., brief connectivity loss) permanently kills polling. The user would have to re-trigger the optimization to see updates again.

Consider adding a retry budget or only treating certain errors as terminal.

🛡️ Suggested approach: retry budget
     let inFlight = false;
     let done = false;
+    let errorCount = 0;
+    const MAX_POLL_ERRORS = 3;

     const poll = async () => {
       if (inFlight || done) return;
       inFlight = true;
       try {
         const res = await fetch(`/v3/optimize/${optimizationTask.taskId}`);
         const data = await readJson<OptimizationTask & { error?: string }>(res);

         if (!res.ok) {
           throw new Error(data.error || `HTTP ${res.status}`);
         }

         if (!cancelled) {
           setOptimizationTask(data);
         }
+        errorCount = 0;

         if (data.status === "completed" || data.status === "failed") {
           done = true;
           stop();
         }
       } catch (err) {
-        if (!cancelled) {
-          setOptimizationError(
-            err instanceof Error ? err.message : String(err),
-          );
+        errorCount += 1;
+        if (errorCount >= MAX_POLL_ERRORS) {
+          if (!cancelled) {
+            setOptimizationError(
+              err instanceof Error ? err.message : String(err),
+            );
+          }
+          done = true;
+          stop();
         }
-        done = true;
-        stop();
       } finally {
         inFlight = false;
       }
🤖 Prompt for AI Agents
In `@src/ui/src/App.tsx` around lines 1182 - 1193, The current polling loop marks
done = true and calls stop() in the catch of the optimization fetch, which
permanently halts polling on any error; change this to implement a retry budget:
add a retry counter (e.g., retryCount or retryBudget) scoped alongside
inFlight/cancelled, decrement/increment on each catch, setOptimizationError with
the error but only set done = true and call stop() when the retry budget is
exhausted or the error is a terminal error type; reset retry budget on a
successful fetch and ensure inFlight is cleared in finally as before so
transient network hiccups won't permanently kill polling in functions/variables
like setOptimizationError, done, stop, inFlight, and cancelled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant