diff --git a/README.md b/README.md index 9118322d..083dceb8 100644 --- a/README.md +++ b/README.md @@ -117,6 +117,7 @@ Pick by the **shape** of the problem, not the topic: | A problem decomposed before you act | **think** | Sequential thinking — linear or reflexion (forced self-critique), optional branches; surfaces open questions + a `goal` handoff. | | Your own decision pressure-tested | **nero** | Adversarial self-challenge — the top-rated *critic* (by tribunal rating, not the best builder) attacks it and returns a verdict. | | A library/repo/spec/fact looked up with sources | **research** | Keyless, web-grounded, *cited* research — Agon discovers sources (npm/GitHub/MDN/IETF/Stack Overflow/Wikipedia, **no API key**), an engine drafts a grounded answer, and Agon **verifies every citation**. | +| Your **own browser** driven to research / check a page | **chrome** | Drive your real browser (navigate / read / screenshot / click) from the REPL, the CLI, or Cesar — reuses a running agon the side panel is on, else embeds a transient bridge. Needs the Agon browser extension. | | Code built competitively against a test | **forge** | Engines race on the same task in isolated worktrees; the best test-passing patch is applied. | | The **first** passing solution, fast | **speculate** | N engines race; the first to pass the test wins and is applied immediately. | | A routine build with one engine | **pipeline** | Single-engine build → review → fix loop; no competition overhead. | @@ -138,6 +139,7 @@ Pick by the **shape** of the problem, not the topic: - Need to think a problem through before acting → `think` - Need your own decision attacked before you commit → `nero` - Need a library/repo/spec/fact looked up with trustworthy, verified sources → `research` +- Need your own browser driven to research or check a page design → `chrome` - Need the whole panel on a high-stakes, hard-to-reverse call → `council` - Need a whole open-ended thing built unattended, away from the desk → `conquer` @@ -219,6 +221,22 @@ agon research "what is a Merkle tree?" --json # emit the Rese - **Citations are verified, not trusted.** Every cited URL is re-fetched; dead/redirected/mismatched ones are surfaced so you know which lines to trust. - **For external CLIs too** — `agon call research "" --jsonl` lets Codex / Antigravity / Claude get web-grounded, citation-checked answers without their own search key. +### Chrome +Drive your **own browser** from Agon — to research live pages or check a design without leaving the flow. Where `research` reads keyless source APIs, `chrome` drives the *real* browser you're logged into: it navigates, reads the rendered page, screenshots it, and (with approval) clicks and types. The agentic ReAct brain runs the turn through a loopback bridge the **Agon browser extension's side panel** attaches to — the panel lends the page tools and executes them in your Chrome. + +```bash +/chrome open news.ycombinator.com and summarize the top 3 posts # in the agon REPL +/chrome check the pricing page design — is the hero cluttered? +agon chrome "open the staging dashboard and tell me if the chart overflows" # from the CLI +agon chrome "..." --auto-approve # act on the page without prompting (unattended) +agon call chrome "" [--auto-approve] # machine bridge for external CLIs (Codex/Antigravity) +``` + +- **Reuse-or-embed, no separate terminal.** `chrome` reuses a running agon (`agon serve` or the REPL) the side panel is already attached to; if nothing's running it embeds a transient in-process bridge just for the turn and writes the 0600 connection file the panel auto-connects through. The REPL `/chrome` hosts the bridge itself, so the panel auto-connects to your live agon. +- **Your approval is the backstop.** Read-only page tools (navigate / read / screenshot) run without prompting; page-changing actions (click / type) prompt with Agon's own Y/N gate (the REPL permission UI, or a terminal prompt from the CLI). `--auto-approve` allows them unattended — required for a non-interactive caller (`agon call chrome`) to act on the page. Page content is treated as untrusted: the approval gate is the prompt-injection backstop. +- **Never Cesar.** The browser turn runs on the agentic brain, never your live Cesar session (no split-brain); its result is fed back to Cesar so the conversation builds on what the browser found. In the REPL it's also recorded for `Ctrl+R` / `agon last`. +- **Requires the Agon browser extension** with its side panel open + attached. With no panel attached, the brain answers in text only (and says so). + ### Brainstorm Engines bid their confidence level on how they would tackle a complex problem. Engines with higher confidence bids are allocated more tokens and priority. diff --git a/docs/modes.md b/docs/modes.md index f3d4f897..5958202d 100644 --- a/docs/modes.md +++ b/docs/modes.md @@ -17,6 +17,7 @@ Run it inside the git repo you are working on. - `agon think "" [--strategy linear|reflexion] [--steps N] [--branches N]` — sequential thinking: decompose a problem into structured thoughts (reflexion forces a self-critique+revision per step; --branches explores alternatives), surface open questions, and emit a refined spec to hand to `agon goal`. Use to think before acting, or for engines with weak built-in reasoning. - `agon nero "" [--reasoning ""] [--focus ""] [--confidence N]` — adversarial self-challenge. The top-rated CRITIC (picked by tribunal-discipline rating, not the best builder) attacks your decision with concrete failure scenarios and returns a verdict (FLAWED | PROCEED WITH CAUTION | SOUND) plus its own confidence the original is correct. Use this INSTEAD of an internal evil-twin / devil's-advocate pass — it gives you a real second model, not your own reasoning mirrored. - `agon research "" [--count N] [--engine ]` — keyless, web-grounded, CITED research. Agon (not the model) discovers sources via first-party endpoints that need NO API key — npm registry, GitHub repo search, MDN, IETF/RFC datatracker, Stack Overflow, Wikipedia — WebFetches them, an engine drafts an answer grounded ONLY in that content with inline [n] citations, and Agon then re-fetches and VERIFIES every citation (rejecting dead/redirected/mismatched URLs). Use it to look up a library/repo/spec/Q&A/encyclopedic fact and get an answer with sources you can trust. A truly-general web query (no keyless lane) reports that rather than guessing. +- `agon chrome ""` (machine: `agon call chrome "" [--auto-approve] [--engine ]`) — drive the USER's own browser to research or check a page design (navigate / read / screenshot / click / type). Reuses a running agon (serve/REPL) the side panel is attached to, else embeds a transient bridge just for the run. Read-only page tools need no approval; add `--auto-approve` so a non-interactive caller can perform page-changing actions (without it they're denied). Requires the Agon browser extension with its side panel open + attached; with no panel attached the brain answers in text only. - `agon review ` — non-interactive multi-engine code review. - `agon goal "" --queue --gate ""` — autonomous controller: drives a task queue to completion unattended, looping build -> witness -> gate -> review (panel + judge) -> fix -> commit per task on a goal/ branch. Bound it with `--max-hours`/`--budget`; `--push` pushes each task. Long-running (designed for 8-24h). - `agon conquer "" --gate ""` — supervised-autonomous BUILD of an OPEN-ENDED task. Cesar drives a pluggable builder CLI (codex/claude/agy) in agent mode turn by turn; when the builder hits a fork it asks and Cesar convenes the cheapest sufficient consult (nero/tribunal/brainstorm/council) and feeds back a compact verdict; when it claims done, a layered done-oracle runs (the `--gate` command + diff acceptance-drift + a nero falsification round) and it STOPS at a HUMAN merge gate — it never auto-merges to main. The open-ended sibling to `goal`: use `conquer` when you CANNOT write a clean discriminating oracle up front (build a whole tool), `goal` when you can. `--push` needs a clean tree; bound it with `--max-turns`/`--max-hours`. @@ -63,6 +64,7 @@ The direct `agon ` commands stream human output. For JSONL lifecycle + out - explore, no decision needed -> campfire - think before acting / decompose -> think - pressure-test your own decision -> nero +- check a live page / a design in your browser -> chrome - judge existing code -> review - drive a whole task queue to done -> goal - build a whole open-ended thing -> conquer diff --git a/packages/cli/src/commands/chrome.ts b/packages/cli/src/commands/chrome.ts new file mode 100644 index 00000000..317c42c6 --- /dev/null +++ b/packages/cli/src/commands/chrome.ts @@ -0,0 +1,2 @@ +// Re-export from KERN-generated chrome command (source: kern/commands/chrome.kern) +export { chromeCommand, runChrome } from '../generated/commands/chrome.js'; diff --git a/packages/cli/src/generated/blocks/results-formatter.ts b/packages/cli/src/generated/blocks/results-formatter.ts index 2c263dda..3d109f00 100644 --- a/packages/cli/src/generated/blocks/results-formatter.ts +++ b/packages/cli/src/generated/blocks/results-formatter.ts @@ -1,6 +1,6 @@ // @generated by kern v4.0.0 — DO NOT EDIT. Source: src/kern/blocks/results-formatter.kern -import type { SessionResult, BrainstormResultData, CampfireResultData, TribunalResultData, ForgeResultData, ThinkResultData, CouncilResultData, SynthesisResultData, NeroResultData, ReviewResultData, ResearchResultData, ChatSession } from '@kernlang/agon-core'; +import type { SessionResult, BrainstormResultData, CampfireResultData, TribunalResultData, ForgeResultData, ThinkResultData, CouncilResultData, SynthesisResultData, NeroResultData, ReviewResultData, ResearchResultData, ChromeResultData, ChatSession } from '@kernlang/agon-core'; // @kern-source: results-formatter:3 export const BOLD: string = '\x1b[1m'; @@ -263,6 +263,22 @@ export function formatResearch(r: SessionResult, idx: number): string { } // @kern-source: results-formatter:227 +export function formatChrome(r: SessionResult, idx: number): string { + const data = r.data as ChromeResultData; + const lines: string[] = []; + lines.push(`${BOLD}${CYAN}${RULE}${RESET}`); + lines.push(`${BOLD} CHROME #${idx} ${DIM}· ${formatTime(r.timestamp)} · ${RESET}${BOLD}"${r.question}"${RESET}`); + lines.push(`${DIM} Engine: ${data.engineId}${data.pageActivity ? ' · drove the page' : ' · text only (no page tools ran)'}${RESET}`); + lines.push(`${BOLD}${CYAN}${RULE}${RESET}`); + lines.push(''); + if (data.answer) { + lines.push(data.answer); + lines.push(''); + } + return lines.join('\n'); +} + +// @kern-source: results-formatter:241 export function formatReview(r: SessionResult, idx: number): string { const data = r.data as ReviewResultData; const lines: string[] = []; @@ -286,10 +302,10 @@ export function formatReview(r: SessionResult, idx: number): string { return lines.join('\n'); } -// @kern-source: results-formatter:248 +// @kern-source: results-formatter:262 export function formatSessionResults(results: SessionResult[]): string { if (results.length === 0) { - return `${DIM}No results in this session yet. Run /brainstorm, /campfire, /tribunal, /forge, /think, /council, /synthesis, /nero, /research, or /review first.${RESET}\n`; + return `${DIM}No results in this session yet. Run /brainstorm, /campfire, /tribunal, /forge, /think, /council, /synthesis, /nero, /research, /chrome, or /review first.${RESET}\n`; } const sections: string[] = []; @@ -308,6 +324,7 @@ export function formatSessionResults(results: SessionResult[]): string { case 'synthesis': sections.push(formatSynthesis(r, idx)); break; case 'nero': sections.push(formatNero(r, idx)); break; case 'research': sections.push(formatResearch(r, idx)); break; + case 'chrome': sections.push(formatChrome(r, idx)); break; case 'review': sections.push(formatReview(r, idx)); break; } } diff --git a/packages/cli/src/generated/bridge/agentic-brain-client.ts b/packages/cli/src/generated/bridge/agentic-brain-client.ts index bfe18df0..4edd0239 100644 --- a/packages/cli/src/generated/bridge/agentic-brain-client.ts +++ b/packages/cli/src/generated/bridge/agentic-brain-client.ts @@ -104,6 +104,14 @@ export function buildAgentSystemPrompt(tools: CapabilitySpec[], base?: string): 'TOOL LINE NOW. One tool per step — you then get its result and may call another tool.', 'Read the page before acting on it. Never fabricate a result or claim an action you', 'did not take. Reply in prose ONLY when giving the user your FINAL answer (no tool line).', + '', + 'BE PROACTIVE — take initiative. When the user asks you to DO something on the web —', + 'search for / find / look up something, open or navigate to a site, fill a field, click —', + 'ACT by calling tools; the user wants it DONE, not described, and does NOT want to spell', + 'out each step. Drive it yourself across multiple steps (read → act → read → act) until', + 'the goal is reached. Need a site you are not on? navigate there, then readPage, then act.', + 'This holds in EVERY language: a request written in German/French/etc. STILL requires tool', + 'lines — never answer a "do X" request with only a prose promise to do it.', ); return lines.join('\n'); } @@ -111,7 +119,7 @@ export function buildAgentSystemPrompt(tools: CapabilitySpec[], base?: string): /** * The growing ReAct transcript re-sent to the (stateless, exec-mode) engine each step: the original request plus every prior tool call and its result, ending with a nudge to act or answer. */ -// @kern-source: agentic-brain-client:119 +// @kern-source: agentic-brain-client:127 export function renderAgentTranscript(userInput: string, steps: Array<{ name: string; input: Record; output: string }>): string { const lines: string[] = [`User request: ${userInput}`, '']; if (steps.length === 0) { @@ -128,21 +136,31 @@ export function renderAgentTranscript(userInput: string, steps: Array<{ name: st } /** - * True when an engine's reply DESCRIBES an action ('Let me navigate…', 'I'll click…') but emitted no tool call — a short intent preamble, not a final answer. Used to NUDGE the engine to actually emit the tool line instead of treating the narration as a final answer (the common weak-engine failure). Looks only at the head + caps length to avoid matching a real prose answer that happens to say 'review' or 'let me know'. + * True when an engine's reply DESCRIBES an action ('Let me navigate…', 'Ich suche…') but emitted no tool call — a short intent preamble, not a final answer — so the loop can NUDGE it to actually emit the tool line (the common weak-engine failure: the brain says it will act, then stops). Covers English + German narration, the panel's two languages; other languages degrade to no-nudge (the proactivity system prompt still pushes the model to act in any language — this is only the backstop). Bounded to short replies so a real prose answer that mentions 'review'/'search' isn't misread as narration. */ -// @kern-source: agentic-brain-client:136 +// @kern-source: agentic-brain-client:144 export function looksLikeActionIntent(text: string): boolean { - const head = text.trim().slice(0, 200).toLowerCase(); + const s = text.trim(); + if (s.length === 0 || s.length >= 500) return false; // a substantive reply is a real answer, not a preamble + const head = s.slice(0, 200).toLowerCase(); if (/\blet me know\b/.test(head)) return false; // a sign-off, not an action - const intent = /\b(let me|i'?ll|i will|i'?m going to|i am going to|let's|first,? i|now i'?ll|going to)\b/.test(head); - const verb = /\b(navigat|click|type|go to|open|fill|select|read|review|look at|check|press|scroll|enter|search|find)\b/.test(head); - return intent && verb && text.trim().length < 500; + // English: a first-person "about to act" opener AND an action verb. Verb stems anchor at + // the word START only (no trailing \b), so a stem matches its inflections (navigat→navigate). + const enIntent = /\b(let me|i'?ll|i will|i'?m going to|i am going to|let's|now i'?ll|going to)\b/.test(head); + const enVerb = /\b(navigat|go to|open|fill|select|read|review|look at|check|press|scroll|click|type|search|find|brows)/.test(head); + // German: a first-person SELF-action ("ich suche/navigiere/öffne…", "starte ich"). Listed as + // explicit "ich " pairs (not a bare adverb or a stem) so opinion phrasing like + // "ich finde X gut" (I think X is good) and "jetzt sieht …" (now … looks) are NOT misread as + // intent. ASCII-anchored before "ich", which sidesteps the \b-vs-umlaut problem (\b fails + // before 'ö', so a bare "\böffne" never matched). + const deAction = /\b(ich\s+(werde|möchte|gehe|navigiere|klicke|tippe|öffne|lese|suche|scrolle|wähle|prüfe|fülle|drücke|starte|beginne|rufe|schaue)|starte ich|beginne ich|lass mich)\b/.test(head); + return (enIntent && enVerb) || deAction; } /** * A compact, human-readable one-liner for the approval popup's `command` field — what the agent is about to do. */ -// @kern-source: agentic-brain-client:146 +// @kern-source: agentic-brain-client:164 export function describeAgentAction(name: string, input: Record): string { let arg = ''; try { arg = JSON.stringify(input); } catch { arg = '{…}'; } @@ -153,7 +171,7 @@ export function describeAgentAction(name: string, input: Record /** * v2 BrainClient: a bounded ReAct tool-loop over one engine, with client-lent capabilities (registerCapability) the brain pulls mid-turn via capability-request, and a per-action approval gate for destructive tools. Construct with the daemon's EngineRegistry; open() binds engine/cwd; runTurn() drives the loop; provideCapabilityResult/provideApproval answer the *-request events by requestId. */ -// @kern-source: agentic-brain-client:157 +// @kern-source: agentic-brain-client:175 export class AgenticTurnBrainClient implements BrainClient { private registry: EngineRegistry; private adapter: EngineAdapter; @@ -496,7 +514,7 @@ export class AgenticTurnBrainClient implements BrainClient { /** * Factory mirroring createHeadlessTurnBrainClient: build the v2 agentic tool-loop BrainClient from the daemon's EngineRegistry. */ -// @kern-source: agentic-brain-client:522 +// @kern-source: agentic-brain-client:540 export function createAgenticTurnBrainClient(registry: EngineRegistry): BrainClient { return new AgenticTurnBrainClient(registry); } diff --git a/packages/cli/src/generated/commands/agent-guide-text.ts b/packages/cli/src/generated/commands/agent-guide-text.ts index dc8a3dc9..4dc6fb9d 100644 --- a/packages/cli/src/generated/commands/agent-guide-text.ts +++ b/packages/cli/src/generated/commands/agent-guide-text.ts @@ -21,6 +21,7 @@ export function agentGuideMarkdown(): string { '- `agon think "" [--strategy linear|reflexion] [--steps N] [--branches N]` — sequential thinking: decompose a problem into structured thoughts (reflexion forces a self-critique+revision per step; --branches explores alternatives), surface open questions, and emit a refined spec to hand to `agon goal`. Use to think before acting, or for engines with weak built-in reasoning.', '- `agon nero "" [--reasoning ""] [--focus ""] [--confidence N]` — adversarial self-challenge. The top-rated CRITIC (picked by tribunal-discipline rating, not the best builder) attacks your decision with concrete failure scenarios and returns a verdict (FLAWED | PROCEED WITH CAUTION | SOUND) plus its own confidence the original is correct. Use this INSTEAD of an internal evil-twin / devil\'s-advocate pass — it gives you a real second model, not your own reasoning mirrored.', '- `agon research "" [--count N] [--engine ]` — keyless, web-grounded, CITED research. Agon (not the model) discovers sources via first-party endpoints that need NO API key — npm registry, GitHub repo search, MDN, IETF/RFC datatracker, Stack Overflow, Wikipedia — WebFetches them, an engine drafts an answer grounded ONLY in that content with inline [n] citations, and Agon then re-fetches and VERIFIES every citation (rejecting dead/redirected/mismatched URLs). Use it to look up a library/repo/spec/Q&A/encyclopedic fact and get an answer with sources you can trust. A truly-general web query (no keyless lane) reports that rather than guessing.', + '- `agon chrome ""` (machine: `agon call chrome "" [--auto-approve] [--engine ]`) — drive the USER\'s own browser to research or check a page design (navigate / read / screenshot / click / type). Reuses a running agon (serve/REPL) the side panel is attached to, else embeds a transient bridge just for the run. Read-only page tools need no approval; add `--auto-approve` so a non-interactive caller can perform page-changing actions (without it they\'re denied). Requires the Agon browser extension with its side panel open + attached; with no panel attached the brain answers in text only.', '- `agon review ` — non-interactive multi-engine code review.', '- `agon goal "" --queue --gate ""` — autonomous controller: drives a task queue to completion unattended, looping build -> witness -> gate -> review (panel + judge) -> fix -> commit per task on a goal/ branch. Bound it with `--max-hours`/`--budget`; `--push` pushes each task. Long-running (designed for 8-24h).', '- `agon conquer "" --gate ""` — supervised-autonomous BUILD of an OPEN-ENDED task. Cesar drives a pluggable builder CLI (codex/claude/agy) in agent mode turn by turn; when the builder hits a fork it asks and Cesar convenes the cheapest sufficient consult (nero/tribunal/brainstorm/council) and feeds back a compact verdict; when it claims done, a layered done-oracle runs (the `--gate` command + diff acceptance-drift + a nero falsification round) and it STOPS at a HUMAN merge gate — it never auto-merges to main. The open-ended sibling to `goal`: use `conquer` when you CANNOT write a clean discriminating oracle up front (build a whole tool), `goal` when you can. `--push` needs a clean tree; bound it with `--max-turns`/`--max-hours`.', @@ -67,6 +68,7 @@ export function agentGuideMarkdown(): string { '- explore, no decision needed -> campfire', '- think before acting / decompose -> think', '- pressure-test your own decision -> nero', + '- check a live page / a design in your browser -> chrome', '- judge existing code -> review', '- drive a whole task queue to done -> goal', '- build a whole open-ended thing -> conquer', @@ -80,7 +82,7 @@ export function agentGuideMarkdown(): string { /** * docs/modes.md content — the agent guide re-emitted as a docs-corpus page so RAG ('agon rag query', the ProjectContext MCP tool, --ground) answers mode questions like 'tribunal vs council' with citations. The guide stays the single source of truth; regenerate with npm run docs:modes. */ -// @kern-source: agent-guide-text:82 +// @kern-source: agent-guide-text:84 export function modeDocsMarkdown(): string { return [ '', @@ -95,10 +97,10 @@ export function modeDocsMarkdown(): string { /** * Per-CLI /agon slash-command shim. format is one of agy | claude | markdown. */ -// @kern-source: agent-guide-text:95 +// @kern-source: agent-guide-text:97 export function agonShim(format: string): string { const body = [ - 'You have access to Agon, a multi-AI orchestration CLI (forge, synthesis, brainstorm, tribunal, council, campfire, think, nero, research, review, goal, conquer).', + 'You have access to Agon, a multi-AI orchestration CLI (forge, synthesis, brainstorm, tribunal, council, campfire, think, nero, research, chrome, review, goal, conquer).', 'First run `agon agent-guide` in the shell to see exactly how to call it, then use the right Agon mode to handle the request.', 'Call agon with your normal shell/Bash tool — there is no MCP and nothing is loaded until you invoke it.', ].join('\n'); @@ -143,7 +145,7 @@ export function agonShim(format: string): string { /** * Native Codex skill that exposes Agon as $agon. */ -// @kern-source: agent-guide-text:141 +// @kern-source: agent-guide-text:143 export function codexSkillMarkdown(): string { return [ '---', @@ -172,6 +174,7 @@ export function codexSkillMarkdown(): string { '- `campfire`: open multi-engine discussion.', '- `think`: sequential thinking — decompose a problem (optionally branch) and surface open questions before acting.', '- `nero`: adversarial self-challenge — the top-rated critic attacks a decision and returns a verdict (FLAWED / PROCEED WITH CAUTION / SOUND).', + '- `chrome`: drive the user\'s own browser (navigate/read/screenshot/click) for research or checking a page design; needs the Agon browser extension. Machine form: `agon call chrome "" [--auto-approve]`.', '- `review`: non-interactive AI review of a diff target.', '- `goal`: autonomous task-queue execution with stronger gates.', '', @@ -181,7 +184,7 @@ export function codexSkillMarkdown(): string { /** * Codex UI metadata for the Agon skill. */ -// @kern-source: agent-guide-text:177 +// @kern-source: agent-guide-text:180 export function codexSkillOpenAiYaml(): string { return [ 'interface:', diff --git a/packages/cli/src/generated/commands/agent-guide.ts b/packages/cli/src/generated/commands/agent-guide.ts index 7abc16f9..376ad90a 100644 --- a/packages/cli/src/generated/commands/agent-guide.ts +++ b/packages/cli/src/generated/commands/agent-guide.ts @@ -39,6 +39,7 @@ export const agentGuideCommand: any = defineCommand({ { name: 'think', cmd: 'agon think "" --strategy reflexion [--steps N --branches N]', use: 'sequential thinking — decompose a problem (optionally branch) and surface open questions before acting' }, { name: 'nero', cmd: 'agon nero "" --reasoning ""', use: 'adversarial self-challenge — the top-rated critic attacks YOUR decision and returns a verdict. Use this instead of an internal evil-twin/devils-advocate pass' }, { name: 'research', cmd: 'agon research "" [--count N] [--engine ]', use: 'keyless web-grounded, CITED research — Agon discovers sources (npm/GitHub/MDN/IETF/Stack Overflow/Wikipedia, no API key), an engine drafts an answer grounded only in them, and Agon re-fetches and VERIFIES every citation. Use to look up a library/repo/spec/Q&A/fact with sources you can trust' }, + { name: 'chrome', cmd: 'agon call chrome "" [--auto-approve] [--engine ] (direct: agon chrome "")', use: "drive the USER's own browser for research / checking a page design — reuses a running agon (serve/REPL) the side panel is attached to, else embeds a transient bridge for the run. Read-only page tools (navigate/read/screenshot) need no approval; --auto-approve lets a non-interactive caller perform page-changing actions (click/type). Requires the Agon browser extension with its side panel open" }, { name: 'review', cmd: 'agon review uncommitted', use: 'multi-engine code review' }, { name: 'goal', cmd: 'agon goal "" --queue --gate ""', use: 'autonomously drive a task queue to completion (build->review->fix->commit per task); long-running', setup: 'each task verify + --gate IS the spec: --dryRun, gate green at base, each slice RED-at-base for the right reason. Add --oracle-gate=warn|strict to have the panel try to GAME each verify before forging (strict refuses to launch if any verify is gameable) — the built-in oracle red-team.' }, { name: 'conquer', cmd: 'agon conquer "" --gate ""', use: 'supervised-autonomous BUILD of an OPEN-ENDED task: Cesar drives a pluggable builder CLI (codex/claude/agy) unattended; on a fork the builder asks and Cesar convenes the cheapest consult (nero/tribunal/brainstorm/council); on done a layered oracle runs and it STOPS at a human merge gate (never auto-merges). The open-ended sibling to goal — use it when you cannot write a clean discriminating oracle up front.', setup: 'the --gate command IS the done-oracle floor (L0). Runs in cwd; --push needs a clean tree at start.' }, diff --git a/packages/cli/src/generated/commands/call.ts b/packages/cli/src/generated/commands/call.ts index 6c0330bb..431840de 100644 --- a/packages/cli/src/generated/commands/call.ts +++ b/packages/cli/src/generated/commands/call.ts @@ -39,21 +39,22 @@ export interface CallCommandOptions { oracleGate?: string; count?: string; engine?: string; + autoApprove?: boolean; } -// @kern-source: call:36 +// @kern-source: call:37 export interface BuiltCallCommands { cwd: string; commands: string[][]; } -// @kern-source: call:40 +// @kern-source: call:41 export function textFlag(flag: string, value: string|undefined): string[] { const text = value?.trim(); return text ? [flag, text] : []; } -// @kern-source: call:46 +// @kern-source: call:47 export function requireInput(workflow: string, input: string|undefined): string { const text = input?.trim(); if (!text) { @@ -62,7 +63,7 @@ export function requireInput(workflow: string, input: string|undefined): string return text; } -// @kern-source: call:55 +// @kern-source: call:56 export function exitWithFailure(message: string): never { fail(message); process.exit(1); @@ -72,7 +73,7 @@ export function exitWithFailure(message: string): never { /** * Enforce the HARD removedEngines denylist at the external-CLI boundary, BEFORE any --engines list is forwarded to a subcommand. Without this, an external CLI (Codex/Antigravity) that passes --engines a,b, would resurrect a hard-removed engine, since explicit -e lists bypass the registry's auto roster. Fails loudly (pre-run error) rather than silently dropping — silent roster rewrite is the trust hazard (Council batch-2 verdict). */ -// @kern-source: call:62 +// @kern-source: call:63 export function assertNoRemovedEngines(enginesCsv: string|undefined): void { const text = enginesCsv?.trim(); if (!text) return; @@ -92,12 +93,12 @@ export function assertNoRemovedEngines(enginesCsv: string|undefined): void { } } -// @kern-source: call:83 +// @kern-source: call:84 export function normalizeCallWorkflow(workflow: string): string { return workflow.trim().toLowerCase().replace(/_/g, '-'); } -// @kern-source: call:88 +// @kern-source: call:89 export function buildCallCommands(opts: CallCommandOptions): BuiltCallCommands { const workflow = normalizeCallWorkflow(opts.workflow); const cwd = opts.cwd?.trim() || process.cwd(); @@ -219,6 +220,19 @@ export function buildCallCommands(opts: CallCommandOptions): BuiltCallCommands { ...timeout, ...engines, ]); + } else if (workflow === 'chrome') { + // Drive the user's browser through a running agon (serve/REPL) the side panel is + // attached to, else a transient embedded bridge. Read-only page tools (read/ + // screenshot) need no approval; --auto-approve lets a non-interactive external CLI + // perform page-changing actions (click/type) unattended — without it they're denied. + // --engine is the per-turn engine override (singular; chrome takes no --engines pool). + const task = requireInput(workflow, opts.input); + commands.push([ + 'chrome', + task, + ...(opts.autoApprove ? ['--auto-approve'] : []), + ...textFlag('--engine', opts.engine), + ]); } else if (workflow === 'review') { commands.push(['review', opts.input?.trim() || 'uncommitted', ...timeout, ...engines]); } else if (workflow === 'goal') { @@ -264,18 +278,18 @@ export function buildCallCommands(opts: CallCommandOptions): BuiltCallCommands { commands.push(['forge', task, '--test', fitness, '--cwd', cwd, ...timeout, ...engines]); commands.push(['tribunal', `Review the pipeline result for: ${task}`, '--rounds', opts.rounds?.trim() || '1', ...tribunalMode, ...timeout, ...engines]); } else { - throw new Error(`Unknown call workflow: ${opts.workflow}. Use forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, research, conquer, pipeline, review, goal, doctor, or a team-* workflow.`); + throw new Error(`Unknown call workflow: ${opts.workflow}. Use forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, research, conquer, chrome, pipeline, review, goal, doctor, or a team-* workflow.`); } return { cwd, commands }; } -// @kern-source: call:261 +// @kern-source: call:275 export function writeJsonl(event: Record): void { process.stdout.write(`${JSON.stringify({ ...event, timestamp: new Date().toISOString() })}\n`); } -// @kern-source: call:266 +// @kern-source: call:280 export async function runCommand(command: string, args: string[], cwd: string, jsonl: boolean): Promise { return new Promise((resolve) => { const startedAt = Date.now(); @@ -319,7 +333,7 @@ export async function runCommand(command: string, args: string[], cwd: string, j }); } -// @kern-source: call:310 +// @kern-source: call:324 export const callCommand: any = defineCommand({ meta: { name: 'call', @@ -328,7 +342,7 @@ export const callCommand: any = defineCommand({ args: { workflow: { type: 'positional', - description: 'Workflow: forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, conquer, pipeline, review, goal, doctor, or team-*', + description: 'Workflow: forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, research, conquer, chrome, pipeline, review, goal, doctor, or team-*', required: true, }, input: { @@ -445,7 +459,12 @@ export const callCommand: any = defineCommand({ }, engine: { type: 'string', - description: 'For research: force a specific engine to draft the answer (skips rating-based selection)', + description: 'For research/chrome: force a specific engine (research: draft the answer; chrome: per-turn engine override)', + }, + 'auto-approve': { + type: 'boolean', + description: 'For chrome: auto-approve page-changing actions (click/type) unattended — required for a non-interactive caller to act on the page (read-only tools never prompt)', + default: false, }, }, async run({ args }) { @@ -480,6 +499,7 @@ export const callCommand: any = defineCommand({ oracleGate: args.oracleGate, count: args.count, engine: args.engine, + autoApprove: args['auto-approve'], }); } catch (err) { exitWithFailure(err instanceof Error ? err.message : String(err)); diff --git a/packages/cli/src/generated/commands/chrome.ts b/packages/cli/src/generated/commands/chrome.ts new file mode 100644 index 00000000..75013c6f --- /dev/null +++ b/packages/cli/src/generated/commands/chrome.ts @@ -0,0 +1,82 @@ +// @generated by kern v4.0.0 — DO NOT EDIT. Source: src/kern/commands/chrome.kern + +import { defineCommand } from 'citty'; + +import { ensureAgonHome, resolveWorkingDir } from '@kernlang/agon-core'; + +import { ensureChromeBridge, closeChromeBridge } from '../signals/chrome-bridge.js'; + +import { runDrive } from './drive.js'; + +import { info, warn, dim } from '../blocks/output-format.js'; + +/** + * Drive one terminal browser-turn for `agon chrome `: resolve the bridge via ensureChromeBridge (reuse a live serve/REPL bridge, else embed a transient in-process one), then hand the bridge's url+token to the battle-tested runDrive loop (terminal render + readline approval). If we EMBEDDED the bridge, tear it down afterward (closeChromeBridge); a REUSED live bridge is left running. The embedded brain is the agentic ReAct brain — never Cesar — so a browser turn can't write the live session. + */ +// @kern-source: chrome:31 +export async function runChrome(opts: { task: string; engine?: string; autoApprove: boolean }): Promise { + ensureAgonHome(); + const cwd = resolveWorkingDir(); + try { + const bridge = await ensureChromeBridge(cwd); + if (bridge.embedded && bridge.origins.length === 0) { + warn('Started an embedded bridge but no browser Origin is allowed yet — the side panel can’t attach, so this runs without page tools. Set one: `agon config chromeExtensionOrigin chrome-extension://`.'); + } else if (bridge.embedded) { + info(dim('Embedded a bridge for this run — open the Agon side panel; it auto-connects.')); + } + await runDrive({ + prompt: opts.task, + url: bridge.url, + token: bridge.token, + engine: opts.engine, + autoApprove: opts.autoApprove, + }); + } finally { + // closeChromeBridge tears down ONLY an embedded bridge we own — it reads the module + // holder, which a REUSED live serve/REPL never sets, so this is a no-op there. Calling + // it unconditionally in finally also cleans up if ensureChromeBridge threw *after* + // starting the embedded server, so an error path can't leak the loopback bridge. + await closeChromeBridge(); + } +} + +// @kern-source: chrome:59 +export const chromeCommand: any = defineCommand({ + meta: { + name: 'chrome', + description: 'Drive your browser from the terminal — reuses a running agon (serve/REPL) the side panel is on, or embeds a transient bridge (research, check a page design, navigate/read/screenshot)', + }, + args: { + task: { + type: 'positional', + description: 'What to do, e.g. "open the pricing page and tell me if the hero is cluttered"', + required: false, + }, + engine: { + type: 'string', + alias: 'e', + description: 'Per-turn engine override (defaults to the bridge session engine)', + required: false, + }, + 'auto-approve': { + type: 'boolean', + description: "Auto-approve page-changing actions (unattended). Off = prompt in the terminal.", + required: false, + }, + }, + async run({ args }: { args: { task?: string; engine?: string; 'auto-approve'?: boolean } }) { + ensureAgonHome(); + const task = typeof args.task === 'string' ? args.task.trim() : ''; + if (!task) { + info('Usage: `agon chrome ""` — drives your browser through a running agon (or a transient embedded bridge) + the open side panel.'); + info(dim(' e.g. agon chrome "open news.ycombinator.com and summarize the top 3 posts"')); + info(dim(' --auto-approve act without prompting -e per-turn engine')); + return; + } + await runChrome({ + task, + engine: typeof args.engine === 'string' ? args.engine : undefined, + autoApprove: args['auto-approve'] === true, + }); + }, +}); diff --git a/packages/cli/src/generated/handlers/chrome.ts b/packages/cli/src/generated/handlers/chrome.ts index 478f919d..e31fe5c5 100644 --- a/packages/cli/src/generated/handlers/chrome.ts +++ b/packages/cli/src/generated/handlers/chrome.ts @@ -8,6 +8,8 @@ import { ensureChromeBridge } from '../signals/chrome-bridge.js'; import { parseSseChunk, approvalTargetsClient } from '../commands/drive.js'; +import { sessionResultStore } from '../models/session-results.js'; + import { ENGINE_COLORS } from '../blocks/output-format.js'; import { recordRun } from '../../telemetry/index.js'; @@ -17,7 +19,7 @@ import type { Dispatch, HandlerContext } from '../../handlers/types.js'; /** * Drive one browser-agent turn for `/chrome `: resolve/embed the bridge, render the turn inline via dispatch, approve page actions via the REPL permission-ask UI, and append the answer to the chat session for Cesar. Mirrors `agon drive`'s two-connection client (blocking /send + SSE tail) but with REPL-native render + approval. Returns true only when the turn produced an answer — the dispatch layer gates the Cesar continuation on it so a no-result turn never makes Cesar summarize stale context. */ -// @kern-source: chrome:21 +// @kern-source: chrome:22 export async function handleChrome(task: string, dispatch: Dispatch, ctx: HandlerContext): Promise { const chrAbort = new AbortController(); try { @@ -170,6 +172,17 @@ export async function handleChrome(task: string, dispatch: Dispatch, ctx: Handle const produced = answer.trim().length > 0; if (produced) { appendMessage(ctx.chatSession, { role: 'engine', engineId: answerEngine || 'agon', content: answer, timestamp: new Date().toISOString() }); + // Record for the Ctrl+R results pager / `agon last` — the full browser-turn + // answer lives here so the transcript stays compact. pageActivity reflects whether + // the panel actually drove the page (vs a text-only answer with no panel attached). + sessionResultStore.add({ + type: 'chrome', + timestamp: new Date().toISOString(), + question: task, + engines: answerEngine ? [answerEngine] : [], + winner: answerEngine || null, + data: { task, answer, engineId: answerEngine || 'agon', pageActivity: sawBrowserActivity }, + }); } const ok = produced && !failed && !streamFailed && !endedReason; recordRun({ mode: 'chrome', intent: task, winner: undefined, success: ok, durationMs: Date.now() - startTime, engineIds: answerEngine ? [answerEngine] : [], completionState: (failed || endedReason) ? 'aborted' : 'completed' }); diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts index 6cf96d5e..7aed510c 100644 --- a/packages/cli/src/index.ts +++ b/packages/cli/src/index.ts @@ -34,6 +34,7 @@ import { attachCommand } from './commands/attach.js'; import { daemonCommand } from './commands/daemon.js'; import { serveCommand } from './commands/serve.js'; import { driveCommand } from './commands/drive.js'; +import { chromeCommand } from './commands/chrome.js'; import { extCommand } from './commands/ext.js'; import { loginCommand } from './commands/login.js'; import { updateCommand } from './commands/update.js'; @@ -232,6 +233,7 @@ const main = defineCommand({ daemon: daemonCommand, serve: serveCommand, drive: driveCommand, + chrome: chromeCommand, ext: extCommand, login: loginCommand, update: updateCommand, diff --git a/packages/cli/src/kern/blocks/results-formatter.kern b/packages/cli/src/kern/blocks/results-formatter.kern index c9360847..d50cdc26 100644 --- a/packages/cli/src/kern/blocks/results-formatter.kern +++ b/packages/cli/src/kern/blocks/results-formatter.kern @@ -1,4 +1,4 @@ -import from="@kernlang/agon-core" names="SessionResult,BrainstormResultData,CampfireResultData,TribunalResultData,ForgeResultData,ThinkResultData,CouncilResultData,SynthesisResultData,NeroResultData,ReviewResultData,ResearchResultData,ChatSession" types=true +import from="@kernlang/agon-core" names="SessionResult,BrainstormResultData,CampfireResultData,TribunalResultData,ForgeResultData,ThinkResultData,CouncilResultData,SynthesisResultData,NeroResultData,ReviewResultData,ResearchResultData,ChromeResultData,ChatSession" types=true const name=BOLD type=string value={{ '\x1b[1m' }} @@ -224,6 +224,20 @@ fn name=formatResearch params="r:SessionResult, idx:number" returns=string do value="lines.push('')" return value="lines.join('\\n')" +fn name=formatChrome params="r:SessionResult, idx:number" returns=string + handler lang="kern" + let name=data value="r.data as ChromeResultData" + let name=lines type="string[]" value="[]" + do value="lines.push(`${BOLD}${CYAN}${RULE}${RESET}`)" + do value="lines.push(`${BOLD} CHROME #${idx} ${DIM}· ${formatTime(r.timestamp)} · ${RESET}${BOLD}\"${r.question}\"${RESET}`)" + do value="lines.push(`${DIM} Engine: ${data.engineId}${data.pageActivity ? ' · drove the page' : ' · text only (no page tools ran)'}${RESET}`)" + do value="lines.push(`${BOLD}${CYAN}${RULE}${RESET}`)" + do value="lines.push('')" + if cond="data.answer" + do value="lines.push(data.answer)" + do value="lines.push('')" + return value="lines.join('\\n')" + fn name=formatReview params="r:SessionResult, idx:number" returns=string handler lang="kern" let name=data value="r.data as ReviewResultData" @@ -248,7 +262,7 @@ fn name=formatReview params="r:SessionResult, idx:number" returns=string fn name=formatSessionResults params="results:SessionResult[]" returns=string export=true handler <<< if (results.length === 0) { - return `${DIM}No results in this session yet. Run /brainstorm, /campfire, /tribunal, /forge, /think, /council, /synthesis, /nero, /research, or /review first.${RESET}\n`; + return `${DIM}No results in this session yet. Run /brainstorm, /campfire, /tribunal, /forge, /think, /council, /synthesis, /nero, /research, /chrome, or /review first.${RESET}\n`; } const sections: string[] = []; @@ -267,6 +281,7 @@ fn name=formatSessionResults params="results:SessionResult[]" returns=string exp case 'synthesis': sections.push(formatSynthesis(r, idx)); break; case 'nero': sections.push(formatNero(r, idx)); break; case 'research': sections.push(formatResearch(r, idx)); break; + case 'chrome': sections.push(formatChrome(r, idx)); break; case 'review': sections.push(formatReview(r, idx)); break; } } diff --git a/packages/cli/src/kern/bridge/agentic-brain-client.kern b/packages/cli/src/kern/bridge/agentic-brain-client.kern index 755ff7df..4e55c668 100644 --- a/packages/cli/src/kern/bridge/agentic-brain-client.kern +++ b/packages/cli/src/kern/bridge/agentic-brain-client.kern @@ -112,6 +112,14 @@ fn name=buildAgentSystemPrompt params="tools:CapabilitySpec[], base?:string" ret 'TOOL LINE NOW. One tool per step — you then get its result and may call another tool.', 'Read the page before acting on it. Never fabricate a result or claim an action you', 'did not take. Reply in prose ONLY when giving the user your FINAL answer (no tool line).', + '', + 'BE PROACTIVE — take initiative. When the user asks you to DO something on the web —', + 'search for / find / look up something, open or navigate to a site, fill a field, click —', + 'ACT by calling tools; the user wants it DONE, not described, and does NOT want to spell', + 'out each step. Drive it yourself across multiple steps (read → act → read → act) until', + 'the goal is reached. Need a site you are not on? navigate there, then readPage, then act.', + 'This holds in EVERY language: a request written in German/French/etc. STILL requires tool', + 'lines — never answer a "do X" request with only a prose promise to do it.', ); return lines.join('\n'); >>> @@ -134,13 +142,23 @@ fn name=renderAgentTranscript params="userInput:string, steps:Array<{ name: stri >>> fn name=looksLikeActionIntent params="text:string" returns="boolean" export=true - doc "True when an engine's reply DESCRIBES an action ('Let me navigate…', 'I'll click…') but emitted no tool call — a short intent preamble, not a final answer. Used to NUDGE the engine to actually emit the tool line instead of treating the narration as a final answer (the common weak-engine failure). Looks only at the head + caps length to avoid matching a real prose answer that happens to say 'review' or 'let me know'." + doc "True when an engine's reply DESCRIBES an action ('Let me navigate…', 'Ich suche…') but emitted no tool call — a short intent preamble, not a final answer — so the loop can NUDGE it to actually emit the tool line (the common weak-engine failure: the brain says it will act, then stops). Covers English + German narration, the panel's two languages; other languages degrade to no-nudge (the proactivity system prompt still pushes the model to act in any language — this is only the backstop). Bounded to short replies so a real prose answer that mentions 'review'/'search' isn't misread as narration." handler <<< - const head = text.trim().slice(0, 200).toLowerCase(); + const s = text.trim(); + if (s.length === 0 || s.length >= 500) return false; // a substantive reply is a real answer, not a preamble + const head = s.slice(0, 200).toLowerCase(); if (/\blet me know\b/.test(head)) return false; // a sign-off, not an action - const intent = /\b(let me|i'?ll|i will|i'?m going to|i am going to|let's|first,? i|now i'?ll|going to)\b/.test(head); - const verb = /\b(navigat|click|type|go to|open|fill|select|read|review|look at|check|press|scroll|enter|search|find)\b/.test(head); - return intent && verb && text.trim().length < 500; + // English: a first-person "about to act" opener AND an action verb. Verb stems anchor at + // the word START only (no trailing \b), so a stem matches its inflections (navigat→navigate). + const enIntent = /\b(let me|i'?ll|i will|i'?m going to|i am going to|let's|now i'?ll|going to)\b/.test(head); + const enVerb = /\b(navigat|go to|open|fill|select|read|review|look at|check|press|scroll|click|type|search|find|brows)/.test(head); + // German: a first-person SELF-action ("ich suche/navigiere/öffne…", "starte ich"). Listed as + // explicit "ich " pairs (not a bare adverb or a stem) so opinion phrasing like + // "ich finde X gut" (I think X is good) and "jetzt sieht …" (now … looks) are NOT misread as + // intent. ASCII-anchored before "ich", which sidesteps the \b-vs-umlaut problem (\b fails + // before 'ö', so a bare "\böffne" never matched). + const deAction = /\b(ich\s+(werde|möchte|gehe|navigiere|klicke|tippe|öffne|lese|suche|scrolle|wähle|prüfe|fülle|drücke|starte|beginne|rufe|schaue)|starte ich|beginne ich|lass mich)\b/.test(head); + return (enIntent && enVerb) || deAction; >>> fn name=describeAgentAction params="name:string, input:Record" returns="string" export=true diff --git a/packages/cli/src/kern/commands/agent-guide-text.kern b/packages/cli/src/kern/commands/agent-guide-text.kern index a1dc4c4c..5f66da9a 100644 --- a/packages/cli/src/kern/commands/agent-guide-text.kern +++ b/packages/cli/src/kern/commands/agent-guide-text.kern @@ -23,6 +23,7 @@ fn name=agentGuideMarkdown returns=string export=true '- `agon think "" [--strategy linear|reflexion] [--steps N] [--branches N]` — sequential thinking: decompose a problem into structured thoughts (reflexion forces a self-critique+revision per step; --branches explores alternatives), surface open questions, and emit a refined spec to hand to `agon goal`. Use to think before acting, or for engines with weak built-in reasoning.', '- `agon nero "" [--reasoning ""] [--focus ""] [--confidence N]` — adversarial self-challenge. The top-rated CRITIC (picked by tribunal-discipline rating, not the best builder) attacks your decision with concrete failure scenarios and returns a verdict (FLAWED | PROCEED WITH CAUTION | SOUND) plus its own confidence the original is correct. Use this INSTEAD of an internal evil-twin / devil\'s-advocate pass — it gives you a real second model, not your own reasoning mirrored.', '- `agon research "" [--count N] [--engine ]` — keyless, web-grounded, CITED research. Agon (not the model) discovers sources via first-party endpoints that need NO API key — npm registry, GitHub repo search, MDN, IETF/RFC datatracker, Stack Overflow, Wikipedia — WebFetches them, an engine drafts an answer grounded ONLY in that content with inline [n] citations, and Agon then re-fetches and VERIFIES every citation (rejecting dead/redirected/mismatched URLs). Use it to look up a library/repo/spec/Q&A/encyclopedic fact and get an answer with sources you can trust. A truly-general web query (no keyless lane) reports that rather than guessing.', + '- `agon chrome ""` (machine: `agon call chrome "" [--auto-approve] [--engine ]`) — drive the USER\'s own browser to research or check a page design (navigate / read / screenshot / click / type). Reuses a running agon (serve/REPL) the side panel is attached to, else embeds a transient bridge just for the run. Read-only page tools need no approval; add `--auto-approve` so a non-interactive caller can perform page-changing actions (without it they\'re denied). Requires the Agon browser extension with its side panel open + attached; with no panel attached the brain answers in text only.', '- `agon review ` — non-interactive multi-engine code review.', '- `agon goal "" --queue --gate ""` — autonomous controller: drives a task queue to completion unattended, looping build -> witness -> gate -> review (panel + judge) -> fix -> commit per task on a goal/ branch. Bound it with `--max-hours`/`--budget`; `--push` pushes each task. Long-running (designed for 8-24h).', '- `agon conquer "" --gate ""` — supervised-autonomous BUILD of an OPEN-ENDED task. Cesar drives a pluggable builder CLI (codex/claude/agy) in agent mode turn by turn; when the builder hits a fork it asks and Cesar convenes the cheapest sufficient consult (nero/tribunal/brainstorm/council) and feeds back a compact verdict; when it claims done, a layered done-oracle runs (the `--gate` command + diff acceptance-drift + a nero falsification round) and it STOPS at a HUMAN merge gate — it never auto-merges to main. The open-ended sibling to `goal`: use `conquer` when you CANNOT write a clean discriminating oracle up front (build a whole tool), `goal` when you can. `--push` needs a clean tree; bound it with `--max-turns`/`--max-hours`.', @@ -69,6 +70,7 @@ fn name=agentGuideMarkdown returns=string export=true '- explore, no decision needed -> campfire', '- think before acting / decompose -> think', '- pressure-test your own decision -> nero', + '- check a live page / a design in your browser -> chrome', '- judge existing code -> review', '- drive a whole task queue to done -> goal', '- build a whole open-ended thing -> conquer', @@ -96,7 +98,7 @@ fn name=agonShim params="format:string" returns=string export=true doc "Per-CLI /agon slash-command shim. format is one of agy | claude | markdown." handler <<< const body = [ - 'You have access to Agon, a multi-AI orchestration CLI (forge, synthesis, brainstorm, tribunal, council, campfire, think, nero, research, review, goal, conquer).', + 'You have access to Agon, a multi-AI orchestration CLI (forge, synthesis, brainstorm, tribunal, council, campfire, think, nero, research, chrome, review, goal, conquer).', 'First run `agon agent-guide` in the shell to see exactly how to call it, then use the right Agon mode to handle the request.', 'Call agon with your normal shell/Bash tool — there is no MCP and nothing is loaded until you invoke it.', ].join('\n'); @@ -168,6 +170,7 @@ fn name=codexSkillMarkdown returns=string export=true '- `campfire`: open multi-engine discussion.', '- `think`: sequential thinking — decompose a problem (optionally branch) and surface open questions before acting.', '- `nero`: adversarial self-challenge — the top-rated critic attacks a decision and returns a verdict (FLAWED / PROCEED WITH CAUTION / SOUND).', + '- `chrome`: drive the user\'s own browser (navigate/read/screenshot/click) for research or checking a page design; needs the Agon browser extension. Machine form: `agon call chrome "" [--auto-approve]`.', '- `review`: non-interactive AI review of a diff target.', '- `goal`: autonomous task-queue execution with stronger gates.', '', diff --git a/packages/cli/src/kern/commands/agent-guide.kern b/packages/cli/src/kern/commands/agent-guide.kern index 685ac3f5..138a1d96 100644 --- a/packages/cli/src/kern/commands/agent-guide.kern +++ b/packages/cli/src/kern/commands/agent-guide.kern @@ -44,6 +44,7 @@ const name=agentGuideCommand type=any { name: 'think', cmd: 'agon think "" --strategy reflexion [--steps N --branches N]', use: 'sequential thinking — decompose a problem (optionally branch) and surface open questions before acting' }, { name: 'nero', cmd: 'agon nero "" --reasoning ""', use: 'adversarial self-challenge — the top-rated critic attacks YOUR decision and returns a verdict. Use this instead of an internal evil-twin/devils-advocate pass' }, { name: 'research', cmd: 'agon research "" [--count N] [--engine ]', use: 'keyless web-grounded, CITED research — Agon discovers sources (npm/GitHub/MDN/IETF/Stack Overflow/Wikipedia, no API key), an engine drafts an answer grounded only in them, and Agon re-fetches and VERIFIES every citation. Use to look up a library/repo/spec/Q&A/fact with sources you can trust' }, + { name: 'chrome', cmd: 'agon call chrome "" [--auto-approve] [--engine ] (direct: agon chrome "")', use: "drive the USER's own browser for research / checking a page design — reuses a running agon (serve/REPL) the side panel is attached to, else embeds a transient bridge for the run. Read-only page tools (navigate/read/screenshot) need no approval; --auto-approve lets a non-interactive caller perform page-changing actions (click/type). Requires the Agon browser extension with its side panel open" }, { name: 'review', cmd: 'agon review uncommitted', use: 'multi-engine code review' }, { name: 'goal', cmd: 'agon goal "" --queue --gate ""', use: 'autonomously drive a task queue to completion (build->review->fix->commit per task); long-running', setup: 'each task verify + --gate IS the spec: --dryRun, gate green at base, each slice RED-at-base for the right reason. Add --oracle-gate=warn|strict to have the panel try to GAME each verify before forging (strict refuses to launch if any verify is gameable) — the built-in oracle red-team.' }, { name: 'conquer', cmd: 'agon conquer "" --gate ""', use: 'supervised-autonomous BUILD of an OPEN-ENDED task: Cesar drives a pluggable builder CLI (codex/claude/agy) unattended; on a fork the builder asks and Cesar convenes the cheapest consult (nero/tribunal/brainstorm/council); on done a layered oracle runs and it STOPS at a human merge gate (never auto-merges). The open-ended sibling to goal — use it when you cannot write a clean discriminating oracle up front.', setup: 'the --gate command IS the done-oracle floor (L0). Runs in cwd; --push needs a clean tree at start.' }, diff --git a/packages/cli/src/kern/commands/call.kern b/packages/cli/src/kern/commands/call.kern index 44322036..9160ef3c 100644 --- a/packages/cli/src/kern/commands/call.kern +++ b/packages/cli/src/kern/commands/call.kern @@ -32,6 +32,7 @@ interface name=CallCommandOptions export=true field name=oracleGate type=string optional=true field name=count type=string optional=true field name=engine type=string optional=true + field name=autoApprove type=boolean optional=true interface name=BuiltCallCommands export=true field name=cwd type=string @@ -207,6 +208,19 @@ fn name=buildCallCommands params="opts:CallCommandOptions" returns="BuiltCallCom ...timeout, ...engines, ]); + } else if (workflow === 'chrome') { + // Drive the user's browser through a running agon (serve/REPL) the side panel is + // attached to, else a transient embedded bridge. Read-only page tools (read/ + // screenshot) need no approval; --auto-approve lets a non-interactive external CLI + // perform page-changing actions (click/type) unattended — without it they're denied. + // --engine is the per-turn engine override (singular; chrome takes no --engines pool). + const task = requireInput(workflow, opts.input); + commands.push([ + 'chrome', + task, + ...(opts.autoApprove ? ['--auto-approve'] : []), + ...textFlag('--engine', opts.engine), + ]); } else if (workflow === 'review') { commands.push(['review', opts.input?.trim() || 'uncommitted', ...timeout, ...engines]); } else if (workflow === 'goal') { @@ -252,7 +266,7 @@ fn name=buildCallCommands params="opts:CallCommandOptions" returns="BuiltCallCom commands.push(['forge', task, '--test', fitness, '--cwd', cwd, ...timeout, ...engines]); commands.push(['tribunal', `Review the pipeline result for: ${task}`, '--rounds', opts.rounds?.trim() || '1', ...tribunalMode, ...timeout, ...engines]); } else { - throw new Error(`Unknown call workflow: ${opts.workflow}. Use forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, research, conquer, pipeline, review, goal, doctor, or a team-* workflow.`); + throw new Error(`Unknown call workflow: ${opts.workflow}. Use forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, research, conquer, chrome, pipeline, review, goal, doctor, or a team-* workflow.`); } return { cwd, commands }; @@ -317,7 +331,7 @@ const name=callCommand type="any" args: { workflow: { type: 'positional', - description: 'Workflow: forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, conquer, pipeline, review, goal, doctor, or team-*', + description: 'Workflow: forge, brainstorm, synthesis, tribunal, council, campfire, think, nero, research, conquer, chrome, pipeline, review, goal, doctor, or team-*', required: true, }, input: { @@ -434,7 +448,12 @@ const name=callCommand type="any" }, engine: { type: 'string', - description: 'For research: force a specific engine to draft the answer (skips rating-based selection)', + description: 'For research/chrome: force a specific engine (research: draft the answer; chrome: per-turn engine override)', + }, + 'auto-approve': { + type: 'boolean', + description: 'For chrome: auto-approve page-changing actions (click/type) unattended — required for a non-interactive caller to act on the page (read-only tools never prompt)', + default: false, }, }, async run({ args }) { @@ -469,6 +488,7 @@ const name=callCommand type="any" oracleGate: args.oracleGate, count: args.count, engine: args.engine, + autoApprove: args['auto-approve'], }); } catch (err) { exitWithFailure(err instanceof Error ? err.message : String(err)); diff --git a/packages/cli/src/kern/commands/chrome.kern b/packages/cli/src/kern/commands/chrome.kern new file mode 100644 index 00000000..71fcbe6c --- /dev/null +++ b/packages/cli/src/kern/commands/chrome.kern @@ -0,0 +1,100 @@ +// ── agon chrome — drive your BROWSER from the terminal, auto-opening a bridge ── +// +// The CLI sibling of the REPL's `/chrome`. Like `agon drive`, it drives ONE browser +// turn through the bridge the side panel is attached to — but it RESOLVES that bridge +// the same way `/chrome` does (signals/chrome-bridge.ensureChromeBridge): +// +// 1. REUSE a live `agon serve`/REPL chrome bridge if one is running (the panel is +// already on it), else +// 2. EMBED a transient in-process bridge for this run (the same agentic ReAct brain +// `agon serve` uses) and write its 0600 connection file so the panel auto-connects. +// +// So `agon chrome ""` Just Works when the user has agon + the extension — no +// separate `agon serve` terminal. Read-only page tools (read/screenshot) need no +// approval; page-changing actions prompt in the terminal (or --auto-approve to act +// unattended; a non-interactive caller denies them). When NOTHING is running and no +// panel attaches, the embedded brain answers in text only (warned). +// +// This is also what `agon call chrome` spawns, so external CLIs (Codex/Antigravity) +// can drive the browser through agon for research / design-checking. +// +// agon chrome "open the pricing page and tell me if the hero is cluttered" +// agon chrome "..." --auto-approve act without prompting (unattended) +// agon chrome "..." -e codex per-turn engine override + +import from="citty" names="defineCommand" +import from="@kernlang/agon-core" names="ensureAgonHome,resolveWorkingDir" +import from="../signals/chrome-bridge.js" names="ensureChromeBridge,closeChromeBridge" +import from="./drive.js" names="runDrive" +import from="../blocks/output-format.js" names="info,warn,dim" + +fn name=runChrome params="opts:{ task: string; engine?: string; autoApprove: boolean }" returns="Promise" async=true export=true + doc "Drive one terminal browser-turn for `agon chrome `: resolve the bridge via ensureChromeBridge (reuse a live serve/REPL bridge, else embed a transient in-process one), then hand the bridge's url+token to the battle-tested runDrive loop (terminal render + readline approval). If we EMBEDDED the bridge, tear it down afterward (closeChromeBridge); a REUSED live bridge is left running. The embedded brain is the agentic ReAct brain — never Cesar — so a browser turn can't write the live session." + handler <<< + ensureAgonHome(); + const cwd = resolveWorkingDir(); + try { + const bridge = await ensureChromeBridge(cwd); + if (bridge.embedded && bridge.origins.length === 0) { + warn('Started an embedded bridge but no browser Origin is allowed yet — the side panel can’t attach, so this runs without page tools. Set one: `agon config chromeExtensionOrigin chrome-extension://`.'); + } else if (bridge.embedded) { + info(dim('Embedded a bridge for this run — open the Agon side panel; it auto-connects.')); + } + await runDrive({ + prompt: opts.task, + url: bridge.url, + token: bridge.token, + engine: opts.engine, + autoApprove: opts.autoApprove, + }); + } finally { + // closeChromeBridge tears down ONLY an embedded bridge we own — it reads the module + // holder, which a REUSED live serve/REPL never sets, so this is a no-op there. Calling + // it unconditionally in finally also cleans up if ensureChromeBridge threw *after* + // starting the embedded server, so an error path can't leak the loopback bridge. + await closeChromeBridge(); + } + >>> + +const name=chromeCommand type=any + handler <<< + defineCommand({ + meta: { + name: 'chrome', + description: 'Drive your browser from the terminal — reuses a running agon (serve/REPL) the side panel is on, or embeds a transient bridge (research, check a page design, navigate/read/screenshot)', + }, + args: { + task: { + type: 'positional', + description: 'What to do, e.g. "open the pricing page and tell me if the hero is cluttered"', + required: false, + }, + engine: { + type: 'string', + alias: 'e', + description: 'Per-turn engine override (defaults to the bridge session engine)', + required: false, + }, + 'auto-approve': { + type: 'boolean', + description: "Auto-approve page-changing actions (unattended). Off = prompt in the terminal.", + required: false, + }, + }, + async run({ args }: { args: { task?: string; engine?: string; 'auto-approve'?: boolean } }) { + ensureAgonHome(); + const task = typeof args.task === 'string' ? args.task.trim() : ''; + if (!task) { + info('Usage: `agon chrome ""` — drives your browser through a running agon (or a transient embedded bridge) + the open side panel.'); + info(dim(' e.g. agon chrome "open news.ycombinator.com and summarize the top 3 posts"')); + info(dim(' --auto-approve act without prompting -e per-turn engine')); + return; + } + await runChrome({ + task, + engine: typeof args.engine === 'string' ? args.engine : undefined, + autoApprove: args['auto-approve'] === true, + }); + }, + }) + >>> diff --git a/packages/cli/src/kern/handlers/chrome.kern b/packages/cli/src/kern/handlers/chrome.kern index a16eef5c..2af143a3 100644 --- a/packages/cli/src/kern/handlers/chrome.kern +++ b/packages/cli/src/kern/handlers/chrome.kern @@ -14,6 +14,7 @@ import from="@kernlang/agon-core" names="ensureAgonHome,appendMessage,resolveWor import from="node:crypto" names="randomUUID" import from="../signals/chrome-bridge.js" names="ensureChromeBridge" import from="../commands/drive.js" names="parseSseChunk,approvalTargetsClient" +import from="../models/session-results.js" names="sessionResultStore" import from="../blocks/output-format.js" names="ENGINE_COLORS" import from="../../telemetry/index.js" names="recordRun" import from="../../handlers/types.js" names="Dispatch,HandlerContext" types=true @@ -171,6 +172,17 @@ fn name=handleChrome params="task:string, dispatch:Dispatch, ctx:HandlerContext" const produced = answer.trim().length > 0; if (produced) { appendMessage(ctx.chatSession, { role: 'engine', engineId: answerEngine || 'agon', content: answer, timestamp: new Date().toISOString() }); + // Record for the Ctrl+R results pager / `agon last` — the full browser-turn + // answer lives here so the transcript stays compact. pageActivity reflects whether + // the panel actually drove the page (vs a text-only answer with no panel attached). + sessionResultStore.add({ + type: 'chrome', + timestamp: new Date().toISOString(), + question: task, + engines: answerEngine ? [answerEngine] : [], + winner: answerEngine || null, + data: { task, answer, engineId: answerEngine || 'agon', pageActivity: sawBrowserActivity }, + }); } const ok = produced && !failed && !streamFailed && !endedReason; recordRun({ mode: 'chrome', intent: task, winner: undefined, success: ok, durationMs: Date.now() - startTime, engineIds: answerEngine ? [answerEngine] : [], completionState: (failed || endedReason) ? 'aborted' : 'completed' }); diff --git a/packages/core/src/generated/models/session-result-types.ts b/packages/core/src/generated/models/session-result-types.ts index b437b402..9b00a353 100644 --- a/packages/core/src/generated/models/session-result-types.ts +++ b/packages/core/src/generated/models/session-result-types.ts @@ -79,12 +79,23 @@ export interface ResearchResultData { citationsTotal: number; } +/** + * A browser-driving turn (`/chrome` / `agon chrome`): the agentic ReAct brain answered using the user's own browser through the side panel. answer is the engine's reply; pageActivity is true when page tools actually ran (the panel was attached and acted on the page), false when no page tools ran this turn — either no panel was attached, or the brain answered in text without needing one. + */ // @kern-source: session-result-types:60 +export interface ChromeResultData { + task: string; + answer: string; + engineId: string; + pageActivity: boolean; +} + +// @kern-source: session-result-types:67 export interface SessionResult { - type: 'brainstorm' | 'campfire' | 'tribunal' | 'forge' | 'think' | 'council' | 'synthesis' | 'nero' | 'review' | 'research'; + type: 'brainstorm' | 'campfire' | 'tribunal' | 'forge' | 'think' | 'council' | 'synthesis' | 'nero' | 'review' | 'research' | 'chrome'; timestamp: string; question: string; engines: string[]; winner: string | null; - data: BrainstormResultData | CampfireResultData | TribunalResultData | ForgeResultData | ThinkResultData | CouncilResultData | SynthesisResultData | NeroResultData | ReviewResultData | ResearchResultData; + data: BrainstormResultData | CampfireResultData | TribunalResultData | ForgeResultData | ThinkResultData | CouncilResultData | SynthesisResultData | NeroResultData | ReviewResultData | ResearchResultData | ChromeResultData; } diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts index 4cf7ae16..22e04898 100644 --- a/packages/core/src/index.ts +++ b/packages/core/src/index.ts @@ -331,7 +331,7 @@ export { EngineHealth, engineHealth, classifyDispatchFailure } from './engine-he export type { EngineHealthRecord } from './engine-health.js'; export type { ValidatedEngineDefinition } from './schemas/engine-schema.js'; export { sessionContext } from './session-context.js'; -export type { SessionResult, BrainstormResultData, CampfireResultData, TribunalResultData, ForgeResultData, ThinkResultData, CouncilResultData, SynthesisResultData, NeroResultData, ReviewResultData, ResearchResultData } from './generated/models/session-result-types.js'; +export type { SessionResult, BrainstormResultData, CampfireResultData, TribunalResultData, ForgeResultData, ThinkResultData, CouncilResultData, SynthesisResultData, NeroResultData, ReviewResultData, ResearchResultData, ChromeResultData } from './generated/models/session-result-types.js'; export { splitPromptBlocks, mergeBlocksByRole, } from './prompt-builder.js'; diff --git a/packages/core/src/kern/models/session-result-types.kern b/packages/core/src/kern/models/session-result-types.kern index 57df1666..92da618d 100644 --- a/packages/core/src/kern/models/session-result-types.kern +++ b/packages/core/src/kern/models/session-result-types.kern @@ -57,10 +57,17 @@ interface name=ResearchResultData export=true field name=citationsVerified type=number field name=citationsTotal type=number +interface name=ChromeResultData export=true + doc "A browser-driving turn (`/chrome` / `agon chrome`): the agentic ReAct brain answered using the user's own browser through the side panel. answer is the engine's reply; pageActivity is true when page tools actually ran (the panel was attached and acted on the page), false when no page tools ran this turn — either no panel was attached, or the brain answered in text without needing one." + field name=task type=string + field name=answer type=string + field name=engineId type=string + field name=pageActivity type=boolean + interface name=SessionResult export=true - field name=type type="'brainstorm' | 'campfire' | 'tribunal' | 'forge' | 'think' | 'council' | 'synthesis' | 'nero' | 'review' | 'research'" + field name=type type="'brainstorm' | 'campfire' | 'tribunal' | 'forge' | 'think' | 'council' | 'synthesis' | 'nero' | 'review' | 'research' | 'chrome'" field name=timestamp type=string field name=question type=string field name=engines type="string[]" field name=winner type="string | null" - field name=data type="BrainstormResultData | CampfireResultData | TribunalResultData | ForgeResultData | ThinkResultData | CouncilResultData | SynthesisResultData | NeroResultData | ReviewResultData | ResearchResultData" + field name=data type="BrainstormResultData | CampfireResultData | TribunalResultData | ForgeResultData | ThinkResultData | CouncilResultData | SynthesisResultData | NeroResultData | ReviewResultData | ResearchResultData | ChromeResultData" diff --git a/tests/unit/agentic-brain-client.test.ts b/tests/unit/agentic-brain-client.test.ts index 8b2515fc..96cc751f 100644 --- a/tests/unit/agentic-brain-client.test.ts +++ b/tests/unit/agentic-brain-client.test.ts @@ -113,6 +113,25 @@ describe('agent prompt + transcript helpers', () => { expect(looksLikeActionIntent('Done. Let me know if you want me to review a specific section.')).toBe(false); expect(looksLikeActionIntent('Here is my detailed review of the page. '.repeat(40))).toBe(false); // too long = a real answer }); + + it('looksLikeActionIntent catches German narration (the panel is multilingual)', () => { + // The exact failure from the field: glm-5.2 narrated a job search in German and stopped, + // never calling a tool — the English-only matcher accepted it as a final answer. + expect(looksLikeActionIntent( + 'Ich suche auf LinkedIn Jobs nach passenden Stellen für dich. Basierend auf deinem Profil starte ich mit einer gezielten Suche.', + )).toBe(true); + expect(looksLikeActionIntent('Ich navigiere jetzt zu deinem Profil und lese die Seite.')).toBe(true); + expect(looksLikeActionIntent('Ich öffne die Stellenseite und tippe deine Suchbegriffe ein.')).toBe(true); // umlaut-initial verb must match + }); + + it('looksLikeActionIntent does NOT nudge a German/English FINAL answer (advice, not action)', () => { + expect(looksLikeActionIntent('Dein Profil sieht stark aus — klare Überschrift und ein gutes Foto.')).toBe(false); + // "ich finde X gut" = opinion, not self-action; bare "jetzt" must not trigger. + expect(looksLikeActionIntent('Jetzt sieht dein Profil besser aus und ich finde den Abschnitt hilfreich.')).toBe(false); + // "such"/"enter" as ordinary English words must not be read as verbs. + expect(looksLikeActionIntent('This is such a strong profile; no changes needed.')).toBe(false); + expect(looksLikeActionIntent('That headline is entertainment-industry specific and reads well.')).toBe(false); + }); }); describe('AgenticTurnBrainClient — the ReAct loop', () => { @@ -226,6 +245,36 @@ describe('AgenticTurnBrainClient — the ReAct loop', () => { expect(result.responded).toBe(true); }); + it('drives a multi-site browse from a GERMAN narration: nudge → navigate → read → navigate → read → answer', async () => { + // The field scenario: the brain narrates a job search in German, then (once nudged) actually + // switches sites and reads each. Proves the loop does autonomous multi-step browsing — open a + // site, check it, open another, check it — not just a single read. + const navSpec: CapabilitySpec = { name: 'navigate', description: 'navigate the tab to a url', inputSchema: { url: 'string' }, isReadOnly: false, isDestructive: true }; + const client = makeAgent([ + 'Ich öffne zuerst die LinkedIn-Jobs-Seite und suche passende Stellen für dich.', // German narration, no tool → nudge + `${MARK} {"name":"navigate","input":{"url":"https://www.linkedin.com/jobs/"}}`, // switch to site 1 + `${MARK} {"name":"readPage","input":{}}`, // check site 1 + `${MARK} {"name":"navigate","input":{"url":"https://www.linkedin.com/jobs/view/42"}}`, // switch to site 2 + `${MARK} {"name":"readPage","input":{}}`, // check site 2 + 'Ich habe zwei passende Stellen gefunden, die zu AI Tooling und React passen.', // final answer + ]); + await client.open({ sessionId: 's', engineId: 'zai-coding-plan-glm-5.2', cwd: '/tmp' }); + await client.registerCapability({ sessionId: 's', clientId: 'c', spec: navSpec }); + await client.registerCapability({ sessionId: 's', clientId: 'c', spec: readSpec() }); + + let approvals = 0; + const { events, result } = await driveAgent(client, client.runTurn(req('t1', 'kannst du mir gute jobs suchen')), { + capability: () => ({ ok: true, output: 'PAGE CONTENT' }), + approval: () => { approvals++; return 'approve-session'; }, // approve navigate once, for the session + }); + + const caps = events.filter((e) => e.kind === 'capability-request'); + expect(events.some((e) => e.kind === 'notice' && /tool call/.test((e as { message: string }).message))).toBe(true); // German narration WAS nudged + expect(caps.length).toBe(4); // navigate, readPage, navigate, readPage — it switched sites twice and checked each + expect(approvals).toBe(1); // the 2nd navigate isn't re-gated (approve-session) + expect(result.responded).toBe(true); + }); + it('gives up nudging after the retry budget and returns the prose (no infinite loop)', async () => { const client = makeAgent(['Let me click the button.']); // always narrates, never emits a tool await client.open({ sessionId: 's', engineId: 'claude', cwd: '/tmp' }); diff --git a/tests/unit/call-command.test.ts b/tests/unit/call-command.test.ts index f67878a7..665b0d7d 100644 --- a/tests/unit/call-command.test.ts +++ b/tests/unit/call-command.test.ts @@ -175,6 +175,32 @@ describe('agon call command mapping', () => { expect(commands[0]).not.toContain('--oracle-gate'); }); + it('maps chrome to the browser-driving bridge', () => { + expect(buildCallCommands({ workflow: 'chrome', input: 'check the pricing page design' }).commands).toEqual([ + ['chrome', 'check the pricing page design'], + ]); + }); + + it('maps chrome with --auto-approve and --engine', () => { + expect(buildCallCommands({ + workflow: 'chrome', + input: 'click the login button and screenshot the form', + autoApprove: true, + engine: 'codex', + }).commands).toEqual([ + ['chrome', 'click the login button and screenshot the form', '--auto-approve', '--engine', 'codex'], + ]); + }); + + it('omits --auto-approve for chrome when not set', () => { + const { commands } = buildCallCommands({ workflow: 'chrome', input: 'read the page' }); + expect(commands[0]).not.toContain('--auto-approve'); + }); + + it('requires input for chrome', () => { + expect(() => buildCallCommands({ workflow: 'chrome' })).toThrow('agon call chrome requires a prompt/task argument'); + }); + it('maps synthesis to the synthesis command bridge', () => { expect(buildCallCommands({ workflow: 'synthesis', diff --git a/tests/unit/results-formatter.test.ts b/tests/unit/results-formatter.test.ts index 057a6f04..c440f5e5 100644 --- a/tests/unit/results-formatter.test.ts +++ b/tests/unit/results-formatter.test.ts @@ -155,6 +155,44 @@ describe('formatSessionResults', () => { expect(output).toContain('found a blocker'); }); + it('formats a chrome result with the page-driving header and answer', () => { + const results: SessionResult[] = [{ + type: 'chrome', + timestamp: '2026-04-07T22:40:00.000Z', + question: 'check the pricing page design', + engines: ['codex'], + winner: 'codex', + data: { + task: 'check the pricing page design', + answer: 'The hero is cluttered — three CTAs compete above the fold.', + engineId: 'codex', + pageActivity: true, + }, + }]; + + const output = formatSessionResults(results); + expect(output).toContain('CHROME #1'); + expect(output).toContain('check the pricing page design'); + expect(output).toContain('codex'); + expect(output).toContain('drove the page'); + expect(output).toContain('The hero is cluttered'); + }); + + it('marks a chrome result as text-only when no page tools ran', () => { + const results: SessionResult[] = [{ + type: 'chrome', + timestamp: '2026-04-07T22:41:00.000Z', + question: 'what is the WHATWG URL spec', + engines: ['agon'], + winner: 'agon', + data: { task: 'what is the WHATWG URL spec', answer: 'A living standard…', engineId: 'agon', pageActivity: false }, + }]; + + const output = formatSessionResults(results); + expect(output).toContain('CHROME #1'); + expect(output).toContain('text only (no page tools ran)'); + }); + it('numbers multiple results sequentially', () => { const results: SessionResult[] = [ {