diff --git a/CHANGELOG.md b/CHANGELOG.md index c896d03a..0fd95c69 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,10 @@ ## [Unreleased] +### Added + +- `agent-tty batch `: run an ordered sequence of input-and-`wait` steps against one session in a single invocation, supplied as a positional JSON array or `--file`. Each step is one verb (`type`, `paste`, `sendKeys`, `run`, or `wait`); every `wait` is anchored to a Wait Baseline (the Event Log sequence after the preceding input step) so it cannot match a stale screen the way a hand-written `run`/`wait`/`send-keys` loop can (ADR 0007). Fail-fast by default with a non-zero exit and a per-step `--json` envelope; `--keep-going` attempts every step. SIGINT/SIGTERM flushes a partial envelope (in-flight step `interrupted`, later steps `not-run`) ([#123](https://github.com/coder/agent-tty/issues/123)). + ## [v0.3.0] - 2026-06-03 ### Added diff --git a/CONTEXT.md b/CONTEXT.md index 595841dc..804b9f18 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -43,6 +43,18 @@ _Avoid_: Visual wait, snapshot wait A render condition where the visible text content of a **Semantic Snapshot** has remained unchanged for a requested duration. _Avoid_: Settled screen +**Batch**: +An ordered sequence of **Batch Steps** driven through one **Command Target** in a single `batch` invocation. It runs fail-fast: the first failed **Batch Step** stops the run unless the caller opts into continuing. +_Avoid_: Pipeline, script, macro + +**Batch Step**: +A single ordered action within a **Batch**: either one input or control action sent to the **Command Target** (text, paste, key chord, or a **Waited Run**), or one **Render Wait**. +_Avoid_: Command, instruction + +**Wait Baseline**: +The **Event Log** point a **Render Wait** must observe a **Semantic Snapshot** beyond before it can match, so the wait reflects screen state from that point onward rather than stale pre-step content. +_Avoid_: afterSeq, sequence floor + **Live Host Eligible Session**: A **Session** where callers should ask the live session host for fresh state. @@ -219,6 +231,11 @@ _Avoid_: bare "agent", "Coder agent" - A **Waited Run** may produce one **Run Completion**, time out for its caller, or be interrupted by **Session** exit. - Caller timeout does not cancel the underlying **Run Completion**; it may still be observed later to keep internal completion bytes out of artifacts. - After **Session** exit, an unobserved **Run Completion** can no longer arrive. +- A **Batch** is driven through exactly one **Command Target**, resolved once for the whole invocation. +- A **Batch** is not atomic: input already applied to a **Session** cannot be undone, so a failed **Batch** leaves the **Session** in whatever state its completed **Batch Steps** produced. +- A **Render Wait** that is a **Batch Step** is anchored to a **Wait Baseline** equal to the **Event Log** sequence recorded after the preceding input **Batch Step**, so it cannot match a **Semantic Snapshot** that predates that step. +- A standalone **Render Wait** may be given an explicit **Wait Baseline**; without one it matches against the latest **Semantic Snapshot**. +- A **Batch** stops at the first failed **Batch Step** — a timed-out **Render Wait**, or an input action against a **Session** that is no longer a **Command Target** — unless the caller opts into continuing. - A **Promoted Hero Demo** replaces the existing recursive README demo entirely; the old recursive bundle is deleted rather than maintained in parallel. - The **Hero Claim Boundary** narrows the README claim after that deletion: the outer TUI is presentation, while inner `agent-tty` artifacts are the product proof. - An **Exploratory Hero Demo** is the preferred **Hero Demo** scenario because it shows the coding-agent TUI discovering the `agent-tty` skill and CLI before producing inner `agent-tty` proof artifacts. @@ -284,3 +301,4 @@ _Avoid_: bare "agent", "Coder agent" - "helper proof" was used during design discussion, but the canonical scenario is now **Exploratory Hero Demo**: success criteria and output paths are fixed, while the coding agent chooses the command flow inside a configurable fixed review window. - "demo" and "proof" are not interchangeable for coding-agent recordings: a **Hero Demo** optimizes for stable presentation, while a **Recursive Dogfood Proof** optimizes for self-dogfood coverage. - "agent" is overloaded across four referents: this project's **Triage Agent** (a Claude Code instance), Coder's **Coder workspace agent** (the SSH/exec daemon), a generic AFK implementation agent (the actor on `ready-for-agent` issues — Phase 2 of the triage pipeline), and — in **Session Dashboard** product copy only — the external client driving a **Session** (often an AI coding agent). The last sense is deliberately **not** a domain term: the **Session Dashboard** and **Live View** are defined over **Sessions**, not agents, and the **Event Log** does not record which client sent input. Do not make the dashboard agent-aware (grouping or filtering by agent identity) without first extending the domain model. Always qualify in code comments and docs. +- "batch" is overloaded: a **Batch** (an ordered **Batch Step** sequence driven through one **Command Target** by the `batch` command) is unrelated to a **Triage Batch** (the set of issues processed by one **AFK Triage** invocation). They live in different subsystems; always rely on the qualifier. diff --git a/README.md b/README.md index f0e5cf2a..bf63fc40 100644 --- a/README.md +++ b/README.md @@ -111,7 +111,7 @@ Full reproducer, transcripts, and proof bundles are in [`dogfood/agent-uses-agen ## Command surface -Every user-facing command takes `--json` and returns a stable, machine-readable envelope. The commands cover the session lifecycle (`create`, `list`, `inspect`, `destroy`, `gc`), input and control (`run`, `type`, `paste`, `send-keys`, `resize`, `signal`, `mark`), observation and capture (`wait`, `snapshot`, `screenshot`, `record export`), the live `dashboard`, and environment checks (`version`, `doctor`, `skills`). +Every user-facing command takes `--json` and returns a stable, machine-readable envelope. The commands cover the session lifecycle (`create`, `list`, `inspect`, `destroy`, `gc`), input and control (`run`, `type`, `paste`, `send-keys`, `batch`, `resize`, `signal`, `mark`), observation and capture (`wait`, `snapshot`, `screenshot`, `record export`), the live `dashboard`, and environment checks (`version`, `doctor`, `skills`). See [`docs/USAGE.md`](./docs/USAGE.md) for the full flag reference and [`docs/TROUBLESHOOTING.md`](./docs/TROUBLESHOOTING.md) for renderer and environment issues. diff --git a/docs/USAGE.md b/docs/USAGE.md index 519e6625..ae2c3d5f 100644 --- a/docs/USAGE.md +++ b/docs/USAGE.md @@ -104,6 +104,80 @@ Useful flags: - `--exit`: wait for the process to exit. - `--timeout `: maximum wait time in milliseconds, with `0` meaning infinite. +## `batch` + +Use `batch` to run an ordered sequence of input-and-`wait` steps against one session in a single invocation, instead of coordinating separate `run`/`type`/`paste`/`send-keys`/`wait` calls. Each `wait` step is anchored to a Wait Baseline — it only considers screen state produced _after_ the preceding input step — so a batch cannot race ahead and match a stale screen the way a hand-written shell loop can. + +```bash +agent-tty batch '[steps]' --json +agent-tty batch --file ./steps.json --json +agent-tty batch '[steps]' --keep-going --json +``` + +Steps are a JSON array; each step is exactly one verb. The shape mirrors the rest of the CLI: + +```json +[ + { "run": "nvim --clean", "noWait": true }, + { "wait": { "screenStableMs": 1000 } }, + { "sendKeys": ["i"] }, + { "type": "hello" }, + { "sendKeys": ["Escape"] }, + { "type": ":wq" }, + { "sendKeys": ["Enter"] }, + { "wait": { "text": "written" } } +] +``` + +- `type` / `paste`: a string of literal text. +- `sendKeys`: a non-empty array of key names — individual named keys or single characters (e.g. `["Enter"]`, `["Ctrl+C"]`, `["Escape", "Enter"]`). Multi-character literal text such as `:wq` is not a key name; send it with a `type` step. +- `run`: a command string, with optional `noWait` (fire-and-forget) and `timeout` (ms). A `run` step is a waited run by default. +- `wait`: the same conditions as the `wait` command — `text`, `regex`, `screenStableMs`, `cursorRow`, `cursorCol`, and `timeout` (ms). + +Input source and flags: + +- A positional `[steps]` JSON array **xor** `--file ` — supply exactly one. Passing both, or neither, is an `INVALID_INPUT` error. +- `--keep-going`: attempt every step regardless of failures. By default a batch is **fail-fast** — the first failed step (a timed-out `wait`, or input to a session that is no longer commandable) stops the run, and the remaining steps are recorded `not-run`. A batch is not atomic: already-applied input cannot be undone. +- `--json`: emit a machine-readable command envelope. + +The `--json` result is a per-step envelope: + +```json +{ + "ok": true, + "command": "batch", + "result": { + "steps": [ + { + "index": 0, + "kind": "run", + "status": "completed", + "seq": 4, + "noWait": true, + "runOutcome": "started", + "durationMs": 12 + }, + { + "index": 1, + "kind": "wait", + "status": "completed", + "waitBaseline": 4, + "matched": true, + "timedOut": false, + "capturedAtSeq": 9, + "durationMs": 1003 + } + ], + "completedCount": 2, + "failedIndices": [] + } +} +``` + +Each step record carries its `index`, `kind`, `status` (`completed` | `failed` | `not-run` | `interrupted`), and `durationMs`. Input steps report the Event Log `seq` they produced; `wait` steps report the `waitBaseline` they were anchored to plus `matched` / `timedOut` / `matchedText` / `capturedAtSeq`. `completedCount` and `failedIndices` summarize the run. A fail-fast batch exits non-zero with the failed step's exit code (e.g. `11` for a `WAIT_TIMEOUT`); `--keep-going` exits `1` if any step failed. If the process is interrupted by SIGINT/SIGTERM, batch flushes the same envelope with the in-flight step marked `interrupted` and later steps `not-run`, then exits non-zero. + +The Wait Baseline fixes stale-match only. It does **not** fix echo-match: a `wait` can still match the terminal's echo of a just-typed command (the echo renders _after_ the baseline). Use a distinctive output token or a `screenStableMs` wait rather than waiting for text you just typed. Interrupting a batch mid-`wait` leaves that wait's command still running on the session (the wait is abandoned, not cancelled), exactly like a caller timeout on `run`. + ## Screenshots And Recording Exports Screenshots and WebM export use the `ghostty-web` reference renderer through Playwright/Chromium. diff --git a/docs/adr/0007-render-wait-baseline.md b/docs/adr/0007-render-wait-baseline.md new file mode 100644 index 00000000..cc910e1c --- /dev/null +++ b/docs/adr/0007-render-wait-baseline.md @@ -0,0 +1,63 @@ +--- +status: accepted +--- + +# Render waits accept an optional Wait Baseline + +## Context + +The `batch` command runs an ordered sequence of **Batch Steps** — input actions +and **Render Waits** — through one **Command Target** with no human pacing +between them. A **Render Wait** today (`waitForRender` in +`src/host/hostMain.ts`) polls the renderer every 200 ms and matches against the +**latest** **Semantic Snapshot**; `WaitForRenderParams` +(`src/protocol/schemas.ts`) carries text/regex/screenStableMs/cursor/timeout and +nothing about event-log position. + +That is fine for a human invoking `wait` once, but unsafe for steps that run +back-to-back. A wait step can match the screen left by the _previous_ step +before the current step has rendered (stale-match), and a `screenStableMs` wait +can declare that _old_ screen "stable" before the new input even appears. The +batch then advances on a false premise and sends later keystrokes into the wrong +state. This is the property that separates `batch` from a hand-written shell +loop, so it has to be correct. + +## Decision + +A **Render Wait** accepts an optional **Wait Baseline**: an **Event Log** +sequence (`afterSeq`) it must observe a **Semantic Snapshot** _strictly beyond_ +before it may match or accrue **Screen Stability**. The `batch` executor sets +each wait step's baseline to the **Event Log** sequence recorded after the +preceding input **Batch Step**, so a wait only ever reflects state at or after +its own step. The standalone `wait --after-seq ` exposes the same gate, since +`snapshot` and `wait` already return a `capturedAtSeq` callers can chain. + +- `afterSeq` is added to `WaitForRenderParams`; the host poll and the offline + replay matcher reject any snapshot whose `capturedAtSeq` is not strictly + greater than the baseline. +- With no baseline a **Render Wait** behaves exactly as before (matches the + latest snapshot), so the change is backward compatible. + +## Consequences + +- `batch` is meaningfully safer than scripting the existing commands in a loop: + each wait is anchored to its own step rather than racing the previous step's + screen. +- The **Wait Baseline** fixes **stale-match** only. It does **not** fix + _echo-match_ — a `wait --text "foo"` matching the terminal's echo of a + just-typed `foo`, which renders _after_ the baseline. Echo-match stays the + caller's concern (use a distinctive output token or `screenStableMs`), exactly + as with the `wait` command today. +- A small amount of protocol and matcher surface grows (one optional field plus + a `capturedAtSeq > afterSeq` gate in the live poll and the offline matcher). + Offline replay can only apply the floor against the single latest snapshot it + reconstructs. + +## Alternatives considered + +- **Require the visible text to change from a pre-step capture.** Rejected: + heuristic rather than exact, never matches a step that legitimately reproduces + identical text, and does not use the canonical **Event Log**. +- **No baseline in v1 (match the latest screen, document the foot-gun).** + Rejected: it leaves `batch` only marginally safer than a shell loop, and the + stale-match failure is silent and order-dependent. diff --git a/docs/prd/batch-command/PRD.md b/docs/prd/batch-command/PRD.md new file mode 100644 index 00000000..dc2c0dd5 --- /dev/null +++ b/docs/prd/batch-command/PRD.md @@ -0,0 +1,84 @@ +# PRD: `batch` command + +## Problem Statement + +Driving a terminal through `agent-tty` today means one CLI invocation per action: `run`, then `wait`, then `send-keys`, then `wait` again, each a separate process. For an AI coding agent — or a human — scripting a multi-step interaction with a **Session**, that has two recurring hazards: + +- There is no safe way to express "do this, then wait for _its_ effect, then do the next thing" as one unit. A **Render Wait** issued right after input can match the screen left by the _previous_ action before the new one has rendered, so the script races ahead and sends later keystrokes into the wrong screen. +- The per-action ceremony is verbose, and every caller has to re-implement the same failure handling (stop when a wait times out) and the same sequencing by hand in a shell loop. + +## Solution + +A new `batch` command runs an ordered sequence of **Batch Steps** against one **Command Target** in a single invocation. Each **Batch Step** is exactly one action — `type`, `paste`, `send-keys`, `run`, or a `wait` (**Render Wait**). The caller supplies the steps as a JSON array, either inline or from a file. + +Every `wait` step is anchored to a **Wait Baseline**: it only considers screen state produced _after_ the preceding input step, so a **Batch** is meaningfully safer than a hand-written loop — it cannot match a stale pre-step screen. A **Batch** is fail-fast by default: the first failed **Batch Step** (a timed-out **Render Wait**, or input to a **Session** that is no longer a **Command Target**) stops the run, so later steps never fire against an unexpected screen. + +## User Stories + +1. As an AI coding agent, I want to run an ordered sequence of terminal actions in one `batch` call, so that I don't coordinate many separate CLI invocations. +2. As an AI coding agent, I want each `wait` step to observe only the screen produced after the preceding input step, so that my batch never matches a stale screen and races ahead. +3. As an AI coding agent, I want to drive an interactive TUI — open it, wait for it to settle, send key chords, wait for a label — in a single batch, so that I can reproduce a TUI workflow deterministically. +4. As a human automating a shell setup, I want to chain `run` and `wait` steps in one command, so that I can express a setup sequence without a sleep-and-grep shell loop. +5. As a caller, I want a batch to stop at the first failed step by default, so that later keystrokes don't land in an unexpected screen state. +6. As a caller, I want a `--keep-going` option, so that I can run a best-effort batch that attempts every step regardless of failures. +7. As a caller, I want the result to tell me exactly which steps completed and which one failed, so that I can diagnose where the interaction broke, since a batch is not atomic and already-sent input cannot be undone. +8. As a caller, I want to supply steps either as an inline positional JSON string or via `--file`, so that I can choose between quick one-offs and reusable step files. +9. As a caller, I want a clear validation error when I pass both inline steps and `--file`, or neither, so that ambiguous input fails fast with a helpful message. +10. As a caller, I want each step to be a single action with one verb, so that the step format is unambiguous and mirrors the rest of the CLI. +11. As a caller, I want `wait` steps to reuse the exact conditions of the `wait` command (text, regex, screen stability, cursor, timeout), so that I don't learn a second wait vocabulary. +12. As a caller using `--json`, I want a stable machine-readable envelope with per-step outcomes, so that I can program against the batch result. +13. As a caller, I want a non-zero exit code when a batch fails fast, so that scripts and agents can detect failure without parsing output. +14. As an agent, I want a `run` step to behave as a **Waited Run** by default and to support a no-wait option, so that command-completion semantics match the standalone `run`. +15. As an agent, I want to set a per-`wait`-step timeout, so that different steps can wait for different durations. +16. As a developer of agent-tty, I want the standalone `wait` command to also accept an explicit **Wait Baseline**, so that the primitive is reusable outside batch (for example, chaining a wait after a captured sequence from `snapshot`). +17. As a caller, I want a batch to target one **Command Target** resolved once at the start, so that the whole sequence applies to a single consistent **Session**. +18. As a caller, I want a batch to stop if the target **Session** exits or becomes non-commandable mid-sequence, so that I never send input to a dead session. +19. As a caller, I want the echo-match limitation documented, so that I know a `wait` can still match the echo of a just-typed command and write distinctive waits or use screen stability instead. +20. As an agent, I want batch to work uniformly whether I am driving a shell or a full-screen TUI, so that one mechanism covers both. +21. As a maintainer, I want the batch orchestration logic to be testable without a real PTY or renderer, so that ordering, baseline threading, and fail-fast are covered by fast, isolated tests. + +## Implementation Decisions + +- A new `batch ` command. Steps are supplied as a positional JSON string XOR a `--file ` (mutually exclusive — a validation error if both or neither are given), mirroring the existing input-source convention. No stdin source in v1. +- A **Batch Step** is a tagged union with exactly one verb key: `type`, `paste`, `send-keys` (a list of key names), `run` (a **Waited Run** that supports a no-wait option), or `wait` (a **Render Wait** carrying the standard conditions: text, regex, screen-stable duration, cursor row/col, timeout). `send-keys` and `run` are modeled distinctly rather than as uniform text steps. +- **Wait Baseline**: the **Render Wait** parameters gain an optional event-log sequence floor. The live host poll and the offline replay matcher reject any **Semantic Snapshot** whose captured sequence is not strictly greater than the baseline. The batch executor records the **Event Log** sequence after each input step and passes it as the next `wait` step's baseline. The standalone `wait` command also exposes this baseline as a flag. This decision is recorded in ADR 0007. It fixes stale-match; it does not fix echo-match. +- The executor runs **client-side**: `batch` orchestrates the existing per-step input and **Render Wait** operations and threads baselines between them. Input results return their **Event Log** sequence so the executor can anchor the following wait; `run`, `send-keys`, and `mark` already return one, and the `type` and `paste` results gain it. +- **Fail-fast** by default: the first failed **Batch Step** stops the run and yields a non-zero exit. `--keep-going` attempts every step regardless. A **Batch** is not atomic; the result reports which steps completed. This default deliberately diverges from agent-browser's continue-by-default, because terminal steps are stateful and dependent. +- **Deep modules**: a **Batch Plan** parser (JSON to validated steps; pure) and a **Batch executor** (ordered execution, baseline threading, fail-fast, and result accumulation, driven through an injected step-driver interface so it runs without a real PTY or renderer). +- **Result envelope** (`--json`): a per-step array recording each step's index, kind, the input sequence or wait baseline, the wait outcome (matched, timed-out, matched text), and duration; plus an overall completed-step count and the failed-step index when fail-fast triggers. The **Command Target** is resolved once for the whole invocation. + +## Testing Decisions + +Good tests assert external behavior, not implementation details. + +- **Batch executor (unit).** Drive the executor with a fake step-driver and assert ordering, **Wait Baseline** threading (each wait receives the prior input step's sequence), fail-fast versus `--keep-going`, and result accumulation — with no PTY or renderer. The current render-wait matcher unit tests are prior art for pure-logic coverage. +- **Batch Plan parser (unit).** Assert the tagged-union step kinds parse, the positional-XOR-file rule is enforced, and malformed or empty plans are rejected with clear errors. +- **Wait Baseline gate (unit).** Assert the render-wait matcher never matches a **Semantic Snapshot** at or below the baseline and can match one strictly above it, covering both the live and offline paths. This guards the ADR-0007 invariant. +- **Batch CLI (integration).** Against an isolated `AGENT_TTY_HOME` with a real **Session**: a multi-step plan, the fail-fast exit code, and the `--json` envelope shape. The existing CLI integration tests are prior art. +- **Batch end-to-end.** Drive a real TUI (for example `nvim --clean`) through a batch and assert the rendered result. The existing fixture-driven e2e flows are prior art. + +## Out of Scope + +- Capture steps inside a batch (taking a snapshot or screenshot, or exporting a recording, as a step). v1 batch is input plus wait; capture stays a separate command the caller runs around the batch. +- A stdin source for steps. +- Fixing echo-match (a `wait` matching the terminal's echo of a just-typed command). The **Wait Baseline** fixes stale-match only; echo-match stays the caller's responsibility — use a distinctive output token or a screen-stability wait — exactly as with the `wait` command today. +- A host-side batch RPC or single-round-trip execution. v1 executes client-side; a host-side executor is a possible later optimization. +- Inline waits attached to input steps, and any control flow (conditionals, loops, retries). v1 is a flat, linear sequence of single-action steps. +- Atomic or transactional rollback. Already-sent input cannot be undone. + +## Further Notes + +- The `batch` verb matches the closest analog, vercel-labs/agent-browser, which uses `batch` and keeps `wait` a separate step. agent-tty's fail-fast default is the deliberate divergence. +- The domain terms **Batch**, **Batch Step**, and **Wait Baseline** are defined in the project glossary, and the **Wait Baseline** decision is ADR 0007. The glossary terms, ADR 0007, and this PRD are on branch `feat/batch-command`. +- A small illustrative plan (shape only): + + ```json + [ + { "run": "nvim --clean", "noWait": true }, + { "wait": { "screenStableMs": 1000 } }, + { "sendKeys": ["i"] }, + { "type": "hello" }, + { "sendKeys": ["Escape", ":wq", "Enter"] }, + { "wait": { "text": "written" } } + ] + ``` diff --git a/skill-data/agent-tty/SKILL.md b/skill-data/agent-tty/SKILL.md index cb1cee0b..d090d61b 100644 --- a/skill-data/agent-tty/SKILL.md +++ b/skill-data/agent-tty/SKILL.md @@ -49,6 +49,7 @@ agent-tty --home run 'command here' --json agent-tty --home type 'literal text' --json agent-tty --home paste 'multiline payload' --json agent-tty --home send-keys Enter Ctrl+C --json +agent-tty --home batch '[{"run":"htop","noWait":true},{"wait":{"screenStableMs":1000}}]' --json # Observation and proof agent-tty --home wait --text 'ready' --json @@ -71,15 +72,22 @@ agent-tty --home "$AGENT_HOME" snapshot "$SESSION_ID" --format text --json ### Drive an interactive CLI or TUI +Use `batch` to run an ordered sequence of input-and-`wait` steps in one call instead of separate `run`/`wait`/`send-keys` invocations. Each `wait` step is anchored to a Wait Baseline — it only observes screen state produced _after_ the preceding input step, so the sequence cannot race ahead and match a stale screen. A batch stops at the first failed step by default (`--keep-going` attempts every step). + ```bash AGENT_HOME="$(mktemp -d)" SESSION_ID=$(agent-tty --home "$AGENT_HOME" create --json -- /bin/bash | jq -r '.result.sessionId') -agent-tty --home "$AGENT_HOME" run "$SESSION_ID" '' --no-wait --json -agent-tty --home "$AGENT_HOME" wait "$SESSION_ID" --screen-stable-ms 1000 --json -agent-tty --home "$AGENT_HOME" send-keys "$SESSION_ID" Down Down Enter --json +agent-tty --home "$AGENT_HOME" batch "$SESSION_ID" '[ + { "run": "", "noWait": true }, + { "wait": { "screenStableMs": 1000 } }, + { "sendKeys": ["Down", "Down", "Enter"] }, + { "wait": { "text": "" } } +]' --json agent-tty --home "$AGENT_HOME" screenshot "$SESSION_ID" --json ``` +A `wait` can still match the _echo_ of a just-typed command, so use a distinctive output token or `screenStableMs` rather than waiting for text you just typed. + ### Export reviewer-facing artifacts ```bash diff --git a/src/batch/executor.ts b/src/batch/executor.ts new file mode 100644 index 00000000..d6c9155d --- /dev/null +++ b/src/batch/executor.ts @@ -0,0 +1,332 @@ +import type { BatchPlan, BatchStep } from './plan.js'; +import type { + BatchResult, + BatchStepRecord, + RunStepRecord, + WaitStepRecord, +} from './result.js'; +import type { StepDriver } from './stepDriver.js'; + +import { CliError } from '../cli/errors.js'; +import { ERROR_CODES, makeCliError } from '../protocol/errors.js'; +import { unreachable } from '../util/assert.js'; +import { unreachedStepRecord } from './result.js'; + +export interface ExecuteBatchOptions { + plan: BatchPlan; + driver: StepDriver; + keepGoing: boolean; + // Re-check commandability around each Render Wait (rejecting with a CliError) + // so a Session that dies mid-Batch fails the wait rather than racing it. + assertCommandable?: () => Promise; + onStep?: (record: BatchStepRecord) => void; +} + +function toBatchStepError(error: CliError): { code: string; message: string } { + return { code: error.code, message: error.message }; +} + +// Reframe a thrown non-CliError as INTERNAL_ERROR so it is recorded against the +// step rather than escaping and discarding the per-step BatchResult. +function reframeAsInternalError(error: unknown, index: number): CliError { + return makeCliError(ERROR_CODES.INTERNAL_ERROR, { + message: `Batch step ${String(index)} failed with an unexpected error.`, + details: { + stepIndex: index, + reason: error instanceof Error ? error.message : String(error), + }, + cause: error, + }); +} + +function classifyRunOutcome(run: { + noWait: boolean; + completed?: boolean; + timedOut?: boolean; +}): RunStepRecord['runOutcome'] { + if (run.noWait) { + return 'started'; + } + if (run.completed === true) { + return 'completed'; + } + if (run.timedOut === true) { + return 'timedOut'; + } + // A Waited Run that neither completed nor timed out was interrupted by + // Session exit (the Run Completion can no longer arrive). + return 'sessionExited'; +} + +/** + * Run an ordered Batch through the injected StepDriver, threading each input + * step's Wait Baseline into the following Render Wait. Pure over the driver; + * fail-fast unless `keepGoing`, after which later steps are recorded `not-run`. + */ +export async function executeBatch( + opts: ExecuteBatchOptions, +): Promise { + const { plan, driver, keepGoing, assertCommandable, onStep } = opts; + + const steps: BatchStepRecord[] = []; + const failedIndices: number[] = []; + let completedCount = 0; + let lastInputSeq: number | undefined; + let stopped = false; + + const finalize = (record: BatchStepRecord): void => { + steps.push(record); + if (record.status === 'completed') { + completedCount += 1; + } else if (record.status === 'failed') { + failedIndices.push(record.index); + } + onStep?.(record); + }; + + for (let index = 0; index < plan.steps.length; index += 1) { + const step = plan.steps[index]; + if (step === undefined) { + continue; + } + + if (stopped) { + finalize(notRunRecord(step, index)); + continue; + } + + const startedAt = Date.now(); + let record: BatchStepRecord; + try { + record = await runStep( + step, + index, + driver, + lastInputSeq, + startedAt, + assertCommandable, + ); + } catch (error) { + const cliError = + error instanceof CliError + ? error + : reframeAsInternalError(error, index); + record = failedRecord(step, index, Date.now() - startedAt, cliError); + } + + // A non-wait step that produced a seq advances the Wait Baseline, even a + // failed Waited Run (it still injected its command), so a following wait + // cannot stale-match the pre-run screen under --keep-going. + if (record.kind !== 'wait' && record.seq !== undefined) { + lastInputSeq = record.seq; + } + + finalize(record); + + if (record.status === 'failed' && !keepGoing) { + stopped = true; + } + } + + return { steps, completedCount, failedIndices }; +} + +async function runStep( + step: BatchStep, + index: number, + driver: StepDriver, + lastInputSeq: number | undefined, + startedAt: number, + assertCommandable: (() => Promise) | undefined, +): Promise { + switch (step.kind) { + case 'type': { + const seq = await driver.type(step.text); + return { + index, + durationMs: Date.now() - startedAt, + kind: 'type', + status: 'completed', + seq, + }; + } + case 'paste': { + const seq = await driver.paste(step.text); + return { + index, + durationMs: Date.now() - startedAt, + kind: 'paste', + status: 'completed', + seq, + }; + } + case 'sendKeys': { + const seq = await driver.sendKeys(step.keys); + return { + index, + durationMs: Date.now() - startedAt, + kind: 'sendKeys', + status: 'completed', + seq, + }; + } + case 'run': + return runRunStep(step, index, driver, startedAt); + case 'wait': + return runWaitStep( + step, + index, + driver, + lastInputSeq, + startedAt, + assertCommandable, + ); + default: + return unreachable(step, `batch step kind dispatch at index ${index}`); + } +} + +async function runRunStep( + step: Extract, + index: number, + driver: StepDriver, + startedAt: number, +): Promise { + const result = await driver.run(step.command, step.noWait, step.timeoutMs); + const runOutcome = classifyRunOutcome({ + noWait: step.noWait, + ...(result.completed === undefined ? {} : { completed: result.completed }), + ...(result.timedOut === undefined ? {} : { timedOut: result.timedOut }), + }); + + const failed = !step.noWait && result.completed !== true; + const base: Omit = { + index, + durationMs: Date.now() - startedAt, + kind: 'run', + status: failed ? 'failed' : 'completed', + seq: result.seq, + noWait: step.noWait, + ...(result.completed === undefined ? {} : { completed: result.completed }), + ...(result.timedOut === undefined ? {} : { timedOut: result.timedOut }), + runOutcome, + }; + if (!failed) { + return base; + } + const code = + runOutcome === 'timedOut' + ? ERROR_CODES.WAIT_TIMEOUT + : ERROR_CODES.SESSION_NOT_RUNNING; + return { + ...base, + error: toBatchStepError( + makeCliError(code, { + message: + runOutcome === 'timedOut' + ? `Waited Run at step ${String(index)} timed out before completing.` + : `Waited Run at step ${String(index)} was interrupted by Session exit before completing.`, + }), + ), + }; +} + +async function runWaitStep( + step: Extract, + index: number, + driver: StepDriver, + lastInputSeq: number | undefined, + startedAt: number, + assertCommandable: (() => Promise) | undefined, +): Promise { + await assertCommandable?.(); + + const result = await driver.wait( + step.condition, + lastInputSeq, + step.timeoutMs, + ); + + const baseline = + lastInputSeq === undefined ? {} : { waitBaseline: lastInputSeq }; + const matchedText = + result.matchedText === undefined ? {} : { matchedText: result.matchedText }; + const observations = { + matched: result.matched, + timedOut: result.timedOut, + ...matchedText, + capturedAtSeq: result.capturedAtSeq, + }; + + // A timed-out wait (equivalently an unmatched result) is not a thrown error + // from the driver, so classify it here as a failed step with its own code. + if (result.timedOut || !result.matched) { + return { + index, + durationMs: Date.now() - startedAt, + kind: 'wait', + status: 'failed', + ...baseline, + ...observations, + error: toBatchStepError( + makeCliError(ERROR_CODES.WAIT_TIMEOUT, { + message: `Render wait at step ${String(index)} timed out before its condition was met.`, + }), + ), + }; + } + + // Re-check after the match; if the Session died in that window, keep the + // observations on the failed record rather than emitting a bare error. + try { + await assertCommandable?.(); + } catch (error) { + if (error instanceof CliError) { + return { + index, + durationMs: Date.now() - startedAt, + kind: 'wait', + status: 'failed', + ...baseline, + ...observations, + error: toBatchStepError(error), + }; + } + throw error; + } + + return { + index, + durationMs: Date.now() - startedAt, + kind: 'wait', + status: 'completed', + ...baseline, + ...observations, + }; +} + +function failedRecord( + step: BatchStep, + index: number, + durationMs: number, + error: CliError, +): BatchStepRecord { + const base = { index, durationMs, status: 'failed' as const }; + const stepError = toBatchStepError(error); + switch (step.kind) { + case 'run': + return { ...base, kind: 'run', noWait: step.noWait, error: stepError }; + case 'wait': + return { ...base, kind: 'wait', error: stepError }; + case 'type': + case 'paste': + case 'sendKeys': + return { ...base, kind: step.kind, error: stepError }; + default: + return unreachable(step, `batch failed-step kind at index ${index}`); + } +} + +function notRunRecord(step: BatchStep, index: number): BatchStepRecord { + return unreachedStepRecord(step, index, 'not-run'); +} diff --git a/src/batch/plan.ts b/src/batch/plan.ts new file mode 100644 index 00000000..30000644 --- /dev/null +++ b/src/batch/plan.ts @@ -0,0 +1,225 @@ +import { z } from 'zod'; + +import type { PreparedRenderWaitCondition } from '../renderWait/matcher.js'; + +import { assertValidKeyName } from '../pty/keyEncoder.js'; +import { ERROR_CODES, makeCliError } from '../protocol/errors.js'; +import { prepareRenderWaitCondition } from '../renderWait/matcher.js'; +import { invariant, unreachable } from '../util/assert.js'; + +export type BatchStep = + | { kind: 'type'; text: string } + | { kind: 'paste'; text: string } + | { kind: 'sendKeys'; keys: string[] } + | { + kind: 'run'; + command: string; + noWait: boolean; + timeoutMs: number | undefined; + } + | { + kind: 'wait'; + condition: PreparedRenderWaitCondition; + timeoutMs: number | undefined; + }; + +export interface BatchPlan { + steps: BatchStep[]; +} + +const VERB_KEYS = ['type', 'paste', 'sendKeys', 'run', 'wait'] as const; + +type VerbKey = (typeof VERB_KEYS)[number]; + +// Default for an omitted `wait` step timeout, matching the `wait` command; an +// explicit `timeout: 0` still means infinite. Keeps an unattended Batch from +// hanging forever on a wait whose condition never appears. +const DEFAULT_WAIT_TIMEOUT_MS = 600_000; + +const TypeStepSchema = z.object({ type: z.string().min(1) }).strict(); +const PasteStepSchema = z.object({ paste: z.string().min(1) }).strict(); +const SendKeysStepSchema = z + .object({ sendKeys: z.array(z.string().min(1)).min(1) }) + .strict(); +const RunStepSchema = z + .object({ + run: z.string().min(1), + noWait: z.boolean().optional(), + timeout: z.number().int().positive().optional(), + }) + .strict(); +const WaitStepSchema = z + .object({ + wait: z + .object({ + text: z.string().optional(), + regex: z.string().optional(), + screenStableMs: z.number().int().positive().optional(), + cursorRow: z.number().int().nonnegative().optional(), + cursorCol: z.number().int().nonnegative().optional(), + timeout: z.number().int().nonnegative().optional(), + }) + .strict(), + }) + .strict(); + +function invalidInput(message: string, stepIndex?: number): never { + throw makeCliError(ERROR_CODES.INVALID_INPUT, { + message, + ...(stepIndex === undefined ? {} : { details: { stepIndex } }), + }); +} + +function isPlainObject(value: unknown): value is Record { + return typeof value === 'object' && value !== null && !Array.isArray(value); +} + +function presentVerbKeys(step: Record): VerbKey[] { + return VERB_KEYS.filter((key) => Object.hasOwn(step, key)); +} + +function parseStep(rawStep: unknown, index: number): BatchStep { + if (!isPlainObject(rawStep)) { + return invalidInput( + `Batch step ${String(index)} must be a JSON object`, + index, + ); + } + + const verbs = presentVerbKeys(rawStep); + if (verbs.length === 0) { + return invalidInput( + `Batch step ${String(index)} must have exactly one of type|paste|sendKeys|run|wait; found none`, + index, + ); + } + if (verbs.length > 1) { + return invalidInput( + `Batch step ${String(index)} must have exactly one of type|paste|sendKeys|run|wait; found ${verbs.join(', ')}`, + index, + ); + } + + const verb = verbs[0]; + invariant(verb !== undefined, 'exactly one verb key is present'); + + switch (verb) { + case 'type': + return parseTypeStep(rawStep, index); + case 'paste': + return parsePasteStep(rawStep, index); + case 'sendKeys': + return parseSendKeysStep(rawStep, index); + case 'run': + return parseRunStep(rawStep, index); + case 'wait': + return parseWaitStep(rawStep, index); + default: + return unreachable(verb, `Batch step ${String(index)} verb dispatch`); + } +} + +function unwrapStep( + schema: z.ZodType, + rawStep: unknown, + index: number, +): T { + const result = schema.safeParse(rawStep); + if (!result.success) { + throw makeCliError(ERROR_CODES.INVALID_INPUT, { + message: `Batch step ${String(index)} is invalid`, + details: { stepIndex: index, issues: result.error.issues }, + }); + } + return result.data; +} + +function parseTypeStep( + rawStep: Record, + index: number, +): BatchStep { + const data = unwrapStep(TypeStepSchema, rawStep, index); + return { kind: 'type', text: data.type }; +} + +function parsePasteStep( + rawStep: Record, + index: number, +): BatchStep { + const data = unwrapStep(PasteStepSchema, rawStep, index); + return { kind: 'paste', text: data.paste }; +} + +function parseSendKeysStep( + rawStep: Record, + index: number, +): BatchStep { + const data = unwrapStep(SendKeysStepSchema, rawStep, index); + + const keys = data.sendKeys; + for (const key of keys) { + assertValidKeyName(key); + } + return { kind: 'sendKeys', keys }; +} + +function parseRunStep( + rawStep: Record, + index: number, +): BatchStep { + const { run, noWait, timeout } = unwrapStep(RunStepSchema, rawStep, index); + return { + kind: 'run', + command: run, + noWait: noWait ?? false, + timeoutMs: timeout, + }; +} + +function parseWaitStep( + rawStep: Record, + index: number, +): BatchStep { + const { wait } = unwrapStep(WaitStepSchema, rawStep, index); + const condition = prepareRenderWaitCondition({ + text: wait.text, + regex: wait.regex, + screenStableMs: wait.screenStableMs, + cursorRow: wait.cursorRow, + cursorCol: wait.cursorCol, + }); + + // Absent -> finite default; explicit 0 -> infinite. + const timeoutMs = + wait.timeout === undefined + ? DEFAULT_WAIT_TIMEOUT_MS + : wait.timeout === 0 + ? undefined + : wait.timeout; + return { kind: 'wait', condition, timeoutMs }; +} + +/** + * Parse a JSON string into a validated Batch Plan. Pure: no fs or rpc. Every + * failure throws a CliError so the whole plan is rejected before any input is + * sent (a Batch is not atomic). + */ +export function parseBatchPlan(raw: string): BatchPlan { + let parsed: unknown; + try { + parsed = JSON.parse(raw); + } catch { + return invalidInput('Batch steps must be valid JSON.'); + } + + if (!Array.isArray(parsed)) { + return invalidInput('Batch steps must be a JSON array.'); + } + + if (parsed.length === 0) { + return invalidInput('Batch must contain at least one step.'); + } + + const steps = parsed.map((rawStep, index) => parseStep(rawStep, index)); + return { steps }; +} diff --git a/src/batch/result.ts b/src/batch/result.ts new file mode 100644 index 00000000..d8522877 --- /dev/null +++ b/src/batch/result.ts @@ -0,0 +1,139 @@ +import { z } from 'zod'; + +import type { BatchPlan, BatchStep } from './plan.js'; + +import { unreachable } from '../util/assert.js'; + +// `interrupted` is the in-flight step abandoned by a SIGINT/SIGTERM flush (its +// RPC is not awaited, so its outcome is unknown); `not-run` is a later step the +// executor never reached. Only the CLI signal handler produces `interrupted`. +export const StepStatusSchema = z.enum([ + 'completed', + 'failed', + 'not-run', + 'interrupted', +]); +export type StepStatus = z.infer; + +export const BatchStepErrorSchema = z + .object({ + code: z.string(), + message: z.string(), + }) + .strict(); +export type BatchStepError = z.infer; + +const NonNegativeIntSchema = z.number().int().nonnegative(); + +export const InputStepRecordSchema = z + .object({ + index: NonNegativeIntSchema, + durationMs: NonNegativeIntSchema, + kind: z.enum(['type', 'paste', 'sendKeys']), + status: StepStatusSchema, + seq: NonNegativeIntSchema.optional(), + error: BatchStepErrorSchema.optional(), + }) + .strict(); +export type InputStepRecord = z.infer; + +export const RunStepRecordSchema = z + .object({ + index: NonNegativeIntSchema, + durationMs: NonNegativeIntSchema, + kind: z.literal('run'), + status: StepStatusSchema, + seq: NonNegativeIntSchema.optional(), + noWait: z.boolean(), + completed: z.boolean().optional(), + timedOut: z.boolean().optional(), + runOutcome: z + .enum(['completed', 'timedOut', 'sessionExited', 'started']) + .optional(), + error: BatchStepErrorSchema.optional(), + }) + .strict(); +export type RunStepRecord = z.infer; + +export const WaitStepRecordSchema = z + .object({ + index: NonNegativeIntSchema, + durationMs: NonNegativeIntSchema, + kind: z.literal('wait'), + status: StepStatusSchema, + waitBaseline: NonNegativeIntSchema.optional(), + matched: z.boolean().optional(), + timedOut: z.boolean().optional(), + matchedText: z.string().optional(), + capturedAtSeq: NonNegativeIntSchema.optional(), + error: BatchStepErrorSchema.optional(), + }) + .strict(); +export type WaitStepRecord = z.infer; + +export const BatchStepRecordSchema = z.discriminatedUnion('kind', [ + InputStepRecordSchema, + RunStepRecordSchema, + WaitStepRecordSchema, +]); +export type BatchStepRecord = z.infer; + +export const BatchResultSchema = z + .object({ + steps: z.array(BatchStepRecordSchema), + completedCount: NonNegativeIntSchema, + failedIndices: z.array(NonNegativeIntSchema), + }) + .strict(); +export type BatchResult = z.infer; + +// Shape a record for a Batch Step that never finalized (`not-run` or +// `interrupted`): no seq, zero duration, no observed outcome. +export function unreachedStepRecord( + step: BatchStep, + index: number, + status: 'not-run' | 'interrupted', +): BatchStepRecord { + const base = { index, durationMs: 0, status }; + switch (step.kind) { + case 'run': + return { ...base, kind: 'run', noWait: step.noWait }; + case 'wait': + return { ...base, kind: 'wait' }; + case 'type': + case 'paste': + case 'sendKeys': + return { ...base, kind: step.kind }; + default: + return unreachable(step, `batch unreached-step kind at index ${index}`); + } +} + +/** + * Build a partial BatchResult from the finalized records plus the plan: the + * first unreached step (in flight when the signal arrived) is `interrupted`, + * the rest `not-run`. Lets the signal handler flush without awaiting the RPC. + */ +export function buildPartialBatchResult( + plan: BatchPlan, + recorded: readonly BatchStepRecord[], +): BatchResult { + const steps = [...recorded]; + for (let index = recorded.length; index < plan.steps.length; index += 1) { + const step = plan.steps[index]; + if (step === undefined) { + continue; + } + const status = index === recorded.length ? 'interrupted' : 'not-run'; + steps.push(unreachedStepRecord(step, index, status)); + } + + return { + steps, + completedCount: steps.filter((record) => record.status === 'completed') + .length, + failedIndices: steps + .filter((record) => record.status === 'failed') + .map((record) => record.index), + }; +} diff --git a/src/batch/stepDriver.ts b/src/batch/stepDriver.ts new file mode 100644 index 00000000..335ab25c --- /dev/null +++ b/src/batch/stepDriver.ts @@ -0,0 +1,148 @@ +import type { z } from 'zod'; + +import type { RunResult, WaitForRenderResult } from '../protocol/messages.js'; +import type { PreparedRenderWaitCondition } from '../renderWait/matcher.js'; + +import { sendRpc } from '../host/rpcClient.js'; +import { + PasteResultSchema, + RunResultSchema, + SendKeysResultSchema, + TypeResultSchema, + WaitForRenderResultSchema, +} from '../protocol/messages.js'; +import { parseValidatedResult } from '../protocol/validation.js'; +import { invariant } from '../util/assert.js'; + +/** + * The seam the Batch executor drives, injected so it runs without a real PTY or + * renderer. Input verbs resolve the Event Log seq they produced; `wait` takes a + * prior seq back as `afterSeq` (the Wait Baseline). + */ +export interface StepDriver { + type(text: string): Promise; + paste(text: string): Promise; + sendKeys(keys: string[]): Promise; + run( + command: string, + noWait: boolean, + timeoutMs: number | undefined, + ): Promise; + wait( + condition: PreparedRenderWaitCondition, + afterSeq: number | undefined, + timeoutMs: number | undefined, + ): Promise; +} + +const DEFAULT_RUN_TIMEOUT_MS = 30_000; +const NO_WAIT_RUN_TRANSPORT_TIMEOUT_MS = 10_000; +const RUN_TRANSPORT_PADDING_MS = 10_000; +const WAIT_TRANSPORT_PADDING_MS = 5_000; + +function parseOrThrow( + schema: z.ZodType, + raw: unknown, + method: string, +): T { + return parseValidatedResult( + schema, + raw, + `Unexpected response shape from the session host for "${method}".`, + ); +} + +// Build the strict `waitForRender` params: omit undefined optionals, drop the +// prepared `compiledRegex`, and take `afterSeq` from the Wait Baseline argument. +function buildWaitParams( + condition: PreparedRenderWaitCondition, + afterSeq: number | undefined, + timeoutMs: number | undefined, + rendererName: string | undefined, +): Record { + return { + ...(condition.text === undefined ? {} : { text: condition.text }), + ...(condition.regex === undefined ? {} : { regex: condition.regex }), + ...(condition.screenStableMs === undefined + ? {} + : { screenStableMs: condition.screenStableMs }), + ...(condition.cursorRow === undefined + ? {} + : { cursorRow: condition.cursorRow }), + ...(condition.cursorCol === undefined + ? {} + : { cursorCol: condition.cursorCol }), + ...(afterSeq === undefined ? {} : { afterSeq }), + ...(timeoutMs === undefined ? {} : { timeoutMs }), + ...(rendererName === undefined ? {} : { rendererName }), + }; +} + +/** + * The production StepDriver: each verb is one RPC, with transport timeouts that + * pad the host's own deadline. Unlike `runWaitCommand` it does NOT fall back to + * offline replay — a dead host mid-Batch is a failed step, so the CliError + * propagates for the executor to classify. + */ +export function createRpcStepDriver( + socketPath: string, + rendererName?: string, +): StepDriver { + invariant(socketPath.length > 0, 'socketPath must be a non-empty string'); + + return { + async type(text: string): Promise { + const raw = await sendRpc(socketPath, 'type', { text }); + return parseOrThrow(TypeResultSchema, raw, 'type').seq; + }, + + async paste(text: string): Promise { + const raw = await sendRpc(socketPath, 'paste', { text }); + return parseOrThrow(PasteResultSchema, raw, 'paste').seq; + }, + + async sendKeys(keys: string[]): Promise { + const raw = await sendRpc(socketPath, 'sendKeys', { keys }); + return parseOrThrow(SendKeysResultSchema, raw, 'sendKeys').seq; + }, + + async run( + command: string, + noWait: boolean, + timeoutMs: number | undefined, + ): Promise { + const params: Record = { command, noWait }; + if (!noWait && timeoutMs !== undefined) { + params.timeoutMs = timeoutMs; + } + const transportTimeoutMs = noWait + ? NO_WAIT_RUN_TRANSPORT_TIMEOUT_MS + : (timeoutMs ?? DEFAULT_RUN_TIMEOUT_MS) + RUN_TRANSPORT_PADDING_MS; + const raw = await sendRpc(socketPath, 'run', params, transportTimeoutMs); + return parseOrThrow(RunResultSchema, raw, 'run'); + }, + + async wait( + condition: PreparedRenderWaitCondition, + afterSeq: number | undefined, + timeoutMs: number | undefined, + ): Promise { + const params = buildWaitParams( + condition, + afterSeq, + timeoutMs, + rendererName, + ); + // timeoutMs undefined means an infinite wait (0 -> infinite transport). + const transportTimeoutMs = + timeoutMs === undefined ? 0 : timeoutMs + WAIT_TRANSPORT_PADDING_MS; + const raw = await sendRpc( + socketPath, + 'waitForRender', + params, + transportTimeoutMs, + ); + return parseOrThrow(WaitForRenderResultSchema, raw, 'waitForRender'); + }, + }; +} diff --git a/src/cli/commands/batch.ts b/src/cli/commands/batch.ts new file mode 100644 index 00000000..97e67279 --- /dev/null +++ b/src/cli/commands/batch.ts @@ -0,0 +1,341 @@ +import assert from 'node:assert/strict'; +import { lstat, readFile, stat } from 'node:fs/promises'; +import { constants as osConstants } from 'node:os'; +import process from 'node:process'; + +import type { CommandContext } from '../context.js'; +import type { BatchPlan } from '../../batch/plan.js'; +import type { BatchResult, BatchStepRecord } from '../../batch/result.js'; + +import { resolveCommandTarget } from '../commandTarget.js'; +import { exitCodeForError } from '../exitCodes.js'; +import { emitSuccess, emitSuccessSync } from '../output.js'; +import { executeBatch } from '../../batch/executor.js'; +import { parseBatchPlan } from '../../batch/plan.js'; +import { buildPartialBatchResult } from '../../batch/result.js'; +import { createRpcStepDriver } from '../../batch/stepDriver.js'; +import { ERROR_CODES, makeCliError } from '../../protocol/errors.js'; +import { MAX_INPUT_FILE_SIZE } from './inputSource.js'; + +const KEEP_GOING_FAILURE_EXIT_CODE = 1; + +const INTERRUPT_SIGNALS = ['SIGINT', 'SIGTERM'] as const; +type InterruptSignal = (typeof INTERRUPT_SIGNALS)[number]; + +// Conventional 128 + signal-number exit code (SIGINT -> 130, SIGTERM -> 143). +function signalExitCode(signal: InterruptSignal): number { + const signo = osConstants.signals[signal]; + return typeof signo === 'number' ? 128 + signo : KEEP_GOING_FAILURE_EXIT_CODE; +} + +interface CommandOptions { + context: CommandContext; + json: boolean; + sessionId: string; + steps: string | undefined; + file?: string; + keepGoing: boolean; +} + +const BATCH_USAGE = + 'Usage: agent-tty batch [steps] [--file ]'; + +function invalidInput( + message: string, + details?: Record, + cause?: unknown, +): ReturnType { + return makeCliError(ERROR_CODES.INVALID_INPUT, { + message, + ...(details === undefined ? {} : { details }), + ...(cause === undefined ? {} : { cause }), + }); +} + +function isErrnoException(error: unknown): error is NodeJS.ErrnoException { + return error instanceof Error && 'code' in error; +} + +function inputFileError( + filePath: string, + error: unknown, +): ReturnType { + if (isErrnoException(error) && error.code === 'ENOENT') { + return invalidInput( + `Steps file "${filePath}" was not found.`, + { file: filePath }, + error, + ); + } + if ( + isErrnoException(error) && + ['EACCES', 'EPERM'].includes(error.code ?? '') + ) { + return invalidInput( + `Steps file "${filePath}" is not readable.`, + { file: filePath }, + error, + ); + } + return invalidInput( + `Failed to read steps file "${filePath}".`, + { file: filePath }, + error, + ); +} + +// Resolve the raw JSON step source from the positional argument XOR --file, +// with the same file safety as resolveCommandInputText (regular file, 10 MB cap). +async function resolveBatchSource(options: CommandOptions): Promise { + if (options.steps !== undefined && options.file !== undefined) { + throw invalidInput( + `Positional [steps] argument and --file are mutually exclusive. ${BATCH_USAGE}`, + { steps: options.steps, file: options.file }, + ); + } + + if (options.steps === undefined && options.file === undefined) { + throw invalidInput( + `Missing steps. Provide either a positional [steps] JSON array or --file . ${BATCH_USAGE}`, + ); + } + + if (options.steps !== undefined) { + if (options.steps.length === 0) { + throw invalidInput('Steps must not be empty.'); + } + return options.steps; + } + + const filePath = options.file; + assert(typeof filePath === 'string', '--file must resolve to a string path'); + assert(filePath.length > 0, '--file path must be a non-empty string'); + + let fileStats: Awaited>; + try { + fileStats = await lstat(filePath); + } catch (error: unknown) { + throw inputFileError(filePath, error); + } + + if (!fileStats.isFile()) { + throw invalidInput( + `Steps file "${filePath}" must be a regular file. Directories, symlinks, and device files are not supported.`, + { file: filePath }, + ); + } + + const contentStats = await stat(filePath).catch((error: unknown) => { + throw inputFileError(filePath, error); + }); + if (contentStats.size > MAX_INPUT_FILE_SIZE) { + throw invalidInput( + `Steps file "${filePath}" exceeds the 10 MB limit for --file input.`, + { + file: filePath, + sizeBytes: contentStats.size, + maxSizeBytes: MAX_INPUT_FILE_SIZE, + }, + ); + } + + let content: string; + try { + content = await readFile(filePath, 'utf8'); + } catch (error: unknown) { + throw inputFileError(filePath, error); + } + + if (content.length === 0) { + throw invalidInput(`Steps file "${filePath}" must not be empty.`, { + file: filePath, + }); + } + + return content; +} + +function stepOutcome(record: BatchStepRecord): string { + if (record.status === 'not-run') { + return 'not-run'; + } + if (record.status === 'failed') { + return record.error === undefined + ? 'failed' + : `failed (${record.error.code})`; + } + if (record.status === 'interrupted') { + return 'interrupted'; + } + switch (record.kind) { + case 'wait': + return record.matchedText === undefined + ? 'matched' + : `matched "${record.matchedText}"`; + case 'run': + return record.runOutcome ?? 'completed'; + case 'type': + case 'paste': + case 'sendKeys': + return 'completed'; + } +} + +function firstFailedStep(result: BatchResult): BatchStepRecord | undefined { + const firstFailedIndex = result.failedIndices[0]; + if (firstFailedIndex === undefined) { + return undefined; + } + return result.steps.find((record) => record.index === firstFailedIndex); +} + +function stepFailureReason(record: BatchStepRecord): string { + if (record.status !== 'failed' || record.error === undefined) { + return 'failed'; + } + return `${record.error.code}: ${record.error.message}`; +} + +export function buildBatchLines( + result: BatchResult, + keepGoing = false, +): string[] { + const lines = result.steps.map( + (record) => + `[${String(record.index)}] ${record.kind} ${stepOutcome(record)} (${String(record.durationMs)}ms)`, + ); + lines.push( + `${String(result.completedCount)}/${String(result.steps.length)} steps completed`, + ); + + if (result.failedIndices.length > 0) { + if (keepGoing) { + lines.push(`failed steps: ${result.failedIndices.join(', ')}`); + } else { + const failed = firstFailedStep(result); + const index = result.failedIndices[0]; + lines.push( + `failed at step ${String(index)}: ${failed === undefined ? 'failed' : stepFailureReason(failed)}`, + ); + } + } + + return lines; +} + +interface InterruptFlushOptions { + plan: BatchPlan; + json: boolean; + keepGoing: boolean; +} + +/** + * Run the executor under SIGINT/SIGTERM handlers that flush a partial envelope + * (in-flight step `interrupted`, later steps `not-run`) and exit. The in-flight + * RPC is NOT awaited, so an interrupted Waited Run keeps running on the Session + * — like caller-timeout (CONTEXT.md); already-applied input cannot be undone. + */ +async function executeWithInterruptFlush( + options: InterruptFlushOptions, + run: (onStep: (record: BatchStepRecord) => void) => Promise, +): Promise { + const records: BatchStepRecord[] = []; + let flushed = false; + + const handleSignal = (signal: InterruptSignal): void => { + // A second signal during the flush must not re-enter and double-write. + if (flushed) { + return; + } + flushed = true; + + const partial = buildPartialBatchResult(options.plan, records); + // Sync flush: an async write would be truncated by the process.exit below + // for a partial envelope larger than the OS pipe buffer. + emitSuccessSync({ + command: 'batch', + json: options.json, + result: partial, + lines: buildBatchLines(partial, options.keepGoing), + }); + process.exit(signalExitCode(signal)); + }; + + const handlers = INTERRUPT_SIGNALS.map((signal) => { + const handler = (): void => { + handleSignal(signal); + }; + process.on(signal, handler); + return { signal, handler } as const; + }); + + try { + return await run((record) => { + records.push(record); + }); + } finally { + for (const { signal, handler } of handlers) { + process.off(signal, handler); + } + } +} + +export async function runBatchCommand(options: CommandOptions): Promise { + const raw = await resolveBatchSource(options); + + // Parse before resolving the target: a malformed plan fails fast without a + // live Session, and nothing is sent until the whole plan validates. + const plan = parseBatchPlan(raw); + + const target = await resolveCommandTarget({ + home: options.context.home, + sessionId: options.sessionId, + }); + + const driver = createRpcStepDriver( + target.socketPath, + options.context.rendererDefault, + ); + + // Re-resolve the target around each Render Wait (fresh manifest read) so a + // Session that dies mid-Batch fails the wait rather than racing a dead one. + const assertCommandable = async (): Promise => { + await resolveCommandTarget({ + home: options.context.home, + sessionId: options.sessionId, + }); + }; + + const result = await executeWithInterruptFlush( + { plan, json: options.json, keepGoing: options.keepGoing }, + (onStep) => + executeBatch({ + plan, + driver, + keepGoing: options.keepGoing, + assertCommandable, + onStep, + }), + ); + + // Set process.exitCode then always emitSuccess (doctor pattern), so a failed + // Batch still emits the full per-step envelope instead of a bare error. + if (result.failedIndices.length > 0) { + if (options.keepGoing) { + process.exitCode = KEEP_GOING_FAILURE_EXIT_CODE; + } else { + const failed = firstFailedStep(result); + process.exitCode = + failed?.status === 'failed' && failed.error !== undefined + ? exitCodeForError(failed.error.code) + : KEEP_GOING_FAILURE_EXIT_CODE; + } + } + + emitSuccess({ + command: 'batch', + json: options.json, + result, + lines: buildBatchLines(result, options.keepGoing), + }); +} diff --git a/src/cli/commands/paste.ts b/src/cli/commands/paste.ts index 97a2ce03..935fff92 100644 --- a/src/cli/commands/paste.ts +++ b/src/cli/commands/paste.ts @@ -1,13 +1,14 @@ import type { CommandContext } from '../context.js'; +import type { PasteResult } from '../../protocol/messages.js'; import { resolveCommandTarget } from '../commandTarget.js'; import { emitSuccess } from '../output.js'; import { sendRpc } from '../../host/rpcClient.js'; +import { PasteResultSchema } from '../../protocol/messages.js'; +import { ERROR_CODES, makeCliError } from '../../protocol/errors.js'; import { resolveCommandInputText } from './inputSource.js'; -export interface PasteResult { - [key: string]: never; -} +export type { PasteResult } from '../../protocol/messages.js'; interface CommandOptions { context: CommandContext; @@ -29,15 +30,22 @@ export async function runPasteCommand(options: CommandOptions): Promise { sessionId: options.sessionId, }); - await sendRpc(target.socketPath, 'paste', { + const rawResult: unknown = await sendRpc(target.socketPath, 'paste', { text, }); + const parsedResult = PasteResultSchema.safeParse(rawResult); + if (!parsedResult.success) { + throw makeCliError(ERROR_CODES.PROTOCOL_ERROR, { + message: 'Unexpected response from host', + details: { issues: parsedResult.error.issues }, + }); + } - const result: PasteResult = {}; + const result: PasteResult = { seq: parsedResult.data.seq }; emitSuccess({ command: 'paste', json: options.json, result, - lines: ['Pasted text into session.'], + lines: [`Pasted text into session at seq ${String(result.seq)}.`], }); } diff --git a/src/cli/commands/type.ts b/src/cli/commands/type.ts index 8b8492c5..83f20540 100644 --- a/src/cli/commands/type.ts +++ b/src/cli/commands/type.ts @@ -1,13 +1,14 @@ import type { CommandContext } from '../context.js'; +import type { TypeResult } from '../../protocol/messages.js'; import { resolveCommandTarget } from '../commandTarget.js'; import { emitSuccess } from '../output.js'; import { sendRpc } from '../../host/rpcClient.js'; +import { TypeResultSchema } from '../../protocol/messages.js'; +import { ERROR_CODES, makeCliError } from '../../protocol/errors.js'; import { resolveCommandInputText } from './inputSource.js'; -export interface TypeResult { - [key: string]: never; -} +export type { TypeResult } from '../../protocol/messages.js'; interface CommandOptions { context: CommandContext; @@ -32,15 +33,22 @@ export async function runTypeCommand(options: CommandOptions): Promise { sessionId: options.sessionId, }); - await sendRpc(target.socketPath, 'type', { + const rawResult: unknown = await sendRpc(target.socketPath, 'type', { text, }); + const parsedResult = TypeResultSchema.safeParse(rawResult); + if (!parsedResult.success) { + throw makeCliError(ERROR_CODES.PROTOCOL_ERROR, { + message: 'Unexpected response from host', + details: { issues: parsedResult.error.issues }, + }); + } - const result: TypeResult = {}; + const result: TypeResult = { seq: parsedResult.data.seq }; emitSuccess({ command: 'type', json: options.json, result, - lines: ['Typed text into session.'], + lines: [`Typed text into session at seq ${String(result.seq)}.`], }); } diff --git a/src/cli/commands/wait.ts b/src/cli/commands/wait.ts index 9bef44d5..0df5b761 100644 --- a/src/cli/commands/wait.ts +++ b/src/cli/commands/wait.ts @@ -41,6 +41,7 @@ interface CommandOptions { screenStableMs: number | undefined; cursorRow: number | undefined; cursorCol: number | undefined; + afterSeq: number | undefined; } const DEFAULT_WAIT_TIMEOUT_MS = 600_000; @@ -59,7 +60,8 @@ function isRenderWaitMode(options: CommandOptions): boolean { options.regex !== undefined || options.screenStableMs !== undefined || options.cursorRow !== undefined || - options.cursorCol !== undefined + options.cursorCol !== undefined || + options.afterSeq !== undefined ); } @@ -105,7 +107,7 @@ function buildOfflineRenderWaitResult( ): WaitForRenderResult { const match = matchRenderWaitSnapshot(preparedCondition, snapshot); - if (match.contentAndCursorMatched) { + if (match.contentAndCursorMatched && match.baselineMatched) { return { matched: match.matched, timedOut: false, @@ -127,6 +129,7 @@ function buildOfflineRenderWaitResult( screenStableMs: preparedCondition.screenStableMs, expectedCursorRow: preparedCondition.cursorRow, expectedCursorCol: preparedCondition.cursorCol, + afterSeq: preparedCondition.afterSeq, capturedAtSeq: match.capturedAtSeq, cursorRow: match.cursorRow, cursorCol: match.cursorCol, @@ -212,6 +215,16 @@ export async function runWaitCommand(options: CommandOptions): Promise { }); } + if ( + options.afterSeq !== undefined && + !isNonNegativeInteger(options.afterSeq) + ) { + throw makeCliError(ERROR_CODES.INVALID_INPUT, { + message: '--after-seq must be a non-negative integer.', + details: { afterSeq: options.afterSeq }, + }); + } + if ( options.timeout !== undefined && options.timeout !== 0 && @@ -231,6 +244,7 @@ export async function runWaitCommand(options: CommandOptions): Promise { screenStableMs: options.screenStableMs, cursorRow: options.cursorRow, cursorCol: options.cursorCol, + afterSeq: options.afterSeq, }); const effectiveTimeout = options.timeout ?? DEFAULT_WAIT_TIMEOUT_MS; @@ -240,6 +254,7 @@ export async function runWaitCommand(options: CommandOptions): Promise { screenStableMs: options.screenStableMs, cursorRow: options.cursorRow, cursorCol: options.cursorCol, + afterSeq: options.afterSeq, timeoutMs: effectiveTimeout === 0 ? undefined : effectiveTimeout, rendererName: options.context.rendererDefault, }; diff --git a/src/cli/exitCodes.ts b/src/cli/exitCodes.ts index 5c762361..eb0e7425 100644 --- a/src/cli/exitCodes.ts +++ b/src/cli/exitCodes.ts @@ -20,6 +20,7 @@ const EXIT_CODE_BY_ERROR_CODE: Readonly> = Object.freeze( [ERROR_CODES.PROTOCOL_ERROR]: 9, [ERROR_CODES.RPC_ERROR]: 9, [ERROR_CODES.REPLAY_ERROR]: 10, + [ERROR_CODES.WAIT_TIMEOUT]: 11, [ERROR_CODES.INTERNAL_ERROR]: 1, }, ); diff --git a/src/cli/main.ts b/src/cli/main.ts index 217d8ef2..adbc67c6 100644 --- a/src/cli/main.ts +++ b/src/cli/main.ts @@ -6,6 +6,7 @@ import { Command, CommanderError } from 'commander'; import type { CommandContext } from './context.js'; +import { runBatchCommand } from './commands/batch.js'; import { runCreateCommand } from './commands/create.js'; import { runDashboardCommand } from './commands/dashboard.js'; import { runDestroyCommand } from './commands/destroy.js'; @@ -492,6 +493,35 @@ async function main(): Promise { ), ); + program + .command('batch [steps]') + .description( + 'Run an ordered sequence of input-and-wait steps in one command', + ) + .option('--file ', 'Read the JSON step array from a file') + .option('--keep-going', 'Attempt every step regardless of failures', false) + .option('--json', 'Emit a JSON command envelope', false) + .action( + wrapAction( + 'batch', + async ( + sessionId: string, + steps: string | undefined, + options: { file?: string; keepGoing: boolean; json: boolean }, + context: CommandContext, + ) => { + await runBatchCommand({ + context, + json: options.json, + sessionId, + steps, + ...(options.file !== undefined ? { file: options.file } : {}), + keepGoing: options.keepGoing, + }); + }, + ), + ); + program .command('mark