coder · ThomasK33 · Jun 5, 2026 · Jun 5, 2026 · Jun 5, 2026 · Jun 5, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,10 @@
 
 ## [Unreleased]
 
+### Added
+
+- `agent-tty batch <session-id>`: run an ordered sequence of input-and-`wait` steps against one session in a single invocation, supplied as a positional JSON array or `--file`. Each step is one verb (`type`, `paste`, `sendKeys`, `run`, or `wait`); every `wait` is anchored to a Wait Baseline (the Event Log sequence after the preceding input step) so it cannot match a stale screen the way a hand-written `run`/`wait`/`send-keys` loop can (ADR 0007). Fail-fast by default with a non-zero exit and a per-step `--json` envelope; `--keep-going` attempts every step. SIGINT/SIGTERM flushes a partial envelope (in-flight step `interrupted`, later steps `not-run`) ([#123](https://github.com/coder/agent-tty/issues/123)).
+
 ## [v0.3.0] - 2026-06-03
 
 ### Added

diff --git a/CONTEXT.md b/CONTEXT.md
@@ -43,6 +43,18 @@ _Avoid_: Visual wait, snapshot wait
 A render condition where the visible text content of a **Semantic Snapshot** has remained unchanged for a requested duration.
 _Avoid_: Settled screen
 
+**Batch**:
+An ordered sequence of **Batch Steps** driven through one **Command Target** in a single `batch` invocation. It runs fail-fast: the first failed **Batch Step** stops the run unless the caller opts into continuing.
+_Avoid_: Pipeline, script, macro
+
+**Batch Step**:
+A single ordered action within a **Batch**: either one input or control action sent to the **Command Target** (text, paste, key chord, or a **Waited Run**), or one **Render Wait**.
+_Avoid_: Command, instruction
+
+**Wait Baseline**:
+The **Event Log** point a **Render Wait** must observe a **Semantic Snapshot** beyond before it can match, so the wait reflects screen state from that point onward rather than stale pre-step content.
+_Avoid_: afterSeq, sequence floor
+
 **Live Host Eligible Session**:
 A **Session** where callers should ask the live session host for fresh state.
 
@@ -219,6 +231,11 @@ _Avoid_: bare "agent", "Coder agent"
 - A **Waited Run** may produce one **Run Completion**, time out for its caller, or be interrupted by **Session** exit.
 - Caller timeout does not cancel the underlying **Run Completion**; it may still be observed later to keep internal completion bytes out of artifacts.
 - After **Session** exit, an unobserved **Run Completion** can no longer arrive.
+- A **Batch** is driven through exactly one **Command Target**, resolved once for the whole invocation.
+- A **Batch** is not atomic: input already applied to a **Session** cannot be undone, so a failed **Batch** leaves the **Session** in whatever state its completed **Batch Steps** produced.
+- A **Render Wait** that is a **Batch Step** is anchored to a **Wait Baseline** equal to the **Event Log** sequence recorded after the preceding input **Batch Step**, so it cannot match a **Semantic Snapshot** that predates that step.
+- A standalone **Render Wait** may be given an explicit **Wait Baseline**; without one it matches against the latest **Semantic Snapshot**.
+- A **Batch** stops at the first failed **Batch Step** — a timed-out **Render Wait**, or an input action against a **Session** that is no longer a **Command Target** — unless the caller opts into continuing.
 - A **Promoted Hero Demo** replaces the existing recursive README demo entirely; the old recursive bundle is deleted rather than maintained in parallel.
 - The **Hero Claim Boundary** narrows the README claim after that deletion: the outer TUI is presentation, while inner `agent-tty` artifacts are the product proof.
 - An **Exploratory Hero Demo** is the preferred **Hero Demo** scenario because it shows the coding-agent TUI discovering the `agent-tty` skill and CLI before producing inner `agent-tty` proof artifacts.
@@ -284,3 +301,4 @@ _Avoid_: bare "agent", "Coder agent"
 - "helper proof" was used during design discussion, but the canonical scenario is now **Exploratory Hero Demo**: success criteria and output paths are fixed, while the coding agent chooses the command flow inside a configurable fixed review window.
 - "demo" and "proof" are not interchangeable for coding-agent recordings: a **Hero Demo** optimizes for stable presentation, while a **Recursive Dogfood Proof** optimizes for self-dogfood coverage.
 - "agent" is overloaded across four referents: this project's **Triage Agent** (a Claude Code instance), Coder's **Coder workspace agent** (the SSH/exec daemon), a generic AFK implementation agent (the actor on `ready-for-agent` issues — Phase 2 of the triage pipeline), and — in **Session Dashboard** product copy only — the external client driving a **Session** (often an AI coding agent). The last sense is deliberately **not** a domain term: the **Session Dashboard** and **Live View** are defined over **Sessions**, not agents, and the **Event Log** does not record which client sent input. Do not make the dashboard agent-aware (grouping or filtering by agent identity) without first extending the domain model. Always qualify in code comments and docs.
+- "batch" is overloaded: a **Batch** (an ordered **Batch Step** sequence driven through one **Command Target** by the `batch` command) is unrelated to a **Triage Batch** (the set of issues processed by one **AFK Triage** invocation). They live in different subsystems; always rely on the qualifier.
diff --git a/README.md b/README.md
@@ -111,7 +111,7 @@ Full reproducer, transcripts, and proof bundles are in [`dogfood/agent-uses-agen
 
 ## Command surface
 
-Every user-facing command takes `--json` and returns a stable, machine-readable envelope. The commands cover the session lifecycle (`create`, `list`, `inspect`, `destroy`, `gc`), input and control (`run`, `type`, `paste`, `send-keys`, `resize`, `signal`, `mark`), observation and capture (`wait`, `snapshot`, `screenshot`, `record export`), the live `dashboard`, and environment checks (`version`, `doctor`, `skills`).
+Every user-facing command takes `--json` and returns a stable, machine-readable envelope. The commands cover the session lifecycle (`create`, `list`, `inspect`, `destroy`, `gc`), input and control (`run`, `type`, `paste`, `send-keys`, `batch`, `resize`, `signal`, `mark`), observation and capture (`wait`, `snapshot`, `screenshot`, `record export`), the live `dashboard`, and environment checks (`version`, `doctor`, `skills`).
 
 See [`docs/USAGE.md`](./docs/USAGE.md) for the full flag reference and [`docs/TROUBLESHOOTING.md`](./docs/TROUBLESHOOTING.md) for renderer and environment issues.
 

diff --git a/docs/USAGE.md b/docs/USAGE.md
@@ -104,6 +104,80 @@ Useful flags:
 - `--exit`: wait for the process to exit.
 - `--timeout <ms>`: maximum wait time in milliseconds, with `0` meaning infinite.
 
+## `batch`
+
+Use `batch` to run an ordered sequence of input-and-`wait` steps against one session in a single invocation, instead of coordinating separate `run`/`type`/`paste`/`send-keys`/`wait` calls. Each `wait` step is anchored to a Wait Baseline — it only considers screen state produced _after_ the preceding input step — so a batch cannot race ahead and match a stale screen the way a hand-written shell loop can.
+
+```bash
+agent-tty batch <session-id> '[steps]' --json
+agent-tty batch <session-id> --file ./steps.json --json
+agent-tty batch <session-id> '[steps]' --keep-going --json
+```
+
+Steps are a JSON array; each step is exactly one verb. The shape mirrors the rest of the CLI:
+
+```json
+[
+  { "run": "nvim --clean", "noWait": true },
+  { "wait": { "screenStableMs": 1000 } },
+  { "sendKeys": ["i"] },
+  { "type": "hello" },
+  { "sendKeys": ["Escape"] },
+  { "type": ":wq" },
+  { "sendKeys": ["Enter"] },
+  { "wait": { "text": "written" } }
+]
+```
+
+- `type` / `paste`: a string of literal text.
+- `sendKeys`: a non-empty array of key names — individual named keys or single characters (e.g. `["Enter"]`, `["Ctrl+C"]`, `["Escape", "Enter"]`). Multi-character literal text such as `:wq` is not a key name; send it with a `type` step.
+- `run`: a command string, with optional `noWait` (fire-and-forget) and `timeout` (ms). A `run` step is a waited run by default.
+- `wait`: the same conditions as the `wait` command — `text`, `regex`, `screenStableMs`, `cursorRow`, `cursorCol`, and `timeout` (ms).
+
+Input source and flags:
+
+- A positional `[steps]` JSON array **xor** `--file <path>` — supply exactly one. Passing both, or neither, is an `INVALID_INPUT` error.
+- `--keep-going`: attempt every step regardless of failures. By default a batch is **fail-fast** — the first failed step (a timed-out `wait`, or input to a session that is no longer commandable) stops the run, and the remaining steps are recorded `not-run`. A batch is not atomic: already-applied input cannot be undone.
+- `--json`: emit a machine-readable command envelope.
+
+The `--json` result is a per-step envelope:
+
+```json
+{
+  "ok": true,
+  "command": "batch",
+  "result": {
+    "steps": [
+      {
+        "index": 0,
+        "kind": "run",
+        "status": "completed",
+        "seq": 4,
+        "noWait": true,
+        "runOutcome": "started",
+        "durationMs": 12
+      },
+      {
+        "index": 1,
+        "kind": "wait",
+        "status": "completed",
+        "waitBaseline": 4,
+        "matched": true,
+        "timedOut": false,
+        "capturedAtSeq": 9,
+        "durationMs": 1003
+      }
+    ],
+    "completedCount": 2,
+    "failedIndices": []
+  }
+}
+```
+
+Each step record carries its `index`, `kind`, `status` (`completed` | `failed` | `not-run` | `interrupted`), and `durationMs`. Input steps report the Event Log `seq` they produced; `wait` steps report the `waitBaseline` they were anchored to plus `matched` / `timedOut` / `matchedText` / `capturedAtSeq`. `completedCount` and `failedIndices` summarize the run. A fail-fast batch exits non-zero with the failed step's exit code (e.g. `11` for a `WAIT_TIMEOUT`); `--keep-going` exits `1` if any step failed. If the process is interrupted by SIGINT/SIGTERM, batch flushes the same envelope with the in-flight step marked `interrupted` and later steps `not-run`, then exits non-zero.
+
+The Wait Baseline fixes stale-match only. It does **not** fix echo-match: a `wait` can still match the terminal's echo of a just-typed command (the echo renders _after_ the baseline). Use a distinctive output token or a `screenStableMs` wait rather than waiting for text you just typed. Interrupting a batch mid-`wait` leaves that wait's command still running on the session (the wait is abandoned, not cancelled), exactly like a caller timeout on `run`.
+
 ## Screenshots And Recording Exports
 
 Screenshots and WebM export use the `ghostty-web` reference renderer through Playwright/Chromium.

diff --git a/docs/adr/0007-render-wait-baseline.md b/docs/adr/0007-render-wait-baseline.md
@@ -0,0 +1,63 @@
+---
+status: accepted
+---
+
+# Render waits accept an optional Wait Baseline
+
+## Context
+
+The `batch` command runs an ordered sequence of **Batch Steps** — input actions
+and **Render Waits** — through one **Command Target** with no human pacing
+between them. A **Render Wait** today (`waitForRender` in
+`src/host/hostMain.ts`) polls the renderer every 200 ms and matches against the
+**latest** **Semantic Snapshot**; `WaitForRenderParams`
+(`src/protocol/schemas.ts`) carries text/regex/screenStableMs/cursor/timeout and
+nothing about event-log position.
+
+That is fine for a human invoking `wait` once, but unsafe for steps that run
+back-to-back. A wait step can match the screen left by the _previous_ step
+before the current step has rendered (stale-match), and a `screenStableMs` wait
+can declare that _old_ screen "stable" before the new input even appears. The
+batch then advances on a false premise and sends later keystrokes into the wrong
+state. This is the property that separates `batch` from a hand-written shell
+loop, so it has to be correct.
+
+## Decision
+
+A **Render Wait** accepts an optional **Wait Baseline**: an **Event Log**
+sequence (`afterSeq`) it must observe a **Semantic Snapshot** _strictly beyond_
+before it may match or accrue **Screen Stability**. The `batch` executor sets
+each wait step's baseline to the **Event Log** sequence recorded after the
+preceding input **Batch Step**, so a wait only ever reflects state at or after
+its own step. The standalone `wait --after-seq <n>` exposes the same gate, since
+`snapshot` and `wait` already return a `capturedAtSeq` callers can chain.
+
+- `afterSeq` is added to `WaitForRenderParams`; the host poll and the offline
+  replay matcher reject any snapshot whose `capturedAtSeq` is not strictly
+  greater than the baseline.
+- With no baseline a **Render Wait** behaves exactly as before (matches the
+  latest snapshot), so the change is backward compatible.
+
+## Consequences
+
+- `batch` is meaningfully safer than scripting the existing commands in a loop:
+  each wait is anchored to its own step rather than racing the previous step's
+  screen.
+- The **Wait Baseline** fixes **stale-match** only. It does **not** fix
+  _echo-match_ — a `wait --text "foo"` matching the terminal's echo of a
+  just-typed `foo`, which renders _after_ the baseline. Echo-match stays the
+  caller's concern (use a distinctive output token or `screenStableMs`), exactly
+  as with the `wait` command today.
+- A small amount of protocol and matcher surface grows (one optional field plus
+  a `capturedAtSeq > afterSeq` gate in the live poll and the offline matcher).
+  Offline replay can only apply the floor against the single latest snapshot it
+  reconstructs.
+
+## Alternatives considered
+
+- **Require the visible text to change from a pre-step capture.** Rejected:
+  heuristic rather than exact, never matches a step that legitimately reproduces
+  identical text, and does not use the canonical **Event Log**.
+- **No baseline in v1 (match the latest screen, document the foot-gun).**
+  Rejected: it leaves `batch` only marginally safer than a shell loop, and the
+  stale-match failure is silent and order-dependent.