Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## [Unreleased]

### Added

- `agent-tty batch <session-id>`: run an ordered sequence of input-and-`wait` steps against one session in a single invocation, supplied as a positional JSON array or `--file`. Each step is one verb (`type`, `paste`, `sendKeys`, `run`, or `wait`); every `wait` is anchored to a Wait Baseline (the Event Log sequence after the preceding input step) so it cannot match a stale screen the way a hand-written `run`/`wait`/`send-keys` loop can (ADR 0007). Fail-fast by default with a non-zero exit and a per-step `--json` envelope; `--keep-going` attempts every step. SIGINT/SIGTERM flushes a partial envelope (in-flight step `interrupted`, later steps `not-run`) ([#123](https://github.com/coder/agent-tty/issues/123)).

## [v0.3.0] - 2026-06-03

### Added
18 changes: 18 additions & 0 deletions CONTEXT.md
@@ -43,6 +43,18 @@ _Avoid_: Visual wait, snapshot wait
A render condition where the visible text content of a **Semantic Snapshot** has remained unchanged for a requested duration.
_Avoid_: Settled screen

**Batch**:
An ordered sequence of **Batch Steps** driven through one **Command Target** in a single `batch` invocation. It runs fail-fast: the first failed **Batch Step** stops the run unless the caller opts into continuing.
_Avoid_: Pipeline, script, macro

**Batch Step**:
A single ordered action within a **Batch**: either one input or control action sent to the **Command Target** (text, paste, key chord, or a **Waited Run**), or one **Render Wait**.
_Avoid_: Command, instruction

**Wait Baseline**:
The **Event Log** sequence that a **Semantic Snapshot** must be captured beyond before a **Render Wait** can match it, so the wait reflects screen state from that point onward rather than stale pre-step content.
_Avoid_: afterSeq, sequence floor

**Live Host Eligible Session**:
A **Session** where callers should ask the live session host for fresh state.

@@ -219,6 +231,11 @@ _Avoid_: bare "agent", "Coder agent"
- A **Waited Run** may produce one **Run Completion**, time out for its caller, or be interrupted by **Session** exit.
- Caller timeout does not cancel the underlying **Run Completion**; it may still be observed later to keep internal completion bytes out of artifacts.
- After **Session** exit, an unobserved **Run Completion** can no longer arrive.
- A **Batch** is driven through exactly one **Command Target**, resolved once for the whole invocation.
- A **Batch** is not atomic: input already applied to a **Session** cannot be undone, so a failed **Batch** leaves the **Session** in whatever state its completed **Batch Steps** produced.
- A **Render Wait** that is a **Batch Step** is anchored to a **Wait Baseline** equal to the **Event Log** sequence recorded after the preceding input **Batch Step**, so it cannot match a **Semantic Snapshot** that predates that step.
- A standalone **Render Wait** may be given an explicit **Wait Baseline**; without one it matches against the latest **Semantic Snapshot**.
- A **Batch** stops at the first failed **Batch Step** — a timed-out **Render Wait**, or an input action against a **Session** that is no longer a **Command Target** — unless the caller opts into continuing.
- A **Promoted Hero Demo** replaces the existing recursive README demo entirely; the old recursive bundle is deleted rather than maintained in parallel.
- The **Hero Claim Boundary** narrows the README claim after that deletion: the outer TUI is presentation, while inner `agent-tty` artifacts are the product proof.
- An **Exploratory Hero Demo** is the preferred **Hero Demo** scenario because it shows the coding-agent TUI discovering the `agent-tty` skill and CLI before producing inner `agent-tty` proof artifacts.
@@ -284,3 +301,4 @@ _Avoid_: bare "agent", "Coder agent"
- "helper proof" was used during design discussion, but the canonical scenario is now **Exploratory Hero Demo**: success criteria and output paths are fixed, while the coding agent chooses the command flow inside a configurable fixed review window.
- "demo" and "proof" are not interchangeable for coding-agent recordings: a **Hero Demo** optimizes for stable presentation, while a **Recursive Dogfood Proof** optimizes for self-dogfood coverage.
- "agent" is overloaded across four referents: this project's **Triage Agent** (a Claude Code instance), Coder's **Coder workspace agent** (the SSH/exec daemon), a generic AFK implementation agent (the actor on `ready-for-agent` issues — Phase 2 of the triage pipeline), and — in **Session Dashboard** product copy only — the external client driving a **Session** (often an AI coding agent). The last sense is deliberately **not** a domain term: the **Session Dashboard** and **Live View** are defined over **Sessions**, not agents, and the **Event Log** does not record which client sent input. Do not make the dashboard agent-aware (grouping or filtering by agent identity) without first extending the domain model. Always qualify in code comments and docs.
- "batch" is overloaded: a **Batch** (an ordered **Batch Step** sequence driven through one **Command Target** by the `batch` command) is unrelated to a **Triage Batch** (the set of issues processed by one **AFK Triage** invocation). They live in different subsystems; always rely on the qualifier.
2 changes: 1 addition & 1 deletion README.md
@@ -111,7 +111,7 @@ Full reproducer, transcripts, and proof bundles are in [`dogfood/agent-uses-agen

## Command surface

-Every user-facing command takes `--json` and returns a stable, machine-readable envelope. The commands cover the session lifecycle (`create`, `list`, `inspect`, `destroy`, `gc`), input and control (`run`, `type`, `paste`, `send-keys`, `resize`, `signal`, `mark`), observation and capture (`wait`, `snapshot`, `screenshot`, `record export`), the live `dashboard`, and environment checks (`version`, `doctor`, `skills`).
+Every user-facing command takes `--json` and returns a stable, machine-readable envelope. The commands cover the session lifecycle (`create`, `list`, `inspect`, `destroy`, `gc`), input and control (`run`, `type`, `paste`, `send-keys`, `batch`, `resize`, `signal`, `mark`), observation and capture (`wait`, `snapshot`, `screenshot`, `record export`), the live `dashboard`, and environment checks (`version`, `doctor`, `skills`).

See [`docs/USAGE.md`](./docs/USAGE.md) for the full flag reference and [`docs/TROUBLESHOOTING.md`](./docs/TROUBLESHOOTING.md) for renderer and environment issues.

74 changes: 74 additions & 0 deletions docs/USAGE.md
@@ -104,6 +104,80 @@ Useful flags:
- `--exit`: wait for the process to exit.
- `--timeout <ms>`: maximum wait time in milliseconds, with `0` meaning infinite.

## `batch`

Use `batch` to run an ordered sequence of input-and-`wait` steps against one session in a single invocation, instead of coordinating separate `run`/`type`/`paste`/`send-keys`/`wait` calls. Each `wait` step is anchored to a Wait Baseline — it only considers screen state produced _after_ the preceding input step — so a batch cannot race ahead and match a stale screen the way a hand-written shell loop can.

```bash
agent-tty batch <session-id> '[steps]' --json
agent-tty batch <session-id> --file ./steps.json --json
agent-tty batch <session-id> '[steps]' --keep-going --json
```

Steps are a JSON array; each step is exactly one verb. The shape mirrors the rest of the CLI:

```json
[
  { "run": "nvim --clean", "noWait": true },
  { "wait": { "screenStableMs": 1000 } },
  { "sendKeys": ["i"] },
  { "type": "hello" },
  { "sendKeys": ["Escape"] },
  { "type": ":wq" },
  { "sendKeys": ["Enter"] },
  { "wait": { "text": "written" } }
]
```

- `type` / `paste`: a string of literal text.
- `sendKeys`: a non-empty array of key names — individual named keys or single characters (e.g. `["Enter"]`, `["Ctrl+C"]`, `["Escape", "Enter"]`). Multi-character literal text such as `:wq` is not a key name; send it with a `type` step.
- `run`: a command string, with optional `noWait` (fire-and-forget) and `timeout` (ms). A `run` step is a waited run by default.
- `wait`: the same conditions as the `wait` command — `text`, `regex`, `screenStableMs`, `cursorRow`, `cursorCol`, and `timeout` (ms).
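
The "exactly one verb" rule above can be checked before any input reaches the session. The following is a minimal sketch, not the project's validator: `VERBS` and `stepVerb` are hypothetical names, and the real schema may apply further checks (for example on key names).

```typescript
// Hypothetical helper: the verb names come from the list above, but the
// function itself is an illustration, not agent-tty's actual validation code.
const VERBS = ["type", "paste", "sendKeys", "run", "wait"] as const;
type Verb = (typeof VERBS)[number];

// Returns the single verb used by a raw step object.
// Throws if the step names no verb, several verbs, or an empty sendKeys array.
function stepVerb(step: object): Verb {
  const present = VERBS.filter((verb) => verb in step);
  const [verb] = present;
  if (present.length !== 1 || verb === undefined) {
    throw new Error(`expected exactly one verb, found ${present.length}`);
  }
  if (verb === "sendKeys") {
    const keys = (step as Record<string, unknown>)["sendKeys"];
    if (!Array.isArray(keys) || keys.length === 0) {
      throw new Error("sendKeys must be a non-empty array of key names");
    }
  }
  return verb;
}
```

Rejecting a malformed step up front matters because a batch is not atomic: a shape error found halfway through would leave earlier input already applied.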

Input source and flags:

- A positional `[steps]` JSON array **xor** `--file <path>` — supply exactly one. Passing both, or neither, is an `INVALID_INPUT` error.
- `--keep-going`: attempt every step regardless of failures. By default a batch is **fail-fast** — the first failed step (a timed-out `wait`, or input to a session that is no longer commandable) stops the run, and the remaining steps are recorded `not-run`. A batch is not atomic: already-applied input cannot be undone.
- `--json`: emit a machine-readable command envelope.
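
To make the difference between fail-fast and `--keep-going` concrete, here is a simplified sketch of how step statuses could be assigned. It is synchronous and ignores interruption; `runBatch` and its `execute` callback are hypothetical stand-ins for the real executor, which talks to a live session.

```typescript
type StepStatus = "completed" | "failed" | "not-run";

// `execute` is a hypothetical stand-in for applying one step to the session.
// It returns true on success and false on failure (for example a timed-out wait).
function runBatch<T>(
  steps: T[],
  execute: (step: T, index: number) => boolean,
  keepGoing: boolean,
): StepStatus[] {
  const statuses: StepStatus[] = [];
  let stopped = false;
  steps.forEach((step, index) => {
    if (stopped) {
      // Fail-fast: steps after the first failure are recorded, never attempted.
      statuses.push("not-run");
      return;
    }
    const ok = execute(step, index);
    statuses.push(ok ? "completed" : "failed");
    if (!ok && !keepGoing) stopped = true;
  });
  return statuses;
}
```

With three steps where the second fails, this sketch yields `completed`, `failed`, `not-run` by default and `completed`, `failed`, `completed` when `keepGoing` is true. In neither case is the first step's input rolled back.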

The `--json` result is a per-step envelope:

```json
{
  "ok": true,
  "command": "batch",
  "result": {
    "steps": [
      {
        "index": 0,
        "kind": "run",
        "status": "completed",
        "seq": 4,
        "noWait": true,
        "runOutcome": "started",
        "durationMs": 12
      },
      {
        "index": 1,
        "kind": "wait",
        "status": "completed",
        "waitBaseline": 4,
        "matched": true,
        "timedOut": false,
        "capturedAtSeq": 9,
        "durationMs": 1003
      }
    ],
    "completedCount": 2,
    "failedIndices": []
  }
}
```

Each step record carries its `index`, `kind`, `status` (`completed` | `failed` | `not-run` | `interrupted`), and `durationMs`. Input steps report the Event Log `seq` they produced; `wait` steps report the `waitBaseline` they were anchored to plus `matched` / `timedOut` / `matchedText` / `capturedAtSeq`. `completedCount` and `failedIndices` summarize the run. A fail-fast batch exits non-zero with the failed step's exit code (e.g. `11` for a `WAIT_TIMEOUT`); `--keep-going` exits `1` if any step failed. If the process is interrupted by SIGINT/SIGTERM, batch flushes the same envelope with the in-flight step marked `interrupted` and later steps `not-run`, then exits non-zero.
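
A caller consuming the envelope will usually want the first step that did not complete. The sketch below models only the fields named above; the type names and both helpers are hypothetical, and it assumes `failedIndices` lists steps whose status is exactly `failed` (whether `interrupted` steps are included is not stated here).

```typescript
// Hypothetical types covering only the fields described in the paragraph above.
type StepRecord = {
  index: number;
  kind: string;
  status: "completed" | "failed" | "not-run" | "interrupted";
  durationMs: number;
};

type BatchResult = {
  steps: StepRecord[];
  completedCount: number;
  failedIndices: number[];
};

// Recomputes the summary fields from the step records.
// Assumption: only steps with status "failed" count toward failedIndices.
function summarize(steps: StepRecord[]): Pick<BatchResult, "completedCount" | "failedIndices"> {
  return {
    completedCount: steps.filter((s) => s.status === "completed").length,
    failedIndices: steps.filter((s) => s.status === "failed").map((s) => s.index),
  };
}

// Returns the first step that did not complete, or undefined if every step did.
function firstProblem(result: BatchResult): StepRecord | undefined {
  return result.steps.find((s) => s.status !== "completed");
}
```

Checking the per-step `status` rather than only the process exit code lets a caller tell a failed step apart from one that was never attempted.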

The Wait Baseline fixes stale-match only. It does **not** fix echo-match: a `wait` can still match the terminal's echo of a just-typed command (the echo renders _after_ the baseline). Use a distinctive output token or a `screenStableMs` wait rather than waiting for text you just typed. Interrupting a batch mid-`wait` does not stop whatever is running on the session: the wait is abandoned, not cancelled, exactly like a caller timeout on `run`.

## Screenshots And Recording Exports

Screenshots and WebM export use the `ghostty-web` reference renderer through Playwright/Chromium.
63 changes: 63 additions & 0 deletions docs/adr/0007-render-wait-baseline.md
@@ -0,0 +1,63 @@
---
status: accepted
---

# Render waits accept an optional Wait Baseline

## Context

The `batch` command runs an ordered sequence of **Batch Steps** — input actions
and **Render Waits** — through one **Command Target** with no human pacing
between them. A **Render Wait** today (`waitForRender` in
`src/host/hostMain.ts`) polls the renderer every 200 ms and matches against the
**latest** **Semantic Snapshot**; `WaitForRenderParams`
(`src/protocol/schemas.ts`) carries text/regex/screenStableMs/cursor/timeout and
nothing about event-log position.

That is fine for a human invoking `wait` once, but unsafe for steps that run
back-to-back. A wait step can match the screen left by the _previous_ step
before the current step has rendered (stale-match), and a `screenStableMs` wait
can declare that _old_ screen "stable" before the new input even appears. The
batch then advances on a false premise and sends later keystrokes into the wrong
state. This is the property that separates `batch` from a hand-written shell
loop, so it has to be correct.

## Decision

A **Render Wait** accepts an optional **Wait Baseline**: an **Event Log**
sequence (`afterSeq`) it must observe a **Semantic Snapshot** _strictly beyond_
before it may match or accrue **Screen Stability**. The `batch` executor sets
each wait step's baseline to the **Event Log** sequence recorded after the
preceding input **Batch Step**, so a wait only ever reflects state captured
after that input was recorded. The standalone `wait --after-seq <n>` exposes the
same gate; callers can chain it from the `capturedAtSeq` that `snapshot` and
`wait` already return.

- `afterSeq` is added to `WaitForRenderParams`; the host poll and the offline
replay matcher reject any snapshot whose `capturedAtSeq` is not strictly
greater than the baseline.
- With no baseline a **Render Wait** behaves exactly as before (matches the
latest snapshot), so the change is backward compatible.
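
As an illustration of the gate, a text matcher with an optional baseline might look like the sketch below. `Snapshot` and `matchesWait` are hypothetical names for this example only; the real logic lives in `waitForRender` and the offline matcher and also covers regex, cursor, and **Screen Stability** conditions.

```typescript
// Hypothetical shape: only the two fields this sketch needs.
type Snapshot = { capturedAtSeq: number; text: string };

// Returns true when the snapshot satisfies a text wait.
// With a baseline, a snapshot at or before `afterSeq` can never match.
function matchesWait(snapshot: Snapshot, needle: string, afterSeq?: number): boolean {
  if (afterSeq !== undefined && snapshot.capturedAtSeq <= afterSeq) {
    return false; // stale: captured at or before the baseline
  }
  return snapshot.text.includes(needle);
}
```

The comparison is strict, so a snapshot captured exactly at the baseline sequence is still rejected, and omitting `afterSeq` keeps the pre-existing behaviour of matching the latest snapshot.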

## Consequences

- `batch` is meaningfully safer than scripting the existing commands in a loop:
each wait is anchored to its own step rather than racing the previous step's
screen.
- The **Wait Baseline** fixes **stale-match** only. It does **not** fix
_echo-match_ — a `wait --text "foo"` matching the terminal's echo of a
just-typed `foo`, which renders _after_ the baseline. Echo-match stays the
caller's concern (use a distinctive output token or `screenStableMs`), exactly
as with the `wait` command today.
- A small amount of protocol and matcher surface grows (one optional field plus
a `capturedAtSeq > afterSeq` gate in the live poll and the offline matcher).
Offline replay can only apply the floor against the single latest snapshot it
reconstructs.

## Alternatives considered

- **Require the visible text to change from a pre-step capture.** Rejected:
heuristic rather than exact, never matches a step that legitimately reproduces
identical text, and does not use the canonical **Event Log**.
- **No baseline in v1 (match the latest screen, document the foot-gun).**
Rejected: it leaves `batch` only marginally safer than a shell loop, and the
stale-match failure is silent and order-dependent.