diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index e5db8414..8aff6eed 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -84,6 +84,12 @@ jobs: - name: Test unit run: mise run test-unit + # Integration and e2e tests drive real PTY hosts and headless-browser + # renderers, which can transiently fail under machine load (e.g. a screenshot + # render or RPC hiccup) even when the code is correct. The `test:integration` + # and `test:e2e` npm scripts pass `--retry=2`, so a flaky attempt is retried + # in place instead of failing the shard; a genuine failure still fails all + # three attempts. Unit tests (`test:unit`) deliberately do NOT retry. test-integration: runs-on: ubuntu-latest timeout-minutes: 20 diff --git a/CONTEXT.md b/CONTEXT.md index 804b9f18..28d4843d 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -43,6 +43,10 @@ _Avoid_: Visual wait, snapshot wait A render condition where the visible text content of a **Semantic Snapshot** has remained unchanged for a requested duration. _Avoid_: Settled screen +**Screen Hash**: +A stable digest of a **Session**'s normalized visible screen text at a captured event-log sequence, used to tell whether the rendered screen content changed between two observations. It is computed from the same canonical visible text that the **Screen Stability** check and text **Render Wait** matching use, so the three never disagree. +_Avoid_: Screen checksum, frame hash, screenshot hash + **Batch**: An ordered sequence of **Batch Steps** driven through one **Command Target** in a single `batch` invocation. It runs fail-fast: the first failed **Batch Step** stops the run unless the caller opts into continuing. _Avoid_: Pipeline, script, macro @@ -228,6 +232,9 @@ _Avoid_: bare "agent", "Coder agent" - A **Render Wait** may include text, regex, cursor, or **Screen Stability** conditions. 
- A **Render Wait** may be evaluated by live host polling for a **Live Host Eligible Session** or by offline replay fallback for an **Offline Replay Eligible Session**. - Offline replay fallback can evaluate snapshot content and cursor position, but cannot prove elapsed **Screen Stability** duration from a single latest **Semantic Snapshot**. +- A **Screen Hash** changes exactly when the canonical visible text that the **Screen Stability** check compares changes; the two share one definition. +- A **Screen Hash** covers visible screen text only — not scrollback, cursor position, or styles — and is distinct from the pixel `sha256` recorded on a **Screenshot Result**. +- A result carries the **Screen Hash** of the **Semantic Snapshot** it observed: a **Snapshot Result**, a matched **Render Wait** result, and the offline host-unreachable fallback that still observed a snapshot (even when it reports `matched: false` because **Screen Stability** duration could not be proven offline). The hash is keyed on whether a snapshot was observed, not on whether the wait matched; a **Render Wait** that observes no snapshot — a live timeout, a consecutive-failure giveup, or a replay error — carries none. - A **Waited Run** may produce one **Run Completion**, time out for its caller, or be interrupted by **Session** exit. - Caller timeout does not cancel the underlying **Run Completion**; it may still be observed later to keep internal completion bytes out of artifacts. - After **Session** exit, an unobserved **Run Completion** can no longer arrive. 
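Reviewer sketch (not part of the patch): the CONTEXT.md rule that a **Screen Hash** is keyed on whether a snapshot was observed, not on whether the wait matched, can be modelled as a small predicate. The `SketchWaitOutcome` type and `carriesScreenHash` function are hypothetical names invented for this illustration; the real result shapes are the Zod schemas in `src/protocol/schemas.ts`.

```typescript
// Hypothetical outcome model, for illustration only. The kinds mirror the
// cases listed in the glossary bullet above; none of these names come from
// the patch.
type SketchWaitOutcome =
  | { kind: 'matched-live' }
  | { kind: 'offline-fallback'; matched: boolean }
  | { kind: 'live-timeout' }
  | { kind: 'consecutive-failure-giveup' }
  | { kind: 'replay-error' };

// A result carries a Screen Hash exactly when it observed a snapshot.
// Note that `matched` is deliberately ignored for the offline fallback.
function carriesScreenHash(outcome: SketchWaitOutcome): boolean {
  return outcome.kind === 'matched-live' || outcome.kind === 'offline-fallback';
}

// The offline fallback reports matched: false but still observed a snapshot.
console.log(carriesScreenHash({ kind: 'offline-fallback', matched: false })); // true
// A live timeout observed no snapshot, so it carries no hash.
console.log(carriesScreenHash({ kind: 'live-timeout' })); // false
```

Under this reading, a missing hash on a **Render Wait** result signals "no screen was observed" and nothing more.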
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index 8108419b..dd734b07 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -46,6 +46,15 @@ If you touch the public bootstrap under `skills/` or the bundled runtime skills npm run intent:validate ``` +### Flaky integration and e2e tests + +Integration and e2e tests drive real PTY hosts and headless-browser renderers, so an individual test can transiently fail under machine load (most often a screenshot render or host RPC hiccup) even when the code is correct. To keep these flakes from causing spurious red: + +- `npm run test:integration`, `npm run test:e2e`, and the combined `npm run test` retry a failing test in place (`--retry=2`, up to three attempts). A genuine failure still fails all three attempts. +- `npm run test:unit` deliberately does **not** retry — unit tests must be deterministic, and the dedicated unit CI gate is the authority that catches real unit flakes. +- If an integration/e2e test fails _consistently_ (not just on one attempt), treat it as a real failure and investigate; do not raise the retry count to paper over it. +- When debugging a single browser-backed test locally, run it in isolation (`npm run test:e2e -- `); the full serial suite is the heaviest load and the most flake-prone. + ## Documentation and proof expectations - Keep the root docs split clear: `README.md` for overview and `RELEASE.md` for supported scope. diff --git a/docs/USAGE.md b/docs/USAGE.md index ae2c3d5f..7c33467e 100644 --- a/docs/USAGE.md +++ b/docs/USAGE.md @@ -104,6 +104,14 @@ Useful flags: - `--exit`: wait for the process to exit. - `--timeout `: maximum wait time in milliseconds, with `0` meaning infinite. +### Screen Hash + +`snapshot` results (both `--format structured` and `--format text`) and a **matched** `wait` result carry an optional `screenHash`: a lowercase 64-character hex SHA-256 of the visible screen text. 
Compare it across two calls to tell whether the visible screen actually changed — equal hashes mean identical visible content, even if the event-log sequence advanced on a no-op repaint. + +- It hashes the visible screen only. It is **not** a hash of the `--format text` output, which also includes scrollback, so the hash ignores scrollback growth. +- It is distinct from the `screenshot` result's pixel `sha256`: `screenHash` is content identity, the screenshot `sha256` is pixel identity, and the two are not interchangeable. +- A `wait` that times out (or finds the host unreachable with no observed screen) omits `screenHash`, so a missing hash unambiguously means "no screen was observed" rather than an error. + ## `batch` Use `batch` to run an ordered sequence of input-and-`wait` steps against one session in a single invocation, instead of coordinating separate `run`/`type`/`paste`/`send-keys`/`wait` calls. Each `wait` step is anchored to a Wait Baseline — it only considers screen state produced _after_ the preceding input step — so a batch cannot race ahead and match a stale screen the way a hand-written shell loop can. @@ -174,7 +182,7 @@ The `--json` result is a per-step envelope: } ``` -Each step record carries its `index`, `kind`, `status` (`completed` | `failed` | `not-run` | `interrupted`), and `durationMs`. Input steps report the Event Log `seq` they produced; `wait` steps report the `waitBaseline` they were anchored to plus `matched` / `timedOut` / `matchedText` / `capturedAtSeq`. `completedCount` and `failedIndices` summarize the run. A fail-fast batch exits non-zero with the failed step's exit code (e.g. `11` for a `WAIT_TIMEOUT`); `--keep-going` exits `1` if any step failed. If the process is interrupted by SIGINT/SIGTERM, batch flushes the same envelope with the in-flight step marked `interrupted` and later steps `not-run`, then exits non-zero. 
+Each step record carries its `index`, `kind`, `status` (`completed` | `failed` | `not-run` | `interrupted`), and `durationMs`. Input steps report the Event Log `seq` they produced; `wait` steps report the `waitBaseline` they were anchored to plus `matched` / `timedOut` / `matchedText` / `capturedAtSeq`, and a matched `wait` step also carries the `screenHash` of the screen it observed (see [Screen Hash](#screen-hash)). `completedCount` and `failedIndices` summarize the run. A fail-fast batch exits non-zero with the failed step's exit code (e.g. `11` for a `WAIT_TIMEOUT`); `--keep-going` exits `1` if any step failed. If the process is interrupted by SIGINT/SIGTERM, batch flushes the same envelope with the in-flight step marked `interrupted` and later steps `not-run`, then exits non-zero. The Wait Baseline fixes stale-match only. It does **not** fix echo-match: a `wait` can still match the terminal's echo of a just-typed command (the echo renders _after_ the baseline). Use a distinctive output token or a `screenStableMs` wait rather than waiting for text you just typed. Interrupting a batch mid-`wait` leaves that wait's command still running on the session (the wait is abandoned, not cancelled), exactly like a caller timeout on `run`. diff --git a/docs/prd/screen-hash/PRD.md b/docs/prd/screen-hash/PRD.md new file mode 100644 index 00000000..ee154ddf --- /dev/null +++ b/docs/prd/screen-hash/PRD.md @@ -0,0 +1,63 @@ +# PRD: Screen Hash on snapshot and wait results + +## Problem Statement + +A caller — often an AI coding agent — driving a **Session** repeatedly needs a cheap, reliable way to answer "did the rendered screen actually change since I last looked?" Today the only per-result identifier is the captured event-log sequence, but that advances on every chunk of output, including output that changes nothing visible: cursor-position queries, terminal-mode toggles, a spinner repainting the same glyphs. 
So two observations with different sequences can be the identical screen, and a caller comparing sequences sees changes that are not there. There is no stable token for the screen's content itself. + +## Solution + +Snapshot results and matched **Render Wait** results gain an optional **Screen Hash**: a stable digest of the **Session**'s normalized visible screen text at the captured event-log sequence. Equal hashes mean the visible content is identical; a changed hash means it genuinely changed. The **Screen Hash** is computed from the same canonical visible text that the **Screen Stability** check and text **Render Wait** matching already use, so "the hash changed" and "the stability check saw a change" can never disagree. + +## User Stories + +1. As an AI coding agent, I want a stable hash of the screen content on each snapshot, so that I can tell across two CLI calls whether the visible screen actually changed without diffing full text myself. +2. As an agent, I want the hash to stay equal when only the cursor moved, so that cursor motion alone does not look like a content change. +3. As an agent, I want the hash to stay equal when output occurred that changed nothing visible, so that I am not misled by the captured sequence advancing on a no-op repaint. +4. As an agent, I want the hash to change whenever the visible text changes, so that I can trust it as a content-changed signal. +5. As a caller, I want the **Screen Hash** on the snapshot result in both structured and text formats, so that I get it regardless of how I read the screen. +6. As a caller, I want the **Screen Hash** on a matched render-wait result, so that I know the content identity at the moment my wait condition was satisfied. +7. 
As a caller, I want the hash present whenever a result holds an **observed** **Semantic Snapshot** — including the offline host-unreachable `matched: false` fallback that still observed a snapshot — and omitted only when no snapshot was observed (a live timeout, a consecutive-failure giveup, or a replay error), so that a missing hash unambiguously means "no screen was observed" rather than signalling an error. +8. As a tooling author, I want the **Screen Hash** to be renderer-independent — the same screen yields the same hash under either renderer backend — so that I can compare hashes across sessions rendered by different backends. +9. As a maintainer, I want the **Screen Hash**, the **Screen Stability** compare, and text **Render Wait** matching to share one canonical visible-text definition, so that they can never disagree about what "the screen" is. +10. As a maintainer, I want adding the **Screen Hash** and routing the three consumers through one shared canonical-text definition to make no change in itself to the shipped screen-stability behavior, so that the only behavior change is the deliberate, characterization-pinned Phase 1 renderer convergence — not an accidental side effect of the hash. +11. As a caller, I want to understand that the **Screen Hash** is distinct from a screenshot's pixel digest, so that I use the right identity for content versus pixels. +12. As a caller, I want to understand that the **Screen Hash** covers the visible screen only, even though the text snapshot format also includes scrollback, so that I am not surprised that the hash ignores scrollback growth. +13. As a tool building recordings, I want a per-frame content hash, so that I can dedup consecutive identical frames in artifacts. +14. As a caller using `--json`, I want the hash as a lowercase 64-character hex string validated by the same digest schema as other hashes, so that the field shape is predictable. +15. 
As a caller, I want the **Screen Hash** to be optional on results, so that older artifacts and hosts that predate it still parse. + +## Implementation Decisions + +- Add an optional **Screen Hash** field — a lowercase 64-character SHA-256 hex digest — to the snapshot result (both structured and text formats) and to the matched render-wait result. +- In scope: a **Batch Step** record for a matched **Render Wait** step also carries the **Screen Hash**, mirrored from that step's render-wait result, so a batch run exposes the same content identity per wait step that a standalone wait does. +- The **Screen Hash** is the SHA-256 of the canonical visible-text string: the visible lines joined by newline, exactly as the host's screen-stability compare and the text matcher already build it. The shared canonical-text **definition** — `visibleLines[].text` joined by `\n`, sourced only from the snapshot (never `backend.getVisibleText()` or `cells[]`) — is unchanged by adding the hash. Cursor position, text styles, and scrollback are excluded. +- Converging the two renderer backends on one canonical screen form (Phase 1) intentionally changes the **default** `ghostty-web` backend's stability and text-wait **comparand** on screens with grapheme clusters, interior blank-cell gaps, or non-ASCII trailing characters: the canonical form is exactly `rows` lines, each decoded with full grapheme clusters with blank/zero cells as `' '`, then right-trimmed of trailing ASCII spaces (`0x20`) only. This is a deliberate, narrow change pinned by characterization tests, not a free behavior-preserving add; on plain ASCII screens the comparand is unchanged. +- Extract one shared canonical-screen-text helper and route the **Screen Hash**, the host **Screen Stability** compare, and the text **Render Wait** matcher through it, so the three share a single definition and cannot diverge. 
+- The hash is keyed on whether a result holds an **observed** **Semantic Snapshot**, not on whether the wait matched. A result carries the **Screen Hash** of the snapshot it observed: a matched live wait, a snapshot capture, and the offline host-unreachable fallback that still observed a latest snapshot (even when it returns `matched: false` because the **Screen Stability** duration could not be proven offline). The hash is omitted only when no snapshot was observed: a live wait that times out, a consecutive-failure giveup, or a replay error throw. +- Do not surface the **Screen Hash** on inspection or any path that does not already render a **Semantic Snapshot**; computing it must never force a renderer bootstrap that would not otherwise happen. +- Reuse the existing SHA-256 hex validator. The consolidation set is exactly: export `Sha256HexSchema` from `protocol/schemas.ts` and import it in `renderer/types.ts`. Deliberately left out of scope: the standalone regex copies in `storage/artifactManifest.ts` and the `invariant(/^[a-f0-9]{64}$/u.test(...))` checks (for example in `renderer/profiles.ts` and `renderer/bundledFont.ts`), which are not Zod schemas and are not part of this consolidation. +- The field is optional so existing persisted artifacts and older hosts continue to parse. + +## Testing Decisions + +Good tests assert external behavior, not implementation details. + +- **Canonical-text and hash helper (unit).** Same screen yields the same hash; cursor-only movement yields the same hash; a single visible-glyph change yields a different hash; a trailing-whitespace-only difference (before right-trim of ASCII spaces) yields a different hash — proving the canonical form is exactly what is hashed and the behavior is as specified. +- **UTF-8 encoding pinned (unit).** The hash is the SHA-256 of the UTF-8 bytes of the canonical visible text, asserted against a concrete golden digest so the encoding can never silently drift. 
Golden: a three-row screen whose canonical text is `"a\nb\nc"` hashes to `ea7fb08b7a2dc4619ffb7c7bb38d95a2047935fa165d71b12efd3852a2e6d0cc`. +- **Shared definition (unit).** The host **Screen Stability** compare and the **Render Wait** matcher consume the same canonical string the hash uses, so a later change to one cannot silently diverge from the others, and screen-stability behavior is demonstrably unchanged. +- **Cross-backend hash equality.** The same event log produces the same **Screen Hash** under both renderer backends, pinning the renderer-independence guarantee that is currently only an assumption. This test requires the optional native addon (`@coder/libghostty-vt-node`) and so must run on at least one CI job that has the addon installed; it skips gracefully where the addon is absent (including the sandbox), so the renderer-independence guarantee is not silently unverified. +- **Snapshot and wait envelope (integration).** Against an isolated home: the **Screen Hash** is present on a snapshot (structured and text), on a matched live wait, and on the offline host-unreachable `matched: false` fallback that still observed a snapshot; and absent on a timed-out live wait. The existing CLI integration tests are prior art. + +## Out of Scope + +- Per-frame **Screen Hash**es on recordings / `record export` (user story 13). v1 attaches the hash only where a result already holds an observed **Semantic Snapshot**; the export paths render no **Semantic Snapshot** per frame, so a recording-frame dedup hash is future scope rather than a v1 deliverable. +- A scrollback hash. The **Screen Hash** is visible-screen-only; a separate scrollback digest can be added later if a concrete need appears. +- A styled or per-cell hash. Transient style churn would make such a hash flap; the **Screen Hash** is text-content identity only. +- Pixel-level identity, and any **Screen Hash** on the **Screenshot Result**. 
A **Screenshot Result** carries only its pixel `sha256`; the content hash lives on the snapshot and wait results. The **Screen Hash** is the semantic counterpart to the pixel digest and the two are not interchangeable. +- New wait semantics built on the hash (for example, "wait until the screen content changes"). v1 only exposes the field; any hash-driven wait is future scope. +- Any change to the screen-stability behavior **beyond** the Phase 1 renderer-convergence change described in the Implementation Decisions. The canonical-text definition and the shared single-source unification are behavior-preserving; the only intended behavior change is the default `ghostty-web` backend's comparand on grapheme / interior-gap / non-ASCII-trailing screens, pinned by characterization tests. No new wait semantics are added. + +## Further Notes + +- The motivation differs from the comparable tool virtui, which hashes to avoid shipping screen bytes over a socket. agent-tty is a local CLI, so the value here is the stable content change-token (and, in future scope, frame dedup), not transfer avoidance. +- The **Screen Hash** term is defined in the project glossary; this PRD and that term are on branch `feat/screen-hash`. No ADR was needed: the field is an optional addition over the canonical string that already exists, and the one intended behavior change (the Phase 1 renderer convergence) is narrow, characterization-pinned, and easily reversible. 
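Reviewer sketch (not part of the patch): the PRD's hashing rule, SHA-256 over the UTF-8 bytes of the visible lines joined by `\n`, can be exercised in isolation to see the two properties the user stories ask for. `SketchSnapshot` and `sketchScreenHash` are hypothetical stand-ins; the patch's own helper is `computeScreenHash` in `src/renderer/canonicalScreen.ts`, and this sketch assumes Node's built-in `node:crypto` rather than the repo's `sha256Hex` utility.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical minimal snapshot shape: visible lines plus a cursor, so the
// example can show that the cursor does not participate in the hash.
interface SketchSnapshot {
  visibleLines: ReadonlyArray<{ text: string }>;
  cursorRow: number;
  cursorCol: number;
}

// Join the visible line texts with '\n' and hash the UTF-8 bytes.
// Cursor position, styles, and scrollback are never read.
function sketchScreenHash(snapshot: SketchSnapshot): string {
  const canonicalText = snapshot.visibleLines
    .map((line) => line.text)
    .join('\n');
  return createHash('sha256').update(canonicalText, 'utf8').digest('hex');
}

const base: SketchSnapshot = {
  visibleLines: [{ text: '$ make' }, { text: 'ok' }],
  cursorRow: 1,
  cursorCol: 2,
};
const cursorMoved: SketchSnapshot = { ...base, cursorRow: 0, cursorCol: 0 };
const glyphChanged: SketchSnapshot = {
  ...base,
  visibleLines: [{ text: '$ make' }, { text: 'ko' }],
};

// Cursor-only movement leaves the hash unchanged (user story 2).
console.log(sketchScreenHash(base) === sketchScreenHash(cursorMoved)); // true
// Any visible-text change produces a different hash (user story 4).
console.log(sketchScreenHash(base) === sketchScreenHash(glyphChanged)); // false
```

The sketch applies no trimming of its own; in the patch, right-trimming of trailing ASCII spaces happens when the renderer backend decodes each line, before the snapshot reaches the hash helper.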
diff --git a/package.json b/package.json index 7e4a8d90..962bb3d3 100644 --- a/package.json +++ b/package.json @@ -55,9 +55,9 @@ "release:finalize": "node ./scripts/release-finalize.mjs", "review-bundle": "tsx src/tools/review-bundle.ts", "smoke:install": "node ./scripts/smoke-install.mjs", - "test": "vitest run", - "test:e2e": "vitest run --maxWorkers=1 test/e2e", - "test:integration": "vitest run --maxWorkers=1 test/integration", + "test": "vitest run --retry=2", + "test:e2e": "vitest run --maxWorkers=1 --retry=2 test/e2e", + "test:integration": "vitest run --maxWorkers=1 --retry=2 test/integration", "test:unit": "vitest run test/unit", "test:watch": "vitest", "typecheck": "tsc -p tsconfig.json --noEmit", diff --git a/skill-data/agent-tty/SKILL.md b/skill-data/agent-tty/SKILL.md index d090d61b..bb803dac 100644 --- a/skill-data/agent-tty/SKILL.md +++ b/skill-data/agent-tty/SKILL.md @@ -70,6 +70,8 @@ agent-tty --home "$AGENT_HOME" run "$SESSION_ID" 'pwd && ls -la' --json agent-tty --home "$AGENT_HOME" snapshot "$SESSION_ID" --format text --json ``` +`snapshot` and a matched `wait` carry an optional `screenHash` (a hash of the visible screen text). Compare it across calls to tell whether the visible screen actually changed instead of diffing full text; equal hashes mean identical visible content even when the event sequence advanced. + ### Drive an interactive CLI or TUI Use `batch` to run an ordered sequence of input-and-`wait` steps in one call instead of separate `run`/`wait`/`send-keys` invocations. Each `wait` step is anchored to a Wait Baseline — it only observes screen state produced _after_ the preceding input step, so the sequence cannot race ahead and match a stale screen. A batch stops at the first failed step by default (`--keep-going` attempts every step). 
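Reviewer sketch (not part of the patch): on the caller side, the SKILL.md guidance to "compare it across calls" amounts to a three-way decision, because `screenHash` is optional. The `ObservedResult` shape and `compareScreens` function below are hypothetical; in particular, how a caller extracts these fields from the `--json` output is an assumption and is not shown.

```typescript
// Hypothetical subset of a snapshot or wait result, as a caller might hold
// it after parsing --json output. Only the two fields used here are modelled.
interface ObservedResult {
  capturedAtSeq: number;
  screenHash?: string;
}

type ScreenChange = 'changed' | 'unchanged' | 'unknown';

// Compare content identity, not sequence numbers: capturedAtSeq may advance
// on a no-op repaint while the visible screen stays identical.
function compareScreens(
  previous: ObservedResult,
  current: ObservedResult,
): ScreenChange {
  if (previous.screenHash === undefined || current.screenHash === undefined) {
    // A missing hash means no screen was observed (or an older host).
    return 'unknown';
  }
  return previous.screenHash === current.screenHash ? 'unchanged' : 'changed';
}

const hashA = 'a'.repeat(64); // placeholder digests, not real screen hashes
const hashB = 'b'.repeat(64);

console.log(
  compareScreens(
    { capturedAtSeq: 10, screenHash: hashA },
    { capturedAtSeq: 17, screenHash: hashA },
  ),
); // unchanged
console.log(
  compareScreens(
    { capturedAtSeq: 10, screenHash: hashA },
    { capturedAtSeq: 17, screenHash: hashB },
  ),
); // changed
console.log(compareScreens({ capturedAtSeq: 10, screenHash: hashA }, { capturedAtSeq: 17 })); // unknown
```

Returning `'unknown'` rather than `'changed'` for a missing hash keeps a timed-out wait from being mistaken for a content change.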
diff --git a/src/batch/executor.ts b/src/batch/executor.ts index d6c9155d..1a009f40 100644 --- a/src/batch/executor.ts +++ b/src/batch/executor.ts @@ -256,6 +256,9 @@ async function runWaitStep( timedOut: result.timedOut, ...matchedText, capturedAtSeq: result.capturedAtSeq, + ...(result.screenHash === undefined + ? {} + : { screenHash: result.screenHash }), }; // A timed-out wait (equivalently an unmatched result) is not a thrown error diff --git a/src/batch/result.ts b/src/batch/result.ts index d8522877..30836e81 100644 --- a/src/batch/result.ts +++ b/src/batch/result.ts @@ -2,6 +2,7 @@ import { z } from 'zod'; import type { BatchPlan, BatchStep } from './plan.js'; +import { Sha256HexSchema } from '../protocol/schemas.js'; import { unreachable } from '../util/assert.js'; // `interrupted` is the in-flight step abandoned by a SIGINT/SIGTERM flush (its @@ -66,6 +67,7 @@ export const WaitStepRecordSchema = z timedOut: z.boolean().optional(), matchedText: z.string().optional(), capturedAtSeq: NonNegativeIntSchema.optional(), + screenHash: Sha256HexSchema.optional(), error: BatchStepErrorSchema.optional(), }) .strict(); diff --git a/src/cli/commands/wait.ts b/src/cli/commands/wait.ts index 0df5b761..2729f55c 100644 --- a/src/cli/commands/wait.ts +++ b/src/cli/commands/wait.ts @@ -19,6 +19,7 @@ import { matchRenderWaitSnapshot, prepareRenderWaitCondition, } from '../../renderWait/matcher.js'; +import { computeScreenHash } from '../../renderer/canonicalScreen.js'; import { isTerminalSessionStatus } from '../../protocol/sessionStatusPolicy.js'; import { withOfflineReplayRenderer } from '../../replay/offlineReplay.js'; import { readManifestIfExists } from '../../storage/manifests.js'; @@ -117,6 +118,7 @@ function buildOfflineRenderWaitResult( cursorRow: match.cursorRow, cursorCol: match.cursorCol, capturedAtSeq: match.capturedAtSeq, + screenHash: computeScreenHash(snapshot), }; } diff --git a/src/host/hostMain.ts b/src/host/hostMain.ts index f027c992..2c322ab2 100644 --- 
a/src/host/hostMain.ts +++ b/src/host/hostMain.ts @@ -33,6 +33,10 @@ import type { WaitForRenderResult, WaitParams, } from '../protocol/messages.js'; +import { + canonicalVisibleText, + computeScreenHash, +} from '../renderer/canonicalScreen.js'; import { DEFAULT_RENDERER_NAME, resolveRendererName, @@ -901,9 +905,7 @@ export async function runHost(sessionId: string): Promise { throwIfAborted(signal); const snapshot = await backend.snapshot(); throwIfAborted(signal); - const visibleText = snapshot.visibleLines - .map((line) => line.text) - .join('\n'); + const visibleText = canonicalVisibleText(snapshot); const capturedAtSeq = snapshot.capturedAtSeq; latestCapturedAtSeq = capturedAtSeq; consecutiveFailures = 0; @@ -943,6 +945,7 @@ export async function runHost(sessionId: string): Promise { cursorRow: match.cursorRow, cursorCol: match.cursorCol, capturedAtSeq, + screenHash: computeScreenHash(snapshot), }); } } catch (pollError) { diff --git a/src/protocol/schemas.ts b/src/protocol/schemas.ts index 97e4d781..70dca889 100644 --- a/src/protocol/schemas.ts +++ b/src/protocol/schemas.ts @@ -25,7 +25,7 @@ export const ReplayTimingModeSchema = z.enum([ ]); export type ReplayTimingMode = z.infer; const SessionEnvSchema = z.record(NonEmptyStringSchema, z.string()); -const Sha256HexSchema = z +export const Sha256HexSchema = z .string() .regex( /^[a-f0-9]{64}$/u, @@ -342,6 +342,7 @@ export const StructuredSnapshotResultSchema = z visibleLines: z.array(VisibleLineSchema), scrollbackLines: z.array(VisibleLineSchema).optional(), cells: z.array(RichSnapshotLineSchema).optional(), + screenHash: Sha256HexSchema.optional(), }) .strict(); export type StructuredSnapshotResult = z.infer< @@ -358,6 +359,7 @@ export const TextSnapshotResultSchema = z cursorRow: NonNegativeIntSchema, cursorCol: NonNegativeIntSchema, text: z.string(), + screenHash: Sha256HexSchema.optional(), }) .strict(); export type TextSnapshotResult = z.infer; @@ -455,6 +457,7 @@ export const WaitForRenderResultSchema = 
z cursorRow: NonNegativeIntSchema.optional(), cursorCol: NonNegativeIntSchema.optional(), capturedAtSeq: NonNegativeIntSchema, + screenHash: Sha256HexSchema.optional(), }) .strict(); export const RecordExportResultSchema = z diff --git a/src/renderWait/matcher.ts b/src/renderWait/matcher.ts index 371480d8..5578d609 100644 --- a/src/renderWait/matcher.ts +++ b/src/renderWait/matcher.ts @@ -2,6 +2,10 @@ import type { WaitForRenderParams } from '../protocol/messages.js'; import type { SemanticSnapshot } from '../renderer/types.js'; import { ERROR_CODES, makeCliError } from '../protocol/errors.js'; +import { + canonicalVisibleLines, + canonicalVisibleText, +} from '../renderer/canonicalScreen.js'; import { invariant } from '../util/assert.js'; import { MAX_WAIT_FOR_RENDER_REGEX_LENGTH, @@ -301,8 +305,8 @@ export function matchRenderWaitSnapshot( ); } - const visibleLines = snapshot.visibleLines.map((line) => line.text); - const visibleText = visibleLines.join('\n'); + const visibleLines = canonicalVisibleLines(snapshot); + const visibleText = canonicalVisibleText(snapshot); let textMatched = false; let matchedText: string | undefined; diff --git a/src/renderer/canonicalScreen.ts b/src/renderer/canonicalScreen.ts new file mode 100644 index 00000000..f214c60b --- /dev/null +++ b/src/renderer/canonicalScreen.ts @@ -0,0 +1,41 @@ +import { sha256Hex } from '../util/hash.js'; + +/** + * The minimal snapshot shape needed to derive the canonical visible text: the + * ordered visible lines, each carrying its already-decoded `text`. + * + * Compatible with `Pick`. + */ +interface CanonicalScreenSource { + readonly visibleLines: ReadonlyArray<{ readonly text: string }>; +} + +/** + * The ordered canonical visible lines of a snapshot. + * + * The body is VERBATIM the inline expression already at hostMain.ts:904-906 and + * matcher.ts:304-305 — no trim/pad/normalize is applied, so it is + * behavior-preserving. 
The source is `visibleLines[].text` ONLY; it must NEVER + * read `backend.getVisibleText()` (divergent native impl) or `cells[]` (the + * dashboard's alternate source). + */ +export function canonicalVisibleLines(s: CanonicalScreenSource): string[] { + return s.visibleLines.map((line) => line.text); +} + +/** + * The canonical visible text of a snapshot: its canonical visible lines joined + * with `\n`. See {@link canonicalVisibleLines} for the no-normalization + * guarantee and source constraint. + */ +export function canonicalVisibleText(s: CanonicalScreenSource): string { + return canonicalVisibleLines(s).join('\n'); +} + +/** + * The screen hash of a snapshot: the lowercase 64-character SHA-256 hex of the + * UTF-8 bytes of its canonical visible text. + */ +export function computeScreenHash(s: CanonicalScreenSource): string { + return sha256Hex(canonicalVisibleText(s)); +} diff --git a/src/renderer/ghosttyWeb/backend.ts b/src/renderer/ghosttyWeb/backend.ts index 21636131..5acfad6c 100644 --- a/src/renderer/ghosttyWeb/backend.ts +++ b/src/renderer/ghosttyWeb/backend.ts @@ -471,19 +471,85 @@ const EMBEDDED_HARNESS_HTML = ` return { cols, rows, activeBuffer, viewportY: bottomViewportY }; } + // Strip ONLY trailing ASCII spaces (0x20). Unlike String.prototype.trimEnd + // this preserves other trailing whitespace (tabs, NBSP, etc.), keeping the + // canonical visible text aligned with the libghostty-vt backend. + function stripTrailingAsciiSpaces(text) { + let end = text.length; + while (end > 0 && text.charCodeAt(end - 1) === 0x20) { + end -= 1; + } + return end === text.length ? text : text.slice(0, end); + } + + // Build a single canonical line by concatenating each column's FULL + // grapheme cluster, then right-trimming trailing ASCII spaces only. + // readColumn(col) returns { grapheme, width }: grapheme is the cell's + // full grapheme cluster; width is the cell's column span. 
A wide glyph's + // trailing spacer has width 0 and contributes NOTHING, so a row of + // 'A'+wide('漢')+wide('字')+'B' decodes to 'A漢字B' — matching the + // libghostty-vt backend's visibleLines[].text (its cells[] likewise + // carries the spacer as '' rather than ' '). A genuine blank interior + // cell decodes to a single ' ' so interior gaps survive and trailing + // gaps trim away. The live engine returns the NUL codepoint (U+0000) for + // a blank cell — getGrapheme yields [0], so getGraphemeString runs + // String.fromCodePoint(0) and produces a NUL, not ' ' (its empty-array + // ' ' fallback never fires). Those NULs would survive + // stripTrailingAsciiSpaces (it strips only 0x20) and diverge from the + // native backend, so a kept cell whose grapheme is a lone NUL is + // normalized to ' ' here. + function decodeGraphemeLine(readColumn, cols) { + let text = ''; + for (let col = 0; col < cols; col += 1) { + const column = readColumn(col); + invariant( + column !== null && typeof column === 'object', + 'decoded column must be an object', + ); + assertStringValue( + column.grapheme, + 'decoded grapheme must be a string', + ); + invariant( + Number.isInteger(column.width) && column.width >= 0, + 'decoded cell width must be a non-negative integer', + ); + if (column.width === 0) { + continue; + } + text += column.grapheme === '\\u0000' ? ' ' : column.grapheme; + } + return stripTrailingAsciiSpaces(text); + } + + // Width for a single column, used to drop wide-glyph trailing spacers. + // getCell() always returns a cell (out-of-range columns synthesize a + // blank width-1 cell), so a missing cell is treated as a blank column. + function readCellWidth(line, col) { + const cell = line?.getCell(col); + if (cell === undefined || cell === null) { + return 1; + } + const width = cell.getWidth(); + return Number.isInteger(width) && width >= 0 ? 
width : 1; + } + function decodeVisibleLines(terminal) { terminal.scrollToBottom(); const { cols, rows, activeBuffer, viewportY } = getNormalizedViewportState(terminal); + const wasmTerm = terminal.wasmTerm; + invariant(wasmTerm, 'terminal WASM instance is unavailable'); const visibleLines = []; for (let row = 0; row < rows; row += 1) { const line = activeBuffer.getLine(viewportY + row); - const text = - line === undefined ? '' : line.translateToString(true, 0, cols); - invariant( - typeof text === 'string', - \`decoded line \${row} must be a string\`, + const text = decodeGraphemeLine( + (col) => ({ + grapheme: wasmTerm.getGraphemeString(row, col), + width: readCellWidth(line, col), + }), + cols, ); visibleLines.push({ row, text }); } @@ -503,14 +569,18 @@ const EMBEDDED_HARNESS_HTML = ` return []; } + const wasmTerm = terminal.wasmTerm; + invariant(wasmTerm, 'terminal WASM instance is unavailable'); + const scrollbackLines = []; for (let row = 0; row < viewportY; row += 1) { const line = activeBuffer.getLine(row); - const text = - line === undefined ? 
'' : line.translateToString(true, 0, cols); - invariant( - typeof text === 'string', - \`decoded scrollback line \${row} must be a string\`, + const text = decodeGraphemeLine( + (col) => ({ + grapheme: wasmTerm.getScrollbackGraphemeString(row, col), + width: readCellWidth(line, col), + }), + cols, ); scrollbackLines.push({ row, text }); } @@ -532,10 +602,26 @@ const EMBEDDED_HARNESS_HTML = ` return \`#\${colorValue.toString(16).padStart(6, '0')}\`; } - function decodeSnapshotCell(cell) { + function decodeSnapshotCell(cell, graphemeChar) { invariant(cell !== undefined, 'snapshot cell must be defined'); - const char = cell.getChars(); - assertStringValue(char, 'snapshot cell char must be a string'); + const baseChars = cell.getChars(); + assertStringValue(baseChars, 'snapshot cell char must be a string'); + assertStringValue( + graphemeChar, + 'snapshot cell grapheme must be a string', + ); + // Deliberate, converged decision (matches the libghostty-vt backend): + // in cells[] a codepoint-0 cell — both a genuine blank AND a wide + // glyph's trailing spacer — is '', whereas in visibleLines[].text a + // genuine blank renders as ' ' and only the width-0 spacer is dropped + // (see decodeGraphemeLine). The cells[] grid is column-addressed, so a + // blank and a spacer are both empty placeholders there; the text line + // is a reading-order string, so a blank is a real space but a spacer + // is layout, not content. Non-empty cells use the FULL grapheme + // cluster so continuation codepoints (emoji ZWJ, NFD combining marks) + // are not dropped. The Screen Hash sources visibleLines[].text, never + // cells[], so this asymmetry never reaches the hash. + const char = baseChars === '' ? 
'' : graphemeChar; const isInverse = cell.isInverse() === 1; const fgColor = cell.getFgColor(); @@ -556,6 +642,8 @@ const EMBEDDED_HARNESS_HTML = ` terminal.scrollToBottom(); const { cols, rows, activeBuffer, viewportY } = getNormalizedViewportState(terminal); + const wasmTerm = terminal.wasmTerm; + invariant(wasmTerm, 'terminal WASM instance is unavailable'); const nullCell = activeBuffer.getNullCell(); const cells = []; @@ -574,7 +662,12 @@ const EMBEDDED_HARNESS_HTML = ` const line = activeBuffer.getLine(viewportY + row); const rowCells = []; for (let col = 0; col < cols; col += 1) { - rowCells.push(decodeSnapshotCell(line?.getCell(col) ?? nullCell)); + rowCells.push( + decodeSnapshotCell( + line?.getCell(col) ?? nullCell, + wasmTerm.getGraphemeString(row, col), + ), + ); } invariant( @@ -831,6 +924,77 @@ let servedAssetsPromise: Promise< ReadonlyMap > | null = null; +/** + * One decoded terminal column: the cell's full grapheme cluster plus its + * column span. `width === 0` marks a wide glyph's trailing spacer column. + */ +export interface GhosttyDecodedColumn { + grapheme: string; + width: number; +} + +/** + * Strip ONLY trailing ASCII spaces (0x20). Unlike String.prototype.trimEnd + * this preserves other trailing whitespace (tabs, NBSP, etc.), keeping the + * canonical visible text aligned with the libghostty-vt backend. + * + * Exported as the host-testable twin of the identical function embedded in + * EMBEDDED_HARNESS_HTML; the harness copy is the browser runtime and cannot + * import this module, so the two must stay byte-for-byte in sync. + */ +export function stripTrailingAsciiSpaces(text: string): string { + let end = text.length; + while (end > 0 && text.charCodeAt(end - 1) === 0x20) { + end -= 1; + } + return end === text.length ? text : text.slice(0, end); +} + +/** + * Assemble one canonical visible line from a per-column reader, then + * right-trim trailing ASCII spaces. 
A width-0 column (a wide glyph's trailing + * spacer) contributes nothing, so a row of `A`+wide(`漢`)+wide(`字`)+`B` + * yields `A漢字B` — matching the libghostty-vt backend's visibleLines[].text. + * A genuine blank interior cell decodes to a single ' ', so interior gaps + * survive and trailing gaps trim away. Non-empty cells contribute their FULL + * grapheme cluster, so continuation codepoints (emoji ZWJ, NFD combining + * marks) are preserved instead of being truncated to the base codepoint. + * + * The live ghostty-web engine returns the NUL codepoint (U+0000) for a blank + * cell: getGrapheme yields `[0]`, so getGraphemeString runs + * `String.fromCodePoint(0)` and produces a NUL, not ' ' (its empty-array + * ' ' fallback never fires). Left as-is those NULs would survive + * stripTrailingAsciiSpaces (which strips only 0x20) and diverge from the + * native backend's right-trimmed ' '-blank form, so a kept cell whose grapheme + * is a lone NUL is normalized to ' ' here. + * + * Exported as the host-testable twin of the decodeGraphemeLine function + * embedded in EMBEDDED_HARNESS_HTML; keep the two in sync. + */ +export function assembleCanonicalLine( + cols: number, + readColumn: (col: number) => GhosttyDecodedColumn, +): string { + assertNonNegativeInteger( + cols, + 'canonical line cols must be a non-negative integer', + ); + let text = ''; + for (let col = 0; col < cols; col += 1) { + const column = readColumn(col); + assertString(column.grapheme, 'decoded grapheme must be a string'); + assertNonNegativeInteger( + column.width, + 'decoded cell width must be a non-negative integer', + ); + if (column.width === 0) { + continue; + } + text += column.grapheme === '\u0000' ? 
' ' : column.grapheme; + } + return stripTrailingAsciiSpaces(text); +} + function assertNonNegativeInteger( value: unknown, message: string, diff --git a/src/renderer/libghosttyVt/backend.ts b/src/renderer/libghosttyVt/backend.ts index 67bb0ec7..5f68a9ff 100644 --- a/src/renderer/libghosttyVt/backend.ts +++ b/src/renderer/libghosttyVt/backend.ts @@ -113,6 +113,32 @@ function validateNativeVisibleLines( }); } +/** + * Pad the native visible lines to exactly `rows` entries by appending blank + * trailing lines (`text: ''`). The native ReadLine path already right-trims + * trailing ASCII spaces, expands full grapheme clusters, and renders blank + * cells as ' ' (terminal.cc), but it omits trailing blank rows, so only the + * line count needs aligning with the canonical pad-to-rows form. Each visible + * line's `row` is its index (native emits contiguous 0-based rows), so the + * appended lines continue that sequence. This converges the libghostty-vt + * backend's `visibleLines[].text` with the ghostty-web backend so the two + * agree on the Screen Hash. See docs/prd/screen-hash/PRD.md. + */ +function padVisibleLinesToRows( + lines: readonly NativeVisibleLine[], + rows: number, +): NativeVisibleLine[] { + invariant( + lines.length <= rows, + 'native visible line count must not exceed terminal rows', + ); + const padded: NativeVisibleLine[] = [...lines]; + for (let row = padded.length; row < rows; row += 1) { + padded.push({ row, text: '' }); + } + return padded; +} + function assertNativeSnapshot(snapshot: unknown): TerminalSnapshot { invariant( snapshot !== null && typeof snapshot === 'object', @@ -156,6 +182,10 @@ function assertNativeSnapshot(snapshot: unknown): TerminalSnapshot { candidate.visibleLines, 'snapshot.visibleLines', ); + // The native ReadLine path omits trailing blank rows, so it may emit fewer + // than `rows` visible lines; snapshot() pads the gap to exactly `rows` to + // match the canonical pad-to-rows form (see padVisibleLinesToRows). 
Only the + // permissive upper bound is enforced here. invariant( visibleLines.length <= candidate.rows, 'snapshot visible line count must fit terminal rows', @@ -543,7 +573,10 @@ export class LibghosttyVtBackend implements RendererBackend { cursorRow: nativeSnapshot.cursorRow, cursorCol: nativeSnapshot.cursorCol, isAltScreen: nativeSnapshot.isAltScreen, - visibleLines: nativeSnapshot.visibleLines, + visibleLines: padVisibleLinesToRows( + nativeSnapshot.visibleLines, + nativeSnapshot.rows, + ), ...(nativeSnapshot.scrollbackLines === undefined ? {} : { scrollbackLines: nativeSnapshot.scrollbackLines }), diff --git a/src/renderer/types.ts b/src/renderer/types.ts index 421a2ef8..41f63f70 100644 --- a/src/renderer/types.ts +++ b/src/renderer/types.ts @@ -4,6 +4,7 @@ import { MarkerEventPayloadSchema, RichSnapshotLineSchema, RunCompleteEventPayloadSchema, + Sha256HexSchema, VisibleLineSchema, type VisibleLine, } from '../protocol/schemas.js'; @@ -17,12 +18,6 @@ const ThemeSchema = z.enum(['dark', 'light']); const HexColorSchema = z .string() .regex(/^#[0-9a-fA-F]{6}$/u, 'must be a hex color like #1e1e2e'); -const Sha256HexSchema = z - .string() - .regex( - /^[a-f0-9]{64}$/u, - 'must be a 64-character lowercase SHA-256 hex string', - ); const BundledFontStyleSchema = z.enum(['normal', 'italic', 'oblique']); const RoutePathSchema = z .string() diff --git a/src/snapshot/capture.ts b/src/snapshot/capture.ts index ed766c8a..bac0f0cc 100644 --- a/src/snapshot/capture.ts +++ b/src/snapshot/capture.ts @@ -4,6 +4,7 @@ import type { SemanticSnapshot } from '../renderer/types.js'; import { ERROR_CODES, makeCliError } from '../protocol/errors.js'; import { SnapshotResultSchema } from '../protocol/schemas.js'; import { parseValidatedResult } from '../protocol/validation.js'; +import { computeScreenHash } from '../renderer/canonicalScreen.js'; import { appendArtifactWithRollback, createArtifactEntry, @@ -34,9 +35,11 @@ export function createSnapshotResult( 
...snapshot.visibleLines.map((line) => line.text), ]; + const screenHash = computeScreenHash(snapshot); + const snapshotResult: SnapshotResult = format === 'structured' - ? { format: 'structured' as const, ...snapshot } + ? { format: 'structured' as const, ...snapshot, screenHash } : { format: 'text' as const, sessionId: snapshot.sessionId, @@ -46,6 +49,7 @@ export function createSnapshotResult( cursorRow: snapshot.cursorRow, cursorCol: snapshot.cursorCol, text: textLines.join('\n'), + screenHash, }; return parseSnapshotResult( diff --git a/src/util/hash.ts b/src/util/hash.ts new file mode 100644 index 00000000..1b320da2 --- /dev/null +++ b/src/util/hash.ts @@ -0,0 +1,9 @@ +import { createHash } from 'node:crypto'; + +/** + * Returns the lowercase 64-character SHA-256 hex digest of the UTF-8 bytes of + * `text`. + */ +export function sha256Hex(text: string): string { + return createHash('sha256').update(text, 'utf8').digest('hex'); +} diff --git a/test/integration/cross-backend-screen-hash.test.ts b/test/integration/cross-backend-screen-hash.test.ts new file mode 100644 index 00000000..81df453c --- /dev/null +++ b/test/integration/cross-backend-screen-hash.test.ts @@ -0,0 +1,184 @@ +import { afterEach, describe, expect, it } from 'vitest'; + +import { computeScreenHash } from '../../src/renderer/canonicalScreen.js'; +import { resolveProfile } from '../../src/renderer/profiles.js'; +import type { RendererBackend } from '../../src/renderer/backend.js'; +import type { ReplayInput } from '../../src/renderer/types.js'; +import { GhosttyWebBackend } from '../../src/renderer/ghosttyWeb/index.js'; +import { LibghosttyVtBackend } from '../../src/renderer/libghosttyVt/index.js'; + +// Gate the whole suite on the optional native engine: when +// @coder/libghostty-vt-node is unavailable there is no second renderer to +// compare against, so every case skips cleanly (mirrors the nativeAvailable +// pattern in test/e2e/libghostty-vt-renderer.test.ts). 
Do NOT fall back to a +// length>0 guard — a converged blank/short screen is a valid case here. +let nativeAvailable = false; +let nativeSkipReason = ''; +try { + await import('@coder/libghostty-vt-node'); + nativeAvailable = true; +} catch (error) { + nativeSkipReason = error instanceof Error ? error.message : String(error); +} +const maybeIt = nativeAvailable ? it : it.skip; + +const PROFILE = resolveProfile('reference-dark'); +const SESSION_ID = 'cross-backend-screen-hash'; +const SHA_256_HEX = /^[a-f0-9]{64}$/u; + +function timestampFor(seq: number): string { + return new Date(Date.UTC(2026, 5, 5, 12, 0, seq)).toISOString(); +} + +function singleOutputReplayInput( + data: string, + options: { cols?: number; rows?: number } = {}, +): ReplayInput { + return { + sessionId: SESSION_ID, + initialCols: options.cols ?? 80, + initialRows: options.rows ?? 24, + targetSeq: 0, + events: [ + { + seq: 0, + ts: timestampFor(0), + type: 'output', + payload: { data }, + }, + ], + }; +} + +interface CrossBackendResult { + webHash: string; + nativeHash: string; + webLines: string[]; + nativeLines: string[]; +} + +describe('cross-backend screen hash', { timeout: 120_000 }, () => { + const backends: RendererBackend[] = []; + + afterEach(async () => { + while (backends.length > 0) { + const backend = backends.pop(); + if (backend !== undefined) { + await backend.dispose(); + } + } + }); + + // Boot BOTH renderer backends over the SAME ReplayInput, then route each + // snapshot's visibleLines through computeScreenHash. Returning the raw lines + // too makes any divergence legible in the assertion diff. 
+ async function hashBothBackends( + input: ReplayInput, + ): Promise<CrossBackendResult> { + const webBackend = new GhosttyWebBackend(SESSION_ID, PROFILE); + backends.push(webBackend); + const nativeBackend = new LibghosttyVtBackend(SESSION_ID, PROFILE); + backends.push(nativeBackend); + + await webBackend.boot(); + await nativeBackend.boot(); + + await webBackend.replayTo(input); + await nativeBackend.replayTo(input); + + const webSnapshot = await webBackend.snapshot(); + const nativeSnapshot = await nativeBackend.snapshot(); + + return { + webHash: computeScreenHash(webSnapshot), + nativeHash: computeScreenHash(nativeSnapshot), + webLines: webSnapshot.visibleLines.map((line) => line.text), + nativeLines: nativeSnapshot.visibleLines.map((line) => line.text), + }; + } + + async function expectAgreement(input: ReplayInput): Promise<void> { + const result = await hashBothBackends(input); + // Compare the decoded lines first so a mismatch surfaces the offending + // text, then assert the hashes themselves agree. + expect(result.nativeLines).toEqual(result.webLines); + expect(result.webHash).toMatch(SHA_256_HEX); + expect(result.nativeHash).toBe(result.webHash); + } + + maybeIt( + nativeAvailable + ? 'agrees on an ASCII full screen' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + const rows = Array.from( + { length: 12 }, + (_, index) => `row ${String(index)} of ascii content`, + ).join('\r\n'); + await expectAgreement(singleOutputReplayInput(rows)); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on an interior cursor-positioned gap' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // Write 'a' at the home position, jump to row 1 col 6 (CSI 1;6H is + // 1-based), then write 'b'. The interior cols between them are genuine + // blank cells that both backends must render as spaces, yielding + // 'a    b' on row 0 after trailing-space trimming.
+ await expectAgreement(singleOutputReplayInput('a\x1b[1;6Hb')); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on CJK wide glyphs' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // 'kanji-kanji-te-su-to' in CJK: each glyph occupies two columns. + await expectAgreement(singleOutputReplayInput('漢字テスト')); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on grapheme clusters (NFD combining mark and a ZWJ family emoji)' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // 'e' + combining acute accent (NFD) on row 0, then a ZWJ family emoji + // (man + ZWJ + woman + ZWJ + girl + ZWJ + boy) on row 1. Both backends + // must keep the FULL grapheme cluster rather than dropping continuation + // codepoints. + const combiningE = 'e\u0301'; + const zwjFamily = + '\u{1f468}\u200d\u{1f469}\u200d\u{1f467}\u200d\u{1f466}'; + await expectAgreement( + singleOutputReplayInput(`${combiningE}\r\n${zwjFamily}`), + ); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on a line with a trailing non-breaking space' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // NBSP (U+00A0) is not ASCII 0x20, so neither backend trims it; the + // trailing NBSP must survive identically in the canonical text. + await expectAgreement(singleOutputReplayInput('value\u00a0')); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on a short, mostly-blank screen' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // A single short line leaves the rest of the viewport blank, exercising + // the libghostty pad-to-rows alignment against ghostty-web's full grid. 
+ await expectAgreement(singleOutputReplayInput('hi')); + }, + ); +}); diff --git a/test/integration/screen-hash.test.ts b/test/integration/screen-hash.test.ts new file mode 100644 index 00000000..e96159b1 --- /dev/null +++ b/test/integration/screen-hash.test.ts @@ -0,0 +1,234 @@ +import { mkdtemp, realpath } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { afterEach, beforeEach, describe, expect, it } from 'vitest'; + +import type { + SnapshotResult, + WaitForRenderResult, +} from '../../src/protocol/messages.js'; +import { + cleanupHome, + createSession, + crashSession, + destroySession, + inspectSession, + readEvents, + runCli, + sleep, + type SuccessEnvelope, + type WaitResult, +} from '../helpers.js'; + +// A session that emits a stable marker and then idles, so the rendered screen +// has settled visible content the wait and snapshot paths can hash. +const SESSION_COMMAND = [ + '/bin/sh', + '-c', + "printf 'booting\\n'; sleep 1; printf 'Ready\\n'; exec cat", +] as const; +const HOOK_TIMEOUT_MS = 30_000; + +const SHA_256_HEX = /^[a-f0-9]{64}$/u; + +type StructuredSnapshot = Extract<SnapshotResult, { format: 'structured' }>; +type TextSnapshot = Extract<SnapshotResult, { format: 'text' }>; + +async function waitForOutputMarker( + testHome: string, + sessionId: string, + marker: string, +): Promise<void> { + const waitResult = runCli( + ['wait', sessionId, '--idle-ms', '200', '--timeout', '10000', '--json'], + { AGENT_TTY_HOME: testHome }, + 15_000, + ); + + expect(waitResult.status).toBe(0); + expect(waitResult.stderr).toBe(''); + const waitEnvelope = JSON.parse( + waitResult.stdout, + ) as SuccessEnvelope<WaitResult>; + expect(waitEnvelope.ok).toBe(true); + expect(waitEnvelope.result.timedOut).toBe(false); + + const deadline = Date.now() + 10_000; + while (Date.now() < deadline) { + const events = await readEvents(testHome, sessionId).catch(() => []); + const output = events + .filter((event) => event.type === 'output') + .map((event) => { + const data = event.payload.data; + return typeof data
=== 'string' ? data : ''; + }) + .join(''); + + if (output.includes(marker)) { + return; + } + + await sleep(100); + } + + throw new Error(`timed out waiting for output marker ${marker}`); +} + +describe('screen hash integration', { timeout: 120_000 }, () => { + let testHome = ''; + let sessionId = ''; + + beforeEach(async () => { + // oxfmt-ignore + testHome = await realpath(await mkdtemp(join(tmpdir(), 'agent-tty-screen-hash-'))); + sessionId = createSession(testHome, [...SESSION_COMMAND]); + await waitForOutputMarker(testHome, sessionId, 'booting'); + }, HOOK_TIMEOUT_MS); + + afterEach(async () => { + destroySession(testHome, sessionId); + await cleanupHome(testHome); + sessionId = ''; + testHome = ''; + }, HOOK_TIMEOUT_MS); + + it('includes screenHash on a structured snapshot', () => { + const result = runCli( + ['snapshot', sessionId, '--format', 'structured', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope<StructuredSnapshot>; + expect(envelope.ok).toBe(true); + expect(envelope.result.format).toBe('structured'); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); + + it('includes screenHash on a text snapshot', () => { + const result = runCli( + ['snapshot', sessionId, '--format', 'text', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse(result.stdout) as SuccessEnvelope<TextSnapshot>; + expect(envelope.ok).toBe(true); + expect(envelope.result.format).toBe('text'); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); + + it('agrees on screenHash between structured and text snapshots of the same screen', () => { + const structured = runCli( + ['snapshot', sessionId, '--format', 'structured', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + const text = runCli( + ['snapshot', sessionId, '--format', 'text',
'--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(structured.status).toBe(0); + expect(text.status).toBe(0); + const structuredEnvelope = JSON.parse( + structured.stdout, + ) as SuccessEnvelope<StructuredSnapshot>; + const textEnvelope = JSON.parse( + text.stdout, + ) as SuccessEnvelope<TextSnapshot>; + + expect(structuredEnvelope.result.screenHash).toMatch(SHA_256_HEX); + expect(textEnvelope.result.screenHash).toBe( + structuredEnvelope.result.screenHash, + ); + }); + + it('includes screenHash on a matched render wait', () => { + const result = runCli( + ['wait', sessionId, '--text', 'Ready', '--timeout', '15000', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope<WaitForRenderResult>; + expect(envelope.ok).toBe(true); + expect(envelope.result.matched).toBe(true); + expect(envelope.result.timedOut).toBe(false); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); + + it('omits screenHash on a timed-out render wait', () => { + const result = runCli( + [ + 'wait', + sessionId, + '--text', + 'TEXT_THAT_NEVER_APPEARS', + '--timeout', + '2000', + '--json', + ], + { AGENT_TTY_HOME: testHome }, + 15_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope<WaitForRenderResult>; + expect(envelope.ok).toBe(true); + expect(envelope.result.matched).toBe(false); + expect(envelope.result.timedOut).toBe(true); + expect(envelope.result.screenHash).toBeUndefined(); + }); + + it('includes screenHash on the offline host-unreachable matched:false fallback', async () => { + // Settle on the visible marker, then kill the host so the wait falls back + // to offline replay. A screen-stability wait cannot prove the stable + // duration from a single offline snapshot, so it returns matched:false — + // but a Semantic Snapshot was still observed, so the hash is present.
+ await waitForOutputMarker(testHome, sessionId, 'Ready'); + + crashSession(testHome, sessionId); + await sleep(500); + expect(inspectSession(testHome, sessionId).status).toBe('failed'); + + const result = runCli( + [ + 'wait', + sessionId, + '--screen-stable-ms', + '1000', + '--timeout', + '5000', + '--json', + ], + { AGENT_TTY_HOME: testHome }, + 15_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope<WaitForRenderResult>; + expect(envelope.ok).toBe(true); + expect(envelope.result.matched).toBe(false); + expect(envelope.result.timedOut).toBe(false); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); +}); diff --git a/test/unit/batch/executor.test.ts b/test/unit/batch/executor.test.ts index c23c9e2f..a8fa2233 100644 --- a/test/unit/batch/executor.test.ts +++ b/test/unit/batch/executor.test.ts @@ -434,6 +434,51 @@ describe('executeBatch', () => { }); }); + describe('wait screenHash', () => { + const SCREEN_HASH = 'a'.repeat(64); + + it('carries the observed screenHash onto a matched wait step', async () => { + const { driver } = createFakeDriver({ + waitResults: [ + { + matched: true, + timedOut: false, + capturedAtSeq: 7, + screenHash: SCREEN_HASH, + }, + ], + }); + const result = await executeBatch({ + plan: plan([{ wait: { text: 'done' } }]), + driver, + keepGoing: false, + }); + + expect(result.steps[0]).toMatchObject({ + kind: 'wait', + status: 'completed', + screenHash: SCREEN_HASH, + }); + }); + + it('omits screenHash on a timed-out wait step', async () => { + const { driver } = createFakeDriver({ + waitResults: [{ matched: false, timedOut: true, capturedAtSeq: 5 }], + }); + const result = await executeBatch({ + plan: plan([{ wait: { text: 'never' } }]), + driver, + keepGoing: false, + }); + + expect(result.steps[0]).toMatchObject({ + kind: 'wait', + status: 'failed', + }); + expect(result.steps[0]).not.toHaveProperty('screenHash'); + }); + }); + + describe('run
completion classification', () => { it('fails a Waited Run that timed out with a timedOut runOutcome', async () => { const { driver } = createFakeDriver({ diff --git a/test/unit/commands/golden-envelopes.test.ts b/test/unit/commands/golden-envelopes.test.ts index 9f9ea408..bd53bd9b 100644 --- a/test/unit/commands/golden-envelopes.test.ts +++ b/test/unit/commands/golden-envelopes.test.ts @@ -910,6 +910,7 @@ const goldenResultContracts: readonly GoldenResultContractCase[] = [ ], }, ], + screenHash: 'a'.repeat(64), }, invalidResult: {}, extraFieldResult: { @@ -1088,6 +1089,7 @@ const goldenResultContracts: readonly GoldenResultContractCase[] = [ cursorRow: 4, cursorCol: 0, capturedAtSeq: 9, + screenHash: 'a'.repeat(64), }, invalidResult: { matched: true, diff --git a/test/unit/commands/snapshot.test.ts b/test/unit/commands/snapshot.test.ts index 75393f58..3320157b 100644 --- a/test/unit/commands/snapshot.test.ts +++ b/test/unit/commands/snapshot.test.ts @@ -62,6 +62,7 @@ vi.mock('../../../src/storage/sessionPaths.js', () => ({ import { createTestSemanticSnapshot } from '../../helpers.js'; import { runSnapshotCommand } from '../../../src/cli/commands/snapshot.js'; +import { computeScreenHash } from '../../../src/renderer/canonicalScreen.js'; import { createLogger } from '../../../src/util/logger.js'; const TEST_CONTEXT = { @@ -424,6 +425,7 @@ describe('snapshot command', () => { const result = { format: 'structured' as const, ...snapshot, + screenHash: computeScreenHash(snapshot), }; mocks.readManifestIfExists.mockResolvedValue(createExitedSessionRecord()); installOfflineReplaySuccessMock(); @@ -605,6 +607,7 @@ describe('snapshot command', () => { cursorRow: 0, cursorCol: 0, text: 'offline output', + screenHash: computeScreenHash(createTestSemanticSnapshot()), }; mocks.sendRpc.mockRejectedValue( new CliError(ERROR_CODES.HOST_UNREACHABLE, 'host unreachable'), diff --git a/test/unit/commands/wait.test.ts b/test/unit/commands/wait.test.ts index 22ffc885..0030387f 100644 
--- a/test/unit/commands/wait.test.ts +++ b/test/unit/commands/wait.test.ts @@ -40,6 +40,7 @@ vi.mock('../../../src/storage/sessionPaths.js', () => ({ })); import { createTestSemanticSnapshot } from '../../helpers.js'; +import { computeScreenHash } from '../../../src/renderer/canonicalScreen.js'; import { runWaitCommand } from '../../../src/cli/commands/wait.js'; import { createLogger } from '../../../src/util/logger.js'; @@ -608,6 +609,12 @@ describe('wait command', () => { cursorRow: 0, cursorCol: 0, capturedAtSeq: 15, + screenHash: computeScreenHash( + createTestSemanticSnapshot({ + capturedAtSeq: 15, + visibleLines: [{ row: 0, text: 'offline hello output' }], + }), + ), }, lines: ['Matched: hello', 'Cursor: row 0, col 0', 'capturedAtSeq: 15'], }); @@ -666,6 +673,14 @@ describe('wait command', () => { cursorRow: 2, cursorCol: 3, capturedAtSeq: 17, + screenHash: computeScreenHash( + createTestSemanticSnapshot({ + capturedAtSeq: 17, + visibleLines: [{ row: 0, text: 'offline Ready output' }], + cursorRow: 2, + cursorCol: 3, + }), + ), }, lines: [ 'Host became unreachable before the wait condition could be fully verified; returning the latest offline snapshot state.', diff --git a/test/unit/renderer/canonicalScreen.test.ts b/test/unit/renderer/canonicalScreen.test.ts new file mode 100644 index 00000000..e37a79ef --- /dev/null +++ b/test/unit/renderer/canonicalScreen.test.ts @@ -0,0 +1,66 @@ +import { describe, expect, it } from 'vitest'; + +import { + canonicalVisibleLines, + canonicalVisibleText, + computeScreenHash, +} from '../../../src/renderer/canonicalScreen.js'; + +const linesOf = (...texts: string[]) => ({ + visibleLines: texts.map((text) => ({ text })), +}); + +describe('canonicalVisibleLines / canonicalVisibleText', () => { + it('returns the verbatim line texts and their newline join', () => { + const snapshot = linesOf('one', 'two', 'three'); + + expect(canonicalVisibleLines(snapshot)).toEqual(['one', 'two', 'three']); + 
expect(canonicalVisibleText(snapshot)).toBe('one\ntwo\nthree'); + }); +}); + +describe('computeScreenHash', () => { + it('returns the same hash for identical visible lines', () => { + const a = linesOf('alpha', 'beta'); + const b = linesOf('alpha', 'beta'); + + expect(computeScreenHash(a)).toBe(computeScreenHash(b)); + }); + + it('ignores fields outside visibleLines (e.g. cursor position)', () => { + const base = linesOf('alpha', 'beta'); + const withCursor = { + ...base, + cursorRow: 7, + cursorCol: 13, + }; + + expect(computeScreenHash(withCursor)).toBe(computeScreenHash(base)); + }); + + it('changes when a single glyph changes', () => { + const before = linesOf('alpha', 'beta'); + const after = linesOf('alpha', 'beto'); + + expect(computeScreenHash(after)).not.toBe(computeScreenHash(before)); + }); + + it('changes when only trailing whitespace differs (no normalization)', () => { + const trimmed = linesOf('alpha', 'beta'); + const trailing = linesOf('alpha', 'beta '); + + expect(computeScreenHash(trailing)).not.toBe(computeScreenHash(trimmed)); + }); + + it('pins the canonical digest of a fixed non-ASCII fixture', () => { + // 'café' is "cafe" + a combining acute accent (NFD); '漢字' exercises + // multibyte UTF-8; the third line carries trailing spaces. Pinning the + // concrete digest locks the canonical string assembly and UTF-8 encoding. 
+ const fixture = linesOf('café', '漢字', 'trailing '); + + expect(canonicalVisibleText(fixture)).toBe('café\n漢字\ntrailing '); + expect(computeScreenHash(fixture)).toBe( + 'e813b95ab8cd844d5a3eff7d6e447a3c3c0cc79300085c701f2b9193efbaa1f3', + ); + }); +}); diff --git a/test/unit/renderer/ghosttyWebDecode.test.ts b/test/unit/renderer/ghosttyWebDecode.test.ts new file mode 100644 index 00000000..dd33acc5 --- /dev/null +++ b/test/unit/renderer/ghosttyWebDecode.test.ts @@ -0,0 +1,158 @@ +import { describe, expect, it } from 'vitest'; + +import { + assembleCanonicalLine, + stripTrailingAsciiSpaces, + type GhosttyDecodedColumn, +} from '../../../src/renderer/ghosttyWeb/backend.js'; + +// A decoded column as the ghostty-web harness reads it. `grapheme` mirrors +// wasmTerm.getGraphemeString(row, col). The lib's empty-array fallback +// (node_modules/ghostty-web/dist/ghostty-web.js: +// `getGraphemeString = !g||g.length===0 ? " " : String.fromCodePoint(...g)`) +// suggests a blank yields ' ', but the live engine returns getGrapheme === [0] +// for a blank cell, so the fallback never fires and getGraphemeString actually +// returns String.fromCodePoint(0) === a NUL — confirmed by booting +// both backends over 'hi' (see test/integration/cross-backend-screen-hash). +// assembleCanonicalLine normalizes that lone NUL back to ' ', so the fixtures +// below feed the NUL the engine really emits and assert the normalization. +// `width` mirrors cell.getWidth(): 1 for a normal cell, 2 for a wide glyph's +// lead column, 0 for its trailing spacer column. 
+type ColumnSpec = GhosttyDecodedColumn; + +const BLANK: ColumnSpec = { grapheme: '\u0000', width: 1 }; +const SPACER: ColumnSpec = { grapheme: '\u0000', width: 0 }; + +function wide(grapheme: string): readonly ColumnSpec[] { + return [{ grapheme, width: 2 }, SPACER]; +} + +function cell(grapheme: string): ColumnSpec { + return { grapheme, width: 1 }; +} + +// Assemble a full line over a fixed-width grid of column specs, padding any +// columns past the supplied cells with blanks (the lib's getCell synthesizes a +// blank width-1 cell out of range). +function assemble(cells: readonly ColumnSpec[], cols: number): string { + return assembleCanonicalLine(cols, (col) => cells[col] ?? BLANK); +} + +describe('assembleCanonicalLine (ghostty-web canonical visible text)', () => { + it('drops wide-glyph trailing spacers so a CJK row matches the native backend text', () => { + // Native libghostty-vt pins this exact layout's visibleLines[].text as + // 'A漢字B' (test/unit/renderer/libghosttyVtBackend.test.ts:301) and its + // cells[] carries each spacer as '' — so the converged ghostty-web text + // must NOT inject a space for the spacer columns. + const row: ColumnSpec[] = [ + cell('A'), + ...wide('漢'), + ...wide('字'), + cell('B'), + ]; + expect(assemble(row, row.length)).toBe('A漢字B'); + }); + + it('drops the emoji wide-glyph spacer while keeping a real trailing-content space', () => { + // 'rocket 🚀 done' — the 🚀 occupies cols 7-8 (8 is the width-0 spacer), + // col 9 is a genuine space, then 'done'. Matches the native row layout in + // libghosttyVtBackend.test.ts. 
+ const row: ColumnSpec[] = [ + cell('r'), + cell('o'), + cell('c'), + cell('k'), + cell('e'), + cell('t'), + cell(' '), + ...wide('🚀'), + cell(' '), + cell('d'), + cell('o'), + cell('n'), + cell('e'), + ]; + expect(assemble(row, row.length)).toBe('rocket 🚀 done'); + }); + + it('preserves interior blank cells as single spaces', () => { + const row: ColumnSpec[] = [cell('a'), BLANK, BLANK, cell('b')]; + // Two interior blanks yield two spaces, one per blank cell. + expect(assemble(row, 4)).toBe('a  b'); + }); + + it('right-trims trailing ASCII spaces only, padding out to the full width', () => { + const row: ColumnSpec[] = [cell('h'), cell('i')]; + expect(assemble(row, 10)).toBe('hi'); + }); + + it('keeps non-space trailing whitespace (tab) instead of JS trimEnd', () => { + const row: ColumnSpec[] = [cell('h'), cell('i'), cell('\t')]; + // A bare trailing tab survives; a following blank ASCII space is trimmed. + expect(assemble(row, 5)).toBe('hi\t'); + }); + + it('preserves a full NFD combining-mark grapheme cluster', () => { + // NFD 'é' = 'e' + U+0301. getGraphemeString returns the whole cluster for + // the lead cell; the old getChars() path would have dropped the mark. + // Written with an explicit escape so the fixture stays decomposed even if + // an editor or tool normalizes the file to NFC. + const combined = 'e\u0301'; + const row: ColumnSpec[] = [cell(combined), cell('x')]; + expect(assemble(row, 4)).toBe(`${combined}x`); + }); + + it('preserves an emoji ZWJ grapheme cluster', () => { + // Family emoji built with ZWJ; the harness reads it as one wide grapheme.
+ const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'; + const row: ColumnSpec[] = [...wide(family), cell('!')]; + expect(assemble(row, row.length)).toBe(`${family}!`); + }); + + it('returns the empty string for an all-blank line', () => { + expect(assemble([], 8)).toBe(''); + }); + + it('reads exactly cols columns regardless of how many cells are supplied', () => { + const seen: number[] = []; + assembleCanonicalLine(6, (col) => { + seen.push(col); + return BLANK; + }); + expect(seen).toEqual([0, 1, 2, 3, 4, 5]); + }); + + it('rejects a non-string grapheme', () => { + expect(() => + assembleCanonicalLine(1, () => ({ + grapheme: 42 as unknown as string, + width: 1, + })), + ).toThrow('decoded grapheme must be a string'); + }); + + it('rejects a negative or non-integer width', () => { + expect(() => + assembleCanonicalLine(1, () => ({ grapheme: 'a', width: -1 })), + ).toThrow('decoded cell width must be a non-negative integer'); + expect(() => + assembleCanonicalLine(1, () => ({ grapheme: 'a', width: 1.5 })), + ).toThrow('decoded cell width must be a non-negative integer'); + }); +}); + +describe('stripTrailingAsciiSpaces', () => { + it('removes only trailing 0x20 spaces', () => { + expect(stripTrailingAsciiSpaces('abc ')).toBe('abc'); + }); + + it('leaves interior spaces and the string untouched when there is no trailing space', () => { + expect(stripTrailingAsciiSpaces('a b c')).toBe('a b c'); + }); + + it('does not strip a trailing tab or non-breaking space', () => { + expect(stripTrailingAsciiSpaces('abc\t')).toBe('abc\t'); + // U+00A0 is spelled as an escape so it cannot be mistaken for 0x20. + expect(stripTrailingAsciiSpaces('abc\u00a0')).toBe('abc\u00a0'); + }); + + it('returns the empty string for an all-space input', () => { + expect(stripTrailingAsciiSpaces(' ')).toBe(''); + }); +}); diff --git a/test/unit/renderer/libghosttyVtBackend.test.ts b/test/unit/renderer/libghosttyVtBackend.test.ts index 1d66a22a..bf01833f 100644 --- a/test/unit/renderer/libghosttyVtBackend.test.ts +++ b/test/unit/renderer/libghosttyVtBackend.test.ts @@ -263,9
+263,16 @@ describe('LibghosttyVtBackend', () => { cursorRow: 1, cursorCol: 2, isAltScreen: true, + // snapshot() pads visibleLines to exactly `rows` with blank trailing + // lines so the canonical visible text converges with the ghostty-web + // backend (see padVisibleLinesToRows). The native fixture emits 2 lines + // for rows: 5, so rows 2-4 are padded blanks. visibleLines: [ { row: 0, text: 'hello world' }, { row: 1, text: 'prompt>' }, + { row: 2, text: '' }, + { row: 3, text: '' }, + { row: 4, text: '' }, ], scrollbackLines: [{ row: 0, text: 'scrolled' }], cells: [ diff --git a/test/unit/snapshot/capture.test.ts b/test/unit/snapshot/capture.test.ts index d0055559..12ddb9a9 100644 --- a/test/unit/snapshot/capture.test.ts +++ b/test/unit/snapshot/capture.test.ts @@ -13,6 +13,7 @@ import { createSnapshotResult, persistSnapshotArtifact, } from '../../../src/snapshot/capture.js'; +import { computeScreenHash } from '../../../src/renderer/canonicalScreen.js'; import { readArtifactManifest } from '../../../src/storage/artifactManifest.js'; import { artifactPath, @@ -40,6 +41,7 @@ describe('snapshot capture', () => { expect(createSnapshotResult(snapshot, 'structured')).toEqual({ format: 'structured', ...snapshot, + screenHash: computeScreenHash(snapshot), }); }); @@ -223,7 +225,11 @@ describe('snapshot capture', () => { rendererBackend: 'test-backend', }); - expect(result).toEqual({ format: 'structured', ...snapshot }); + expect(result).toEqual({ + format: 'structured', + ...snapshot, + screenHash: computeScreenHash(snapshot), + }); const filename = snapshotFilename(5, 'structured'); expect( JSON.parse( @@ -268,6 +274,7 @@ describe('snapshot capture', () => { cursorRow: 0, cursorCol: 0, text: 'first visible line\nsecond visible line', + screenHash: computeScreenHash(snapshot), }); const filename = snapshotFilename(5, 'text'); @@ -314,6 +321,7 @@ describe('snapshot capture', () => { cursorRow: 0, cursorCol: 0, text: 'scrolled\naway\nvisible output', + screenHash: 
computeScreenHash(snapshot), }); const filename = snapshotFilename(5, 'text'); diff --git a/test/unit/util/hash.test.ts b/test/unit/util/hash.test.ts new file mode 100644 index 00000000..9d48f1af --- /dev/null +++ b/test/unit/util/hash.test.ts @@ -0,0 +1,27 @@ +import { createHash } from 'node:crypto'; + +import { describe, expect, it } from 'vitest'; + +import { sha256Hex } from '../../../src/util/hash.js'; + +describe('sha256Hex', () => { + it('matches the known SHA-256 digest of "abc"', () => { + expect(sha256Hex('abc')).toBe( + 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad', + ); + }); + + it('returns the empty-string digest for ""', () => { + expect(sha256Hex('')).toBe( + 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', + ); + }); + + it('hashes the UTF-8 bytes of a non-ASCII string', () => { + const value = 'café漢字'; + + expect(sha256Hex(value)).toBe( + createHash('sha256').update(Buffer.from(value, 'utf8')).digest('hex'), + ); + }); +});
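Reviewer note: the tests in this diff pin down the behavior of two small helpers, `sha256Hex` (hex digest of a string's UTF-8 bytes) and `stripTrailingAsciiSpaces` (drops trailing 0x20 only, leaving tabs and non-breaking spaces alone). The sketch below shows one way helpers with that behavior could be written. It is an illustration under assumptions, not the code in `src/util/hash.js` or `src/renderer/ghosttyWeb/backend.js`; the names `sha256HexSketch` and `stripTrailingAsciiSpacesSketch` are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical stand-in for sha256Hex: hashes the UTF-8 encoding of the
// input and returns the digest as lowercase hex.
function sha256HexSketch(value: string): string {
  return createHash('sha256').update(value, 'utf8').digest('hex');
}

// Hypothetical stand-in for stripTrailingAsciiSpaces: removes trailing
// U+0020 characters only. Unlike String.prototype.trimEnd, it keeps tabs,
// non-breaking spaces, and any other trailing whitespace.
function stripTrailingAsciiSpacesSketch(line: string): string {
  let end = line.length;
  while (end > 0 && line.charCodeAt(end - 1) === 0x20) {
    end -= 1;
  }
  return line.slice(0, end);
}
```

Scanning from the end with `charCodeAt` avoids a regex and makes the "0x20 only" rule explicit, which is the property the tab and non-breaking-space tests guard.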