From 8d243ae7647c2c7f44eca39e945bb14a818587b6 Mon Sep 17 00:00:00 2001 From: Thomas Kosiewski Date: Fri, 5 Jun 2026 13:41:05 +0200 Subject: [PATCH 1/4] docs(context): add Screen Hash glossary term Capture the screenHash design from the grill-with-docs pass: a stable digest of normalized visible screen text, computed from the same canonical visible text as the Screen Stability check and text Render Wait matching (unified, no behavior change), and distinct from the screenshot pixel sha256. Change-Id: I73d19ebe921f316bff9dab166c8ba756f0cdd3fe Co-Authored-By: Claude Opus 4.8 (1M context) Signed-off-by: Thomas Kosiewski --- CONTEXT.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/CONTEXT.md b/CONTEXT.md index 804b9f18..d85c5479 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -43,6 +43,10 @@ _Avoid_: Visual wait, snapshot wait A render condition where the visible text content of a **Semantic Snapshot** has remained unchanged for a requested duration. _Avoid_: Settled screen +**Screen Hash**: +A stable digest of a **Session**'s normalized visible screen text at a captured event-log sequence, used to tell whether the rendered screen content changed between two observations. It is computed from the same canonical visible text that the **Screen Stability** check and text **Render Wait** matching use, so the three never disagree. +_Avoid_: Screen checksum, frame hash, screenshot hash + **Batch**: An ordered sequence of **Batch Steps** driven through one **Command Target** in a single `batch` invocation. It runs fail-fast: the first failed **Batch Step** stops the run unless the caller opts into continuing. _Avoid_: Pipeline, script, macro @@ -228,6 +232,9 @@ _Avoid_: bare "agent", "Coder agent" - A **Render Wait** may include text, regex, cursor, or **Screen Stability** conditions. - A **Render Wait** may be evaluated by live host polling for a **Live Host Eligible Session** or by offline replay fallback for an **Offline Replay Eligible Session**. 
- Offline replay fallback can evaluate snapshot content and cursor position, but cannot prove elapsed **Screen Stability** duration from a single latest **Semantic Snapshot**. +- A **Screen Hash** changes exactly when the canonical visible text that the **Screen Stability** check compares changes; the two share one definition. +- A **Screen Hash** covers visible screen text only — not scrollback, cursor position, or styles — and is distinct from the pixel `sha256` recorded on a **Screenshot Result**. +- A **Snapshot Result** and a matched **Render Wait** result may carry the **Screen Hash** of the **Semantic Snapshot** they observed; a **Render Wait** that times out or finds the host unreachable carries none. - A **Waited Run** may produce one **Run Completion**, time out for its caller, or be interrupted by **Session** exit. - Caller timeout does not cancel the underlying **Run Completion**; it may still be observed later to keep internal completion bytes out of artifacts. - After **Session** exit, an unobserved **Run Completion** can no longer arrive. From ef28d84bbcb19edcad4fc213b12887dc77e7a3ad Mon Sep 17 00:00:00 2001 From: Thomas Kosiewski Date: Fri, 5 Jun 2026 13:46:23 +0200 Subject: [PATCH 2/4] docs(prd): add screenHash PRD Product requirements for an optional screenHash field on snapshot and render-wait results: a stable content change-token computed from the canonical visible text and unified with the Screen Stability compare, with no behavior change. Produced via the to-prd flow. 
Change-Id: Icbd22c29b3785476ee13ed62c2c100cdc45d7c22 Co-Authored-By: Claude Opus 4.8 (1M context) Signed-off-by: Thomas Kosiewski --- docs/prd/screen-hash/PRD.md | 59 +++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 docs/prd/screen-hash/PRD.md diff --git a/docs/prd/screen-hash/PRD.md b/docs/prd/screen-hash/PRD.md new file mode 100644 index 00000000..d4238078 --- /dev/null +++ b/docs/prd/screen-hash/PRD.md @@ -0,0 +1,59 @@ +# PRD: Screen Hash on snapshot and wait results + +## Problem Statement + +A caller — often an AI coding agent — driving a **Session** repeatedly needs a cheap, reliable way to answer "did the rendered screen actually change since I last looked?" Today the only per-result identifier is the captured event-log sequence, but that advances on every chunk of output, including output that changes nothing visible: cursor-position queries, terminal-mode toggles, a spinner repainting the same glyphs. So two observations with different sequences can be the identical screen, and a caller comparing sequences sees changes that are not there. There is no stable token for the screen's content itself. + +## Solution + +Snapshot results and matched **Render Wait** results gain an optional **Screen Hash**: a stable digest of the **Session**'s normalized visible screen text at the captured event-log sequence. Equal hashes mean the visible content is identical; a changed hash means it genuinely changed. The **Screen Hash** is computed from the same canonical visible text that the **Screen Stability** check and text **Render Wait** matching already use, so "the hash changed" and "the stability check saw a change" can never disagree. + +## User Stories + +1. As an AI coding agent, I want a stable hash of the screen content on each snapshot, so that I can tell across two CLI calls whether the visible screen actually changed without diffing full text myself. +2. 
As an agent, I want the hash to stay equal when only the cursor moved, so that cursor motion alone does not look like a content change. +3. As an agent, I want the hash to stay equal when output occurred that changed nothing visible, so that I am not misled by the captured sequence advancing on a no-op repaint. +4. As an agent, I want the hash to change whenever the visible text changes, so that I can trust it as a content-changed signal. +5. As a caller, I want the **Screen Hash** on the snapshot result in both structured and text formats, so that I get it regardless of how I read the screen. +6. As a caller, I want the **Screen Hash** on a matched render-wait result, so that I know the content identity at the moment my wait condition was satisfied. +7. As a caller, I want a render wait that times out or finds the host unreachable to simply omit the hash, so that a missing hash unambiguously means "no screen was observed" rather than signalling an error. +8. As a tooling author, I want the **Screen Hash** to be renderer-independent — the same screen yields the same hash under either renderer backend — so that I can compare hashes across sessions rendered by different backends. +9. As a maintainer, I want the **Screen Hash**, the **Screen Stability** compare, and text **Render Wait** matching to share one canonical visible-text definition, so that they can never disagree about what "the screen" is. +10. As a maintainer, I want adding the **Screen Hash** to make no change to the shipped screen-stability behavior, so that existing waits behave exactly as before. +11. As a caller, I want to understand that the **Screen Hash** is distinct from a screenshot's pixel digest, so that I use the right identity for content versus pixels. +12. As a caller, I want to understand that the **Screen Hash** covers the visible screen only, even though the text snapshot format also includes scrollback, so that I am not surprised that the hash ignores scrollback growth. +13. 
As a tool building recordings, I want a per-frame content hash, so that I can dedup consecutive identical frames in artifacts. +14. As a caller using `--json`, I want the hash as a lowercase 64-character hex string validated by the same digest schema as other hashes, so that the field shape is predictable. +15. As a caller, I want the **Screen Hash** to be optional on results, so that older artifacts and hosts that predate it still parse. + +## Implementation Decisions + +- Add an optional **Screen Hash** field — a lowercase 64-character SHA-256 hex digest — to the snapshot result (both structured and text formats) and to the matched render-wait result. +- The **Screen Hash** is the SHA-256 of the canonical visible-text string: the visible lines joined by newline, exactly as the host's screen-stability compare and the text matcher already build it. Trailing whitespace is kept (no new normalization), so adding the hash makes zero change to the shipped screen-stability behavior. Cursor position, text styles, and scrollback are excluded. +- Extract one shared canonical-screen-text helper and route the **Screen Hash**, the host **Screen Stability** compare, and the text **Render Wait** matcher through it, so the three share a single definition and cannot diverge. +- A render wait that times out or finds the host unreachable carries no **Screen Hash**, because there is no observed **Semantic Snapshot** to hash. On a matched wait, the hash is that of the matched snapshot. +- Do not surface the **Screen Hash** on inspection or any path that does not already render a **Semantic Snapshot**; computing it must never force a renderer bootstrap that would not otherwise happen. +- Reuse the existing SHA-256 hex validator, consolidating its duplicate definitions into one. +- The field is optional so existing persisted artifacts and older hosts continue to parse. + +## Testing Decisions + +Good tests assert external behavior, not implementation details. 
+ +- **Canonical-text and hash helper (unit).** Same screen yields the same hash; cursor-only movement yields the same hash; a single visible-glyph change yields a different hash; a trailing-whitespace-only difference yields a different hash — proving trailing whitespace is intentionally retained and the behavior is unchanged. +- **Shared definition (unit).** The host **Screen Stability** compare and the **Render Wait** matcher consume the same canonical string the hash uses, so a later change to one cannot silently diverge from the others, and screen-stability behavior is demonstrably unchanged. +- **Cross-backend hash equality.** The same event log produces the same **Screen Hash** under both renderer backends, pinning the renderer-independence guarantee that is currently only an assumption. +- **Snapshot and wait envelope (integration).** Against an isolated home: the **Screen Hash** is present on a snapshot (structured and text) and on a matched wait, and absent on a timed-out wait. The existing CLI integration tests are prior art. + +## Out of Scope + +- A scrollback hash. The **Screen Hash** is visible-screen-only; a separate scrollback digest can be added later if a concrete need appears. +- A styled or per-cell hash. Transient style churn would make such a hash flap; the **Screen Hash** is text-content identity only. +- Pixel-level identity. That is already served by the screenshot pixel digest; the **Screen Hash** is its semantic counterpart and the two are not interchangeable. +- New wait semantics built on the hash (for example, "wait until the screen content changes"). v1 only exposes the field; any hash-driven wait is future scope. +- Any change to the screen-stability behavior. The unify is deliberately behavior-preserving. + +## Further Notes + +- The motivation differs from the comparable tool virtui, which hashes to avoid shipping screen bytes over a socket. 
agent-tty is a local CLI, so the value here is the stable content change-token and frame dedup, not transfer avoidance. +- The **Screen Hash** term is defined in the project glossary; this PRD and that term are on branch `feat/screen-hash`. No ADR was needed: the design is behavior-preserving and easily reversible — an optional field over the canonical string that already exists. From 71ee0e74b54c66638f298eaee105308fd76cf716 Mon Sep 17 00:00:00 2001 From: Thomas Kosiewski Date: Sat, 6 Jun 2026 13:14:46 +0200 Subject: [PATCH 3/4] feat(screen-hash): add Screen Hash to snapshot and render-wait results MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add an optional `screenHash` field — a lowercase 64-char SHA-256 of the canonical visible-screen text (`visibleLines[].text` joined by `\n`) — to snapshot results (structured and text) and to render-wait results that observed a Semantic Snapshot. It gives a caller a stable content-change token that is unaffected by cursor motion or no-op repaints. - Extract one shared canonical-visible-text helper (src/renderer/canonicalScreen.ts) and route the Screen Hash, the host Screen Stability compare, and the text Render Wait matcher through it; add a UTF-8-pinned sha256Hex util and consolidate the duplicate Sha256HexSchema into one exported definition. - Hash any observed snapshot: matched live waits, snapshot captures, and the offline host-unreachable matched:false fallback carry it; live timeouts, consecutive-failure giveups, and replay errors omit it. - Carry the hash on matched batch wait-step records. - Converge both renderer backends on one canonical screen form so the hash is renderer-independent: ghostty-web now decodes full grapheme clusters, keeps interior blank cells as spaces, and right-trims ASCII spaces only; libghostty-vt pads visible lines to exactly `rows`. 
This intentionally aligns the default ghostty-web stability/text-wait comparand on grapheme / interior-gap / non-ASCII-trailing screens, pinned by characterization and cross-backend tests. Closes #125. Change-Id: I698af5dbf8f66c70f49661712652b24d70415f0a Co-Authored-By: Claude Opus 4.8 (1M context) Signed-off-by: Thomas Kosiewski --- CONTEXT.md | 2 +- docs/USAGE.md | 10 +- docs/prd/screen-hash/PRD.md | 26 +- skill-data/agent-tty/SKILL.md | 2 + src/batch/executor.ts | 3 + src/batch/result.ts | 2 + src/cli/commands/wait.ts | 2 + src/host/hostMain.ts | 9 +- src/protocol/schemas.ts | 5 +- src/renderWait/matcher.ts | 8 +- src/renderer/canonicalScreen.ts | 41 +++ src/renderer/ghosttyWeb/backend.ts | 192 ++++++++++++-- src/renderer/libghosttyVt/backend.ts | 35 ++- src/renderer/types.ts | 7 +- src/snapshot/capture.ts | 6 +- src/util/hash.ts | 9 + .../cross-backend-screen-hash.test.ts | 184 ++++++++++++++ test/integration/screen-hash.test.ts | 234 ++++++++++++++++++ test/unit/batch/executor.test.ts | 45 ++++ test/unit/commands/golden-envelopes.test.ts | 2 + test/unit/commands/snapshot.test.ts | 3 + test/unit/commands/wait.test.ts | 15 ++ test/unit/renderer/canonicalScreen.test.ts | 66 +++++ test/unit/renderer/ghosttyWebDecode.test.ts | 158 ++++++++++++ .../unit/renderer/libghosttyVtBackend.test.ts | 7 + test/unit/snapshot/capture.test.ts | 10 +- test/unit/util/hash.test.ts | 27 ++ 27 files changed, 1068 insertions(+), 42 deletions(-) create mode 100644 src/renderer/canonicalScreen.ts create mode 100644 src/util/hash.ts create mode 100644 test/integration/cross-backend-screen-hash.test.ts create mode 100644 test/integration/screen-hash.test.ts create mode 100644 test/unit/renderer/canonicalScreen.test.ts create mode 100644 test/unit/renderer/ghosttyWebDecode.test.ts create mode 100644 test/unit/util/hash.test.ts diff --git a/CONTEXT.md b/CONTEXT.md index d85c5479..28d4843d 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -234,7 +234,7 @@ _Avoid_: bare "agent", "Coder agent" - 
Offline replay fallback can evaluate snapshot content and cursor position, but cannot prove elapsed **Screen Stability** duration from a single latest **Semantic Snapshot**. - A **Screen Hash** changes exactly when the canonical visible text that the **Screen Stability** check compares changes; the two share one definition. - A **Screen Hash** covers visible screen text only — not scrollback, cursor position, or styles — and is distinct from the pixel `sha256` recorded on a **Screenshot Result**. -- A **Snapshot Result** and a matched **Render Wait** result may carry the **Screen Hash** of the **Semantic Snapshot** they observed; a **Render Wait** that times out or finds the host unreachable carries none. +- A result carries the **Screen Hash** of the **Semantic Snapshot** it observed: a **Snapshot Result**, a matched **Render Wait** result, and the offline host-unreachable fallback that still observed a snapshot (even when it reports `matched: false` because **Screen Stability** duration could not be proven offline). The hash is keyed on whether a snapshot was observed, not on whether the wait matched; a **Render Wait** that observes no snapshot — a live timeout, a consecutive-failure giveup, or a replay error — carries none. - A **Waited Run** may produce one **Run Completion**, time out for its caller, or be interrupted by **Session** exit. - Caller timeout does not cancel the underlying **Run Completion**; it may still be observed later to keep internal completion bytes out of artifacts. - After **Session** exit, an unobserved **Run Completion** can no longer arrive. diff --git a/docs/USAGE.md b/docs/USAGE.md index ae2c3d5f..7c33467e 100644 --- a/docs/USAGE.md +++ b/docs/USAGE.md @@ -104,6 +104,14 @@ Useful flags: - `--exit`: wait for the process to exit. - `--timeout `: maximum wait time in milliseconds, with `0` meaning infinite. 
+### Screen Hash + +`snapshot` results (both `--format structured` and `--format text`) and a **matched** `wait` result carry an optional `screenHash`: a lowercase 64-character hex SHA-256 of the visible screen text. Compare it across two calls to tell whether the visible screen actually changed — equal hashes mean identical visible content, even if the event-log sequence advanced on a no-op repaint. + +- It hashes the visible screen only. It is **not** a hash of the `--format text` output, which also includes scrollback, so the hash ignores scrollback growth. +- It is distinct from the `screenshot` result's pixel `sha256`: `screenHash` is content identity, the screenshot `sha256` is pixel identity, and the two are not interchangeable. +- A `wait` that times out (or finds the host unreachable with no observed screen) omits `screenHash`, so a missing hash unambiguously means "no screen was observed" rather than an error. + ## `batch` Use `batch` to run an ordered sequence of input-and-`wait` steps against one session in a single invocation, instead of coordinating separate `run`/`type`/`paste`/`send-keys`/`wait` calls. Each `wait` step is anchored to a Wait Baseline — it only considers screen state produced _after_ the preceding input step — so a batch cannot race ahead and match a stale screen the way a hand-written shell loop can. @@ -174,7 +182,7 @@ The `--json` result is a per-step envelope: } ``` -Each step record carries its `index`, `kind`, `status` (`completed` | `failed` | `not-run` | `interrupted`), and `durationMs`. Input steps report the Event Log `seq` they produced; `wait` steps report the `waitBaseline` they were anchored to plus `matched` / `timedOut` / `matchedText` / `capturedAtSeq`. `completedCount` and `failedIndices` summarize the run. A fail-fast batch exits non-zero with the failed step's exit code (e.g. `11` for a `WAIT_TIMEOUT`); `--keep-going` exits `1` if any step failed. 
If the process is interrupted by SIGINT/SIGTERM, batch flushes the same envelope with the in-flight step marked `interrupted` and later steps `not-run`, then exits non-zero. +Each step record carries its `index`, `kind`, `status` (`completed` | `failed` | `not-run` | `interrupted`), and `durationMs`. Input steps report the Event Log `seq` they produced; `wait` steps report the `waitBaseline` they were anchored to plus `matched` / `timedOut` / `matchedText` / `capturedAtSeq`, and a matched `wait` step also carries the `screenHash` of the screen it observed (see [Screen Hash](#screen-hash)). `completedCount` and `failedIndices` summarize the run. A fail-fast batch exits non-zero with the failed step's exit code (e.g. `11` for a `WAIT_TIMEOUT`); `--keep-going` exits `1` if any step failed. If the process is interrupted by SIGINT/SIGTERM, batch flushes the same envelope with the in-flight step marked `interrupted` and later steps `not-run`, then exits non-zero. The Wait Baseline fixes stale-match only. It does **not** fix echo-match: a `wait` can still match the terminal's echo of a just-typed command (the echo renders _after_ the baseline). Use a distinctive output token or a `screenStableMs` wait rather than waiting for text you just typed. Interrupting a batch mid-`wait` leaves that wait's command still running on the session (the wait is abandoned, not cancelled), exactly like a caller timeout on `run`. diff --git a/docs/prd/screen-hash/PRD.md b/docs/prd/screen-hash/PRD.md index d4238078..ee154ddf 100644 --- a/docs/prd/screen-hash/PRD.md +++ b/docs/prd/screen-hash/PRD.md @@ -16,10 +16,10 @@ Snapshot results and matched **Render Wait** results gain an optional **Screen H 4. As an agent, I want the hash to change whenever the visible text changes, so that I can trust it as a content-changed signal. 5. As a caller, I want the **Screen Hash** on the snapshot result in both structured and text formats, so that I get it regardless of how I read the screen. 6. 
As a caller, I want the **Screen Hash** on a matched render-wait result, so that I know the content identity at the moment my wait condition was satisfied. -7. As a caller, I want a render wait that times out or finds the host unreachable to simply omit the hash, so that a missing hash unambiguously means "no screen was observed" rather than signalling an error. +7. As a caller, I want the hash present whenever a result holds an **observed** **Semantic Snapshot** — including the offline host-unreachable `matched: false` fallback that still observed a snapshot — and omitted only when no snapshot was observed (a live timeout, a consecutive-failure giveup, or a replay error), so that a missing hash unambiguously means "no screen was observed" rather than signalling an error. 8. As a tooling author, I want the **Screen Hash** to be renderer-independent — the same screen yields the same hash under either renderer backend — so that I can compare hashes across sessions rendered by different backends. 9. As a maintainer, I want the **Screen Hash**, the **Screen Stability** compare, and text **Render Wait** matching to share one canonical visible-text definition, so that they can never disagree about what "the screen" is. -10. As a maintainer, I want adding the **Screen Hash** to make no change to the shipped screen-stability behavior, so that existing waits behave exactly as before. +10. As a maintainer, I want adding the **Screen Hash** and routing the three consumers through one shared canonical-text definition to make no change in itself to the shipped screen-stability behavior, so that the only behavior change is the deliberate, characterization-pinned Phase 1 renderer convergence — not an accidental side effect of the hash. 11. As a caller, I want to understand that the **Screen Hash** is distinct from a screenshot's pixel digest, so that I use the right identity for content versus pixels. 12. 
As a caller, I want to understand that the **Screen Hash** covers the visible screen only, even though the text snapshot format also includes scrollback, so that I am not surprised that the hash ignores scrollback growth. 13. As a tool building recordings, I want a per-frame content hash, so that I can dedup consecutive identical frames in artifacts. @@ -29,31 +29,35 @@ Snapshot results and matched **Render Wait** results gain an optional **Screen H ## Implementation Decisions - Add an optional **Screen Hash** field — a lowercase 64-character SHA-256 hex digest — to the snapshot result (both structured and text formats) and to the matched render-wait result. -- The **Screen Hash** is the SHA-256 of the canonical visible-text string: the visible lines joined by newline, exactly as the host's screen-stability compare and the text matcher already build it. Trailing whitespace is kept (no new normalization), so adding the hash makes zero change to the shipped screen-stability behavior. Cursor position, text styles, and scrollback are excluded. +- In scope: a **Batch Step** record for a matched **Render Wait** step also carries the **Screen Hash**, mirrored from that step's render-wait result, so a batch run exposes the same content identity per wait step that a standalone wait does. +- The **Screen Hash** is the SHA-256 of the canonical visible-text string: the visible lines joined by newline, exactly as the host's screen-stability compare and the text matcher already build it. The shared canonical-text **definition** — `visibleLines[].text` joined by `\n`, sourced only from the snapshot (never `backend.getVisibleText()` or `cells[]`) — is unchanged by adding the hash. Cursor position, text styles, and scrollback are excluded. 
+- Converging the two renderer backends on one canonical screen form (Phase 1) intentionally changes the **default** `ghostty-web` backend's stability and text-wait **comparand** on screens with grapheme clusters, interior blank-cell gaps, or non-ASCII trailing characters: the canonical form is exactly `rows` lines, each decoded with full grapheme clusters with blank/zero cells as `' '`, then right-trimmed of trailing ASCII spaces (`0x20`) only. This is a deliberate, narrow change pinned by characterization tests, not a free behavior-preserving add; on plain ASCII screens the comparand is unchanged. - Extract one shared canonical-screen-text helper and route the **Screen Hash**, the host **Screen Stability** compare, and the text **Render Wait** matcher through it, so the three share a single definition and cannot diverge. -- A render wait that times out or finds the host unreachable carries no **Screen Hash**, because there is no observed **Semantic Snapshot** to hash. On a matched wait, the hash is that of the matched snapshot. +- The hash is keyed on whether a result holds an **observed** **Semantic Snapshot**, not on whether the wait matched. A result carries the **Screen Hash** of the snapshot it observed: a matched live wait, a snapshot capture, and the offline host-unreachable fallback that still observed a latest snapshot (even when it returns `matched: false` because the **Screen Stability** duration could not be proven offline). The hash is omitted only when no snapshot was observed: a live wait that times out, a consecutive-failure giveup, or a replay error throw. - Do not surface the **Screen Hash** on inspection or any path that does not already render a **Semantic Snapshot**; computing it must never force a renderer bootstrap that would not otherwise happen. -- Reuse the existing SHA-256 hex validator, consolidating its duplicate definitions into one. +- Reuse the existing SHA-256 hex validator. 
The consolidation set is exactly: export `Sha256HexSchema` from `protocol/schemas.ts` and import it in `renderer/types.ts`. Deliberately left out of scope: the standalone regex copies in `storage/artifactManifest.ts` and the `invariant(/^[a-f0-9]{64}$/u.test(...))` checks (for example in `renderer/profiles.ts` and `renderer/bundledFont.ts`), which are not Zod schemas and are not part of this consolidation. - The field is optional so existing persisted artifacts and older hosts continue to parse. ## Testing Decisions Good tests assert external behavior, not implementation details. -- **Canonical-text and hash helper (unit).** Same screen yields the same hash; cursor-only movement yields the same hash; a single visible-glyph change yields a different hash; a trailing-whitespace-only difference yields a different hash — proving trailing whitespace is intentionally retained and the behavior is unchanged. +- **Canonical-text and hash helper (unit).** Same screen yields the same hash; cursor-only movement yields the same hash; a single visible-glyph change yields a different hash; a trailing-whitespace-only difference (before right-trim of ASCII spaces) yields a different hash — proving the canonical form is exactly what is hashed and the behavior is as specified. +- **UTF-8 encoding pinned (unit).** The hash is the SHA-256 of the UTF-8 bytes of the canonical visible text, asserted against a concrete golden digest so the encoding can never silently drift. Golden: a three-row screen whose canonical text is `"a\nb\nc"` hashes to `ea7fb08b7a2dc4619ffb7c7bb38d95a2047935fa165d71b12efd3852a2e6d0cc`. - **Shared definition (unit).** The host **Screen Stability** compare and the **Render Wait** matcher consume the same canonical string the hash uses, so a later change to one cannot silently diverge from the others, and screen-stability behavior is demonstrably unchanged. 
-- **Cross-backend hash equality.** The same event log produces the same **Screen Hash** under both renderer backends, pinning the renderer-independence guarantee that is currently only an assumption. -- **Snapshot and wait envelope (integration).** Against an isolated home: the **Screen Hash** is present on a snapshot (structured and text) and on a matched wait, and absent on a timed-out wait. The existing CLI integration tests are prior art. +- **Cross-backend hash equality.** The same event log produces the same **Screen Hash** under both renderer backends, pinning the renderer-independence guarantee that is currently only an assumption. This test requires the optional native addon (`@coder/libghostty-vt-node`) and so must run on at least one CI job that has the addon installed; it skips gracefully where the addon is absent (including the sandbox), so the renderer-independence guarantee is not silently unverified. +- **Snapshot and wait envelope (integration).** Against an isolated home: the **Screen Hash** is present on a snapshot (structured and text), on a matched live wait, and on the offline host-unreachable `matched: false` fallback that still observed a snapshot; and absent on a timed-out live wait. The existing CLI integration tests are prior art. ## Out of Scope +- Per-frame **Screen Hash**es on recordings / `record export` (user story 13). v1 attaches the hash only where a result already holds an observed **Semantic Snapshot**; the export paths render no **Semantic Snapshot** per frame, so a recording-frame dedup hash is future scope rather than a v1 deliverable. - A scrollback hash. The **Screen Hash** is visible-screen-only; a separate scrollback digest can be added later if a concrete need appears. - A styled or per-cell hash. Transient style churn would make such a hash flap; the **Screen Hash** is text-content identity only. -- Pixel-level identity. 
That is already served by the screenshot pixel digest; the **Screen Hash** is its semantic counterpart and the two are not interchangeable. +- Pixel-level identity, and any **Screen Hash** on the **Screenshot Result**. A **Screenshot Result** carries only its pixel `sha256`; the content hash lives on the snapshot and wait results. The **Screen Hash** is the semantic counterpart to the pixel digest and the two are not interchangeable. - New wait semantics built on the hash (for example, "wait until the screen content changes"). v1 only exposes the field; any hash-driven wait is future scope. -- Any change to the screen-stability behavior. The unify is deliberately behavior-preserving. +- Any change to the screen-stability behavior **beyond** the Phase 1 renderer-convergence change described in the Implementation Decisions. The canonical-text definition and the shared single-source unify are behavior-preserving; the only intended behavior change is the default `ghostty-web` backend's comparand on grapheme / interior-gap / non-ASCII-trailing screens, pinned by characterization tests. No new wait semantics are added. ## Further Notes - The motivation differs from the comparable tool virtui, which hashes to avoid shipping screen bytes over a socket. agent-tty is a local CLI, so the value here is the stable content change-token and frame dedup, not transfer avoidance. -- The **Screen Hash** term is defined in the project glossary; this PRD and that term are on branch `feat/screen-hash`. No ADR was needed: the design is behavior-preserving and easily reversible — an optional field over the canonical string that already exists. +- The **Screen Hash** term is defined in the project glossary; this PRD and that term are on branch `feat/screen-hash`. No ADR was needed: the field is an optional add over the canonical string that already exists, and the one intended behavior change — the Phase 1 renderer convergence — is narrow, characterization-pinned, and easily reversible. 
diff --git a/skill-data/agent-tty/SKILL.md b/skill-data/agent-tty/SKILL.md index d090d61b..bb803dac 100644 --- a/skill-data/agent-tty/SKILL.md +++ b/skill-data/agent-tty/SKILL.md @@ -70,6 +70,8 @@ agent-tty --home "$AGENT_HOME" run "$SESSION_ID" 'pwd && ls -la' --json agent-tty --home "$AGENT_HOME" snapshot "$SESSION_ID" --format text --json ``` +`snapshot` and a matched `wait` carry an optional `screenHash` (a hash of the visible screen text). Compare it across calls to tell whether the visible screen actually changed instead of diffing full text; equal hashes mean identical visible content even when the event sequence advanced. + ### Drive an interactive CLI or TUI Use `batch` to run an ordered sequence of input-and-`wait` steps in one call instead of separate `run`/`wait`/`send-keys` invocations. Each `wait` step is anchored to a Wait Baseline — it only observes screen state produced _after_ the preceding input step, so the sequence cannot race ahead and match a stale screen. A batch stops at the first failed step by default (`--keep-going` attempts every step). diff --git a/src/batch/executor.ts b/src/batch/executor.ts index d6c9155d..1a009f40 100644 --- a/src/batch/executor.ts +++ b/src/batch/executor.ts @@ -256,6 +256,9 @@ async function runWaitStep( timedOut: result.timedOut, ...matchedText, capturedAtSeq: result.capturedAtSeq, + ...(result.screenHash === undefined + ? 
{} + : { screenHash: result.screenHash }), }; // A timed-out wait (equivalently an unmatched result) is not a thrown error diff --git a/src/batch/result.ts b/src/batch/result.ts index d8522877..30836e81 100644 --- a/src/batch/result.ts +++ b/src/batch/result.ts @@ -2,6 +2,7 @@ import { z } from 'zod'; import type { BatchPlan, BatchStep } from './plan.js'; +import { Sha256HexSchema } from '../protocol/schemas.js'; import { unreachable } from '../util/assert.js'; // `interrupted` is the in-flight step abandoned by a SIGINT/SIGTERM flush (its @@ -66,6 +67,7 @@ export const WaitStepRecordSchema = z timedOut: z.boolean().optional(), matchedText: z.string().optional(), capturedAtSeq: NonNegativeIntSchema.optional(), + screenHash: Sha256HexSchema.optional(), error: BatchStepErrorSchema.optional(), }) .strict(); diff --git a/src/cli/commands/wait.ts b/src/cli/commands/wait.ts index 0df5b761..2729f55c 100644 --- a/src/cli/commands/wait.ts +++ b/src/cli/commands/wait.ts @@ -19,6 +19,7 @@ import { matchRenderWaitSnapshot, prepareRenderWaitCondition, } from '../../renderWait/matcher.js'; +import { computeScreenHash } from '../../renderer/canonicalScreen.js'; import { isTerminalSessionStatus } from '../../protocol/sessionStatusPolicy.js'; import { withOfflineReplayRenderer } from '../../replay/offlineReplay.js'; import { readManifestIfExists } from '../../storage/manifests.js'; @@ -117,6 +118,7 @@ function buildOfflineRenderWaitResult( cursorRow: match.cursorRow, cursorCol: match.cursorCol, capturedAtSeq: match.capturedAtSeq, + screenHash: computeScreenHash(snapshot), }; } diff --git a/src/host/hostMain.ts b/src/host/hostMain.ts index f027c992..2c322ab2 100644 --- a/src/host/hostMain.ts +++ b/src/host/hostMain.ts @@ -33,6 +33,10 @@ import type { WaitForRenderResult, WaitParams, } from '../protocol/messages.js'; +import { + canonicalVisibleText, + computeScreenHash, +} from '../renderer/canonicalScreen.js'; import { DEFAULT_RENDERER_NAME, resolveRendererName, @@ -901,9 +905,7 @@ 
export async function runHost(sessionId: string): Promise { throwIfAborted(signal); const snapshot = await backend.snapshot(); throwIfAborted(signal); - const visibleText = snapshot.visibleLines - .map((line) => line.text) - .join('\n'); + const visibleText = canonicalVisibleText(snapshot); const capturedAtSeq = snapshot.capturedAtSeq; latestCapturedAtSeq = capturedAtSeq; consecutiveFailures = 0; @@ -943,6 +945,7 @@ export async function runHost(sessionId: string): Promise { cursorRow: match.cursorRow, cursorCol: match.cursorCol, capturedAtSeq, + screenHash: computeScreenHash(snapshot), }); } } catch (pollError) { diff --git a/src/protocol/schemas.ts b/src/protocol/schemas.ts index 97e4d781..70dca889 100644 --- a/src/protocol/schemas.ts +++ b/src/protocol/schemas.ts @@ -25,7 +25,7 @@ export const ReplayTimingModeSchema = z.enum([ ]); export type ReplayTimingMode = z.infer; const SessionEnvSchema = z.record(NonEmptyStringSchema, z.string()); -const Sha256HexSchema = z +export const Sha256HexSchema = z .string() .regex( /^[a-f0-9]{64}$/u, @@ -342,6 +342,7 @@ export const StructuredSnapshotResultSchema = z visibleLines: z.array(VisibleLineSchema), scrollbackLines: z.array(VisibleLineSchema).optional(), cells: z.array(RichSnapshotLineSchema).optional(), + screenHash: Sha256HexSchema.optional(), }) .strict(); export type StructuredSnapshotResult = z.infer< @@ -358,6 +359,7 @@ export const TextSnapshotResultSchema = z cursorRow: NonNegativeIntSchema, cursorCol: NonNegativeIntSchema, text: z.string(), + screenHash: Sha256HexSchema.optional(), }) .strict(); export type TextSnapshotResult = z.infer; @@ -455,6 +457,7 @@ export const WaitForRenderResultSchema = z cursorRow: NonNegativeIntSchema.optional(), cursorCol: NonNegativeIntSchema.optional(), capturedAtSeq: NonNegativeIntSchema, + screenHash: Sha256HexSchema.optional(), }) .strict(); export const RecordExportResultSchema = z diff --git a/src/renderWait/matcher.ts b/src/renderWait/matcher.ts index 371480d8..5578d609 
100644 --- a/src/renderWait/matcher.ts +++ b/src/renderWait/matcher.ts @@ -2,6 +2,10 @@ import type { WaitForRenderParams } from '../protocol/messages.js'; import type { SemanticSnapshot } from '../renderer/types.js'; import { ERROR_CODES, makeCliError } from '../protocol/errors.js'; +import { + canonicalVisibleLines, + canonicalVisibleText, +} from '../renderer/canonicalScreen.js'; import { invariant } from '../util/assert.js'; import { MAX_WAIT_FOR_RENDER_REGEX_LENGTH, @@ -301,8 +305,8 @@ export function matchRenderWaitSnapshot( ); } - const visibleLines = snapshot.visibleLines.map((line) => line.text); - const visibleText = visibleLines.join('\n'); + const visibleLines = canonicalVisibleLines(snapshot); + const visibleText = canonicalVisibleText(snapshot); let textMatched = false; let matchedText: string | undefined; diff --git a/src/renderer/canonicalScreen.ts b/src/renderer/canonicalScreen.ts new file mode 100644 index 00000000..f214c60b --- /dev/null +++ b/src/renderer/canonicalScreen.ts @@ -0,0 +1,41 @@ +import { sha256Hex } from '../util/hash.js'; + +/** + * The minimal snapshot shape needed to derive the canonical visible text: the + * ordered visible lines, each carrying its already-decoded `text`. + * + * Compatible with `Pick`. + */ +interface CanonicalScreenSource { + readonly visibleLines: ReadonlyArray<{ readonly text: string }>; +} + +/** + * The ordered canonical visible lines of a snapshot. + * + * The body is VERBATIM the inline expression already at hostMain.ts:904-906 and + * matcher.ts:304-305 — no trim/pad/normalize is applied, so it is + * behavior-preserving. The source is `visibleLines[].text` ONLY; it must NEVER + * read `backend.getVisibleText()` (divergent native impl) or `cells[]` (the + * dashboard's alternate source). 
+ */ +export function canonicalVisibleLines(s: CanonicalScreenSource): string[] { + return s.visibleLines.map((line) => line.text); +} + +/** + * The canonical visible text of a snapshot: its canonical visible lines joined + * with `\n`. See {@link canonicalVisibleLines} for the no-normalization + * guarantee and source constraint. + */ +export function canonicalVisibleText(s: CanonicalScreenSource): string { + return canonicalVisibleLines(s).join('\n'); +} + +/** + * The screen hash of a snapshot: the lowercase 64-character SHA-256 hex of the + * UTF-8 bytes of its canonical visible text. + */ +export function computeScreenHash(s: CanonicalScreenSource): string { + return sha256Hex(canonicalVisibleText(s)); +} diff --git a/src/renderer/ghosttyWeb/backend.ts b/src/renderer/ghosttyWeb/backend.ts index 21636131..5acfad6c 100644 --- a/src/renderer/ghosttyWeb/backend.ts +++ b/src/renderer/ghosttyWeb/backend.ts @@ -471,19 +471,85 @@ const EMBEDDED_HARNESS_HTML = ` return { cols, rows, activeBuffer, viewportY: bottomViewportY }; } + // Strip ONLY trailing ASCII spaces (0x20). Unlike String.prototype.trimEnd + // this preserves other trailing whitespace (tabs, NBSP, etc.), keeping the + // canonical visible text aligned with the libghostty-vt backend. + function stripTrailingAsciiSpaces(text) { + let end = text.length; + while (end > 0 && text.charCodeAt(end - 1) === 0x20) { + end -= 1; + } + return end === text.length ? text : text.slice(0, end); + } + + // Build a single canonical line by concatenating each column's FULL + // grapheme cluster, then right-trimming trailing ASCII spaces only. + // readColumn(col) returns { grapheme, width }: grapheme is the cell's + // full grapheme cluster; width is the cell's column span. 
A wide glyph's + // trailing spacer has width 0 and contributes NOTHING, so a row of + // 'A'+wide('漢')+wide('字')+'B' decodes to 'A漢字B' — matching the + // libghostty-vt backend's visibleLines[].text (its cells[] likewise + // carries the spacer as '' rather than ' '). A genuine blank interior + // cell decodes to a single ' ' so interior gaps survive and trailing + // gaps trim away. The live engine returns the NUL codepoint (U+0000) for + // a blank cell — getGrapheme yields [0], so getGraphemeString runs + // String.fromCodePoint(0) and produces a NUL, not ' ' (its empty-array + // ' ' fallback never fires). Those NULs would survive + // stripTrailingAsciiSpaces (it strips only 0x20) and diverge from the + // native backend, so a kept cell whose grapheme is a lone NUL is + // normalized to ' ' here. + function decodeGraphemeLine(readColumn, cols) { + let text = ''; + for (let col = 0; col < cols; col += 1) { + const column = readColumn(col); + invariant( + column !== null && typeof column === 'object', + 'decoded column must be an object', + ); + assertStringValue( + column.grapheme, + 'decoded grapheme must be a string', + ); + invariant( + Number.isInteger(column.width) && column.width >= 0, + 'decoded cell width must be a non-negative integer', + ); + if (column.width === 0) { + continue; + } + text += column.grapheme === '\\u0000' ? ' ' : column.grapheme; + } + return stripTrailingAsciiSpaces(text); + } + + // Width for a single column, used to drop wide-glyph trailing spacers. + // getCell() always returns a cell (out-of-range columns synthesize a + // blank width-1 cell), so a missing cell is treated as a blank column. + function readCellWidth(line, col) { + const cell = line?.getCell(col); + if (cell === undefined || cell === null) { + return 1; + } + const width = cell.getWidth(); + return Number.isInteger(width) && width >= 0 ? 
width : 1; + } + function decodeVisibleLines(terminal) { terminal.scrollToBottom(); const { cols, rows, activeBuffer, viewportY } = getNormalizedViewportState(terminal); + const wasmTerm = terminal.wasmTerm; + invariant(wasmTerm, 'terminal WASM instance is unavailable'); const visibleLines = []; for (let row = 0; row < rows; row += 1) { const line = activeBuffer.getLine(viewportY + row); - const text = - line === undefined ? '' : line.translateToString(true, 0, cols); - invariant( - typeof text === 'string', - \`decoded line \${row} must be a string\`, + const text = decodeGraphemeLine( + (col) => ({ + grapheme: wasmTerm.getGraphemeString(row, col), + width: readCellWidth(line, col), + }), + cols, ); visibleLines.push({ row, text }); } @@ -503,14 +569,18 @@ const EMBEDDED_HARNESS_HTML = ` return []; } + const wasmTerm = terminal.wasmTerm; + invariant(wasmTerm, 'terminal WASM instance is unavailable'); + const scrollbackLines = []; for (let row = 0; row < viewportY; row += 1) { const line = activeBuffer.getLine(row); - const text = - line === undefined ? 
'' : line.translateToString(true, 0, cols); - invariant( - typeof text === 'string', - \`decoded scrollback line \${row} must be a string\`, + const text = decodeGraphemeLine( + (col) => ({ + grapheme: wasmTerm.getScrollbackGraphemeString(row, col), + width: readCellWidth(line, col), + }), + cols, ); scrollbackLines.push({ row, text }); } @@ -532,10 +602,26 @@ const EMBEDDED_HARNESS_HTML = ` return \`#\${colorValue.toString(16).padStart(6, '0')}\`; } - function decodeSnapshotCell(cell) { + function decodeSnapshotCell(cell, graphemeChar) { invariant(cell !== undefined, 'snapshot cell must be defined'); - const char = cell.getChars(); - assertStringValue(char, 'snapshot cell char must be a string'); + const baseChars = cell.getChars(); + assertStringValue(baseChars, 'snapshot cell char must be a string'); + assertStringValue( + graphemeChar, + 'snapshot cell grapheme must be a string', + ); + // Deliberate, converged decision (matches the libghostty-vt backend): + // in cells[] a codepoint-0 cell — both a genuine blank AND a wide + // glyph's trailing spacer — is '', whereas in visibleLines[].text a + // genuine blank renders as ' ' and only the width-0 spacer is dropped + // (see decodeGraphemeLine). The cells[] grid is column-addressed, so a + // blank and a spacer are both empty placeholders there; the text line + // is a reading-order string, so a blank is a real space but a spacer + // is layout, not content. Non-empty cells use the FULL grapheme + // cluster so continuation codepoints (emoji ZWJ, NFD combining marks) + // are not dropped. The Screen Hash sources visibleLines[].text, never + // cells[], so this asymmetry never reaches the hash. + const char = baseChars === '' ? 
'' : graphemeChar; const isInverse = cell.isInverse() === 1; const fgColor = cell.getFgColor(); @@ -556,6 +642,8 @@ const EMBEDDED_HARNESS_HTML = ` terminal.scrollToBottom(); const { cols, rows, activeBuffer, viewportY } = getNormalizedViewportState(terminal); + const wasmTerm = terminal.wasmTerm; + invariant(wasmTerm, 'terminal WASM instance is unavailable'); const nullCell = activeBuffer.getNullCell(); const cells = []; @@ -574,7 +662,12 @@ const EMBEDDED_HARNESS_HTML = ` const line = activeBuffer.getLine(viewportY + row); const rowCells = []; for (let col = 0; col < cols; col += 1) { - rowCells.push(decodeSnapshotCell(line?.getCell(col) ?? nullCell)); + rowCells.push( + decodeSnapshotCell( + line?.getCell(col) ?? nullCell, + wasmTerm.getGraphemeString(row, col), + ), + ); } invariant( @@ -831,6 +924,77 @@ let servedAssetsPromise: Promise< ReadonlyMap > | null = null; +/** + * One decoded terminal column: the cell's full grapheme cluster plus its + * column span. `width === 0` marks a wide glyph's trailing spacer column. + */ +export interface GhosttyDecodedColumn { + grapheme: string; + width: number; +} + +/** + * Strip ONLY trailing ASCII spaces (0x20). Unlike String.prototype.trimEnd + * this preserves other trailing whitespace (tabs, NBSP, etc.), keeping the + * canonical visible text aligned with the libghostty-vt backend. + * + * Exported as the host-testable twin of the identical function embedded in + * EMBEDDED_HARNESS_HTML; the harness copy is the browser runtime and cannot + * import this module, so the two must stay byte-for-byte in sync. + */ +export function stripTrailingAsciiSpaces(text: string): string { + let end = text.length; + while (end > 0 && text.charCodeAt(end - 1) === 0x20) { + end -= 1; + } + return end === text.length ? text : text.slice(0, end); +} + +/** + * Assemble one canonical visible line from a per-column reader, then + * right-trim trailing ASCII spaces. 
A width-0 column (a wide glyph's trailing + * spacer) contributes nothing, so a row of `A`+wide(`漢`)+wide(`字`)+`B` + * yields `A漢字B` — matching the libghostty-vt backend's visibleLines[].text. + * A genuine blank interior cell decodes to a single ' ', so interior gaps + * survive and trailing gaps trim away. Non-empty cells contribute their FULL + * grapheme cluster, so continuation codepoints (emoji ZWJ, NFD combining + * marks) are preserved instead of being truncated to the base codepoint. + * + * The live ghostty-web engine returns the NUL codepoint (U+0000) for a blank + * cell: getGrapheme yields `[0]`, so getGraphemeString runs + * `String.fromCodePoint(0)` and produces a NUL, not ' ' (its empty-array + * ' ' fallback never fires). Left as-is those NULs would survive + * stripTrailingAsciiSpaces (which strips only 0x20) and diverge from the + * native backend's right-trimmed ' '-blank form, so a kept cell whose grapheme + * is a lone NUL is normalized to ' ' here. + * + * Exported as the host-testable twin of the decodeGraphemeLine function + * embedded in EMBEDDED_HARNESS_HTML; keep the two in sync. + */ +export function assembleCanonicalLine( + cols: number, + readColumn: (col: number) => GhosttyDecodedColumn, +): string { + assertNonNegativeInteger( + cols, + 'canonical line cols must be a non-negative integer', + ); + let text = ''; + for (let col = 0; col < cols; col += 1) { + const column = readColumn(col); + assertString(column.grapheme, 'decoded grapheme must be a string'); + assertNonNegativeInteger( + column.width, + 'decoded cell width must be a non-negative integer', + ); + if (column.width === 0) { + continue; + } + text += column.grapheme === '\u0000' ? 
' ' : column.grapheme; + } + return stripTrailingAsciiSpaces(text); +} + function assertNonNegativeInteger( value: unknown, message: string, diff --git a/src/renderer/libghosttyVt/backend.ts b/src/renderer/libghosttyVt/backend.ts index 67bb0ec7..5f68a9ff 100644 --- a/src/renderer/libghosttyVt/backend.ts +++ b/src/renderer/libghosttyVt/backend.ts @@ -113,6 +113,32 @@ function validateNativeVisibleLines( }); } +/** + * Pad the native visible lines to exactly `rows` entries by appending blank + * trailing lines (`text: ''`). The native ReadLine path already right-trims + * trailing ASCII spaces, expands full grapheme clusters, and renders blank + * cells as ' ' (terminal.cc), but it omits trailing blank rows, so only the + * line count needs aligning with the canonical pad-to-rows form. Each visible + * line's `row` is its index (native emits contiguous 0-based rows), so the + * appended lines continue that sequence. This converges the libghostty-vt + * backend's `visibleLines[].text` with the ghostty-web backend so the two + * agree on the Screen Hash. See docs/prd/screen-hash/PRD.md. + */ +function padVisibleLinesToRows( + lines: readonly NativeVisibleLine[], + rows: number, +): NativeVisibleLine[] { + invariant( + lines.length <= rows, + 'native visible line count must not exceed terminal rows', + ); + const padded: NativeVisibleLine[] = [...lines]; + for (let row = padded.length; row < rows; row += 1) { + padded.push({ row, text: '' }); + } + return padded; +} + function assertNativeSnapshot(snapshot: unknown): TerminalSnapshot { invariant( snapshot !== null && typeof snapshot === 'object', @@ -156,6 +182,10 @@ function assertNativeSnapshot(snapshot: unknown): TerminalSnapshot { candidate.visibleLines, 'snapshot.visibleLines', ); + // The native ReadLine path omits trailing blank rows, so it may emit fewer + // than `rows` visible lines; snapshot() pads the gap to exactly `rows` to + // match the canonical pad-to-rows form (see padVisibleLinesToRows). 
Only the + // permissive upper bound is enforced here. invariant( visibleLines.length <= candidate.rows, 'snapshot visible line count must fit terminal rows', @@ -543,7 +573,10 @@ export class LibghosttyVtBackend implements RendererBackend { cursorRow: nativeSnapshot.cursorRow, cursorCol: nativeSnapshot.cursorCol, isAltScreen: nativeSnapshot.isAltScreen, - visibleLines: nativeSnapshot.visibleLines, + visibleLines: padVisibleLinesToRows( + nativeSnapshot.visibleLines, + nativeSnapshot.rows, + ), ...(nativeSnapshot.scrollbackLines === undefined ? {} : { scrollbackLines: nativeSnapshot.scrollbackLines }), diff --git a/src/renderer/types.ts b/src/renderer/types.ts index 421a2ef8..41f63f70 100644 --- a/src/renderer/types.ts +++ b/src/renderer/types.ts @@ -4,6 +4,7 @@ import { MarkerEventPayloadSchema, RichSnapshotLineSchema, RunCompleteEventPayloadSchema, + Sha256HexSchema, VisibleLineSchema, type VisibleLine, } from '../protocol/schemas.js'; @@ -17,12 +18,6 @@ const ThemeSchema = z.enum(['dark', 'light']); const HexColorSchema = z .string() .regex(/^#[0-9a-fA-F]{6}$/u, 'must be a hex color like #1e1e2e'); -const Sha256HexSchema = z - .string() - .regex( - /^[a-f0-9]{64}$/u, - 'must be a 64-character lowercase SHA-256 hex string', - ); const BundledFontStyleSchema = z.enum(['normal', 'italic', 'oblique']); const RoutePathSchema = z .string() diff --git a/src/snapshot/capture.ts b/src/snapshot/capture.ts index ed766c8a..bac0f0cc 100644 --- a/src/snapshot/capture.ts +++ b/src/snapshot/capture.ts @@ -4,6 +4,7 @@ import type { SemanticSnapshot } from '../renderer/types.js'; import { ERROR_CODES, makeCliError } from '../protocol/errors.js'; import { SnapshotResultSchema } from '../protocol/schemas.js'; import { parseValidatedResult } from '../protocol/validation.js'; +import { computeScreenHash } from '../renderer/canonicalScreen.js'; import { appendArtifactWithRollback, createArtifactEntry, @@ -34,9 +35,11 @@ export function createSnapshotResult( 
...snapshot.visibleLines.map((line) => line.text), ]; + const screenHash = computeScreenHash(snapshot); + const snapshotResult: SnapshotResult = format === 'structured' - ? { format: 'structured' as const, ...snapshot } + ? { format: 'structured' as const, ...snapshot, screenHash } : { format: 'text' as const, sessionId: snapshot.sessionId, @@ -46,6 +49,7 @@ export function createSnapshotResult( cursorRow: snapshot.cursorRow, cursorCol: snapshot.cursorCol, text: textLines.join('\n'), + screenHash, }; return parseSnapshotResult( diff --git a/src/util/hash.ts b/src/util/hash.ts new file mode 100644 index 00000000..1b320da2 --- /dev/null +++ b/src/util/hash.ts @@ -0,0 +1,9 @@ +import { createHash } from 'node:crypto'; + +/** + * Returns the lowercase 64-character SHA-256 hex digest of the UTF-8 bytes of + * `text`. + */ +export function sha256Hex(text: string): string { + return createHash('sha256').update(text, 'utf8').digest('hex'); +} diff --git a/test/integration/cross-backend-screen-hash.test.ts b/test/integration/cross-backend-screen-hash.test.ts new file mode 100644 index 00000000..81df453c --- /dev/null +++ b/test/integration/cross-backend-screen-hash.test.ts @@ -0,0 +1,184 @@ +import { afterEach, describe, expect, it } from 'vitest'; + +import { computeScreenHash } from '../../src/renderer/canonicalScreen.js'; +import { resolveProfile } from '../../src/renderer/profiles.js'; +import type { RendererBackend } from '../../src/renderer/backend.js'; +import type { ReplayInput } from '../../src/renderer/types.js'; +import { GhosttyWebBackend } from '../../src/renderer/ghosttyWeb/index.js'; +import { LibghosttyVtBackend } from '../../src/renderer/libghosttyVt/index.js'; + +// Gate the whole suite on the optional native engine: when +// @coder/libghostty-vt-node is unavailable there is no second renderer to +// compare against, so every case skips cleanly (mirrors the nativeAvailable +// pattern in test/e2e/libghostty-vt-renderer.test.ts). 
Do NOT fall back to a +// length>0 guard — a converged blank/short screen is a valid case here. +let nativeAvailable = false; +let nativeSkipReason = ''; +try { + await import('@coder/libghostty-vt-node'); + nativeAvailable = true; +} catch (error) { + nativeSkipReason = error instanceof Error ? error.message : String(error); +} +const maybeIt = nativeAvailable ? it : it.skip; + +const PROFILE = resolveProfile('reference-dark'); +const SESSION_ID = 'cross-backend-screen-hash'; +const SHA_256_HEX = /^[a-f0-9]{64}$/u; + +function timestampFor(seq: number): string { + return new Date(Date.UTC(2026, 5, 5, 12, 0, seq)).toISOString(); +} + +function singleOutputReplayInput( + data: string, + options: { cols?: number; rows?: number } = {}, +): ReplayInput { + return { + sessionId: SESSION_ID, + initialCols: options.cols ?? 80, + initialRows: options.rows ?? 24, + targetSeq: 0, + events: [ + { + seq: 0, + ts: timestampFor(0), + type: 'output', + payload: { data }, + }, + ], + }; +} + +interface CrossBackendResult { + webHash: string; + nativeHash: string; + webLines: string[]; + nativeLines: string[]; +} + +describe('cross-backend screen hash', { timeout: 120_000 }, () => { + const backends: RendererBackend[] = []; + + afterEach(async () => { + while (backends.length > 0) { + const backend = backends.pop(); + if (backend !== undefined) { + await backend.dispose(); + } + } + }); + + // Boot BOTH renderer backends over the SAME ReplayInput, then route each + // snapshot's visibleLines through computeScreenHash. Returning the raw lines + // too makes any divergence legible in the assertion diff. 
+ async function hashBothBackends( + input: ReplayInput, + ): Promise<CrossBackendResult> { + const webBackend = new GhosttyWebBackend(SESSION_ID, PROFILE); + backends.push(webBackend); + const nativeBackend = new LibghosttyVtBackend(SESSION_ID, PROFILE); + backends.push(nativeBackend); + + await webBackend.boot(); + await nativeBackend.boot(); + + await webBackend.replayTo(input); + await nativeBackend.replayTo(input); + + const webSnapshot = await webBackend.snapshot(); + const nativeSnapshot = await nativeBackend.snapshot(); + + return { + webHash: computeScreenHash(webSnapshot), + nativeHash: computeScreenHash(nativeSnapshot), + webLines: webSnapshot.visibleLines.map((line) => line.text), + nativeLines: nativeSnapshot.visibleLines.map((line) => line.text), + }; + } + + async function expectAgreement(input: ReplayInput): Promise<void> { + const result = await hashBothBackends(input); + // Compare the decoded lines first so a mismatch surfaces the offending + // text, then assert the hashes themselves agree. + expect(result.nativeLines).toEqual(result.webLines); + expect(result.webHash).toMatch(SHA_256_HEX); + expect(result.nativeHash).toBe(result.webHash); + } + + maybeIt( + nativeAvailable + ? 'agrees on an ASCII full screen' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + const rows = Array.from( + { length: 12 }, + (_, index) => `row ${String(index)} of ascii content`, + ).join('\r\n'); + await expectAgreement(singleOutputReplayInput(rows)); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on an interior cursor-positioned gap' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // Write 'a' at the home position, jump to row 1 col 6 (CSI 1;6H is + // 1-based), then write 'b'. The interior cols between them are genuine + // blank cells that both backends must render as spaces, yielding + // 'a b' on row 0 after trailing-space trimming. 
+ await expectAgreement(singleOutputReplayInput('a\x1b[1;6Hb')); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on CJK wide glyphs' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // 'kanji-kanji-te-su-to' in CJK: each glyph occupies two columns. + await expectAgreement(singleOutputReplayInput('漢字テスト')); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on grapheme clusters (NFD combining mark and a ZWJ family emoji)' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // 'e' + combining acute accent (NFD) on row 0, then a ZWJ family emoji + // (man + ZWJ + woman + ZWJ + girl + ZWJ + boy) on row 1. Both backends + // must keep the FULL grapheme cluster rather than dropping continuation + // codepoints. + const combiningE = 'e\u0301'; + const zwjFamily = + '\u{1f468}\u200d\u{1f469}\u200d\u{1f467}\u200d\u{1f466}'; + await expectAgreement( + singleOutputReplayInput(`${combiningE}\r\n${zwjFamily}`), + ); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on a line with a trailing non-breaking space' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // NBSP (U+00A0) is not ASCII 0x20, so neither backend trims it; the + // trailing NBSP must survive identically in the canonical text. + await expectAgreement(singleOutputReplayInput('value\u00a0')); + }, + ); + + maybeIt( + nativeAvailable + ? 'agrees on a short, mostly-blank screen' + : `skips because @coder/libghostty-vt-node is unavailable: ${nativeSkipReason}`, + async () => { + // A single short line leaves the rest of the viewport blank, exercising + // the libghostty pad-to-rows alignment against ghostty-web's full grid. 
+ await expectAgreement(singleOutputReplayInput('hi')); + }, + ); +}); diff --git a/test/integration/screen-hash.test.ts b/test/integration/screen-hash.test.ts new file mode 100644 index 00000000..e96159b1 --- /dev/null +++ b/test/integration/screen-hash.test.ts @@ -0,0 +1,234 @@ +import { mkdtemp, realpath } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { afterEach, beforeEach, describe, expect, it } from 'vitest'; + +import type { + SnapshotResult, + WaitForRenderResult, +} from '../../src/protocol/messages.js'; +import { + cleanupHome, + createSession, + crashSession, + destroySession, + inspectSession, + readEvents, + runCli, + sleep, + type SuccessEnvelope, + type WaitResult, +} from '../helpers.js'; + +// A session that emits a stable marker and then idles, so the rendered screen +// has settled visible content the wait and snapshot paths can hash. +const SESSION_COMMAND = [ + '/bin/sh', + '-c', + "printf 'booting\\n'; sleep 1; printf 'Ready\\n'; exec cat", +] as const; +const HOOK_TIMEOUT_MS = 30_000; + +const SHA_256_HEX = /^[a-f0-9]{64}$/u; + +type StructuredSnapshot = Extract<SnapshotResult, { format: 'structured' }>; +type TextSnapshot = Extract<SnapshotResult, { format: 'text' }>; + +async function waitForOutputMarker( + testHome: string, + sessionId: string, + marker: string, +): Promise<void> { + const waitResult = runCli( + ['wait', sessionId, '--idle-ms', '200', '--timeout', '10000', '--json'], + { AGENT_TTY_HOME: testHome }, + 15_000, + ); + + expect(waitResult.status).toBe(0); + expect(waitResult.stderr).toBe(''); + const waitEnvelope = JSON.parse( + waitResult.stdout, + ) as SuccessEnvelope<WaitResult>; + expect(waitEnvelope.ok).toBe(true); + expect(waitEnvelope.result.timedOut).toBe(false); + + const deadline = Date.now() + 10_000; + while (Date.now() < deadline) { + const events = await readEvents(testHome, sessionId).catch(() => []); + const output = events + .filter((event) => event.type === 'output') + .map((event) => { + const data = event.payload.data; + return typeof data 
=== 'string' ? data : ''; + }) + .join(''); + + if (output.includes(marker)) { + return; + } + + await sleep(100); + } + + throw new Error(`timed out waiting for output marker ${marker}`); +} + +describe('screen hash integration', { timeout: 120_000 }, () => { + let testHome = ''; + let sessionId = ''; + + beforeEach(async () => { + // oxfmt-ignore + testHome = await realpath(await mkdtemp(join(tmpdir(), 'agent-tty-screen-hash-'))); + sessionId = createSession(testHome, [...SESSION_COMMAND]); + await waitForOutputMarker(testHome, sessionId, 'booting'); + }, HOOK_TIMEOUT_MS); + + afterEach(async () => { + destroySession(testHome, sessionId); + await cleanupHome(testHome); + sessionId = ''; + testHome = ''; + }, HOOK_TIMEOUT_MS); + + it('includes screenHash on a structured snapshot', () => { + const result = runCli( + ['snapshot', sessionId, '--format', 'structured', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope<StructuredSnapshot>; + expect(envelope.ok).toBe(true); + expect(envelope.result.format).toBe('structured'); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); + + it('includes screenHash on a text snapshot', () => { + const result = runCli( + ['snapshot', sessionId, '--format', 'text', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse(result.stdout) as SuccessEnvelope<TextSnapshot>; + expect(envelope.ok).toBe(true); + expect(envelope.result.format).toBe('text'); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); + + it('agrees on screenHash between structured and text snapshots of the same screen', () => { + const structured = runCli( + ['snapshot', sessionId, '--format', 'structured', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + const text = runCli( + ['snapshot', sessionId, '--format', 'text', 
'--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(structured.status).toBe(0); + expect(text.status).toBe(0); + const structuredEnvelope = JSON.parse( + structured.stdout, + ) as SuccessEnvelope; + const textEnvelope = JSON.parse( + text.stdout, + ) as SuccessEnvelope; + + expect(structuredEnvelope.result.screenHash).toMatch(SHA_256_HEX); + expect(textEnvelope.result.screenHash).toBe( + structuredEnvelope.result.screenHash, + ); + }); + + it('includes screenHash on a matched render wait', () => { + const result = runCli( + ['wait', sessionId, '--text', 'Ready', '--timeout', '15000', '--json'], + { AGENT_TTY_HOME: testHome }, + 20_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope; + expect(envelope.ok).toBe(true); + expect(envelope.result.matched).toBe(true); + expect(envelope.result.timedOut).toBe(false); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); + + it('omits screenHash on a timed-out render wait', () => { + const result = runCli( + [ + 'wait', + sessionId, + '--text', + 'TEXT_THAT_NEVER_APPEARS', + '--timeout', + '2000', + '--json', + ], + { AGENT_TTY_HOME: testHome }, + 15_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope; + expect(envelope.ok).toBe(true); + expect(envelope.result.matched).toBe(false); + expect(envelope.result.timedOut).toBe(true); + expect(envelope.result.screenHash).toBeUndefined(); + }); + + it('includes screenHash on the offline host-unreachable matched:false fallback', async () => { + // Settle on the visible marker, then kill the host so the wait falls back + // to offline replay. A screen-stability wait cannot prove the stable + // duration from a single offline snapshot, so it returns matched:false — + // but a Semantic Snapshot was still observed, so the hash is present. 
+ await waitForOutputMarker(testHome, sessionId, 'Ready'); + + crashSession(testHome, sessionId); + await sleep(500); + expect(inspectSession(testHome, sessionId).status).toBe('failed'); + + const result = runCli( + [ + 'wait', + sessionId, + '--screen-stable-ms', + '1000', + '--timeout', + '5000', + '--json', + ], + { AGENT_TTY_HOME: testHome }, + 15_000, + ); + + expect(result.status).toBe(0); + expect(result.stderr).toBe(''); + const envelope = JSON.parse( + result.stdout, + ) as SuccessEnvelope; + expect(envelope.ok).toBe(true); + expect(envelope.result.matched).toBe(false); + expect(envelope.result.timedOut).toBe(false); + expect(envelope.result.screenHash).toMatch(SHA_256_HEX); + }); +}); diff --git a/test/unit/batch/executor.test.ts b/test/unit/batch/executor.test.ts index c23c9e2f..a8fa2233 100644 --- a/test/unit/batch/executor.test.ts +++ b/test/unit/batch/executor.test.ts @@ -434,6 +434,51 @@ describe('executeBatch', () => { }); }); + describe('wait screenHash', () => { + const SCREEN_HASH = 'a'.repeat(64); + + it('carries the observed screenHash onto a matched wait step', async () => { + const { driver } = createFakeDriver({ + waitResults: [ + { + matched: true, + timedOut: false, + capturedAtSeq: 7, + screenHash: SCREEN_HASH, + }, + ], + }); + const result = await executeBatch({ + plan: plan([{ wait: { text: 'done' } }]), + driver, + keepGoing: false, + }); + + expect(result.steps[0]).toMatchObject({ + kind: 'wait', + status: 'completed', + screenHash: SCREEN_HASH, + }); + }); + + it('omits screenHash on a timed-out wait step', async () => { + const { driver } = createFakeDriver({ + waitResults: [{ matched: false, timedOut: true, capturedAtSeq: 5 }], + }); + const result = await executeBatch({ + plan: plan([{ wait: { text: 'never' } }]), + driver, + keepGoing: false, + }); + + expect(result.steps[0]).toMatchObject({ + kind: 'wait', + status: 'failed', + }); + expect(result.steps[0]).not.toHaveProperty('screenHash'); + }); + }); + describe('run 
completion classification', () => { it('fails a Waited Run that timed out with a timedOut runOutcome', async () => { const { driver } = createFakeDriver({ diff --git a/test/unit/commands/golden-envelopes.test.ts b/test/unit/commands/golden-envelopes.test.ts index 9f9ea408..bd53bd9b 100644 --- a/test/unit/commands/golden-envelopes.test.ts +++ b/test/unit/commands/golden-envelopes.test.ts @@ -910,6 +910,7 @@ const goldenResultContracts: readonly GoldenResultContractCase[] = [ ], }, ], + screenHash: 'a'.repeat(64), }, invalidResult: {}, extraFieldResult: { @@ -1088,6 +1089,7 @@ const goldenResultContracts: readonly GoldenResultContractCase[] = [ cursorRow: 4, cursorCol: 0, capturedAtSeq: 9, + screenHash: 'a'.repeat(64), }, invalidResult: { matched: true, diff --git a/test/unit/commands/snapshot.test.ts b/test/unit/commands/snapshot.test.ts index 75393f58..3320157b 100644 --- a/test/unit/commands/snapshot.test.ts +++ b/test/unit/commands/snapshot.test.ts @@ -62,6 +62,7 @@ vi.mock('../../../src/storage/sessionPaths.js', () => ({ import { createTestSemanticSnapshot } from '../../helpers.js'; import { runSnapshotCommand } from '../../../src/cli/commands/snapshot.js'; +import { computeScreenHash } from '../../../src/renderer/canonicalScreen.js'; import { createLogger } from '../../../src/util/logger.js'; const TEST_CONTEXT = { @@ -424,6 +425,7 @@ describe('snapshot command', () => { const result = { format: 'structured' as const, ...snapshot, + screenHash: computeScreenHash(snapshot), }; mocks.readManifestIfExists.mockResolvedValue(createExitedSessionRecord()); installOfflineReplaySuccessMock(); @@ -605,6 +607,7 @@ describe('snapshot command', () => { cursorRow: 0, cursorCol: 0, text: 'offline output', + screenHash: computeScreenHash(createTestSemanticSnapshot()), }; mocks.sendRpc.mockRejectedValue( new CliError(ERROR_CODES.HOST_UNREACHABLE, 'host unreachable'), diff --git a/test/unit/commands/wait.test.ts b/test/unit/commands/wait.test.ts index 22ffc885..0030387f 100644 
--- a/test/unit/commands/wait.test.ts +++ b/test/unit/commands/wait.test.ts @@ -40,6 +40,7 @@ vi.mock('../../../src/storage/sessionPaths.js', () => ({ })); import { createTestSemanticSnapshot } from '../../helpers.js'; +import { computeScreenHash } from '../../../src/renderer/canonicalScreen.js'; import { runWaitCommand } from '../../../src/cli/commands/wait.js'; import { createLogger } from '../../../src/util/logger.js'; @@ -608,6 +609,12 @@ describe('wait command', () => { cursorRow: 0, cursorCol: 0, capturedAtSeq: 15, + screenHash: computeScreenHash( + createTestSemanticSnapshot({ + capturedAtSeq: 15, + visibleLines: [{ row: 0, text: 'offline hello output' }], + }), + ), }, lines: ['Matched: hello', 'Cursor: row 0, col 0', 'capturedAtSeq: 15'], }); @@ -666,6 +673,14 @@ describe('wait command', () => { cursorRow: 2, cursorCol: 3, capturedAtSeq: 17, + screenHash: computeScreenHash( + createTestSemanticSnapshot({ + capturedAtSeq: 17, + visibleLines: [{ row: 0, text: 'offline Ready output' }], + cursorRow: 2, + cursorCol: 3, + }), + ), }, lines: [ 'Host became unreachable before the wait condition could be fully verified; returning the latest offline snapshot state.', diff --git a/test/unit/renderer/canonicalScreen.test.ts b/test/unit/renderer/canonicalScreen.test.ts new file mode 100644 index 00000000..e37a79ef --- /dev/null +++ b/test/unit/renderer/canonicalScreen.test.ts @@ -0,0 +1,66 @@ +import { describe, expect, it } from 'vitest'; + +import { + canonicalVisibleLines, + canonicalVisibleText, + computeScreenHash, +} from '../../../src/renderer/canonicalScreen.js'; + +const linesOf = (...texts: string[]) => ({ + visibleLines: texts.map((text) => ({ text })), +}); + +describe('canonicalVisibleLines / canonicalVisibleText', () => { + it('returns the verbatim line texts and their newline join', () => { + const snapshot = linesOf('one', 'two', 'three'); + + expect(canonicalVisibleLines(snapshot)).toEqual(['one', 'two', 'three']); + 
expect(canonicalVisibleText(snapshot)).toBe('one\ntwo\nthree'); + }); +}); + +describe('computeScreenHash', () => { + it('returns the same hash for identical visible lines', () => { + const a = linesOf('alpha', 'beta'); + const b = linesOf('alpha', 'beta'); + + expect(computeScreenHash(a)).toBe(computeScreenHash(b)); + }); + + it('ignores fields outside visibleLines (e.g. cursor position)', () => { + const base = linesOf('alpha', 'beta'); + const withCursor = { + ...base, + cursorRow: 7, + cursorCol: 13, + }; + + expect(computeScreenHash(withCursor)).toBe(computeScreenHash(base)); + }); + + it('changes when a single glyph changes', () => { + const before = linesOf('alpha', 'beta'); + const after = linesOf('alpha', 'beto'); + + expect(computeScreenHash(after)).not.toBe(computeScreenHash(before)); + }); + + it('changes when only trailing whitespace differs (no normalization)', () => { + const trimmed = linesOf('alpha', 'beta'); + const trailing = linesOf('alpha', 'beta '); + + expect(computeScreenHash(trailing)).not.toBe(computeScreenHash(trimmed)); + }); + + it('pins the canonical digest of a fixed non-ASCII fixture', () => { + // 'café' is "cafe" + a combining acute accent (NFD); '漢字' exercises + // multibyte UTF-8; the third line carries trailing spaces. Pinning the + // concrete digest locks the canonical string assembly and UTF-8 encoding. 
+ const fixture = linesOf('café', '漢字', 'trailing '); + + expect(canonicalVisibleText(fixture)).toBe('café\n漢字\ntrailing '); + expect(computeScreenHash(fixture)).toBe( + 'e813b95ab8cd844d5a3eff7d6e447a3c3c0cc79300085c701f2b9193efbaa1f3', + ); + }); +}); diff --git a/test/unit/renderer/ghosttyWebDecode.test.ts b/test/unit/renderer/ghosttyWebDecode.test.ts new file mode 100644 index 00000000..dd33acc5 --- /dev/null +++ b/test/unit/renderer/ghosttyWebDecode.test.ts @@ -0,0 +1,158 @@ +import { describe, expect, it } from 'vitest'; + +import { + assembleCanonicalLine, + stripTrailingAsciiSpaces, + type GhosttyDecodedColumn, +} from '../../../src/renderer/ghosttyWeb/backend.js'; + +// A decoded column as the ghostty-web harness reads it. `grapheme` mirrors +// wasmTerm.getGraphemeString(row, col). The lib's empty-array fallback +// (node_modules/ghostty-web/dist/ghostty-web.js: +// `getGraphemeString = !g||g.length===0 ? " " : String.fromCodePoint(...g)`) +// suggests a blank yields ' ', but the live engine returns getGrapheme === [0] +// for a blank cell, so the fallback never fires and getGraphemeString actually +// returns String.fromCodePoint(0) === a NUL — confirmed by booting +// both backends over 'hi' (see test/integration/cross-backend-screen-hash). +// assembleCanonicalLine normalizes that lone NUL back to ' ', so the fixtures +// below feed the NUL the engine really emits and assert the normalization. +// `width` mirrors cell.getWidth(): 1 for a normal cell, 2 for a wide glyph's +// lead column, 0 for its trailing spacer column. 
+type ColumnSpec = GhosttyDecodedColumn; + +const BLANK: ColumnSpec = { grapheme: '\u0000', width: 1 }; +const SPACER: ColumnSpec = { grapheme: '\u0000', width: 0 }; + +function wide(grapheme: string): readonly ColumnSpec[] { + return [{ grapheme, width: 2 }, SPACER]; +} + +function cell(grapheme: string): ColumnSpec { + return { grapheme, width: 1 }; +} + +// Assemble a full line over a fixed-width grid of column specs, padding any +// columns past the supplied cells with blanks (the lib's getCell synthesizes a +// blank width-1 cell out of range). +function assemble(cells: readonly ColumnSpec[], cols: number): string { + return assembleCanonicalLine(cols, (col) => cells[col] ?? BLANK); +} + +describe('assembleCanonicalLine (ghostty-web canonical visible text)', () => { + it('drops wide-glyph trailing spacers so a CJK row matches the native backend text', () => { + // Native libghostty-vt pins this exact layout's visibleLines[].text as + // 'A漢字B' (test/unit/renderer/libghosttyVtBackend.test.ts:301) and its + // cells[] carries each spacer as '' — so the converged ghostty-web text + // must NOT inject a space for the spacer columns. + const row: ColumnSpec[] = [ + cell('A'), + ...wide('漢'), + ...wide('字'), + cell('B'), + ]; + expect(assemble(row, row.length)).toBe('A漢字B'); + }); + + it('drops the emoji wide-glyph spacer while keeping a real trailing-content space', () => { + // 'rocket 🚀 done' — the 🚀 occupies cols 7-8 (8 is the width-0 spacer), + // col 9 is a genuine space, then 'done'. Matches the native row layout in + // libghosttyVtBackend.test.ts. 
+ const row: ColumnSpec[] = [ + cell('r'), + cell('o'), + cell('c'), + cell('k'), + cell('e'), + cell('t'), + cell(' '), + ...wide('🚀'), + cell(' '), + cell('d'), + cell('o'), + cell('n'), + cell('e'), + ]; + expect(assemble(row, row.length)).toBe('rocket 🚀 done'); + }); + + it('preserves interior blank cells as single spaces', () => { + const row: ColumnSpec[] = [cell('a'), BLANK, BLANK, cell('b')]; + expect(assemble(row, 4)).toBe('a b'); + }); + + it('right-trims trailing ASCII spaces only, padding out to the full width', () => { + const row: ColumnSpec[] = [cell('h'), cell('i')]; + expect(assemble(row, 10)).toBe('hi'); + }); + + it('keeps non-space trailing whitespace (tab) instead of JS trimEnd', () => { + const row: ColumnSpec[] = [cell('h'), cell('i'), cell('\t')]; + // A bare trailing tab survives; a following blank ASCII space is trimmed. + expect(assemble(row, 5)).toBe('hi\t'); + }); + + it('preserves a full NFD combining-mark grapheme cluster', () => { + // NFD 'é' = 'e' + U+0301. getGraphemeString returns the whole cluster for + // the lead cell; the old getChars() path would have dropped the mark. + const combined = 'é'; + const row: ColumnSpec[] = [cell(combined), cell('x')]; + expect(assemble(row, 4)).toBe(`${combined}x`); + }); + + it('preserves an emoji ZWJ grapheme cluster', () => { + // Family emoji built with ZWJ; the harness reads it as one wide grapheme. 
+ const family = '\u{1F468}‍\u{1F469}‍\u{1F467}'; + const row: ColumnSpec[] = [...wide(family), cell('!')]; + expect(assemble(row, row.length)).toBe(`${family}!`); + }); + + it('returns the empty string for an all-blank line', () => { + expect(assemble([], 8)).toBe(''); + }); + + it('reads exactly cols columns regardless of how many cells are supplied', () => { + const seen: number[] = []; + assembleCanonicalLine(6, (col) => { + seen.push(col); + return BLANK; + }); + expect(seen).toEqual([0, 1, 2, 3, 4, 5]); + }); + + it('rejects a non-string grapheme', () => { + expect(() => + assembleCanonicalLine(1, () => ({ + grapheme: 42 as unknown as string, + width: 1, + })), + ).toThrow('decoded grapheme must be a string'); + }); + + it('rejects a negative or non-integer width', () => { + expect(() => + assembleCanonicalLine(1, () => ({ grapheme: 'a', width: -1 })), + ).toThrow('decoded cell width must be a non-negative integer'); + expect(() => + assembleCanonicalLine(1, () => ({ grapheme: 'a', width: 1.5 })), + ).toThrow('decoded cell width must be a non-negative integer'); + }); +}); + +describe('stripTrailingAsciiSpaces', () => { + it('removes only trailing 0x20 spaces', () => { + expect(stripTrailingAsciiSpaces('abc ')).toBe('abc'); + }); + + it('leaves interior spaces and the string untouched when there is no trailing space', () => { + expect(stripTrailingAsciiSpaces('a b c')).toBe('a b c'); + }); + + it('does not strip a trailing tab or non-breaking space', () => { + expect(stripTrailingAsciiSpaces('abc\t')).toBe('abc\t'); + expect(stripTrailingAsciiSpaces('abc ')).toBe('abc '); + }); + + it('returns the empty string for an all-space input', () => { + expect(stripTrailingAsciiSpaces(' ')).toBe(''); + }); +}); diff --git a/test/unit/renderer/libghosttyVtBackend.test.ts b/test/unit/renderer/libghosttyVtBackend.test.ts index 1d66a22a..bf01833f 100644 --- a/test/unit/renderer/libghosttyVtBackend.test.ts +++ b/test/unit/renderer/libghosttyVtBackend.test.ts @@ -263,9 
+263,16 @@ describe('LibghosttyVtBackend', () => { cursorRow: 1, cursorCol: 2, isAltScreen: true, + // snapshot() pads visibleLines to exactly `rows` with blank trailing + // lines so the canonical visible text converges with the ghostty-web + // backend (see padVisibleLinesToRows). The native fixture emits 2 lines + // for rows: 5, so rows 2-4 are padded blanks. visibleLines: [ { row: 0, text: 'hello world' }, { row: 1, text: 'prompt>' }, + { row: 2, text: '' }, + { row: 3, text: '' }, + { row: 4, text: '' }, ], scrollbackLines: [{ row: 0, text: 'scrolled' }], cells: [ diff --git a/test/unit/snapshot/capture.test.ts b/test/unit/snapshot/capture.test.ts index d0055559..12ddb9a9 100644 --- a/test/unit/snapshot/capture.test.ts +++ b/test/unit/snapshot/capture.test.ts @@ -13,6 +13,7 @@ import { createSnapshotResult, persistSnapshotArtifact, } from '../../../src/snapshot/capture.js'; +import { computeScreenHash } from '../../../src/renderer/canonicalScreen.js'; import { readArtifactManifest } from '../../../src/storage/artifactManifest.js'; import { artifactPath, @@ -40,6 +41,7 @@ describe('snapshot capture', () => { expect(createSnapshotResult(snapshot, 'structured')).toEqual({ format: 'structured', ...snapshot, + screenHash: computeScreenHash(snapshot), }); }); @@ -223,7 +225,11 @@ describe('snapshot capture', () => { rendererBackend: 'test-backend', }); - expect(result).toEqual({ format: 'structured', ...snapshot }); + expect(result).toEqual({ + format: 'structured', + ...snapshot, + screenHash: computeScreenHash(snapshot), + }); const filename = snapshotFilename(5, 'structured'); expect( JSON.parse( @@ -268,6 +274,7 @@ describe('snapshot capture', () => { cursorRow: 0, cursorCol: 0, text: 'first visible line\nsecond visible line', + screenHash: computeScreenHash(snapshot), }); const filename = snapshotFilename(5, 'text'); @@ -314,6 +321,7 @@ describe('snapshot capture', () => { cursorRow: 0, cursorCol: 0, text: 'scrolled\naway\nvisible output', + screenHash: 
computeScreenHash(snapshot), }); const filename = snapshotFilename(5, 'text'); diff --git a/test/unit/util/hash.test.ts b/test/unit/util/hash.test.ts new file mode 100644 index 00000000..9d48f1af --- /dev/null +++ b/test/unit/util/hash.test.ts @@ -0,0 +1,27 @@ +import { createHash } from 'node:crypto'; + +import { describe, expect, it } from 'vitest'; + +import { sha256Hex } from '../../../src/util/hash.js'; + +describe('sha256Hex', () => { + it('matches the known SHA-256 digest of "abc"', () => { + expect(sha256Hex('abc')).toBe( + 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad', + ); + }); + + it('returns the empty-string digest for ""', () => { + expect(sha256Hex('')).toBe( + 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', + ); + }); + + it('hashes the UTF-8 bytes of a non-ASCII string', () => { + const value = 'café漢字'; + + expect(sha256Hex(value)).toBe( + createHash('sha256').update(Buffer.from(value, 'utf8')).digest('hex'), + ); + }); +}); From ee24aefc781f77c984c19422640796e164d9dd36 Mon Sep 17 00:00:00 2001 From: Thomas Kosiewski Date: Sat, 6 Jun 2026 13:14:51 +0200 Subject: [PATCH 4/4] ci: retry flaky integration and e2e tests with bounded retries Integration and e2e tests drive real PTY hosts and headless-browser renderers, which can transiently fail under machine load (e.g. a screenshot render or host RPC hiccup) even when the code is correct. Pass `--retry=2` to the `test:integration`, `test:e2e`, and combined `test` scripts so a flaky attempt is retried in place instead of failing the sharded CI job; a genuine failure still fails all three attempts. `test:unit` stays strict so the unit gate keeps catching real unit flakes. Document the policy in CONTRIBUTING and CI. 
Change-Id: I7696b26b9a5b97102b4543a6bcf485ce30ea4be4
Co-Authored-By: Claude Opus 4.8 (1M context)
Signed-off-by: Thomas Kosiewski
---
 .github/workflows/ci.yml | 6 ++++++
 docs/CONTRIBUTING.md     | 9 +++++++++
 package.json             | 6 +++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index e5db8414..8aff6eed 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -84,6 +84,12 @@ jobs:
       - name: Test unit
         run: mise run test-unit
 
+  # Integration and e2e tests drive real PTY hosts and headless-browser
+  # renderers, which can transiently fail under machine load (e.g. a screenshot
+  # render or RPC hiccup) even when the code is correct. The `test:integration`
+  # and `test:e2e` npm scripts pass `--retry=2`, so a flaky attempt is retried
+  # in place instead of failing the shard; a genuine failure still fails all
+  # three attempts. Unit tests (`test:unit`) deliberately do NOT retry.
   test-integration:
     runs-on: ubuntu-latest
     timeout-minutes: 20
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
index 8108419b..dd734b07 100644
--- a/docs/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
@@ -46,6 +46,15 @@ If you touch the public bootstrap under `skills/` or the bundled runtime skills
 npm run intent:validate
 ```
 
+### Flaky integration and e2e tests
+
+Integration and e2e tests drive real PTY hosts and headless-browser renderers, so an individual test can transiently fail under machine load (most often a screenshot render or host RPC hiccup) even when the code is correct. To keep these flakes from causing spurious red:
+
+- `npm run test:integration`, `npm run test:e2e`, and the combined `npm run test` retry a failing test in place (`--retry=2`, up to three attempts). A genuine failure still fails all three attempts.
+- `npm run test:unit` deliberately does **not** retry — unit tests must be deterministic, and the dedicated unit CI gate is the authority that catches real unit flakes.
+- If an integration/e2e test fails _consistently_ (not just on one attempt), treat it as a real failure and investigate; do not raise the retry count to paper over it.
+- When debugging a single browser-backed test locally, run it in isolation (`npm run test:e2e -- `); the full serial suite is the heaviest load and the most flake-prone.
+
 ## Documentation and proof expectations
 
 - Keep the root docs split clear: `README.md` for overview and `RELEASE.md` for supported scope.
diff --git a/package.json b/package.json
index 7e4a8d90..962bb3d3 100644
--- a/package.json
+++ b/package.json
@@ -55,9 +55,9 @@
     "release:finalize": "node ./scripts/release-finalize.mjs",
     "review-bundle": "tsx src/tools/review-bundle.ts",
     "smoke:install": "node ./scripts/smoke-install.mjs",
-    "test": "vitest run",
-    "test:e2e": "vitest run --maxWorkers=1 test/e2e",
-    "test:integration": "vitest run --maxWorkers=1 test/integration",
+    "test": "vitest run --retry=2",
+    "test:e2e": "vitest run --maxWorkers=1 --retry=2 test/e2e",
+    "test:integration": "vitest run --maxWorkers=1 --retry=2 test/integration",
     "test:unit": "vitest run test/unit",
     "test:watch": "vitest",
     "typecheck": "tsc -p tsconfig.json --noEmit",
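
The unit tests earlier in this series (`canonicalScreen.test.ts`, `hash.test.ts`) describe the Screen Hash contract: verbatim `visibleLines[].text`, joined with newlines, hashed as UTF-8 with SHA-256, with no trimming and no dependence on cursor position. The following is a minimal sketch consistent with those assertions. It reuses the names the tests import, but the function bodies and the `VisibleScreen` type are assumptions made for illustration, since the contents of `src/renderer/canonicalScreen.ts` and `src/util/hash.ts` are not shown here.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical minimal input shape: only visibleLines[].text feeds the hash.
interface VisibleScreen {
  readonly visibleLines: readonly { readonly text: string }[];
}

// Verbatim line texts; no trimming and no Unicode normalization (assumed,
// based on the "no normalization" unit test).
function canonicalVisibleLines(snapshot: VisibleScreen): string[] {
  return snapshot.visibleLines.map((line) => line.text);
}

// Canonical visible text: the line texts joined with '\n'.
function canonicalVisibleText(snapshot: VisibleScreen): string {
  return canonicalVisibleLines(snapshot).join('\n');
}

// Lowercase hex SHA-256 over the UTF-8 bytes of the input string.
function sha256Hex(value: string): string {
  return createHash('sha256').update(Buffer.from(value, 'utf8')).digest('hex');
}

// Assumption: the Screen Hash is SHA-256 over the canonical visible text.
function computeScreenHash(snapshot: VisibleScreen): string {
  return sha256Hex(canonicalVisibleText(snapshot));
}
```

Under this sketch, two snapshots with identical visible lines hash identically regardless of cursor fields, while a single changed glyph or an extra trailing space yields a different digest, which is the behavior the tests assert.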