
Silent-pass trap audit — waitForHuman used for visual verification #560

@couimet

Description

Audit every [assisted] integration test that uses waitForHuman (not waitForHumanVerdict) to verify a visible outcome. Those tests are subject to a silent-pass trap: the human can be looking at a broken UI and the test will still report green.

Background

There are two helpers in assistedTestHelper.ts:

  • waitForHuman(tcId, action, consoleSteps?) — shows a notification, returns when the human clicks Cancel. Captures no judgment. Used when the test's assertion is on logs/state, not on what the human saw.
  • waitForHumanVerdict(tcId, action, consoleSteps?) — shows a notification + PASS/FAIL status bar buttons, returns 'pass' | 'fail'. Captures judgment. Used when the test's assertion IS what the human saw.
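The contract difference can be sketched as follows. These are hypothetical minimal stand-ins, not the real helpers in assistedTestHelper.ts (the real ones show VS Code notifications and status bar buttons); the human's click is modeled as a parameter so the difference is runnable in isolation:

```typescript
// Minimal stand-ins for the two helpers, to make the contract difference
// concrete. Illustrative sketch only: the human's click is a parameter
// instead of a real VS Code notification.
type Verdict = 'pass' | 'fail';
type HumanClick = 'cancel' | Verdict;

// waitForHuman: resolves as soon as the human dismisses the notification.
// Nothing about what the human actually saw survives the call.
async function waitForHuman(click: HumanClick): Promise<void> {
  // resolves unconditionally -- this is the silent-pass trap
}

// waitForHumanVerdict: the human's judgment IS the return value,
// so the test can assert on it.
async function waitForHumanVerdict(click: HumanClick): Promise<Verdict> {
  return click === 'pass' ? 'pass' : 'fail';
}

async function demo(): Promise<void> {
  // Broken UI, human clicks Cancel by reflex: waitForHuman still resolves.
  await waitForHuman('cancel'); // test proceeds to green
  // Same broken UI, human clicks FAIL: the judgment is captured.
  const verdict = await waitForHumanVerdict('fail');
  console.log(verdict); // 'fail' -- an assertion on this would now trip
}
demo();
```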

The contract matters because many [assisted] tests describe themselves as "verify X appears in the UI" but use waitForHuman. The verification is left to the human's intent, not encoded in the assertion. Result: if the UI is broken, the human can click Cancel by reflex, and the test passes against a broken state.

How this was discovered

While the issues/547 paste-pipeline regression was being diagnosed (see related bug issue), claude-code-004 was reported as "passing" by pnpm test:release:with-extensions --grep "claude-code-004", even though the human watched Claude Code receive nothing. The log-based assertions (ComposablePasteDestination.pasteLink "Pasted link", VscodeAdapter.pasteTextFromClipboard "Clipboard paste succeeded") confirmed that the code side ran; the human's eye confirmed that nothing arrived. The test still reported green because waitForHuman doesn't capture a verdict: clicking Cancel resolves the promise regardless of what was observed.

The newly written claude-code-002 uses waitForHumanVerdict and FAILED in the exact same code-side state. The contract switch alone made the silent-pass trap visible.

Scope of the audit

For every [assisted] test in packages/rangelink-vscode-extension/src/__integration-tests__/suite/*.test.ts:

  1. Identify whether the test's contract includes verifying a visible outcome (e.g., "the link appears in Claude Code chat", "the chat panel receives content", "the toast appears").
  2. Check which helper it uses (waitForHuman vs waitForHumanVerdict).
  3. Flag any test that:
    • Uses waitForHuman
    • Has instructions like "verify X appears" / "confirm X" / "the link should appear in Y"
    • Has no other UI assertion that would fail if the visible outcome is broken (log assertions don't count if they only prove code-side execution)
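Step 3 can be bootstrapped mechanically before the manual pass. A rough heuristic sketch (the function name and regexes below are mine, not from the codebase; it only pre-filters candidates and cannot replace reading each test's actual assertions):

```typescript
// Rough pre-filter for the audit: flag waitForHuman calls (but not
// waitForHumanVerdict) whose instruction text asks the human to visually
// verify something. Deliberately naive: assumes no parens inside the call
// arguments, and only surfaces candidates for manual review.
function flagSilentPassCandidates(source: string): string[] {
  const flagged: string[] = [];
  // \bwaitForHuman\( cannot match waitForHumanVerdict( because the
  // character immediately after "waitForHuman" must be the open paren.
  const callRe = /\bwaitForHuman\(([^)]*)\)/gs;
  const visualRe = /verify|confirm|should appear|appears in/i;
  for (const m of source.matchAll(callRe)) {
    if (visualRe.test(m[1])) {
      flagged.push(m[0].split('\n')[0].trim()); // first line of the call
    }
  }
  return flagged;
}

// Example: one trap candidate, one verdict call, one non-visual contract.
const sample = `
await waitForHuman('cc-004', 'verify link appears in Claude Code, then Cancel');
await waitForHumanVerdict('cc-002', 'Did the link appear?');
await waitForHuman('core-001', 'press the chord, then Cancel');
`;
console.log(flagSilentPassCandidates(sample).length); // 1
```

The last call is not flagged: its instruction has no visual-verification wording, matching the "non-visual contract" carve-out described below.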

Likely affected (rough scan, not authoritative):

  • claude-code-004 (confirmed silent-pass against the issues/547 regression)
  • claude-code-005 (warm step — same waitForHuman pattern, same risk)
  • clipboard-preservation-001/004/005/009 — instruction asks "verify the link should appear in terminal/Dummy AI" but only requires Cancel; the clipboard assertion is partial coverage at best
  • Most [assisted] tests in customAiAssistants.test.ts that verify content delivery to Dummy AI tiers
  • Likely core-send-commands-r-l-003 and similar send tests

A proper audit would enumerate every match with its file:line, the assertion text, and a verdict on whether to convert.

Conversion pattern

For each affected test:

// Before — silent-pass trap
await waitForHuman(
  'cc-004',
  'Cold paste: select lines, press Cmd+R Cmd+L, verify link appears in Claude Code, then Cancel',
  [/* steps */],
);
// log assertions only — pass even when human saw nothing
// After — verdict captured
const verdict = await waitForHumanVerdict(
  'cc-004',
  'Cold paste: press Cmd+R Cmd+L. Did the link appear in Claude Code chat?',
  ['Click PASS if the link appeared, FAIL otherwise'],
);
assert.strictEqual(verdict, 'pass', 'Human reported the link did not appear');
// log assertions still run as belt-and-suspenders

For tests where the human's eye is the ENTIRE assertion (no log path even partially covers it), the conversion is mandatory. For tests where the log path covers most of the contract but the human's eye is the final confirmation, the conversion is still strongly recommended — it adds the verdict capture without removing any log-based coverage.

What this does NOT change

  • Tests that use waitForHuman for non-visual contracts (e.g., "press a chord, then logs assert the side effect") are fine as-is. The human's eye isn't being relied on.
  • Tests that already use waitForHumanVerdict — already correct, no change needed.
  • The number of [assisted] tests stays the same. This isn't about reducing assisted count; it's about making the assisted tests actually verify what they claim to verify.

Implementation suggestion

Don't fix all in one PR. Suggested phasing:

  1. Phase 1 — Audit: full enumeration of affected tests in a tracking comment/issue, with per-test verdicts.
  2. Phase 2 — High-stakes tests first: convert claude-code-004/005 and the rest of builtInAiAssistants.test.ts (highest-risk because they're the canary for issues/547).
  3. Phase 3 — Per-file batches: clipboardPreservation.test.ts, customAiAssistants.test.ts, coreSendCommands.test.ts in separate PRs.
  4. Phase 4 — Update TESTING.md § "Assisted mode" with the contract distinction: use waitForHuman only when there's a non-visual machine-verifiable assertion; use waitForHumanVerdict whenever the human's eye is part of the assertion.

Definition of done

  • Every [assisted] test with a "verify X appears in the UI" contract uses waitForHumanVerdict and asserts on the verdict.
  • TESTING.md documents the contract rule.
  • A future regression like the issues/547 paste pipeline would be caught by the test rather than silent-passed.

Related

Priority

Medium-high. Not blocking any specific work, but until this is addressed, future regressions like #559 will continue to silent-pass against any visual-verification test. This is the systemic fix that would have caught #559 before it merged.
