Audit every [assisted] integration test that uses waitForHuman (not waitForHumanVerdict) to verify a visible outcome. Those tests are subject to a silent-pass trap: the human can be looking at a broken UI and the test will still report green.
Background
There are two helpers in assistedTestHelper.ts:
- waitForHuman(tcId, action, consoleSteps?) — shows a notification, returns when the human clicks Cancel. Captures no judgment. Used when the test's assertion is on logs/state, not on what the human saw.
- waitForHumanVerdict(tcId, action, consoleSteps?) — shows a notification + PASS/FAIL status bar buttons, returns 'pass' | 'fail'. Captures judgment. Used when the test's assertion IS what the human saw.
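As a sketch of the two contracts (assumed shapes, not the real assistedTestHelper.ts — `Verdict`, `AssistedTestHelper`, and `fakeHelper` are illustrative names), the only thing a test can assert on is the resolved type:

```typescript
type Verdict = 'pass' | 'fail';

interface AssistedTestHelper {
  // Resolves with no information once the human clicks Cancel.
  waitForHuman(tcId: string, action: string, consoleSteps?: string[]): Promise<void>;
  // Resolves with the human's judgment (PASS/FAIL status bar buttons).
  waitForHumanVerdict(tcId: string, action: string, consoleSteps?: string[]): Promise<Verdict>;
}

// Headless fake for illustration: "the human" is scripted by humanSaw.
function fakeHelper(humanSaw: boolean): AssistedTestHelper {
  return {
    // Cancel resolves regardless of what the human observed — no judgment captured.
    waitForHuman: async () => {},
    // The verdict reflects what the human actually observed.
    waitForHumanVerdict: async () => (humanSaw ? 'pass' : 'fail'),
  };
}
```

The `Promise<void>` return is the whole trap: there is nothing for the test to assert against.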
The contract matters because many [assisted] tests describe themselves as "verify X appears in the UI" but use waitForHuman. The verification is left to the human's intent, not encoded in the assertion. Result: if the UI is broken, the human can click Cancel by reflex, and the test passes against a broken state.
How this was discovered
While diagnosing the issues/547 paste-pipeline regression (see related bug issue), claude-code-004 was reported "passing" by pnpm test:release:with-extensions --grep "claude-code-004" — but the human watched Claude Code receive nothing. The log-based assertions (ComposablePasteDestination.pasteLink "Pasted link", VscodeAdapter.pasteTextFromClipboard "Clipboard paste succeeded") confirmed the code-side ran; the human's eye confirmed nothing arrived. The test reported green because waitForHuman doesn't capture a verdict — clicking Cancel resolves the promise regardless of what was observed.
The newly-written claude-code-002 uses waitForHumanVerdict and FAILED with the exact same code-side state. The contract switch made the silent-pass trap visible.
Scope of the audit
For every [assisted] test in packages/rangelink-vscode-extension/src/__integration-tests__/suite/*.test.ts:
- Identify whether the test's contract includes verifying a visible outcome (e.g., "the link appears in Claude Code chat", "the chat panel receives content", "the toast appears").
- Check which helper it uses (waitForHuman vs waitForHumanVerdict).
- Flag any test that:
  - Uses waitForHuman
  - Has instructions like "verify X appears" / "confirm X" / "the link should appear in Y"
  - Has no other UI assertion that would fail if the visible outcome is broken (log assertions don't count if they only prove code-side execution)
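The flagging criteria above could be roughed out as a classifier over each test's source text (a hypothetical audit helper, not part of the repo — the regexes are first-pass heuristics and every hit still needs manual review):

```typescript
// Phrases that signal the human's eye is part of the contract.
const VISUAL_INSTRUCTION = /verify .* appears|confirm |should appear in/i;

function needsVerdictConversion(testSource: string): boolean {
  // \b…\s*\( avoids matching waitForHumanVerdict( as a waitForHuman( call.
  const usesWaitForHuman = /\bwaitForHuman\s*\(/.test(testSource);
  const usesVerdict = /\bwaitForHumanVerdict\s*\(/.test(testSource);
  const asksForVisualCheck = VISUAL_INSTRUCTION.test(testSource);
  // Note: log-only assertions don't count as UI coverage, so their presence
  // does not clear a test here; that judgment stays manual.
  return usesWaitForHuman && !usesVerdict && asksForVisualCheck;
}
```

A real audit script would run this over each [assisted] test body and print file:line for every hit.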
Likely affected (rough scan, not authoritative):
- claude-code-004 (confirmed silent-pass against the issues/547 regression)
- claude-code-005 (warm step — same waitForHuman pattern, same risk)
- clipboard-preservation-001/004/005/009 — instruction asks "verify the link should appear in terminal/Dummy AI" but only requires Cancel; the clipboard assertion is partial coverage at best
- Most [assisted] tests in customAiAssistants.test.ts that verify content delivery to Dummy AI tiers
- Likely core-send-commands-r-l-003 and similar send tests
A proper audit would enumerate every match with its file:line, the assertion text, and a verdict on whether to convert.
Conversion pattern
For each affected test:
```ts
// Before — silent-pass trap
await waitForHuman(
  'cc-004',
  'Cold paste: select lines, press Cmd+R Cmd+L, verify link appears in Claude Code, then Cancel',
  [/* steps */],
);
// log assertions only — pass even when human saw nothing

// After — verdict captured
const verdict = await waitForHumanVerdict(
  'cc-004',
  'Cold paste: press Cmd+R Cmd+L. Did the link appear in Claude Code chat?',
  ['Click PASS if the link appeared, FAIL otherwise'],
);
assert.strictEqual(verdict, 'pass', 'Human reported the link did not appear');
// log assertions still run as belt-and-suspenders
```
For tests where the human's eye is the ENTIRE assertion (no log path even partially covers it), the conversion is mandatory. For tests where the log path covers most of the contract but the human's eye is the final confirmation, the conversion is still strongly recommended — it adds the verdict capture without removing any log-based coverage.
What this does NOT change
- Tests that use waitForHuman for non-visual contracts (e.g., "press a chord, then logs assert the side effect") are fine as-is. The human's eye isn't being relied on.
- Tests that already use waitForHumanVerdict — already correct, no change needed.
- The number of [assisted] tests stays the same. This isn't about reducing assisted count; it's about making the assisted tests actually verify what they claim to verify.
Implementation suggestion
Don't fix all in one PR. Suggested phasing:
- Phase 1 — Audit: full enumeration of affected tests in a tracking comment/issue, with per-test verdicts.
- Phase 2 — High-stakes tests first: convert claude-code-004/005 and the rest of builtInAiAssistants.test.ts (highest-risk because they're the canary for issues/547).
- Phase 3 — Per-file batches: clipboardPreservation.test.ts, customAiAssistants.test.ts, coreSendCommands.test.ts in separate PRs.
- Phase 4 — Update TESTING.md § "Assisted mode" with the contract distinction: use waitForHuman only when there's a non-visual machine-verifiable assertion; use waitForHumanVerdict whenever the human's eye is part of the assertion.
Definition of done
- Every [assisted] test with a "verify X appears in the UI" contract uses waitForHumanVerdict and asserts on the verdict.
- TESTING.md documents the contract rule.
- A future regression like the issues/547 paste pipeline would be caught by the test rather than silent-passed.
Related
- closeQuickOpen automation survey — 41 assisted tests can become fully automated (#557) — orthogonal; closeQuickOpen tests don't use waitForHuman for visual verification, they use it (or no human at all) for picker dismissal.
Priority
Medium-high. Not blocking any specific work, but until this is addressed, future regressions like #559 will continue to silent-pass against any visual-verification test. This is the systemic fix that would have caught #559 before it merged.