
Silent-pass trap audit — waitForHuman used for visual verification #560

@couimet

Description

Audit every [assisted] integration test that uses waitForHuman (not waitForHumanVerdict) to verify a visible outcome. Those tests are subject to a silent-pass trap: the human can be looking at a broken UI and the test will still report green.

Background

There are two helpers in assistedTestHelper.ts:

  • waitForHuman(tcId, action, consoleSteps?) — shows a notification, returns when the human clicks Cancel. Captures no judgment. Used when the test's assertion is on logs/state, not on what the human saw.
  • waitForHumanVerdict(tcId, action, consoleSteps?) — shows a notification + PASS/FAIL status bar buttons, returns 'pass' | 'fail'. Captures judgment. Used when the test's assertion IS what the human saw.
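The contract difference can be sketched as follows. These are hypothetical minimal stand-ins, not the real helpers in assistedTestHelper.ts (the real ones show VS Code notifications and status bar buttons); the human's click is modeled as a parameter so the difference is runnable in isolation:

```typescript
// Minimal stand-ins for the two helpers, to make the contract difference
// concrete. Illustrative sketch only: the human's click is a parameter
// instead of a real VS Code notification.
type Verdict = 'pass' | 'fail';
type HumanClick = 'cancel' | Verdict;

// waitForHuman: resolves as soon as the human dismisses the notification.
// Nothing about what the human actually saw survives the call.
async function waitForHuman(click: HumanClick): Promise<void> {
  // resolves unconditionally -- this is the silent-pass trap
}

// waitForHumanVerdict: the human's judgment IS the return value,
// so the test can assert on it.
async function waitForHumanVerdict(click: HumanClick): Promise<Verdict> {
  return click === 'pass' ? 'pass' : 'fail';
}

async function demo(): Promise<void> {
  // Broken UI, human clicks Cancel by reflex: waitForHuman still resolves.
  await waitForHuman('cancel'); // test proceeds to green
  // Same broken UI, human clicks FAIL: the judgment is captured.
  const verdict = await waitForHumanVerdict('fail');
  console.log(verdict); // 'fail' -- an assertion on this would now trip
}
demo();
```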

The contract matters because many [assisted] tests describe themselves as "verify X appears in the UI" but use waitForHuman. The verification is left to the human's intent, not encoded in the assertion. Result: if the UI is broken, the human can click Cancel by reflex, and the test passes against a broken state.

How this was discovered

While the issues/547 paste-pipeline regression was being diagnosed (see related bug issue), claude-code-004 was reported as "passing" by pnpm test:release:with-extensions --grep "claude-code-004", even though the human watched Claude Code receive nothing. The log-based assertions (ComposablePasteDestination.pasteLink "Pasted link", VscodeAdapter.pasteTextFromClipboard "Clipboard paste succeeded") confirmed that the code side ran; the human's eye confirmed that nothing arrived. The test still reported green because waitForHuman doesn't capture a verdict: clicking Cancel resolves the promise regardless of what was observed.

The newly written claude-code-002 uses waitForHumanVerdict and FAILED in the exact same code-side state. The contract switch alone made the silent-pass trap visible.

Scope of the audit

For every [assisted] test in packages/rangelink-vscode-extension/src/__integration-tests__/suite/*.test.ts:

  1. Identify whether the test's contract includes verifying a visible outcome (e.g., "the link appears in Claude Code chat", "the chat panel receives content", "the toast appears").
  2. Check which helper it uses (waitForHuman vs waitForHumanVerdict).
  3. Flag any test that:
    • Uses waitForHuman
    • Has instructions like "verify X appears" / "confirm X" / "the link should appear in Y"
    • Has no other UI assertion that would fail if the visible outcome is broken (log assertions don't count if they only prove code-side execution)
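Step 3 can be bootstrapped mechanically before the manual pass. A rough heuristic sketch (the function name and regexes below are mine, not from the codebase; it only pre-filters candidates and cannot replace reading each test's actual assertions):

```typescript
// Rough pre-filter for the audit: flag waitForHuman calls (but not
// waitForHumanVerdict) whose instruction text asks the human to visually
// verify something. Deliberately naive: assumes no parens inside the call
// arguments, and only surfaces candidates for manual review.
function flagSilentPassCandidates(source: string): string[] {
  const flagged: string[] = [];
  // \bwaitForHuman\( cannot match waitForHumanVerdict( because the
  // character immediately after "waitForHuman" must be the open paren.
  const callRe = /\bwaitForHuman\(([^)]*)\)/gs;
  const visualRe = /verify|confirm|should appear|appears in/i;
  for (const m of source.matchAll(callRe)) {
    if (visualRe.test(m[1])) {
      flagged.push(m[0].split('\n')[0].trim()); // first line of the call
    }
  }
  return flagged;
}

// Example: one trap candidate, one verdict call, one non-visual contract.
const sample = `
await waitForHuman('cc-004', 'verify link appears in Claude Code, then Cancel');
await waitForHumanVerdict('cc-002', 'Did the link appear?');
await waitForHuman('core-001', 'press the chord, then Cancel');
`;
console.log(flagSilentPassCandidates(sample).length); // 1
```

The last call is not flagged: its instruction has no visual-verification wording, matching the "non-visual contract" carve-out described below.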

Likely affected (rough scan, not authoritative):

  • claude-code-004 (confirmed silent-pass against the issues/547 regression)
  • claude-code-005 (warm step — same waitForHuman pattern, same risk)
  • clipboard-preservation-001/004/005/009 — instruction asks "verify the link should appear in terminal/Dummy AI" but only requires Cancel; the clipboard assertion is partial coverage at best
  • Most [assisted] tests in customAiAssistants.test.ts that verify content delivery to Dummy AI tiers
  • Likely core-send-commands-r-l-003 and similar send tests

A proper audit would enumerate every match with its file:line, the assertion text, and a verdict on whether to convert.

Conversion pattern

For each affected test:

// Before — silent-pass trap
await waitForHuman(
  'cc-004',
  'Cold paste: select lines, press Cmd+R Cmd+L, verify link appears in Claude Code, then Cancel',
  [/* steps */],
);
// log assertions only — pass even when human saw nothing
// After — verdict captured
const verdict = await waitForHumanVerdict(
  'cc-004',
  'Cold paste: press Cmd+R Cmd+L. Did the link appear in Claude Code chat?',
  ['Click PASS if the link appeared, FAIL otherwise'],
);
assert.strictEqual(verdict, 'pass', 'Human reported the link did not appear');
// log assertions still run as belt-and-suspenders

For tests where the human's eye is the ENTIRE assertion (no log path even partially covers it), the conversion is mandatory. For tests where the log path covers most of the contract but the human's eye is the final confirmation, the conversion is still strongly recommended — it adds the verdict capture without removing any log-based coverage.

What this does NOT change

  • Tests that use waitForHuman for non-visual contracts (e.g., "press a chord, then logs assert the side effect") are fine as-is. The human's eye isn't being relied on.
  • Tests that already use waitForHumanVerdict — already correct, no change needed.
  • The number of [assisted] tests stays the same. This isn't about reducing assisted count; it's about making the assisted tests actually verify what they claim to verify.

Implementation suggestion

Don't fix all in one PR. Suggested phasing:

  1. Phase 1 — Audit: full enumeration of affected tests in a tracking comment/issue, with per-test verdicts.
  2. Phase 2 — High-stakes tests first: convert claude-code-004/005 and the rest of builtInAiAssistants.test.ts (highest-risk because they're the canary for issues/547).
  3. Phase 3 — Per-file batches: clipboardPreservation.test.ts, customAiAssistants.test.ts, coreSendCommands.test.ts in separate PRs.
  4. Phase 4 — Update TESTING.md § "Assisted mode" with the contract distinction: use waitForHuman only when there's a non-visual machine-verifiable assertion; use waitForHumanVerdict whenever the human's eye is part of the assertion.

Definition of done

  • Every [assisted] test with a "verify X appears in the UI" contract uses waitForHumanVerdict and asserts on the verdict.
  • TESTING.md documents the contract rule.
  • A future regression like the issues/547 paste pipeline would be caught by the test rather than silent-passed.

Related

Priority

Medium-high. Not blocking any specific work, but until this is addressed, future regressions like #559 will continue to silent-pass against any visual-verification test. This is the systemic fix that would have caught #559 before it merged.
