Boring-setup-steps refactor — 7 assisted tests force the human through automatable setup #558

@couimet

Description

Spawned from the claude-code-002 fix: when the test instructs the human to "click into the test file (cc-002) to focus it" and "select exactly lines 1-2 (click line 1, shift-click end of line 2)", those steps are pure setup friction with no test value. They can be replaced with three lines of code:

const doc = await vscode.workspace.openTextDocument(fileUri);
const editor = await vscode.window.showTextDocument(doc);
editor.selection = new vscode.Selection(0, 0, 1, 6);

The survey looked at all 147 [assisted] tests across 20 files for the same anti-pattern. The pool is smaller than expected: only 7 tests have this problem. Most assisted tests already automate setup correctly and ask the human only for the action that genuinely needs them.

The 7 affected tests

builtInAiAssistants.test.ts

  • claude-code-004 at line 162 — file content 'line 1\nline 2\nline 3\n'. Selection: new vscode.Selection(0, 0, 1, 6). "Click into the test file (cc-004) to focus it" + "Select exactly lines 1-2".
  • claude-code-005 (cold step only) at line 205 — file content 'line 1\nline 2\nline 3\nline 4\n'. Selection: new vscode.Selection(0, 0, 1, 6). "Click into the test file (cc-005)" + "Select exactly lines 1-2".

The warm step of claude-code-005 already does it right — it sets editor.selection = new vscode.Selection(2, 0, 3, 7) programmatically. The cold step was missed when the file was written.

clipboardPreservation.test.ts

  • clipboard-preservation-001 at line 134 — file content is 10 lines "line N content" (N=1..10). Selection: lines 2-4 → new vscode.Selection(1, 0, 3, document.lineAt(3).text.length) or simpler (1, 0, 4, 0).
  • clipboard-preservation-004 at line 198 — same 10-line content. Selection: lines 1-3 → (0, 0, 3, 0).
  • clipboard-preservation-005 at line 236 — same 10-line content. Instruction is non-deterministic ("Select 2 or 3 lines"); pin to lines 1-3 → (0, 0, 3, 0).
  • clipboard-preservation-009 at line 302 — same 10-line content. Instruction is non-deterministic ("Select a few lines"); pin to lines 1-3 → (0, 0, 3, 0).

This file contains the largest cluster — 4 of the 5 manual clipboard-preservation tests share the same pattern. clipboard-preservation-005 and -009 are particularly notable because the human is told to "select 2 or 3 lines" / "select a few lines", both non-deterministic instructions. Pin each to a specific range so assertions are stable across runs.
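The off-by-one conversion from a human-readable "lines 1-3" to the zero-based `vscode.Selection` constructor is easy to botch when touching four tests at once. A minimal sketch of a helper that makes the convention explicit (the helper name is hypothetical and not part of the harness; the real tests can keep inlining the arguments):

```typescript
// Hypothetical helper (not in the test suite): converts a 1-based, inclusive
// line range like "lines 1-3" into the zero-based
// (startLine, startCol, endLine, endCol) arguments that vscode.Selection
// expects, using the "end at column 0 of the next line" convention this
// issue recommends, e.g. lines 1-3 → (0, 0, 3, 0).
function selectionArgsForLines(
  firstLine: number,
  lastLine: number,
): [number, number, number, number] {
  if (firstLine < 1 || lastLine < firstLine) {
    throw new Error(`invalid line range ${firstLine}-${lastLine}`);
  }
  // Start at column 0 of (firstLine - 1); end at column 0 of the line
  // AFTER lastLine, which selects nothing on that trailing line.
  return [firstLine - 1, 0, lastLine, 0];
}

console.log(selectionArgsForLines(1, 3)); // [0, 0, 3, 0]
console.log(selectionArgsForLines(2, 4)); // [1, 0, 4, 0]
```

Then `editor.selection = new vscode.Selection(...selectionArgsForLines(1, 3))` reads as "lines 1-3" at the call site instead of four bare numbers.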

coreSendCommands.test.ts

  • core-send-commands-r-l-003 at line 97 — file content is 10 lines "line N content". Selection: lines 1-3 → (0, 0, 3, 0). The expected link is ${relPath}#L1-L3, so the selection range must produce that. Pinning to exactly (0, 0, 3, 0) gives #L1-L3 deterministically.
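Why (0, 0, 3, 0) yields #L1-L3 deserves a note: the end position sits at column 0 of line index 3, which selects nothing on that line, so the last included line is index 2 (display line 3). A hedged sketch of that convention as a pure function (the real fragment formatter lives in RangeLink; this mirrors the presumed behavior, not the actual implementation):

```typescript
// Hypothetical sketch of the presumed fragment convention, NOT RangeLink's
// actual formatter: a selection end anchored at column 0 is treated as
// exclusive, so Selection(0, 0, 3, 0) maps to "#L1-L3".
interface SelectionLike {
  startLine: number; // zero-based
  endLine: number;   // zero-based
  endCol: number;
}

function lineFragment(sel: SelectionLike): string {
  // An end at column 0 of a later line selects nothing on that line, so
  // the last included display line is the one above it.
  const lastLine =
    sel.endCol === 0 && sel.endLine > sel.startLine
      ? sel.endLine - 1
      : sel.endLine;
  return `#L${sel.startLine + 1}-L${lastLine + 1}`;
}

console.log(lineFragment({ startLine: 0, endLine: 3, endCol: 0 }));  // "#L1-L3"
console.log(lineFragment({ startLine: 1, endLine: 3, endCol: 14 })); // "#L2-L4"
```

If the pinned selection and the expected `${relPath}#L1-L3` string ever disagree, the test fails loudly, which is exactly the determinism this conversion is after.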

What the survey looked at and excluded

The survey explicitly excluded:

  1. Pressing keybinding chords (e.g., "Press Cmd+R Cmd+L"). The chord itself is arguably part of the test contract for some TCs — though for claude-code-002 we replaced it with executeCommand(CMD_COPY_LINK_RELATIVE) and the test's contract is unchanged. Which other TCs would benefit from the same swap is a separate, latent question; out of scope here.
  2. Visual verification (e.g., "confirm the link appears in Claude Code"). That IS the test's assertion when paired with waitForHumanVerdict.
  3. Selecting from a picker when the selection drives the outcome (e.g., "select 'Dummy AI (Tier 1)' from the picker"). Already covered by issue closeQuickOpen automation survey — 41 assisted tests can become fully automated #557 (the closeQuickOpen survey).
  4. Clicking dialog buttons ("Yes, replace"). No API.
  5. Right-clicking + selecting menu items. No API.

The bigger fish — a related finding worth flagging

While running these surveys, a more serious issue surfaced. claude-code-004 passed an integration run on issues/547 even though the human watched Claude Code receive nothing — because claude-code-004 uses waitForHuman ("click Cancel when done"), not waitForHumanVerdict ("click PASS or FAIL"). The log-based assertions only proved that ComposablePasteDestination.pasteLink and VscodeAdapter.pasteTextFromClipboard fired — not that content arrived in the webview. The human's "yeah I clicked Cancel" was treated as "yeah the test passed".

This is the silent-pass trap, and it's a different bug from the boring-setup one — but it's the same kind of test-rigor failure. Any [assisted] test that uses waitForHuman to verify a visible outcome is susceptible: the human can be looking at a broken UI and the test will still go green.

Tests likely affected (rough scan, not a full audit):

  • claude-code-004 (confirmed silent-pass; the cold-paste test passed against a webview that received nothing)
  • claude-code-005 (warm step) — same waitForHuman pattern, same risk
  • clipboard-preservation-001/004/005/009 — instruction asks "verify the link should appear in terminal/Dummy AI" but only requires Cancel click; the clipboard assertion is partial coverage at best
  • All [assisted] tests in builtInAiAssistants.test.ts other than claude-code-002 (which uses waitForHumanVerdict)

A separate, focused audit is worth doing. Not folding it into this issue because the scope is different — but it should not get lost.
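For reference, the contract difference between the two helpers can be sketched with the UI prompt abstracted away. The names match how the tests call them, but the signatures and bodies here are assumptions; the real helpers live in the integration-test harness:

```typescript
// Hedged sketch of the two helpers' contracts. Signatures and the injected
// `prompt` parameter are assumptions for illustration; the real harness
// helpers drive an actual VS Code dialog.
type Prompt = (action: string, steps: string[]) => Promise<string | undefined>;

// waitForHuman: any dismissal (Cancel included) lets the test proceed.
// The human's observation never becomes an assertion: the silent-pass trap.
async function waitForHuman(
  id: string, action: string, steps: string[], prompt: Prompt,
): Promise<void> {
  await prompt(`[${id}] ${action}`, steps); // result ignored
}

// waitForHumanVerdict: the test fails unless the human explicitly passes it.
async function waitForHumanVerdict(
  id: string, action: string, steps: string[], prompt: Prompt,
): Promise<void> {
  const verdict = await prompt(`[${id}] ${action}`, steps);
  if (verdict !== "PASS") {
    throw new Error(`[${id}] human verdict: ${verdict ?? "dismissed"}`);
  }
}
```

With the first shape, a broken webview still yields green; with the second, the same observation becomes a failed assertion. That asymmetry is the whole audit.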

Recommended conversion pattern

For each of the 7 affected tests, the fix is mechanical:

Before:

const fileUri = createWorkspaceFile('cc-004', 'line 1\nline 2\nline 3\n');
tmpFileUris.push(fileUri);
await openEditor(fileUri);
await settle();

// ... bind setup ...

await waitForHuman(
  'claude-code-004',
  'Cold paste: select lines 1-2 and press Cmd+R Cmd+L, verify link appears in Claude Code chat, then Cancel',
  [
    '1. Click into the test file (cc-004) to focus it',
    '2. Select exactly lines 1-2 (click line 1, shift-click end of line 2)',
    '3. Press Cmd+R Cmd+L — the RangeLink should appear in Claude Code chat input',
    '4. Visually confirm the link appears in Claude Code',
    '5. Press Cancel to continue (assertions happen automatically)',
  ],
);

After:

const fileUri = createWorkspaceFile('cc-004', 'line 1\nline 2\nline 3\n');
tmpFileUris.push(fileUri);
const doc = await vscode.workspace.openTextDocument(fileUri);
const editor = await vscode.window.showTextDocument(doc);
editor.selection = new vscode.Selection(0, 0, 1, 6);
await settle();

// ... bind setup ...

await waitForHumanVerdict(  // also flip waitForHuman → waitForHumanVerdict for real assertion
  'claude-code-004',
  'Cold paste: press Cmd+R Cmd+L. Did the link appear in Claude Code chat?',
  [
    '1. Lines 1-2 are already selected in cc-004',
    '2. Press Cmd+R Cmd+L',
    '3. Click PASS if the RangeLink appeared in Claude Code chat input, FAIL otherwise',
  ],
);

(The waitForHumanVerdict swap is recommended but separately scoped — it fixes the silent-pass trap for these tests; the boring-setup fix is the smaller, more contained change.)

Implementer's checklist (per test)

For each of the 7 tests above:

  1. Pre-focus and pre-select before any waitForHuman / waitForHumanVerdict call:

    const doc = await vscode.workspace.openTextDocument(fileUri);
    const editor = await vscode.window.showTextDocument(doc);
    editor.selection = new vscode.Selection(<startLine>, <startCol>, <endLine>, <endCol>);
    await settle();

    Place this AFTER createWorkspaceFile/openEditor and AFTER any bind setup (executeCommand(CMD_BIND_TO_*)), but BEFORE the human prompt.

  2. Strip the boring instructions from consoleSteps. Remove the "Click into the test file" and "Select exactly lines X-Y" lines. Renumber the remaining steps.

  3. Update the action string to reflect what the human still needs to do (typically "Press Cmd+R Cmd+L. Did the link appear in X?" or similar).

  4. STRONGLY RECOMMENDED — flip waitForHuman to waitForHumanVerdict for these specific tests. All 7 are testing visual outcomes ("did the link appear in Claude Code / terminal / Dummy AI?"). Without waitForHumanVerdict they silent-pass when the visible outcome is broken — claude-code-004 did exactly this against an issues/547 regression. The broader waitForHuman audit is out of scope (separate issue), but these 7 specifically should be flipped as part of THIS change because they're the same set being touched.

  5. Verify with pnpm test:release:with-extensions --grep "<test-id>" (per release-test-requirement in CLAUDE.md). The human still needs to interact (press the chord, click PASS/FAIL) — but the setup friction is gone.

Definition of done

  • All 7 tests above use programmatic focus + selection
  • All 7 tests use waitForHumanVerdict (not waitForHuman)
  • Each test's consoleSteps array is free of any "Click into the file" / "Select lines X-Y" instructions
  • pnpm test:release:with-extensions --grep "claude-code-004|claude-code-005|clipboard-preservation-001|clipboard-preservation-004|clipboard-preservation-005|clipboard-preservation-009|core-send-commands-r-l-003" passes when the human clicks PASS for each visual check
  • No QA YAML changes needed — these tests stay automated: assisted (the human verdict is still the assertion)

Reference implementation

The canonical "right way" to do this is claude-code-002 in packages/rangelink-vscode-extension/src/__integration-tests__/suite/builtInAiAssistants.test.ts (around line 94). It demonstrates:

  • Programmatic vscode.window.showTextDocument + editor.selection setup
  • waitForHumanVerdict with PASS/FAIL contract
  • Log-based pre-assertions that fire before the verdict prompt for additional diagnostic value
  • Direct executeCommand(CMD_COPY_LINK_RELATIVE) invocation (replacing "press Cmd+R Cmd+L") — note that this last bit is OPTIONAL for the 7 tests in this issue; the keybinding-press can stay if it's part of the TC's intent. The setup automation is the focus here.

Bottom line

7 tests, 8 boring instructions, all clustered in 3 files. Mechanical fix. The boring-setup category is much smaller than expected — claude-code-002 was an outlier, not the tip of an iceberg.

The bigger lever from the same investigation is the waitForHuman → waitForHumanVerdict audit, which would catch tests that currently silent-pass against broken UIs. That deserves its own survey and is not the same change as the boring-setup conversion — but for these 7 tests specifically, the two fixes should land together (see Implementer's checklist step 4).

Pointers

  • The fix pattern in production: claude-code-002 in packages/rangelink-vscode-extension/src/__integration-tests__/suite/builtInAiAssistants.test.ts
  • Existing right-way-around example: claude-code-005 warm step — editor.selection = new vscode.Selection(2, 0, 3, 7)
  • Related survey on a different lever: closeQuickOpen automation survey — 41 assisted tests can become fully automated #557 (41 tests convertible from assisted to fully automated via closeQuickOpen)
