From 6f85d4de283ad3e2613c1d444c2e4a3e87644865 Mon Sep 17 00:00:00 2001
From: Matthew Seal <mseal@confluent.io>
Date: Thu, 14 May 2026 17:16:19 -0700
Subject: [PATCH] chore: add /cleanup skill for periodic maintainability passes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a project-local Claude Code skill at .claude/skills/cleanup/SKILL.md
that runs non-functional cleanup passes. Identifies tech debt, hygiene,
and organization issues, analyzes test coverage over the touched scope,
and reviews proposed changes with the user before applying any subset.

Key properties:
- No behavior changes: every proposed edit must be functionally inert;
  anything that could shift behavior goes to a "needs-decision" bucket and
  is never auto-applied.
- Propose before changing: findings are bucketed (Test Improvements,
  Must-fix, Should-fix, Consider, Needs decision) and the user picks the
  subset to apply.
- Test coverage analysis runs before code changes; coverage gaps in the
  touched scope are surfaced first so tests can land before refactors.
- Scope defaults: explicit path/glob > branch-diff vs main > prompt for
  full repo scan vs directory vs cancel. Never silently scans the whole
  repo.
- Reads .claude/skills/cleanup/conventions.md (if present) and treats its
  rules as team norms alongside CLAUDE.md. The conventions doc is a
  hand-editable style guide; CLAUDE.md wins on conflicts.

Also adds .claude/skills/cleanup/conventions.md with team-derived
conventions covering style, testing, error handling, architecture, sidecar
usage, and anti-patterns. Excludes the internal "learn-conventions"
skill that mines team Slack/PRs to refresh this doc — that skill stays
on contributors' machines via .gitignore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .claude/skills/cleanup/SKILL.md       | 337 ++++++++++++++++++++++++++
 .claude/skills/cleanup/conventions.md |  95 ++++++++
 .gitignore                            |   3 +
 3 files changed, 435 insertions(+)
 create mode 100644 .claude/skills/cleanup/SKILL.md
 create mode 100644 .claude/skills/cleanup/conventions.md

diff --git a/.claude/skills/cleanup/SKILL.md b/.claude/skills/cleanup/SKILL.md
new file mode 100644
index 000000000..ea80b9c5b
--- /dev/null
+++ b/.claude/skills/cleanup/SKILL.md
@@ -0,0 +1,337 @@
+---
+name: cleanup
+description:
+  Runs a code cleanup, organization, and maintainability pass over a scope of code. Identifies tech
+  debt, hygiene issues, and organization problems WITHOUT changing functionality, then reviews the
+  findings with the user and applies ONLY the changes the user approves. Use when the user mentions
+  "cleanup", "tech debt", "tidy", "tidy up", "maintainability pass", "code hygiene", "refactor
+  pass", or asks for a periodic non-functional polish of code.
+allowed-tools:
+  - Read
+  - Edit
+  - Write
+  - Bash
+  - Grep
+  - Glob
+  - Task
+---
+
+# Cleanup Skill
+
+Runs a non-functional cleanup, organization, and maintainability pass. Identifies issues, presents
+them for review, and applies ONLY the user-approved subset.
+
+## Core Rules
+
+1. **No behavior changes.** Every proposed change must be functionally inert (a reasonable reader
+   should be able to confirm that by inspection). If a finding requires a behavior change, surface
+   it as a "needs-decision" note — do not silently apply it.
+2. **Propose before changing.** Always present findings and let the user select what to apply. Never
+   bulk-apply.
+3. **Stay in scope.** Touch only files inside the agreed scope. Drive-by edits in unrelated files
+   are forbidden.
+4. **Preserve comments.** Do not delete existing comments. Update them if the surrounding code moves
+   or is renamed.
+5. **Respect generated code.** Skip `src/clients/kafkaRest/**`, `src/clients/schemaRegistryRest/**`,
+   `src/clients/sidecar/**`, and `src/clients/flinkGateway/**`. They are regenerated from OpenAPI
+   specs.
+6. **Verify after.** After applying any approved batch, run the relevant checks (`npx gulp check`,
+   `npx gulp lint`, and targeted `npx gulp test -t`) and report results.
+7. **Load conventions.** Before scanning, read `.claude/skills/cleanup/conventions.md` if it exists.
+   Treat its rules as authoritative team norms alongside `CLAUDE.md` — violations are must-fix or
+   should-fix findings depending on severity. If the file is missing, proceed without it; do not
+   auto-generate it. (Run `/learn-conventions` to populate or refresh that doc.) If a convention
+   contradicts `CLAUDE.md`, `CLAUDE.md` wins.
+
+## Process
+
+### Step 1: Establish Scope
+
+Determine the scope in this order:
+
+1. **Explicit path or glob** the user named (e.g. `src/viewProviders/topicView.ts`,
+   `src/loaders/**`). Use it directly.
+2. **Branch diff vs `main`** if the user did not name a scope but the branch has uncommitted or
+   committed changes (`git diff main --name-only` returns a non-empty list). Treat that as the scope
+   for a pre-PR polish.
+3. **No glob, no diff** — ask the user explicitly:
+
+   > No scope was given and the branch is clean. Would you like to: A) Run a **full repo scan**
+   > (slow — expect a high finding count and a longer review), B) Scope to a directory (give me a
+   > path or glob), or C) Cancel?
+
+   Do not default to a full scan silently. Always require explicit confirmation for it.
+
+Once scope is fixed, list the files you will look at so the user can correct it before any work
+happens. For a full repo scan, list directory-level counts instead of every file, and remind the
+user that the auto-generated client directories will be skipped.
+
+### Step 2: Gather Hard Signals
+
+Run the cheap, deterministic checks first and collect their output:
+
+```bash
+npx gulp check    # TypeScript errors / warnings
+npx gulp lint     # ESLint findings
+```
+
+Also gather:
+
+- `git log --follow -- <file>` for files where age/ownership matters
+- `grep -rn "TODO\|FIXME\|XXX\|HACK" <scope>` for self-flagged debt
+- `grep -rn ": any\b\|as any\b" <scope>` for `any` usage (a CLAUDE.md violation)
+
+These produce the unambiguous findings — list them under "Hard signals" in the report.
+
+### Step 3: Scan for Soft Signals
+
+These are heuristics — judgment is required, and many will be discarded. Read the files and look
+for:
+
+#### Organization
+
+- Files mixing unrelated concerns (could be split)
+- Helpers defined far from their only caller (could be co-located, or vice versa)
+- Files in awkward locations (e.g. a view-provider helper sitting in `src/utils/`)
+- Inconsistent file/module naming within a directory
+- Re-exports that obscure the real definition site
+- Circular or near-circular imports
+
+#### Code Hygiene
+
+- Unused exports, unused imports, unreachable code
+- Dead branches (conditions that can never fire given the types)
+- Duplicated literals that should be named constants
+- Duplicated logic across 2–3 sites that could be a shared helper (be conservative — see CLAUDE.md
+  "three similar lines is better than a premature abstraction")
+- Magic numbers / hardcoded strings without explanation
+- String unions where an `enum` would be clearer (per CLAUDE.md)
+- Inconsistent error handling (e.g. `logger.error` instead of `logError()`)
+- Inconsistent user-facing error surfacing (raw throws vs `showErrorNotificationWithButtons`)
+- Functions doing two things — naming or length suggests split
+- Names that no longer match what the symbol does
+- Comments that contradict or no longer describe the code (UPDATE, do not delete)
+
+#### Type Safety
+
+- `any` types (always a finding per CLAUDE.md)
+- Implicit `any` from missing annotations on exported functions
+- Overly wide types (`object`, `Record<string, unknown>`) where a real interface exists
+- Missing JSDoc on exported functions and public methods (CLAUDE.md requirement)
+
+#### Disposable / Lifecycle (project-specific, CRITICAL)
+
+- Classes with event subscriptions that don't extend `DisposableCollection`
+- `onDid*` / `.event(...)` return values not pushed to `this.disposables`
+- `setInterval` / `setTimeout` without a cleared handle
+- Long-lived `SidecarHandle` references (handles should be short-lived per CLAUDE.md)
+- Classes not registered in `context.subscriptions` where they should be
+
+#### Test Hygiene
+
+- `.only` left in test files (always flag)
+- Tests stubbing functions defined in the same module (Sinon limitation per CLAUDE.md)
+- Skipped tests with no linked issue/comment
+- Test files mirroring source-file organization poorly
+
+### Step 4: Analyze Test Coverage Over the Scope
+
+Before recommending any code changes, evaluate the test safety net over the touched files. Code
+without coverage is risky to refactor — surface test gaps **before** code findings so the user can
+choose to land tests first.
+
+For each file in scope, classify coverage:
+
+- **Unit coverage**: is there a co-located or matching `*.test.ts`? Do its `describe`/`it` blocks
+  reference the functions that would be touched by must-fix or should-fix findings? For a small
+  scope, run `npx gulp test --coverage` and read the Istanbul report; for a large scope, rely on
+  file existence + symbol-level greps.
+- **E2E / behavioral coverage**: for command handlers, view providers, and webview code
+  (`src/commands/**`, `src/viewProviders/**`, `src/webview/**`), grep `tests/e2e/**` for the command
+  ID, view ID, or webview message type. Cross-check with `.claude/rules/testing/e2e-tests.md`.
+- **Functional / webview coverage**: for webview-internal code, check `tests/functional/**` (or the
+  project's webview test location) via the same symbol/route greps.
+
+For each file, record one of: **covered**, **partial**, **uncovered**, and which specific findings
+sit on uncovered code paths.
+
+Generate a **Test Improvements** bucket from the gaps. Recommend landing tests **before** code
+changes when:
+
+- A must-fix or should-fix finding lives in an uncovered or partial area
+- The change touches a high-risk surface (sidecar lifecycle, view providers, command handlers,
+  webview message handlers)
+- This is a periodic pass with no PR deadline pressure
+
+If the user accepts test-improvement items, apply those first, get them green via
+`npx gulp test -t "<name>"`, and only then proceed to the code findings. Any tests written by this
+skill must follow `.claude/rules/testing/*.md` and the `/mocha`, `/sinon`, `/playwright` skill
+patterns — and must be genuine new coverage, not assertion-free filler.
+
+### Step 5: Categorize and Prioritize
+
+Sort findings into:
+
+| Bucket                | Meaning                                                  | Default recommendation     |
+| --------------------- | -------------------------------------------------------- | -------------------------- |
+| **Test Improvements** | Missing or thin coverage protecting code-change findings | Recommend landing FIRST    |
+| **Must-fix**          | Violates CLAUDE.md rule (e.g. `any` type, disposables)   | Recommend applying         |
+| **Should-fix**        | Clear improvement, low risk, no behavior change          | Recommend applying         |
+| **Consider**          | Subjective; reasonable to leave as-is                    | Default to skipping        |
+| **Needs decision**    | Touches behavior, scope, or a design choice              | Surface, do NOT auto-apply |
+
+Findings that would change behavior — even subtly (error timing, log output a user might match on,
+ordering of async work) — go into **Needs decision** and stay there unless the user explicitly
+agrees the change is acceptable.
+
+### Step 6: Present the Report
+
+Output a single structured report. Cap the visible list at ~15 items across the actionable buckets.
+Anything beyond that goes into a per-bucket remaining tally so the user knows what's been deferred
+and can ask to see it. Use this format:
+
+```markdown
+## Cleanup Pass — <scope>
+
+**Files inspected:** N **Hard-signal checks:** check ✓ / lint ✓ (or list failures) **Coverage
+snapshot:** covered: X / partial: Y / uncovered: Z (of N files)
+
+### Test Improvements — recommended FIRST (before code changes)
+
+These protect the must-fix / should-fix changes below. Land these (or knowingly waive them) before
+applying the code findings underneath.
+
+T1. `src/foo/bar.ts` — no unit test exists; covers must-fix #1 and should-fix #3 - **Suggested:**
+add `tests/.../bar.test.ts` with cases for <function>, <function> T2. `src/commands/openTopic.ts` —
+no E2E flow exercises this command - **Suggested:** add a test under `tests/e2e/...` using the page
+object pattern
+
+### Must-fix (recommend applying all)
+
+1. `path/to/file.ts:42` — <one-line description>
+   - **Why:** <CLAUDE.md rule or concrete risk>
+   - **Change:** <what the edit will do, in plain words>
+   - **Coverage:** covered ✓ / partial ⚠ / uncovered ✗ (links to T# if a test was proposed)
+
+### Should-fix (recommend applying all)
+
+2. `path/to/file.ts:120` — <one-line description> ...
+
+### Consider (default skip)
+
+3. `path/to/file.ts:55` — <one-line description> ...
+
+### Needs decision (NOT applied without explicit go-ahead)
+
+4. `path/to/file.ts:88` — <one-line description>
+   - **Why this is here:** <reason it isn't auto-applicable>
+   - **Options:** A) ... B) ... C) leave as-is
+
+### Remaining lower-priority items (not shown above)
+
+Listed by bucket and rough scope so you know what's been deferred:
+
+- Test Improvements: 4 more (3 unit-test gaps in `src/loaders/`, 1 E2E gap in `src/commands/`)
+- Should-fix: 7 more (mostly missing JSDoc in `src/utils/**`, 2 magic-number extractions)
+- Consider: 12 more (naming nits, comment refresh candidates, scoped across `src/viewProviders/**`)
+- Needs decision: 2 more (both touching sidecar error surfacing — call out if you want details)
+
+Ask to see any bucket in full and I'll list them.
+```
+
+Keep each finding to ~3 lines. The report should be skimmable — the user is making a triage
+decision, not reading a design doc. The "Remaining lower-priority items" section must give real
+counts and rough scope (directory or theme), not just "more available."
+
+After the report, ask explicitly:
+
+> Which findings would you like to apply? You can reply with:
+>
+> - `tests first` (apply Test Improvements only, then re-evaluate),
+> - numbers (`T1, 1, 3, 5-7`),
+> - `all must-fix` / `all should-fix`,
+> - `all except 4`,
+> - `show remaining <bucket>` to expand a deferred list,
+> - `skip all`,
+> - or describe the subset in words.
+
+### Step 7: Apply the Approved Subset
+
+For each approved finding:
+
+1. Make the edit using `Edit` (or `Write` for new files only).
+2. Keep edits minimal and localized — don't reformat surrounding code, don't rename unrelated
+   symbols.
+3. Preserve comments. If a comment is now wrong because of the edit, update it; never just delete
+   it.
+4. If during the edit you discover the change would alter behavior, STOP that finding and report
+   back — do not silently push it through.
+
+Group related edits into a single logical batch where it makes sense (e.g. all the
+`logger.error → logError` replacements together), but still apply file-by-file so the user can audit
+each diff.
+
+### Step 8: Verify
+
+After all approved edits are applied:
+
+```bash
+npx gulp check                 # type checking must pass
+npx gulp lint                  # lint must pass (or only show pre-existing warnings)
+npx gulp test -t "<scope>"     # relevant unit tests
+```
+
+If anything broke, report the failure and the specific edit that likely caused it. Offer to revert
+that edit (`git checkout -- <file>`) rather than piling on more changes to fix the symptom.
+
+### Step 9: Summarize
+
+Brief summary at the end:
+
+- Findings applied: N (list with file:line)
+- Findings skipped: N
+- Verification: check ✓ / lint ✓ / tests ✓
+- Any remaining "needs decision" items still on the table
+
+## What This Skill Does NOT Do
+
+- **No feature work.** If a finding is "this would be nicer if we also added X," it goes to "needs
+  decision" or gets dropped.
+- **No rewrites.** Renaming a single misleading symbol is fine; restructuring a module is not — that
+  belongs in a real PR with its own design.
+- **No dependency changes.** Bumping versions, swapping libraries, adding new packages — all out of
+  scope.
+- **No test changes that alter assertions** (changing the meaning of a test). Removing `.only` or
+  reorganizing test file structure is fine.
+- **No silent behavior changes.** Anything that changes what a user, a caller, or telemetry would
+  observe is "needs decision," full stop.
+- **No "drive-by" edits** outside the agreed scope. Note them in the report if you spot them, but
+  don't touch them.
+
+## Anti-Patterns to Avoid
+
+- Presenting 40 findings with no prioritization — the user can't make decisions on a wall of text.
+  Cap the visible report at ~15 items and summarize the rest in the "Remaining lower-priority items"
+  section with real counts and rough scope (directories or themes), not a vague "more available."
+- Suggesting an abstraction for two similar code blocks. Per CLAUDE.md: three similar lines is
+  better than a premature abstraction. Three call sites is the rough threshold for proposing a
+  helper, and even then it goes in "consider."
+- Suggesting "add error handling" without a specific failure mode. Per CLAUDE.md: don't add error
+  handling for scenarios that can't happen.
+- Flagging style preferences ESLint/Prettier would catch (or already accept). Hard signals from
+  `npx gulp lint` are enough on those.
+- Touching auto-generated client code under
+  `src/clients/{kafkaRest,schemaRegistryRest,sidecar,flinkGateway}/**`.
+
+## Tips
+
+- Use the `Task` tool with the Explore agent for an initial sweep of a large scope — get a list of
+  candidate files and obvious findings, then read the highest-signal ones yourself.
+- When in doubt about whether a change is functionally inert, push it to "needs decision." That
+  bucket exists precisely so judgment calls don't get silently merged in.
+- Cross-reference suggestions against `.claude/rules/architecture/*.md` — proposing a change that
+  contradicts an architectural rule means the proposal is wrong, not the rule.
+- The `/pr-review` skill is for evaluating a finished change; this skill is for finding things to
+  change. They are complementary — don't duplicate its output here.
+- The `/simplify` skill operates on _changed_ code; this skill operates on a _scope_ the user picks.
+  Use simplify after editing; use cleanup as a periodic pass.
diff --git a/.claude/skills/cleanup/conventions.md b/.claude/skills/cleanup/conventions.md
new file mode 100644
index 000000000..d2c5f8c71
--- /dev/null
+++ b/.claude/skills/cleanup/conventions.md
@@ -0,0 +1,95 @@
+---
+last-edited: 2026-05-14
+repo: confluentinc/vscode
+---
+
+# Team Conventions for confluentinc/vscode
+
+These conventions augment `CLAUDE.md`. Where they conflict, **`CLAUDE.md` wins** — codified rules
+exist to keep AI agents out of risky territory where a human might reasonably navigate nuance. Edit
+this file directly when conventions shift; rerun `/learn-conventions` periodically to surface
+patterns this doc may have missed.
+
+## High-confidence rules
+
+- Prefer explicit, simple code over clever abstraction. Pull out helper layers when they hurt
+  readability rather than building them up.
+- Feature-scoped subdirs in `src/` over creating new top-level modules. Helpers live alongside the
+  feature they support.
+
+## Style
+
+- Run Prettier locally — unformatted code is treated as a process smell.
+- Don't over-shorten code into one-liners just for line count if it obscures intent.
+- Side effects inside conditionals are surprising. Prefer separating `get()` from `reveal()` (or
+  equivalent), even if it costs a line.
+
+## Testing
+
+- Mocha tests must be isolated. Rely on `globalBeforeAll` to activate the extension rather than
+  assuming shared state from prior tests.
+- Split real unit tests from integration-ish tests. Don't mix them in one window/environment.
+- Fully stub external/API behavior in unit tests. Unstubbed loader/API calls cause noise and
+  nondeterminism.
+- Keep Playwright Page Object Models lean. Overbuilt POMs hurt clarity and should be ripped out when
+  they do.
+- Prefer explicit Playwright waits/assertions close to the behavior under test, even if verbose,
+  over hidden helper magic.
+- Use shared Playwright fixtures (e.g. coverage config) — don't duplicate test infrastructure.
+- E2E coverage should hit realistic broad user flows first, not deep coverage of one UI area.
+- E2E tests should verify visible webview UI affordances, not just backend effects.
+- Use real-enough fixtures when behavior depends on realistic validation.
+- `Promise.all` wrapping a Playwright load-state assertion can be a trap — the load state often
+  resolves immediately. Awaiting the triggering command directly often gives better timing.
+- Don't put `testInfo.skip()` inside `expect().toPass()`. `toPass` retries the throw and defeats the
+  skip.
+
+## Error handling & observability
+
+- Surface the actual underlying cause to users, not a generic "check credentials or network"
+  fallback.
+- Validate user-entered config (file paths, etc.) locally before sending to sidecar. Give an
+  explicit corrective message when the value is wrong.
+- Log level reflects impact: loader fetch failures are warnings; sidecar transport problems are
+  errors.
+- Use `logError()` over `logger.error()` and `showErrorNotificationWithButtons()` over
+  `window.showErrorMessage()` — corroborates CLAUDE.md.
+
+## Architecture & module organization
+
+- Centralize lifecycle/dispose wiring in a central listener layer. Don't bury extra listeners inside
+  the managed class unless necessary.
+- For singleton-ish managers, `dispose()` should also null out the instance so it can be recreated
+  cleanly.
+- Don't spread enable/disable hooks across many components — keep lifecycle gates near
+  initialization.
+- Resource/provider internals should maintain enough state for a top-level view refresh to be cheap
+  and deterministic (cached repaint, no refetch).
+- Prefer targeted fixes over broad central changes when centralization causes weird build or test
+  behavior.
+
+## Project-specific patterns
+
+- Use `SidecarHandle` when sidecar is the natural integration boundary.
+- Don't force everything through sidecar. Direct API client modules in a feature subdir are accepted
+  (e.g. Docker engine API, scaffolding service).
+- When a problem spans direct + CCloud paths and there's no single sidecar codepath to fix, fix it
+  in the extension layer.
+
+## Anti-patterns to avoid
+
+- Overbuilding Page Object Models.
+- Mixing unit and integration tests in one Mocha environment.
+- Hiding important waits inside opaque helpers.
+- Creating new top-level `src/` modules per feature.
+- Forcing sidecar usage when a direct API call is clearer.
+- `git push --force` without `--force-with-lease` on automation branches.
+
+## External references
+
+- [Playwright actionability / assertions](https://playwright.dev/docs/actionability#assertions) —
+  baseline reference for E2E style decisions.
+- [Rollwright coverage docs](https://unknownprinciple.github.io/rollwright/coverage.html) — webview
+  test coverage setup.
+- [`vscode-test-playwright`](https://www.npmjs.com/package/vscode-test-playwright) — basis for the
+  E2E infrastructure used here.
diff --git a/.gitignore b/.gitignore
index 99d6af95e..4995eef9a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -127,3 +127,6 @@ bin/ide-sidecar-*-runner
 *.pem
 
 playwright-report/
+
+# Internal-only Claude skills (kept locally; not pushed)
+.claude/skills/learn-conventions/