From 6f85d4de283ad3e2613c1d444c2e4a3e87644865 Mon Sep 17 00:00:00 2001 From: Matthew Seal Date: Thu, 14 May 2026 17:16:19 -0700 Subject: [PATCH] chore: add /cleanup skill for periodic maintainability passes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a project-local Claude Code skill at .claude/skills/cleanup/SKILL.md that runs non-functional cleanup passes. Identifies tech debt, hygiene, and organization issues, analyzes test coverage over the touched scope, and reviews proposed changes with the user before applying any subset. Key properties: - No behavior changes: every proposed edit must be functionally inert; anything that could shift behavior goes to a "needs-decision" bucket and is never auto-applied. - Propose before changing: findings are bucketed (Test Improvements, Must-fix, Should-fix, Consider, Needs decision) and the user picks the subset to apply. - Test coverage analysis runs before code changes; coverage gaps in the touched scope are surfaced first so tests can land before refactors. - Scope defaults: explicit path/glob > branch-diff vs main > prompt for full repo scan vs directory vs cancel. Never silently scans the whole repo. - Reads .claude/skills/cleanup/conventions.md (if present) and treats its rules as team norms alongside CLAUDE.md. The conventions doc is a hand-editable style guide; CLAUDE.md wins on conflicts. Also adds .claude/skills/cleanup/conventions.md with team-derived conventions covering style, testing, error handling, architecture, sidecar usage, and anti-patterns. Excludes the internal "learn-conventions" skill that mines team Slack/PRs to refresh this doc — that skill stays on contributors' machines via .gitignore. Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/skills/cleanup/SKILL.md | 337 ++++++++++++++++++++++++++ .claude/skills/cleanup/conventions.md | 95 ++++++++ .gitignore | 3 + 3 files changed, 435 insertions(+) create mode 100644 .claude/skills/cleanup/SKILL.md create mode 100644 .claude/skills/cleanup/conventions.md diff --git a/.claude/skills/cleanup/SKILL.md b/.claude/skills/cleanup/SKILL.md new file mode 100644 index 000000000..ea80b9c5b --- /dev/null +++ b/.claude/skills/cleanup/SKILL.md @@ -0,0 +1,337 @@ +--- +name: cleanup +description: + Runs a code cleanup, organization, and maintainability pass over a scope of code. Identifies tech + debt, hygiene issues, and organization problems WITHOUT changing functionality, then reviews the + findings with the user and applies ONLY the changes the user approves. Use when the user mentions + "cleanup", "tech debt", "tidy", "tidy up", "maintainability pass", "code hygiene", "refactor + pass", or asks for a periodic non-functional polish of code. +allowed-tools: + - Read + - Edit + - Write + - Bash + - Grep + - Glob + - Task +--- + +# Cleanup Skill + +Runs a non-functional cleanup, organization, and maintainability pass. Identifies issues, presents +them for review, and applies ONLY the user-approved subset. + +## Core Rules + +1. **No behavior changes.** Every proposed change must be functionally inert (a reasonable reader + should be able to confirm that by inspection). If a finding requires a behavior change, surface + it as a "needs-decision" note — do not silently apply it. +2. **Propose before changing.** Always present findings and let the user select what to apply. Never + bulk-apply. +3. **Stay in scope.** Touch only files inside the agreed scope. Drive-by edits in unrelated files + are forbidden. +4. **Preserve comments.** Do not delete existing comments. Update them if the surrounding code moves + or is renamed. +5. **Respect generated code.** Skip `src/clients/kafkaRest/**`, `src/clients/schemaRegistryRest/**`, + `src/clients/sidecar/**`, and `src/clients/flinkGateway/**`. They are regenerated from OpenAPI + specs. +6. **Verify after.** After applying any approved batch, run the relevant checks (`npx gulp check`, + `npx gulp lint`, and targeted `npx gulp test -t`) and report results. +7. **Load conventions.** Before scanning, read `.claude/skills/cleanup/conventions.md` if it exists. + Treat its rules as authoritative team norms alongside `CLAUDE.md` — violations are must-fix or + should-fix findings depending on severity. If the file is missing, proceed without it; do not + auto-generate it. (Run `/learn-conventions` to populate or refresh that doc.) If a convention + contradicts `CLAUDE.md`, `CLAUDE.md` wins. + +## Process + +### Step 1: Establish Scope + +Determine the scope in this order: + +1. **Explicit path or glob** the user named (e.g. `src/viewProviders/topicView.ts`, + `src/loaders/**`). Use it directly. +2. **Branch diff vs `main`** if the user did not name a scope but the branch has uncommitted or + committed changes (`git diff main --name-only` returns a non-empty list). Treat that as the scope + for a pre-PR polish. +3. **No glob, no diff** — ask the user explicitly: + + > No scope was given and the branch is clean. Would you like to: A) Run a **full repo scan** + > (slow — expect a high finding count and a longer review), B) Scope to a directory (give me a + > path or glob), or C) Cancel? + + Do not default to a full scan silently. Always require explicit confirmation for it. + +Once scope is fixed, list the files you will look at so the user can correct it before any work +happens. For a full repo scan, list directory-level counts instead of every file, and remind the +user that the auto-generated client directories will be skipped. + +### Step 2: Gather Hard Signals + +Run the cheap, deterministic checks first and collect their output: + +```bash +npx gulp check # TypeScript errors / warnings +npx gulp lint # ESLint findings +``` + +Also gather: + +- `git log --follow -- ` for files where age/ownership matters +- `grep -rn "TODO\|FIXME\|XXX\|HACK" ` for self-flagged debt +- `grep -rn ": any\b\|as any\b" ` for `any` usage (a CLAUDE.md violation) + +These produce the unambiguous findings — list them under "Hard signals" in the report. + +### Step 3: Scan for Soft Signals + +These are heuristics — judgment is required, and many will be discarded. Read the files and look +for: + +#### Organization + +- Files mixing unrelated concerns (could be split) +- Helpers defined far from their only caller (could be co-located, or vice versa) +- Files in awkward locations (e.g. a view-provider helper sitting in `src/utils/`) +- Inconsistent file/module naming within a directory +- Re-exports that obscure the real definition site +- Circular or near-circular imports + +#### Code Hygiene + +- Unused exports, unused imports, unreachable code +- Dead branches (conditions that can never fire given the types) +- Duplicated literals that should be named constants +- Duplicated logic across 2–3 sites that could be a shared helper (be conservative — see CLAUDE.md + "three similar lines is better than a premature abstraction") +- Magic numbers / hardcoded strings without explanation +- String unions where an `enum` would be clearer (per CLAUDE.md) +- Inconsistent error handling (e.g. `logger.error` instead of `logError()`) +- Inconsistent user-facing error surfacing (raw throws vs `showErrorNotificationWithButtons`) +- Functions doing two things — naming or length suggests split +- Names that no longer match what the symbol does +- Comments that contradict or no longer describe the code (UPDATE, do not delete) + +#### Type Safety + +- `any` types (always a finding per CLAUDE.md) +- Implicit `any` from missing annotations on exported functions +- Overly wide types (`object`, `Record`) where a real interface exists +- Missing JSDoc on exported functions and public methods (CLAUDE.md requirement) + +#### Disposable / Lifecycle (project-specific, CRITICAL) + +- Classes with event subscriptions that don't extend `DisposableCollection` +- `onDid*` / `.event(...)` return values not pushed to `this.disposables` +- `setInterval` / `setTimeout` without a cleared handle +- Long-lived `SidecarHandle` references (handles should be short-lived per CLAUDE.md) +- Classes not registered in `context.subscriptions` where they should be + +#### Test Hygiene + +- `.only` left in test files (always flag) +- Tests stubbing functions defined in the same module (Sinon limitation per CLAUDE.md) +- Skipped tests with no linked issue/comment +- Test files mirroring source-file organization poorly + +### Step 4: Analyze Test Coverage Over the Scope + +Before recommending any code changes, evaluate the test safety net over the touched files. Code +without coverage is risky to refactor — surface test gaps **before** code findings so the user can +choose to land tests first. + +For each file in scope, classify coverage: + +- **Unit coverage**: is there a co-located or matching `*.test.ts`? Do its `describe`/`it` blocks + reference the functions that would be touched by must-fix or should-fix findings? For a small + scope, run `npx gulp test --coverage` and read the Istanbul report; for a large scope, rely on + file existence + symbol-level greps. +- **E2E / behavioral coverage**: for command handlers, view providers, and webview code + (`src/commands/**`, `src/viewProviders/**`, `src/webview/**`), grep `tests/e2e/**` for the command + ID, view ID, or webview message type. Cross-check with `.claude/rules/testing/e2e-tests.md`. +- **Functional / webview coverage**: for webview-internal code, check `tests/functional/**` (or the + project's webview test location) via the same symbol/route greps. + +For each file, record one of: **covered**, **partial**, **uncovered**, and which specific findings +sit on uncovered code paths. + +Generate a **Test Improvements** bucket from the gaps. Recommend landing tests **before** code +changes when: + +- A must-fix or should-fix finding lives in an uncovered or partial area +- The change touches a high-risk surface (sidecar lifecycle, view providers, command handlers, + webview message handlers) +- This is a periodic pass with no PR deadline pressure + +If the user accepts test-improvement items, apply those first, get them green via +`npx gulp test -t ""`, and only then proceed to the code findings. Any tests written by this +skill must follow `.claude/rules/testing/*.md` and the `/mocha`, `/sinon`, `/playwright` skill +patterns — and must be genuine new coverage, not assertion-free filler. + +### Step 5: Categorize and Prioritize + +Sort findings into: + +| Bucket | Meaning | Default recommendation | +| --------------------- | -------------------------------------------------------- | -------------------------- | +| **Test Improvements** | Missing or thin coverage protecting code-change findings | Recommend landing FIRST | +| **Must-fix** | Violates CLAUDE.md rule (e.g. `any` type, disposables) | Recommend applying | +| **Should-fix** | Clear improvement, low risk, no behavior change | Recommend applying | +| **Consider** | Subjective; reasonable to leave as-is | Default to skipping | +| **Needs decision** | Touches behavior, scope, or a design choice | Surface, do NOT auto-apply | + +Findings that would change behavior — even subtly (error timing, log output a user might match on, +ordering of async work) — go into **Needs decision** and stay there unless the user explicitly +agrees the change is acceptable. + +### Step 6: Present the Report + +Output a single structured report. Cap the visible list at ~15 items across the actionable buckets. +Anything beyond that goes into a per-bucket remaining tally so the user knows what's been deferred +and can ask to see it. Use this format: + +```markdown +## Cleanup Pass — + +**Files inspected:** N **Hard-signal checks:** check ✓ / lint ✓ (or list failures) **Coverage +snapshot:** covered: X / partial: Y / uncovered: Z (of N files) + +### Test Improvements — recommended FIRST (before code changes) + +These protect the must-fix / should-fix changes below. Land these (or knowingly waive them) before +applying the code findings underneath. + +T1. `src/foo/bar.ts` — no unit test exists; covers must-fix #1 and should-fix #3 - **Suggested:** +add `tests/.../bar.test.ts` with cases for , T2. `src/commands/openTopic.ts` — +no E2E flow exercises this command - **Suggested:** add a test under `tests/e2e/...` using the page +object pattern + +### Must-fix (recommend applying all) + +1. `path/to/file.ts:42` — + - **Why:** + - **Change:** + - **Coverage:** covered ✓ / partial ⚠ / uncovered ✗ (links to T# if a test was proposed) + +### Should-fix (recommend applying all) + +2. `path/to/file.ts:120` — ... + +### Consider (default skip) + +3. `path/to/file.ts:55` — ... + +### Needs decision (NOT applied without explicit go-ahead) + +4. `path/to/file.ts:88` — + - **Why this is here:** + - **Options:** A) ... B) ... C) leave as-is + +### Remaining lower-priority items (not shown above) + +Listed by bucket and rough scope so you know what's been deferred: + +- Test Improvements: 4 more (3 unit-test gaps in `src/loaders/`, 1 E2E gap in `src/commands/`) +- Should-fix: 7 more (mostly missing JSDoc in `src/utils/**`, 2 magic-number extractions) +- Consider: 12 more (naming nits, comment refresh candidates, scoped across `src/viewProviders/**`) +- Needs decision: 2 more (both touching sidecar error surfacing — call out if you want details) + +Ask to see any bucket in full and I'll list them. +``` + +Keep each finding to ~3 lines. The report should be skimmable — the user is making a triage +decision, not reading a design doc. The "Remaining lower-priority items" section must give real +counts and rough scope (directory or theme), not just "more available." + +After the report, ask explicitly: + +> Which findings would you like to apply? You can reply with: +> +> - `tests first` (apply Test Improvements only, then re-evaluate), +> - numbers (`T1, 1, 3, 5-7`), +> - `all must-fix` / `all should-fix`, +> - `all except 4`, +> - `show remaining ` to expand a deferred list, +> - `skip all`, +> - or describe the subset in words. + +### Step 7: Apply the Approved Subset + +For each approved finding: + +1. Make the edit using `Edit` (or `Write` for new files only). +2. Keep edits minimal and localized — don't reformat surrounding code, don't rename unrelated + symbols. +3. Preserve comments. If a comment is now wrong because of the edit, update it; never just delete + it. +4. If during the edit you discover the change would alter behavior, STOP that finding and report + back — do not silently push it through. + +Group related edits into a single logical batch where it makes sense (e.g. all the +`logger.error → logError` replacements together), but still apply file-by-file so the user can audit +each diff. + +### Step 8: Verify + +After all approved edits are applied: + +```bash +npx gulp check # type checking must pass +npx gulp lint # lint must pass (or only show pre-existing warnings) +npx gulp test -t "" # relevant unit tests +``` + +If anything broke, report the failure and the specific edit that likely caused it. Offer to revert +that edit (`git checkout -- `) rather than piling on more changes to fix the symptom. + +### Step 9: Summarize + +Brief summary at the end: + +- Findings applied: N (list with file:line) +- Findings skipped: N +- Verification: check ✓ / lint ✓ / tests ✓ +- Any remaining "needs decision" items still on the table + +## What This Skill Does NOT Do + +- **No feature work.** If a finding is "this would be nicer if we also added X," it goes to "needs + decision" or gets dropped. +- **No rewrites.** Renaming a single misleading symbol is fine; restructuring a module is not — that + belongs in a real PR with its own design. +- **No dependency changes.** Bumping versions, swapping libraries, adding new packages — all out of + scope. +- **No test changes that alter assertions** (changing the meaning of a test). Removing `.only` or + reorganizing test file structure is fine. +- **No silent behavior changes.** Anything that changes what a user, a caller, or telemetry would + observe is "needs decision," full stop. +- **No "drive-by" edits** outside the agreed scope. Note them in the report if you spot them, but + don't touch them. + +## Anti-Patterns to Avoid + +- Presenting 40 findings with no prioritization — the user can't make decisions on a wall of text. + Cap the visible report at ~15 items and summarize the rest in the "Remaining lower-priority items" + section with real counts and rough scope (directories or themes), not a vague "more available." +- Suggesting an abstraction for two similar code blocks. Per CLAUDE.md: three similar lines is + better than a premature abstraction. Three call sites is the rough threshold for proposing a + helper, and even then it goes in "consider." +- Suggesting "add error handling" without a specific failure mode. Per CLAUDE.md: don't add error + handling for scenarios that can't happen. +- Flagging style preferences ESLint/Prettier would catch (or already accept). Hard signals from + `npx gulp lint` are enough on those. +- Touching auto-generated client code under + `src/clients/{kafkaRest,schemaRegistryRest,sidecar,flinkGateway}/**`. + +## Tips + +- Use the `Task` tool with the Explore agent for an initial sweep of a large scope — get a list of + candidate files and obvious findings, then read the highest-signal ones yourself. +- When in doubt about whether a change is functionally inert, push it to "needs decision." That + bucket exists precisely so judgment calls don't get silently merged in. +- Cross-reference suggestions against `.claude/rules/architecture/*.md` — proposing a change that + contradicts an architectural rule means the proposal is wrong, not the rule. +- The `/pr-review` skill is for evaluating a finished change; this skill is for finding things to + change. They are complementary — don't duplicate its output here. +- The `/simplify` skill operates on _changed_ code; this skill operates on a _scope_ the user picks. + Use simplify after editing; use cleanup as a periodic pass. diff --git a/.claude/skills/cleanup/conventions.md b/.claude/skills/cleanup/conventions.md new file mode 100644 index 000000000..d2c5f8c71 --- /dev/null +++ b/.claude/skills/cleanup/conventions.md @@ -0,0 +1,95 @@ +--- +last-edited: 2026-05-14 +repo: confluentinc/vscode +--- + +# Team Conventions for confluentinc/vscode + +These conventions augment `CLAUDE.md`. Where they conflict, **`CLAUDE.md` wins** — codified rules +exist to keep AI agents out of risky territory where a human might reasonably navigate nuance. Edit +this file directly when conventions shift; rerun `/learn-conventions` periodically to surface +patterns this doc may have missed. + +## High-confidence rules + +- Prefer explicit, simple code over clever abstraction. Pull out helper layers when they hurt + readability rather than building them up. +- Feature-scoped subdirs in `src/` over creating new top-level modules. Helpers live alongside the + feature they support. + +## Style + +- Run Prettier locally — unformatted code is treated as a process smell. +- Don't over-shorten code into one-liners just for line count if it obscures intent. +- Side effects inside conditionals are surprising. Prefer separating `get()` from `reveal()` (or + equivalent), even if it costs a line. + +## Testing + +- Mocha tests must be isolated. Rely on `globalBeforeAll` to activate the extension rather than + assuming shared state from prior tests. +- Split real unit tests from integration-ish tests. Don't mix them in one window/environment. +- Fully stub external/API behavior in unit tests. Unstubbed loader/API calls cause noise and + nondeterminism. +- Keep Playwright Page Object Models lean. Overbuilt POMs hurt clarity and should be ripped out when + they do. +- Prefer explicit Playwright waits/assertions close to the behavior under test, even if verbose, + over hidden helper magic. +- Use shared Playwright fixtures (e.g. coverage config) — don't duplicate test infrastructure. +- E2E coverage should hit realistic broad user flows first, not deep coverage of one UI area. +- E2E tests should verify visible webview UI affordances, not just backend effects. +- Use real-enough fixtures when behavior depends on realistic validation. +- `Promise.all` wrapping a Playwright load-state assertion can be a trap — the load state often + resolves immediately. Awaiting the triggering command directly often gives better timing. +- Don't put `testInfo.skip()` inside `expect().toPass()`. `toPass` retries the throw and defeats the + skip. + +## Error handling & observability + +- Surface the actual underlying cause to users, not a generic "check credentials or network" + fallback. +- Validate user-entered config (file paths, etc.) locally before sending to sidecar. Give an + explicit corrective message when the value is wrong. +- Log level reflects impact: loader fetch failures are warnings; sidecar transport problems are + errors. +- Use `logError()` over `logger.error()` and `showErrorNotificationWithButtons()` over + `window.showErrorMessage()` — corroborates CLAUDE.md. + +## Architecture & module organization + +- Centralize lifecycle/dispose wiring in a central listener layer. Don't bury extra listeners inside + the managed class unless necessary. +- For singleton-ish managers, `dispose()` should also null out the instance so it can be recreated + cleanly. +- Don't spread enable/disable hooks across many components — keep lifecycle gates near + initialization. +- Resource/provider internals should maintain enough state for a top-level view refresh to be cheap + and deterministic (cached repaint, no refetch). +- Prefer targeted fixes over broad central changes when centralization causes weird build or test + behavior. + +## Project-specific patterns + +- Use `SidecarHandle` when sidecar is the natural integration boundary. +- Don't force everything through sidecar. Direct API client modules in a feature subdir are accepted + (e.g. Docker engine API, scaffolding service). +- When a problem spans direct + CCloud paths and there's no single sidecar codepath to fix, fix it + in the extension layer. + +## Anti-patterns to avoid + +- Overbuilding Page Object Models. +- Mixing unit and integration tests in one Mocha environment. +- Hiding important waits inside opaque helpers. +- Creating new top-level `src/` modules per feature. +- Forcing sidecar usage when a direct API call is clearer. +- `git push --force` without `--force-with-lease` on automation branches. + +## External references + +- [Playwright actionability / assertions](https://playwright.dev/docs/actionability#assertions) — + baseline reference for E2E style decisions. +- [Rollwright coverage docs](https://unknownprinciple.github.io/rollwright/coverage.html) — webview + test coverage setup. +- [`vscode-test-playwright`](https://www.npmjs.com/package/vscode-test-playwright) — basis for the + E2E infrastructure used here. diff --git a/.gitignore b/.gitignore index 99d6af95e..4995eef9a 100644 --- a/.gitignore +++ b/.gitignore @@ -127,3 +127,6 @@ bin/ide-sidecar-*-runner *.pem playwright-report/ + +# Internal-only Claude skills (kept locally; not pushed) +.claude/skills/learn-conventions/