feat(hermit): per-fleet domain-brainstorm skills #76

@hermit-scribe

Context

Surfaced 2026-05-14 from an interactive operator discussion while improving the existing /claude-code-hermit:capability-brainstorm skill. The operator asked whether capability-brainstorm currently surfaces domain-feature ideas (e.g. "your test suite is slow", "5 motion sensors but only 1 is wired into automations"), or only hermit-side capability ideas (new skills, agents, routines).

It only surfaces hermit-side ideas. The skill reads project context (memory, available MCPs and channels, recent compiled artifacts, README + CLAUDE.md + top-level ls) for grounding, but the ideas it emits are pinned to category: capability, which /proposal-create defines as "new agent, skill, or heartbeat item." Domain-feature ideas grounded in actual project state never make it out.

After a short brainstorm of what each fleet hermit could surface if the scope were widened, the operator confirmed the appetite for this and asked for it captured as a proposal.

Problem

Operators have useful domain-feature ideas — concrete, observable from the data the hermit already touches — that go unsurfaced. Examples per fleet:

  • dev-hermit: "tests cover lib/ at 94% but cli/ has 0 tests; CI green is misleading"; "db.test.js is 38s of your 45s test runtime"; "README mentions feature X but no module implements it"
  • homeassistant-hermit: "you have a bom_dia script but no boa_noite equivalent"; "5 motion sensors exist; only 1 is wired into automations"; "persiana scripts cover bedroom but not the office janelão"
  • fitness-hermit: "12 cardio sessions, 1 strength session in last 3 weeks — imbalance vs stated goal"; "sleep is in your goals but no sleep entry in 17 days"; "same workout repeated 4 weeks running"

The current capability-brainstorm could be widened to emit both classes, but that runs into three real problems:

  1. Reading-depth mismatch. Capability-brainstorm's scan is sketch-level by design (top 15 lines of compiled, README, top-level ls) — fine for "what skill could exist?" but too thin for "what's wrong with your auth code?" Forcing it deeper inflates token cost on every invocation, including the ones that only want hermit-side ideas.
  2. Kill-criteria signal degradation. Capability-brainstorm's self-audit retires the skill if triage-survival drops below 25% or PROP-acceptance below 30%. A 25% rate mixing capability-class and domain-class ideas is uninterpretable — you can't tell which class is dying, and tuning the prompt to lift one class may sink the other.
  3. Per-fleet context lives in the fleet. Fleet plugins already have domain reading built in (HA's entity registry MCP, dev-hermit's codebase scanners, fitness-hermit's activity log shape). Core has none of this and shouldn't grow it.
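
The arithmetic behind problem 2 can be made concrete. The sketch below uses made-up counts (the numbers and the survival_rate helper are illustrative assumptions, not measurements from any hermit) to show how a blended rate can sit at the 25% floor while one class is healthy and the other is dead:

```python
def survival_rate(survived: int, emitted: int) -> float:
    """Fraction of emitted ideas that survived triage (0.0 when nothing was emitted)."""
    return survived / emitted if emitted else 0.0


# Hypothetical counts, for illustration only.
capability = {"emitted": 8, "survived": 4}  # healthy on its own
domain = {"emitted": 8, "survived": 0}      # dead on its own

capability_rate = survival_rate(capability["survived"], capability["emitted"])
domain_rate = survival_rate(domain["survived"], domain["emitted"])
blended = survival_rate(
    capability["survived"] + domain["survived"],
    capability["emitted"] + domain["emitted"],
)

KILL_THRESHOLD = 0.25  # triage-survival floor quoted in the text above

print(f"capability: {capability_rate:.0%}")
print(f"domain:     {domain_rate:.0%}")
print(f"blended:    {blended:.0%} (threshold {KILL_THRESHOLD:.0%})")
```

The blended figure alone cannot distinguish this case from two classes that each survive at 25%, which is the interpretability problem a separate Evidence Source is meant to avoid.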

Proposed Solution

Add a domain-brainstorm skill to each fleet plugin, scoped to that fleet's domain. Uniform name across fleets (domain-brainstorm) since the domain is implied by the hermit context — operators don't need to remember dev-brainstorm vs ha-brainstorm vs fitness-brainstorm.

Skill structure (mirror capability-brainstorm's shape):

  • On-demand only; never autonomous. Triggers on phrases like "what should I be fixing?", "anything wrong with X?", "brainstorm features".
  • Cap 2 ideas per invocation. Same concrete-friction + ≥2-named-grounding-items gates.
  • Single-pass through the /proposal-create → proposal-triage pipeline.
  • Output PROPs land in the standard .claude-code-hermit/proposals/ stream (operator-confirmed) so review stays unified.
  • Each fleet's skill carries its own kill-criteria threshold tuned to that domain — survival rates won't be comparable across fleets.
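
The cap and the two gates listed above could be expressed roughly as follows. This is a sketch under assumptions: the Idea type, its field names, and both function names are hypothetical, and a real skill would most likely state these gates as prompt instructions, not as code.

```python
from dataclasses import dataclass, field

MAX_IDEAS_PER_INVOCATION = 2  # "cap 2 ideas per invocation"
MIN_GROUNDING_ITEMS = 2       # ">=2 named grounding items" gate


@dataclass
class Idea:
    """Hypothetical shape for a brainstormed idea."""
    title: str
    friction: str  # the concrete friction observed; empty if none was found
    grounding: list[str] = field(default_factory=list)  # named files, entities, log entries


def passes_gates(idea: Idea) -> bool:
    """True when the idea names a concrete friction and enough distinct grounding items."""
    has_friction = bool(idea.friction.strip())
    named = {g.strip() for g in idea.grounding if g.strip()}
    return has_friction and len(named) >= MIN_GROUNDING_ITEMS


def select_ideas(candidates: list[Idea]) -> list[Idea]:
    """Keep gate-passing ideas in their original order, capped per invocation."""
    return [i for i in candidates if passes_gates(i)][:MAX_IDEAS_PER_INVOCATION]


candidates = [
    Idea("Split db.test.js", "db.test.js dominates test runtime", ["db.test.js", "package.json"]),
    Idea("Vague cleanup", "", ["README.md", "lib/"]),          # no concrete friction
    Idea("Cover cli/", "cli/ has no tests", ["cli/"]),         # only one grounding item
    Idea("Fix README drift", "README mentions feature X", ["README.md", "lib/"]),
    Idea("Third valid idea", "some friction", ["a.js", "b.js"]),  # dropped by the cap
]
selected = select_ideas(candidates)
print([i.title for i in selected])
```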

Per-fleet reading depth (where the value is):

  • dev-hermit: recent git activity (last 50 commits, churned files), test runner output (slowest suites, coverage gaps), package.json/Cargo.toml/pyproject delta vs lockfile, README ↔ source drift
  • homeassistant-hermit: entity registry (existing, unused, recently added), automation/script files (asymmetries, coverage gaps), recent event log shape
  • fitness-hermit: recent activity log entries, workout-type distribution, stated goals from operator profile, sleep/macro/cardio gaps
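
As one illustration of the dev-hermit reading depth, the "slowest suites" signal reduces to a share-of-runtime calculation. The sketch assumes suite durations have already been parsed from the test runner's output into a dict. That input shape, the slowest_suite helper, and the two smaller suite names are assumptions; the db.test.js figure mirrors the 38s-of-45s example from the Problem section.

```python
def slowest_suite(durations: dict[str, float]) -> tuple[str, float]:
    """Return (suite name, share of total runtime) for the slowest suite."""
    if not durations:
        raise ValueError("no suite durations supplied")
    total = sum(durations.values())
    name = max(durations, key=durations.get)
    share = durations[name] / total if total else 0.0
    return name, share


# Hypothetical per-suite durations in seconds, totalling 45s.
durations = {"db.test.js": 38.0, "api.test.js": 4.5, "cli.test.js": 2.5}

name, share = slowest_suite(durations)
print(f"{name} accounts for {share:.0%} of total test runtime")
```

A grounded idea would then cite both the suite name and the measured share, which satisfies the two-named-grounding-items gate without reading the test source itself.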

Pipeline plumbing (small core changes):

  • Add domain-brainstorm as a recognized Evidence Source in proposal-triage (bypass recurrence like capability-brainstorm, since the brainstorm pass establishes the candidate).
  • Either add a new category: domain-feature in /proposal-create for filterability, or reuse improvement and rely on tags. The category route is preferred because it gives operators a clean filter in /proposal-list.
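
The two plumbing changes might look like this in outline. Everything here is hypothetical: the issue does not show how proposal-triage or /proposal-list are implemented, so the set name, function names, and proposal dict shape are assumptions made only to illustrate the intended behavior (brainstorm-sourced candidates bypass the recurrence check, and a dedicated category gives a clean filter).

```python
# Assumed: evidence sources whose candidates are established by a brainstorm pass.
BRAINSTORM_SOURCES = {"capability-brainstorm", "domain-brainstorm"}


def requires_recurrence(evidence_source: str) -> bool:
    """Brainstorm-sourced candidates skip the recurrence check; other sources keep it."""
    return evidence_source not in BRAINSTORM_SOURCES


def filter_by_category(proposals: list[dict], category: str) -> list[dict]:
    """Return proposals whose 'category' field matches exactly."""
    return [p for p in proposals if p.get("category") == category]


# Hypothetical proposal records; the IDs are placeholders, not real PROPs.
proposals = [
    {"id": "PROP-A", "category": "capability", "source": "capability-brainstorm"},
    {"id": "PROP-B", "category": "domain-feature", "source": "domain-brainstorm"},
    {"id": "PROP-C", "category": "improvement", "source": "operator-request"},
]

domain_only = filter_by_category(proposals, "domain-feature")
print([p["id"] for p in domain_only])
print({p["id"]: requires_recurrence(p["source"]) for p in proposals})
```

With the tag-based alternative, the same filter would need to inspect a list of tags on each improvement proposal, which is the extra step the category route avoids.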

Implementation priority (operator-confirmed): dev > HA > fitness. Ship dev's first, observe for a few weeks of real use, then HA, then fitness.

Impact

Effort. Roughly 1 skill per fleet plugin (3 total at full rollout), each ~100 lines mirroring capability-brainstorm's shape. Core changes are minor: extend proposal-triage's Evidence Source list, optionally add a category value. Estimated at days per fleet for the first ship, and a day per fleet for subsequent ones once the pattern stabilizes.

Benefit. Surfaces a class of ideas the hermit currently watches but can't act on. The operator's "things I should fix but forgot to ask about" category gets a structured intake. Per-fleet scoping keeps each skill's reading depth (and token cost) calibrated to where the signal is.

Risk. Domain ideas can be lower-quality than capability ideas because deeper reading is still token-bounded — a sketch read of a 50k-LOC repo will miss most real problems. Mitigated by per-fleet kill criteria (retire if survival/acceptance drops) and the cap-of-2 discipline. Worst case: a fleet's domain-brainstorm produces noise, fails its self-audit, and gets retired — non-destructive failure mode.

Open design questions (decide during implementation, not blockers):

  • New Evidence Source: domain-brainstorm vs reusing capability-brainstorm: leaning new, for distinct kill-criteria interpretation.
  • New category: domain-feature vs reusing improvement: leaning new, for filterability.
  • Whether the dev-hermit version should accept a --depth=light|deep parameter for token-budget control, or pick one heuristic.
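
If the --depth parameter is adopted, its shape could be as simple as a two-value choice mapped to a reading budget. The sketch below uses argparse purely to illustrate the parameter. How a skill actually receives arguments is not specified in this issue, and the budget table is an assumption: only the 50-commit figure comes from the reading-depth list above, while the "light" value and the default are invented for illustration.

```python
import argparse

# Assumed budgets: "deep" mirrors the "last 50 commits" figure; "light" is invented.
DEPTH_COMMIT_WINDOW = {"light": 10, "deep": 50}


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the hypothetical --depth flag, defaulting to the cheaper setting."""
    parser = argparse.ArgumentParser(prog="domain-brainstorm")
    parser.add_argument(
        "--depth",
        choices=sorted(DEPTH_COMMIT_WINDOW),
        default="light",
        help="token-budget control: how much project state to read",
    )
    return parser.parse_args(argv)


args = parse_args(["--depth=deep"])
commit_window = DEPTH_COMMIT_WINDOW[args.depth]
print(f"depth={args.depth}, reading last {commit_window} commits")
```

The single-heuristic alternative would drop the flag and hard-code one budget, trading operator control for a simpler skill.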

Filed via hermit-scribe · proposal=PROP-022 · session=null
