feat(hermit): per-fleet domain-brainstorm skills

## Context

Surfaced 2026-05-14 from an interactive operator discussion while improving the existing `/claude-code-hermit:capability-brainstorm` skill. The operator asked whether capability-brainstorm currently surfaces domain-feature ideas (e.g. "your test suite is slow", "5 motion sensors but only 1 is wired into automations"), or only hermit-side capability ideas (new skills, agents, routines).

It only surfaces hermit-side ideas. The skill reads project context (memory, available MCPs and channels, recent compiled artifacts, README + CLAUDE.md + top-level `ls`) for **grounding**, but the ideas it emits are pinned to `category: capability`, which `/proposal-create` defines as "new agent, skill, or heartbeat item." Domain-feature ideas grounded in actual project state never make it out.

After a short brainstorm of what each fleet hermit *could* surface if the scope were widened, the operator confirmed the appetite for this and asked for it captured as a proposal.

## Problem

Operators have useful domain-feature ideas — concrete, observable from the data the hermit already touches — that go unsurfaced. Examples per fleet:

- **dev-hermit:** "tests cover `lib/` at 94% but `cli/` has 0 tests; CI green is misleading"; "`db.test.js` is 38s of your 45s test runtime"; "README mentions feature X but no module implements it"
- **homeassistant-hermit:** "you have a `bom_dia` script but no `boa_noite` equivalent"; "5 motion sensors exist; only 1 is wired into automations"; "persiana scripts cover bedroom but not the office janelão"
- **fitness-hermit:** "12 cardio sessions, 1 strength session in last 3 weeks — imbalance vs stated goal"; "sleep is in your goals but no sleep entry in 17 days"; "same workout repeated 4 weeks running"

The current `capability-brainstorm` could be widened to emit both classes, but that runs into three real problems:

1. **Reading-depth mismatch.** Capability-brainstorm's scan is sketch-level by design (top 15 lines of compiled artifacts, README, top-level `ls`). That is fine for "what skill could exist?" but too thin for "what's wrong with your auth code?" Forcing it deeper inflates token cost on every invocation, including the ones that only want hermit-side ideas.
2. **Kill-criteria signal degradation.** Capability-brainstorm's self-audit retires the skill if triage-survival drops below 25% or PROP-acceptance below 30%. A 25% rate mixing capability-class and domain-class ideas is uninterpretable — you can't tell which class is dying, and tuning the prompt to lift one class may sink the other.
3. **Per-fleet context lives in the fleet.** Fleet plugins already have domain reading built in (HA's entity registry MCP, dev-hermit's codebase scanners, fitness-hermit's activity log shape). Core has none of this and shouldn't grow it.
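
To make the second point concrete, here is a minimal sketch using invented counts (the `Tally` type and every number in it are illustrative, not taken from a real self-audit). It shows a pooled triage-survival rate sitting exactly at the 25% floor while one class is healthy and the other is nearly dead.

```python
from dataclasses import dataclass

TRIAGE_SURVIVAL_FLOOR = 0.25  # threshold quoted above for capability-brainstorm


@dataclass
class Tally:
    emitted: int   # ideas the skill produced
    survived: int  # ideas that survived proposal-triage

    @property
    def survival(self) -> float:
        return self.survived / self.emitted if self.emitted else 0.0


def pooled(*tallies: Tally) -> Tally:
    """Merge per-class tallies into the single rate a widened skill would report."""
    return Tally(
        emitted=sum(t.emitted for t in tallies),
        survived=sum(t.survived for t in tallies),
    )


# Hypothetical counts, for illustration only.
capability = Tally(emitted=20, survived=9)  # 9/20 survive
domain = Tally(emitted=20, survived=1)      # 1/20 survive
mixed = pooled(capability, domain)          # 10/40 survive

# The pooled rate is not below the floor, so the audit would not fire,
# even though the domain class alone is far under it.
audit_fires_on_mixed = mixed.survival < TRIAGE_SURVIVAL_FLOOR
audit_fires_on_domain = domain.survival < TRIAGE_SURVIVAL_FLOOR
```

With separate skills, each tally feeds its own audit, so the failing class is visible and can be tuned or retired on its own.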

## Proposed Solution

Add a `domain-brainstorm` skill to each fleet plugin, scoped to that fleet's domain. Uniform name across fleets (`domain-brainstorm`) since the domain is implied by the hermit context — operators don't need to remember `dev-brainstorm` vs `ha-brainstorm` vs `fitness-brainstorm`.

**Skill structure** (mirror `capability-brainstorm`'s shape):
- On-demand only; never autonomous. Triggers on phrases like "what should I be fixing?", "anything wrong with X?", "brainstorm features".
- Cap 2 ideas per invocation. Same concrete-friction + ≥2-named-grounding-items gates.
- Single pass through the `/proposal-create` → `proposal-triage` pipeline.
- Output PROPs land in the standard `.claude-code-hermit/proposals/` stream (operator-confirmed) so review stays unified.
- Each fleet's skill carries its own kill-criteria threshold tuned to that domain — survival rates won't be comparable across fleets.
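
The two gates and the cap in the list above could be expressed roughly as follows. This is a sketch under assumptions: the `Idea` shape, its field names, and the helper names are hypothetical, and a real skill would apply these rules in its prompt rather than in Python.

```python
from dataclasses import dataclass, field

MAX_IDEAS_PER_INVOCATION = 2  # "Cap 2 ideas per invocation"
MIN_GROUNDING_ITEMS = 2       # ">=2 named grounding items"


@dataclass
class Idea:
    summary: str
    friction: str  # the concrete friction addressed; empty string if none
    grounding: list[str] = field(default_factory=list)  # named items actually observed


def passes_gates(idea: Idea) -> bool:
    """An idea needs a concrete friction and at least two distinct named grounding items."""
    named = {g.strip() for g in idea.grounding if g.strip()}
    return bool(idea.friction.strip()) and len(named) >= MIN_GROUNDING_ITEMS


def select_ideas(candidates: list[Idea]) -> list[Idea]:
    """Keep candidates that pass both gates, then apply the per-invocation cap."""
    return [idea for idea in candidates if passes_gates(idea)][:MAX_IDEAS_PER_INVOCATION]


candidates = [
    Idea("Add a boa_noite script", "no evening counterpart to bom_dia", ["bom_dia", "script files"]),
    Idea("Make things better", "", ["README"]),  # fails both gates
    Idea("Wire unused motion sensors", "4 of 5 sensors idle", ["entity registry", "automations"]),
    Idea("Cover the office blinds", "office not covered", ["persiana scripts", "office"]),
]
selected = select_ideas(candidates)
```

Three candidates pass the gates here, and the cap trims the result to the first two.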

**Per-fleet reading depth** (where the value is):
- *dev-hermit:* recent git activity (last 50 commits, churned files), test runner output (slowest suites, coverage gaps), package.json/Cargo.toml/pyproject delta vs lockfile, README ↔ source drift
- *homeassistant-hermit:* entity registry (existing, unused, recently added), automation/script files (asymmetries, coverage gaps), recent event log shape
- *fitness-hermit:* recent activity log entries, workout-type distribution, stated goals from operator profile, sleep/macro/cardio gaps
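
As one illustration of what "reading depth" means for a single fleet, the sketch below computes a workout-type distribution over a trailing window, the kind of signal behind the "12 cardio sessions, 1 strength session" example. The log entry shape (`date` and `type` keys) and the function names are assumptions, since the real fitness-hermit activity log format is not described here.

```python
from collections import Counter
from datetime import date, timedelta


def workout_distribution(entries, today, window_days=21):
    """Count sessions per workout type within the trailing window."""
    cutoff = today - timedelta(days=window_days)
    return Counter(e["type"] for e in entries if cutoff < e["date"] <= today)


def dominant_share(dist):
    """Return (type, share) for the most frequent type, or None if there are no sessions."""
    total = sum(dist.values())
    if total == 0:
        return None
    kind, count = dist.most_common(1)[0]
    return kind, count / total


# Hypothetical log: 12 cardio sessions and 1 strength session in the last 3 weeks,
# plus one older strength session that falls outside the window.
today = date(2026, 5, 14)
entries = [{"date": today - timedelta(days=d), "type": "cardio"} for d in range(1, 13)]
entries.append({"date": today - timedelta(days=5), "type": "strength"})
entries.append({"date": today - timedelta(days=40), "type": "strength"})

dist = workout_distribution(entries, today)
top = dominant_share(dist)
```

Whether a given share counts as an imbalance depends on the operator's stated goal, so the comparison against goals is deliberately left out of the sketch.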

**Pipeline plumbing** (small core changes):
- Add `domain-brainstorm` as a recognized `Evidence Source` in `proposal-triage` (bypass recurrence like `capability-brainstorm`, since the brainstorm pass establishes the candidate).
- Either add a new `category: domain-feature` in `/proposal-create` for filterability, or reuse `improvement` and rely on tags. The category route is preferred because it gives operators a clean filter in `/proposal-list`.
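
A rough sketch of the two plumbing changes, assuming triage decides on recurrence from the evidence source string. The constant and function names are hypothetical, `session-observation` is an invented non-brainstorm source, and the category set lists only the values named in this proposal, not the full set `/proposal-create` accepts.

```python
# Sources whose brainstorm pass already establishes the candidate,
# so triage would skip the recurrence requirement for them.
BRAINSTORM_SOURCES = frozenset({"capability-brainstorm", "domain-brainstorm"})

# Only the categories mentioned in this proposal; the real list may be longer.
KNOWN_CATEGORIES = frozenset({"capability", "improvement", "domain-feature"})


def needs_recurrence_check(evidence_source: str) -> bool:
    """Return True when triage should still require recurrence for this source."""
    return evidence_source not in BRAINSTORM_SOURCES


def filter_by_category(proposals: list[dict], category: str) -> list[dict]:
    """The kind of filter a dedicated category would enable in a proposal listing."""
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return [p for p in proposals if p.get("category") == category]


proposals = [
    {"id": "PROP-A", "category": "capability"},
    {"id": "PROP-B", "category": "domain-feature"},
    {"id": "PROP-C", "category": "improvement"},
]
domain_only = filter_by_category(proposals, "domain-feature")
```

Reusing `improvement` with tags would need a second match on tag contents to get the same listing, which is the filterability cost the category route avoids.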

**Implementation priority** (operator-confirmed): dev > HA > fitness. Ship dev's first, observe for a few weeks of real use, then HA, then fitness.

## Impact

**Effort.** Roughly 1 skill per fleet plugin (3 total at full rollout), each ~100 lines mirroring `capability-brainstorm`'s shape. Core changes are minor: extend `proposal-triage`'s Evidence Source list and, optionally, add a `category` value. Estimated effort is `days` for the first fleet skill to ship and `day` for each subsequent fleet once the pattern stabilizes.

**Benefit.** Surfaces a class of ideas the hermit currently watches but can't act on. The operator's "things I should fix but forgot to ask about" category gets a structured intake. Per-fleet scoping keeps each skill's reading depth (and token cost) calibrated to where the signal is.

**Risk.** Domain ideas can be lower-quality than capability ideas because deeper reading is still token-bounded — a sketch read of a 50k-LOC repo will miss most real problems. Mitigated by per-fleet kill criteria (retire if survival/acceptance drops) and the cap-of-2 discipline. Worst case: a fleet's domain-brainstorm produces noise, fails its self-audit, and gets retired — non-destructive failure mode.

**Open design questions** (decide during implementation, not blockers):
- New `Evidence Source: domain-brainstorm` vs reusing `capability-brainstorm`: leaning new, for distinct kill-criteria interpretation.
- New `category: domain-feature` vs reusing `improvement`: leaning new, for filterability.
- Whether the dev-hermit version should accept a `--depth=light|deep` parameter for token-budget control, or pick one heuristic.
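
If the `--depth` route is chosen, the parameter could map to a reading budget along these lines. This is only a sketch: it models the option with Python's `argparse` for clarity, the default of `light` is an assumption, and the budget values are hypothetical (50 echoes the "last 50 commits" figure above, 10 is invented).

```python
import argparse

# Hypothetical commit-history budgets per depth setting.
DEPTH_COMMIT_BUDGET = {"light": 10, "deep": 50}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="domain-brainstorm")
    parser.add_argument(
        "--depth",
        choices=tuple(DEPTH_COMMIT_BUDGET),
        default="light",  # assumed default: cheaper reads unless asked otherwise
        help="how much project state to read before brainstorming",
    )
    return parser


def commit_budget(argv: list[str]) -> int:
    """Resolve the number of recent commits to scan from the arguments."""
    args = build_parser().parse_args(argv)
    return DEPTH_COMMIT_BUDGET[args.depth]
```

For example, `commit_budget(["--depth=deep"])` resolves to the larger budget, and an empty argument list falls back to the light one. The single-heuristic alternative would drop the flag and fix one budget.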

---
*Filed via hermit-scribe · proposal=PROP-022 · session=null*
