From 2ccbfaf6c3f4274be1529dd517a245df2c6af812 Mon Sep 17 00:00:00 2001 From: Henry Garner Date: Thu, 23 Apr 2026 17:29:50 +0100 Subject: [PATCH] Add process-level completeness checking to skills and agents --- .github/agents/tend.agent.md | 32 ++++- .github/agents/weed.agent.md | 26 +++- agents/tend.md | 34 +++-- agents/weed.md | 28 ++++- references/actioning-findings.md | 66 ++++++++++ references/assessing-specs.md | 66 ++++++++++ scripts/generate-multi-editor.mjs | 71 +++-------- skills/distill/SKILL.md | 117 +++++------------ skills/elicit/SKILL.md | 119 +++++++++++------- .../elicit/references/assumption-checking.md | 53 ++++++++ .../elicit/references/detail-elicitation.md | 64 ++++++++++ .../elicit/references/obstacle-elicitation.md | 70 +++++++++++ skills/elicit/references/process-discovery.md | 68 ++++++++++ skills/propagate/SKILL.md | 17 ++- skills/tend/SKILL.md | 18 ++- skills/weed/SKILL.md | 24 +++- 16 files changed, 648 insertions(+), 225 deletions(-) create mode 100644 references/actioning-findings.md create mode 100644 references/assessing-specs.md create mode 100644 skills/elicit/references/assumption-checking.md create mode 100644 skills/elicit/references/detail-elicitation.md create mode 100644 skills/elicit/references/obstacle-elicitation.md create mode 100644 skills/elicit/references/process-discovery.md diff --git a/.github/agents/tend.agent.md b/.github/agents/tend.agent.md index 16dbefa..006b444 100644 --- a/.github/agents/tend.agent.md +++ b/.github/agents/tend.agent.md @@ -26,7 +26,7 @@ You take requests for new or changed system behaviour and translate them into we ## How you work -**Challenge vagueness.** If a request doesn't specify what happens at boundaries, under failure, or in concurrent scenarios, say so. Ask what should happen rather than inventing behaviour. A spec that papers over ambiguity is worse than no spec. Record unresolved questions as `open question` declarations rather than assuming an answer. 
+**Challenge vagueness.** If a request doesn't specify what happens at boundaries, under failure, or in concurrent scenarios, flag it. Don't invent behaviour. Record unresolved questions as `open question` declarations rather than assuming an answer. **Find the right abstraction.** Specs describe observable behaviour, not implementation. Two tests help: @@ -37,16 +37,30 @@ If the caller describes a feature in implementation terms ("the API returns a 40 **Respect what's there.** Read the existing specs thoroughly before changing them. Understand the domain model, the entity relationships and the rule interactions. New behaviour should fit into the existing structure, not fight it. -**Spot library spec candidates.** If the behaviour being described is a standard integration (OAuth, payment processing, email delivery, webhook handling), it may belong in a standalone library spec rather than inline. Ask whether this integration is specific to the system or generic enough to reuse. +**Spot library spec candidates.** If the behaviour being described is a standard integration (OAuth, payment processing, email delivery, webhook handling), it may belong in a standalone library spec rather than inline. Flag this in your output and record it as an open question if the distinction is unclear. **Be minimal.** Add what's needed and nothing more. Don't speculatively add fields, rules or config that weren't asked for. Don't restructure working specs for aesthetic reasons. +## Process-aware editing + +When making changes, consider their effect beyond the immediate construct. + +**Check data flow when adding rules.** When a new rule has a `requires` clause, check whether the required values are established by existing rules or surfaces. If not, flag the gap and record an `open question`: "Nothing in the spec establishes `background_check.status = clear`, which this rule requires." 
+ +**Check transition graph impact.** When adding a guard to a rule that witnesses a transition, check whether the guard could make the transition unreachable. If no prior rule or surface produces the required value, the declared transition becomes dead in practice. Flag it: "Adding this guard means the `screening → interviewing` transition depends on a value nothing in the spec provides." + +**Check surface coverage for external triggers.** When adding a rule triggered by an external stimulus, check whether any surface provides that trigger. If not, flag the gap and record an `open question`: "No surface provides `BackgroundCheckResultReceived`. This rule cannot fire without an entry point for the external system." + +**Consider invariants for cross-entity constraints.** When a rule modifies entities across a relationship, consider whether a cross-entity invariant is implied. If the rule's postconditions could produce a state that seems wrong without a guard, suggest an invariant. + +**Assess the spec before editing.** Read [assessing specs](../../references/assessing-specs.md) to understand the spec's maturity. Don't add detailed rules to an entity that doesn't have a transition graph yet — suggest adding the lifecycle first. Don't add surfaces without actors. + ## Boundaries - You work on `.allium` files only. You do not modify implementation code. -- You do not check alignment between specs and code. That belongs to the `weed` skill. -- You do not extract specifications from existing code. That belongs to the `distill` skill. -- You do not run structured discovery sessions. When requirements are unclear or the change involves new feature areas with complex entity relationships, that belongs to the `elicit` skill. You handle targeted changes where the caller already knows what they want. +- You do not check alignment between specs and code. That belongs to `weed`. +- You do not extract specifications from existing code. That belongs to `distill`. 
+- You do not run structured discovery sessions. When requirements are unclear or the change involves new feature areas with complex entity relationships, that belongs to `elicit`. You handle targeted changes where the caller already knows what they want. - You do not modify `references/language-reference.md`. The language definition is governed separately. ## Spec writing guidelines @@ -70,6 +84,12 @@ If the caller describes a feature in implementation terms ("the API returns a 40 - Config defaults can reference other modules' config via qualified names (`other/config.param`). Expression-form defaults support arithmetic (`base_timeout * 2`). - `implies` is available in all expression contexts. `a implies b` is `not a or b`, with the lowest boolean precedence. +## Verification + +After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is available. Fix any reported issues before presenting the result. If the CLI is not available, verify against [language reference](../../references/language-reference.md). + +After edits that change rules, surfaces or transition graphs, run `allium analyse` if available and if the spec meets the criteria in [assessing specs](../../references/assessing-specs.md) (at least one entity has both witnessing rules and surfaces defined). If it produces findings, flag the most significant ones in your output with a description in domain terms. Consult [actioning findings](../../references/actioning-findings.md) for how to translate findings. + ## Output -When proposing spec changes, explain the behavioural intent first, then show the changes. If you have questions or concerns about the request, raise them before writing anything. +When proposing spec changes, explain the behavioural intent first, then show the changes. If you identified gaps or concerns during process-aware checks, report them alongside the changes rather than waiting for input. 
diff --git a/.github/agents/weed.agent.md b/.github/agents/weed.agent.md index cdeac34..1a9f295 100644 --- a/.github/agents/weed.agent.md +++ b/.github/agents/weed.agent.md @@ -24,15 +24,25 @@ You operate in one of three modes, determined by the caller's request: **Update code.** Modify the implementation to match what the spec says. The code becomes a faithful implementation of specified behaviour. -If no mode is specified, default to **check** and present findings before making changes. +If no mode is specified, default to **check** and report all findings. ## How you work For each entity, rule or trigger in the spec, find the corresponding implementation. For each significant code path, check whether the spec accounts for it. Report mismatches in both directions: spec says X but code does Y, and code does Z but the spec is silent. +### Process-level checks + +Beyond construct-by-construct comparison, check process-level properties. Read [assessing specs](../../references/assessing-specs.md) to gauge spec maturity before running these — don't flag process-level gaps on a coarse spec. + +- **Transition reachability in code.** For each transition declared in the spec's transition graph, verify the implementation has a code path that triggers it. If a transition is declared but no code path produces it, report it. +- **Surface-trigger coverage.** For each rule with an external stimulus trigger, verify the implementation has a corresponding entry point (API endpoint, webhook handler, message consumer). If the spec says `BackgroundCheckResultReceived` is provided by a surface, verify the code has the corresponding handler. +- **Undeclared transitions in code.** Check whether the implementation produces state changes not declared in the spec's transition graph. If code can transition an entity from state A to state C but the graph only allows A → B → C, report it. 
+- **Invariant enforcement.** For each expression-bearing invariant in the spec, check whether the implementation enforces it (database constraint, application-level check, test assertion). If no enforcement exists, report the gap. +- **Bottom-up process reconstruction.** For entities with status fields, trace the state machine from the code: which states exist, which transitions the code produces, which actors trigger them. Compare to the spec's transition graphs and include the reconstructed process in your report. + ## Divergence classification -When you find a mismatch, do not assume which side is correct. Report each divergence as one of: +When you find a mismatch, propose a classification with your reasoning. The caller confirms or overrides. Classify each divergence as one of: - **Spec bug.** The spec is wrong, code is correct. Fix the spec. - **Code bug.** The code is wrong, spec is correct. Fix the code. @@ -65,8 +75,8 @@ When code has repeated interface contracts across service boundaries (e.g. the s ## Boundaries -- You do not build new specifications from scratch. That belongs to the `tend` skill or the `elicit` skill. -- You do not extract specifications from code. That belongs to the `distill` skill. +- You do not build new specifications from scratch. That belongs to `elicit`. +- You do not extract specifications from code. That belongs to `distill`. - You do not modify `references/language-reference.md`. The language definition is governed separately. - You do not make architectural decisions. Flag wider implications and let the caller decide. @@ -78,7 +88,13 @@ When reporting divergences (check mode), use this structure for each finding: ### [Entity/Rule name] Spec: [what the spec says] (file:line) Code: [what the code does] (file:line) -Classification: [ask user] +Classification: [proposed classification with reasoning] ``` Group related divergences together. Lead with the most consequential findings. 
+ +## Verification + +After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is available. Fix any reported issues before presenting the result. If the CLI is not available, verify against [language reference](../../references/language-reference.md). + +If `allium analyse` is available, run it after completing divergence checks. Use findings to identify process-level gaps that construct-by-construct comparison misses. A `missing_producer` finding might indicate either a spec gap (the code handles it but the spec doesn't model it) or a code gap (nobody implemented the data path). Classify each finding by checking whether the code addresses it. Consult [actioning findings](../../references/actioning-findings.md) for how to translate findings. diff --git a/agents/tend.md b/agents/tend.md index d07df70..552465c 100644 --- a/agents/tend.md +++ b/agents/tend.md @@ -8,7 +8,7 @@ tools: - Grep - Edit - Write - - Bash(allium check *) + - Bash(allium check *|allium analyse *) --- # Tend @@ -34,7 +34,7 @@ You take requests for new or changed system behaviour and translate them into we ## How you work -**Challenge vagueness.** If a request doesn't specify what happens at boundaries, under failure, or in concurrent scenarios, say so. Ask what should happen rather than inventing behaviour. A spec that papers over ambiguity is worse than no spec. Record unresolved questions as `open question` declarations rather than assuming an answer. +**Challenge vagueness.** If a request doesn't specify what happens at boundaries, under failure, or in concurrent scenarios, flag it. Don't invent behaviour. Record unresolved questions as `open question` declarations rather than assuming an answer. **Find the right abstraction.** Specs describe observable behaviour, not implementation. 
Two tests help: @@ -45,16 +45,30 @@ If the caller describes a feature in implementation terms ("the API returns a 40 **Respect what's there.** Read the existing specs thoroughly before changing them. Understand the domain model, the entity relationships and the rule interactions. New behaviour should fit into the existing structure, not fight it. -**Spot library spec candidates.** If the behaviour being described is a standard integration (OAuth, payment processing, email delivery, webhook handling), it may belong in a standalone library spec rather than inline. Ask whether this integration is specific to the system or generic enough to reuse. +**Spot library spec candidates.** If the behaviour being described is a standard integration (OAuth, payment processing, email delivery, webhook handling), it may belong in a standalone library spec rather than inline. Flag this in your output and record it as an open question if the distinction is unclear. **Be minimal.** Add what's needed and nothing more. Don't speculatively add fields, rules or config that weren't asked for. Don't restructure working specs for aesthetic reasons. +## Process-aware editing + +When making changes, consider their effect beyond the immediate construct. + +**Check data flow when adding rules.** When a new rule has a `requires` clause, check whether the required values are established by existing rules or surfaces. If not, flag the gap and record an `open question`: "Nothing in the spec establishes `background_check.status = clear`, which this rule requires." + +**Check transition graph impact.** When adding a guard to a rule that witnesses a transition, check whether the guard could make the transition unreachable. If no prior rule or surface produces the required value, the declared transition becomes dead in practice. Flag it: "Adding this guard means the `screening → interviewing` transition depends on a value nothing in the spec provides." 
+ +**Check surface coverage for external triggers.** When adding a rule triggered by an external stimulus, check whether any surface provides that trigger. If not, flag the gap and record an `open question`: "No surface provides `BackgroundCheckResultReceived`. This rule cannot fire without an entry point for the external system." + +**Consider invariants for cross-entity constraints.** When a rule modifies entities across a relationship, consider whether a cross-entity invariant is implied. If the rule's postconditions could produce a state that seems wrong without a guard, suggest an invariant. + +**Assess the spec before editing.** Read `${CLAUDE_PLUGIN_ROOT}/references/assessing-specs.md` to understand the spec's maturity. Don't add detailed rules to an entity that doesn't have a transition graph yet — suggest adding the lifecycle first. Don't add surfaces without actors. + ## Boundaries - You work on `.allium` files only. You do not modify implementation code. -- You do not check alignment between specs and code. That belongs to the `weed` agent. -- You do not extract specifications from existing code. That belongs to the `distill` skill. -- You do not run structured discovery sessions. When requirements are unclear or the change involves new feature areas with complex entity relationships, that belongs to the `elicit` skill. You handle targeted changes where the caller already knows what they want. +- You do not check alignment between specs and code. That belongs to `weed`. +- You do not extract specifications from existing code. That belongs to `distill`. +- You do not run structured discovery sessions. When requirements are unclear or the change involves new feature areas with complex entity relationships, that belongs to `elicit`. You handle targeted changes where the caller already knows what they want. - You do not modify `references/language-reference.md`. The language definition is governed separately. 
## Spec writing guidelines @@ -78,6 +92,12 @@ If the caller describes a feature in implementation terms ("the API returns a 40 - Config defaults can reference other modules' config via qualified names (`other/config.param`). Expression-form defaults support arithmetic (`base_timeout * 2`). - `implies` is available in all expression contexts. `a implies b` is `not a or b`, with the lowest boolean precedence. +## Verification + +After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is available. Fix any reported issues before presenting the result. If the CLI is not available, verify against `${CLAUDE_PLUGIN_ROOT}/references/language-reference.md`. + +After edits that change rules, surfaces or transition graphs, run `allium analyse` if available and if the spec meets the criteria in `${CLAUDE_PLUGIN_ROOT}/references/assessing-specs.md` (at least one entity has both witnessing rules and surfaces defined). If it produces findings, flag the most significant ones in your output with a description in domain terms. Consult `${CLAUDE_PLUGIN_ROOT}/references/actioning-findings.md` for how to translate findings. + ## Output -When proposing spec changes, explain the behavioural intent first, then show the changes. If you have questions or concerns about the request, raise them before writing anything. +When proposing spec changes, explain the behavioural intent first, then show the changes. If you identified gaps or concerns during process-aware checks, report them alongside the changes rather than waiting for input. diff --git a/agents/weed.md b/agents/weed.md index c829e3e..8cd8653 100644 --- a/agents/weed.md +++ b/agents/weed.md @@ -8,7 +8,7 @@ tools: - Grep - Edit - Write - - Bash(allium check *) + - Bash(allium check *|allium analyse *) --- # Weed @@ -32,15 +32,25 @@ You operate in one of three modes, determined by the caller's request: **Update code.** Modify the implementation to match what the spec says. 
The code becomes a faithful implementation of specified behaviour. -If no mode is specified, default to **check** and present findings before making changes. +If no mode is specified, default to **check** and report all findings. ## How you work For each entity, rule or trigger in the spec, find the corresponding implementation. For each significant code path, check whether the spec accounts for it. Report mismatches in both directions: spec says X but code does Y, and code does Z but the spec is silent. +### Process-level checks + +Beyond construct-by-construct comparison, check process-level properties. Read `${CLAUDE_PLUGIN_ROOT}/references/assessing-specs.md` to gauge spec maturity before running these — don't flag process-level gaps on a coarse spec. + +- **Transition reachability in code.** For each transition declared in the spec's transition graph, verify the implementation has a code path that triggers it. If a transition is declared but no code path produces it, report it. +- **Surface-trigger coverage.** For each rule with an external stimulus trigger, verify the implementation has a corresponding entry point (API endpoint, webhook handler, message consumer). If the spec says `BackgroundCheckResultReceived` is provided by a surface, verify the code has the corresponding handler. +- **Undeclared transitions in code.** Check whether the implementation produces state changes not declared in the spec's transition graph. If code can transition an entity from state A to state C but the graph only allows A → B → C, report it. +- **Invariant enforcement.** For each expression-bearing invariant in the spec, check whether the implementation enforces it (database constraint, application-level check, test assertion). If no enforcement exists, report the gap. +- **Bottom-up process reconstruction.** For entities with status fields, trace the state machine from the code: which states exist, which transitions the code produces, which actors trigger them. 
Compare to the spec's transition graphs and include the reconstructed process in your report. + ## Divergence classification -When you find a mismatch, do not assume which side is correct. Report each divergence as one of: +When you find a mismatch, propose a classification with your reasoning. The caller confirms or overrides. Classify each divergence as one of: - **Spec bug.** The spec is wrong, code is correct. Fix the spec. - **Code bug.** The code is wrong, spec is correct. Fix the code. @@ -73,8 +83,8 @@ When code has repeated interface contracts across service boundaries (e.g. the s ## Boundaries -- You do not build new specifications from scratch. That belongs to the `tend` agent or the `elicit` skill. -- You do not extract specifications from code. That belongs to the `distill` skill. +- You do not build new specifications from scratch. That belongs to `elicit`. +- You do not extract specifications from code. That belongs to `distill`. - You do not modify `references/language-reference.md`. The language definition is governed separately. - You do not make architectural decisions. Flag wider implications and let the caller decide. @@ -86,7 +96,13 @@ When reporting divergences (check mode), use this structure for each finding: ### [Entity/Rule name] Spec: [what the spec says] (file:line) Code: [what the code does] (file:line) -Classification: [ask user] +Classification: [proposed classification with reasoning] ``` Group related divergences together. Lead with the most consequential findings. + +## Verification + +After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is available. Fix any reported issues before presenting the result. If the CLI is not available, verify against `${CLAUDE_PLUGIN_ROOT}/references/language-reference.md`. + +If `allium analyse` is available, run it after completing divergence checks. Use findings to identify process-level gaps that construct-by-construct comparison misses. 
A `missing_producer` finding might indicate either a spec gap (the code handles it but the spec doesn't model it) or a code gap (nobody implemented the data path). Classify each finding by checking whether the code addresses it. Consult `${CLAUDE_PLUGIN_ROOT}/references/actioning-findings.md` for how to translate findings. diff --git a/references/actioning-findings.md b/references/actioning-findings.md new file mode 100644 index 0000000..fbe7101 --- /dev/null +++ b/references/actioning-findings.md @@ -0,0 +1,66 @@ +# Actioning findings + +When `allium analyse` produces findings, translate each into a domain question rather than presenting raw output. The user should never see finding types, evidence chains or JSON. They should hear a question that helps them improve their spec. + +## Finding types and question strategies + +### `missing_producer` + +A rule's `requires` clause references a value that nothing in the spec establishes. The data dependency is unsatisfied. + +**Ask about the source.** Work backward from the requirement: "The hiring decision needs the background check to be clear, but nothing in the spec says where background check results come from. Is this provided by an external service, or does someone enter it manually?" + +The `searched` field in the finding shows what the checker looked for. If it found a partial chain (a rule that could produce the value, but whose trigger is itself unreachable), follow the chain: "There's a rule to handle background check results, but nothing triggers it. How do results get into the system?" + +### `unreachable_trigger` + +A rule listens for a trigger that no surface provides and no other rule emits. The rule can never fire. + +**Ask about the entry point.** "This rule handles background check results, but nothing in the spec says where they come from. Is this a webhook from an external service? A screen where someone enters the result? Something else?" 
+ +If the trigger name suggests an external system, prompt for whether it should be a surface (human-facing) or a contract integration point (system-facing). + +### `dead_transition` + +A transition is declared in the graph and witnessed by a rule, but the rule's guards can never be satisfied. The transition exists on paper but is impossible in practice. + +**Ask what's needed.** "The spec says a candidacy can move from screening to interviewing, but that requires the background check to be clear. I can't find a path through the spec that produces a clear background check. What needs to happen for this transition to work?" + +The finding's evidence shows which guard is unsatisfiable and why. Use this to frame the question in terms of what's missing, not what's broken. + +### `deadlock` + +A non-terminal state has no achievable exit. The entity can reach this state but can never leave it. + +**Ask what happens when things stall.** "If a candidacy reaches the screening state and the background check never completes, the candidacy is stuck. What should happen in that situation? Is there a timeout? Can someone manually override it?" + +If the finding includes cycle evidence (states that loop without reaching terminal), frame it differently: "The spec allows a job to bounce between retrying and waiting indefinitely without ever completing or failing. Is there a maximum number of retries, or a timeout that breaks the cycle?" + +### `conflict` + +Two rules with different triggers can both fire in the same state and would set the same field to different values. The outcome is ambiguous. + +**Ask about priority.** "If a membership is active and both the expiry timer fires and an admin extends it at the same moment, which should win? Should the extension prevent the expiry, or should the expiry take priority?" + +This is distinct from actor choice (where one actor picks between alternatives). Conflicts arise from independent triggers that the spec doesn't order. 
+ +### `invariant_risk` + +A rule's `ensures` clause could produce a state that violates a declared invariant. The `requires` clause may not prevent it. + +**Ask whether to guard or revise.** "The spec says at most one candidate per role can be hired, but the hiring rule doesn't prevent a second hire if the role hasn't been marked as filled. Should we add a guard that checks the role is still open, or is the invariant too strict?" + +The finding's evidence shows the mechanism — how the ensures clause is inconsistent with the invariant. Use this to suggest a specific fix rather than asking an open-ended question. + +## Choosing which finding to present + +When `analyse` returns multiple findings, pick the most relevant one. Apply these criteria in order: + +1. If a finding chains into another (a `dead_transition` caused by a `missing_producer` caused by an `unreachable_trigger`), present the root cause first — even if it's in a different entity. Frame it in terms of its effect on the entity the user is working on. +2. If the user is working on a specific entity, pick a finding that affects that entity. +3. If the user just added a rule, pick a finding related to that rule's data flow. +4. If the user asked about completeness, pick the highest-impact finding first — deadlocks before broken data flow chains, broken chains before unreachable triggers. + +A `deadlock` or `invariant_risk` finding indicates the spec may be structurally unsound. Surface these before continuing to build on the affected entity — adding more rules to a deadlocked lifecycle compounds the problem. Other finding types (`missing_producer`, `unreachable_trigger`, `dead_transition`, `conflict`) are gaps worth resolving but don't necessarily block further work. + +Present one finding at a time. Let the user resolve it before surfacing the next. 
If `analyse` returns more than five or six findings, present the most impactful two or three individually, then summarise the rest by category: "There are also three unreachable triggers and two missing producers — would you like to work through those, or focus on something else?" diff --git a/references/assessing-specs.md b/references/assessing-specs.md new file mode 100644 index 0000000..78eb5a1 --- /dev/null +++ b/references/assessing-specs.md @@ -0,0 +1,66 @@ +# Assessing specs + +When working with an Allium spec, assess its maturity before deciding what to do next. Spec maturity isn't uniform — a well-developed entity with full rules and surfaces can sit alongside a newly sketched entity with just a transition graph, in the same file. + +## Spec-level assessment + +Read the spec and note which constructs are present: + +| What's present | What it tells you | +|---|---| +| Entities with fields, no transition graphs | Domain concepts identified but lifecycles not yet explored | +| Transition graphs on entities | Lifecycles sketched — the user knows the states and intended flows | +| Rules witnessing transitions | Behaviour specified — triggers, guards and outcomes defined | +| Surfaces with exposes and provides | Boundaries defined — who sees what and can do what | +| Actors with identified_by | Roles identified and formalised | +| Invariants | Cross-cutting properties asserted | +| Open questions | Known unknowns documented | +| Deferred specifications | Complexity acknowledged and scoped for later | + +A spec with entities and transition graphs but no rules is coarse. The right next step is filling in rules ("what triggers this transition?"). A spec with rules but no surfaces has behaviour without boundaries. The right next step is asking about actors and what they see. + +## Per-entity assessment + +Each entity can be at a different level of development. Check: + +- **Has a transition graph?** The lifecycle is sketched. 
+- **Has witnessing rules for all transitions?** Every declared edge has a rule that produces it. +- **Has surfaces providing all external triggers?** Every rule that listens for an external stimulus has a surface that provides it. +- **Has all `requires` clauses traceable to a producer?** Every precondition can be satisfied by a prior rule or surface in the spec. + +An entity that has all four is structurally complete. It may still lack exception transitions, temporal triggers or failure paths — those are explored through obstacle elicitation, not structural assessment. An entity missing the fourth criterion has gaps the user may not be aware of. + +## When to use `check` vs `analyse` + +If the Allium CLI is available: + +The two commands produce different kinds of output. `check` produces **diagnostics**: line-level structural warnings (syntax issues, unreachable values, unused fields). `analyse` produces **findings**: process-level results with typed evidence (missing producers, dead transitions, deadlocks). Both are returned as JSON. See [actioning findings](actioning-findings.md) for how to translate findings into domain questions. + +Run `allium check` after every edit. It validates what's written — syntax, field resolution, transition graph structure, witnessing rules. It's fast and useful at every stage, including coarse specs. + +Run `allium analyse` at natural checkpoints: when the user asks about completeness, when at least one entity has both witnessing rules and surfaces defined, when transitioning from one entity to another, or when stepping back to review. It reasons about what's missing — data flow gaps, unreachable transitions, deadlocks. + +If the CLI is not available, fall back to the language reference for validation. The first time this fallback happens, note: "I'll validate against the language reference instead. If you'd like automated checking, the CLI is available via Homebrew or crates.io — see the README for details." 
+
+If `allium analyse` fails with an unrecognised command error, the installed CLI predates the `analyse` feature. Fall back to conversational analysis (trace data flow and reachability by reading the spec) and don't retry `analyse` in the same session. Mention that updating the CLI would enable automated process-level checking.
+
+## Adjusting your approach
+
+Work at the right level for each part of the spec:
+
+- A coarse entity calls for walkthrough questions: "What triggers this transition? Who's involved at this step?"
+- A detailed entity with rules calls for gap analysis: "This rule requires a value that nothing in the spec produces. Where does it come from?"
+- A well-specified entity calls for validation: "Here's the lifecycle as I understand it — does this match your mental model?"
+
+Don't apply detailed analysis to a coarse spec (it produces noise about things that haven't been written yet). Don't ask exploratory questions about an entity that already has rules and surfaces covering all declared transitions, including exception paths (the user has already answered them).
+
+## Communicating with stakeholders
+
+Users are not expected to read or write Allium syntax. When discussing the spec with stakeholders, translate constructs into domain language:
+
+- Instead of showing a transition graph, describe the lifecycle: "A candidacy starts as applied, moves through screening and interviewing, and ends as either hired or rejected."
+- Instead of showing a rule, describe the behaviour: "When the recruiter advances a candidate, the system checks that the background check is clear before moving to interviews."
+- Instead of showing a surface, describe the interaction: "The recruiter sees a queue of candidates awaiting screening, with their name and the role they applied for."
+- Instead of listing `open_questions`, pose them directly: "One thing we haven't resolved — what happens to in-progress candidacies when a role is closed?"
+ +When validating the spec, describe what it says and ask whether that matches expectations. Don't present the spec itself for review unless the user has shown they're comfortable reading it. The spec is the artefact; the conversation is in domain terms. diff --git a/scripts/generate-multi-editor.mjs b/scripts/generate-multi-editor.mjs index 96d9b24..7386b47 100644 --- a/scripts/generate-multi-editor.mjs +++ b/scripts/generate-multi-editor.mjs @@ -1,8 +1,12 @@ #!/usr/bin/env node /** - * Generates skill and VS Code agent variants from the canonical - * Claude Code agent definitions in agents/. + * Generates VS Code agent variants from the canonical Claude Code + * agent definitions in agents/. + * + * Skills (skills/tend/SKILL.md, skills/weed/SKILL.md) are hand-maintained + * independently of agents. Skills are interactive; agents are autonomous. + * The two diverge intentionally in tone and behaviour. * * Usage: node scripts/generate-multi-editor.mjs [--check] * @@ -57,6 +61,14 @@ function adaptBody(body) { /`\$\{CLAUDE_PLUGIN_ROOT\}\/references\/language-reference\.md`/g, "[language reference](../../references/language-reference.md)" ) + .replace( + /`\$\{CLAUDE_PLUGIN_ROOT\}\/references\/assessing-specs\.md`/g, + "[assessing specs](../../references/assessing-specs.md)" + ) + .replace( + /`\$\{CLAUDE_PLUGIN_ROOT\}\/references\/actioning-findings\.md`/g, + "[actioning findings](../../references/actioning-findings.md)" + ) // Replace Claude Code tool names with generic instructions .replace(/\(use `Glob` to find them if not specified\)/g, "(search the project to find them if not specified)") // Replace "agent" cross-references with "skill" for portable contexts @@ -65,57 +77,6 @@ function adaptBody(body) { ); } -// --------------------------------------------------------------------------- -// Skill generation -// --------------------------------------------------------------------------- - -const SKILL_EXTRA_TEND = ` -## Context management - -Spec evolution can 
require many edit-validate cycles. If you anticipate a long iterative session, or if the context is growing large, advise the user to open a fresh chat specifically for tending the spec. Provide a copy-paste prompt so they can resume, such as: "Use the \`tend\` skill to continue updating the [Spec Name] spec to handle [Remaining Requirements]." - -## Verification - -After every edit to a \`.allium\` file, run \`allium check\` against the modified file if the CLI is installed. Fix any reported issues before presenting the result. If the CLI is not available, verify against the [language reference](../../references/language-reference.md). -`; - -const SKILL_EXTRA_WEED = ` -## Context management - -Spec alignment checks can require many edit-validate cycles. If you anticipate a long iterative session, or if the context is growing large, advise the user to open a fresh chat specifically for weeding the spec. Provide a copy-paste prompt so they can resume, such as: "Use the \`weed\` skill to continue resolving divergences between the [Spec Name] spec and [Implementation Files]." - -## Verification - -After every edit to a \`.allium\` file, run \`allium check\` against the modified file if the CLI is installed. Fix any reported issues before presenting the result. If the CLI is not available, verify against the [language reference](../../references/language-reference.md). 
-`; - -const SKILL_EXTRAS = { tend: SKILL_EXTRA_TEND, weed: SKILL_EXTRA_WEED }; - -function generateSkill(name) { - const src = read(`agents/${name}.md`); - const { frontmatter, body } = parseFrontmatter(src); - const adapted = adaptBody(body); - - // Insert extra sections before the final ## Output or ## Output format section - const extra = SKILL_EXTRAS[name]; - let finalBody = adapted; - const outputMatch = adapted.match(/\n(## Output\b[^\n]*)/); - if (outputMatch) { - const idx = adapted.indexOf(outputMatch[0]); - finalBody = adapted.slice(0, idx) + extra + adapted.slice(idx); - } else { - finalBody = adapted + extra; - } - - const skill = `--- -name: ${name} -description: "${frontmatter.description}" ---- -${finalBody}`; - - return skill; -} - // --------------------------------------------------------------------------- // VS Code agent generation // --------------------------------------------------------------------------- @@ -144,10 +105,6 @@ ${adapted}`; let dirty = false; for (const name of AGENTS) { - if (write(`skills/${name}/SKILL.md`, generateSkill(name))) { - console.log(`${CHECK ? "out of date" : "wrote"}: skills/${name}/SKILL.md`); - dirty = true; - } if (write(`.github/agents/${name}.agent.md`, generateVscodeAgent(name))) { console.log( `${CHECK ? "out of date" : "wrote"}: .github/agents/${name}.agent.md` diff --git a/skills/distill/SKILL.md b/skills/distill/SKILL.md index 0f5bb60..6b74368 100644 --- a/skills/distill/SKILL.md +++ b/skills/distill/SKILL.md @@ -358,6 +358,14 @@ entity Invitation { Look for enum definitions, status or state columns, constants like `STATUS_PENDING = 'pending'`, and state machine libraries (e.g. `transitions`, `django-fsm`). +### Step 2.5: Identify candidate processes + +After extracting entities and their states, scan for state machines that suggest end-to-end processes. Trace where each status value gets set across the codebase (where does `status = 'interviewing'` happen?). 
Present candidate processes to the user for validation: "I see an entity with states `applied → screening → interviewing → deciding → hired/rejected`. Is this a process the system is meant to support?" + +Also trace cross-entity data flow. If a rule on entity A requires a field from entity B, follow the chain: where does entity B's field get set, and what triggers that? Present the chain: "The hiring decision requires `background_check_status = clear`. This gets set by a webhook handler at `/api/webhooks/background-check`. Does this chain look right?" + +Generate transition graphs from the extracted rules. The graph is a derived view of the code. If it has gaps (states with no outbound transitions that aren't terminal), flag them as potential issues. + ### Step 3: Extract transitions Find where status changes happen: @@ -516,6 +524,17 @@ external entity Candidate { When repeated interface patterns appear across service boundaries (e.g. the same serialisation contract expected by multiple consumers), these suggest `contract` declarations for reuse rather than duplicated inline obligation blocks. +### Step 5.5: Identify actors from auth patterns + +After extracting surfaces from API endpoints, identify actors by examining authentication and authorisation patterns. Different auth contexts suggest different actors: + +- API key authentication → system actor (external service) +- Role-based access (`user.role == 'admin'`) → distinct actor per role +- Scoped access (`user.org_id == resource.org_id`) → actor with `within` scoping +- Unauthenticated endpoints → public-facing actor or system webhook + +Ask the user to confirm: "This endpoint requires admin role authentication. Is 'Admin' a distinct actor, or is this the same person as the regular user with elevated permissions?" + ### Step 6: Abstract away implementation Now make a pass through your extracted spec and remove implementation details. 
@@ -564,100 +583,38 @@ Common findings: - "Actually we wanted X but never built it" - "These two code paths should be the same but aren't" -## Recognising library spec candidates +Before running further checks, read [assessing specs](../../references/assessing-specs.md) to gauge the distilled spec's maturity. This tells you whether the spec is ready for process-level analysis or still needs structural work. -During distillation, stay alert for code that implements **generic integration patterns** rather than application-specific logic. These belong in library specs, not your main specification. +If the Allium CLI is available, run `allium check` on the distilled spec to catch structural issues, then `allium analyse` to identify process-level gaps. Findings from `analyse` can drive validation questions: "The distilled spec has a rule that requires `background_check.status = clear` but no surface captures background check results. Is this handled by a part of the codebase we haven't looked at?" Consult [actioning findings](../../references/actioning-findings.md) for how to translate findings into domain questions. + +## Recognising library spec candidates -The same principle applies in elicitation. When a stakeholder describes "we use Google for login" or "payments go through Stripe", pause and consider whether this is a library spec. +During distillation, stay alert for code that implements generic integration patterns rather than application-specific logic. These belong in library specs. See [recognising library spec opportunities](../elicit/references/library-spec-signals.md) for the full decision framework (questions to ask, how to handle, common extractions). ### Signals in the code +Look for these patterns that suggest a library spec: + **Third-party integration modules:** ```python -# Finding code like this suggests a library spec class StripeWebhookHandler: def handle_invoice_paid(self, event): ... - def handle_subscription_cancelled(self, event): - ... 
class GoogleOAuthProvider: def exchange_code(self, code): ... - def refresh_token(self, refresh_token): - ... ``` -**Generic patterns with specific providers:** -- OAuth flows (Google, Microsoft, GitHub) -- Payment processing (Stripe, PayPal) -- Email delivery (SendGrid, Postmark, SES) -- Calendar sync (Google Calendar, Outlook) -- ATS integrations (Greenhouse, Lever) -- File storage (S3, GCS) - **Configuration-driven integrations:** ```python -# Heavy configuration suggests the integration itself is separable OAUTH_CONFIG = { 'google': {'client_id': ..., 'scopes': ...}, 'microsoft': {'client_id': ..., 'scopes': ...}, } ``` -### Questions to ask - -1. **"Is this integration logic, or application logic?"** - Integration: how to talk to Stripe. - Application: what to do when payment succeeds. - -2. **"Would another application integrate the same way?"** - If yes, library spec candidate. If no, probably application-specific. - -3. **"Does the code separate integration from application concerns?"** - If cleanly separated, easy to extract to library spec. If tangled, might need refactoring first (but the spec should still separate them). - -### How to handle - -**Option 1: Reference an existing library spec** - -If a standard library spec exists for this integration: -``` -use "github.com/allium-specs/stripe-billing/abc123" as stripe - --- Application responds to Stripe events -rule ActivateSubscription { - when: stripe/PaymentSucceeded(invoice) - ... -} -``` - -**Option 2: Create a separate library spec** - -If no standard spec exists but the integration is generic: -``` --- greenhouse-ats.allium (library spec) --- Specifies: Greenhouse webhook events, candidate sync, etc. - --- interview-scheduling.allium (application spec) -use "./greenhouse-ats.allium" as greenhouse - -rule ImportCandidate { - when: greenhouse/CandidateCreated(data) - ensures: Candidacy.created(...) 
-} -``` - -**Option 3: Abstract and move on** - -If the integration is minor, just abstract it: -``` --- Don't specify Slack details, just: -ensures: Notification.created( - to: interviewers, - channel: slack -) -``` +**Generic patterns with specific providers:** OAuth flows, payment processing, email delivery, calendar sync, ATS integrations, file storage. ### Red flags: integration logic in your spec @@ -667,11 +624,8 @@ If you find yourself writing spec like this, stop and reconsider: -- TOO DETAILED - this is Stripe's domain, not yours rule ProcessStripeWebhook { when: WebhookReceived(payload, signature) - requires: verify_stripe_signature(payload, signature) - let event = parse_stripe_event(payload) - if event.type = "invoice.paid": ... } @@ -686,18 +640,7 @@ rule PaymentReceived { } ``` -### Common library spec extractions - -| Code pattern found | Library spec candidate | -|-------------------|----------------------| -| OAuth token exchange, refresh, session management | `oauth2.allium` | -| Stripe webhook handling, subscription lifecycle | `stripe-billing.allium` | -| Email sending with templates, bounce handling | `email-delivery.allium` | -| Calendar event sync, availability checking | `calendar-integration.allium` | -| ATS candidate import, status sync | `greenhouse-ats.allium`, `lever-ats.allium` | -| File upload, virus scanning, thumbnail generation | `file-storage.allium` | - -See patterns.md Pattern 8 for detailed examples of integrating library specs. +See [patterns.md Pattern 8](../../references/patterns.md) for detailed examples of integrating library specs. ## Common distillation challenges @@ -863,9 +806,11 @@ If any remain, ask: "Would a stakeholder include this in a requirements doc?" ## After distillation -The extracted spec is a starting point. For targeted changes as requirements evolve, use the `tend` skill. For checking ongoing alignment between the spec and implementation, use the `weed` skill. +The extracted spec is a starting point. 
If distillation reveals gaps that need structured discovery (unclear requirements, complex entity relationships, unstated business rules), use the `elicit` skill to fill them. For targeted changes as requirements evolve, use the `tend` skill. For checking ongoing alignment between the spec and implementation, use the `weed` skill.
 
 ## References
 
 - [Language reference](../../references/language-reference.md), full Allium syntax
+- [Assessing specs](../../references/assessing-specs.md), how to assess spec maturity and choose the right level of analysis
+- [Actioning findings](../../references/actioning-findings.md), translating checker findings into domain questions
 - [Worked examples](./references/worked-examples.md), complete code-to-spec examples in Python, TypeScript and Java
diff --git a/skills/elicit/SKILL.md b/skills/elicit/SKILL.md
index e8d6c60..d619af9 100644
--- a/skills/elicit/SKILL.md
+++ b/skills/elicit/SKILL.md
@@ -7,8 +7,6 @@ description: "Run a structured discovery session to build an Allium specificatio
 
 This skill guides you through building Allium specifications by conversation. The goal is to surface ambiguities and produce a specification that captures what the software does without prescribing implementation.
 
-The same principles apply to distillation. Whether you are hearing a stakeholder describe a feature or reading code that implements it, the challenge is identical: finding the right level of abstraction.
-
 ## Scoping the specification
 
 Before diving into details, establish what you are specifying. Not everything needs to be in one spec.
@@ -42,7 +40,7 @@ The version marker (`-- allium: N`) must be the first line of every `.allium` fi
 
 ## Finding the right level of abstraction
 
-The hardest part of specification is choosing what to include and what to leave out. Too concrete and you are specifying implementation. Too abstract and you are not saying anything useful.
+Too concrete and you are specifying implementation.
Too abstract and you are not saying anything useful.
 
 ### The "Why" test
 
@@ -148,8 +146,34 @@ The spec says there is a matching algorithm, that it considers these inputs and
 
 This is the right level when the algorithm is complex and evolving, when product owners care about inputs and outputs rather than internals, and when a separate detailed spec could cover it if needed.
 
+## Reading the initial prompt
+
+Before choosing an approach, assess what the user is bringing. The initial prompt tells you where to start.
+
+**The user describes a process.** "We have a hiring pipeline where candidates apply, get screened, interview, then we decide." They're thinking at the process level. Start with process discovery — let them describe the flow, then help organise it into spec constructs. Consult [process discovery](./references/process-discovery.md).
+
+**The user names entities.** "I need to spec an Order entity with states and transitions." They're already thinking at the construct level. Skip process discovery and move to scope definition, then fill in detail. Consult [detail elicitation](./references/detail-elicitation.md) when working through rules and surfaces.
+
+**The user has a vague idea.** "We need to build something for managing customer support." They need help shaping the idea before specifying it. Start with process discovery using open questions: "Tell me about what happens when a customer reaches out for help." Consult [process discovery](./references/process-discovery.md).
+
+**The user has existing code.** "We have a payments service and I want to capture what it does." This is distillation with elicitation. Point them to the `distill` skill, or combine both: distill the structure from code, elicit the intent from the stakeholder.
+
+**The user has an existing spec.** Read the spec first. Use [assessing specs](../../references/assessing-specs.md) to determine what level of development each entity is at.
Skip phases the spec has already covered — don't re-ask scope questions for a spec that already has scope comments, or re-discover processes for a spec that already has transition graphs. Start at the level each entity needs: detail elicitation for entities with lifecycles but no rules, obstacle elicitation for entities with rules but no failure paths. + ## Elicitation methodology +### Phase 0: Process discovery + +**Goal:** Understand the processes the system supports before identifying constructs. + +Not every session needs this phase. If the user arrives with entities and lifecycles already in mind, skip to Phase 1. If they arrive with a process description or a vague idea, start here. + +Let the user describe the system in their own words before imposing Allium structure. Capture the process, the actors, the outcomes, then organise into constructs. See [process discovery](./references/process-discovery.md) for specific techniques. + +**Outputs:** Process names and outcomes. Rough sequence of steps. Actors identified. Enough to write a coarse spec (entities with transition graphs and open questions). + +**Watch for:** The urge to jump to entity definitions too early. Stay at the process level until the flow is clear. + ### Phase 1: Scope definition **Goal:** Understand what we are specifying and where the boundaries are. @@ -159,61 +183,62 @@ Questions to ask: 1. "What is this system fundamentally about? In one sentence?" 2. "Where does this system start and end? What's in scope vs out?" 3. "Who are the users? Are there different roles?" -4. "What are the main things being managed, the nouns?" -5. "Are there existing systems this integrates with? What do they handle?" +4. "Are there existing systems this integrates with? What do they handle?" -**Outputs:** List of actors and roles. List of core entities. Boundary decisions (what is external). One-sentence description. +If Phase 0 was skipped, also ask: "What are the key processes this system supports? 
What does success look like for each?" This anchors entity identification to processes rather than enumerating nouns in isolation. The techniques in [process discovery](./references/process-discovery.md) apply here too — use past tense recall and outcome-first questioning if the user struggles to articulate the process. -**Watch for:** Scope creep ("and it also does X, Y, Z", gently refocus). Assumed knowledge ("obviously it handles auth", make explicit). +**Outputs:** List of actors and roles. List of core entities (derived from the process if Phase 0 ran). Boundary decisions (what is external). One-sentence description. + +**Watch for:** Scope creep ("and it also does X, Y, Z", gently refocus). Assumed knowledge ("obviously it handles auth", make explicit). Descriptions that suggest a [library spec](./references/library-spec-signals.md) rather than application-specific logic (e.g. OAuth, payment processing, email delivery). ### Phase 2: Happy path flow **Goal:** Trace the main journey from start to finish. -Questions to ask: +If Phase 0 produced a walking skeleton (see [process discovery](./references/process-discovery.md)), use it as the starting point. Otherwise, ask: "If we could only build one path through this process, what would it be?" Write the skeleton as a coarse spec and describe it back to the user in domain terms (see [assessing specs](../../references/assessing-specs.md#communicating-with-stakeholders)). -1. "Walk me through a typical [X] from start to finish" -2. "What happens first? Then what?" -3. "What triggers this? A user action? Time passing? Something else?" -4. "What changes when that happens? What state is different?" -5. "Who needs to know when this happens? How?" - -**Technique:** Follow one entity through its lifecycle. +Then flesh out: "What triggers each step? Who's involved? What changes?" Follow one entity through its lifecycle, capturing state transitions, actors and triggers. 
``` Candidacy: - pending_scheduling -> scheduling_in_progress -> scheduled -> - interview_complete -> feedback_collected -> decided + applied -> screening -> interviewing -> deciding -> hired | rejected ``` -**Outputs:** State machines for key entities. Main triggers and their outcomes. Communication touchpoints. +**Outputs:** Transition graphs for key entities. Main triggers and their outcomes. Actor assignments at each step. **Watch for:** Jumping to edge cases too early ("but what if...", note it and stay on happy path). Implementation details creeping in ("the API endpoint...", redirect to outcomes). -### Phase 3: Edge cases and errors +After writing spec constructs, run `allium check` if the CLI is available. Fix structural issues before continuing — don't wait until Phase 4 to validate. -**Goal:** Discover what can go wrong and how the system handles it. +After establishing the skeleton, consult [detail elicitation](./references/detail-elicitation.md) for techniques on filling in rules, surfaces, fields and data dependencies. -Questions to ask: +### Phase 3: Edge cases and failure paths + +**Goal:** Discover what can go wrong and how the system handles it. -1. "What if [actor] doesn't respond?" -2. "What if [condition] isn't met when they try?" -3. "What if this happens twice? Or in the wrong order?" -4. "How long should we wait before [action]?" -5. "When should a human be alerted to intervene?" -6. "What if [external system] is unavailable?" +Consult [obstacle elicitation](./references/obstacle-elicitation.md) for techniques. The key approaches: -**Technique:** For each rule, ask "what are all the ways requires could fail?" +- Use the pre-mortem: "Imagine this system has been built and it's failing. What went wrong?" +- At each step: "What if nobody does anything here? After a day? A week?" +- At each handoff: "Who takes over? How do they know it's their turn? What do they need to see?" +- At each transition: "What if the preconditions aren't met? 
Can this be reversed?" +- For external dependencies: "How does this information enter the system? What if the external service is unavailable?" -**Outputs:** Timeout and deadline rules. Retry and escalation logic. Error states. Recovery paths. +**Outputs:** Exception transitions. Temporal triggers with `requires` guards. Escalation paths. Terminal error states. Invariants. **Watch for:** Infinite loops ("then it retries, then retries again...", need terminal states). Missing escalation, because eventually a human needs to know. When stakeholders state system-wide properties ("balance never goes negative", "no two interviews overlap for the same candidate"), these are candidates for top-level invariants. Capture them as `invariant Name { expression }` declarations. +After writing rules and exception transitions, run `allium check` if the CLI is available. Fix issues before moving to refinement. + ### Phase 4: Refinement -**Goal:** Clean up the specification and identify gaps. +**Goal:** Verify and complete the specification. + +Consult [assumption checking](./references/assumption-checking.md) for techniques. Describe what the spec says in domain terms and test it against the user's mental model. Trace concrete scenarios through the spec. Test ordering assumptions. Verify actor assignments. + +If the Allium CLI is available, run `allium check` and use diagnostics to identify structural gaps. If `allium analyse` is available and the spec has rules and surfaces, run it and use findings to surface process-level gaps. Consult [actioning findings](../../references/actioning-findings.md) for how to translate findings into domain questions. Questions to ask: @@ -222,7 +247,7 @@ Questions to ask: 3. "This rule references [X], do we need to define that, or is it external?" 4. "Is this detail essential here, or should it live in a detailed spec?" -**Technique:** Read back the spec and ask "does this match your mental model?" 
+**Technique:** Take a concrete scenario and trace it through the spec. "Let's say Alice applies for the Senior Engineer role. Walk me through what happens to her candidacy." **Outputs:** Complete entity definitions. Open questions documented. Deferred specifications identified. External boundaries confirmed. @@ -269,18 +294,16 @@ Better to record an open question than assume. open question "When candidate declines, do they return to pool or exit?" ``` -### Use concrete examples - -Abstract discussions get stuck. Ground them. - -"Let's say Alice is a candidate for the Senior Engineer role. She's been sent an invitation with three slots. Walk me through what happens when she clicks on Tuesday 2pm." - ### Iterate willingly It is normal to revise earlier decisions. "Earlier we said all admins see all notifications. But now you're describing role-specific dashboards. Should we revisit that?" +### Prioritise depth over breadth + +Fully develop the most important entity first. Leave others coarse with open questions. The user can return to flesh them out in a later session. Trying to develop every entity to the same level in one conversation risks context exhaustion without completing anything. + ### Know when to stop Not everything needs to be specified now. @@ -299,10 +322,6 @@ When someone says "obviously" or "of course", probe. "You said obviously the adm Some people want to cover every edge case immediately. "Let's capture that as an open question and stay on the main flow for now. We'll come back to edge cases." -### The "Technical Solution" trap - -Engineers especially jump to solutions. "I hear you saying we need real-time updates. At the domain level, what does the user need to see and when?" - ### The "Vague Agreement" trap Do not accept "yes" without specifics. "You said yes, candidates can reschedule. How many times? Is there a limit? What happens after that?" @@ -321,17 +340,17 @@ A comment noting that two terms are equivalent is not a resolution. 
It guarantee ## Elicitation session structure -**Opening (5 min).** Explain Allium briefly: "We're capturing what the software does, not how it's built." Set expectations: "I'll ask lots of questions, some obvious-seeming." Agree on scope for this session. +These timings apply to human-facilitated sessions. In an LLM conversation, use the phase outputs to decide when to advance rather than watching the clock. -**Scope definition (10-15 min).** Identify actors, entities, boundaries. Get the one-sentence description. +**Opening.** Explain Allium briefly: "We're capturing what the software does, not how it's built." Agree on scope for this session. -**Happy path (20-30 min).** Trace main flow start to finish. Capture states, triggers, outcomes. Note communications. +**Scope definition.** Identify actors, entities, boundaries. Get the one-sentence description. -**Edge cases (15-20 min).** Timeouts and deadlines. Failure modes. Escalation paths. +**Happy path.** Trace main flow start to finish. Capture states, triggers, outcomes. -**Wrap-up (5-10 min).** Read back key decisions. List open questions. Identify next session scope if needed. +**Edge cases.** Timeouts and deadlines. Failure modes. Escalation paths. -**After session.** Write up specification draft. Send for review. Note questions for next session. +**Wrap-up.** Read back key decisions. List open questions. Name which entities are still coarse and what they need next. Identify next session scope if needed. ## After elicitation @@ -340,4 +359,10 @@ For targeted changes where you already know what you want, use the `tend` skill. 
 ## References
 
 - [Language reference](../../references/language-reference.md), full Allium syntax
+- [Assessing specs](../../references/assessing-specs.md), how to assess spec maturity and choose the right level of analysis
+- [Actioning findings](../../references/actioning-findings.md), translating checker findings into domain questions
+- [Process discovery](./references/process-discovery.md), techniques for when the user hasn't articulated the process yet
+- [Detail elicitation](./references/detail-elicitation.md), techniques for filling in rules, surfaces and data dependencies
+- [Obstacle elicitation](./references/obstacle-elicitation.md), techniques for exploring failure paths, timeouts and handoffs
+- [Assumption checking](./references/assumption-checking.md), techniques for verifying the spec matches the user's mental model
 - [Recognising library spec opportunities](./references/library-spec-signals.md), signals, questions and decision framework for identifying library specs during elicitation
diff --git a/skills/elicit/references/assumption-checking.md b/skills/elicit/references/assumption-checking.md
new file mode 100644
index 0000000..f7b029f
--- /dev/null
+++ b/skills/elicit/references/assumption-checking.md
@@ -0,0 +1,53 @@
+# Assumption checking
+
+Use these techniques when you have a coarse or complete spec and need to verify it matches the user's mental model. Show-back and ordering checks work on coarse specs (transition graphs without rules). Scenario traces require rules and surfaces to be defined. Actor verification works at any stage.
+
+## Show back what you've heard
+
+After capturing a process or a set of rules, write the spec, then describe what it says in domain language. Don't present raw Allium syntax — translate constructs into a narrative the stakeholder can validate.
+
+"Based on what you've described, here's the lifecycle for Candidacy. Applied, then screening, then interviewing, then deciding, and from there either hired or rejected.
Screening can also lead directly to rejection. Is this right?" + +Let the user correct, refine and extend. Common responses: +- "Yes, but you're missing X" → add the missing transition or entity +- "Not quite — Y happens before Z" → reorder the transitions +- "What about W?" → the user remembered something they hadn't mentioned + +## Test ordering assumptions + +When the transition graph is taking shape, test whether the declared ordering is correct. + +"Could these steps happen in a different order? What if the background check completed before screening was finished — would that change anything?" + +This surfaces: +- **False ordering constraints** — steps the user assumed were sequential but could be parallel +- **Missing concurrency** — two things that can happen simultaneously but the graph forces them into sequence +- **Hidden dependencies** — steps that truly must follow a specific order, revealing data dependencies + +If the user says "those could happen in either order", the transition graph may need restructuring. If they say "no, X absolutely must happen before Y", ask why — the answer is usually a data dependency that should be a `requires` clause. + +## Verify actor assignments + +After identifying actors and their surfaces, check the assignments. + +"I have the recruiter screening candidates and the hiring manager making the final decision. Is it always the hiring manager? Could a recruiter make the decision for junior roles?" + +Actor boundaries are often assumed rather than decided. Testing them reveals: +- **Role overlap** — two actors who can do the same thing, needing explicit modelling +- **Delegation** — one actor acting on behalf of another +- **Conditional assignment** — different actors for different entity states or types + +## Check completeness at transition points + +When moving from one entity to the next, or from happy path to edge cases, pause and check. 
+ +"Before we move on to interviews — looking at the screening flow, is there anything we haven't covered? Any situation that could come up that we haven't accounted for?" + + +## Verify against real scenarios + +Take a concrete scenario and trace it through the spec. + +"Let's say Alice applies for the Senior Engineer role on Monday. Walk me through what happens to her candidacy using the spec we've written. Does each step match what you'd expect?" + +If the spec produces a different outcome than the user expects, you've found a gap. The gap might be a missing rule, a wrong guard, or an unstated assumption. diff --git a/skills/elicit/references/detail-elicitation.md b/skills/elicit/references/detail-elicitation.md new file mode 100644 index 0000000..4f39c1b --- /dev/null +++ b/skills/elicit/references/detail-elicitation.md @@ -0,0 +1,64 @@ +# Detail elicitation + +Use these techniques when an entity has a lifecycle (transition graph) but needs rules, surfaces, fields and data dependencies filled in. The shape is known; the detail isn't. + +## Start from examples, not abstractions + +Before writing rules, collect concrete scenarios. Ask for at least two specific cases. + +"Give me a case where someone was hired. Now give me one where they were rejected at screening. What was different?" + +The differences between the cases reveal the `requires` guards. The commonalities reveal the `ensures` outcomes. Rules emerge from comparing scenarios rather than being defined in the abstract. + +When a rule is ambiguous or the user can't articulate the conditions, ask for more examples. "Can you give me a case where this went a different way?" Each new example narrows the rule. + +## Actor walkthrough + +Pick a specific human actor and walk through their perspective in first person. For system actors (external APIs, background services), use third person instead: "The payment gateway receives a charge request. What does it need? What does it return?" + +"You're the recruiter. 
You open the system on Monday morning. What's in front of you?" The answer is surface `exposes` — the data the actor sees. + +"What can you do from here?" The answer is surface `provides` — the actions available. + +"When would this action not be available?" The answer is the `when` guard on the provides clause. + +"After you've done that, what happens next? Who takes over?" The answer reveals the handoff to the next actor and the next surface. + +## Trace data flow backward + +When you encounter a decision point or a rule with preconditions, work backward from the requirement. + +"The hiring manager needs to see interview feedback before deciding. Where does that feedback come from? Who provides it? At what point in the process?" + +Each "where does this come from?" reveals a data dependency. Follow the chain until you reach a surface where an actor enters the data or an external system provides it. If the chain ends without a source, you've found a gap — a `missing_producer` in checker terms. + +## Ground abstract descriptions + +When a user describes something abstractly ("the system shows relevant information"), ground it with a concrete question. + +"If you were looking at the screen right now, what would you see? What specific information?" This surfaces the exact fields that need to be in `exposes`. + +"Can you sketch what that screen looks like, in words? What's at the top? What's the main content?" + +## Prompt for external system boundaries + +When a step depends on data from outside the system, ask how it enters. + +"You mentioned the background check results come back. How does that happen? Does someone enter them manually, or does an external service send them automatically?" + +The answer determines whether you need a surface facing a human actor or a contract integration point facing a system. 
Many process gaps involve external systems (payment processors, identity verification, notification services) where the spec needs an entry point but the user assumes the data just appears. + +## What to produce + +If an entity's transition graph has grown beyond eight or so states, consider whether the lifecycle should be split. A booking entity that spans request, rental, inspection and deposit settlement might be clearer as separate entities linked by relationships. Ask the user: "This entity is covering a lot of ground. Would it be clearer to separate the [X] phase from the [Y] phase into its own entity?" + +At the end of detail elicitation for an entity, you should have: + +- **Fields** with types, including state-dependent fields (`when` clauses) +- **Rules** witnessing every transition, with `requires` and `ensures` +- **Surfaces** for each actor that interacts with this entity, with `exposes` and `provides` +- **Relationships** connecting this entity to related entities +- **Config** for any variable values (durations, thresholds, limits) +- **Open questions** for anything unresolved + +Write the spec, then describe what it says in domain language and verify it with the user before moving on (see [assumption checking](assumption-checking.md)). diff --git a/skills/elicit/references/obstacle-elicitation.md b/skills/elicit/references/obstacle-elicitation.md new file mode 100644 index 0000000..68f0b4a --- /dev/null +++ b/skills/elicit/references/obstacle-elicitation.md @@ -0,0 +1,70 @@ +# Obstacle elicitation + +Use these techniques when exploring failure paths, timeouts, exception transitions and actor handoffs. + +## Use the pre-mortem + +Instead of the abstract "what can go wrong?", use a concrete framing. + +"Imagine it's six months from now. This system has been built and deployed. Something has gone wrong and people are frustrated. What happened?" + +People are better at imagining concrete failure than listing abstract risks. 
The pre-mortem produces vivid, specific failure modes rather than generic edge cases. Each failure mode maps to an exception transition, a timeout rule, or an invariant. + +Follow up each failure with: "How should the system have prevented that? Or handled it?" The answer is the rule or guard that's missing from the spec. + +## Ask what happens when nothing happens + +At every step where a human actor needs to act, ask: "What if nobody does anything? After a day? After a week?" + +The answer is one of: +- "Nothing, it just waits." This is a design decision worth making explicit. Document it as the intended behaviour, possibly with an open question about whether it's acceptable. +- "After X time, Y happens." This is a temporal trigger: `when: entity.timestamp_field + config.duration <= now` with `requires: entity.status = expected_state` to prevent re-firing. Do not use `becomes` for time-delayed behaviour — `becomes` fires immediately when an entity enters a state, not after a delay. +- "Someone should be notified." This surfaces a notification or escalation path. + +Most specs underspecify inaction. The happy path assumes everyone acts promptly. Real systems have stale candidacies, expired invitations and abandoned carts. These need rules. + +## Explore handoffs between actors + +At every state transition, ask: "Who takes over at this point? How do they know it's their turn? What do they need to see?" + +The answers reveal: +- **Actor transitions** — which actor is responsible for the next step +- **Notification needs** — how the next actor learns they need to act +- **Information requirements** — what the next actor's surface must expose +- **Related surface links** — how surfaces connect to each other + +Handoffs are where processes break in practice. The outgoing actor assumes the incoming actor knows what happened. The incoming actor assumes they'll be told. 
The spec needs to make the handoff explicit: what triggers the notification, what information it carries, and what the next actor sees when they arrive. + +## Enumerate alternatives at each step + +For each step in the happy path, systematically ask: "What else could happen here?" Keep asking until the user can't think of anything more. This is the discipline that prevents gaps: stories and informal descriptions only capture the paths someone happens to think of. Enumeration forces completeness. + +Work through the happy path step by step: +1. State the step: "At this point, the recruiter reviews the application." +2. Ask: "What's the main thing that happens?" (The happy path outcome — already captured.) +3. Ask: "What else could happen?" (First alternative — maybe rejection.) +4. Ask: "Anything else?" (Second alternative — maybe deferral, or requesting more information.) +5. Keep asking until exhausted. +6. For each alternative: "What happens next if this path is taken?" (Follow the alternative to its terminal state.) + +Each alternative becomes either an exception transition in the graph, an additional rule, or an open question if the user isn't sure. If an alternative branches into its own multi-step flow, capture it as an open question and return to it in a later pass rather than following every branch immediately. + +## Systematically test each transition + +After enumeration, test each transition for robustness: +- "What if the preconditions aren't met? What should happen?" +- "Can this transition be reversed? Can someone undo it?" +- "Is there a time limit on being in this state?" +- "Can this transition happen more than once?" + +For critical entities, test every transition. For less critical entities, focus on the transitions most likely to fail or stall. Critical entities are those central to the system's value proposition, those that handle money or compliance-sensitive data, or those the user mentioned during the pre-mortem. 
Transitions most likely to stall are those that depend on external actors or systems, those with temporal dependencies, and those where a human must act. + +## What to capture + +Obstacle elicitation produces: +- **Exception transitions** (screening → rejected, interview → cancelled) +- **Temporal triggers** with `requires` guards (invitation expires after 48 hours) +- **Escalation paths** (stuck candidacy → notify recruiter after 5 days) +- **Terminal error states** (background check flagged → candidacy terminated) +- **Invariants** (system-wide properties that must hold: "no candidate can have two active candidacies for the same role") +- **Open questions** for unresolved failure scenarios diff --git a/skills/elicit/references/process-discovery.md b/skills/elicit/references/process-discovery.md new file mode 100644 index 0000000..58c2c0f --- /dev/null +++ b/skills/elicit/references/process-discovery.md @@ -0,0 +1,68 @@ +# Process discovery + +Use these techniques when the user hasn't articulated the process yet, when they're starting from scratch, or when you need to understand the shape of a system before getting into construct-level detail. + +## Let the user talk first + +Before imposing any Allium structure, let the user describe the process in their own words. Don't interrupt for entity types, field names or state transitions. Capture the raw description, then organise it into constructs afterward. If the description becomes unclear or contradictory, ask a brief clarifying question, but don't redirect into Allium constructs yet. + +Prompt with: "Tell me about this system. What does it do?" or "Walk me through the main thing that happens, start to finish." + +## Use past tense + +When the user struggles to articulate a process in the abstract, switch to past tense. Recalling what happened is easier than prescribing what should happen. 
+ +"Tell me about the last time someone was hired at your company" produces richer material than "describe the hiring process." Follow up with "and then what happened?" to walk the timeline. The events become rule triggers, the actors become actors, the decisions become guards. + +## Start from outcomes + +Most people can name what they're trying to achieve before they can describe how they get there. Ask about the destination before asking about the route. + +"What does success look like for this process?" or "When this process finishes well, what's the result?" The answer gives you the terminal states. Then work backward: "What has to happen before that? And before that?" + +If there are multiple outcomes (hired vs rejected, fulfilled vs refunded), capture them all. They define the shape of the transition graph. + +## Find the walking skeleton + +Once you have a rough sense of the process, ask: "If we could only build one path through this, what would it be? The simplest journey from start to finish." + +The answer is the happy path — the coarse spec. Entities with transition graphs showing the main flow. Everything else (exception paths, alternative flows, edge cases) is added incrementally. + +Once you have the skeleton, write it as a coarse Allium spec (entities with transition graphs, actors, open questions) and describe it back to the user in domain language for validation — don't present raw syntax. The skeleton is the transition from free-form discovery to formalisation. + +## Identify actors early + +Ask "who's involved?" early in the conversation. For each actor: "What do they need to do their job?" and "What do they need to see?" + +Each actor's perspective is a partial view of the process. The full process emerges from composing these views. If two actors describe the same step differently, you've found either an ambiguity or a handoff that needs clarifying. + +## Layered decomposition + +For complex processes, work through layers in order. 
Each layer surfaces a different kind of Allium construct. Ask about each layer before moving to the next. + +1. **Events.** "What are the things that happen in this process?" Capture in past tense ("candidate applied", "background check completed", "offer accepted"). These become entity state transitions and rule triggers. +2. **Commands.** "For each event, what triggered it? A person doing something, or the system reacting?" Commands from people become surface `provides` actions. System reactions become rules with `becomes` or `transitions_to` triggers. +3. **Actors.** "Who issued each command? Which role or system?" Each distinct role or system becomes an actor declaration. +4. **Entities.** "Which thing in the system changed when this event happened?" Group events by the entity they affect. Each group becomes an entity with a lifecycle. +5. **Policies.** "Are there any automatic reactions — whenever X happens, Y should follow?" These become rules with chained triggers or `becomes` triggers. +6. **Information needs.** "At each decision point, what did the actor need to see to make the decision?" These become surface `exposes` and reveal data dependencies between entities. +7. **Unknowns.** "Is there anything here you're not sure about, or where different people would give different answers?" These become `open_questions`. + +This layered approach produces a richer set of constructs than open-ended conversation. Use it when the process involves multiple actors, crosses entity boundaries, or when the user gives detailed but unstructured descriptions that need organising. For processes with a single actor and a straightforward lifecycle, the techniques above (outcomes-first, walking skeleton) are sufficient. 
+ +## What to capture + +Whether using layered decomposition or open-ended discovery, note: + +- **Events** (things that happen) → entity state transitions, rule triggers +- **Actors** (people or systems involved) → actor declarations +- **Decisions** (choices someone makes) → rule guards, alternative transitions +- **Information needs** ("they need to see X to decide") → surface exposes, data dependencies +- **Outcomes** (what success and failure look like) → terminal states +- **Unknowns** ("I'm not sure how that works") → open questions + +Before finding the walking skeleton, capture as prose notes or simple bullet lists ("Candidate applied → recruiter screened → interviews happened → decision made"). Don't use Allium syntax yet. After the skeleton is clear, organise into Allium constructs and describe the result back to the user in domain terms for correction. + +## When to stop + +Process discovery is complete when you can write the walking skeleton: you know the main entities, their lifecycle states, the actors involved and the terminal outcomes. You don't need every detail — that's what later phases provide. If you have enough to write a coarse spec with transition graphs and open questions, move on. diff --git a/skills/propagate/SKILL.md b/skills/propagate/SKILL.md index b9166ac..fee7979 100644 --- a/skills/propagate/SKILL.md +++ b/skills/propagate/SKILL.md @@ -18,7 +18,7 @@ Before propagating tests, you need: 3. **Test obligations** — from `allium plan ` (JSON listing every required test) 4. **Domain model** — from `allium model ` (JSON describing entity shapes, constraints, state machines) -If the CLI tools are not available, derive test obligations manually from the spec using the test-generation taxonomy in `references/test-generation.md`. +If the CLI tools are not available, derive test obligations manually from the spec using the test-generation taxonomy in [`references/test-generation.md`](../../references/test-generation.md). 
## Modes @@ -61,6 +61,12 @@ Categories from the test-generation taxonomy: - **Transition graph tests** — every declared edge is reachable via its witnessing rule, undeclared transitions are rejected, terminal states have no outbound rules, non-terminal states have at least one exit, exact correspondence between enum values and graph edges - **State-dependent field tests** — presence when in qualifying state, absence when outside, presence obligations on entering the `when` set, absence obligations on leaving, no obligation when moving within or outside, convergent transitions all set the field, guard required to access `when`-qualified fields, derived value `when` inference via input intersection - **Scenario tests** — happy path, edge cases, order independence +- **Data flow chain tests** — exercise full chains from surface capture through rules to downstream rule preconditions. For each chain (surface provides trigger → rule ensures field → downstream rule requires field), generate an integration test that submits data through the surface and verifies it reaches the downstream precondition. +- **Reachability tests** — walk from each initial state (via `.created()`) to each terminal state, following a valid path through the transition graph. Each test exercises a complete lifecycle. +- **Deadlock scenario tests** — for states where `allium analyse` identifies potential deadlocks, generate tests that put the entity in the stuck state and verify whether it can progress. +- **Cross-entity process tests** — for processes spanning multiple entities, generate integration tests that exercise the full process from start to terminal state across all participating entities. + +If `allium analyse` is available, use its findings to prioritise test generation. A `missing_producer` or `dead_transition` finding indicates a gap worth exercising with a test. A `deadlock` finding should generate a test documenting that the entity cannot escape the stuck state. 
Consult [actioning findings](../../references/actioning-findings.md) for the finding type taxonomy. ## Test output kinds @@ -140,9 +146,10 @@ Before generating cross-module tests: 1. Trace the trigger emission graph from the plan output: which rules emit triggers, and which rules in other specs receive them 2. Check whether the codebase has an existing integration test fixture that wires the participating components (a pipeline test, an end-to-end test helper, a test harness class) 3. If a fixture exists, reuse it. Cross-module tests should compose existing wiring, not rebuild it -4. If no fixture exists, generate only the test skeleton with TODOs marking where component wiring is needed +4. If no fixture exists but the codebase structure is clear enough to understand the wiring (service constructors, dependency injection, event bus configuration), generate the fixture and the test +5. If the wiring is too complex or opaque to generate confidently, generate a test skeleton with TODOs marking where component wiring is needed -Cross-module tests are integration tests by nature. They verify that the spec's trigger chains are faithfully implemented across component boundaries, but the setup cost is high. Prioritise them after single-component tests are passing. +Cross-module tests are integration tests by nature. They verify that the spec's trigger chains are faithfully implemented across component boundaries. Prioritise them after single-component tests are passing. ### Reusing existing tests @@ -161,7 +168,7 @@ Deferred specifications are fully specified in separate files. When the target c ## Process -1. **Read the spec** — understand entities, rules, surfaces, invariants, transition graphs, state-dependent fields, contracts, config, defaults +1. **Read the spec** — understand entities, rules, surfaces, invariants, transition graphs, state-dependent fields, contracts, config, defaults. 
Read [assessing specs](../../references/assessing-specs.md) to gauge the spec's maturity. A coarse spec (entities and transition graphs but no rules) will produce limited test obligations — mostly structural tests. If the spec is too coarse for meaningful test generation, suggest using the `elicit` or `distill` skill to develop it further before propagating tests. A spec with rules and surfaces enables the full test taxonomy including data flow chain tests and reachability tests. 2. **Read test obligations** — from `allium plan` output or manual derivation 3. **Read domain model** — from `allium model` output or manual derivation 4. **Explore the codebase** — find existing tests, test framework, entity implementations, rule implementations @@ -205,5 +212,5 @@ When generator specs are available, use them to produce valid test data: - Generated tests are a starting point. They may need adjustment for project-specific patterns. - The implementation bridge is LLM-mediated. Complex or unusual codebases may need manual guidance on the mapping. -- Cross-module test generation is not yet supported. Each spec generates tests independently. +- Cross-module tests require understanding component wiring across service boundaries. When the codebase structure is clear, full tests can be generated. When wiring is opaque, tests are generated as skeletons with TODOs for manual setup. - Runtime trace validation and model checking are separate workstreams. diff --git a/skills/tend/SKILL.md b/skills/tend/SKILL.md index 8f99e01..0f65e67 100644 --- a/skills/tend/SKILL.md +++ b/skills/tend/SKILL.md @@ -41,6 +41,20 @@ If the caller describes a feature in implementation terms ("the API returns a 40 **Be minimal.** Add what's needed and nothing more. Don't speculatively add fields, rules or config that weren't asked for. Don't restructure working specs for aesthetic reasons. +## Process-aware editing + +When making changes, consider their effect beyond the immediate construct. 
+ +**Check data flow when adding rules.** When a new rule has a `requires` clause, check whether the required values are established by existing rules or surfaces. If not, say so: "This rule requires `background_check.status = clear`, but nothing in the spec sets this. Should we add a rule or surface for that?" + +**Check transition graph impact.** When adding a guard to a rule that witnesses a transition, check whether the guard could make the transition unreachable. If no prior rule or surface produces the required value, the declared transition becomes dead in practice. Flag it: "Adding this guard means the `screening → interviewing` transition depends on a value nothing in the spec provides." + +**Check surface coverage for external triggers.** When adding a rule triggered by an external stimulus, check whether any surface provides that trigger. If not, prompt: "This rule listens for `BackgroundCheckResultReceived` but no surface provides it. Should we add a surface or contract for the external system?" + +**Consider invariants for cross-entity constraints.** When a rule modifies entities across a relationship (e.g. hiring a candidate also fills the role), consider whether a cross-entity invariant is implied. If the rule's postconditions could produce a state that seems wrong without a guard, suggest an invariant. + +**Assess the spec before editing.** Read [assessing specs](../../references/assessing-specs.md) to understand the spec's maturity. Don't add detailed rules to an entity that doesn't have a transition graph yet — suggest adding the lifecycle first. Don't add surfaces without actors. + ## Boundaries - You work on `.allium` files only. You do not modify implementation code. @@ -76,7 +90,9 @@ Spec evolution can require many edit-validate cycles. If you anticipate a long i ## Verification -After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is installed. Fix any reported issues before presenting the result. 
If the CLI is not available, verify against the [language reference](../../references/language-reference.md). +After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is installed. Fix any reported issues before presenting the result. If the CLI is not available, verify against the [language reference](../../references/language-reference.md). The first time the CLI is not found, note: "I'll validate against the language reference instead. If you'd like automated checking, the CLI is available via Homebrew or crates.io — see the README for details." + +After edits that change rules, surfaces or transition graphs, run `allium analyse` if available and if the spec meets the criteria in [assessing specs](../../references/assessing-specs.md) (at least one entity has both witnessing rules and surfaces defined). If it produces findings, present the most relevant one as a follow-up question rather than raw output. Consult [actioning findings](../../references/actioning-findings.md) for how to translate findings into domain questions. ## Output diff --git a/skills/weed/SKILL.md b/skills/weed/SKILL.md index 6a1ff36..f7aea5a 100644 --- a/skills/weed/SKILL.md +++ b/skills/weed/SKILL.md @@ -24,15 +24,27 @@ You operate in one of three modes, determined by the caller's request: **Update code.** Modify the implementation to match what the spec says. The code becomes a faithful implementation of specified behaviour. -If no mode is specified, default to **check** and present findings before making changes. +If no mode is specified, default to **check** and report all findings. ## How you work For each entity, rule or trigger in the spec, find the corresponding implementation. For each significant code path, check whether the spec accounts for it. Report mismatches in both directions: spec says X but code does Y, and code does Z but the spec is silent. 
+### Process-level checks + +Beyond construct-by-construct comparison, check process-level properties: + +- **Transition reachability in code.** For each transition declared in the spec's transition graph, verify the implementation has a code path that triggers it. If a transition is declared but no code path produces it, flag it. +- **Surface-trigger coverage.** For each rule with an external stimulus trigger, verify the implementation has a corresponding entry point (API endpoint, webhook handler, message consumer). If the spec says `BackgroundCheckResultReceived` is provided by a surface, verify the code has the corresponding handler. +- **Undeclared transitions in code.** Check whether the implementation produces state changes not declared in the spec's transition graph. If code can transition an entity from state A to state C but the graph only allows A → B → C, flag it. +- **Invariant enforcement.** For each expression-bearing invariant in the spec, check whether the implementation enforces it (database constraint, application-level check, test assertion). If no enforcement exists, flag the gap. +- **Bottom-up process reconstruction.** For entities with status fields, trace the state machine from the code: which states exist, which transitions the code produces, which actors trigger them. Compare the reconstructed process to the spec's transition graphs. Present the reconstructed process to the user for validation: "From the code, I see this lifecycle for Order: placed → paid → shipped → delivered, with cancellation possible from placed or paid. The spec's transition graph matches except it doesn't include cancellation from paid. Is this a spec gap or a code bug?" + +Report process-level divergences alongside construct-level ones. Read [assessing specs](../../references/assessing-specs.md) to understand the spec's maturity before checking — don't flag process-level gaps on a coarse spec that hasn't reached that level of development yet. 
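As an illustration of the transition checks listed above, here is a minimal Python sketch. It assumes the spec's declared transition graph and the transitions reconstructed from the code have both already been reduced to sets of `(from_state, to_state)` pairs. `compare_transitions` and `TransitionDiff` are hypothetical names, not part of Allium or its CLI; the states are taken from the Order example in the bottom-up reconstruction bullet.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# An edge is a (from_state, to_state) pair. This representation is an assumption.
Edge = Tuple[str, str]


@dataclass(frozen=True)
class TransitionDiff:
    dead: FrozenSet[Edge]        # declared in the spec, never produced by the code
    undeclared: FrozenSet[Edge]  # produced by the code, absent from the spec


def compare_transitions(declared: Set[Edge], observed: Set[Edge]) -> TransitionDiff:
    """Compare the spec's declared edges with the edges found in the code."""
    return TransitionDiff(
        dead=frozenset(declared - observed),
        undeclared=frozenset(observed - declared),
    )


# Order lifecycle as declared in the (hypothetical) spec.
declared = {
    ("placed", "paid"),
    ("paid", "shipped"),
    ("shipped", "delivered"),
    ("placed", "cancelled"),
}

# The code additionally allows cancellation from paid.
observed = declared | {("paid", "cancelled")}

diff = compare_transitions(declared, observed)
```

The set difference only shows that the two sides disagree. Whether a given edge is a spec gap or a code bug still has to be classified and confirmed with the caller.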
+ ## Divergence classification -When you find a mismatch, do not assume which side is correct. Report each divergence as one of: +When you find a mismatch, propose a classification with your reasoning. The caller confirms or overrides. Classify each divergence as one of: - **Spec bug.** The spec is wrong, code is correct. Fix the spec. - **Code bug.** The code is wrong, spec is correct. Fix the code. @@ -65,7 +77,7 @@ When code has repeated interface contracts across service boundaries (e.g. the s ## Boundaries -- You do not build new specifications from scratch. That belongs to the `tend` skill or the `elicit` skill. +- You do not build new specifications from scratch. That belongs to the `elicit` skill. - You do not extract specifications from code. That belongs to the `distill` skill. - You do not modify `references/language-reference.md`. The language definition is governed separately. - You do not make architectural decisions. Flag wider implications and let the caller decide. @@ -76,7 +88,9 @@ Spec alignment checks can require many edit-validate cycles. If you anticipate a ## Verification -After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is installed. Fix any reported issues before presenting the result. If the CLI is not available, verify against the [language reference](../../references/language-reference.md). +After every edit to a `.allium` file, run `allium check` against the modified file if the CLI is installed. Fix any reported issues before presenting the result. If the CLI is not available, verify against the [language reference](../../references/language-reference.md). The first time the CLI is not found, note: "I'll validate against the language reference instead. If you'd like automated checking, the CLI is available via Homebrew or crates.io — see the README for details." + +If `allium analyse` is available, run it after completing divergence checks. 
Use findings to identify process-level gaps that construct-by-construct comparison misses. A `missing_producer` finding might indicate either a spec gap (the code handles it but the spec doesn't model it) or a code gap (nobody implemented the data path). Classify each finding by checking whether the code addresses it. Consult [actioning findings](../../references/actioning-findings.md) for how to translate findings into domain questions. ## Output format @@ -86,7 +100,7 @@ When reporting divergences (check mode), use this structure for each finding: ### [Entity/Rule name] Spec: [what the spec says] (file:line) Code: [what the code does] (file:line) -Classification: [ask user] +Classification: [proposed classification with reasoning] ``` Group related divergences together. Lead with the most consequential findings.
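For tooling that wants to emit findings in the structure above, a small Python sketch follows. The `Divergence` class, the `render` function and the file paths are hypothetical and only illustrate the template; ordering by consequence is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Divergence:
    name: str            # entity or rule name
    spec_says: str
    spec_loc: str        # "file:line" in the spec
    code_does: str
    code_loc: str        # "file:line" in the implementation
    classification: str  # proposed classification, e.g. "Spec bug" or "Code bug"
    reasoning: str       # why that classification is proposed


def render(d: Divergence) -> str:
    """Format one finding using the check-mode template."""
    return "\n".join([
        f"### {d.name}",
        f"Spec: {d.spec_says} ({d.spec_loc})",
        f"Code: {d.code_does} ({d.code_loc})",
        f"Classification: {d.classification}. {d.reasoning}",
    ])


# Hypothetical finding based on the Order cancellation example.
finding = Divergence(
    name="Order",
    spec_says="cancellation is only allowed from placed",
    spec_loc="specs/order.allium:12",
    code_does="cancellation is also allowed from paid",
    code_loc="src/orders/service.py:87",
    classification="Spec bug",
    reasoning="The code path looks deliberate and is covered by tests",
)

report = render(finding)
```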