From 40db003c3174e04c311d05c7a7f6b32590a8c1be Mon Sep 17 00:00:00 2001 From: Yavor Panayotov Date: Tue, 23 Jun 2026 11:14:32 +0300 Subject: [PATCH] Bump vendored allium to v3.6.0 Pin scripts/allium-ref.txt to v3.6.0 and re-vendor plugins/allium via scripts/sync-allium.sh. Ships the Allium loop documentation (juxt/allium#53) to the marketplace. Co-Authored-By: Claude Opus 4.8 (1M context) --- plugins/allium/.claude-plugin/plugin.json | 2 +- plugins/allium/.codex-plugin/plugin.json | 2 +- plugins/allium/README.md | 27 ++++ plugins/allium/skills/allium/SKILL.md | 14 ++ .../allium/references/recommended-loops.md | 145 ++++++++++++++++++ scripts/allium-ref.txt | 2 +- 6 files changed, 189 insertions(+), 3 deletions(-) create mode 100644 plugins/allium/skills/allium/references/recommended-loops.md diff --git a/plugins/allium/.claude-plugin/plugin.json b/plugins/allium/.claude-plugin/plugin.json index 6762607..cc3b736 100644 --- a/plugins/allium/.claude-plugin/plugin.json +++ b/plugins/allium/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "allium", - "version": "3.5.0", + "version": "3.6.0", "description": "Velocity through clarity.", "author": { "name": "JUXT", diff --git a/plugins/allium/.codex-plugin/plugin.json b/plugins/allium/.codex-plugin/plugin.json index 9b0ee1b..45621d8 100644 --- a/plugins/allium/.codex-plugin/plugin.json +++ b/plugins/allium/.codex-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "allium", - "version": "3.5.0", + "version": "3.6.0", "description": "Velocity through clarity.", "author": { "name": "JUXT", diff --git a/plugins/allium/README.md b/plugins/allium/README.md index 35d1789..ea79670 100644 --- a/plugins/allium/README.md +++ b/plugins/allium/README.md @@ -29,6 +29,33 @@ Two forces feed the spec, and one loop keeps it honest against the code: `/elicit` works forward from intent through conversation; `/distill` works backward from existing code, filtering out implementation detail. `/tend` makes targeted edits as requirements change; `/weed` finds where spec and code have diverged and reconciles them in either direction; `/propagate` generates tests from the spec so the implementation is checked against specified behaviour. `/allium` is the entry point — point it at your project and it routes you to the right one. The [skills table](#skills-and-agents) below covers each in detail. +## The Allium loop + +Agentic coding is a loop: the agent **gathers context**, **takes action**, **verifies** the result, and **repeats** until the goal is met. The phase that decides quality is verification — a signal the loop can trust to tell *green* from *done*. Allium is built for that loop, and it owns the hard parts: a *behavioural* verification signal, and context that doesn't drift between sessions. + +``` + gather context ───▶ take action ──────▶ verify ─────────▶ repeat + /elicit · /distill /propagate → tests run tests until + → the spec then implement → /weed converged + (durable context) → CLI checks + ▲ │ + └──────────────── revise intent ◀────────┘ + /tend, when verify shows the SPEC + (not the code) was wrong +``` + +- **Gather context** — `/elicit` (from intent) or `/distill` (from code) produces the spec. Unlike re-reading code each session, the spec *persists*, so meaning doesn't drift. +- **Take action** — `/propagate` turns the spec into tests (the contract), then you implement against them. For new behaviour, confirm the tests *fail* first — a generated test that's already green is already covered or vacuous. +- **Verify** — run the tests, then `/weed` for spec↔code alignment, then the CLI's structural checks. A behaviour-level pass/fail, not merely "unit tests are green." +- **Repeat** — until spec, tests and code agree and no open questions remain. The loop can re-enter at *any* phase: when verification shows the **spec** was wrong, you revise intent (`/tend`), not just the code — the way real development discovers requirements by building, rather than assuming them fixed up front. + +Two entry points run the same loop: + +- **Spec-first** (new features): `/elicit → /propagate → implement → /weed`, looping `/tend → /propagate` as requirements change. +- **Code-first** (existing code): `/distill → review → /propagate → run against the code → /weed`, repeated per area until a pass finds nothing new. + +See [recommended loops](skills/allium/references/recommended-loops.md) for the full walkthrough, both diagrams, exit conditions and the implementation prompt. *(This is the "agentic loop" framing now common in AI engineering; for background see [How Claude Code works](https://code.claude.com/docs/en/how-claude-code-works) and [Loop Engineering](https://thenewstack.io/loop-engineering/).)* + ## Get started Allium works with Claude Code, Codex, Copilot, Cursor, Windsurf, Aider, Continue and 40+ other tools. How you install depends on your editor, but the skills are the same everywhere. diff --git a/plugins/allium/skills/allium/SKILL.md b/plugins/allium/skills/allium/SKILL.md index 60ab538..a50dedb 100644 --- a/plugins/allium/skills/allium/SKILL.md +++ b/plugins/allium/skills/allium/SKILL.md @@ -34,6 +34,19 @@ Allium does NOT specify programming language or framework choices, database sche | Checking spec-to-code alignment | `weed` skill | User wants to find or fix divergences between spec and implementation | | Generating tests from a spec | `propagate` skill | User wants to generate tests, PBT properties or state machine tests from a specification | +## The Allium loop (recommended sequencing) + +The skills are not one-shot commands; they compose into an autonomous-style loop — **gather context → take action → verify → repeat** — that drives three artefacts to agreement: the **spec** (intent), the **tests** (contract), and the **code** (implementation). Gather context with `/elicit` or `/distill` (the spec is durable context); take action with `/propagate` then implementation (in spec-first work, confirm the new tests fail first — a test already green before you implement is already-covered or vacuous); verify by running the tests, then `/weed`, then CLI structural checks; repeat until converged. Verification is the phase that matters most, and the spec-plus-tests-plus-weed signal is what makes the loop trustworthy. After invoking one skill, proactively suggest the next step rather than waiting to be asked. + +Two entry points, one convergence loop: + +- **Spec-first (forward, from intent):** `/elicit` → `/propagate` → implement → `/weed`; use `/tend` then re-`/propagate` when requirements change. +- **Code-first (backward, from existing code):** `/distill` → review intended vs accidental behaviour → `/propagate` → run tests against the code → `/weed` to reconcile → repeat per area. + +The work is "done" when tests pass, `/weed` reports no divergence, and no open questions remain (plus, for code-first, a fresh `/distill` finds nothing new). Two standing rules while looping: never weaken a generated test to make it pass (fix the spec and re-propagate instead), and escalate genuine ambiguity to the human rather than guessing. + +Implementation itself is ordinary coding — Allium produces the spec and tests, not the application code. See the [recommended loops](./references/recommended-loops.md) reference for the full walkthrough, diagrams, exit conditions and the implementation prompt. + ## Quick syntax summary ### Entity @@ -308,4 +321,5 @@ When the `allium` CLI is installed, a hook validates `.allium` files automatical - [Language reference](./references/language-reference.md) — full syntax for entities, rules, expressions, surfaces, contracts, invariants and validation - [Test generation](./references/test-generation.md) — generating tests from specifications +- [Recommended loops](./references/recommended-loops.md) — the gather-context → take-action → verify → repeat loop, with spec-first and code-first walkthroughs - [Patterns](./references/patterns.md) — 9 worked patterns: auth, RBAC, invitations, soft delete, notifications, usage limits, comments, library spec integration, framework integration contract diff --git a/plugins/allium/skills/allium/references/recommended-loops.md b/plugins/allium/skills/allium/references/recommended-loops.md new file mode 100644 index 0000000..b055c55 --- /dev/null +++ b/plugins/allium/skills/allium/references/recommended-loops.md @@ -0,0 +1,145 @@ +# Recommended loops + +Allium keeps three artefacts in agreement: the **spec** (intended behaviour), the **tests** (the executable contract), and the **code** (the implementation). Productive work is driving those three to convergence and keeping them there as things change. This is a loop, not a pipeline. + +There are two entry points — `/elicit` works *forward* from intent, `/distill` works *backward* from existing code — but both run the same convergence loop afterwards. + +## The loop: gather context → take action → verify → repeat + +An autonomous agent works a task as a recursive goal: it **gathers context**, **takes action**, **verifies** the result, and **repeats** until the goal is met. The phase that decides quality is *verification* — a feedback signal the loop can trust. Allium earns its place by strengthening the two hardest phases, context and verification: + +| Phase | What you do | What Allium contributes | +|---|---|---| +| **Gather context** | `/elicit` (forward) or `/distill` (backward) → the spec | The spec *is* the context, and it persists across sessions — no drift, unlike re-reading the code each time | +| **Take action** | `/propagate` → tests, then implement the code | Tests are generated from the spec as the contract; you write code against them | +| **Verify** | run the tests (they must pass), then `/weed` for spec↔code alignment, then the CLI structural checks | A *behavioural* pass/fail signal, not merely "unit tests are green" | +| **Repeat** | iterate until the goal (convergence) is met | The convergence invariant below *is* the goal | + +The point isn't the command order — it's that an autonomous loop is only as good as the signal it verifies against, and Allium supplies a behaviour-level signal: tests projected from intent, plus alignment and structural checks. That is the verification feedback that makes a self-driving loop trustworthy. + +## The convergence invariant + +The loop is "done" for a unit of work when: + +- **tests pass** — the implementation satisfies the spec's obligations; +- **`/weed` reports no divergence** — spec and code agree about behaviour; +- **no open questions remain** — the spec's `open questions` section is empty (ambiguities were resolved, not silently guessed); +- **(code-first only) a fresh `/distill` pass surfaces nothing new** — the spec has caught up with the code. + +While any of these is false, there is work left. Treat that as the loop's exit condition, not a vibe. + +**Convergence governs the whole loop, not the verify phase alone.** Ordinary test-driven coding has a tight inner arc — implement ↔ run tests — that drives *code* toward a contract held fixed. Allium adds an outer arc: the verify phase can feed back into *context*, not just action. When `/weed` shows the **spec** (not the code) is wrong, or a test reveals the intent was ambiguous, or an open question surfaces, you re-enter at gather-context — `/tend` the spec (or ask the human), then re-`/propagate`. So "repeat" can re-enter at any phase: the inner arc converges code → contract, the outer arc converges contract → true intent. This mirrors the software cycle at large — building and testing a thing is what surfaces gaps in what was actually wanted, so requirements themselves iterate; the waterfall assumption that intent is fixed up front is exactly what this avoids. Convergence is reached only when both arcs are still: code matches the contract, and the contract has stopped moving. + +## The skills as loop operators + +Each skill moves one artefact relative to another: + +| Skill | Moves | Direction | +|---|---|---| +| `/elicit` | intent → spec | write spec (forward) | +| `/distill` | code → spec | write spec (backward) | +| `/propagate` | spec → tests | project spec into the contract | +| *(implement)* | tests → code | ordinary coding — **not an Allium skill** | +| `/weed` | code ↔ spec | reconcile divergence either direction | +| `/tend` | edits spec | re-enter the loop after a change | + +The implementation step is plain LLM-and-human coding. Allium has no codegen for the application itself; it produces the spec and the tests, and those hold the hand-written code to the specified behaviour. + +## Loop A — spec-first (forward, from intent) + +**When:** a new feature, or greenfield work where little or no code exists yet. + +``` + ┌──────────────────────────────────────────────────────────┐ + │ until (tests pass ∧ weed clean ∧ no open questions) │ + │ │ + ┌────▶│ 1. /elicit (first pass) ·or· /tend (later passes) │ + │ │ → spec; CLI surfaces structural issues — resolve │ + │ │ 2. /propagate → tests for the new behaviour │ + │ │ 3. run the new tests — CONFIRM THEY FAIL (red) │ + │ │ a test that passes with no new code is a signal: │ + │ │ • already covered → don't duplicate, reference it │ + │ │ • vacuous / mis-asserting → /tend spec or fix test │ + │ │ 4. implement (prompt with spec + failing tests) │ + │ │ 5. run tests │ + │ │ ├─ failing → fix the CODE, back to 4 │ + │ │ └─ a test looks WRONG → it encodes the spec: │ + │ │ /tend the spec, then /propagate (back to 2) │ + │ │ 6. /weed → check spec ↔ code alignment │ + │ │ ├─ divergence → reconcile; if intent changed, /tend │ + └─────┤ └─ open question raised → ask the human, then /tend │ + └──────────────────────────────────────────────────────────┘ + │ + ▼ + spec, tests and code converged +``` + +Step 3 is the "red" check, and it does double duty. It confirms each generated test actually exercises the new behaviour — a test that is *already green* before you implement means the behaviour is already covered (reference the existing test rather than duplicating it) or the test is vacuous (fix it). It also guarantees you see the tests transition red → green, so a later pass is meaningful rather than accidental. + +The other key discipline: when the implementation and a test disagree, decide *which* is wrong. A genuine bug means fix the code (step 5 left branch). A test that asserts the wrong thing means the **spec** is wrong — fix it with `/tend` and re-`/propagate`. Never hand-edit a generated test to make it pass; that severs the contract. + +## Loop B — code-first (backward, from existing code) + +**When:** an existing or unfamiliar codebase you want to capture, verify, or safely change. + +``` + ┌──────────────────────────────────────────────────────────┐ + │ until (distill finds nothing new ∧ failures triaged │ + │ ∧ weed clean) │ + │ │ + ┌────▶│ 1. /distill │ + │ │ → candidate spec: what the code actually does │ + │ │ 2. review the spec │ + │ │ → separate INTENDED behaviour from ACCIDENTAL; │ + │ │ don't bless bugs as requirements │ + │ │ 3. /propagate → tests │ + │ │ 4. run tests against the EXISTING code │ + │ │ ├─ pass → that behaviour is now captured │ + │ │ └─ fail → classify the failure: │ + │ │ • code bug → fix the code │ + │ │ • spec wrong → /tend spec, then /propagate │ + │ │ 5. /weed → reconcile any remaining divergence │ + └─────┤ 6. expand: next area / deepen the spec (another pass) │ + └──────────────────────────────────────────────────────────┘ + │ + ▼ + the area's behaviour is specified, tested and reconciled +``` + +Note the expectation is **inverted** from the spec-first loop: there the new tests should fail until you implement (the "red" check); here they run against code that already exists, so *passing* is the good outcome — it confirms the behaviour is captured. A failing test is *information*, not a defect to suppress: it means the spec and the code disagree, and triaging which is right is the whole point. The equivalent meaningfulness check is catching a test that can't fail for any input (vacuous) during the step-2 review. Distillation of a large codebase usually takes several passes (step 6) — keep going until a pass surfaces nothing new, in the spirit of the [Ralph Wiggum loop](https://ghuntley.com/ralph/). + +## Running the loop autonomously (for the LLM and the agents) + +Both loops can be driven autonomously — `/tend` and `/weed` already ship as standalone agents. When running unattended, treat the loop as an explicit control loop: + +**Per tick:** +1. Advance the spec (`/elicit` or `/distill` on the first tick, `/tend` thereafter only if needed). +2. `/propagate` to refresh tests if the spec changed this tick. +3. Implement / fix code toward green (spec-first) or triage failures (code-first). +4. `/weed` to check alignment. +5. Re-evaluate the convergence invariant. + +**Exit conditions — stop when either holds:** +- the convergence invariant is satisfied; or +- a bounded iteration budget is exhausted (don't spin forever). + +**Guardrails (do not violate, even to reach green):** +- **Confirm new tests fail before implementing (spec-first).** A generated test that is green before any new code is written is a signal — already-covered or vacuous — not success. Resolve it first; don't carry it into the implement step. +- **Never weaken or edit a generated test to pass.** If a test seems wrong, fix the spec and re-propagate. +- **Escalate ambiguity; don't guess.** A real open question goes to the human and into the spec's `open questions` section — silently picking an interpretation is the exact failure Allium exists to prevent. +- **No magic numbers in code that the spec puts in `config`.** Honour the spec's parameters. +- **Fix the code, not the contract**, when code and spec disagree and the spec is right. + +**State worth tracking across ticks:** tests status (pass/fail counts), `/weed` verdict, count of open questions, and — for code-first — whether the last `/distill` pass found anything new. Convergence is all four trending to zero/clean. + +## The "produce the code" prompt + +There is no skill for implementation. Point the model at the spec and the propagated tests: + +> Implement the behaviour described in `.allium`. The generated tests in `` are the contract — make them pass. Follow the spec's rules (`when` / `requires` / `ensures`) for behaviour; the implementation approach is yours, since the spec deliberately doesn't prescribe *how*. Do not weaken or edit the tests to go green — if a test looks wrong, stop and flag it so we revisit the spec. + +## Related + +- [Assessing specs](./assessing-specs.md) — gauge spec maturity before propagating (a coarse spec yields only thin tests). +- [Actioning findings](./actioning-findings.md) — the `/weed` and `allium analyse` finding taxonomy used in steps 4–5. +- [Test generation](./test-generation.md) — what `/propagate` produces. diff --git a/scripts/allium-ref.txt b/scripts/allium-ref.txt index c0c4025..130165b 100644 --- a/scripts/allium-ref.txt +++ b/scripts/allium-ref.txt @@ -1 +1 @@ -v3.5.0 +v3.6.0