diff --git a/docs/adr/0046-expand-orchestrator-tool-list-to-dispatch-authority.md b/docs/adr/0046-expand-orchestrator-tool-list-to-dispatch-authority.md
new file mode 100644
index 000000000..a2fa435b9
--- /dev/null
+++ b/docs/adr/0046-expand-orchestrator-tool-list-to-dispatch-authority.md
@@ -0,0 +1,117 @@
+---
+id: ADR-0046
+title: Expand orchestrator tool list from Read/Grep to full dispatch authority
+status: accepted
+date: 2026-05-13
+deciders:
+  - Luis Mendez
+consulted:
+  - Claude Sonnet 4.6
+informed:
+  - Specorator contributors
+supersedes: []
+superseded-by: []
+tags: [orchestrator, agents, security, architecture, goal-loop]
+---
+
+# ADR-0046 — Expand orchestrator tool list from Read/Grep to full dispatch authority
+
+## Status
+
+Accepted
+
+## Context
+
+The current orchestrator agent definition (`.claude/agents/orchestrator.md`) declares `tools: [Read, Grep]`. With only these two tools, the orchestrator is advisory-only: it can inspect state and recommend what should happen next, but it cannot dispatch subagents, update workflow state, or gate on user decisions. Every action requires a manual slash command from the user.
+
+The goal-loop feature (issue #501, PRD-ORCH-001) requires the orchestrator to:
+
+1. Spawn specialist subagents (researcher, architect, planner, dev, qa, reviewer) via the Agent tool (REQ-ORCH-001).
+2. Write and update `workflow-state.md` to track phase transitions and persist state before every HITL gate (REQ-ORCH-002, REQ-ORCH-022).
+3. Write new artifacts to `specs//` as the goal-loop progresses — specifically `scope.md` and `session-summary.md`.
+4. Present synchronous HITL gates to the user via AskUserQuestion (REQ-ORCH-008, REQ-ORCH-011, REQ-ORCH-015).
+
+Granting the orchestrator `Agent` tool access is a trust-boundary expansion. The orchestrator, as the root session agent (via plugin `settings.json agent: orchestrator`), already operates with the session's full permission mode.
Adding `Agent`, `Write`, and `Edit` to its tool list makes that scope explicit and exercisable rather than implicit.
+
+The platform constraint that subagents cannot spawn subagents (Claude Code hard limit) means the orchestrator must be the root session agent — it cannot itself be invoked as a subagent. This architectural constraint actually reduces risk: the orchestrator's expanded tool list does not propagate to any subagent context.
+
+## Decision
+
+We expand the orchestrator agent's tool list from `[Read, Grep]` to `[Agent, Read, Write, Edit, AskUserQuestion]`.
+
+The rationale for each addition:
+
+- **Agent** — required to dispatch specialist subagents per REQ-ORCH-001. Without this, the goal-loop cannot spawn any specialist.
+- **Write** — required to create `workflow-state.md`, `scope.md`, and `session-summary.md`. The orchestrator is the sole owner of `workflow-state.md` transitions (REQ-ORCH-002).
+- **Edit** — required to update `workflow-state.md` in-place between phases without rewriting the full file on each transition.
+- **AskUserQuestion** — required to implement the three HITL gates (post-scope, post-design, post-review) and the stall gate (REQ-ORCH-008, REQ-ORCH-011, REQ-ORCH-014, REQ-ORCH-015).
+- **Read** — retained for pre-flight precondition checks (REQ-ORCH-003) and for reading `workflow-state.md` on session resume.
+- **Grep** — removed. The orchestrator does not perform search operations in the goal-loop; search is delegated to specialist subagents. Removing Grep narrows the tool surface.
+
+The orchestrator's write boundary is restricted to `specs//` directories and their content files. It does not gain Bash, WebSearch, or any other tool.
+
+## Considered options
+
+### Option A — Keep Read/Grep; use a wrapper subagent as the dispatch authority
+
+Have the orchestrator remain advisory-only; introduce a separate "goal-loop runner" subagent with full dispatch authority that the user invokes via a slash command.
+
+- Pros: No change to existing orchestrator trust surface.
+- Cons: The goal-loop runner would itself need Agent tool access — we have not reduced the trust surface, only moved it. Additionally, the platform constraint (subagents cannot spawn subagents) means a subagent cannot be the dispatch authority. This option is architecturally impossible under the platform constraints.
+
+### Option B — Expand orchestrator tool list (chosen)
+
+Promote the existing orchestrator to full dispatch authority by adding Agent, Write, Edit, AskUserQuestion.
+
+- Pros: Consistent with the platform model (root session agent has dispatch authority); no new agent definition; the trust expansion is explicit in the agent frontmatter; the platform constraint enforces that this authority does not cascade to subagents.
+- Cons: The orchestrator now has Write access to `specs/` directories; this is a wider trust surface than the current Read-only posture.
+
+### Option C — Use Agent Teams (experimental)
+
+Use `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` to delegate dispatch authority to a team lead without expanding the orchestrator's own tool list.
+
+- Pros: Experimental feature; no tool list change on the orchestrator.
+- Cons: Rejected in research.md (Alternative C) as an experimental feature with known limitations: skills and MCP not applied to teammates, no session resumption, one team per lead. Reserved for v2.
+
+## Consequences
+
+### Positive
+
+- The orchestrator can now autonomously drive the full goal-loop without any user slash-command chaining (REQ-ORCH-001, REQ-ORCH-002, REQ-ORCH-006).
+- HITL gates are enforced by the orchestrator itself via AskUserQuestion, not by convention.
+- `workflow-state.md` transitions are owned and written by a single agent, preventing multi-agent write races.
+- Subagents continue to have no Agent tool access — they cannot spawn further subagents, preserving the star topology.
+
+### Negative
+
+- The orchestrator now has Write access to `specs//` paths. A defective orchestrator implementation could overwrite spec artifacts. Mitigation: the orchestrator's write boundary is documented and lint-checked; subagents do not inherit this access.
+- Removing Grep is a mild capability reduction for the existing advisory-only use case. Any user who invokes the orchestrator outside a goal-loop and expects search behaviour will notice a gap. Mitigation: the orchestrator's advisory use case is superseded by the goal-loop; the description in the agent definition is updated to reflect the new primary role.
+
+### Neutral
+
+- The plugin's `settings.json` continues to declare `agent: orchestrator`, making the orchestrator the default session agent when the plugin is enabled. This does not change.
+- The `build-claude-plugin.ts` pipeline copies `.claude/agents/orchestrator.md` into the plugin bundle unchanged; the tool list expansion is automatically included in the next bundle build.
+
+## Compliance
+
+- The orchestrator's YAML frontmatter `tools:` field in `.claude/agents/orchestrator.md` must list `[Agent, Read, Write, Edit, AskUserQuestion]` exactly.
+- `check-agents.ts` must validate that the orchestrator's frontmatter does not declare `hooks`, `mcpServers`, or `permissionMode` (REQ-ORCH-020).
+- Design spec (Part C — Components table) must document the orchestrator's write boundary as `specs//` only.
+- The release-criteria checklist in `requirements.md` includes verification that all 85 existing slash commands continue to function after the orchestrator tool expansion (REQ-ORCH-005, REQ-ORCH-021).
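Reviewer sketch (not part of the diff): the first two compliance rules above could be checked roughly as follows. This is a minimal illustration, not the real `check-agents.ts`. The function name `checkOrchestratorFrontmatter`, the pre-parsed `Frontmatter` object, and the reading of "exactly" as "same tools in the same order" are all assumptions.

```typescript
// Hypothetical sketch of the ADR-0046 compliance rules; names are illustrative only.
const REQUIRED_TOOLS: string[] = ["Agent", "Read", "Write", "Edit", "AskUserQuestion"];
const FORBIDDEN_KEYS: string[] = ["hooks", "mcpServers", "permissionMode"];

// Assumption: the YAML frontmatter has already been parsed into a plain object.
type Frontmatter = Record<string, unknown>;

function checkOrchestratorFrontmatter(fm: Frontmatter): string[] {
  const errors: string[] = [];

  // Rule 1: tools must match the required list (assumed: same order, same length).
  const tools = fm["tools"];
  const toolsMatch =
    Array.isArray(tools) &&
    tools.length === REQUIRED_TOOLS.length &&
    tools.every((tool, i) => tool === REQUIRED_TOOLS[i]);
  if (!toolsMatch) {
    errors.push(`tools must be exactly [${REQUIRED_TOOLS.join(", ")}]`);
  }

  // Rule 2: none of the forbidden frontmatter keys may be declared.
  for (const key of FORBIDDEN_KEYS) {
    if (key in fm) {
      errors.push(`frontmatter must not declare ${key}`);
    }
  }
  return errors;
}
```

An empty result means the frontmatter passes both rules; each violated rule contributes one message, so a lint step could fail the build whenever the array is non-empty.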
+
+## References
+
+- PRD-ORCH-001 — Goal-oriented orchestrator plugin requirements
+- REQ-ORCH-001 — Orchestrator dispatch via Agent tool
+- REQ-ORCH-002 — Orchestrator ownership of workflow-state.md transitions
+- REQ-ORCH-020 — Plugin agent frontmatter validation
+- REQ-ORCH-021 — Backward compatibility for non-plugin users
+- `specs/goal-oriented-orchestrator-plugin/design.md` Part C — Architecture
+- `research.md` — Alternative B (Anthropic native Orchestrator-Subagent pattern, recommended)
+- ADR-0043 — Plugin bundle distribution model
+- [Claude Code — Create custom subagents](https://code.claude.com/docs/en/sub-agents)
+- [GitHub Issue #19077 — Subagents cannot spawn subagents](https://github.com/anthropics/claude-code/issues/19077)
+
+---
+
+> **ADR bodies are immutable.** To change a decision, supersede it with a new ADR; only the predecessor's `status` and `superseded-by` pointer fields may be updated.
diff --git a/docs/adr/0047-adopt-goal-loop-workflow-state-schema-extensions.md b/docs/adr/0047-adopt-goal-loop-workflow-state-schema-extensions.md
new file mode 100644
index 000000000..b76c217bd
--- /dev/null
+++ b/docs/adr/0047-adopt-goal-loop-workflow-state-schema-extensions.md
@@ -0,0 +1,123 @@
+---
+id: ADR-0047
+title: Extend workflow-state.md schema with goal-loop fields
+status: accepted
+date: 2026-05-13
+deciders:
+  - Luis Mendez
+consulted:
+  - Claude Sonnet 4.6 (architect agent)
+informed:
+  - Specorator contributors
+supersedes: []
+superseded-by: []
+tags: [orchestrator, goal-loop, workflow-state, schema, zod, artifact]
+---
+
+# ADR-0047 — Extend workflow-state.md schema with goal-loop fields
+
+## Status
+
+Accepted
+
+## Context
+
+`workflow-state.md` is the durable checkpoint for all Specorator session state. ADR-0042 established a typed-artifact reader seam (Zod schema) for frontmatter parsing. The existing schema captures lifecycle stage and status fields sufficient for the 11-stage manual workflow.
+
+The goal-loop (PRD-ORCH-001) introduces an orchestrator that drives multi-phase sessions autonomously. Session resume (REQ-ORCH-022), stall detection (REQ-ORCH-014), and pre-flight precondition checks (REQ-ORCH-003) all require the orchestrator to read structured state from `workflow-state.md` that is not present in the current schema. Specifically:
+
+1. **Current phase within the goal-loop** — the orchestrator must know which of the six phases (scope, research, design, plan, implement, review) is active when resuming a session.
+2. **HITL state** — which gate is pending; what the gate content was (so the gate can be replayed without re-running the phase).
+3. **Researcher count** — how many analyst subagents were dispatched in the research wave (used in status messages and for wave-cost auditing in `session-summary.md`).
+4. **Wave schedule** — the topological wave plan derived from `tasks.md` (wave index → list of task IDs). Persisted so that the orchestrator can resume mid-wave after a session interrupt without re-parsing `tasks.md`.
+5. **Stall counters** — per-task retry count, keyed by task ID. Allows the orchestrator to detect stalls across session restarts, not just within a single session.
+
+The release criteria in `requirements.md` explicitly state: "`workflow-state.md` Zod schema (ADR-0042 prerequisite) is in place before implementation of REQ-ORCH-002 and REQ-ORCH-022."
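Reviewer sketch (not part of the diff): the "topological wave plan" in item 4 above can be derived by repeatedly peeling off the tasks whose dependencies are all satisfied. The sketch below assumes the DAG edges from `tasks.md` have already been parsed into a hypothetical `Task` shape with a `dependsOn` list; how `tasks.md` actually encodes edges is not stated here, and the task IDs in the usage note are illustrative.

```typescript
// Hypothetical input shape: one entry per task, with the IDs it depends on.
interface Task {
  id: string;
  dependsOn: string[];
}

// Output shape assumed from the description above: wave index -> list of task IDs.
interface Wave {
  wave: number;
  task_ids: string[];
}

function computeWaveSchedule(tasks: Task[]): Wave[] {
  // Track, per task, the dependencies that have not completed yet.
  const unresolved = new Map<string, Set<string>>();
  tasks.forEach((task) => unresolved.set(task.id, new Set(task.dependsOn)));

  const waves: Wave[] = [];
  while (unresolved.size > 0) {
    // A task is ready when every dependency was scheduled in an earlier wave.
    const ready: string[] = [];
    unresolved.forEach((deps, id) => {
      if (deps.size === 0) ready.push(id);
    });
    if (ready.length === 0) {
      // Nothing can run: the graph has a cycle or references an unknown task.
      throw new Error("cycle or unknown dependency in task graph");
    }
    ready.sort();
    ready.forEach((id) => unresolved.delete(id));
    unresolved.forEach((deps) => ready.forEach((id) => deps.delete(id)));
    waves.push({ wave: waves.length + 1, task_ids: ready });
  }
  return waves;
}
```

For example, two independent tasks `T-ORCH-001` and `T-ORCH-002` plus a third task that depends on both would yield two waves, with the dependent task alone in the second. Persisting this result is what lets a resumed session skip re-parsing `tasks.md`.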
+
+## Decision
+
+We extend the `workflow-state.md` Zod schema (introduced by ADR-0042) with the following optional goal-loop fields in the YAML frontmatter:
+
+```yaml
+goal_loop:
+  current_phase: scope | research | design | plan | implement | review | complete | aborted
+  hitl_state:
+    gate: 1 | 2 | 3 | stall
+    pending: true | false
+    gate_content_ref: "specs//workflow-state.md#gate-content"  # embedded in body
+  researcher_count:
+  wave_schedule:
+    - wave: 1
+      task_ids: [T-ORCH-001, T-ORCH-002]
+    - wave: 2
+      task_ids: [T-ORCH-003]
+  stall_counters:
+    T-ORCH-003: 2
+  artifacts_produced:
+    - specs//scope.md
+    - specs//research.md
+```
+
+All `goal_loop` fields are optional — their absence indicates the session is using the manual 11-stage command workflow, not the goal-loop. Existing `workflow-state.md` files without a `goal_loop` key are valid under the extended schema.
+
+The `hitl_state.gate_content_ref` points to a section in the `workflow-state.md` body (not a separate file). Gate content is embedded in the body as a Markdown block so the entire checkpoint is a single file.
+
+The Zod schema extension follows the additive-only convention established by ADR-0042: no existing required field is changed; only new optional fields are added.
+
+## Considered options
+
+### Option A — Extend workflow-state.md schema (chosen)
+
+Add optional `goal_loop` fields to the existing Zod schema. Single file; single source of truth for session state.
+
+- Pros: consistent with the existing state model; no new infrastructure; session resume reads the same file as phase tracking; the `build-claude-plugin.ts` pipeline requires no changes.
+- Cons: `workflow-state.md` grows in size during a goal-loop session. For a 5-wave session with 20 tasks, stall_counters and wave_schedule add ~30 lines of YAML.
+
+### Option B — Introduce a separate goal-loop-state.md file
+
+Store goal-loop-specific state in `specs//goal-loop-state.md` distinct from `workflow-state.md`.
+
+- Pros: keeps the existing `workflow-state.md` schema unchanged; separates concerns.
+- Cons: creates a second "source of truth" for session state; the orchestrator must write and read two files atomically to maintain consistency; session resume becomes a two-file operation with a risk of partial-write inconsistency.
+
+### Option C — Store goal-loop state in memory only (no disk persistence)
+
+Keep goal-loop state in the orchestrator's context window, not on disk.
+
+- Pros: no schema changes; simple.
+- Cons: session resume is impossible — REQ-ORCH-022 requires state to survive session interruption. Directly contradicts NFR-ORCH-008.
+
+## Consequences
+
+### Positive
+
+- Session resume (REQ-ORCH-022) is reliable: all state needed to replay a HITL gate lives in `workflow-state.md`.
+- Stall detection (REQ-ORCH-014) is persistent across session restarts: `stall_counters` survive a process restart.
+- Pre-flight checks (REQ-ORCH-003) can use `artifacts_produced` to verify preconditions without filesystem stat calls on every field.
+- The schema extension is additive — no existing Specorator workflow or test is affected.
+
+### Negative
+
+- The Zod schema for `workflow-state.md` (ADR-0042) must be updated before implementation of REQ-ORCH-002 and REQ-ORCH-022 can begin. This is a blocking prerequisite.
+- Gate content embedded in the `workflow-state.md` body makes the file longer during a session. Mitigated by the fact that only one gate is pending at a time and old gate content can be cleared on gate resolution.
+
+### Neutral
+
+- The `workflow-state.md` Zod schema lives in the scripts layer (established by ADR-0042). The extension PR touches only the schema module and adds no new scripts.
+
+## Compliance
+
+- The Zod schema module (path established by ADR-0042) must be updated to include the `goal_loop` optional field group before the implementation phase of the goal-loop feature begins.
+- The orchestrator's system prompt must document which `goal_loop` sub-fields it writes at each phase transition.
+- `npm run verify` must include schema validation of any `workflow-state.md` file produced by the test suite.
+
+## References
+
+- PRD-ORCH-001 — REQ-ORCH-002, REQ-ORCH-003, REQ-ORCH-014, REQ-ORCH-022; NFR-ORCH-008
+- ADR-0042 — Typed artifact reader seam for frontmatter parsing (prerequisite schema)
+- DESIGN-ORCH-001 Part C — Data model section (workflow-state.md extended fields)
+- `specs/goal-oriented-orchestrator-plugin/research.md` §State management model
+
+---
+
+> **ADR bodies are immutable.** To change a decision, supersede it with a new ADR; only the predecessor's `status` and `superseded-by` pointer fields may be updated.
diff --git a/docs/adr/0048-introduce-scope-md-and-session-summary-md-as-goal-loop-artifacts.md b/docs/adr/0048-introduce-scope-md-and-session-summary-md-as-goal-loop-artifacts.md
new file mode 100644
index 000000000..b4f89a015
--- /dev/null
+++ b/docs/adr/0048-introduce-scope-md-and-session-summary-md-as-goal-loop-artifacts.md
@@ -0,0 +1,114 @@
+---
+id: ADR-0048
+title: Introduce scope.md and session-summary.md as canonical goal-loop artifacts
+status: accepted
+date: 2026-05-13
+deciders:
+  - Luis Mendez
+consulted:
+  - Claude Sonnet 4.6 (architect agent)
+informed:
+  - Specorator contributors
+supersedes: []
+superseded-by: []
+tags: [orchestrator, goal-loop, artifacts, scope, session-summary, templates]
+---
+
+# ADR-0048 — Introduce scope.md and session-summary.md as canonical goal-loop artifacts
+
+## Status
+
+Accepted
+
+## Context
+
+The goal-loop (PRD-ORCH-001) produces two new artifacts that have no equivalent in the existing 11-stage lifecycle:
+
+1. **`specs//scope.md`** — produced by the scope phase. Contains the EARS acceptance criteria extracted by the grill skill from the user's problem statement or GitHub issue body.
This file is the basis for the Gate 1 HITL presentation (REQ-ORCH-008), the reviewer subagent's validation targets (REQ-ORCH-015), and the session summary's criteria-status table (REQ-ORCH-016). It is the user-editable source of truth for "what we agreed to build."
+
+2. **`specs//session-summary.md`** — produced at goal-loop completion (Gate 3 accepted). Contains: the decisions made during the session, the EARS acceptance criteria status (pass/fail per criterion), the list of artifacts produced with their paths, the traceability section mapping REQ/T/TEST IDs to artifacts, and the open follow-ups (deferred tasks). This is the primary handoff artifact for solo developers and the audit record for enterprise evaluators.
+
+Neither artifact is created by any existing `/spec:*` command. They are introduced exclusively by the goal-loop conductor skill.
+
+The requirement in `docs/sink.md` is that every artifact type's location is documented. A new artifact type that lands in `specs//` must be registered. Similarly, per `AGENTS.md` conventions, artifact templates belong in `templates/` to guide both the orchestrator's writer and future human authors.
+
+## Decision
+
+We introduce `scope.md` and `session-summary.md` as canonical goal-loop artifact types with the following properties:
+
+**scope.md:**
+- Location: `specs//scope.md`
+- Owner (writer): orchestrator (written in the scope phase, before Gate 1)
+- User-editable: yes — the Gate 1 "Edit" path directs the user to edit this file
+- Re-read: orchestrator re-reads after user edits; criteria list is re-presented at Gate 1
+- Contains: YAML frontmatter (feature slug, created timestamp, EARS criteria count) + body with numbered EARS criteria, each with: criterion text, EARS pattern type, source (free-text or issue reference)
+- Template: `templates/scope.md` is created as a reference template
+
+**session-summary.md:**
+- Location: `specs//session-summary.md`
+- Owner (writer): orchestrator (written when Gate 3 is accepted or session is aborted with partial results)
+- User-editable: no during session; read-only after completion
+- Contains: YAML frontmatter (feature slug, session start/end timestamps, goal-loop phase at completion, artifact list) + body sections: Decisions, Acceptance Criteria Status, Artifacts Produced, Traceability, Open Follow-ups
+- Template: `templates/session-summary.md` is created as a reference template
+
+Both templates follow the `templates/` convention (Markdown with frontmatter, kebab-case filename, single artifact type per template) established by existing templates in the repository.
+
+Both artifacts are added to `docs/sink.md` under the `specs//` section.
+
+## Considered options
+
+### Option A — Reuse existing artifacts (no new types)
+
+Embed scope criteria in `requirements.md` and session summary content in `review.md` / `retrospective.md`.
+
+- Pros: no new artifact types; existing templates cover the surface.
+- Cons: `requirements.md` is produced by Stage 3 (`/spec:requirements`) and has a fixed structure (PRD format with EARS sections); embedding goal-loop scope criteria in it would require the PM agent to be involved in scope extraction, violating the goal-loop's autonomous scope phase. `review.md` is a reviewer agent artifact; `session-summary.md` serves a different audience (user-facing handoff, not agent-facing quality check). Reuse would require both documents to serve two incompatible purposes.
+
+### Option B — Introduce scope.md and session-summary.md as new artifact types (chosen)
+
+- Pros: single-responsibility per artifact; clear ownership (orchestrator writes, user reads/edits scope.md; orchestrator writes session-summary.md as a terminal artifact); no schema conflicts with existing stage artifacts; templates enable future human authoring outside the goal-loop.
+- Cons: two new artifact types added to the `specs//` space; `docs/sink.md` must be updated.
+
+### Option C — Store scope criteria only in workflow-state.md (no scope.md)
+
+- Pros: one fewer artifact file.
+- Cons: scope criteria are user-editable (Gate 1 "Edit" path); embedding editable content in `workflow-state.md` — which the user is told not to edit manually — creates a contradiction. A dedicated `scope.md` is cleaner and consistent with the file-based artifact model.
+
+## Consequences
+
+### Positive
+
+- The Gate 1 "Edit" path is clean: the user opens a well-structured file, edits criteria, and returns. The orchestrator re-reads a stable, typed file.
+- `session-summary.md` serves as the primary audit record for enterprise evaluators without requiring them to parse multiple agent artifacts.
+- Templates enable future human authoring and manual goal-loop entry without the orchestrator.
+- `docs/sink.md` is kept accurate.
+
+### Negative
+
+- Two new template files must be created and maintained.
+- `docs/sink.md` requires an update to register both artifact types.
+- The `core-lifecycle/manifest.md` and `core-lifecycle/schema.json` (ADR-0036) should be updated to list the new output artifact types; this is a non-blocking follow-up.
+
+### Neutral
+
+- The existing `specs//` artifact space is not restructured. `scope.md` and `session-summary.md` land alongside the existing artifacts (idea.md, research.md, etc.) without displacing them.
+- Neither artifact is produced by any existing `/spec:*` command. Existing users who do not use the goal-loop will never encounter these files.
+
+## Compliance
+
+- `templates/scope.md` is created with a valid frontmatter schema and body structure before the implementation phase.
+- `templates/session-summary.md` is created with a valid frontmatter schema and body structure before the implementation phase.
+- `docs/sink.md` is updated to register `specs//scope.md` and `specs//session-summary.md` under the `specs//` section.
+- The `core-lifecycle/manifest.md` `outputs:` list is updated to include both artifact paths (follow-up, not blocking).
+
+## References
+
+- PRD-ORCH-001 — REQ-ORCH-008 (scope.md produced at Gate 1), REQ-ORCH-015 (scope.md used as review target), REQ-ORCH-016 (session-summary.md content requirements)
+- DESIGN-ORCH-001 Part C — Data model section
+- ADR-0036 — Plugin manifest standard (core-lifecycle outputs list)
+- `docs/sink.md` — artifact location registry
+- `templates/` — existing template conventions
+
+---
+
+> **ADR bodies are immutable.** To change a decision, supersede it with a new ADR; only the predecessor's `status` and `superseded-by` pointer fields may be updated.
diff --git a/docs/adr/README.md b/docs/adr/README.md
index 103de0f71..c19f15ccd 100644
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -58,6 +58,9 @@ Records of architecturally significant decisions.
Format follows Michael Nygard'
 | [0043](0043-distribute-claude-plugin-bundle-from-orphan-dist-branch.md) | Distribute Claude Code plugin bundle from an orphan dist branch via git-subdir | Accepted |
 | [0044](0044-restore-npmjs-trusted-publishing.md) | Restore npmjs.com Trusted Publishing — re-enable OIDC + provenance | Accepted |
 | [0045](0045-adopt-docs-backlog-canonical.md) | Adopt docs/backlog/ as the canonical issue and pull-request mirror | Accepted |
+| [0046](0046-expand-orchestrator-tool-list-to-dispatch-authority.md) | Expand orchestrator tool list from Read/Grep to full dispatch authority | Accepted |
+| [0047](0047-adopt-goal-loop-workflow-state-schema-extensions.md) | Extend workflow-state.md schema with goal-loop fields | Accepted |
+| [0048](0048-introduce-scope-md-and-session-summary-md-as-goal-loop-artifacts.md) | Introduce scope.md and session-summary.md as canonical goal-loop artifacts | Accepted |
 
 ## ADR Dispositions
 
diff --git a/specs/goal-oriented-orchestrator-plugin/design.md b/specs/goal-oriented-orchestrator-plugin/design.md
new file mode 100644
index 000000000..593a7e68a
--- /dev/null
+++ b/specs/goal-oriented-orchestrator-plugin/design.md
@@ -0,0 +1,1714 @@
+---
+id: DESIGN-ORCH-001
+title: Goal-oriented orchestrator plugin — Design
+stage: design
+feature: goal-oriented-orchestrator-plugin
+status: accepted
+owner: architect
+collaborators:
+  - ux-designer
+  - ui-designer
+  - architect
+inputs:
+  - PRD-ORCH-001
+  - RESEARCH-ORCH-001
+adrs:
+  - ADR-0046
+  - ADR-0047
+  - ADR-0048
+created: 2026-05-13
+updated: 2026-05-13
+---
+
+# Design — Goal-oriented orchestrator plugin
+
+## Context
+
+Specorator's current command-chain-driven onboarding requires users to manually invoke 11 sequential slash commands before receiving value. The orchestrator agent exists today but is advisory-only — it cannot dispatch subagents, cannot update workflow state, and cannot enforce stage gates.
This design promotes the orchestrator to full dispatch authority and introduces the goal-loop: a six-phase conductor that moves a user from a free-text problem statement or GitHub issue reference to a fully traceable, spec-driven resolution without any manual slash-command chaining.
+
+The surface this design covers is a **conversational CLI tool** (Claude Code), not a visual application. Every "screen" is a turn in a chat conversation. UX here means: what the orchestrator says, when it says it, what options it offers, and how it recovers from failure. The medium is text; the interaction primitives are `AskUserQuestion` calls and orchestrator status messages.
+
+## Goals (design-level)
+
+- DG1: A first-time user who can describe their problem in plain English (or paste a GitHub issue number) should reach a confirmed, structured scope — with EARS acceptance criteria they have approved — within one conversation turn plus one explicit confirmation, without reading any documentation.
+- DG2: Every HITL gate must present options that are skimmable in under 10 seconds; the user must never need to hold the full session context in their head to make a good decision at a gate.
+- DG3: Every error or stall state must name the affected artifact or task, explain what went wrong in one sentence, and offer at least one forward path. "Something went wrong" is not a valid error message.
+- DG4: A user who interrupted a session at any HITL gate must be able to resume from exactly that gate without re-running prior phases.
+- DG5: All existing `/spec:*` slash commands remain discoverable and usable unchanged. The orchestrator is an accelerator, not a cage.
+
+## Non-goals
+
+- This design does not define visual styling, colours, or font choices. Those are the ui-designer's territory (Part B).
+- This design does not specify data structures, file schemas, or service boundaries. Those are the architect's territory (Part C).
+- This design does not introduce new requirements beyond those in `requirements.md`. Anything missing is escalated, not invented.
+- This design does not cover the web product page, documentation site, or any non-CLI surface.
+- This design does not cover async or PR-based approval flows; synchronous `AskUserQuestion` is the chosen gate pattern (NG3 in requirements.md).
+
+---
+
+## Part A — UX
+
+### User flows
+
+#### Flow A1 — Free-text problem statement entry (REQ-ORCH-006)
+
+The user opens a Claude Code session with the Specorator plugin enabled. The orchestrator is the active session agent.
+
+```mermaid
+sequenceDiagram
+    actor User
+    participant Orch as Orchestrator
+    participant Grill as grill skill
+
+    User->>Orch: Types a free-text problem description (no slash prefix)
+    Orch->>Orch: Detects: no slash prefix, no issue reference → scope phase
+    Orch->>Orch: Writes workflow-state.md (phase: scope, status: in-progress)
+    Orch->>Grill: Invokes grill skill with problem statement as seed
+    Grill-->>Orch: Returns structured EARS acceptance criteria
+    Orch->>Orch: Writes workflow-state.md (phase: scope, status: awaiting-hitl-1)
+    Orch->>User: AskUserQuestion — Gate 1 (scope confirmation)
+    User->>Orch: Responds: approve / edit / abort
+    alt approve
+        Orch->>Orch: Advances to research wave
+    else edit
+        Orch->>User: "Open specs//scope.md and edit the criteria, then reply 'done' to continue."
+        User->>Orch: Replies "done"
+        Orch->>Orch: Re-reads edited criteria, re-presents Gate 1
+    else abort
+        Orch->>Orch: Writes workflow-state.md (phase: aborted)
+        Orch->>User: "Session aborted. specs//scope.md has been written for reference. No other artifacts were produced. To start fresh, describe your problem again."
+    end
+```
+
+**Entry condition:** The orchestrator is the active session agent (via plugin `settings.json agent: orchestrator`) and the user's opening message is not prefixed by a slash command and does not contain a GitHub issue reference pattern.
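Reviewer sketch (not part of the diff): the entry condition above amounts to a three-way classification of the opening message. The following is a minimal illustration under stated assumptions. The `Entry` type, the function name, and the regular expression are hypothetical, and the assumed issue patterns (`#123` or a `github.com/<owner>/<repo>/issues/123` URL) may differ from what the orchestrator prompt actually recognises.

```typescript
// Hypothetical result type for the opening-message check.
type Entry =
  | { kind: "slash-command"; command: string }
  | { kind: "issue-reference"; issue: number }
  | { kind: "free-text"; text: string };

// Assumed patterns: "#123" at a word start, or a GitHub issue URL.
const ISSUE_PATTERN = /(?:^|\s)#(\d+)\b|github\.com\/[^\/\s]+\/[^\/\s]+\/issues\/(\d+)/;

function classifyOpeningMessage(message: string): Entry {
  const text = message.trim();

  // Slash-prefixed messages are returned as commands and never enter the free-text flow.
  if (text.startsWith("/")) {
    return { kind: "slash-command", command: text.split(/\s+/)[0] ?? text };
  }

  // An issue reference anywhere in the message takes precedence over free text.
  const match = ISSUE_PATTERN.exec(text);
  if (match) {
    return { kind: "issue-reference", issue: Number(match[1] ?? match[2]) };
  }

  // Everything else seeds the scope phase as a free-text problem statement.
  return { kind: "free-text", text };
}
```

Only the `free-text` result satisfies the entry condition for this flow; the other two kinds would be routed to whatever handles commands and issue references.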
+
+**Exit condition:** User confirms scope (Gate 1 approved) or explicitly aborts.
+
+---
+
+#### Flow A2 — GitHub issue reference entry (REQ-ORCH-007)
+
+```mermaid
+sequenceDiagram
+    actor User
+    participant Orch as Orchestrator
+    participant GH as GitHub (read)
+    participant Grill as grill skill
+
+    User->>Orch: Sends "#501" or a GitHub issue URL
+    Orch->>Orch: Detects issue reference pattern
+    Orch->>Orch: Status message: "Fetching issue #501..."
+    Orch->>GH: Reads issue title and body
+    GH-->>Orch: Returns title + body text
+    Orch->>Orch: Status message: "Read issue #501: [title]. Starting scope phase."
+    Orch->>Grill: Invokes grill skill with issue title + body as seed
+    Grill-->>Orch: Returns structured EARS acceptance criteria
+    Orch->>Orch: Writes workflow-state.md (phase: scope, status: awaiting-hitl-1)
+    Orch->>User: AskUserQuestion — Gate 1 (scope confirmation, prefaced with issue title)
+    User->>Orch: Responds: approve / edit / abort
+```
+
+**Notes:** The issue title is displayed at the top of Gate 1 to anchor the user's context. The flow after Gate 1 is identical to Flow A1.
+
+---
+
+#### Flow A3 — `/issue:tackle` entry (REQ-ORCH-023)
+
+```mermaid
+sequenceDiagram
+    actor User
+    participant Orch as Orchestrator
+
+    User->>Orch: /issue:tackle #501
+    Orch->>Orch: Normalises to: issue reference #501
+    note over Orch: Identical to Flow A2 from this point
+    Orch->>Orch: Reads issue title + body, invokes grill, presents Gate 1
+```
+
+**Notes:** `/issue:tackle` is treated as syntactic sugar for submitting an issue reference. No divergent path exists; this prevents two separate mental models for the same action.
+
+---
+
+#### Flow A4 — Research wave and design synthesis (REQ-ORCH-009, 010, 011)
+
+This flow begins after Gate 1 is approved. The user does not interact until Gate 2; however, they receive progress messages so the session does not appear hung.
+
+```mermaid
+flowchart TD
+    A([Gate 1 approved]) --> B[Orchestrator assesses scope surface area]
+    B --> C{1–5 researcher\nsubagents needed?}
+    C --> D[Status: 'Starting research wave with N analyst agents...']
+    D --> E[Parallel Agent tool calls — N analysts dispatched]
+    E --> F[Status: 'Research wave complete. Synthesising findings...']
+    F --> G[Orchestrator de-duplicates and merges into research.md]
+    G --> H[Status: 'Dispatching architect for design synthesis...']
+    H --> I[Architect subagent produces design.md]
+    I --> J[Orchestrator writes workflow-state.md — awaiting-hitl-2]
+    J --> K[AskUserQuestion — Gate 2: design approval]
+```
+
+**Status messages (non-interactive, displayed inline):**
+
+- `Starting research wave — dispatching [N] analyst agent(s)...` — shown immediately after Gate 1 approval, before any Agent tool call.
+- `Research complete — [N] findings merged into research.md.` — shown after synthesis.
+- `Producing design document (design.md)...` — shown while architect subagent runs.
+
+These messages are not questions; they require no response. They exist to prevent the user from thinking the session has stalled during what can be a multi-minute autonomous phase.
+
+---
+
+#### Flow A5 — Plan phase (REQ-ORCH-012)
+
+```mermaid
+flowchart TD
+    A([Gate 2 approved]) --> B[Status: 'Planning implementation — decomposing design into tasks...']
+    B --> C[Planner subagent produces tasks.md with DAG edges]
+    C --> D[Status: 'Plan complete — N tasks across M waves. Starting implementation.']
+    D --> E[Implement waves — Flow A6]
+```
+
+**Notes:** The plan phase does not have its own HITL gate. The user sees the task count and wave count in the transition message. If a user wants to inspect `tasks.md` before implementation begins, they can open the file at any time — but the orchestrator does not pause to prompt them to do so.
This is intentional: the design-approval gate (Gate 2) is the last affordable correction point for structural decisions; task-level changes are handled via the targeted-revision path at Gate 3. + +--- + +#### Flow A6 — Implement waves (REQ-ORCH-013, REQ-ORCH-014) + +```mermaid +flowchart TD + A([Plan ready]) --> B[Compute topological wave schedule from tasks.md] + B --> C[Status: 'Wave 1 of M — dispatching K task agents in parallel...'] + C --> D{All wave-1 tasks\ncomplete without stall?} + D -- yes --> E[Status: 'Wave 1 complete. Advancing to wave 2 of M.'] + E --> F{More waves?} + F -- yes --> C + F -- no --> G[Flow A7: Review phase] + D -- stall detected --> H[Stall gate — Flow A8] + H --> I{User chose?} + I -- retry --> C + I -- skip --> E + I -- abort --> J[Session summary with partial results] +``` + +**Status messages per wave:** + +- `Wave [N] of [M] — running [K] task(s) in parallel worktrees...` +- `Wave [N] complete — [K] task(s) merged.` + +--- + +#### Flow A7 — Review phase and Gate 3 (REQ-ORCH-015) + +```mermaid +flowchart TD + A([All implement waves complete]) --> B[Status: 'All implementation waves complete. 
Starting review...'] + B --> C[Reviewer + QA subagents validate against EARS criteria] + C --> D[Orchestrator writes workflow-state.md — awaiting-hitl-3] + D --> E[AskUserQuestion — Gate 3: review verdict] + E --> F{User chose?} + F -- accept --> G[Session summary — Flow A9] + F -- targeted revision --> H[Status: 'Re-entering implementation for affected tasks...'] + H --> I[Implement waves — affected tasks only — Flow A6] + I --> C +``` + +--- + +#### Flow A8 — Stall detection and escalation (REQ-ORCH-014) + +```mermaid +flowchart TD + A([Subagent stalled after 3 retries]) --> B[Orchestrator writes workflow-state.md — stall-detected] + B --> C[AskUserQuestion — Stall gate] + C --> D{User chose?} + D -- retry --> E[Orchestrator retries task — resets counter] + D -- skip --> F[Task marked deferred — continue to next wave] + D -- abort --> G[Session summary with partial results written] +``` + +--- + +#### Flow A9 — Session completion (REQ-ORCH-016) + +```mermaid +flowchart TD + A([Gate 3 accepted]) --> B[Orchestrator writes session-summary.md] + B --> C[Orchestrator updates workflow-state.md — complete] + C --> D[Displays: 'Goal-loop complete. Summary written to specs/slug/session-summary.md.'] + D --> E[Lists artifact paths produced] + E --> F([Session ends]) +``` + +--- + +#### Flow A10 — Session resume (REQ-ORCH-022) + +A user who re-opens a session that was interrupted at a HITL gate encounters this flow. 
+ +```mermaid +sequenceDiagram + actor User + participant Orch as Orchestrator + + User->>Orch: Opens Claude Code session (any message) + Orch->>Orch: Reads workflow-state.md on startup + Orch->>Orch: Detects: in-progress goal-loop, phase = [phase], status = awaiting-hitl-[N] + Orch->>User: AskUserQuestion — Resume prompt + User->>Orch: Responds: resume / restart / abandon + alt resume + Orch->>Orch: Replays HITL gate [N] with its original content + else restart + Orch->>Orch: Clears phase state, re-enters scope phase + else abandon + Orch->>User: "Goal-loop for [slug] abandoned. Partial artifacts remain in specs/[slug]/. Start fresh with a new problem statement." + end +``` + +--- + +### Information architecture + +The goal-loop does not introduce new top-level navigation for the user. All artifacts land in the existing `specs//` convention. The orchestrator is the single entry point; the six goal-loop phases are not separately addressable by the user — they are internal orchestrator states. + +**Deep-link convention:** There is no URL-based deep-linking in a CLI context. Session resume is file-based: `workflow-state.md` is the bookmark. A user can direct-link to a specific gate by resuming a session; the orchestrator reads the saved state and replays the gate. + +**Feature slug derivation:** The orchestrator derives the feature slug from the first noun phrase of the problem statement or from the GitHub issue number (e.g., `issue-501`). The user sees the slug in the Gate 1 confirmation message. If the slug conflicts with an existing `specs/` folder, the orchestrator appends a short hash suffix and notes this in the Gate 1 message. 
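The slug rule above can be sketched roughly as follows. This is a minimal illustration, not the orchestrator's actual logic: `derive_slug` and its parameters are hypothetical names, taking the first few words is only a stand-in for real noun-phrase extraction, and the six-character SHA-1 prefix is one assumed way to produce a "short hash suffix".

```python
import hashlib
import re


def derive_slug(seed: str, existing: set[str], max_words: int = 4) -> str:
    """Derive a kebab-case feature slug; add a short hash suffix on collision.

    `seed` is the problem statement or issue reference; `existing` holds the
    folder names already present under specs/ (both hypothetical inputs).
    """
    issue = re.search(r"(?:^#|/issues/)(\d+)\s*$", seed.strip())
    if issue:
        # Issue references ("#501" or an issue URL) map to "issue-<number>".
        base = f"issue-{issue.group(1)}"
    else:
        # Stand-in for noun-phrase extraction: keep the first few words.
        words = re.findall(r"[a-z0-9]+", seed.lower())[:max_words]
        base = "-".join(words) or "untitled"
    if base not in existing:
        return base
    # Collision: append a short, deterministic hash of the seed text.
    suffix = hashlib.sha1(seed.encode("utf-8")).hexdigest()[:6]
    return f"{base}-{suffix}"
```

Because the suffix is derived from the seed text, re-running the derivation for the same input yields the same slug, which keeps the Gate 1 message stable across a session resume.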
+ +**Artifact reachability map:** + +| Artifact | Phase produced | Reachable by user | +|---|---|---| +| `specs//scope.md` | Scope | Edit directly; orchestrator re-reads after user edits | +| `specs//research.md` | Research wave | Read-only during session; inspect at any time | +| `specs//design.md` | Design synthesis | Edit directly at Gate 2; orchestrator re-reads | +| `specs//tasks.md` | Plan | Read-only during session | +| `specs//session-summary.md` | Review (on accept) | Read-only; the primary handoff artifact | +| `specs//workflow-state.md` | All phases (orchestrator-owned) | Read only; do not edit manually during a session | + +--- + +### HITL gate designs + +All gate prompts follow a consistent structure: + +1. **One-line context anchor** — names the feature slug and current phase so the user knows where they are. +2. **Structured content block** — the information the user must evaluate (criteria list, design summary, or verdict table). +3. **Explicit options** — each option has a label and a one-sentence description of what happens next. +4. **Escape hatch** — every gate offers an abort or abandon path so the user never feels trapped. + +--- + +#### Gate 1 — Scope confirmation (REQ-ORCH-008) + +Presented after the grill skill completes EARS extraction. The user must be able to evaluate this in under 10 seconds for a well-scoped problem (≤5 criteria). + +**Trigger:** grill skill returns structured criteria. Orchestrator writes `workflow-state.md` before displaying. + +**Prompt structure:** + +``` +Goal-loop · [feature-slug] · Scope confirmation + +Issue: [issue title if from GitHub issue, else omitted] + +I extracted the following acceptance criteria from your description. +Review each one — if anything is wrong or missing, choose "Edit" below. + +ACCEPTANCE CRITERIA +─────────────────── +1. [EARS criterion 1] +2. [EARS criterion 2] +3. [EARS criterion 3] +... + +What would you like to do? + + A Approve — looks right. Start the research phase. 
+ E Edit — open specs/[slug]/scope.md, make changes, reply "done". + X Abort — stop here. No further artifacts will be written. +``` + +**Option definitions:** + +| Label | Option | What happens next | +|---|---|---| +| A | Approve | Orchestrator advances to research wave. Status message confirms: "Scope approved. Starting research wave." | +| E | Edit | Orchestrator outputs the file path to `specs//scope.md` and waits. On "done", orchestrator re-reads the file, re-extracts criteria, and re-presents Gate 1. | +| X | Abort | Orchestrator outputs: "Session aborted. scope.md has been written to specs/[slug]/scope.md for reference. No other artifacts were produced." Session ends. | + +**Design rationale:** The criteria are presented as a flat numbered list rather than a rich table to keep the gate skimmable. The edit path is file-based, not in-chat, because in-chat editing of structured data (EARS clauses) has high error rates and breaks the artifact-as-memory model. + +--- + +#### Gate 2 — Design approval (REQ-ORCH-011) + +Presented after the architect subagent produces `design.md`. This is the last affordable correction point before implementation begins. + +**Trigger:** Architect subagent returns; `design.md` is written. Orchestrator writes `workflow-state.md` before displaying. + +**Prompt structure:** + +``` +Goal-loop · [feature-slug] · Design approval + +The architect has produced a design document (specs/[slug]/design.md). +Here is the inline summary — the full document is available at that path. + +DESIGN SUMMARY +────────────── +Architecture decisions: + · [Decision 1 — one sentence] + · [Decision 2 — one sentence] + · [Decision 3 — one sentence] + +Key components: + · [Component 1 — one sentence role] + · [Component 2 — one sentence role] + +Risks flagged: + · [Risk 1 — one sentence] + · [Risk 2 — one sentence] + +What would you like to do? + + A Approve — proceed to planning and implementation. 
+ E Edit — open specs/[slug]/design.md, make changes, reply "done". + R Reject — provide a reason and I will restart the research phase with your feedback. +``` + +**Option definitions:** + +| Label | Option | What happens next | +|---|---|---| +| A | Approve | Orchestrator advances to plan phase. Status: "Design approved. Decomposing into implementation tasks." | +| E | Edit | Orchestrator outputs path to `design.md` and waits. On "done", orchestrator re-reads and re-presents Gate 2 with the updated summary. | +| R | Reject | Orchestrator asks: "Briefly describe what is wrong with this design." User replies with free text. Orchestrator records the rejection reason in `workflow-state.md` and re-enters the research phase with the rejection as additional context. Status: "Returning to research phase with your feedback." | + +**Design rationale:** The summary is rendered in three fixed sections (decisions, components, risks) because these are the three categories a developer needs to validate before approving implementation. Unrestricted summaries would vary in structure and be harder to scan. The reject path explicitly captures a reason to prevent the research phase from reproducing the same design. + +--- + +#### Gate 3 — Review verdict (REQ-ORCH-015) + +Presented after reviewer and QA subagents validate implementation output against the EARS acceptance criteria. + +**Trigger:** All implement waves complete; reviewer + QA subagents return verdict. Orchestrator writes `workflow-state.md` before displaying. + +**Prompt structure:** + +``` +Goal-loop · [feature-slug] · Review verdict + +Implementation is complete. The reviewer has validated each acceptance criterion. + +ACCEPTANCE CRITERIA — REVIEW RESULTS +────────────────────────────────────── +1. [Criterion text] PASS [one-line evidence] +2. [Criterion text] PASS [one-line evidence] +3. [Criterion text] FAIL [one-line explanation of gap] +4. 
[Criterion text] PASS [one-line evidence] + +Overall: [N] passed, [M] failed. + +What would you like to do? + + A Accept — write session summary and close this goal-loop. + T Targeted revision — specify which criterion to fix; I will re-run only the affected tasks. +``` + +**Option definitions:** + +| Label | Option | What happens next | +|---|---|---| +| A | Accept | Orchestrator produces `session-summary.md`, updates `workflow-state.md` to `complete`, displays artifact list. | +| T | Targeted revision | Orchestrator asks: "Which criterion number(s) need revision, and what should change?" User replies. Orchestrator re-enters implement waves for affected tasks only, with reviewer findings attached as context. | + +**Targeted revision follow-up prompt:** + +``` +Which criterion number(s) should be revised? You can name multiple (e.g., "3" or "2, 3"). +Optionally describe what the correct behaviour should be: +``` + +**Design rationale:** Showing pass/fail per criterion with evidence lets the user make a targeted decision rather than a binary accept/reject. The targeted revision path re-runs only affected tasks to avoid discarding work on passing criteria. All-fail outcomes still offer the accept path — the user may decide partial results are sufficient for their purposes. + +--- + +#### Stall gate — Subagent stall escalation (REQ-ORCH-014) + +Presented when a subagent has completed three consecutive retry iterations without producing progress. + +**Trigger:** Stall counter reaches 3 for a given task. Orchestrator writes `workflow-state.md` (stall noted) before displaying. + +**Prompt structure:** + +``` +Goal-loop · [feature-slug] · Task stalled + +Task [T-ORCH-NNN] in wave [N] has not made progress after 3 attempts. + +TASK DETAILS +──────────── +Task: [task title] +Phase: Implement wave [N] +Retries: 3 + +Last agent output (summarised): + [2–4 sentence summary of what the subagent reported or attempted] + +What would you like to do? 
+ + R Retry — dispatch the agent again for this task. + S Skip — mark this task as deferred and continue with the remaining waves. + Note: tasks that depend on this one will also be deferred. + X Abort session — stop all implementation. A partial session summary will be written + listing completed tasks, deferred tasks, and the reason for stopping. +``` + +**Option definitions:** + +| Label | Option | What happens next | +|---|---|---| +| R | Retry | Orchestrator resets the stall counter for this task and dispatches the subagent again. If stall recurs, gate is presented again. | +| S | Skip | Task and all dependent tasks are marked `deferred` in `workflow-state.md`. Orchestrator continues with remaining independent waves. Deferred tasks appear in `session-summary.md` under "Open follow-ups." | +| X | Abort session | Orchestrator writes partial `session-summary.md` with completed tasks, deferred tasks, and stop reason. `workflow-state.md` is updated to `aborted`. Orchestrator outputs: "Session aborted. Partial session summary written to specs/[slug]/session-summary.md." | + +**Design rationale:** The "Skip" option explicitly names the cascade effect (dependent tasks are also deferred) because a user who does not understand DAG dependencies may skip a foundational task and then wonder why later tasks were not run. Surfacing this in the option description prevents confusion. + +--- + +#### Resume prompt (REQ-ORCH-022) + +Presented when the orchestrator detects an in-progress goal-loop on session start. + +**Prompt structure:** + +``` +Goal-loop · [feature-slug] · Session found + +I found an in-progress goal-loop session for "[feature-slug]". +It was interrupted at: [phase name] (Gate [N] pending your decision). + +Last saved: [timestamp from workflow-state.md] +Artifacts produced so far: [comma-separated list] + +What would you like to do? + + C Continue — resume from Gate [N] with the previously extracted content. 
+ R Restart — discard this session's state and start the scope phase again. + A Abandon — leave the partial artifacts in specs/[slug]/ and start a new session with a different problem. +``` + +**Option definitions:** + +| Label | Option | What happens next | +|---|---|---| +| C | Continue | Orchestrator replays Gate N with its original content from `workflow-state.md`. The user sees the gate as if the session had never been interrupted. | +| R | Restart | Orchestrator clears the phase state in `workflow-state.md` (preserves artifact files) and re-enters the scope phase. | +| A | Abandon | Orchestrator marks the session abandoned in `workflow-state.md`. Outputs the paths of any artifacts that were written. Accepts a new problem statement immediately. | + +--- + +### Empty / loading / error states + +In a CLI conversational context, "empty", "loading", and "error" states are inline orchestrator messages — not visual components. Each state must name what is happening, why it matters to the user, and what the user can do. + +--- + +#### Empty states + +**No problem statement detected (on session open)** + +Shown when the user's opening message is not recognisable as a problem statement, slash command, or issue reference. + +``` +Welcome to Specorator. + +To start a goal-loop session, describe your problem or paste a GitHub issue reference. + + Examples: + "Add rate limiting to the API gateway" + "#501" + "https://github.com/org/repo/issues/501" + +Or use any /spec:* command directly if you prefer manual control. +``` + +**No in-progress session detected (on attempted resume)** + +Shown when the user types "resume" or similar but there is no `workflow-state.md` with an in-progress goal-loop. + +``` +No in-progress goal-loop session found for this repository. + +To start a new session, describe your problem or paste a GitHub issue reference. +``` + +**Research wave returns no findings** + +Shown after the research wave if all analyst subagents return empty or unusable output. 
+ +``` +Goal-loop · [feature-slug] · Research returned no findings + +The research wave completed but produced no usable findings. This can happen when +the problem scope is very narrow or the analysts' questions were too broad to answer +from available context. + +Proceeding to design synthesis with the scope criteria only. +If the resulting design.md is not adequate, use the Reject option at Gate 2 +to provide additional context for a second research pass. +``` + +Design proceeds; the user is not blocked but is warned. + +--- + +#### Loading / progress states + +These are inline status messages displayed synchronously as the orchestrator advances between phases. They are not interactive. + +| Phase transition | Message displayed | +|---|---| +| Gate 1 approved → research | `Scope confirmed. Starting research wave — dispatching [N] analyst agent(s)...` | +| Research complete → synthesis | `Research complete — [N] finding(s) merged into research.md. Producing design document...` | +| Gate 2 approved → plan | `Design approved. Decomposing into implementation tasks...` | +| Plan complete → implement wave 1 | `Plan ready — [N] tasks across [M] wave(s). Starting wave 1...` | +| Wave N complete → wave N+1 | `Wave [N] complete — [K] task(s) merged. Advancing to wave [N+1] of [M]...` | +| Final wave complete → review | `All [M] waves complete. Starting review...` | +| Review complete → Gate 3 | `Review complete. Presenting verdict.` | +| Gate 3 accepted → summary | `Writing session summary...` | +| Summary written → done | `Goal-loop complete. Artifacts written to specs/[slug]/.` | + +--- + +#### Error states + +**Precondition check failure — missing artifact (REQ-ORCH-003)** + +Shown when the orchestrator's pre-flight check finds a required predecessor artifact absent or empty before dispatching a subagent. + +``` +Goal-loop · [feature-slug] · Missing prerequisite + +Cannot advance to the [phase] phase because [artifact-filename] is absent or empty. 
+ +Expected path: specs/[slug]/[artifact-filename] + +Options: + · If you edited this file and it should exist, reply "check again" and I will retry. + · If you want to abort this session, reply "abort". +``` + +**Specific wording by artifact:** + +| Missing artifact | Message extension | +|---|---| +| `scope.md` | "The scope document is missing. This usually means the scope phase was not completed. Reply 'restart scope' to re-run it." | +| `research.md` | "The research document is missing. Reply 'restart research' to re-run the research wave." | +| `design.md` | "The design document is missing. Reply 'restart design' to re-run design synthesis." | +| `tasks.md` | "The task plan is missing. Reply 'restart plan' to re-run the plan phase." | + +**Issue fetch failure (REQ-ORCH-007)** + +Shown when the orchestrator cannot read the GitHub issue body. + +``` +Goal-loop · Could not fetch issue + +Could not read GitHub issue [#NNN or URL]. + +Possible reasons: + · The issue number does not exist in this repository. + · The repository is private and the session does not have read access. + · The GitHub API is unavailable. + +Options: + · Paste the issue title and description as free text and I will use that as the scope context. + · Check the issue reference and reply with the corrected number. + · Reply "abort" to stop. +``` + +**Grill skill extraction failure** + +Shown when the grill skill cannot extract structured EARS criteria from the problem statement. + +``` +Goal-loop · [feature-slug] · Scope extraction incomplete + +I was not able to extract clear acceptance criteria from your description. +This usually means the problem statement is too broad or contains conflicting goals. + +I have saved what I could extract to specs/[slug]/scope.md (possibly partial). + +Options: + · Open that file, add or clarify the acceptance criteria, then reply "done". + · Reply with a narrower problem description and I will try again. + · Reply "abort" to stop. 
+``` + +**Wave execution failure — subagent returns error** + +Shown when a subagent in an implement wave returns an explicit error (as distinct from a stall). + +``` +Goal-loop · [feature-slug] · Task failed + +Task [T-ORCH-NNN] in wave [N] returned an error. + +Task: [task title] +Error: [one-sentence description of what the subagent reported] + +Options: + · Reply "retry" to dispatch this task again. + · Reply "skip" to mark this task deferred and continue. + · Reply "abort" to stop the session and write a partial summary. +``` + +**Session state corrupted or unreadable** + +Shown when `workflow-state.md` exists but cannot be parsed (e.g., manually edited and broken). + +``` +Goal-loop · Session state unreadable + +I found a workflow-state.md at specs/[slug]/workflow-state.md but could not parse it. + +To recover: + · If you want to restart this goal-loop from scratch, reply "restart". + · If you believe the file is valid, reply "check again". + · To abandon this session entirely, reply "abandon". Artifact files in specs/[slug]/ will remain. +``` + +--- + +### Accessibility considerations + +In a CLI conversational context, accessibility means: language clarity, progressive disclosure, discoverability of options, and recovery paths. The WCAG visual conformance model does not apply here; the principles behind it do. + +**Language clarity** + +- All orchestrator messages use plain English at a reading level accessible to a mid-career developer unfamiliar with Specorator's terminology. +- EARS notation is not explained at every gate. Gate 1 presents criteria as a numbered list without labelling them "EARS criteria" — a user does not need to know the term to evaluate whether the list captures their intent. +- Jargon in error messages is avoided. "The architect subagent" is written as "the design document generator" in error copy where the subagent identity is irrelevant to the recovery action. 
+- All technical terms that a user must act on (file paths, task IDs) are presented on their own line, not embedded in a sentence, so they are easy to copy. + +**Option discoverability** + +- Every `AskUserQuestion` gate lists all available options explicitly. There are no hidden commands. +- Option labels use single uppercase letters (A, E, R, X, T, S, C) that are easy to type. The full option word follows so the label is always intelligible without memorisation. +- The abort/abandon path is always the last option so it does not accidentally attract the user's attention before they read the other options. + +**Progressive disclosure** + +- The resume prompt shows only the phase name and gate number, not the full gate content, until the user chooses "Continue." This prevents information overload when a user opens a session without knowing it has in-progress state. +- The design summary at Gate 2 is capped at three sections (decisions, components, risks) and a maximum of five bullets per section. The user is explicitly directed to the file path for the full document. +- The stall gate shows a "summarised" last output, not a full transcript. Long transcripts would bury the recovery options. + +**Recovery path completeness** + +Every error state provides at least one explicit forward path. No state leaves the user with only a description of what went wrong. Specifically: + +- Every error that names a file path also names the action the user can take on that file. +- Every fetch or network error offers a paste-as-text fallback so the user is never blocked by an unavailable external resource. +- Every gate offers an abort/abandon option so the user can always exit cleanly. + +**Keyboard / interaction model** + +Because this is a CLI chat interface, "keyboard navigation" means: the user types their response and presses Enter. There are no tab-stops, focus traps, or pointer interactions. 
The design ensures: + +- Option labels (A, E, R, X, T, S, C) are single characters so the user never needs to type a long string to select an option. +- When the user must provide free text (targeted revision reason, rejection reason), the prompt clearly states that free text is expected. It does not present a structured option list for those responses. +- All confirmations that require the user to open a file and return to the chat explicitly state the trigger phrase to continue (e.g., "reply 'done'"). The user is never left wondering what to type to resume. + +**Screen-reader parity** + +Claude Code's CLI output is plain text rendered in a terminal. There are no icon-only buttons, no images, and no non-text content requiring `aria-label`. The box-drawing separator lines (`───────────────────`) used for visual grouping in gate prompts are decorative and acceptable in this medium; they are not structural elements. If a screen reader reads them aloud, the content remains fully intelligible without them. 
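The keyboard / interaction model above can be sketched as a small reply parser. The option letters and words are taken from the gate designs in this document; the gate keys (`"gate-1"`, `"stall"`, and so on), the function name `parse_reply`, and the choice to accept the full option word as well as the letter are assumptions made for illustration only.

```python
from typing import Optional

# Option letters and words per gate, as listed in the gate designs.
# The dictionary keys are hypothetical identifiers, not names from the spec.
GATE_OPTIONS = {
    "gate-1": {"A": "Approve", "E": "Edit", "X": "Abort"},
    "gate-2": {"A": "Approve", "E": "Edit", "R": "Reject"},
    "gate-3": {"A": "Accept", "T": "Targeted revision"},
    "stall": {"R": "Retry", "S": "Skip", "X": "Abort session"},
    "resume": {"C": "Continue", "R": "Restart", "A": "Abandon"},
}


def parse_reply(gate: str, reply: str) -> Optional[str]:
    """Return the selected option letter, or None if the reply is unrecognised.

    Matching ignores case and surrounding whitespace, and accepts either the
    single-letter label or the full option word (an assumed convenience).
    """
    text = reply.strip().casefold()
    for letter, word in GATE_OPTIONS[gate].items():
        if text in (letter.casefold(), word.casefold()):
            return letter
    return None
```

Returning `None` rather than guessing leaves room for the orchestrator to re-present the option list, which fits the rule that there are no hidden commands.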
+ +--- + +### Requirements coverage — Part A + +| REQ ID | Requirement summary | Addressed in Part A | +|---|---|---| +| REQ-ORCH-006 | Goal-loop entry from free-text problem statement | Flow A1 | +| REQ-ORCH-007 | Goal-loop entry from GitHub issue reference | Flow A2; error state: issue fetch failure | +| REQ-ORCH-008 | Scope phase: EARS extraction and Gate 1 HITL | Gate 1 design; Flow A1/A2 | +| REQ-ORCH-009 | Research wave: parallel analyst dispatch | Flow A4 status messages | +| REQ-ORCH-010 | Research wave: de-duplicated synthesis | Flow A4 status messages | +| REQ-ORCH-011 | Design synthesis: Gate 2 HITL | Gate 2 design; Flow A4/A5 | +| REQ-ORCH-012 | Plan phase: tasks.md with DAG edges | Flow A5; plan transition message | +| REQ-ORCH-013 | Implement waves: parallel wave dispatch | Flow A6 status messages | +| REQ-ORCH-014 | Stall detection: escalation after 3 retries | Stall gate design; Flow A8 | +| REQ-ORCH-015 | Review phase: Gate 3 HITL | Gate 3 design; Flow A7 | +| REQ-ORCH-016 | Session summary on loop completion | Flow A9 | +| REQ-ORCH-022 | workflow-state.md written before every AskUserQuestion | Noted at each gate trigger condition | +| REQ-ORCH-023 | /issue:tackle absorbed as orchestrator entry mode | Flow A3 | +| REQ-ORCH-003 | Pre-flight precondition check | Error state: missing artifact | + +**Requirements not in Part A scope (covered by Part B or Part C):** + +| REQ ID | Part | +|---|---| +| REQ-ORCH-001 | Part C (architect) — Agent tool dispatch mechanism | +| REQ-ORCH-002 | Part C — workflow-state.md ownership | +| REQ-ORCH-004 | Part C — model selection | +| REQ-ORCH-005 | Part C — slash command backward compatibility | +| REQ-ORCH-017 | Part C — plugin manifest | +| REQ-ORCH-018 | Part C — settings.json | +| REQ-ORCH-019 | Part C — build pipeline | +| REQ-ORCH-020 | Part C — agent frontmatter validation | +| REQ-ORCH-021 | Part C — non-plugin-user compatibility | + +--- + +## Part B — UI + +### Key screens / states + +In a CLI 
conversational context, a "screen" is a distinct orchestrator output state that the user encounters during a goal-loop session. Each state is triggered by a specific system event and follows a defined content pattern. Twelve states cover the full session lifecycle. + +| State | Trigger | Purpose | Content pattern | +|---|---|---|---| +| Welcome | Session opens; no in-progress session detected; first message is not a recognisable problem statement or command | Orient a new user. | Plain text: welcome line, two examples of valid input, mention of `/spec:*` fallback. No separator line. | +| Gate 1 — Scope confirmation | grill skill returns structured EARS criteria. `workflow-state.md` written first. | User approves, edits, or aborts the extracted scope before any autonomous work begins. | Gate header → numbered criteria list (ACCEPTANCE CRITERIA block) → option list (A / E / X). | +| Progress — research wave | Gate 1 approved. | Prevent the user from thinking the session has stalled during autonomous analyst dispatch. | Single status line with `→` prefix, phase label, count of agents dispatched. No interaction expected. | +| Progress — design synthesis | Research wave complete and merged. | Signal transition from research to design. | Single status line with `→` prefix and artifact name. | +| Gate 2 — Design approval | Architect subagent writes `design.md`. `workflow-state.md` written first. | User approves, edits, or rejects the design before implementation begins. | Gate header → three-section DESIGN SUMMARY block (decisions · components · risks) → option list (A / E / R). File path shown in code span. | +| Progress — plan phase | Gate 2 approved. | Signal decomposition is running. | Single status line with task and wave counts. | +| Progress — implement wave N | Each wave starts. | Confirm parallel execution is underway; prevent perceived stall during multi-minute waves. | Status line: wave N of M, task count, "in parallel worktrees". 
| +| Stall gate | Stall counter reaches 3 for a given task. `workflow-state.md` written first. | Surface a stuck task and give the user explicit control before any further retry. | Gate header → TASK DETAILS block (task ID, phase, retry count, summarised last output) → option list (R / S / X). | +| Gate 3 — Review verdict | Reviewer and QA subagents return. `workflow-state.md` written first. | User accepts results or requests targeted revision of specific failing criteria. | Gate header → ACCEPTANCE CRITERIA — REVIEW RESULTS table (criterion · PASS/FAIL · one-line evidence) → overall tally → option list (A / T). | +| Session summary | Gate 3 accepted; `session-summary.md` written; `workflow-state.md` updated to `complete`. | Close the loop; give the user paths to all produced artifacts. | Single confirmation line → bulleted artifact list in code spans. | +| Resume prompt | Session opens; `workflow-state.md` exists with an in-progress goal-loop. | Let the user resume, restart, or abandon without re-running prior phases. | Gate header → interrupted-at line → last-saved timestamp → comma-separated artifact list → option list (C / R / A). | +| Error state | Various: missing artifact, issue fetch failure, grill failure, wave execution error, corrupted session state. | Name what failed, explain in one sentence, offer at least one forward path. | Gate header (naming the error type) → one-sentence cause → labeled forward-path options as middle-dot bullets. | + +--- + +### Components + +In a CLI context, "components" are repeatable output patterns — formatted text blocks that appear across multiple states. Six patterns cover the full goal-loop surface. + +#### 1. Progress banner + +Used for all non-interactive status messages between phases. The user must not reply to a progress banner. + +Format: + +``` +→ [phase-label] +``` + +Rules: +- Prefix is `→` (U+2192) followed by a single space, then the phase label in square brackets, then a space, then the message. 
+- The phase label uses the tokens defined in the Tokens section below. +- The message is one sentence, present continuous tense, ending with `...` when work is still running, or a period when the step is complete. +- No separator line before or after a progress banner. It is inline with the conversation flow. +- A progress banner never begins with "I" — write "Fetching issue #501..." not "I am fetching issue #501...". + +Example (research wave start): + +``` +→ [research-wave] Dispatching 3 analyst agent(s)... +``` + +Example (research complete): + +``` +→ [research-wave] 3 finding(s) merged into `specs/auth-rework/research.md`. +``` + +#### 2. Gate header + +Used to open every AskUserQuestion gate call and every stall or error state that requires a user decision. The gate header visually separates the interactive state from the preceding progress stream. + +Format: + +``` +Goal-loop · [feature-slug] · [Gate name] +``` + +Rules: +- The full line is the first line of the gate block. No blank line before it within the AskUserQuestion call. +- `Goal-loop` is literal, sentence-case. `·` (middle dot, U+00B7) is the separator; one space on each side. +- `[feature-slug]` is the derived slug in kebab-case, not in code span — it is display text. +- `[Gate name]` is the human-readable gate name in sentence case, followed by nothing (no period, no colon). +- On the line immediately following the gate header, print a separator line: `───────────────────` (a run of box-drawing horizontal-line characters, not ASCII hyphens). This is the only separator style used; do not use `---` or blank lines as visual separators within gate blocks. + +Example: + +``` +Goal-loop · auth-rework · Scope confirmation +─────────────────── +``` + +This separator is the same character used in the UX Part A gate sketches; it is consistent with the existing convention established there. + +#### 3. Criteria list + +Used inside Gate 1 (scope confirmation) to present EARS acceptance criteria for user review. 
+ +Format: + +``` +ACCEPTANCE CRITERIA +─────────────────── +1. [EARS criterion text — full sentence as produced by the grill skill] +2. [EARS criterion text] +3. [EARS criterion text] +``` + +Rules: +- The block heading `ACCEPTANCE CRITERIA` is all-caps (consistent with the other block headings in Part A gate sketches: `TASK DETAILS`, `DESIGN SUMMARY`). No colon. +- Followed immediately by a separator line (same style as the gate header separator). +- Criteria are numbered, not bulleted. Numbers are flush-left. +- Each criterion is one line. If a criterion wraps in the terminal, the continuation is indented two spaces to align under the first character of the criterion text. Do not truncate long criteria. +- EARS pattern type (Ubiquitous, Event-driven, etc.) is not shown to the user. The user evaluates the criterion text, not the pattern label. Pattern metadata lives in `scope.md`. +- No trailing punctuation is added to criteria; they are presented as the grill skill produced them. + +#### 4. Pass/fail verdict table + +Used inside Gate 3 (review verdict) to show criterion-by-criterion review results. This is the primary decision support for the most consequential gate. + +Format: + +``` +ACCEPTANCE CRITERIA — REVIEW RESULTS +────────────────────────────────────── +1. [Criterion text, truncated at 52 chars if needed] PASS [one-line evidence] +2. [Criterion text] PASS [one-line evidence] +3. [Criterion text] FAIL [one-line gap description] +4. [Criterion text] PASS [one-line evidence] + +Overall: [N] passed, [M] failed. +``` + +Rules: +- The block heading and separator follow the same convention as the criteria list. +- `PASS` and `FAIL` are all-caps, fixed-width column (6 chars including trailing space). Align the evidence column after the verdict. +- Criterion text is left-aligned, padded to a fixed width. If the full criterion text would make the row exceed 100 characters, truncate with `...` at 52 characters. The full text is always in `scope.md`. 
+- Evidence is plain text, maximum one line. Do not wrap evidence; truncate at 60 characters with `...` if the reviewer returned more. +- The `Overall` tally line is separated from the table by one blank line. +- PASS/FAIL is never indicated by color or symbol only — the words `PASS` and `FAIL` are always spelled out. This ensures accessibility in monochrome and screen-reader contexts. + +#### 5. Artifact link + +Used when the orchestrator references a file produced or consumed during the session. File paths are always shown in code spans and relative to the repository root. + +Format: `` `specs//` `` + +Rules: +- Always relative to repository root. Never absolute. +- Always in a code span (backtick-delimited in Markdown; rendered as monospace in Claude Code output). +- When listing multiple artifacts (e.g., in the session summary or session resume prompt), use a bullet list where each bullet contains exactly one code span. +- When a file path appears inside a sentence, it remains in a code span but is not put on its own line. +- Never use a trailing slash for directory references; always name the specific file. + +Example (in session summary): + +``` +Artifacts produced: + · `specs/auth-rework/scope.md` + · `specs/auth-rework/research.md` + · `specs/auth-rework/design.md` + · `specs/auth-rework/tasks.md` + · `specs/auth-rework/session-summary.md` +``` + +#### 6. Option labels + +Used in every AskUserQuestion gate call. Options are the only interactive elements in a goal-loop session. + +Format: + +``` + [LETTER] [Option word] — [one-sentence description of what happens next] +``` + +Rules: +- Two spaces before the letter, two spaces after, then the option word in sentence case, then an em-dash (` — `), then the consequence. +- Letters are single uppercase characters. The letter mnemonically matches the option word where possible (A = Approve, E = Edit, R = Reject/Retry, X = Abort, T = Targeted revision, S = Skip, C = Continue). 
+- Option words are 1–5 words, imperative mood: `Approve`, `Edit`, `Reject`, `Retry`, `Skip`, `Abort session`, `Targeted revision`, `Continue`, `Restart`, `Abandon`. +- The consequence is one sentence, plain English, no jargon. It describes what the orchestrator will do next, not what the user should do. +- The abort/abandon/abort-session option is always last in the list. +- No period at the end of the consequence line. +- Inline notes (e.g., the cascade warning for the Skip option at the stall gate) appear indented under the consequence on the next line, prefixed with ` ` (five spaces to align under the consequence text). + +--- + +### Tokens + +In a CLI conversational context, tokens are formatting conventions — the rules for when and how to use Markdown emphasis, separators, prefixes, and path notation. These conventions are derived from examining existing SKILL.md files and agent definitions in the codebase, then extended only where the goal-loop surface requires something not yet defined. + +#### Emphasis conventions + +| Need | Convention | Rationale | +|---|---|---| +| File path or command in running text | `` `code span` `` | Consistent with all existing SKILL.md files and agent definitions. | +| Block heading inside a gate or error state | `ALL-CAPS PLAIN TEXT` (no Markdown bold) | All-caps headings are used in Part A gate sketches (`ACCEPTANCE CRITERIA`, `DESIGN SUMMARY`, `TASK DETAILS`). They render clearly in both Markdown and plain-text terminal output. | +| Inline emphasis of a key term or decision | `**bold**` | Used sparingly: only for a term the user must act on (e.g., the task ID in a stall gate). Not used for decoration. | +| Phase names in prose | `[phase-label]` in square brackets | Matches the phase label token style (see below). | +| Italic | Not used | Italic rendering is inconsistent across terminal emulators. The brand voice is direct; italics add no value here. 
| + +#### Separator style + +A single separator style is used throughout: + +``` +─────────────────── +``` + +This is the box-drawing character U+2500 (`─`), repeated. Length is 19 characters for the gate-header separator; length matches the heading width for content-block separators. This style is established in Part A gate sketches and is consistent with the existing tool-output convention in this codebase (e.g., the orchestrator agent's plain-text output block, the grill skill's output pattern). Although it is a Unicode character rather than ASCII, it renders cleanly in UTF-8 terminal emulators and is read aloud by screen readers as a series of dashes, which does not obscure content. + +`---` (Markdown horizontal rule) is not used inside gate blocks. It is valid in section breaks of Markdown documents (e.g., this design.md) but would conflict with Claude Code's Markdown rendering when embedded in conversational output. + +Blank lines are used as paragraph separators within a gate block (e.g., between the header block and the option list, between the verdict table and the overall tally). They are not used as visual dividers. + +No emoji prefixes. The Specorator brand uses zero emoji anywhere in its product page or documentation, and this convention extends to CLI output. + +#### Status line prefix + +All progress banners begin with `→` (U+2192) followed by a single space and the phase label. This character is used throughout the Specorator brand as the standard "next" or "advancing" indicator (SKILL.md files, the design system README). It is the only prefix used for progress banners. + +Error and forward-path option bullets use `·` (middle dot, U+00B7) as the bullet character. This is consistent with the existing brand convention for meta-item separation and option lists outside AskUserQuestion gates (e.g., the forward-path options in error states that do not warrant a full gate call). + +Inside AskUserQuestion option lists, the prefix is the option letter followed by two spaces (not a bullet character).
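The prefix and separator conventions above can be sketched as small string helpers. This is a minimal illustration only: the function names `progressBanner`, `gateHeader`, and `optionLabel` are hypothetical and do not exist in the Specorator codebase, and the option-label layout follows the "Option labels" component rules (two spaces, letter, two spaces, option word, em-dash, consequence).

```typescript
// Hypothetical helpers for illustration; none of these names exist in the codebase.
const ARROW = "\u2192"; // progress banner prefix (U+2192)
const MIDDLE_DOT = "\u00B7"; // gate header separator (U+00B7)
const RULE = "\u2500"; // box-drawing separator character (U+2500)
const EM_DASH = "\u2014"; // joins the option word and its consequence

// Progress banner: arrow, space, phase label in square brackets, space, message.
function progressBanner(phaseLabel: string, message: string): string {
  return `${ARROW} [${phaseLabel}] ${message}`;
}

// Gate header line followed immediately by the 19-character separator line.
function gateHeader(featureSlug: string, gateName: string): string {
  const header = `Goal-loop ${MIDDLE_DOT} ${featureSlug} ${MIDDLE_DOT} ${gateName}`;
  return `${header}\n${RULE.repeat(19)}`;
}

// Option label: two spaces, the letter, two spaces, the option word, em-dash, consequence.
function optionLabel(letter: string, word: string, consequence: string): string {
  return `  ${letter}  ${word} ${EM_DASH} ${consequence}`;
}
```

Keeping the special characters as named constants is one way to stop near-identical glyphs (a hyphen, an en-dash, or a plain period) from creeping into output; the orchestrator prompt could equally state these rules in prose.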
+ +#### File path format + +All file paths: +- Are relative to the repository root. +- Are wrapped in a code span: `` `specs//.md` ``. +- Use forward slashes regardless of platform. +- Never use `./` or `../` prefixes. +- Are listed one per bullet when appearing in a list. + +#### Phase label convention + +Phase labels appear inside square brackets at the start of progress banners. The canonical set for this feature: + +| Phase | Label | +|---|---| +| Scope phase | `[scope]` | +| Research wave | `[research-wave]` | +| Design synthesis | `[design]` | +| Plan phase | `[plan]` | +| Implement wave | `[wave-N]` where N is the wave number (e.g., `[wave-1]`, `[wave-2]`) | +| Review phase | `[review]` | +| Session complete | `[done]` | + +Gate headers do not use phase labels — they use the human-readable gate name after the feature slug (e.g., "Scope confirmation", "Design approval", "Review verdict"). + +--- + +### Content + +#### Tone and vocabulary + +**Voice in status messages.** The orchestrator writes in the active voice and omits the subject, using present continuous while work is running and simple past once a step completes. "Dispatching 3 analyst agent(s)..." not "3 analyst agents have been dispatched." Short, direct, no hedging. This matches the Specorator brand voice: "opinionated, predictable, direct." + +**Tense rules by message type:** + +| Message type | Tense | Example | +|---|---|---| +| Work in progress | Present continuous + `...` | `Dispatching 3 analyst agent(s)...` | +| Phase complete | Simple past, period | `3 finding(s) merged into research.md.` | +| Gate context sentence | Present perfect | `The architect has produced a design document.` | +| Forward-path consequence | Simple present | `Orchestrator advances to the research wave.` | +| Error explanation | Simple past | `Could not read GitHub issue #501.` | + +**How to refer to subagents in user-facing copy.** Users do not need to know the word "subagent."
Use the role's function instead: + +| Internal term | User-facing copy | +|---|---| +| analyst subagent | analyst agent | +| architect subagent | the design document generator (in error copy only); "the architect" (in gate copy where precision matters) | +| planner subagent | the task planner | +| dev/qa subagents | task agents | +| reviewer subagent | the reviewer | + +"Agent" is acceptable in user-facing copy because it is used natively by Claude Code and is familiar to the target persona (senior solo developer, small engineering team). "Subagent" is never used in user-facing copy. + +**How to refer to the goal-loop.** The goal-loop is never called that in user-facing output. Refer to it as "this session" (in progress messages and error states) or use no name at all when the context is clear. In session-boundary messages (resume prompt, session summary, abandon message), "goal-loop session" is acceptable because the user needs to understand they are interacting with a session object that can be resumed or abandoned. + +**Vocabulary for Gate 3 resolution options.** At Gate 3, the user is choosing what happens to the completed implementation work: + +- `Accept` — not "approve" (approve implies the work is conditional; accept implies the work is sufficient and the session can close). +- `Targeted revision` — not "fix" (too casual) and not "reject" (implies discarding all work; only specific tasks are re-run). + +**Vocabulary for Gate 1 and Gate 2 approval options.** At Gate 1 and Gate 2: +- `Approve` — confirms the extracted or produced artifact and authorises the next phase to begin. +- `Edit` — signals the user will modify the file and return. +- `Reject` — at Gate 2 only; signals the design is structurally wrong and research must restart with new context. +- `Abort` — stops the session cleanly and writes no further artifacts. 
+ +**Forbidden words in orchestrator output.** Consistent with the Specorator brand voice rules: +- Never: "seamlessly", "magical", "revolutionary", "AI-powered", "leverage", "supercharge." +- Never: "subagent" in user-facing output. +- Never: "please" in status messages (it is padding; the user knows the orchestrator is not sentient). +- Never: "Something went wrong." Every error message names what went wrong and why. This is enforced by design goal DG3. + +#### Microcopy standards by state + +**Welcome message.** Sentence-case. Two sentences maximum for the orientation line. Examples are indented under a short lead-in. No period on example lines (they are illustrative, not statements). Mention of `/spec:*` commands is present but subordinate — it is the last sentence. + +**Gate context anchor sentences.** The first sentence of every gate block after the separator line names what the orchestrator did and what artifact resulted. "The architect has produced a design document (`specs/auth-rework/design.md`)." The user knows where to look before they see the options. + +**Error state heading.** The gate header for error states uses a descriptive name that names the problem category, not a generic "Error." Examples: "Missing prerequisite", "Could not fetch issue", "Scope extraction incomplete", "Task failed", "Task stalled", "Session state unreadable." Each name is a noun phrase in sentence case. No exclamation mark. + +**Forward-path options in error states.** Presented as middle-dot bullets, not as a full AskUserQuestion option list, when the error is recoverable by a free-text reply (e.g., "reply 'done'" or "reply 'abort'"). AskUserQuestion with labeled options (A/R/X) is used only when the error has discrete branching paths that warrant distinct handling. + +**Session summary.** Closes with a single confirmation line in simple past tense: "Goal-loop complete." followed by the path to `session-summary.md` on the next line as a code span. Then the artifact list. 
No closing salutation or congratulatory language. + +**Slug in user-facing copy.** The feature slug is displayed in kebab-case without code span formatting in gate headers (it is a display identifier). It is shown in a code span only when it appears as part of a file path. + +#### Requirements coverage — Part B + +| REQ ID | Requirement summary | Addressed in Part B | +|---|---|---| +| REQ-ORCH-006 | Goal-loop entry from free-text problem statement | Welcome state content pattern | +| REQ-ORCH-007 | Goal-loop entry from GitHub issue reference | Progress banner for issue fetch; error state for issue fetch failure | +| REQ-ORCH-008 | EARS extraction and Gate 1 HITL | Gate 1 formatting: gate header, criteria list component, option labels | +| REQ-ORCH-009 | Research wave parallel dispatch | Progress banner — research wave; phase label `[research-wave]` | +| REQ-ORCH-010 | Research synthesis | Progress banner — design synthesis transition line | +| REQ-ORCH-011 | Gate 2 HITL design approval | Gate 2 formatting: gate header, DESIGN SUMMARY block, option labels; artifact link convention | +| REQ-ORCH-012 | Plan phase tasks.md | Progress banner — plan phase | +| REQ-ORCH-013 | Implement waves parallel dispatch | Progress banner — implement wave N; phase label `[wave-N]` | +| REQ-ORCH-014 | Stall detection escalation | Stall gate: gate header, TASK DETAILS block, option labels (R/S/X) | +| REQ-ORCH-015 | Review phase Gate 3 HITL | Gate 3: gate header, pass/fail verdict table component, option labels (A/T) | +| REQ-ORCH-016 | Session summary artifact | Session summary state content pattern | +| REQ-ORCH-022 | workflow-state.md written before every AskUserQuestion | Gate header convention notes this; the gate header itself serves as the user signal that state has been persisted | +| REQ-ORCH-023 | /issue:tackle entry | No distinct UI treatment; absorbed into Flow A2 formatting | +| REQ-ORCH-003 | Pre-flight precondition check | Error state: Missing prerequisite; middle-dot 
forward-path bullets | + +--- + +## Part C — Architecture + +### System overview + +The goal-loop is a purely in-process, file-mediated orchestration system. There are no external services, no databases, and no network APIs beyond the GitHub MCP tool (issue reference reads) and Claude Code's native Agent tool (subagent dispatch). All state persists to disk in `specs//` Markdown files. The orchestrator is the root session agent; all other specialists are subagents that report back to it. + +```mermaid +graph TD + User([User]) -- problem statement / issue ref --> Orch + + subgraph "Root session agent — orchestrator" + Orch[Orchestrator\ntools: Agent, Read, Write, Edit, AskUserQuestion] + GrillSkill[goal-loop conductor skill] + Orch --> GrillSkill + end + + GrillSkill -- scope phase --> ScopePhase[Scope phase\ngrill skill invocation] + ScopePhase -- writes --> ScopeMd[(specs/slug/scope.md)] + ScopePhase -- AskUserQuestion --> Gate1{Gate 1\nScope confirmation} + Gate1 -- approved --> ResearchWave + + subgraph "Research wave — parallel" + ResearchWave[Research wave scheduler\n1–5 parallel Agent calls] + ResearchWave --> Analyst1[analyst subagent 1] + ResearchWave --> Analyst2[analyst subagent 2] + ResearchWave --> AnalystN[analyst subagent N] + end + ResearchWave -- synthesise + write --> ResearchMd[(specs/slug/research.md)] + ResearchMd --> DesignPhase + + subgraph "Design synthesis" + DesignPhase[Design synthesis phase] + DesignPhase --> ArchSub[architect subagent] + end + ArchSub -- writes --> DesignMd[(specs/slug/design.md)] + DesignMd -- AskUserQuestion --> Gate2{Gate 2\nDesign approval} + Gate2 -- approved --> PlanPhase + + subgraph "Plan phase" + PlanPhase[Plan phase] + PlanPhase --> PlannerSub[planner subagent] + end + PlannerSub -- writes --> TasksMd[(specs/slug/tasks.md)] + TasksMd --> DAGScheduler + + subgraph "Implement waves — topological order" + DAGScheduler[DAG wave scheduler\ntopological sort Kahn BFS] + DAGScheduler --> Wave1[Wave 1 executor] + Wave1 --> DevSub1[dev subagent\nworktree isolated] + Wave1 --> DevSub2[dev subagent\nworktree isolated] + Wave1 --> StallDet1[Stall detector\nretry counter per task] + StallDet1 -- 3 retries --> StallGate{Stall gate\nAskUserQuestion} + StallGate -- retry/skip/abort --> Wave1 + Wave1 -- all tasks complete --> WaveN[Wave N executor ...]
+ end + WaveN --> ReviewPhase + + subgraph "Review phase" + ReviewPhase[Review phase] + ReviewPhase --> ReviewSub[reviewer subagent] + ReviewPhase --> QASub[qa subagent] + end + ReviewSub -- verdict --> Gate3{Gate 3\nReview verdict\nAskUserQuestion} + Gate3 -- accepted --> Summary[Session summary writer] + Summary -- writes --> SessionSummaryMd[(specs/slug/session-summary.md)] + Summary -- updates --> WorkflowState[(specs/slug/workflow-state.md\ncomplete)] + + Orch -- writes before every gate --> WorkflowState + + subgraph "Plugin package" + PluginJson[.claude-plugin/plugin.json] + SettingsJson[settings.json\nagent: orchestrator] + BuildScript[build-claude-plugin.ts] + BuildScript -- generates --> PluginJson + BuildScript -- copies agents/skills/commands --> PluginBundle[claude-plugin/specorator/] + end +``` + +**Topology notes:** +- The orchestrator is the sole root session agent. Subagents cannot spawn further subagents (Claude Code platform hard limit). +- All parallelism is orchestrator-to-subagent only — a star topology with the orchestrator at centre. +- Worktree isolation is applied only to implementer (dev/qa) subagents. Research analyst subagents operate without worktree isolation (read-only research questions); the architect and planner subagents each receive their input artifact by path reference and write a single output file. +- The GitHub MCP tool is the only external resource the orchestrator reads directly; all other reads are from `specs//` files. 
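The DAG wave scheduler in the diagram groups tasks into waves by topological order. As a rough sketch of how that grouping could work, the function below applies Kahn's algorithm level by level. The `TaskNode` shape and the `scheduleWaves` name are assumptions made for illustration; the actual `tasks.md` parsing and field names are not defined in this section.

```typescript
// Hypothetical task shape; `dependsOn` stands in for the `depends_on` edges in tasks.md.
interface TaskNode {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm, processed one level at a time: the first wave holds tasks
// with no dependencies, and each later wave holds tasks whose dependencies
// all sit in earlier waves.
function scheduleWaves(tasks: TaskNode[]): string[][] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    inDegree.set(task.id, task.dependsOn.length);
    dependents.set(task.id, []);
  }
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      const list = dependents.get(dep);
      if (list === undefined) {
        throw new Error(`Task ${task.id} depends on unknown task ${dep}`);
      }
      list.push(task.id);
    }
  }

  const waves: string[][] = [];
  let ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  let scheduled = 0;
  while (ready.length > 0) {
    waves.push(ready);
    scheduled += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      for (const child of dependents.get(id) ?? []) {
        const remaining = (inDegree.get(child) ?? 0) - 1;
        inDegree.set(child, remaining);
        if (remaining === 0) next.push(child);
      }
    }
    ready = next;
  }
  // Any task left unscheduled sits on a cycle and can never become ready.
  if (scheduled !== tasks.length) {
    throw new Error("Dependency cycle detected in tasks");
  }
  return waves;
}
```

Each inner array is one wave whose tasks can be dispatched in parallel. Failing loudly on a cycle or an unknown dependency is a design choice of this sketch, not a documented requirement.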
+ +--- + +### Components and responsibilities + +| Component | Type | Responsibility | Writes | Reads | Tools / invokes | +|---|---|---|---|---|---| +| **Orchestrator agent** | Root session agent | Drives the full goal-loop; owns all state transitions; dispatches all subagents; presents all HITL gates; enforces write boundary to `specs//` | `workflow-state.md`, `scope.md`, `research.md` (synthesis), `session-summary.md` | Any `specs//` artifact, `workflow-state.md` | Agent, Read, Write, Edit, AskUserQuestion | +| **goal-loop conductor skill** | Skill (`.claude/skills/goal-loop/`) | Encapsulates the six-phase sequencing logic invoked by the orchestrator; not a subagent — runs in the orchestrator's context | (via orchestrator) | Phase outputs | — | +| **Scope phase** | Phase within conductor | Invokes the grill skill to extract EARS criteria from the problem statement or GitHub issue body; writes `scope.md`; calls Gate 1 | `scope.md` | Problem statement or issue body | grill skill, AskUserQuestion | +| **Research wave scheduler** | Phase logic within conductor | Assesses scope surface area; determines researcher count (1–5); dispatches parallel Agent calls to analyst subagents; collects and de-duplicates outputs; writes merged `research.md` | `research.md` | `scope.md` | Agent (parallel calls to analyst subagent) | +| **Design synthesis phase** | Phase logic within conductor | Dispatches architect subagent with `scope.md` + `research.md` as inputs; waits for `design.md`; extracts inline summary; calls Gate 2 | (architect writes `design.md`) | `scope.md`, `research.md`, `design.md` | Agent (architect subagent), AskUserQuestion | +| **DAG wave scheduler** | Phase logic within conductor | Parses `tasks.md` for task nodes and `depends_on` edges; runs Kahn's BFS to produce topological wave list; persists wave schedule to `workflow-state.md`; advances wave by wave | `workflow-state.md` (wave_schedule field) | `tasks.md` | — | +| **Implement wave executor** | Phase logic 
within conductor | For each wave: dispatches dev/qa subagents in parallel with `isolation: worktree`; collects results; validates against task expected output; drives merge after wave completion | `workflow-state.md` (wave progress) | `tasks.md`, `workflow-state.md` (wave_schedule), `scope.md` | Agent (dev subagent, qa subagent, `isolation: worktree`) | +| **Stall detector** | Logic component within wave executor | Maintains per-task retry counter in `workflow-state.md` (stall_counters field); on third consecutive unproductive retry, halts dispatch and calls the stall gate | `workflow-state.md` (stall_counters) | Per-subagent return value | AskUserQuestion (stall gate) | +| **Review phase** | Phase logic within conductor | Dispatches reviewer and qa subagents with EARS criteria from `scope.md` as explicit validation targets; collects criterion-by-criterion verdict; calls Gate 3 | `workflow-state.md` (awaiting-hitl-3) | `scope.md` (criteria), all implemented artifacts | Agent (reviewer subagent, qa subagent), AskUserQuestion | +| **Session summary writer** | Phase logic within conductor | On Gate 3 acceptance: produces `session-summary.md` with decisions, criteria status, artifact list, traceability IDs, and open follow-ups; updates `workflow-state.md` to `complete` | `session-summary.md`, `workflow-state.md` (complete) | `scope.md`, `design.md`, `tasks.md`, `workflow-state.md`, all phase outputs | Write, Edit | +| **Plugin manifest** | Static artifact | Declares the Specorator plugin to Claude Code; makes the orchestrator the main session agent | — | — | — | +| **build-claude-plugin.ts** | Build script | Generates `.claude-plugin/plugin.json` from `package.json#version`; copies `.claude/agents`, `.claude/skills`, `.claude/commands` into `claude-plugin/specorator/`; rewrites relative Markdown links; supports `--check` mode for CI | `.claude-plugin/plugin.json`, `claude-plugin/specorator/**` | `.claude/**`, `.mcp.json`, `package.json` | Node.js fs | + +**Specialist 
subagent roles (dispatched by orchestrator; not new):** + +| Subagent | Role in goal-loop | Key inputs | Output artifact | +|---|---|---|---| +| analyst | Researcher in research wave | Bounded research question derived from scope | Section of `research.md` (merged by orchestrator) | +| architect | Design synthesis | `scope.md`, `research.md` | `specs//design.md` | +| planner | Plan phase | `scope.md`, `design.md` | `specs//tasks.md` with `depends_on` edges | +| dev | Implement wave | Task spec from `tasks.md`, worktree isolation | Code changes in worktree | +| qa | Implement wave and review | Task spec or EARS criteria from `scope.md` | Test results; review verdict contribution | +| reviewer | Review phase | EARS criteria from `scope.md`, all implemented artifacts | Criterion-by-criterion pass/fail verdict | + +--- + +### Data model + +#### workflow-state.md — extended fields for goal-loop + +The existing `workflow-state.md` schema (typed by the ADR-0042 Zod reader) is extended with an optional `goal_loop` block. Fields outside the `goal_loop` block are unaffected — the manual 11-stage command workflow continues to write only the existing fields.
+ +```yaml +# Existing fields (unchanged) +stage: implement +status: in-progress +feature: auth-rework +updated: 2026-05-13T14:23:00Z + +# New optional block — present only during a goal-loop session +goal_loop: + current_phase: implement # scope | research | design | plan | implement | review | complete | aborted + hitl_state: + gate: 2 # 1 | 2 | 3 | stall — which gate is pending + pending: false # true = orchestrator is waiting for user response + researcher_count: 3 # how many analyst subagents were dispatched + wave_schedule: + - wave: 1 + task_ids: [T-ORCH-001, T-ORCH-002] + - wave: 2 + task_ids: [T-ORCH-003] + stall_counters: + T-ORCH-003: 1 # retry count per task ID; reset on progress + artifacts_produced: + - specs/auth-rework/scope.md + - specs/auth-rework/research.md + - specs/auth-rework/design.md + - specs/auth-rework/tasks.md +``` + +Gate content for HITL gate replay (needed for session resume) is embedded in the `workflow-state.md` body as a Markdown block under a `## Gate content` heading. This keeps the checkpoint as a single file. + +See ADR-0047 for the full schema decision. + +--- + +#### specs/\/scope.md — new artifact + +Produced by the scope phase before Gate 1. User-editable (Gate 1 "Edit" path). Re-read by the orchestrator after user edits. + +```yaml +--- +id: SCOPE--001 +feature: +created: +source: free-text | github-issue- +ears_count: +--- +``` + +Body structure: +``` +# Scope — + +## Problem statement + + + +## Acceptance criteria + +1. [EARS criterion text] + Pattern: Ubiquitous | Event-driven | Unwanted | State-driven | Optional + Source: problem-statement | issue-#NNN + +2. [EARS criterion text] + Pattern: Event-driven + Source: problem-statement +``` + +Traceability note: `scope.md` criteria are the source for the Gate 3 pass/fail table and the `session-summary.md` criteria-status section. They are not the same as `requirements.md` (which is a full PRD produced by Stage 3) — `scope.md` is a lighter, session-scoped artifact. 
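Because the Gate 3 table and the session summary both read their criteria from `scope.md`, a reader for the "Acceptance criteria" section is a natural helper. The sketch below is one possible approach, assuming the body layout shown above (a numbered criterion line followed by `Pattern:` and `Source:` lines). The `Criterion` type and `parseCriteria` function are hypothetical names, not existing code.

```typescript
// Hypothetical record for one criterion parsed from the scope.md body.
interface Criterion {
  index: number;
  text: string;
  pattern: string;
  source: string;
}

// Reads only the "## Acceptance criteria" section; other sections are skipped.
function parseCriteria(scopeBody: string): Criterion[] {
  const criteria: Criterion[] = [];
  let inSection = false;
  for (const rawLine of scopeBody.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      inSection = line === "## Acceptance criteria";
      continue;
    }
    if (!inSection || line === "") continue;

    // A numbered line such as "1. When X, the system shall Y" starts a new criterion.
    const numbered = /^(\d+)\.\s+(.*)$/.exec(line);
    if (numbered) {
      criteria.push({
        index: Number(numbered[1]),
        text: numbered[2] ?? "",
        pattern: "",
        source: "",
      });
      continue;
    }

    // Metadata lines attach to the most recent criterion.
    const current = criteria[criteria.length - 1];
    if (!current) continue;
    if (line.startsWith("Pattern:")) {
      current.pattern = line.slice("Pattern:".length).trim();
    } else if (line.startsWith("Source:")) {
      current.source = line.slice("Source:".length).trim();
    }
  }
  return criteria;
}
```

The parser keeps `pattern` as a plain string, so a user edit at Gate 1 that introduces an unexpected label does not make the file unreadable. Stricter validation could be layered on top if the schema decision calls for it.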
+ +See ADR-0048 for the artifact decision. + +--- + +#### specs/\/session-summary.md — new artifact + +Produced at goal-loop completion (Gate 3 accepted) or on abort. Not user-editable during a session; the primary handoff and audit artifact. + +```yaml +--- +id: SESSION--001 +feature: +session_start: +session_end: +goal_loop_outcome: complete | aborted +artifacts_produced: + - specs//scope.md + - specs//research.md + - specs//design.md + - specs//tasks.md + - specs//session-summary.md +--- +``` + +Body sections (required, in order): + +1. **Decisions** — key architectural and scope decisions made during the session, with the gate at which each was confirmed. +2. **Acceptance criteria status** — pass/fail per EARS criterion (from `scope.md`), with one-line evidence per criterion. +3. **Artifacts produced** — list of file paths with one-sentence description of each artifact's role. +4. **Traceability** — maps REQ/T/TEST IDs to their artifact files for the session's scope. +5. **Open follow-ups** — deferred tasks (skipped via stall gate), unresolved failing criteria, and any open questions noted during the session. + +See ADR-0048 for the artifact decision. + +--- + +#### .claude-plugin/plugin.json — generated by build-claude-plugin.ts + +```json +{ + "name": "specorator", + "version": "", + "description": "Spec-driven agentic software development workflow for Claude Code.", + "author": { "name": "Luis Mendez" }, + "repository": "https://github.com/Luis85/agentic-workflow", + "license": "MIT" +} +``` + +No `agent` key in `plugin.json`. The agent declaration lives in `settings.json` (see below). This matches the current build script output (`buildExpectedManifest()` function in `build-claude-plugin.ts`). No changes to `plugin.json` structure are required for this feature. + +--- + +#### settings.json — agent key declaration + +```json +{ + "agent": "orchestrator" +} +``` + +Located at `claude-plugin/specorator/settings.json`. 
Declares the orchestrator as the main session agent when the Specorator plugin is enabled. Written by the build script (not currently generated — must be added as a `fileCopyPlan` entry in `build-claude-plugin.ts` from a canonical source file at `.claude/settings-plugin.json` or similar). + +--- + +### Data flow + +#### Happy path: problem statement → session summary + +``` +User submits problem statement + │ + ▼ +Orchestrator detects: not a slash command, not an issue ref + → writes workflow-state.md: {goal_loop: {current_phase: scope, hitl_state: {pending: false}}} + │ + ▼ +Scope phase (grill skill invoked in orchestrator context) + → grill skill asks clarifying questions until EARS criteria unambiguous + → orchestrator writes scope.md with extracted criteria + → orchestrator writes workflow-state.md: {goal_loop: {current_phase: scope, hitl_state: {gate: 1, pending: true}}} + → orchestrator calls AskUserQuestion (Gate 1) + │ + ├── User: Edit → user edits scope.md → replies "done" + │ → orchestrator re-reads scope.md → re-presents Gate 1 + │ + ├── User: Abort → session ends; scope.md remains on disk + │ + └── User: Approve → + orchestrator writes workflow-state.md: {goal_loop: {current_phase: research, hitl_state: {pending: false}}} + │ + ▼ +Research wave scheduler + → orchestrator assesses scope surface area → determines N (1–5) + → emits status banner: "→ [research-wave] Dispatching N analyst agent(s)..." + → issues N parallel Agent tool calls (analyst subagent, bounded question per subagent) + → each analyst returns its findings + → orchestrator de-duplicates and merges findings + → orchestrator writes research.md + → emits status: "→ [research-wave] N finding(s) merged into research.md." + │ + ▼ +Design synthesis phase + → orchestrator emits status: "→ [design] Producing design document..." 
+ → dispatches architect subagent (inputs: scope.md path, research.md path) + → architect writes design.md to specs//design.md + → orchestrator reads design.md, extracts inline summary (decisions, components, risks) + → orchestrator writes workflow-state.md: {goal_loop: {current_phase: design, hitl_state: {gate: 2, pending: true}}} + → orchestrator calls AskUserQuestion (Gate 2) + │ + ├── User: Edit → user edits design.md → replies "done" + │ → orchestrator re-reads design.md → re-presents Gate 2 + │ + ├── User: Reject → orchestrator records rejection reason in workflow-state.md + │ → returns to research wave with rejection as additional context + │ + └── User: Approve → + orchestrator writes workflow-state.md: {goal_loop: {current_phase: plan}} + │ + ▼ +Plan phase + → orchestrator emits status: "→ [plan] Decomposing design into tasks..." + → dispatches planner subagent (inputs: scope.md, design.md) + → planner writes tasks.md with depends_on edges + → orchestrator reads tasks.md, runs topological sort → wave schedule + → orchestrator writes wave_schedule to workflow-state.md + → emits status: "→ [plan] N tasks across M wave(s). Starting wave 1..." + │ + ▼ +Implement waves (for each wave W): + → orchestrator writes workflow-state.md: {goal_loop: {current_phase: implement}} + → emits status: "→ [wave-W] Dispatching K task agent(s)..." + → issues K parallel Agent tool calls (dev/qa subagents, isolation: worktree, task spec per call) + → each subagent returns result or error + → orchestrator validates results; updates stall_counters for any unproductive tasks + → stall detection: if stall_counters[taskId] >= 3 → stall gate (see stall path below) + → all tasks in wave complete → orchestrator merges worktrees (via reviewer subagent) + → emits status: "→ [wave-W] K task(s) merged." + → advances to wave W+1, or exits to review phase if no more waves + │ + ▼ +Review phase + → orchestrator emits status: "→ [review] Validating against acceptance criteria..." 
+ → dispatches reviewer subagent (inputs: scope.md criteria, all implemented artifact paths) + → dispatches qa subagent (inputs: scope.md criteria, test suite) + → orchestrator collects criterion-by-criterion verdict + → orchestrator writes workflow-state.md: {goal_loop: {current_phase: review, hitl_state: {gate: 3, pending: true}}} + → orchestrator calls AskUserQuestion (Gate 3) + │ + ├── User: Targeted revision → orchestrator asks which criteria; + │ re-enters implement waves for affected tasks only + │ + └── User: Accept → + orchestrator emits status: "→ [done] Writing session summary..." + orchestrator writes session-summary.md + orchestrator writes workflow-state.md: {goal_loop: {current_phase: complete}} + orchestrator displays artifact list +``` + +**Artifact authorship summary:** + +| Phase | Artifact written | Written by | +|---|---|---| +| Scope | `scope.md` | Orchestrator | +| Scope | `workflow-state.md` (phase updates) | Orchestrator | +| Research | `research.md` | Orchestrator (after merging analyst outputs) | +| Design synthesis | `design.md` | Architect subagent | +| Plan | `tasks.md` | Planner subagent | +| Implement waves | Code files in worktrees | Dev/qa subagents | +| Review | (verdict returned, not a file) | Reviewer/qa subagents | +| Completion | `session-summary.md` | Orchestrator | + +--- + +#### Stall path: implement wave → retry → escalation + +``` +Implement wave executor dispatches dev subagent for task T-NNN + │ + ▼ +Subagent returns result + │ + ├── Result shows progress → stall_counters[T-NNN] = 0 → continue + │ + └── Result shows no progress (substantively identical to previous attempt + OR subagent reports it cannot proceed) + → stall_counters[T-NNN] += 1 + │ + ├── stall_counters[T-NNN] < 3 → orchestrator retries immediately (re-dispatches) + │ + └── stall_counters[T-NNN] == 3 → + orchestrator writes workflow-state.md: {stall noted for T-NNN} + orchestrator calls AskUserQuestion (Stall gate) + │ + ├── User: Retry → 
stall_counters[T-NNN] = 0; re-dispatch + │ + ├── User: Skip → + │ task T-NNN marked deferred in workflow-state.md + │ all tasks with depends_on: [T-NNN] also marked deferred + │ orchestrator continues with remaining wave tasks + │ + └── User: Abort → + orchestrator writes partial session-summary.md + (completed tasks, deferred tasks, stop reason) + workflow-state.md updated to: aborted +``` + +--- + +### Interaction / API contracts (sketch) + +Full contracts go in `spec.md`. This section captures the bounded interface between the orchestrator and each specialist it dispatches. + +#### Orchestrator → grill skill (scope phase) + +The grill skill runs in the orchestrator's own context (not a subagent dispatch). It is a skill invocation, not an Agent call. + +- **Input:** The problem statement string (free text or GitHub issue title + body). +- **Behaviour:** The grill skill asks clarifying questions one at a time until goals, constraints, and acceptance criteria are unambiguous (per `.claude/skills/grill/SKILL.md` conventions). +- **Output:** Structured EARS criteria list — each criterion as a tuple: `(text: string, pattern: EARSPattern, source: string)`. +- **Pre-condition:** Problem statement is non-empty. +- **Post-condition:** At least one EARS criterion is extracted. If zero criteria are extracted, the scope phase emits the "Scope extraction incomplete" error state (Part A). +- **Side effect:** Orchestrator writes `scope.md` from the returned criteria list. + +--- + +#### Orchestrator → analyst subagent (research wave) + +- **Invocation:** Parallel `Agent` tool calls, one per subagent. Count N is 1–5, determined by scope surface area. +- **Input per subagent:** A bounded research question string derived from the scope. No two subagents receive the same question. Each subagent also receives the path to `scope.md` for context. +- **Expected output schema:** Unstructured text findings (the analyst role does not produce a typed artifact). 
The orchestrator merges and de-duplicates. +- **Pre-condition:** `scope.md` exists and is non-empty. +- **Post-condition:** At least one analyst returns findings (empty findings handled by the "Research returned no findings" state in Part A). +- **Side effect:** Orchestrator writes `research.md` from the merged findings. +- **Error:** If all analyst subagents return empty output, orchestrator surfaces the "Research wave returned no findings" message and proceeds to design synthesis with scope criteria only. + +--- + +#### Orchestrator → architect subagent (design synthesis) + +- **Invocation:** Single `Agent` tool call. +- **Input:** System prompt references `scope.md` and `research.md` by path; the architect reads them on dispatch. +- **Expected output:** The architect writes `specs//design.md` to disk. The orchestrator detects completion by checking that `design.md` is present and non-empty. +- **Pre-condition:** Both `scope.md` and `research.md` exist and are non-empty. +- **Post-condition:** `design.md` exists and is non-empty. +- **Side effect:** None beyond the artifact file. +- **Error:** If `design.md` is absent after the architect subagent returns, the orchestrator surfaces the "Missing prerequisite" error state. +- **Model:** If `SPECORATOR_HEAVY_MODEL` is set, the Agent call specifies that model for the architect (REQ-ORCH-004). + +--- + +#### Orchestrator → planner subagent (plan phase) + +- **Invocation:** Single `Agent` tool call. +- **Input:** System prompt references `scope.md` and `design.md` by path. +- **Expected output schema for tasks.md:** Each task entry must include: + - `id`: string (e.g., `T--NNN`) + - `title`: string + - `description`: string + - `depends_on`: list of task IDs (empty list if no dependencies) + - `expected_output`: string (one sentence describing the artifact or change produced) +- **Pre-condition:** `scope.md` and `design.md` exist and are non-empty. 
+- **Post-condition:** `tasks.md` exists; all `depends_on` references resolve to task IDs present in the same file; the `depends_on` graph is acyclic (a topological order exists). +- **Error:** If `tasks.md` is absent or contains a cyclic dependency, the orchestrator surfaces the "Missing prerequisite" error state and invites the user to restart the plan phase. + +--- + +#### Orchestrator → dev/qa subagent (implement wave) + +- **Invocation:** Parallel `Agent` tool calls, one per task in the wave. Each call specifies `isolation: worktree`. +- **Input per subagent:** A task spec block (task ID, title, description, expected output from `tasks.md`) + path to `scope.md` for acceptance criteria reference. +- **Expected output:** Subagent reports completion (code changes committed to its worktree) or an explicit error. +- **Pre-condition:** The task's `depends_on` tasks have all completed in prior waves. +- **Post-condition:** If successful, the worktree contains the changes described in `expected_output`. The orchestrator merges after each wave (via reviewer subagent). +- **Stall detection:** The orchestrator compares the subagent's return value to its previous return for the same task. If the two are substantively identical, or if the subagent reports it cannot proceed, `stall_counters[taskId]` is incremented. +- **Model:** If `SPECORATOR_HEAVY_MODEL` is set, dev subagents use that model (REQ-ORCH-004). + +--- + +#### Orchestrator → reviewer/qa subagent (review phase) + +- **Invocation:** Two separate `Agent` tool calls (reviewer + qa), issued in parallel or sequentially (implementation choice; the spec does not mandate ordering). +- **Input:** EARS criteria list from `scope.md` (verbatim criterion text) + paths to implemented artifacts. +- **Expected output schema:** For each EARS criterion, a verdict tuple: `(criterion_index: int, status: PASS | FAIL, evidence: string)`. The evidence string is one sentence. +- **Pre-condition:** All implement waves are complete; `scope.md` is present. 
+- **Post-condition:** Every criterion in `scope.md` has exactly one verdict entry. +- **Error:** If the reviewer subagent returns without covering all criteria, the orchestrator asks for a retry rather than presenting an incomplete Gate 3. +- **Model:** If `SPECORATOR_HEAVY_MODEL` is set, reviewer subagent uses that model (REQ-ORCH-004). + +--- + +### Key decisions + +| # | Decision | Rationale | Status | ADR | +|---|---|---|---|---| +| D1 | Scope intake via grill skill (EARS extraction) | EARS maps 1:1 to tests; grill is already the proven intake primitive | Resolved (idea.md) | — | +| D2 | Dynamic researcher count (1–5, orchestrator-determined) | Anthropic research shows performance gains plateau above 5 parallel agents | Resolved (idea.md) | — | +| D3 | Design.md file-based; inline summary at Gate 2 | File-based artifacts survive session boundaries; consistent with "spec is the memory" principle | Resolved (idea.md) | — | +| D4 | tasks.md extended with `depends_on` edges; wave schedule by topological sort | Reuses proven format; DAG edges are the only addition needed for wave-parallel execution | Resolved (idea.md) | — | +| D5 | `isolation: worktree` per implementer subagent | Prevents parallel write conflicts; no external infrastructure; native to Claude Code | Resolved (idea.md) | — | +| D6 | Review validation targets = EARS criteria from `scope.md` + requirements.md | Two-layer validation: human-declared intent + machine-checkable EARS clause coverage | Resolved (idea.md) | — | +| D7 | Plugin packaging: `.claude-plugin/plugin.json` + `settings.json {agent: orchestrator}` | Claude Code `settings.json agent` key is the supported mechanism for orchestrator-first entry | Resolved (idea.md) | — | +| D8 | Orchestrator tool list expanded to Agent, Read, Write, Edit, AskUserQuestion | Required to achieve dispatch authority, state ownership, and HITL gating | Accepted | ADR-0046 | +| D9 | goal-loop state persisted as extended fields in `workflow-state.md` | Single 
checkpoint file; session resume reads one file; additive to existing schema | Accepted | ADR-0047 | +| D10 | `scope.md` and `session-summary.md` introduced as new canonical artifact types | Single-responsibility artifacts with distinct ownership, user-editability rules, and templates | Accepted | ADR-0048 | +| D11 | `settings.json` added as a new `fileCopyPlan` entry in `build-claude-plugin.ts` | The agent key must be in the plugin bundle's `settings.json`; the build script is the correct generator | Architecture-level (Part C) | — | +| D12 | Gate content embedded in `workflow-state.md` body for session resume | Single checkpoint file; no new artifact for transient state; cleared after gate resolution | Architecture-level (Part C) | — | + +--- + +### Alternatives considered + +**LangGraph / CrewAI (Alternative A in research.md):** Rejected. Requires a persistent checkpointer backend (PostgreSQL or Redis) incompatible with zero-dependency plugin distribution; Python-first; contradicts tool-agnostic Layer 0 positioning. See research.md §Alternative A. + +**Claude Code Agent Teams (Alternative C in research.md):** Reserved for v2. Known limitations: `skills` and `mcpServers` frontmatter fields are silently ignored when running as a teammate; no session resumption for in-process teammates; disabled by default. See research.md §Alternative C. + +**Wrapper subagent as dispatch authority (ADR-0046 Option A):** Architecturally impossible. The platform hard limit (subagents cannot spawn subagents) means any dispatch authority must be the root session agent. The orchestrator is the only viable dispatch authority. + +**Separate `goal-loop-state.md` file (ADR-0047 Option B):** Rejected. Creates two "sources of truth" for session state; requires atomic two-file writes; increases risk of partial-write inconsistency during session interruption. + +**Embedding scope criteria in `requirements.md` (ADR-0048 Option A):** Rejected. 
`requirements.md` is a full PRD produced by Stage 3 (`/spec:requirements`) and has a distinct structure. `scope.md` serves a different audience and lifecycle phase. + +--- + +### Risks + +References to `RISK-ORCH-001` through `RISK-ORCH-012` are in `research.md §Risks`. Architecture-specific notes follow. + +| Risk reference | Architecture-level note | +|---|---| +| RISK-ORCH-001 (error compounding 17.2x) | Mitigated by: typed output schemas at each phase boundary; reviewer subagent as a blocking check before Gate 3; stall detection preventing infinite loops. | +| RISK-ORCH-002 (parallel write conflicts) | Mitigated by: `isolation: worktree` per dev/qa subagent; orchestrator-mediated merge after each wave via reviewer subagent — not automatic. | +| RISK-ORCH-004 (orchestrator context exhaustion) | Mitigated by: orchestrator reads artifact files by path (not full conversation history); subagents spawn with clean contexts; session summary truncates session state at completion. | +| RISK-ORCH-005 (infinite loops / stalled subagents) | Mitigated by: stall_counters per task in `workflow-state.md`; hard limit of 3 retries before HITL escalation; stall_counters persist across session restarts. | +| RISK-ORCH-006 (decomposition errors) | Mitigated by: planner outputs explicit `depends_on` edges; orchestrator validates acyclicity before writing wave schedule; human review at Gate 2 before implementation begins. | +| RISK-ORCH-007 (agent performance degradation over consecutive runs) | Mitigated by: fresh subagent spawn per task; orchestrator does not reuse persistent agents across phases. | +| RISK-ORCH-008 (plugin manifest naming collision) | Resolved: `plugins/*/manifest.md` (ADR-0036) and `.claude-plugin/plugin.json` are separate files at different paths with separate concerns. `build-claude-plugin.ts` generates the latter and does not touch the former. 
| +| RISK-ORCH-009 (`settings.json` agent priority) | Documented as known behaviour: Claude Code project settings (`.claude/settings.json`) override plugin settings for the same key. Orchestrator is the plugin default, not forced. | +| RISK-ORCH-012 (orchestrator becoming monolithic) | Mitigated by: decomposing the goal-loop into phase-specific logic sections within the conductor skill; each phase has defined inputs, outputs, and a single responsibility. The orchestrator's system prompt invokes the conductor skill rather than containing all phase logic inline. | + +**Architecture-specific risks (new):** + +| ID | Risk | Severity | Likelihood | Mitigation | +|---|---|---|---|---| +| RISK-ORCH-013 | Orchestrator writes to `specs/` outside the declared write boundary (e.g., overwrites `requirements.md` from a prior stage run) | High | Low | Write boundary documented in ADR-0046 and in orchestrator system prompt; `check-agents.ts` does not enforce path restrictions at runtime — this is a system-prompt-level constraint, not a tool-level one | +| RISK-ORCH-014 | `settings.json` agent key conflict between plugin default and project `.claude/settings.json` | Low | Low | Documented as known behaviour per RISK-ORCH-009; implementation team must test priority resolution during beta | +| RISK-ORCH-015 | Topological sort produces incorrect wave order if planner writes malformed `depends_on` edges (circular or self-referential) | Medium | Low | Orchestrator validates acyclicity via Kahn's BFS before writing wave schedule; self-referential edges are trivially detected; circular dependencies surface as tasks left unscheduled once the BFS queue empties (their in-degree never reaches zero) → orchestrator reports an error and invites user to restart the plan phase | + +--- + +### Performance, security, and observability + +#### Performance + +- **Scope phase (NFR-ORCH-001):** Target ≤ 30 seconds from problem statement to Gate 1. The scope phase is the grill skill running in the orchestrator's context — no subagent dispatch latency. 
Bottleneck is grill skill iteration count; bounded by early exit on EARS completeness. +- **Research wave parallelism (NFR-ORCH-002):** N analyst subagents dispatched in a single orchestrator turn (parallel Agent tool calls). Wall-clock time scales approximately as the slowest analyst, not as N × analyst-time. This is the primary parallelism mechanism. +- **Design-to-Gate-2 target (NFR-ORCH-006):** ≤ 5 minutes for well-scoped issues (≤ 5 EARS criteria, ≤ 3 research questions). Dominated by architect subagent latency; model selection via `SPECORATOR_HEAVY_MODEL` allows trading cost for quality. +- **Worktree creation overhead:** Each dev/qa subagent with `isolation: worktree` creates a new worktree. On a large monorepo, 5 parallel worktrees may take 10–30 seconds to create. This is logged in `workflow-state.md` as wave start/end timestamps for empirical measurement during beta (per requirements.md open questions). +- **Orchestrator context management:** The orchestrator reads artifact files by path rather than accumulating phase outputs in-context. Each subagent spawns with a clean context (no conversation history). These two mechanisms prevent context rot across a long session. + +#### Security + +- **Write boundary enforcement:** The orchestrator has Write and Edit tools. By convention (enforced in the system prompt and documented in ADR-0046), the orchestrator writes only to `specs//` paths. It does not write to `.claude/`, `docs/`, or any other directory. This is a system-prompt constraint, not a platform-enforced path restriction. +- **GitHub issue content:** The orchestrator reads GitHub issue title and body via the scoped GitHub MCP read tool. This content is passed to the grill skill as the initial problem statement — it is not executed as instructions. Prompt injection via a malicious issue body is mitigated by the grill skill's structured output extraction (EARS criteria, not free-form LLM instructions). 
+- **Plugin agent frontmatter validation (REQ-ORCH-020 / NFR-ORCH-007):** `check-agents.ts` runs in CI and rejects any plugin agent that declares `hooks`, `mcpServers`, or `permissionMode` in its YAML frontmatter. This prevents a plugin agent from escalating its own permissions. The orchestrator's expanded tool list (`Agent`, `Write`, `Edit`) is declared in frontmatter — this is expected and validated to be present only on the orchestrator definition. +- **Subagent tool isolation:** Subagents do not inherit the Agent tool. The platform hard limit (subagents cannot spawn subagents) ensures the star topology is enforced at the platform level, not just by convention. +- **Plugin bundle trust boundary:** The plugin bundle is distributed via the `dist/claude-plugin` orphan branch (ADR-0043). The bundle contents are generated by `build-claude-plugin.ts` from canonical sources — no hand-edited files in the bundle. Any modification to the canonical sources goes through the standard PR + CI pipeline. + +#### Observability + +The goal-loop has no external telemetry infrastructure. Observability is file-based. + +| Observable | Mechanism | Who reads it | +|---|---|---| +| Current phase | `workflow-state.md#goal_loop.current_phase` | User (session resume), orchestrator (pre-flight checks) | +| HITL gate state | `workflow-state.md#goal_loop.hitl_state` | Orchestrator (session resume replay) | +| Stall events | `workflow-state.md#goal_loop.stall_counters` | Orchestrator (stall detection), user (stall gate prompt) | +| Artifacts produced | `workflow-state.md#goal_loop.artifacts_produced` | User (session resume prompt), orchestrator (pre-flight checks) | +| Wave progress | `workflow-state.md#goal_loop.wave_schedule` (implicit: which waves are complete vs. 
pending) | Orchestrator (wave advancement) | +| Session outcome | `session-summary.md` | User, teammates, enterprise auditors | +| Phase start/end times | `workflow-state.md#updated` timestamp, refreshed at each write | User (performance tracking) | + +No structured logging to an external system is introduced in v1. If the implementation team needs richer observability during beta, they should append structured log entries to the `workflow-state.md` body rather than introducing external logging infrastructure. + +--- + +### Requirements coverage + +All 23 REQ-ORCH-NNN IDs are mapped below. Requirements covered by Parts A and B are noted for completeness; Part C addresses the architectural and structural concerns for each. + +| REQ ID | Summary | Addressed in | +|---|---|---| +| REQ-ORCH-001 | Orchestrator dispatches subagents via Agent tool | Part C — Components §Orchestrator agent; Interaction contracts; ADR-0046 | +| REQ-ORCH-002 | Orchestrator owns workflow-state.md transitions | Part C — Components §Orchestrator agent; Data model §workflow-state.md; ADR-0046, ADR-0047 | +| REQ-ORCH-003 | Pre-flight precondition check before subagent dispatch | Part C — Components §Orchestrator agent; Data flow (pre-condition per phase); Interaction contracts (pre-conditions per spawn) | +| REQ-ORCH-004 | SPECORATOR_HEAVY_MODEL applied to heavy-tier subagents | Part C — Interaction contracts §architect, §dev, §reviewer (model selection note) | +| REQ-ORCH-005 | Slash commands unchanged for non-goal-loop use | Part C — System overview (orchestrator activated only via plugin settings.json); Security §Plugin bundle trust boundary | +| REQ-ORCH-006 | Goal-loop entry from free-text problem statement | Part A (flow); Part C — System overview (orchestrator as root session agent) | +| REQ-ORCH-007 | Goal-loop entry from GitHub issue reference | Part A (flow); Part C — Security §GitHub issue content; Components §Scope phase | +| REQ-ORCH-008 | Scope phase: EARS extraction and Gate 1 HITL 
| Part A (gate design); Part C — Components §Scope phase; Interaction contracts §grill skill; Data model §scope.md; ADR-0048 | +| REQ-ORCH-009 | Research wave: parallel analyst dispatch (1–5) | Part C — Components §Research wave scheduler; Data flow §happy path; Interaction contracts §analyst subagent | +| REQ-ORCH-010 | Research wave: de-duplicated synthesis into research.md | Part C — Components §Research wave scheduler; Interaction contracts §analyst subagent (post-condition); Data flow §research wave | +| REQ-ORCH-011 | Design synthesis: architect subagent, Gate 2 HITL | Part A (gate design); Part C — Components §Design synthesis phase; Interaction contracts §architect subagent | +| REQ-ORCH-012 | Plan phase: planner subagent, tasks.md with DAG edges | Part C — Components §DAG wave scheduler; Interaction contracts §planner subagent (expected output schema) | +| REQ-ORCH-013 | Implement waves: parallel dispatch in topological order | Part C — Components §DAG wave scheduler, §Implement wave executor; Interaction contracts §dev/qa subagent | +| REQ-ORCH-014 | Stall detection: HITL after 3 unproductive retries | Part A (stall gate design); Part C — Components §Stall detector; Data model §workflow-state.md stall_counters; Data flow §stall path; RISK-ORCH-005 | +| REQ-ORCH-015 | Review phase: validation against EARS criteria, Gate 3 HITL | Part A (gate design); Part C — Components §Review phase; Interaction contracts §reviewer/qa subagent | +| REQ-ORCH-016 | Session summary artifact at loop completion | Part C — Components §Session summary writer; Data model §session-summary.md; ADR-0048 | +| REQ-ORCH-017 | Plugin bundle includes valid .claude-plugin/plugin.json | Part C — Components §Plugin manifest; Data model §plugin.json | +| REQ-ORCH-018 | Plugin bundle includes settings.json with agent: orchestrator | Part C — Components §Plugin manifest; Data model §settings.json; Key decision D7 | +| REQ-ORCH-019 | build-claude-plugin.ts generates both files without manual 
editing | Part C — Components §build-claude-plugin.ts; Key decision D11 | +| REQ-ORCH-020 | check-agents.ts rejects hooks, mcpServers, permissionMode in frontmatter | Part C — Security §Plugin agent frontmatter validation; ADR-0046 §Compliance | +| REQ-ORCH-021 | Zero behavioural change for non-plugin users | Part C — System overview (orchestrator active only when plugin enables it); Security §Plugin bundle trust boundary | +| REQ-ORCH-022 | workflow-state.md written before every AskUserQuestion | Part A (noted at each gate); Part C — Data flow (explicit write before each gate call); Data model §workflow-state.md hitl_state; ADR-0047 | +| REQ-ORCH-023 | /issue:tackle absorbed as orchestrator entry mode | Part A (Flow A3); Part C — System overview (orchestrator detects issue reference pattern) | + +--- + +### Open questions + +The following items require empirical validation during implementation beta and are tracked as open questions in `requirements.md`. They are not blockers for the specification: + +1. **`settings.json` agent key priority resolution:** Exact behaviour when the plugin's `settings.json` specifies `agent: "orchestrator"` and the project also has a `.claude/settings.json` with a different `agent` key. Needs testing against the Claude Code runtime; document as known behaviour before GA. + +2. **Wave scheduler performance at scale:** Worktree creation time with 5 parallel subagents on a large monorepo may become a bottleneck. Measure during beta using the `workflow-state.md#updated` timestamp mechanism; set an explicit threshold if warranted (candidate: wave execution wall-clock time ≤ 2 minutes per wave for standard repo sizes). + +3. **Stall detection threshold calibration:** The 3-retry maximum (NFR-ORCH-003) requires empirical validation. If beta testing reveals it is too tight for complex tasks, it should be raised via a spec amendment before general availability. 
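
The wave scheduler mentioned in open question 2 can be pictured with a short sketch. The TypeScript below is a minimal illustration, assuming only the `id` and `depends_on` fields from the planner contract; the names `PlannedTask` and `buildWaveSchedule` are hypothetical and are not taken from the Specorator codebase. It groups tasks into waves with Kahn's BFS and treats any task left unscheduled as evidence of a cycle.

```typescript
// Hypothetical task shape, mirroring the `id` / `depends_on` fields of tasks.md.
interface PlannedTask {
  id: string;
  depends_on: string[];
}

// Groups tasks into waves using Kahn's BFS: wave k holds every task whose
// dependencies all sit in earlier waves. Throws on unknown references or cycles.
function buildWaveSchedule(tasks: PlannedTask[]): string[][] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    inDegree.set(task.id, 0);
    dependents.set(task.id, []);
  }
  for (const task of tasks) {
    for (const dep of task.depends_on) {
      if (!inDegree.has(dep)) {
        throw new Error(`Task ${task.id} depends on unknown task ${dep}`);
      }
      inDegree.set(task.id, (inDegree.get(task.id) ?? 0) + 1);
      dependents.get(dep)!.push(task.id);
    }
  }

  const waves: string[][] = [];
  let scheduled = 0;
  // Wave 1 is every task with no dependencies.
  let current = tasks.filter((t) => inDegree.get(t.id) === 0).map((t) => t.id);
  while (current.length > 0) {
    waves.push(current);
    scheduled += current.length;
    const next: string[] = [];
    for (const id of current) {
      for (const child of dependents.get(id) ?? []) {
        const remaining = (inDegree.get(child) ?? 0) - 1;
        inDegree.set(child, remaining);
        if (remaining === 0) next.push(child);
      }
    }
    current = next;
  }

  // Kahn's algorithm always terminates; a cycle shows up as leftover tasks.
  if (scheduled !== tasks.length) {
    throw new Error("Cyclic depends_on edges: no valid wave schedule exists");
  }
  return waves;
}
```

Under these assumptions, a diamond-shaped plan (one root task, two tasks depending on it, one task depending on both) yields three waves, with the two middle tasks sharing a wave. A self-referential edge is rejected by the same leftover-task check as any other cycle.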
+ +--- + +## Quality gate + +- [x] UX: primary flows mapped (10 flows A1–A10); IA clear; empty/loading/error states prescribed (no-input welcome, issue-fetch failure, grill failure, wave error, corrupted state). +- [x] UI: key screens identified (12-state table); CLI design system conventions referenced (tokens, component patterns, microcopy rules). +- [x] Architecture: components (12), data flow (happy path + stall path), integration points (grill skill, 6 subagent spawn contracts, GitHub MCP) all named. +- [x] Alternatives considered and rejected with rationale (LangGraph, CrewAI, Agent teams — in research.md; referenced from Part C). +- [x] Irreversible architectural decisions have ADRs: ADR-0046 (orchestrator tool expansion), ADR-0047 (workflow-state.md schema extension), ADR-0048 (scope.md + session-summary.md artifact types). +- [x] Risks have mitigations: RISK-ORCH-001–015 documented with architecture-level mitigations. +- [x] Every PRD requirement is addressed: all 23 REQ-ORCH-NNN IDs appear in the requirements coverage table. diff --git a/specs/goal-oriented-orchestrator-plugin/idea.md b/specs/goal-oriented-orchestrator-plugin/idea.md new file mode 100644 index 000000000..ba65fc4f6 --- /dev/null +++ b/specs/goal-oriented-orchestrator-plugin/idea.md @@ -0,0 +1,172 @@ +--- +id: IDEA-ORCH-001 +title: Orchestrator-first Claude plugin — goal-loop as core architecture +stage: idea +feature: goal-oriented-orchestrator-plugin +status: accepted +owner: analyst +created: 2026-05-13 +updated: 2026-05-13 +closes: "#501" +--- + +# Idea — Orchestrator-first Claude plugin with goal-loop as core architecture + +## Problem statement + +Specorator's current architecture is **command-chain-driven**: users must manually invoke 11 sequential slash commands (`/spec:idea` → `/spec:research` → … → `/spec:retro`). 
The orchestrator agent exists but is **advisory-only** — it has Read/Grep tools only, cannot dispatch agents, cannot update workflow state, and cannot enforce stage gates. This means Specorator has all the right building blocks (36 agents, 38 skills, 85 commands, 12 plugin groups) but no single authority that drives a feature from problem statement to shipped solution without constant user hand-holding. The Specorator Claude plugin exists (distributed via `dist/claude-plugin` per ADR-0043) but does not make the orchestrator the primary entry point: installing the plugin gives users the full command palette, not a guided delivery loop. The result is high onboarding friction — a new user must read significant documentation before getting a usable result, while competitors like Cursor, Copilot Workspace, and GitHub Spec Kit deliver value in under five minutes. + +## Proposed architecture + +The **orchestrator becomes the dispatch authority** — not just an advisor. When a user enables the Specorator plugin, Claude Code loads the orchestrator as the main session agent (`settings.json agent: orchestrator`). The orchestrator: + +1. **Scopes** the problem via the `grill` skill (structured EARS-clause intake, AskUserQuestion gates) +2. **Spawns parallel Researcher subagents** (N determined dynamically by scope complexity) each with a clean context and a bounded question +3. **Synthesises** research into a design proposal (invokes analyst/architect subagents), writes to `design.md` +4. **Gates on user approval** (synchronous AskUserQuestion) before locking the plan +5. **Decomposes** the approved design into a task DAG (invokes planner subagent), writes to `tasks.md` +6. **Dispatches implementer subagents** in topological wave order, isolated worktrees per agent, parallel within each wave +7. **Reviews** the output against acceptance criteria (reviewer/qa subagents) +8. 
**Presents** a structured session summary — decisions made, evidence used, artifacts produced + +This loop — Scope → Research → Design → Plan → Implement → Review — is the **goal-loop**: the canonical pattern for resolving any bounded, outcome-defined problem in Specorator. It is not a parallel track alongside the existing 11-stage lifecycle; it IS the lifecycle, collapsed into a single orchestrated session for issue-resolution use cases. The existing `/spec:*` stage commands become the building blocks the orchestrator invokes, not the primary user interface. + +The project's plugin packaging is simultaneously refactored to be a **proper Claude Code plugin** with `.claude-plugin/plugin.json`, reconciling the existing ADR-0036 capability manifests (`plugins/*/manifest.md`) with the Claude Code plugin format. + +## Architecture diagram + +``` +User submits problem statement + │ + ▼ +┌────────────────────────┐ +│ Orchestrator │ ← Main session agent (settings.json: agent: orchestrator) +│ Scope phase │ grill skill: structured intake → EARS acceptance criteria +│ │ AskUserQuestion gate — confirm scope before research +└──────────┬─────────────┘ + │ + ▼ +┌────────────────────────┐ +│ Research wave │ ← N parallel Researcher subagents (analyst agent class) +│ (parallel) │ Each: bounded question + clean context + worktree isolation +│ │ Orchestrator synthesises results, removes duplication +└──────────┬─────────────┘ + │ + ▼ +┌────────────────────────┐ +│ Design synthesis │ ← Orchestrator + architect subagent +│ │ Produces design.md +│ │ AskUserQuestion gate — user approves / rejects / edits +└──────────┬─────────────┘ + │ (approved) + ▼ +┌────────────────────────┐ +│ Plan phase │ ← Planner subagent decomposes design into task DAG +│ │ tasks.md with explicit dependency edges +│ │ Wave schedule = topological sort (Kahn BFS) +└──────────┬─────────────┘ + │ + ▼ +┌────────────────────────┐ +│ Implement waves │ ← Orchestrator dispatches dev/qa subagents per wave +│ (parallel 
within │ isolation: worktree per agent +│ each wave) │ Stall detection: counter per wave → escalate to HITL +└──────────┬─────────────┘ + │ + ▼ +┌────────────────────────┐ +│ Review phase │ ← reviewer + qa subagents validate vs acceptance criteria +│ │ AskUserQuestion gate — accept / request specific revision +│ │ Revision re-enters loop at Implement wave +└──────────┬─────────────┘ + │ (accepted) + ▼ +┌────────────────────────┐ +│ Session summary │ ← Orchestrator produces human-readable summary: +│ │ decisions made, evidence used, artifacts produced, +│ │ traceability IDs, open follow-ups +└────────────────────────┘ +``` + +## Target users + +- **Primary:** Senior solo developer or small team (2–10 people) building production features who need structured delivery without the friction of manually chaining 11 slash commands. They accept discipline in exchange for confidence in what they shipped. +- **Secondary:** Agency or service provider doing repeatable client delivery who needs traceable artifacts (ADRs, EARS requirements, traceability.md) to defend decisions and report to stakeholders. +- **Tertiary:** Enterprise evaluator assessing agentic tools against governance requirements (EU AI Act audit trails, ISACA governance standards) — Specorator's ID chain (REQ→T→TEST→review finding) is unique in the market. + +## Desired outcome + +A first-time Specorator user can install the plugin, submit a GitHub issue number or a free-text problem statement, and receive a fully spec-driven, traceable resolution session — complete with requirements, design, implementation, tests, and a session summary — without reading any documentation beyond the welcome message. Experienced users can override any gate, skip any stage, and drop into individual commands when needed. The orchestrator is the accelerator, not a constraint. 
+ +## Resolved decisions (from issue #501 decision table) + +| # | Decision | Resolution | Rationale | +|---|---|---|---| +| D1 | Scope intake format | EARS clauses extracted via `grill` skill (one structured question at a time until goals, constraints, and acceptance criteria are unambiguous) | EARS maps 1:1 to tests; grill is already the proven intake primitive | +| D2 | Researcher subagent count | Dynamic: 1 for narrow/spike, 3 for standard, up to 5 for broad/complex (orchestrator decides based on scope surface area) | Anthropic research shows performance gains plateau above 5 parallel agents; wave size bound prevents context explosion | +| D3 | Design presentation | Generated `design.md` artifact written to specs folder + inline summary in chat; user edits the artifact, not raw chat | File-based artifacts survive session boundaries; consistent with "spec is the memory" principle | +| D4 | Plan format | Existing `tasks.md` format extended with explicit `depends_on` edges; wave schedule derived at runtime by topological sort | Reuses proven format; DAG edges are the only addition needed for wave-parallel execution | +| D5 | Parallel execution model | `isolation: worktree` per implementer subagent (Claude Code native); merge mediated by orchestrator after each wave completes | Prevents parallel write conflicts; no external infrastructure needed; native to Claude Code platform | +| D6 | Review criteria source | Acceptance criteria captured in scope intake (EARS format) + auto-derived from EARS functional requirements in requirements.md | Two-layer validation: human-declared intent + machine-checkable EARS clause coverage | +| D7 | Plugin packaging | Proper `.claude-plugin/plugin.json` manifest + `settings.json { "agent": "orchestrator" }` making orchestrator the main session agent on plugin enable; reconcile ADR-0036 `plugins/*/manifest.md` capability layer with Claude Code plugin format | Claude Code's `settings.json agent` key is the supported mechanism for an 
orchestrator-first entry point; ADR-0036 manifests become the MCP contract layer (separate concern) | + +## Resolved open questions (from issue #501) + +| Question | Resolution | +|---|---| +| Slash command vs. natural-language trigger? | Natural language is the entry point (orchestrator is the main agent; user just describes their problem). `/orchestrate` slash command remains for explicit invocation or resume. | +| Multi-file codebase vs. single-file — worktree support? | `isolation: worktree` per implementer subagent handles this natively; the orchestrator is not isolated (it needs full repo access for state management). | +| Minimum viable scope for first release? | MVP = goal-loop skill + orchestrator dispatch authority + proper plugin packaging. The 11-stage lifecycle commands remain as building blocks; no stage is removed. | +| Design review step synchronous or async (PR)? | Synchronous AskUserQuestion (chosen by product owner). Three defined gate points: post-scope, post-design, post-review. | +| How does this compose with `/issue:tackle`? | `/issue:tackle` is subsumed as the "issue-first entry mode" of the orchestrator. The orchestrator detects a GitHub issue reference in the input and uses the issue body as the initial scope context, then runs the full goal-loop. `/issue:tackle` becomes an alias that pre-configures the scope phase. | + +## Constraints + +- **Technical:** Subagents cannot spawn subagents (Claude Code platform hard limit). The orchestrator must be the root session agent, not itself a subagent. All parallelism is orchestrator-to-subagent only. +- **Technical:** Agent teams (experimental) require `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` and have known limitations (skills/MCP not loaded in teammate definitions, no session resumption). MVP uses stable subagent model only. +- **Technical:** Plugin agents cannot declare `hooks`, `mcpServers`, or `permissionMode` in frontmatter — these are stripped for security. 
Plugin-level hooks and MCP are declared in the plugin manifest. +- **Naming collision:** Specorator's `plugins/*/manifest.md` (ADR-0036 capability contract layer) uses a different format than Claude Code's `.claude-plugin/plugin.json`. These must coexist: `plugins/` stays as the MCP contract layer; `.claude-plugin/` is the new Claude Code plugin entry point. +- **Distribution:** Plugin bundle is gitignored on `develop`/`main` (ADR-0043). Build pipeline CI already handles this. Orchestrator-first architecture must work within that distribution model. +- **Backward compatibility:** All 85 existing slash commands must remain functional. The orchestrator is an accelerator on top of the existing command system, not a replacement. +- **Scope (this feature):** Spec only — no implementation code in this iteration. Implementation tracked in a follow-up feature. + +## Out of scope (preliminary) + +- Agent teams mode (experimental) — tracked as v2 candidate once agent teams stabilize in Claude Code +- LangGraph, CrewAI, or any third-party orchestration framework — adding external dependencies contradicts the "methodology, not a product" positioning +- Async/PR-based approval gates — synchronous gates chosen; async mode is a future extension +- Specorator marketplace entry changes — ADR-0043 distribution model is already correct; no marketplace changes in this feature +- Changes to the 11-stage lifecycle artifact formats — orchestrator invokes existing stages, does not alter them + +## Acceptance criteria (refined) + +- [ ] User can submit a free-text problem statement or GitHub issue reference and receive a complete goal-loop session without reading documentation. +- [ ] Orchestrator correctly gates on user approval at three defined points: post-scope, post-design, post-review. +- [ ] Researcher subagents run in parallel, their outputs are merged without duplication, and the synthesised result is written to `research.md` in the feature's spec folder.
+- [ ] Implementer subagents run in topological wave order; agents within the same wave run in parallel in isolated worktrees. +- [ ] Orchestrator detects stalled subagents (no progress within max iteration budget) and escalates to human review rather than looping indefinitely. +- [ ] Session ends with a structured summary: decisions made, EARS acceptance criteria status, artifacts produced, traceability IDs, open follow-ups. +- [ ] All existing `/spec:*` slash commands continue to function as standalone invocations. +- [ ] Specorator plugin has a valid `.claude-plugin/plugin.json` and `settings.json` that make the orchestrator the main session agent on plugin enable. +- [ ] The orchestrator's dispatch authority is exercised via the Agent tool, not via text recommendations — it invokes, not advises. + +## References + +- Issue #501 — [original concept](https://github.com/Luis85/agentic-workflow/issues/501) +- ADR-0036 — Adopt plugin manifests as the Specorator capability contract +- ADR-0043 — Distribute Claude Code plugin bundle from orphan dist branch via git-subdir +- ADR-0026 — Freeze v1 workflow track taxonomy +- Claude Code docs — [plugins reference](https://code.claude.com/docs/en/plugins-reference) +- Claude Code docs — [create custom subagents](https://code.claude.com/docs/en/sub-agents) +- Anthropic Engineering — [How we built our multi-agent research system](https://www.anthropic.com/engineering/multi-agent-research-system) + +--- + +## Quality gate + +- [x] Problem statement is one paragraph and understandable to a non-expert. +- [x] Target users named. +- [x] Desired outcome stated. +- [x] Constraints listed. +- [x] Open questions captured and resolved. +- [x] Scope is bounded — no "boil the ocean" framing. 
diff --git a/specs/goal-oriented-orchestrator-plugin/requirements.md b/specs/goal-oriented-orchestrator-plugin/requirements.md new file mode 100644 index 000000000..8fc486ac4 --- /dev/null +++ b/specs/goal-oriented-orchestrator-plugin/requirements.md @@ -0,0 +1,491 @@ +--- +id: PRD-ORCH-001 +title: Goal-oriented orchestrator plugin +stage: requirements +feature: goal-oriented-orchestrator-plugin +status: accepted +owner: pm +inputs: + - IDEA-ORCH-001 + - RESEARCH-ORCH-001 +created: 2026-05-13 +updated: 2026-05-13 +closes: "#501" +--- + +# PRD — Goal-oriented orchestrator plugin + +## Summary + +We are building two tightly coupled deliverables that ship as one feature: (1) an orchestrator-first architecture refactor that promotes the existing Specorator orchestrator agent from advisory-only to full dispatch authority, and (2) a proper Claude Code plugin package that makes the orchestrator the main session agent when the plugin is enabled. Together they introduce the **goal-loop** — a six-phase conductor skill (Scope → Research → Design → Plan → Implement → Review) that moves a user from a free-text problem statement or GitHub issue reference to a fully traceable, spec-driven resolution session without requiring any manual slash-command chaining. This is built now because Specorator's command-chain-driven onboarding is the confirmed primary adoption blocker — first-time users must read documentation before receiving value — while competitors (GitHub Copilot Workspace, Cursor 2.0, GitHub Spec Kit) deliver a useful result in under five minutes. The orchestrator-first architecture is the minimal change that closes this gap while preserving all 85 existing slash commands and the full 11-stage lifecycle methodology. + +## Goals + +- G1: Enable a first-time user to submit a problem statement or GitHub issue reference and receive a complete, traceable resolution session without reading documentation beyond the welcome message. 
+- G2: Give the orchestrator full dispatch authority over specialist subagents via the Agent tool, replacing its current advisory-only role. +- G3: Deliver a valid Claude Code plugin package (`.claude-plugin/plugin.json` + `settings.json`) that makes the orchestrator the default session agent on plugin enable. +- G4: Introduce exactly three synchronous human-in-the-loop (HITL) gates — post-scope, post-design, and post-review — giving users meaningful control without excessive interruption. +- G5: Preserve full backward compatibility: all 85 existing slash commands remain functional as standalone invocations. + +## Non-goals + +- NG1: Agent teams mode (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`) — reserved for v2 when the feature stabilises in Claude Code. +- NG2: Third-party orchestration frameworks (LangGraph, CrewAI, or similar) — incompatible with zero-dependency plugin distribution and tool-agnostic positioning. +- NG3: Async or PR-based approval gates — synchronous AskUserQuestion is the chosen pattern for v1. +- NG4: Changes to the 11-stage lifecycle artifact formats (idea.md, research.md, design.md, tasks.md) — the orchestrator invokes existing stages; it does not alter their schemas. +- NG5: Specorator marketplace entry changes — ADR-0043 distribution model is already correct. +- NG6: Implementation code in this iteration — this feature delivers the specification only; implementation is tracked as a follow-up. +- NG7: MCP capability broker or plugin registry runtime loading — planned for Layer 3; out of scope for this feature. 
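
Goal G3 lends itself to a mechanical check. The sketch below assumes only what this PRD states: `.claude-plugin/plugin.json` carries non-empty `name`, `version`, and `description` fields, and `settings.json` declares `"agent": "orchestrator"` at the top level. The names `checkBundle`, `parseObject`, and `BundleCheckResult` are hypothetical and are not taken from `build-claude-plugin.ts`. The function takes file contents as strings so it stays independent of any file layout.

```typescript
// Hypothetical result type for this illustration.
interface BundleCheckResult {
  ok: boolean;
  errors: string[];
}

// Parses text as JSON and returns it only if the top level is an object;
// otherwise records an error under the given label and returns null.
function parseObject(
  label: string,
  text: string,
  errors: string[],
): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(text);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
    errors.push(`${label}: top level must be a JSON object`);
  } catch {
    errors.push(`${label}: not valid JSON`);
  }
  return null;
}

// Checks the two files against the minimum fields stated in this PRD.
function checkBundle(
  pluginJsonText: string,
  settingsJsonText: string,
): BundleCheckResult {
  const errors: string[] = [];

  const manifest = parseObject(".claude-plugin/plugin.json", pluginJsonText, errors);
  if (manifest) {
    for (const field of ["name", "version", "description"]) {
      const value = manifest[field];
      if (typeof value !== "string" || value.trim() === "") {
        errors.push(`.claude-plugin/plugin.json: missing or empty "${field}"`);
      }
    }
  }

  const settings = parseObject("settings.json", settingsJsonText, errors);
  if (settings && settings["agent"] !== "orchestrator") {
    errors.push('settings.json: expected top-level "agent": "orchestrator"');
  }

  return { ok: errors.length === 0, errors };
}
```

Collecting every error before returning, instead of stopping at the first one, means a single run reports all problems in both files.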
+ +## Personas / stakeholders + +| Persona | Need | Why it matters | +|---|---|---| +| Senior solo developer (primary) | Submit a GitHub issue or problem statement and receive a spec-driven session without chaining 11 commands | Today's command-chain onboarding blocks adoption; this persona chooses tools by time-to-first-result | +| Small engineering team (2–10 people) | Traceable artifacts (EARS requirements, ADRs, traceability.md) that survive session boundaries and can be reviewed by teammates | File-based artifacts are the primary mechanism for team handoffs; session-only state is not sufficient | +| Agency / service provider (secondary) | Repeatable, auditable delivery records to defend decisions and report progress to clients | Traceability ID chain (REQ→T→TEST→finding) is a differentiator; no current competitor produces this chain | +| Enterprise evaluator (tertiary) | Evidence of governance and audit trails for EU AI Act, ISACA, and internal risk review | Specorator's ID chain is uniquely positioned in the market; this persona triggers procurement decisions | +| Existing Specorator user | All current slash commands continue to work exactly as before | Backward compatibility is non-negotiable; orchestrator is an accelerator, not a constraint | + +## Jobs to be done + +- When I have a GitHub issue number but no time to manually run 11 slash commands, I want to hand it to the orchestrator and receive a fully resolved, traceable result, so I can focus on decisions rather than command logistics. +- When I am scoping a new feature, I want structured EARS acceptance criteria extracted from my description through a guided conversation, so I can trust that downstream agents work from unambiguous goals. +- When I am reviewing a design proposal, I want to see a generated `design.md` artifact and approve or reject it before any implementation begins, so I retain control at the most consequential decision point. 
+- When I install the Specorator plugin, I want the orchestrator to be my default session agent immediately, so I receive a guided experience without any additional configuration. +- When a subagent stalls and makes no progress, I want to be notified and given control rather than waiting for an infinite loop, so I can redirect or abort without losing the session state accumulated so far. + +## Functional requirements (EARS) + +> All requirements use EARS notation. One requirement per entry. Stable IDs. MoSCoW priorities use "must", "should", "could". + +--- + +### REQ-ORCH-001 — Orchestrator dispatch via Agent tool + +- **Pattern:** Ubiquitous +- **Statement:** The orchestrator shall invoke specialist subagents (researcher, architect, planner, dev, qa, reviewer) exclusively via the Agent tool, not via text recommendations. +- **Acceptance:** + - Given the orchestrator has determined that a specialist subagent is needed + - When the orchestrator initiates that specialist's work + - Then an Agent tool call is issued with the specialist's agent definition and a bounded prompt — no text instruction to the user to run a slash command is emitted in lieu of dispatch +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-001 + +--- + +### REQ-ORCH-002 — Orchestrator ownership of workflow-state.md transitions + +- **Pattern:** Event-driven +- **Statement:** WHEN the orchestrator completes a goal-loop phase, the orchestrator shall write the updated stage and phase status to `workflow-state.md` before proceeding to the next phase. 
+- **Acceptance:** + - Given a goal-loop session is active and a phase (scope, research, design, plan, implement, review) has just completed + - When the orchestrator transitions to the next phase + - Then `workflow-state.md` reflects the completed phase and the next active phase before any subagent for the next phase is dispatched + - And specialist subagents do not write stage transitions to `workflow-state.md` +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-001 + +--- + +### REQ-ORCH-003 — Pre-flight precondition check before subagent dispatch + +- **Pattern:** Event-driven +- **Statement:** WHEN the orchestrator is about to dispatch a specialist subagent, the orchestrator shall verify that all required predecessor artifacts exist and are non-empty before issuing the Agent tool call. +- **Acceptance:** + - Given the orchestrator is preparing to dispatch a specialist (e.g., planner) that depends on a predecessor artifact (e.g., design.md) + - When the orchestrator checks the precondition + - Then dispatch proceeds only if the artifact file exists and contains content + - And if the artifact is absent or empty, the orchestrator surfaces a specific error to the user via AskUserQuestion naming the missing artifact — it does not dispatch the subagent +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-003 + +--- + +### REQ-ORCH-004 — Model selection for heavy-tier subagents + +- **Pattern:** Event-driven +- **Statement:** WHEN the orchestrator dispatches a heavy-tier subagent (architect, dev, or reviewer), the orchestrator shall apply the model identifier from the `SPECORATOR_HEAVY_MODEL` environment variable if that variable is set and non-empty. 
+- **Acceptance:** + - Given the `SPECORATOR_HEAVY_MODEL` environment variable is set to a valid model identifier + - When the orchestrator dispatches an architect, dev, or reviewer subagent + - Then the Agent tool call specifies the model from `SPECORATOR_HEAVY_MODEL` + - And when `SPECORATOR_HEAVY_MODEL` is absent or empty, the orchestrator uses the session default model for all subagents +- **Priority:** should +- **Satisfies:** RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-004 + +--- + +### REQ-ORCH-005 — Standalone slash-command operability + +- **Pattern:** Ubiquitous +- **Statement:** The orchestrator shall not alter the behaviour of any existing `/spec:*` slash command when that command is invoked directly by the user outside of a goal-loop session. +- **Acceptance:** + - Given a user invokes any of the 85 existing slash commands directly (e.g., `/spec:requirements`, `/spec:design`) + - When the command executes + - Then the command completes with the same artifact output and workflow-state.md update it produced before this feature was introduced + - And no orchestrator goal-loop logic is inserted into the command's execution path +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001 +- **Downstream:** SPEC-ORCH-002 + +--- + +### REQ-ORCH-006 — Goal-loop entry from free-text problem statement + +- **Pattern:** Event-driven +- **Statement:** WHEN the orchestrator receives a free-text problem statement from the user as the session's opening message, the orchestrator shall initiate the scope phase of the goal-loop. 
+- **Acceptance:** + - Given the orchestrator is the active session agent (via plugin `settings.json agent: orchestrator`) + - When the user's first message contains a natural-language problem description that is not prefixed by a slash command + - Then the orchestrator begins the scope phase by invoking the grill skill to extract structured EARS acceptance criteria + - And the orchestrator does not ask the user to run a slash command first +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-006 + +--- + +### REQ-ORCH-007 — Goal-loop entry from GitHub issue reference + +- **Pattern:** Event-driven +- **Statement:** WHEN the orchestrator receives a message containing a GitHub issue reference (e.g., "#501" or a GitHub issue URL), the orchestrator shall fetch the issue body and use it as the initial scope context before initiating the scope phase. +- **Acceptance:** + - Given the orchestrator is the active session agent + - When the user's input contains a GitHub issue number or URL + - Then the orchestrator reads the issue title and body + - And uses that content as the initial problem statement passed to the grill skill + - And the scope phase proceeds as it would from a free-text entry +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-007 + +--- + +### REQ-ORCH-008 — Scope phase: EARS acceptance criteria extraction and HITL gate + +- **Pattern:** Event-driven +- **Statement:** WHEN the scope phase begins, the orchestrator shall invoke the grill skill to extract EARS acceptance criteria from the problem statement and then present a summary to the user via AskUserQuestion before spawning any researcher subagents. 
+- **Acceptance:** + - Given the goal-loop scope phase is active + - When the grill skill completes its structured intake + - Then the orchestrator presents the extracted EARS acceptance criteria to the user as a numbered list via AskUserQuestion + - And the orchestrator waits for explicit user confirmation (approve, edit, or abort) before advancing to the research phase + - And if the user edits the criteria, the orchestrator incorporates the edits and re-presents before advancing +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001 +- **Downstream:** SPEC-ORCH-008 + +--- + +### REQ-ORCH-009 — Research wave: dynamic parallel researcher dispatch + +- **Pattern:** Event-driven +- **Statement:** WHEN the scope phase is confirmed by the user, the orchestrator shall dispatch between one and five researcher (analyst) subagents in parallel, with the count determined by the scope surface area assessed during the scope phase. +- **Acceptance:** + - Given the scope phase has been confirmed by the user + - When the orchestrator initiates the research wave + - Then the orchestrator issues between one and five parallel Agent tool calls to analyst subagents in a single orchestrator turn + - And each subagent receives a distinct, bounded research question derived from the scope + - And no two subagents receive the same research question +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-009 + +--- + +### REQ-ORCH-010 — Research wave: de-duplicated synthesis into research.md + +- **Pattern:** Event-driven +- **Statement:** WHEN all researcher subagents in the research wave have returned their outputs, the orchestrator shall merge those outputs into a single `research.md` file, removing duplicate findings, before advancing to the design phase. 
+- **Acceptance:** + - Given all researcher subagents in the current wave have returned results + - When the orchestrator synthesises the research outputs + - Then a single `research.md` is written to `specs//research.md` + - And no finding that appears in two or more researcher outputs is duplicated in the synthesised file + - And the file includes attribution (which subagent surfaced each finding) for traceability +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-010 + +--- + +### REQ-ORCH-011 — Design synthesis: architect subagent produces design.md and HITL gate + +- **Pattern:** Event-driven +- **Statement:** WHEN the research wave is complete, the orchestrator shall dispatch an architect subagent to produce `design.md` and then present the design to the user via AskUserQuestion before advancing to the plan phase. +- **Acceptance:** + - Given `research.md` has been written and the research wave is complete + - When the orchestrator dispatches the architect subagent + - Then the architect subagent writes `design.md` to `specs//design.md` + - And the orchestrator presents an inline summary of `design.md` to the user via AskUserQuestion with options to approve, edit (by editing the file), or reject + - And the orchestrator advances to the plan phase only after the user explicitly approves + - And if the user rejects, the orchestrator records the rejection reason and returns to the research phase with the rejection as additional context +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-011 + +--- + +### REQ-ORCH-012 — Plan phase: planner subagent produces tasks.md with DAG edges + +- **Pattern:** Event-driven +- **Statement:** WHEN the design is approved by the user, the orchestrator shall dispatch a planner subagent that produces `tasks.md` with explicit `depends_on` edges for every task that has a dependency. 
+- **Acceptance:** + - Given the user has approved `design.md` + - When the planner subagent produces `tasks.md` + - Then every task entry in `tasks.md` that depends on another task includes a `depends_on` field listing the IDs of its predecessor tasks + - And tasks with no dependencies have an empty or absent `depends_on` field + - And the wave schedule derivable from a topological sort of the DAG matches the intended execution order +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-012 + +--- + +### REQ-ORCH-013 — Implement waves: parallel subagent dispatch in topological wave order + +- **Pattern:** Event-driven +- **Statement:** WHEN `tasks.md` is available, the orchestrator shall dispatch dev and qa subagents in topological wave order, with all tasks in the same wave dispatched as parallel Agent tool calls, each subagent isolated in its own worktree. +- **Acceptance:** + - Given `tasks.md` with `depends_on` edges is available + - When the orchestrator computes the wave schedule via topological sort + - Then tasks with no unmet dependencies form the first wave; tasks whose dependencies are in completed waves form subsequent waves + - And within each wave, the orchestrator issues one Agent tool call per task simultaneously in a single orchestrator turn + - And each Agent tool call specifies `isolation: worktree` for the subagent + - And the orchestrator does not advance to the next wave until all tasks in the current wave have returned results +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-013 + +--- + +### REQ-ORCH-014 — Stall detection: escalation after three unproductive iterations + +- **Pattern:** Unwanted behaviour +- **Statement:** IF a subagent completes three consecutive retry iterations without producing progress on its assigned task, THEN the orchestrator shall halt further retries for that subagent and surface the stall to the user via 
AskUserQuestion, reporting the task ID, the subagent's last output, and the options available (retry, skip, abort session). +- **Acceptance:** + - Given a subagent has been retried for the same task + - When the subagent's third consecutive retry produces no progress (the task output is substantively identical to the previous attempt or the subagent reports it cannot proceed) + - Then the orchestrator issues no further Agent tool calls for that task in the current iteration + - And AskUserQuestion presents the task ID, the subagent's last output, and three explicit options: retry, skip this task, or abort the session + - And the orchestrator waits for the user's choice before taking any further action +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-014 + +--- + +### REQ-ORCH-015 — Review phase: validation against EARS criteria and HITL gate + +- **Pattern:** Event-driven +- **Statement:** WHEN all implement waves are complete, the orchestrator shall dispatch reviewer and qa subagents to validate the implementation output against the EARS acceptance criteria captured in the scope phase, then present the review verdict to the user via AskUserQuestion. 
+- **Acceptance:** + - Given all implement waves have completed and their worktrees have been merged + - When the orchestrator dispatches the reviewer and qa subagents + - Then each subagent receives the EARS acceptance criteria from the scope phase as explicit validation targets + - And the review verdict lists each acceptance criterion and its pass/fail status + - And the orchestrator presents the verdict via AskUserQuestion with options to accept, or specify a targeted revision + - And if the user specifies a revision, the orchestrator re-enters the implement wave phase with the reviewer's findings attached as additional context for affected tasks +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-009 + +--- + +### REQ-ORCH-016 — Session summary artifact at loop completion + +- **Pattern:** Event-driven +- **Statement:** WHEN the user accepts the review verdict, the orchestrator shall write a session summary artifact to `specs//session-summary.md` listing decisions made, EARS acceptance criteria status, artifacts produced, traceability IDs, and open follow-ups. 
+- **Acceptance:** + - Given the user has accepted the review verdict + - When the orchestrator produces the session summary + - Then `specs//session-summary.md` is written and contains at minimum: a decisions section, an acceptance-criteria section with pass/fail per criterion, an artifacts section listing each file produced with its path, a traceability section mapping REQ/T/TEST IDs to their artifacts, and an open follow-ups section + - And the orchestrator updates `workflow-state.md` to mark the goal-loop as complete +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001 +- **Downstream:** SPEC-ORCH-016 + +--- + +### REQ-ORCH-017 — Plugin manifest: valid .claude-plugin/plugin.json in bundle + +- **Pattern:** Ubiquitous +- **Statement:** The plugin bundle shall include a `.claude-plugin/plugin.json` file containing at minimum the `name`, `version`, and `description` fields conforming to the Claude Code plugin manifest format. +- **Acceptance:** + - Given the plugin bundle has been built by `build-claude-plugin.ts` + - When the bundle contents are inspected + - Then `.claude-plugin/plugin.json` is present + - And it contains non-empty `name`, `version`, and `description` fields + - And the file is valid JSON +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-017 + +--- + +### REQ-ORCH-018 — Plugin bundle: settings.json declares orchestrator as session agent + +- **Pattern:** Ubiquitous +- **Statement:** The plugin bundle shall include a `settings.json` file that declares `"agent": "orchestrator"` at the top level. 
+- **Acceptance:** + - Given the plugin bundle has been built by `build-claude-plugin.ts` + - When the bundle contents are inspected + - Then `settings.json` is present in the plugin bundle root + - And it contains the key-value pair `"agent": "orchestrator"` parseable as valid JSON +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001, RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-015 + +--- + +### REQ-ORCH-019 — Plugin bundle generation from canonical sources + +- **Pattern:** Event-driven +- **Statement:** WHEN `build-claude-plugin.ts` is executed, the build script shall generate both `.claude-plugin/plugin.json` and `settings.json` from canonical repository sources without requiring manual editing of either file. +- **Acceptance:** + - Given `build-claude-plugin.ts` is invoked with no extra flags + - When the build completes without error + - Then `.claude-plugin/plugin.json` and `settings.json` in the plugin bundle reflect the current state of their canonical sources + - And no manual editing of `.claude-plugin/plugin.json` or `settings.json` is required after the build +- **Priority:** must +- **Satisfies:** RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-014, SPEC-ORCH-015, SPEC-ORCH-016 + +--- + +### REQ-ORCH-020 — Plugin agent frontmatter must not declare hooks, mcpServers, or permissionMode + +- **Pattern:** Unwanted behaviour +- **Statement:** IF a plugin agent definition's YAML frontmatter declares any of the fields `hooks`, `mcpServers`, or `permissionMode`, THEN the `check-agents.ts` validation script shall emit a build error naming the offending agent file and the prohibited field. 
+- **Acceptance:** + - Given a plugin agent file has been authored with a `hooks`, `mcpServers`, or `permissionMode` key in its YAML frontmatter + - When `check-agents.ts` runs as part of the build or CI pipeline + - Then the script exits with a non-zero code + - And the error output names the specific agent file and the specific prohibited field + - And no plugin bundle is produced until the violation is corrected +- **Priority:** must +- **Satisfies:** RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-017 + +--- + +### REQ-ORCH-021 — Backward compatibility: zero behavioural change for non-plugin users + +- **Pattern:** Ubiquitous +- **Statement:** The orchestrator shall not introduce any change in observable behaviour for users who invoke Specorator without enabling the plugin. +- **Acceptance:** + - Given a user has not installed or enabled the Specorator Claude Code plugin + - When that user invokes any existing slash command or workflow pattern + - Then the command behaves identically to its pre-feature behaviour + - And the user does not encounter any new prompts, errors, or state changes introduced by the orchestrator-first architecture +- **Priority:** must +- **Satisfies:** IDEA-ORCH-001 +- **Downstream:** SPEC-ORCH-001 + +--- + +### REQ-ORCH-022 — Orchestrator writes workflow-state.md before every AskUserQuestion call + +- **Pattern:** Event-driven +- **Statement:** WHEN the orchestrator is about to call AskUserQuestion at any of the three defined HITL gates or at the stall gate, the orchestrator shall first write the current goal-loop state to `workflow-state.md`. 
+- **Acceptance:** + - Given a HITL gate has been reached (post-scope, post-design, post-review, or stall escalation) + - When the orchestrator prepares to issue the AskUserQuestion call + - Then `workflow-state.md` is written with the current phase, the accumulated artifact list, and the pending decision before the AskUserQuestion call is issued + - And if the session is interrupted during the human decision window, `workflow-state.md` reflects the last known consistent state +- **Priority:** must +- **Satisfies:** RESEARCH-ORCH-001 +- **Downstream:** SPEC-ORCH-003, SPEC-ORCH-005, SPEC-ORCH-008, SPEC-ORCH-009, SPEC-ORCH-011 + +--- + +### REQ-ORCH-023 — Issue-tackle absorbed as orchestrator entry mode + +- **Pattern:** Event-driven +- **Statement:** WHEN the user invokes `/issue:tackle` with a GitHub issue reference, the orchestrator shall treat this as equivalent to submitting that issue reference directly, initiating the goal-loop scope phase with the issue body as the initial context. +- **Acceptance:** + - Given the user invokes `/issue:tackle #NNN` or `/issue:tackle ` + - When the orchestrator handles this command + - Then the goal-loop scope phase begins with the issue title and body as the initial problem statement + - And the experience is identical to submitting the issue reference as a free-text message to the orchestrator +- **Priority:** should +- **Satisfies:** IDEA-ORCH-001 +- **Downstream:** SPEC-ORCH-002 + +--- + +## Non-functional requirements + +> **Note on inherited baselines:** `docs/steering/quality.md` and `docs/steering/operations.md` are template stubs without populated numeric thresholds. All NFR targets below are introduced by this feature and stated explicitly. New thresholds introduced here are marked "(new threshold)". 
+
+| ID | Category | Requirement | Target |
+|---|---|---|---|
+| NFR-ORCH-001 | performance | Time from user submitting a problem statement to orchestrator presenting the scope confirmation (first AskUserQuestion) | ≤ 30 seconds (new threshold) |
+| NFR-ORCH-002 | performance | Wall-clock time for N parallel researcher subagents versus N sequential researcher runs at N = 3 | Parallel wall-clock time shall be strictly less than sequential wall-clock time (new threshold) |
+| NFR-ORCH-003 | reliability | Maximum retry iterations per subagent before stall escalation to HITL | No subagent shall execute more than 3 retry iterations without escalating (new threshold) |
+| NFR-ORCH-004 | compatibility | Behavioural change to existing `/spec:*` slash commands after orchestrator refactor | Zero breaking changes — all 85 commands must produce identical outputs to pre-feature behaviour (new threshold) |
+| NFR-ORCH-005 | build integrity | Plugin bundle validation before `dist/claude-plugin` update | `build-claude-plugin.ts --check` must pass with exit code 0 before any update to `dist/claude-plugin` (new threshold) |
+| NFR-ORCH-006 | performance | Time from problem statement to design-approval HITL gate (post-design) for a well-scoped issue | Target ≤ 5 minutes (new threshold; well-scoped is defined as: single-area change, ≤ 5 EARS criteria, ≤ 3 open research questions) |
+| NFR-ORCH-007 | security | Plugin agent frontmatter fields `hooks`, `mcpServers`, and `permissionMode` | `check-agents.ts` must reject any plugin agent bundle that declares these fields, enforced in CI (new threshold) |
+| NFR-ORCH-008 | reliability | Goal-loop state durability across session interruption at a HITL gate | `workflow-state.md` written before every AskUserQuestion call; state recoverable from disk after interruption (new threshold) |
+
+## Success metrics
+
+- **North star:** Percentage of goal-loop sessions that reach the Session Summary artifact without the user manually invoking any
`/spec:*` command. Target: ≥ 70% of sessions in the first 30 days after release. +- **Supporting:** Median elapsed time from problem statement submission to design-approval HITL gate, measured across observed sessions. Target: ≤ 5 minutes for well-scoped issues (≤ 5 EARS criteria). +- **Supporting:** Percentage of sessions where the plugin `settings.json` agent key is respected and the orchestrator is the active session agent on first message. Target: 100% (verifiable in plugin build test). +- **Counter-metric:** Percentage of goal-loop sessions where the user abandons or invokes `/spec:*` manually after the first HITL gate (post-scope). A rate above 25% signals the scope phase is too burdensome or the grill skill extraction is producing low-quality EARS criteria. + +## Release criteria + +What must be true to ship this specification (and, by extension, the implementation it produces): + +- [ ] All `must` functional requirements (REQ-ORCH-001 through REQ-ORCH-022) pass their acceptance criteria. +- [ ] All NFRs met (NFR-ORCH-001 through NFR-ORCH-008) or explicitly waived with an ADR. +- [ ] `check-agents.ts` rejects any plugin agent bundle with prohibited frontmatter fields (REQ-ORCH-020, NFR-ORCH-007). +- [ ] `build-claude-plugin.ts --check` passes on the built bundle (NFR-ORCH-005). +- [ ] All 85 existing slash commands verified to produce identical outputs to their pre-feature behaviour (REQ-ORCH-005, REQ-ORCH-021, NFR-ORCH-004). +- [ ] Test plan executed against goal-loop phases with no critical bugs open against `must` requirements. +- [ ] `workflow-state.md` Zod schema (ADR-0042 prerequisite) is in place before implementation of REQ-ORCH-002 and REQ-ORCH-022. +- [ ] `specs/goal-oriented-orchestrator-plugin/session-summary.md` format documented in the spec (SPEC-ORCH-013). +- [ ] No open clarifications remain in this document. + +## Open questions / clarifications + +None. 
All clarifications were resolved in issue #501 and are incorporated as resolved decisions D1–D7 in `idea.md`. The following items require empirical validation during the implementation beta. They are not blockers for this specification, but the implementation team should open a follow-up issue for each:
+
+- **Priority resolution for `agent` key:** Exact behaviour when the plugin's `settings.json` specifies `agent: "orchestrator"` and the project also has a `.claude/settings.json` with a different `agent` key. Needs testing against the Claude Code runtime; document the result as known behaviour.
+- **Wave scheduler performance at scale:** Worktree creation time with 5 parallel subagents on a large monorepo may become a bottleneck. Measure during beta and set a threshold if needed.
+- **Stall detection threshold calibration:** The 3-retry maximum (NFR-ORCH-003) requires empirical validation. If beta testing reveals it is too tight for complex tasks, it should be raised via a spec amendment before general availability.
+
+## Out of scope
+
+What we explicitly will not do this cycle:
+
+- Agent teams mode (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`) — reserved for v2, as the platform feature has known limitations (skills/MCP not applied to teammates, no session resumption).
+- LangGraph, CrewAI, or any third-party orchestration framework — contradicts zero-dependency plugin distribution and tool-agnostic Layer 0 positioning.
+- Async or PR-based approval gates — synchronous AskUserQuestion is the chosen pattern; async mode is a future extension.
+- Changes to stage artifact formats (idea.md, research.md, design.md, tasks.md schemas) — the orchestrator invokes existing stages as-is.
+- Specorator marketplace entry changes — ADR-0043 distribution model is already correct.
+- Implementation code — this iteration produces the specification only; implementation is a follow-up feature.
+- MCP capability broker / plugin registry runtime loading (Layer 3) — tracked in issue #316.
+- Nested or peer-to-peer subagent communication — the platform hard limit (subagents cannot spawn subagents) makes this impossible in v1. + +--- + +## Quality gate + +- [x] Goals and non-goals explicit. +- [x] Personas / stakeholders named. +- [x] Jobs to be done captured. +- [x] Every functional requirement uses EARS and has an ID. +- [x] Acceptance criteria testable. +- [x] NFRs listed with targets. +- [x] Success metrics defined (including a counter-metric). +- [x] Release criteria stated. +- [x] `/spec:clarify` returned no open questions. diff --git a/specs/goal-oriented-orchestrator-plugin/research.md b/specs/goal-oriented-orchestrator-plugin/research.md new file mode 100644 index 000000000..4d72e4217 --- /dev/null +++ b/specs/goal-oriented-orchestrator-plugin/research.md @@ -0,0 +1,279 @@ +--- +id: RESEARCH-ORCH-001 +title: Orchestrator-first Claude plugin — research synthesis +stage: research +feature: goal-oriented-orchestrator-plugin +status: complete +owner: analyst +inputs: + - IDEA-ORCH-001 +created: 2026-05-13 +updated: 2026-05-13 +--- + +# Research — Orchestrator-first Claude plugin + +## Research questions + +| ID | Question | Status | +|---|---|---| +| Q1 | What is the Claude Code plugin format and how does `settings.json agent` work? | answered | +| Q2 | Which multi-agent orchestrator pattern best fits Specorator's bounded issue-resolution loop? | answered | +| Q3 | What do competitors offer and where is the defensible differentiation gap? | answered | +| Q4 | What is the current Specorator architecture inventory and what gaps exist vs. the orchestrator-first vision? | answered | +| Q5 | What are the hard platform constraints that the architecture must work within? | answered | +| Q6 | How does the orchestrator handle human approval gates durably? | answered | +| Q7 | How do parallel subagent write conflicts get resolved in implementation waves? 
| answered | +| Q8 | What naming collision exists between ADR-0036 `plugins/` and Claude Code's `.claude-plugin/`? | answered | + +--- + +## Market / ecosystem + +### Spec-driven development tool landscape (May 2026) + +The spec-driven development (SDD) category has become contested in under 18 months. Key players: + +| Solution | Approach | Strengths | Weaknesses | Source | +|---|---|---|---|---| +| **Devin** (Cognition) | Fully autonomous; Plan→DAG execute→critic loop; Slack-first, async | Handles self-contained tasks end-to-end; self-corrects on test failures | 15% task success rate; fails on ambiguous requirements; $500/month; opaque reasoning | Trickle, The Register | +| **GitHub Copilot Workspace** | Issue→Plan→Code→PR; 4 explicit stages; plan is editable | Editable plan before code; GitHub-native integration; 60–70% production-ready output | GitHub-only; no EARS notation; no quality gates; single handoff at code generation | GitHub Blog, VibeCoder review | +| **Cursor 2.0** | IDE-first; Composer with parallel worktree agents; background cloud agents | Speed and polish; repo-wide semantic search; parallel agents in isolated worktrees | No specification layer; no traceability; context rot at scale; informal HITL | Cursor changelog | +| **Windsurf (Codeium) Cascade** | Real-time context tracking; background planning + short-term execution model | Multi-file editing strength; real-time intent inference | 15–20% autocomplete degradation; reliability complaints; no spec/requirements layer | DeployHQ guide | +| **Aider** | CLI; architect model designs → editor model implements; git-native | Open-source; clean commit history; 85% benchmark score; BYOK | No parallel subagents; no workflow state; no quality gates; context rot in long sessions | aider.chat | +| **Cline** | VS Code; Plan mode (read-only) + Act mode (approval-per-action) | Strongest explicit HITL controls; open-source; MCP extensibility; 300K+ installs | Per-action approval doesn't scale; critical prompt 
injection unpatched 90+ days; no spec layer | Cline GitHub | +| **AWS Kiro** | EARS-native spec generation; steering files (product/design/structure); requirements.md | EARS notation built-in; AWS IDE integration; enterprise positioning | AWS infrastructure dependency; no stable traceability IDs; no verify gate | kiro.dev | +| **GitHub Spec Kit** | 3 commands: /specify, /plan, /tasks; Claude Code + Cursor integration | GitHub-backed; simple DX; quick first result | No EARS; no ID traceability chain; no quality gates; no verify gate; shallow lifecycle | GitHub blog | +| **BMAD-METHOD** | 46,700+ stars; 46+ agents; V6 cross-platform; role-separated lifecycle | Large community; role-separated specialists; enterprise-scale | Enterprise-heavy; steep learning curve; solo-dev inaccessible; no EARS notation | BMAD GitHub | +| **GSD** | Meta-prompting; flat learning curve; solo-dev focused | Fastest time-to-first-result | No methodology; no traceability; single-model assumptions; no quality gates | ObviousWorks | +| **Specorator (current)** | 11-stage lifecycle; EARS notation; REQ/T/TEST ID chains; verify gate; 12 tracks | Only tool with full ID traceability chain + verify gate + multi-track + tool-agnostic Layer 0 | High onboarding friction; command-chain-driven (no single entry point); advisory-only orchestrator | This repo | + +### Claude Code plugin ecosystem + +The Claude Code plugin ecosystem has grown to 176+ community plugins, 135 agents, 35+ skills, 42 commands. Plugin categories relevant to Specorator: workflow orchestration (task decomposition, parallel agents), code review automation, DevOps. No existing plugin combines spec-driven lifecycle + EARS + traceability + orchestrated execution. + +### User needs — evidence + +- **Trust crisis:** Stack Overflow 2025 survey — AI tool trust at 29%, down from 40% in 2024. Developers are willing but reluctant to trust autonomous agents. 
*(Stack Overflow 2025 Developer Survey)* +- **Verification gap:** 95% of developers report spending significant time reviewing, testing, and correcting AI output. No current tool provides a deterministic pre-stage verification chain. *(O'Reilly radar 2025)* +- **Context rot:** Experienced developers using AI coding tools were 19% slower in a METR RCT, despite predicting 24% gains. Root cause: context windows fill with failed attempts and debug noise, deprioritising earlier constraints. Specorator's file-based spec artifacts directly address this — agents re-read canonical artifacts rather than relying on conversation history. *(METR RCT, CodeRabbit analysis)* +- **AI-coauthored PRs:** 1.7x more major issues than human-written code (CodeRabbit 2025 analysis). The gap is traceable to absent requirements, missing acceptance criteria, and lack of intermediate verification. +- **Enterprise audit demand:** EU AI Act requirements, ISACA governance concerns, and enterprise risk teams are creating demand for agentic workflows that produce durable, human-readable evidence of their reasoning. No current SDD tool outside Specorator produces this chain. *(Galileo, ISACA)* +- **Onboarding friction:** Most-adopted tools (Cursor, Cline, GSD) succeed because users get a useful result in under 5 minutes. Specorator's current onboarding requires reading documentation before getting value — a confirmed adoption blocker. + +--- + +## Alternatives considered + +### Alternative A — Third-party orchestration framework (LangGraph / CrewAI) + +Adopt LangGraph or CrewAI as the orchestration engine. LangGraph provides directed-graph execution with native checkpoint-and-resume (PostgresSaver/SqliteSaver), structured HITL via `interrupt()` + `Command(resume=...)`, and parallel fan-out with barrier synchronisation. CrewAI provides declarative DAG-driven task execution with `Flows` for deterministic routing. 
+ +**Pros:** +- LangGraph: most mature HITL + durable checkpoint model available; explicit state schema with reducers; large production user base with documented failure modes. +- CrewAI: cleanest declarative DAG-driven execution; natural wave scheduling; manager LLM routing built-in. +- Both: battle-tested in production; not invented here. + +**Cons:** +- LangGraph requires a persistent checkpointer backend (PostgreSQL or Redis) for true durable HITL — operational complexity incompatible with a zero-dependency Claude plugin. +- Python-first; TypeScript support is less mature; Specorator is a Markdown-first, TypeScript-tooled project. +- External framework dependency contradicts the "methodology, not a product" positioning and the tool-agnostic Layer 0 value proposition. +- Adding any third-party framework introduces a version coupling risk in a fast-moving ecosystem. +- CrewAI HITL is less mature than LangGraph — no native durable pause-and-resume across process boundaries. + +**Verdict:** Rejected. The operational cost and positioning risk outweigh the checkpoint sophistication. Specorator's `workflow-state.md` on disk provides durable-enough state for synchronous HITL gates where the human responds in seconds to minutes, not days. + +--- + +### Alternative B — Anthropic native Orchestrator-Subagent pattern (recommended) + +Implement the Anthropic-published Orchestrator-Subagent pattern directly using Claude Code's native Agent tool dispatch, without adopting a third-party orchestration framework. State is managed in `workflow-state.md` (Zod-typed per ADR-0042). HITL gates use `AskUserQuestion` at three defined points. DAG wave scheduling is a topological sort (~100 lines of control flow) in the orchestrator skill. + +**Pros:** +- Zero additional dependencies — Claude Code's Agent tool is the only dispatch mechanism needed. +- Direct alignment with Anthropic's own published patterns. 
The orchestrator's 90.2% performance improvement in Anthropic's multi-agent research system traces to parallel reasoning across more aggregate context, not to framework magic. +- Each stage specialist already receives only its predecessor artifact — context compression is natural by design. +- `workflow-state.md` written to disk before each `AskUserQuestion` call provides durable-enough state persistence for synchronous HITL. +- No platform lock-in beyond Claude Code itself (which is the target environment by definition). +- Subagent context is clean per spawn — no context rot. + +**Cons:** +- No built-in durable pause-and-resume across process crashes (if the orchestrator process dies between HITL gates, in-flight subagent results from the current wave are lost and must be re-run). +- Stall detection, retry logic, and wave scheduling are hand-rolled rather than framework-provided. Estimated ~200–300 lines of orchestrator control flow. +- No peer-to-peer subagent communication (subagents report to orchestrator only) — complex multi-subagent negotiation patterns are not possible in v1. + +**Verdict:** Recommended. The architectural fit with Specorator's existing model is high; the engineering investment in wave scheduling and stall detection is bounded and implementable; the HITL durability constraint is acceptable for synchronous gates. + +--- + +### Alternative C — Claude Code Agent Teams (experimental) + +Use Claude Code's experimental agent teams feature (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`) to give subagents peer-to-peer communication, a shared task list, and independent context windows. The orchestrator is the lead; specialists are teammates. + +**Pros:** +- Peer-to-peer messaging enables richer coordination between parallel implementers. +- Shared task list allows teammates to mark their own tasks complete without returning to orchestrator. +- Independent context windows prevent any single agent's context from growing unbounded. 
+ +**Cons:** +- Experimental feature, disabled by default — shipping risk for a publicly distributed plugin. +- Known limitations: `skills` and `mcpServers` frontmatter fields are NOT applied when an agent definition runs as a teammate, silently breaking any agent that depends on pre-loaded skills. +- No session resumption for in-process teammates — a crash loses all teammate state. +- Task status can lag; teammates may not mark tasks complete reliably. +- One team at a time per lead — no nested teams. +- Performance: agent teams scale token costs linearly with teammate count. + +**Verdict:** Reserved for v2. Track `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` status; revisit when teams stabilise. The v1 architecture is designed to be extensible to teams without breaking changes. + +--- + +## Technical considerations + +### Claude Code plugin architecture + +**Plugin format:** A proper Claude Code plugin requires: +- `.claude-plugin/plugin.json` — the Claude Code plugin manifest (name is the only required field; all component paths are relative to the plugin root) +- `settings.json` at the plugin root — declares `agent: "orchestrator"` to make the orchestrator the main session agent when the plugin is enabled +- `agents/orchestrator.md` — the main agent definition (YAML frontmatter + system prompt) +- `agents/`, `skills/`, `commands/` — component directories mirrored from `.claude/` +- `.mcp.json` — MCP server declarations for the plugin + +**Security constraints on plugin agents (hard limits):** +- `hooks`, `mcpServers`, and `permissionMode` frontmatter fields are stripped from plugin agent definitions. Plugin-level hooks and MCP must be declared in the plugin manifest. +- Path traversal (`../`) is blocked after plugin caching — all resources must be within the plugin root. +- Plugin agents inherit the session permission mode; they cannot override it. 
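The frontmatter restriction above lends itself to a build-time check. The following is a minimal sketch of how such a check might look, not the actual `check-agents.ts` implementation: the names `frontmatterKeys` and `findProhibitedFields` are hypothetical, and it assumes flat frontmatter that a line-based scan can read (a real validator would more likely use a YAML parser).

```typescript
// Hypothetical sketch; function names and structure are assumptions,
// not taken from the real check-agents.ts.
const PROHIBITED_FIELDS: readonly string[] = ["hooks", "mcpServers", "permissionMode"];

interface Violation {
  file: string;
  field: string;
}

// Collect top-level keys from a leading `---` ... `---` frontmatter block.
// Indented (nested) keys are ignored because the pattern anchors at column 0.
function frontmatterKeys(markdown: string): string[] {
  const lines = markdown.split(/\r?\n/);
  if ((lines[0] ?? "").trim() !== "---") return [];
  const keys: string[] = [];
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.trim() === "---") break; // end of frontmatter
    const match = /^([A-Za-z_][\w-]*)\s*:/.exec(line);
    if (match && match[1]) keys.push(match[1]);
  }
  return keys;
}

// Report every prohibited field found, naming both the file and the field.
function findProhibitedFields(file: string, markdown: string): Violation[] {
  return frontmatterKeys(markdown)
    .filter((key) => PROHIBITED_FIELDS.includes(key))
    .map((field) => ({ file, field }));
}
```

A caller could exit with a non-zero code whenever the returned list is non-empty and print each `file` and `field` pair, which is the shape of error the requirements ask for.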
+ +**Subagent execution constraints (hard platform limits):** +- Subagents cannot spawn subagents — the Agent tool is removed from subagent contexts. The orchestrator must be the root session agent (enforced by `settings.json agent: orchestrator`). +- Subagents do not inherit the parent's conversation history — they receive only their spawn prompt. Context compression is enforced by the platform, not optional. +- `isolation: worktree` is the only supported isolation mode for plugin agents. Worktrees are auto-cleaned if the agent makes no changes. + +**Naming collision resolution:** Specorator uses two manifest systems that must coexist: +- `plugins/*/manifest.md` + `plugins/*/schema.json` (ADR-0036): Specorator's internal capability contract layer, used as input to the future MCP Server (issue #316). These remain unchanged — they are the MCP contract layer. +- `.claude-plugin/plugin.json`: the new Claude Code plugin entry point. This is a separate file at the repo root within the `claude-plugin/specorator/` bundle directory, generated by `build-claude-plugin.ts` from canonical sources. The two systems coexist without conflict. + +### Dispatch authority refactor (current gap) + +The current orchestrator agent has only `tools: [Read, Grep]`. To become a dispatch authority, it needs `tools: [Agent, Read, Edit, Write, AskUserQuestion]`: +- `Agent` — to spawn specialist subagents (researcher, architect, planner, dev, qa, reviewer) +- `Read/Write/Edit` — to manage `workflow-state.md` state transitions +- `AskUserQuestion` — to implement HITL gates + +The orchestrator must also own `workflow-state.md` transitions (currently updated by individual commands). This is a breaking change to the command dispatch model: commands become building blocks invoked by the orchestrator, not standalone entry points for state mutation. 
+
+### State management model
+
+`workflow-state.md` is the durable checkpoint:
+- Written to disk before every `AskUserQuestion` call (HITL gate) — ensures state is recoverable if the session crashes during the human decision window
+- Typed via Zod schema (ADR-0042 migration path) — enables the orchestrator to validate preconditions before dispatching
+- Owned by the orchestrator — only the orchestrator writes stage transitions; specialist subagents write their artifact files but do not modify workflow-state.md
+
+Living spec principle: specialist subagents receive only the artifact relevant to their task (e.g., planner receives `requirements.md` and `design.md`, not full conversation history). This is the primary mechanism for preventing context rot across a multi-stage session.
+
+### DAG wave execution
+
+The task DAG from `tasks.md` is executed in topological waves:
+1. Parse `tasks.md` — extract nodes (tasks) and edges (depends_on)
+2. Topological sort (Kahn's BFS algorithm) — produces ordered waves
+3. Each wave: orchestrator dispatches one subagent per task in the wave (parallel Agent tool calls in a single turn)
+4. Collect results — orchestrator validates each result against the task's expected output schema
+5. Stall detection — counter per wave; if N consecutive steps produce no progress, the orchestrator escalates to HITL
+6. Advance to next wave only after all tasks in the current wave pass validation
+
+Parallel write conflict prevention: `isolation: worktree` per implementer subagent. Each subagent works in its own isolated worktree. The orchestrator merges worktrees after each wave via the reviewer subagent, not automatically.
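The wave-planning step (steps 1 and 2 above) can be sketched as a level-by-level variant of Kahn's algorithm. This is an illustrative sketch under stated assumptions, not the orchestrator's actual scheduler: the `TaskNode` shape, the `dependsOn` field name (standing in for `depends_on`), and the `planWaves` function are all hypothetical, and parsing `tasks.md` is assumed to have happened already.

```typescript
// Hypothetical task shape; the real tasks.md schema may differ.
interface TaskNode {
  id: string;
  dependsOn: string[];
}

// Group tasks into waves: every task in a wave has all of its
// dependencies satisfied by earlier waves, so a wave can run in parallel.
function planWaves(tasks: TaskNode[]): string[][] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    indegree.set(task.id, 0);
    dependents.set(task.id, []);
  }
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      const list = dependents.get(dep);
      if (list === undefined) {
        throw new Error(`Task ${task.id} depends on unknown task ${dep}`);
      }
      list.push(task.id);
      indegree.set(task.id, (indegree.get(task.id) ?? 0) + 1);
    }
  }

  const waves: string[][] = [];
  let ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  let scheduled = 0;
  while (ready.length > 0) {
    waves.push(ready);
    scheduled += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      for (const child of dependents.get(id) ?? []) {
        const remaining = (indegree.get(child) ?? 0) - 1;
        indegree.set(child, remaining);
        if (remaining === 0) next.push(child); // all dependencies done
      }
    }
    ready = next;
  }
  // Any task never scheduled must sit on a dependency cycle.
  if (scheduled !== tasks.length) {
    throw new Error("Cycle detected in task dependencies");
  }
  return waves;
}
```

Returning waves rather than a flat order keeps the parallelism explicit: the caller dispatches one subagent per ID in a wave and advances only when the whole wave has passed validation.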
+ +### Plugin distribution + +ADR-0043 distribution model is compatible with the orchestrator-first architecture: +- `.claude-plugin/plugin.json` is added to the build output of `build-claude-plugin.ts` +- `settings.json` (with `agent: "orchestrator"`) is added to `claude-plugin/specorator/` +- CI rebuilds `dist/claude-plugin` orphan branch on every push to `main` — no change needed +- Marketplace entry (`git-subdir` source) continues to point at `dist/claude-plugin` — no change needed + +### Model selection for subagents + +The orchestrator reads `SPECORATOR_HEAVY_MODEL` env var (if set) and applies it to heavy-tier subagents (architect, dev, reviewer). Light-tier subagents (researcher for simple scopes, planner for small task lists) use the session default model. This replaces the current per-command model selection with orchestrator-owned routing — consistent application across all dispatches. + +--- + +## Risks + +| ID | Risk | Severity | Likelihood | Mitigation | +|---|---|---|---|---| +| RISK-ORCH-001 | Error compounding in multi-agent chains: documented 17.2x amplification in uncoordinated systems, ~4.4x even with centralized coordination | High | High | Typed output schemas + validation gate between every stage; reviewer subagent as blocking check before results surface to user | +| RISK-ORCH-002 | Parallel write conflicts if implementer subagents modify overlapping files | High | Medium | `isolation: worktree` per implementer; merge mediated by reviewer subagent after each wave, not automatic | +| RISK-ORCH-003 | HITL gate bottleneck if placed too frequently | Medium | Medium | Three defined gates only (post-scope, post-design-approval, post-review); no additional gates in v1 | +| RISK-ORCH-004 | Orchestrator context window exhaustion on long-running sessions | High | Medium | Living spec pattern: orchestrator reads artifact files, not conversation history; subagents with clean contexts; session summary at completion | +| RISK-ORCH-005 | Infinite loops / 
stalled subagents | Medium | Medium | Max iteration budget per subagent (3 retries); stall detection counter in orchestrator; escalation to HITL on unrecoverable state | +| RISK-ORCH-006 | Decomposition errors: orchestrator marks dependent tasks as parallel | High | Medium | Human review of DAG at design-approval gate (HITL point 2) before implementers are spawned; planner subagent produces explicit `depends_on` edges | +| RISK-ORCH-007 | Agent performance degrades over consecutive runs (58% degradation from 1 to 8 consecutive runs) | Medium | Medium | Spawn fresh subagents per task; orchestrator does not reuse persistent agents across the full workflow | +| RISK-ORCH-008 | Plugin manifest naming collision breaks build pipeline | Medium | Low | `plugins/*/manifest.md` (ADR-0036 layer) and `.claude-plugin/plugin.json` are separate files with separate concerns; `build-claude-plugin.ts` generates the latter, doesn't touch the former | +| RISK-ORCH-009 | `settings.json agent` priority resolution if project `.claude/settings.json` declares a different agent | Low | Low | Claude Code project settings override plugin settings for same keys; document this as expected behavior; orchestrator is the plugin default, not forced | +| RISK-ORCH-010 | Kiro (AWS) becomes default EARS-aware spec tool before Specorator achieves awareness | High | Medium | Publish verify gate + traceability chain + multi-track breadth as headline capabilities; Specorator's tool-agnostic Layer 0 cannot be matched by an AWS-specific tool | +| RISK-ORCH-011 | BMAD's community momentum (46,700+ stars) dominates Claude plugin searches | Medium | High | Target specific persona (senior solo dev, small agency) with concrete before/after examples; compete on depth, not star counts | +| RISK-ORCH-012 | Orchestrator becomes monolithic as all dispatch logic concentrates in one skill | High | Medium | Decompose orchestrator into phase-specific sub-skills (scope-phase, research-phase, design-phase, plan-phase, 
implement-phase, review-phase); orchestrator skill is the conductor only | + +--- + +## Recommendation + +**Adopt Alternative B: Anthropic native Orchestrator-Subagent pattern with explicit DAG wave execution and three synchronous HITL gates.** + +Implement the goal-loop as a new conductor skill (`goal-loop` or `orchestrate-issue`) that gives the existing `orchestrator` agent dispatch authority. Simultaneously refactor the plugin packaging to add `.claude-plugin/plugin.json` and `settings.json { "agent": "orchestrator" }`. The 11 existing stage commands become building blocks invoked by the orchestrator; they remain available as standalone slash commands for users who prefer direct control. + +**Three HITL interrupt points (AskUserQuestion):** +1. Post-scope: confirm problem framing, EARS acceptance criteria, and researcher scope before spawning parallel Researchers. Cost of getting this wrong is high (wasted parallel research waves). +2. Post-design: show the synthesised design.md and proposed task DAG; human approves, edits, or rejects before Implementers are spawned. Last affordable correction point before code is written. +3. Post-review: human accepts the final output or specifies a targeted revision. Revision re-enters the loop at the Implement wave with the reviewer's findings as additional context. 
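Each of these gates relies on the same ordering rule stated in the state management model: persist state first, then ask. The sketch below illustrates only that ordering. It is an assumption-laden illustration, not how the orchestrator is implemented (the orchestrator is an agent definition, and `AskUserQuestion` is a tool call rather than a TypeScript function); `runGate`, `GateState`, and `GateDeps` are hypothetical names, and the callbacks are synchronous purely to keep the example small.

```typescript
// Gate names follow the three HITL points plus the stall gate.
type Gate = "post-scope" | "post-design" | "post-review" | "stall";

// Hypothetical checkpoint shape: current phase, artifacts so far,
// and the decision the human is being asked to make.
interface GateState {
  phase: string;
  artifacts: string[];
  pendingDecision: Gate;
}

// Injected stand-ins for "write workflow-state.md" and "AskUserQuestion".
interface GateDeps {
  writeState: (state: GateState) => void;
  ask: (question: string) => string;
}

// Persist the checkpoint BEFORE prompting, so an interruption during the
// human decision window leaves a consistent state on disk.
function runGate(
  gate: Gate,
  phase: string,
  artifacts: string[],
  question: string,
  deps: GateDeps,
): string {
  deps.writeState({ phase, artifacts, pendingDecision: gate });
  return deps.ask(question);
}
```

Passing the two operations in as dependencies makes the ordering easy to verify in a test by recording the call sequence.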
+ +**Phase approach for implementation:** +- Phase 1: Plugin packaging (`.claude-plugin/plugin.json` + `settings.json`); orchestrator tool expansion (`Agent, Read, Write, Edit, AskUserQuestion`); workflow-state.md Zod schema (ADR-0042 prerequisite) +- Phase 2: Goal-loop conductor skill (`scope-phase → research-wave → design-synthesis → plan-phase → implement-waves → review-phase → summary`) +- Phase 3: Issue-tackle integration (orchestrator detects GitHub issue reference, uses issue body as scope context, delegates to goal-loop) +- Phase 4: Plugin registry runtime loading (orchestrator reads `plugins/*/schema.json` to discover capabilities; enables extensible dispatch without code changes) + +**What still needs validating:** +- Exact behavior when a plugin's `settings.json` specifies `agent: "orchestrator"` and the project also has a `.claude/settings.json` — priority resolution for the `agent` key needs testing +- Wave scheduler performance with 5 parallel subagents on a large monorepo — worktree creation time may become a bottleneck +- Stall detection threshold tuning — the right max-iteration budget per subagent needs empirical calibration during beta testing + +--- + +## Sources + +- [Claude Code — Create plugins](https://code.claude.com/docs/en/plugins) +- [Claude Code — Plugins reference](https://code.claude.com/docs/en/plugins-reference) +- [Claude Code — Create and distribute a plugin marketplace](https://code.claude.com/docs/en/plugin-marketplaces) +- [Claude Code — Create custom subagents](https://code.claude.com/docs/en/sub-agents) +- [Claude Code — Orchestrate teams of Claude Code sessions](https://code.claude.com/docs/en/agent-teams) +- [Claude Code — Plugins in the SDK](https://code.claude.com/docs/en/agent-sdk/plugins) +- [Anthropic Engineering — How we built our multi-agent research system](https://www.anthropic.com/engineering/multi-agent-research-system) +- [Anthropic — Multi-agent coordination patterns: Five 
approaches](https://claude.com/blog/multi-agent-coordination-patterns)
+- [Anthropic — Building effective AI agents](https://resources.anthropic.com/building-effective-ai-agents)
+- [LangGraph — Interrupts (HITL)](https://docs.langchain.com/oss/python/langgraph/interrupts)
+- [LangGraph — Making it easier to build human-in-the-loop agents](https://www.langchain.com/blog/making-it-easier-to-build-human-in-the-loop-agents-with-interrupt)
+- [Microsoft Research — Magentic-One: A Generalist Multi-Agent System](https://www.microsoft.com/en-us/research/articles/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/)
+- [Towards Data Science — Why Your Multi-Agent System is Failing: The 17x Error Trap](https://towardsdatascience.com/why-your-multi-agent-system-is-failing-escaping-the-17x-error-trap-of-the-bag-of-agents/)
+- [Anthropic — Building agents with the Claude Agent SDK](https://claude.com/blog/building-agents-with-the-claude-agent-sdk)
+- [GitHub Blog — From idea to PR: a guide to GitHub Copilot's agentic workflows](https://github.blog/ai-and-ml/github-copilot/from-idea-to-pr-a-guide-to-github-copilots-agentic-workflows/)
+- [GitHub — Spec Kit](https://github.com/github/spec-kit)
+- [BMAD-METHOD — GitHub](https://github.com/bmad-code-org/BMAD-METHOD)
+- [Kiro — Introducing Kiro](https://kiro.dev/blog/introducing-kiro/)
+- [Stack Overflow — 2025 Developer Survey](https://stackoverflow.blog/2025/12/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/)
+- [METR RCT — AI tools slow experienced developers](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-developer-study/)
+- [O'Reilly — AI is writing code faster than we can verify it](https://www.oreilly.com/radar/ai-is-writing-our-code-faster-than-we-can-verify-it/)
+- [ISACA — The growing challenge of auditing agentic AI](https://www.isaca.org/resources/news-and-trends/industry-news/2025/the-growing-challenge-of-auditing-agentic-ai)
+- [Subagents
cannot spawn subagents — GitHub Issue](https://github.com/anthropics/claude-code/issues/19077) +- Specorator codebase audit (internal, 2026-05-13) — 36 agents, 38 skills, 85 commands, 12 plugin groups; current orchestrator advisory-only; ADR-0036 through ADR-0045 reviewed + +--- + +## Quality gate + +- [x] Each research question is answered or marked open. +- [x] Sources cited. +- [x] ≥ 2 alternatives explored. +- [x] User needs supported by evidence (or assumptions explicit). +- [x] Technical considerations noted. +- [x] Risks listed with severity. +- [x] Recommendation made. diff --git a/specs/goal-oriented-orchestrator-plugin/spec.md b/specs/goal-oriented-orchestrator-plugin/spec.md new file mode 100644 index 000000000..0c7c51604 --- /dev/null +++ b/specs/goal-oriented-orchestrator-plugin/spec.md @@ -0,0 +1,956 @@ +--- +id: SPECDOC-ORCH-001 +title: Goal-oriented orchestrator plugin — Specification +stage: specification +feature: goal-oriented-orchestrator-plugin +status: accepted +owner: architect +inputs: + - PRD-ORCH-001 + - DESIGN-ORCH-001 +adrs: + - ADR-0046 + - ADR-0047 + - ADR-0048 +created: 2026-05-13 +updated: 2026-05-13 +--- + +# Specification — Goal-oriented orchestrator plugin + +Implementation-ready contracts. The spec is precise enough that two independent teams could implement it and produce indistinguishable behaviour. + +--- + +## Scope + +This specification covers the behavioural contracts for the goal-oriented orchestrator plugin feature. It does not cover implementation details that the spec does not need to constrain (e.g., exact TypeScript module structure, internal variable naming) and does not restate design rationale already captured in `design.md` or the three ADRs. 
+ +| Item | SPEC-ID | REQ-IDs | +|---|---|---| +| Orchestrator agent tool-list expansion | SPEC-ORCH-001 | REQ-ORCH-001, REQ-ORCH-002, REQ-ORCH-003, REQ-ORCH-004, REQ-ORCH-005, REQ-ORCH-021 | +| goal-loop conductor skill entry point | SPEC-ORCH-002 | REQ-ORCH-006, REQ-ORCH-007, REQ-ORCH-023 | +| Scope phase and Gate 1 contract | SPEC-ORCH-003 | REQ-ORCH-008, REQ-ORCH-022 | +| Research wave | SPEC-ORCH-004 | REQ-ORCH-009, REQ-ORCH-010 | +| Design synthesis phase and Gate 2 | SPEC-ORCH-005 | REQ-ORCH-011, REQ-ORCH-022 | +| Plan phase | SPEC-ORCH-006 | REQ-ORCH-012 | +| Implement wave executor | SPEC-ORCH-007 | REQ-ORCH-013, REQ-ORCH-004 | +| Stall detector and stall gate | SPEC-ORCH-008 | REQ-ORCH-014 | +| Review phase and Gate 3 | SPEC-ORCH-009 | REQ-ORCH-015, REQ-ORCH-022 | +| Session summary writer | SPEC-ORCH-010 | REQ-ORCH-016 | +| workflow-state.md goal_loop schema extension | SPEC-ORCH-011 | REQ-ORCH-002, REQ-ORCH-022 | +| scope.md artifact schema | SPEC-ORCH-012 | REQ-ORCH-008 | +| session-summary.md artifact schema | SPEC-ORCH-013 | REQ-ORCH-016 | +| .claude-plugin/plugin.json contract | SPEC-ORCH-014 | REQ-ORCH-017, REQ-ORCH-019 | +| settings.json agent declaration | SPEC-ORCH-015 | REQ-ORCH-018, REQ-ORCH-019 | +| build-claude-plugin.ts generation changes | SPEC-ORCH-016 | REQ-ORCH-019 | +| check-agents.ts frontmatter validation rule | SPEC-ORCH-017 | REQ-ORCH-020 | + +**Out of scope for this spec:** +- Implementation code (NG6 in requirements.md) +- Agent teams mode or third-party orchestration frameworks (NG1, NG2) +- Persistent memory across independent Claude Code sessions (NG3) +- Billing or quota management (NG4) +- Multi-repository workspaces (NG5) + +--- + +## 1 Orchestrator agent tool-list expansion (SPEC-ORCH-001) + +**Governs:** `.claude/agents/orchestrator.md` frontmatter `tools:` list. 
+ +### 1.1 Required tools + +The orchestrator agent frontmatter MUST declare exactly the following tools and no others: + +``` +tools: + - Agent + - AskUserQuestion + - Read + - Write + - Edit +``` + +> **Rationale:** `Agent` is required for explicit subagent dispatch per ADR-0046. `AskUserQuestion` is required for all HITL gates (REQ-ORCH-008 and §2.4). ADR-0046 explicitly limits the orchestrator to these five tools and states it should not gain Bash, WebSearch, WebFetch, or GitHub tools — those capabilities are delegated to specialist subagents. See ADR-0046 §4. + +### 1.2 Prohibited tools + +The orchestrator agent MUST NOT declare: +- `Task` (use `Agent` for subagent dispatch per ADR-0046) +- `Bash` (delegate to specialist subagents) +- `WebSearch` or `WebFetch` (delegate to research subagents) +- `TodoWrite` (not an orchestrator concern) +- `mcp__github__*` (delegate to specialist subagents) +- `NotebookEdit` or any Jupyter-specific tool +- Any tool not in the list in §1.1 + +### 1.3 Verification + +`check-agents.ts` MUST flag any deviation from the list in §1.1 as a CI failure (see SPEC-ORCH-017). + +--- + +## 2 goal-loop conductor skill entry point (SPEC-ORCH-002) + +**Governs:** `.claude/skills/goal-loop/SKILL.md` + +### 2.1 Trigger conditions + +The skill MUST activate on any of the following natural-language triggers (case-insensitive, substring match): + +| Trigger phrase | Notes | +|---|---| +| `"drive this end-to-end"` | Primary phrase | +| `"let's start a feature"` | Alternate entry | +| `"work on [goal]"` | Any goal-shaped prompt | +| `"implement [feature]"` | Feature-start intent | +| `"build [feature]"` | Build intent | +| `"/goal-loop"` | Explicit slash-command | + +> The trigger list is illustrative, not exhaustive. The skill MUST apply reasonable intent-matching. + +### 2.2 Slash-command passthrough + +When the orchestrator detects any registered slash command (i.e., a command listed in the plugin manifest), it MUST: + +1. 
Route the request to the appropriate specialist subagent without entering the goal-loop. +2. Not inject orchestration scaffolding (scope.md, session-summary.md, gates). +3. Return the subagent's output unmodified. + +**Contract:** Slash-command passthrough is transparent — the user experiences identical behaviour to invoking the subagent directly. + +### 2.3 Session initialisation + +On goal-loop activation, the orchestrator MUST: + +1. Create or update `specs//workflow-state.md` with `goal_loop` block (see SPEC-ORCH-011). +2. Write `specs//scope.md` on Scope phase completion (see SPEC-ORCH-012). +3. Write `specs//session-summary.md` on session end (see SPEC-ORCH-013). + +### 2.4 AskUserQuestion gate + +If the user's initial message does not already contain an unambiguous goal statement, the orchestrator MUST call `AskUserQuestion` with: + +``` +"What is the goal for this session? (Describe the feature or change you want to achieve.)" +``` + +--- + +## 3 Scope phase and Gate 1 contract (SPEC-ORCH-003) + +### 3.1 Scope phase inputs + +The scope phase accepts: +- User goal statement (from §2.4) +- Existing `specs//` artifacts (if any) +- `inputs/` folder contents + +### 3.2 Scope phase outputs + +The scope phase MUST produce `scope.md` (schema: SPEC-ORCH-012) before proceeding to Gate 1. + +### 3.3 Gate 1 — scope approval + +Gate 1 is a blocking human-approval gate. The orchestrator MUST: + +1. Present the completed `scope.md` to the user. +2. Ask: `"Does this scope capture your goal correctly? (yes / edit / abort)"` +3. On `"yes"`: proceed to Research wave. +4. On `"edit"`: accept edits, rewrite `scope.md`, re-present, repeat. +5. On `"abort"`: terminate the session, write `status: aborted` to `workflow-state.md`. + +**Contract:** The orchestrator MUST NOT proceed past Gate 1 without explicit user approval. + +--- + +## 4 Research wave (SPEC-ORCH-004) + +### 4.1 Wave composition + +The research wave MUST: + +1. 
Spawn one or more specialist subagents via `Agent` tool in parallel (max concurrency: 5). +2. Each subagent receives: goal statement, scope.md, relevant prior artifacts. +3. Subagents operate within their defined tool lists (see AGENTS.md agent-class table). + +### 4.2 Research outputs + +Each research subagent MUST return a structured finding block: + +``` +## Finding: +**Source:** +**Relevance:** <1-sentence relevance to goal> +**Summary:** <2–5 sentences> +``` + +### 4.3 Research synthesis + +After all research subagents complete, the orchestrator MUST: + +1. Consolidate findings into `specs//research.md`. +2. Identify gaps that require design decisions. +3. Proceed to Design synthesis phase. + +--- + +## 5 Design synthesis phase and Gate 2 (SPEC-ORCH-005) + +### 5.1 Design synthesis inputs + +- `scope.md` with research-summary +- Prior `design.md` (if exists) +- ADR references from `scope.md` + +### 5.2 Design synthesis outputs + +The design synthesis phase MUST produce: +- `specs//design.md` containing key design decisions and their rationale. This is the canonical design artifact; Gate 2 edits MUST update `design.md` directly. +- `scope.md` updated with a `design_summary` cross-reference section pointing to `design.md`. +- If new irreversible decisions are made: ADR stubs in `docs/adr/`. + +### 5.3 Gate 2 — design approval + +Gate 2 is a blocking human-approval gate. The orchestrator MUST: + +1. Present the `design_decisions` section from `design.md` to the user. +2. Ask: `"Do these design decisions look right? (yes / edit / abort)"` +3. On `"yes"`: proceed to Plan phase. +4. On `"edit"`: accept edits, update `design.md`, re-present, repeat. +5. On `"abort"`: terminate the session, write `status: aborted` to `workflow-state.md`. + +**Contract:** The orchestrator MUST NOT proceed past Gate 2 without explicit user approval. 
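
Gates 1 and 2 describe the same yes / edit / abort loop, so the control flow can be sketched once. The following is a minimal sketch under stated assumptions: `runApprovalGate`, `GateAnswer`, and `GateResult` are hypothetical names that do not appear in this spec, `ask` stands in for presenting the artifact and calling `AskUserQuestion`, and `revise` stands in for rewriting `scope.md` or `design.md`.

```typescript
// Hypothetical sketch: none of these names are defined by the spec.
type GateAnswer = 'yes' | 'edit' | 'abort';

interface GateResult {
  outcome: 'approved' | 'aborted';
  revisions: number; // how many edit rounds happened before the final answer
}

// `ask` stands in for presenting the artifact and calling AskUserQuestion;
// `revise` stands in for applying the user's edits to scope.md or design.md.
function runApprovalGate(ask: () => GateAnswer, revise: () => void): GateResult {
  let revisions = 0;
  for (;;) {
    const answer = ask();
    if (answer === 'yes') {
      return { outcome: 'approved', revisions };
    }
    if (answer === 'abort') {
      // The caller is expected to write `status: aborted` to workflow-state.md.
      return { outcome: 'aborted', revisions };
    }
    // 'edit': update the artifact, then loop to re-present it.
    revise();
    revisions += 1;
  }
}
```

The loop has no exit other than `yes` or `abort`, which is one way to honour the contract that the orchestrator never proceeds past a gate without an explicit answer.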
+ +--- + +## 6 Plan phase (SPEC-ORCH-006) + +### 6.1 Plan outputs + +The plan phase MUST produce `specs//tasks.md` with the following task entry format: + +```yaml +- id: T-ORCH-NNN + description: + depends_on: [] # list of T-ORCH-NNN IDs + agent: + estimated_complexity: low | medium | high +``` + +### 6.2 Plan constraints + +- Every task MUST have a unique `T--NNN` ID. +- Dependency graph MUST be a DAG (no cycles). +- Complexity estimates are informational only (not blocking). + +--- + +## 7 Implement wave executor (SPEC-ORCH-007) + +### 7.1 Wave execution order + +The implement wave executor MUST: + +1. Topologically sort tasks from `tasks.md`. +2. Execute independent tasks in parallel (max concurrency: 3 simultaneous `Agent` calls). +3. Execute dependent tasks only after all dependencies complete with status `done`. + +### 7.2 Task status tracking + +The orchestrator MUST update `workflow-state.md` after each task completes: + +```yaml +goal_loop: + tasks: + T-ORCH-NNN: + status: pending | running | done | failed | skipped + started_at: + completed_at: + agent: +``` + +### 7.3 Task failure handling + +On task failure: +1. Mark task `status: failed` in `workflow-state.md`. +2. Pause execution of all dependent tasks. +3. Present failure summary to user with options: `retry | skip | abort`. +4. On `retry`: re-execute failed task (max 2 retries per task). +5. On `skip`: mark task `status: skipped`, continue with non-dependent tasks. +6. On `abort`: terminate session, write `status: aborted`. + +### 7.4 Model selection for heavy-tier subagents + +WHEN `SPECORATOR_HEAVY_MODEL` is set and non-empty, the orchestrator MUST pass that model identifier to the `Agent` tool when dispatching architect, dev, and reviewer subagents. + +WHEN `SPECORATOR_HEAVY_MODEL` is absent or empty, the orchestrator uses the session default model for all subagents.
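
The two WHEN clauses in §7.4 reduce to a small pure function. This is only a sketch: `resolveSubagentModel` is a hypothetical name, and it assumes the caller has already read `SPECORATOR_HEAVY_MODEL` and passes its value in as `heavyModel`. Returning `undefined` is used here to mean "fall back to the session default model".

```typescript
// Hypothetical helper: the function name and signature are illustrative only.
// `heavyModel` is the value of SPECORATOR_HEAVY_MODEL as read by the caller.
// Returns the model to pass to the Agent tool, or undefined for the session default.
function resolveSubagentModel(agent: string, heavyModel: string | undefined): string | undefined {
  if (heavyModel === undefined || heavyModel === '') {
    return undefined;
  }
  switch (agent) {
    // Heavy-tier subagents named in §7.4.
    case 'architect':
    case 'dev':
    case 'reviewer':
      return heavyModel;
    default:
      return undefined;
  }
}
```

Keeping the environment lookup outside the function makes the rule testable without touching process state.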
+ +**Satisfies:** REQ-ORCH-004 + +--- + +## 8 Stall detector and stall gate (SPEC-ORCH-008) + +### 8.1 Stall detection criteria + +A task is considered stalled when ANY of: + +| Condition | Threshold | +|---|---| +| Task has been `running` with no output | > 5 minutes | +| Task has produced > 3 consecutive identical outputs | — | +| Task has called the same tool > 20 times | — | + +### 8.2 Stall gate behaviour + +On stall detection, the orchestrator MUST: + +1. Interrupt the stalled task. +2. Capture the last 500 tokens of task output. +3. Present to user: `"Task T-ORCH-NNN appears stalled. Last output: [excerpt]. Options: retry | skip | abort"` +4. On `retry`: re-execute with a modified prompt prepending: `"Previous attempt stalled. Focus on: ."` +5. On `skip` or `abort`: follow §7.3 handling. + +--- + +## 9 Review phase and Gate 3 (SPEC-ORCH-009) + +### 9.1 Review phase inputs + +- All `done` task outputs from implement wave +- Original `scope.md` +- Acceptance criteria from `scope.md` + +### 9.2 Review phase outputs + +The review phase MUST produce a `review_summary` section in `scope.md`: + +```yaml +review_summary: + passed: true | false + findings: + - id: R-ORCH-NNN + severity: critical | major | minor + description: + task_ref: T-ORCH-NNN +``` + +### 9.3 Gate 3 — review approval + +Gate 3 is a blocking human-approval gate. The orchestrator MUST: + +1. Present `review_summary` to user. +2. Ask: `"Review complete. Proceed to session summary? (yes / fix / abort)"` +3. On `"yes"`: proceed to session summary. +4. On `"fix"`: re-enter implement wave for critical/major findings only. +5. On `"abort"`: terminate, write `status: aborted`. + +--- + +## 10 Session summary writer (SPEC-ORCH-010) + +### 10.1 Session summary content + +The session summary writer MUST produce `specs//session-summary.md` with: + +```yaml +--- +id: SUMMARY-ORCH-NNN +feature: +session_date: +goal: +status: completed | aborted | partial +--- +``` + +Followed by sections: + +1. 
 **Goal achieved** — one sentence. +2. **Tasks completed** — bulleted list of T-IDs with one-line descriptions. +3. **Artifacts produced** — list of file paths written this session. +4. **Open items** — tasks marked `skipped` or `failed`. +5. **Next steps** — recommended follow-on actions (max 5 bullets). + +### 10.2 Session summary constraints + +- Session summary MUST be written before the session terminates (success or abort). +- On `abort`: status = `aborted`; sections 2–5 reflect work done before abort. + +--- + +## 11 workflow-state.md goal_loop schema extension (SPEC-ORCH-011) + +### 11.1 Schema + +The `goal_loop` block MUST conform to: + +```yaml +goal_loop: + status: active | completed | aborted + goal: + session_id: + current_phase: scope | research | design | plan | implement | review | complete | aborted + gates: + gate_1: pending | approved | rejected + gate_2: pending | approved | rejected + gate_3: pending | approved | rejected + hitl_state: + pending_question: + last_response: + researcher_count: + wave_schedule: + - wave_id: + phase: research | implement + task_ids: [, ...] + started_at: + completed_at: + stall_counters: + : + stall_count: + last_stall_at: + artifacts_produced: + - + tasks: + : + status: pending | running | done | failed | skipped + started_at: + completed_at: + agent: +``` + +> **Rationale:** `hitl_state`, `researcher_count`, `wave_schedule`, `stall_counters`, and `artifacts_produced` are required by ADR-0047 to support session resume (REQ-ORCH-022) and stall recovery (REQ-ORCH-014). Without these fields the orchestrator cannot reliably reconstruct execution context across restarts. + +### 11.2 Write rules + +- The orchestrator MUST initialise the `goal_loop` block at session start. +- The orchestrator MUST update `current_phase` on every phase transition. +- The orchestrator MUST update gate status immediately after user approval/rejection. +- The orchestrator MUST update `hitl_state` before and after every `AskUserQuestion` call.
+- The orchestrator MUST increment `researcher_count` each time a research subagent is spawned. +- The orchestrator MUST append to `wave_schedule` when each wave starts and update `completed_at` when it ends. +- The orchestrator MUST increment `stall_counters..stall_count` on each stall detection event. +- The orchestrator MUST append each written file path to `artifacts_produced`. +- The orchestrator MUST NOT read `goal_loop` from a previous session without explicit user instruction to resume. + +--- + +## 12 scope.md artifact schema (SPEC-ORCH-012) + +### 12.1 Frontmatter + +```yaml +--- +id: SCOPE--NNN +feature: +goal: +created: +updated: +gate_1_approved: false | true +gate_2_approved: false | true +--- +``` + +### 12.2 Required sections + +| Section heading | Required? | Gate | +|---|---|---| +| `## Goal` | Always | — | +| `## Context` | Always | — | +| `## Acceptance criteria` | Always | Gate 1 | +| `## Out of scope` | Always | Gate 1 | +| `## Research summary` | After research wave | Gate 2 | +| `## Design decisions` | After design synthesis | Gate 2 | +| `## Plan` | After plan phase | — | +| `## Review summary` | After review phase | Gate 3 | + +### 12.3 Acceptance criteria format + +Each acceptance criterion MUST use EARS notation and have a unique `AC-NNN` ID: + +``` +AC-001: WHEN the user invokes /goal-loop, the system SHALL present a scope document within 30 seconds. +``` + +--- + +## 13 session-summary.md artifact schema (SPEC-ORCH-013) + +See §10.1 for the full schema. Additional constraints: + +- File MUST be located at `specs//session-summary.md`. +- If multiple sessions occur for the same feature, summaries are appended with an `---` separator; the frontmatter `id` increments (SUMMARY-ORCH-001, SUMMARY-ORCH-002, …). +- The file MUST be committed to the working branch before session end. 
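
The incrementing `id` rule for appended summaries can be illustrated with a small helper. This is a sketch under assumptions: `nextSummaryId` is a hypothetical name, and it assumes every prior summary ID appears in the existing file text as `SUMMARY-ORCH-NNN` with at least three digits.

```typescript
// Hypothetical helper: scans the existing session-summary.md text for the
// highest SUMMARY-ORCH-NNN value and returns the next ID in the sequence.
function nextSummaryId(existing: string): string {
  const pattern = /SUMMARY-ORCH-(\d{3,})/g;
  let highest = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(existing)) !== null) {
    highest = Math.max(highest, Number(match[1]));
  }
  const next = String(highest + 1);
  // Left-pad to three digits without relying on newer string helpers.
  return 'SUMMARY-ORCH-' + '000'.slice(next.length) + next;
}
```

An empty or missing file yields `SUMMARY-ORCH-001`, which matches the first ID in the sequence given above.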
+ +--- + +## 14 .claude-plugin/plugin.json contract (SPEC-ORCH-014) + +**Governs:** `.claude-plugin/plugin.json` (source file) generated by `build-claude-plugin.ts`. + +### 14.1 Required top-level fields + +```json +{ + "schema_version": "1", + "name": "specorator", + "version": "", + "description": "", + "commands": [ ... ], + "agents": [ ... ] +} +``` + +All six fields are required. Missing or empty fields MUST cause `build-claude-plugin.ts --check` to exit non-zero. + +### 14.2 commands array entry schema + +Each entry in `commands` MUST conform to: + +```json +{ + "name": "", + "description": "", + "source_path": "" +} +``` + +### 14.3 agents array entry schema + +Each entry in `agents` MUST conform to: + +```json +{ + "name": "", + "description": "", + "source_path": "" +} +``` + +### 14.4 Completeness requirement + +The `commands` array MUST contain one entry for every `.md` file under `.claude/commands/` (recursively), except `README.md` files. The `agents` array MUST contain one entry for every `.md` file under `.claude/agents/`, except `README.md` files. + +**Contract:** No command or agent may be omitted from the manifest. + +--- + +## 15 settings.json agent declaration (SPEC-ORCH-015) + +**Governs:** Plugin-bundle `settings.json` (the `settings.json` packaged inside the plugin bundle, not `.claude/settings.json`). + +### 15.1 Orchestrator agent entry + +The plugin-bundle `settings.json` MUST declare the orchestrator at the top level: + +```json +{ + "agent": "orchestrator" +} +``` + +This top-level `"agent"` key identifies the primary agent for the plugin bundle, satisfying REQ-ORCH-018. It MUST NOT be nested under an `agents` array. + +### 15.2 No MCP server changes + +The orchestrator agent does NOT require a new MCP server entry. Existing `mcp__github__*` tools are already available via the configured GitHub MCP server. 
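
The top-level `agent` declaration from §15.1 could be checked with a few lines of validation. This is a minimal sketch, not the behaviour of `build-claude-plugin.ts`: `validatePluginSettings` and its error strings are hypothetical, and the input is assumed to be the already-parsed contents of the plugin-bundle `settings.json`.

```typescript
// Hypothetical validator: the function name and error strings are illustrative.
// Returns a list of problems; an empty list means the declaration is valid.
function validatePluginSettings(settings: unknown): string[] {
  if (typeof settings !== 'object' || settings === null || Array.isArray(settings)) {
    return ['settings.json must contain a JSON object'];
  }
  // Only the top-level key counts; an entry nested under `agents` does not satisfy §15.1.
  const agent = (settings as Record<string, unknown>)['agent'];
  if (agent !== 'orchestrator') {
    return ['top-level "agent" must be "orchestrator"'];
  }
  return [];
}
```

A file that only lists the orchestrator inside an `agents` array fails this check, because the top-level `agent` key is absent.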
+ +--- + +## 16 build-claude-plugin.ts generation changes (SPEC-ORCH-016) + +**Governs:** `scripts/build-claude-plugin.ts` + +### 16.1 Generation steps + +The script MUST perform these steps in order: + +1. Walk `.claude/commands/` recursively; collect all `.md` files → `commands` entries. +2. Walk `.claude/agents/` recursively; collect all `.md` files → `agents` entries. +3. Both generation steps run BEFORE `dist/claude-plugin` is updated (NFR-ORCH-005: `--check` must pass before any update to `dist/claude-plugin`). +4. The `--check` flag validates both generated files without performing any writes to `dist/claude-plugin`. +5. No manual editing of `.claude-plugin/plugin.json` or `.claude-plugin/agents.json` is required after running the script. + +### 16.2 --check flag contract + +When invoked with `--check`: + +1. Generate the manifest in memory. +2. Compare against the on-disk `.claude-plugin/plugin.json`. +3. Validate `settings.json` structure against SPEC-ORCH-015 §15.1 (in memory; no write). +4. If manifest identical and `settings.json` valid: exit code 0. +5. If either is invalid or stale: exit code 1; print a unified diff to stdout. +6. Write nothing to disk. + +### 16.3 Error codes + +| Code | Meaning | +|---|---| +| 0 | Success (or check passed) | +| 1 | Check failed (diff exists) | +| 2 | Missing source file | +| 3 | Schema validation error | +| 4 | File system error | + +--- + +## 17 check-agents.ts frontmatter validation rule (SPEC-ORCH-017) + +**Governs:** `scripts/check-agents.ts` + +### 17.1 New validation rule + +`check-agents.ts` MUST add three validation rules: + +**R-ORCH-TOOLS** — for any agent file with `name: orchestrator`, the `tools:` list MUST exactly match the list in SPEC-ORCH-001 §1.1. + +**R-ORCH-FRONTMATTER** — no orchestrator agent file (`name: orchestrator`) may declare prohibited tools in its `tools:` frontmatter field. Prohibited values: `Bash`, `WebSearch`, `WebFetch`, `mcp__github__*` (any). These tools belong to specialist subagents, not the orchestrator. 
+ +**R-ORCH-PROHIBITED-FRONTMATTER** — no orchestrator agent file (`name: orchestrator`) may contain `hooks:`, `mcpServers:`, or `permissionMode:` at the frontmatter level. These keys alter agent trust boundaries; their presence MUST cause `check-agents.ts` to emit an error and exit non-zero. + +### 17.2 Error message format + +On violation, the script MUST emit: + +``` +ERROR [R-ORCH-TOOLS] .claude/agents/orchestrator.md: tools list does not match SPEC-ORCH-001. + Expected: Agent, AskUserQuestion, Read, Write, Edit + Found: +``` + +### 17.3 CI integration + +These rules MUST be included in the existing `npm run verify` pipeline (no new CI job required). + +--- + +## 18 Data structures + +### 18.1 GoalLoopState + +```typescript +interface GoalLoopState { + status: 'active' | 'completed' | 'aborted'; + goal: string; + sessionId: string; // ISO-8601 datetime + currentPhase: 'scope' | 'research' | 'design' | 'plan' | 'implement' | 'review' | 'complete' | 'aborted'; + gates: { + gate1: 'pending' | 'approved' | 'rejected'; + gate2: 'pending' | 'approved' | 'rejected'; + gate3: 'pending' | 'approved' | 'rejected'; + }; + hitlState: { + pendingQuestion: string | null; // active AskUserQuestion prompt + lastResponse: string | null; // last user response + }; + researcherCount: number; // number of research subagents spawned this session + waveSchedule: WaveRecord[]; + stallCounters: Record<string, StallCounter>; // keyed by task ID + artifactsProduced: string[]; // relative file paths written this session + tasks: Record<string, TaskState>; // keyed by task ID +} +``` + +### 18.2 TaskState + +```typescript +interface TaskState { + status: 'pending' | 'running' | 'done' | 'failed' | 'skipped'; + startedAt: string | null; // ISO-8601 + completedAt: string | null; // ISO-8601 + agent: string; +} +``` + +### 18.3 WaveRecord + +```typescript +interface WaveRecord { + waveId: string; + phase: 'research' | 'implement'; + taskIds: string[]; + startedAt: string | null; // ISO-8601 + completedAt: string | null; // ISO-8601 +} +``` + +### 18.4 StallCounter + 
+```typescript +interface StallCounter { + stallCount: number; + lastStallAt: string | null; // ISO-8601 +} +``` + +### 18.5 ScopeDoc + +```typescript +interface ScopeDoc { + id: string; // SCOPE--NNN + feature: string; + goal: string; + created: string; // ISO-8601 date + updated: string; // ISO-8601 date + gate1Approved: boolean; + gate2Approved: boolean; + acceptanceCriteria: AcceptanceCriterion[]; + outOfScope: string[]; + researchSummary?: ResearchSummary; + designDecisions?: DesignDecision[]; + plan?: TaskPlan[]; + reviewSummary?: ReviewSummary; +} +``` + +### 18.6 AcceptanceCriterion + +```typescript +interface AcceptanceCriterion { + id: string; // AC-NNN + ears: string; // Full EARS-notation sentence + testRef?: string; // TEST--NNN if mapped +} +``` + +### 18.7 ResearchFinding + +```typescript +interface ResearchFinding { + topic: string; + source: string; // URL or file path + relevance: string; // 1-sentence + summary: string; // 2–5 sentences +} +``` + +### 18.8 ResearchSummary + +```typescript +interface ResearchSummary { + findings: ResearchFinding[]; + gaps: string[]; +} +``` + +### 18.9 DesignDecision + +```typescript +interface DesignDecision { + id: string; // DD-NNN + decision: string; + rationale: string; + adrRef?: string; // ADR-NNNN if raised +} +``` + +### 18.10 TaskPlan + +```typescript +interface TaskPlan { + id: string; // T--NNN + description: string; + dependsOn: string[]; + agent: string; + estimatedComplexity: 'low' | 'medium' | 'high'; +} +``` + +### 18.11 ReviewFinding + +```typescript +interface ReviewFinding { + id: string; // R-ORCH-NNN + severity: 'critical' | 'major' | 'minor'; + description: string; + taskRef: string; // T-ORCH-NNN +} +``` + +### 18.12 ReviewSummary + +```typescript +interface ReviewSummary { + passed: boolean; + findings: ReviewFinding[]; +} +``` + +--- + +## 19 Non-functional requirements (normative) + +| ID | Requirement | Source | +|---|---|---| +| NFR-ORCH-001 | Goal-loop session initialisation (scope.md 
creation) MUST complete ≤30 seconds from initial problem submission to first AskUserQuestion. | REQ-ORCH-007 | +| NFR-ORCH-002 | Max 5 parallel Agent calls during research wave; max 3 during implement wave. | REQ-ORCH-009, REQ-ORCH-013 | +| NFR-ORCH-003 | Stall detection MUST trigger within 30 seconds of threshold breach. | REQ-ORCH-014 | +| NFR-ORCH-004 | Slash-command passthrough MUST add < 200ms latency vs direct subagent invocation. | REQ-ORCH-005 | +| NFR-ORCH-005 | `--check` MUST pass (exit 0) before any write to `dist/claude-plugin`. | REQ-ORCH-019 | +| NFR-ORCH-006 | `check-agents.ts` rules R-ORCH-TOOLS and R-ORCH-FRONTMATTER MUST both run in < 2 seconds total on repos with ≤ 200 agent files. | REQ-ORCH-020 | +| NFR-ORCH-007 | WHEN `SPECORATOR_HEAVY_MODEL` is set and non-empty, the orchestrator MUST apply that model identifier when dispatching architect, dev, and reviewer subagents; WHEN absent or empty, the session default model is used. | REQ-ORCH-004 | + +--- + +## 20 Error catalogue + +| Code | Trigger | Message | Recovery | +|---|---|---|---| +| EC-ORCH-001 | Gate 1 `abort` | `GOAL_LOOP_ABORTED: User aborted at Gate 1 (scope review).` | Write aborted state; terminate. | +| EC-ORCH-002 | Gate 2 `abort` | `GOAL_LOOP_ABORTED: User aborted at Gate 2 (design review).` | Write aborted state; terminate. | +| EC-ORCH-003 | Gate 3 `abort` | `GOAL_LOOP_ABORTED: User aborted at Gate 3 (review).` | Write aborted state; terminate. | +| EC-ORCH-004 | Task max retries exceeded | `TASK_FAILED: T- exceeded retry limit (2).` | Present skip/abort option to user. | +| EC-ORCH-005 | Stall max retries exceeded | `TASK_STALLED: T- stall retry limit reached.` | Present skip/abort option. | +| EC-ORCH-006 | `workflow-state.md` write failure | `STATE_WRITE_FAILED: Could not update workflow-state.md.` | Retry once; abort on second failure. | +| EC-ORCH-007 | `scope.md` missing at Gate 1 | `SCOPE_MISSING: scope.md not found before Gate 1.` | Re-run scope phase. 
| +| EC-ORCH-008 | DAG cycle detected in plan | `PLAN_CYCLE: Dependency cycle detected involving T-.` | Present plan to user for manual resolution. | +| EC-ORCH-009 | Concurrency limit exceeded | `CONCURRENCY_LIMIT: Cannot launch T-; limit reached.` | Queue task; retry when slot opens. | +| EC-ORCH-010 | Subagent returns no output | `AGENT_NO_OUTPUT: T- subagent returned empty result.` | Treat as stall; apply §8.2. | +| EC-ORCH-011 | Invalid task ID format | `INVALID_TASK_ID: "" does not match T--NNN pattern.` | Fail plan phase; surface to user. | +| EC-ORCH-012 | `session-summary.md` write failure | `SUMMARY_WRITE_FAILED: Could not write session-summary.md.` | Retry once; log failure in workflow-state.md. | +| EC-ORCH-013 | Plugin manifest check failed | `CHECK_FAILED_PLUGIN_JSON: .claude-plugin/plugin.json diff detected.` | Print diff; exit 1. | +| EC-ORCH-014 | Orchestrator context window approaches limit (>80%) | `CONTEXT_PRESSURE: Summarising and continuing.` | Write a mid-session checkpoint to scope.md; continue. | +| EC-ORCH-015 | `check-agents.ts` R-ORCH-TOOLS violation | `TOOLS_MISMATCH: orchestrator.md tools list does not match SPEC-ORCH-001.` | Fix orchestrator.md and re-run verify. 
| +| EC-ORCH-016 | Invalid `plugin.json` on disk (schema error) | `CHECK_FAILED_PLUGIN_JSON: .claude-plugin/plugin.json is missing or could not be parsed` | No writes to `claude-plugin/specorator`. | + +--- + +## 21 Test catalogue + +### 21.1 Unit tests + +| Test ID | Description | Coverage | +|---|---|---| +| TEST-ORCH-001 | GoalLoopState initialisation — status=active, currentPhase=scope, all gates=pending | SPEC-ORCH-011, REQ-ORCH-002 | +| TEST-ORCH-002 | GoalLoopState task status transitions: pending→running→done | SPEC-ORCH-011, REQ-ORCH-002 | +| TEST-ORCH-003 | GoalLoopState task status transitions: pending→running→failed | SPEC-ORCH-011, REQ-ORCH-002 | +| TEST-ORCH-004 | GoalLoopState task status transitions: failed→skipped (on user skip) | SPEC-ORCH-011, REQ-ORCH-022 | +| TEST-ORCH-005 | Gate status transitions: gate_1 pending→approved, pending→rejected | SPEC-ORCH-011, REQ-ORCH-022 | +| TEST-ORCH-006 | TaskPlan DAG — no cycle: valid DAG accepted | SPEC-ORCH-006, REQ-ORCH-012 | +| TEST-ORCH-007 | TaskPlan DAG — cycle: EC-ORCH-008 raised | SPEC-ORCH-006, REQ-ORCH-012 | +| TEST-ORCH-008 | Topological sort: tasks with no deps execute before dependent tasks | SPEC-ORCH-007, REQ-ORCH-013 | +| TEST-ORCH-009 | Concurrency cap: max 3 running tasks during implement wave | SPEC-ORCH-007 §7.1, REQ-ORCH-013 | +| TEST-ORCH-010 | Concurrency cap: max 5 running tasks during research wave | SPEC-ORCH-004 §4.1, REQ-ORCH-009 | +| TEST-ORCH-011 | Stall detection — timeout: task stalled after 5 min with no output | SPEC-ORCH-008 §8.1, REQ-ORCH-014 | +| TEST-ORCH-012 | Stall detection — identical output: > 3 consecutive identical outputs triggers stall gate | SPEC-ORCH-008 §8.1, REQ-ORCH-014 | +| TEST-ORCH-013 | Stall detection — tool repeat: 20 identical tool calls | SPEC-ORCH-008 §8.1, REQ-ORCH-014 | +| TEST-ORCH-014 | Error code EC-ORCH-008 emitted on DAG cycle | SPEC-ORCH-006, REQ-ORCH-012 | +| TEST-ORCH-015 | Error code EC-ORCH-009 emitted when concurrency limit hit | 
SPEC-ORCH-007, REQ-ORCH-013 | +| TEST-ORCH-016 | Error code EC-ORCH-011 emitted for invalid task ID | SPEC-ORCH-006, REQ-ORCH-012 | +| TEST-ORCH-017 | scope.md schema — all required frontmatter fields present | SPEC-ORCH-012 §12.1, REQ-ORCH-008 | +| TEST-ORCH-018 | scope.md schema — acceptance criteria EARS format (AC-NNN prefix, WHEN/SHALL) | SPEC-ORCH-012 §12.3, REQ-ORCH-008 | +| TEST-ORCH-019 | session-summary.md schema — frontmatter fields and required sections | SPEC-ORCH-013, REQ-ORCH-016 | +| TEST-ORCH-020 | session-summary.md — multiple sessions append with separator | SPEC-ORCH-013, REQ-ORCH-016 | +| TEST-ORCH-021 | plugin.json commands entry — name regex passes for valid name | SPEC-ORCH-014 §14.2, REQ-ORCH-017 | +| TEST-ORCH-022 | plugin.json commands entry — name regex fails for invalid name (uppercase, space) | SPEC-ORCH-014 §14.2, REQ-ORCH-017 | +| TEST-ORCH-023 | plugin.json agents entry — description truncated at 200 chars | SPEC-ORCH-014 §14.3, REQ-ORCH-017 | + +### 21.2 Integration tests + +| Test ID | Description | Type | Coverage | +|---|---|---|---| +| TEST-ORCH-024 | Goal-loop happy path: scope → Gate 1 approved → research → design → Gate 2 approved → plan → implement → review → Gate 3 approved → summary | integration | SPEC-ORCH-002–010, REQ-ORCH-006 | +| TEST-ORCH-025 | Gate 1 reject-and-edit cycle: scope edited twice before approval | integration | SPEC-ORCH-003 §3.3, REQ-ORCH-008 | +| TEST-ORCH-026 | Gate 2 abort: session terminates; workflow-state.md status=aborted | integration | SPEC-ORCH-005 §5.3, REQ-ORCH-022 | +| TEST-ORCH-027 | Plugin packaging — `--check` mode passes: both files present and valid; exit code 0; no writes to `claude-plugin/specorator` | integration | NFR-ORCH-005 | +| TEST-ORCH-028 | Plugin packaging — `--check` mode fails: diff detected; exit code 1; unified diff on stdout | integration | SPEC-ORCH-016 §16.2, REQ-ORCH-019 | +| TEST-ORCH-029 | Plugin packaging — missing source file: exit code 2 | integration | 
SPEC-ORCH-016 §16.3, REQ-ORCH-019 | +| TEST-ORCH-030 | Plugin packaging — schema validation error: exit code 3 | integration | SPEC-ORCH-016 §16.3, REQ-ORCH-019 | +| TEST-ORCH-031 | check-agents.ts R-ORCH-TOOLS pass: correct tools list | integration | SPEC-ORCH-017, REQ-ORCH-020 | +| TEST-ORCH-032 | check-agents.ts R-ORCH-TOOLS fail: extra tool added; EC-ORCH-015 emitted with correct message | integration | SPEC-ORCH-017 §17.2, REQ-ORCH-020 | + +### 21.3 End-to-end tests + +| Test ID | Description | Type | Coverage | +|---|---|---|---| +| TEST-ORCH-033 | Full session: user goal → completed session-summary.md committed to branch | e2e | SPEC-ORCH-002–013, REQ-ORCH-016 | +| TEST-ORCH-034 | Backward compatibility: invoke all 85 slash commands in sequence; each produces its expected artifact with no orchestrator interference | e2e | REQ-ORCH-021, NFR-ORCH-004 | +| TEST-ORCH-035 | Stall recovery: task stalls at implement wave; user retries; task completes | e2e | SPEC-ORCH-008, REQ-ORCH-014 | +| TEST-ORCH-036 | Task failure max retries: task fails 3 times; user skips; session completes with partial results | e2e | SPEC-ORCH-007 §7.3, REQ-ORCH-013 | + +--- + +## 22 Acceptance criteria (normative) + +These acceptance criteria gate the `/spec:review` stage. Each maps to one or more tests above. + +| AC-ID | EARS criterion | Test(s) | +|---|---|---| +| AC-ORCH-001 | WHEN the user invokes the goal-loop trigger, the system SHALL initialise GoalLoopState with status=active within 5 seconds. | TEST-ORCH-001 | +| AC-ORCH-002 | WHEN Gate 1 is presented, the system SHALL block further progress until the user responds yes, edit, or abort. | TEST-ORCH-025 | +| AC-ORCH-003 | WHEN a task is stalled, the system SHALL detect it within 30 seconds of the threshold breach. | TEST-ORCH-011, TEST-ORCH-012, TEST-ORCH-013 | +| AC-ORCH-004 | WHEN `build-claude-plugin.ts --check` is run against a valid manifest, the system SHALL exit 0 with no writes. 
| TEST-ORCH-027 | +| AC-ORCH-005 | WHEN `build-claude-plugin.ts --check` detects a diff, the system SHALL exit 1 and print a unified diff. | TEST-ORCH-028 | +| AC-ORCH-006 | WHEN a slash command is received, the system SHALL route it to the specialist subagent without orchestration scaffolding. | TEST-ORCH-034 | +| AC-ORCH-007 | WHEN the implement wave runs, the system SHALL not exceed 3 concurrent Agent calls. | TEST-ORCH-009 | +| AC-ORCH-008 | WHEN the research wave runs, the system SHALL not exceed 5 concurrent Agent calls. | TEST-ORCH-010 | +| AC-ORCH-009 | WHEN check-agents.ts runs, the system SHALL flag any orchestrator.md tools deviation as a CI failure. | TEST-ORCH-032 | +| AC-ORCH-010 | WHEN a full session completes, the system SHALL write session-summary.md before terminating. | TEST-ORCH-033 | + +--- + +## 23 Traceability summary + +### 23.1 Requirements to specs + +| REQ-ID | Spec section(s) | +|---|---| +| REQ-ORCH-001 | SPEC-ORCH-001 §1.1 | +| REQ-ORCH-002 | SPEC-ORCH-001 §1.1, SPEC-ORCH-011 §11.1 | +| REQ-ORCH-003 | SPEC-ORCH-001 §1.1 | +| REQ-ORCH-004 | SPEC-ORCH-001 §1.1, SPEC-ORCH-007 §7.4 | +| REQ-ORCH-005 | SPEC-ORCH-002 (command-passthrough route); TEST-ORCH-033, TEST-ORCH-034 | +| REQ-ORCH-006 | SPEC-ORCH-002 §2.1 | +| REQ-ORCH-007 | SPEC-ORCH-002 §2.4, NFR-ORCH-001 | +| REQ-ORCH-008 | SPEC-ORCH-003 §3.2, SPEC-ORCH-012 | +| REQ-ORCH-009 | SPEC-ORCH-004 §4.1, NFR-ORCH-002 | +| REQ-ORCH-010 | SPEC-ORCH-004 §4.2 | +| REQ-ORCH-011 | SPEC-ORCH-005 §5.2 | +| REQ-ORCH-012 | SPEC-ORCH-006 §6.1 | +| REQ-ORCH-013 | SPEC-ORCH-007 §7.1, NFR-ORCH-002 | +| REQ-ORCH-014 | SPEC-ORCH-008 §8.1, NFR-ORCH-003 | +| REQ-ORCH-015 | SPEC-ORCH-009 §9.2 | +| REQ-ORCH-016 | SPEC-ORCH-010 §10.1, SPEC-ORCH-013 | +| REQ-ORCH-017 | SPEC-ORCH-014 §14.1 | +| REQ-ORCH-018 | SPEC-ORCH-015 §15.1 | +| REQ-ORCH-019 | SPEC-ORCH-016 §16.1, NFR-ORCH-005 | +| REQ-ORCH-020 | SPEC-ORCH-017, NFR-ORCH-006 | +| REQ-ORCH-021 | SPEC-ORCH-002 §2.2, TEST-ORCH-034 | +| REQ-ORCH-022 | 
SPEC-ORCH-003 §3.3, SPEC-ORCH-005 §5.3, SPEC-ORCH-009 §9.3, SPEC-ORCH-011 §11.2 | +| REQ-ORCH-023 | SPEC-ORCH-002 §2.4 | + +### 23.2 Specs to tests + +| SPEC-ID | Test IDs | +|---|---| +| SPEC-ORCH-001 | TEST-ORCH-031, TEST-ORCH-032 | +| SPEC-ORCH-002 | TEST-ORCH-024, TEST-ORCH-034 | +| SPEC-ORCH-003 | TEST-ORCH-024, TEST-ORCH-025, TEST-ORCH-026 | +| SPEC-ORCH-004 | TEST-ORCH-010, TEST-ORCH-024 | +| SPEC-ORCH-005 | TEST-ORCH-024, TEST-ORCH-026 | +| SPEC-ORCH-006 | TEST-ORCH-006, TEST-ORCH-007, TEST-ORCH-008, TEST-ORCH-014, TEST-ORCH-016 | +| SPEC-ORCH-007 | TEST-ORCH-008, TEST-ORCH-009, TEST-ORCH-015, TEST-ORCH-024, TEST-ORCH-036 | +| SPEC-ORCH-008 | TEST-ORCH-011, TEST-ORCH-012, TEST-ORCH-013, TEST-ORCH-035 | +| SPEC-ORCH-009 | TEST-ORCH-024 | +| SPEC-ORCH-010 | TEST-ORCH-019, TEST-ORCH-020, TEST-ORCH-033 | +| SPEC-ORCH-011 | TEST-ORCH-001, TEST-ORCH-002, TEST-ORCH-003, TEST-ORCH-004, TEST-ORCH-005 | +| SPEC-ORCH-012 | TEST-ORCH-017, TEST-ORCH-018 | +| SPEC-ORCH-013 | TEST-ORCH-019, TEST-ORCH-020 | +| SPEC-ORCH-014 | TEST-ORCH-021, TEST-ORCH-022, TEST-ORCH-023 | +| SPEC-ORCH-015 | (covered by integration: settings.json structure check) | +| SPEC-ORCH-016 | TEST-ORCH-027, TEST-ORCH-028, TEST-ORCH-029, TEST-ORCH-030 | +| SPEC-ORCH-017 | TEST-ORCH-031, TEST-ORCH-032 | + +--- + +## 24 Quality gate checklist + +- [x] All REQ-IDs from `requirements.md` are covered by at least one SPEC-ID in §23.1. +- [x] All SPEC-IDs are covered by at least one test in §23.2. +- [x] All acceptance criteria (§22) are mapped to at least one test. +- [x] No acceptance criterion is untestable (no "the system should feel responsive" style criteria). +- [x] All error codes (§20) reference a triggering condition and a recovery action. +- [x] NFRs are quantified (time bounds, concurrency limits, size limits). +- [x] Data structures specified — 12 TypeScript-style type definitions with validation rules. 
+- [x] ADR references included — ADR-0046, ADR-0047, ADR-0048 referenced at relevant interfaces. diff --git a/specs/goal-oriented-orchestrator-plugin/workflow-state.md b/specs/goal-oriented-orchestrator-plugin/workflow-state.md new file mode 100644 index 000000000..2d15b37c5 --- /dev/null +++ b/specs/goal-oriented-orchestrator-plugin/workflow-state.md @@ -0,0 +1,79 @@ +--- +feature: goal-oriented-orchestrator-plugin +area: ORCH +current_stage: tasks +status: active +last_updated: 2026-05-14 +last_agent: architect +artifacts: + idea.md: complete + research.md: complete + requirements.md: complete + design.md: complete + spec.md: complete + tasks.md: pending + implementation-log.md: pending + test-plan.md: pending + test-report.md: pending + review.md: pending + traceability.md: pending + release-notes.md: pending + retrospective.md: pending +--- + +# Workflow state — goal-oriented-orchestrator-plugin + +Tracks issue #501: **Goal-oriented orchestrator plugin — Research → Design → Plan → Implement → Review loop** and the associated **orchestrator-first architecture refactor** that makes this the core of the Specorator Claude plugin. + +## Stage progress + +| Stage | Artifact | Status | +|---|---|---| +| 1. Idea | `idea.md` | complete | +| 2. Research | `research.md` | complete | +| 3. Requirements | `requirements.md` | complete | +| 4. Design | `design.md` | complete | +| 5. Specification | `spec.md` | complete | +| 6. Tasks | `tasks.md` | pending | +| 7. Implementation | `implementation-log.md` + code | pending | +| 8. Testing | `test-plan.md`, `test-report.md` | pending | +| 9. Review | `review.md`, `traceability.md` | pending | +| 10. Release | `release-notes.md` | pending | +| 11. 
Learning | `retrospective.md` | pending | + +## Active decisions + +| ID | Decision | Resolution | Source | +|---|---|---|---| +| D1 | Scope intake format | EARS clauses via `grill` skill | idea.md | +| D2 | Researcher subagent count | Dynamic, 1–5 based on scope complexity | idea.md | +| D3 | Design presentation | Generated `design.md` artifact + inline summary | idea.md | +| D4 | Plan format | Existing `tasks.md` format with explicit DAG edges | idea.md | +| D5 | Parallel execution model | Isolated worktrees via `isolation: worktree` | idea.md | +| D6 | Review criteria source | Acceptance criteria from intake + auto-derived from EARS | idea.md | +| D7 | Plugin packaging | Proper `.claude-plugin/plugin.json` with `settings.json agent: orchestrator` | idea.md | +| D8 | Orchestrator tool list | Agent, Read, Write, Edit, AskUserQuestion | ADR-0046 | +| D9 | goal-loop state in workflow-state.md | Extended schema with optional goal_loop block | ADR-0047 | +| D10 | New artifact types | scope.md and session-summary.md introduced | ADR-0048 | + +## Next step + +Run `/spec:tasks` to produce `tasks.md` — TDD-ordered task list with T-ORCH-NNN IDs, dependencies, owners, and definitions of done. + +Optional first: run `/spec:analyze` to cross-check spec ↔ requirements ↔ design consistency. + +## Skips + +_None._ + +## Blocks + +_None._ + +## Hand-off notes + +design.md (Part C) is complete. Three ADRs were filed: ADR-0046 (orchestrator tool list expansion), ADR-0047 (workflow-state.md schema extension), ADR-0048 (scope.md and session-summary.md as new artifact types). The Zod schema extension (ADR-0047) is a blocking prerequisite for implementation of REQ-ORCH-002 and REQ-ORCH-022. The spec.md author must specify: (1) the exact Zod schema fields for the goal_loop block, (2) the full state machine for workflow-state.md transitions, (3) the check-agents.ts validation rules, and (4) the build-claude-plugin.ts settings.json generation mechanism. + +## Open clarifications + +_None._
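
Reviewer note (illustrative, not part of the committed files): decision D4 calls for `tasks.md` with explicit DAG edges, and the test catalogue expects a cycle to raise EC-ORCH-008 (TEST-ORCH-007) and dependency-free tasks to run before their dependents (TEST-ORCH-008), with at most 3 concurrent tasks in the implement wave (TEST-ORCH-009). The sketch below shows one way those three checks could fit together using Kahn's algorithm. The `PlannedTask` shape, the `PlanError` class, and the `planWaves` function are hypothetical names invented for this note; the real TaskPlan types live in `spec.md`, and batching into fixed waves is a simplification of whatever scheduling the orchestrator actually performs.

```typescript
// Hypothetical task shape for illustration only; not the TaskPlan type from spec.md.
interface PlannedTask {
  id: string;          // e.g. "T-ORCH-001" (illustrative ID)
  dependsOn: string[]; // IDs of tasks that must complete first
}

// Hypothetical error wrapper that carries an EC-ORCH-NNN code.
class PlanError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(`${code}: ${message}`);
    this.code = code;
  }
}

// Groups tasks into waves with Kahn's algorithm: a task is scheduled only
// after every dependency sits in an earlier wave, and no wave holds more
// than maxConcurrent tasks. Assumes task IDs are unique.
function planWaves(tasks: PlannedTask[], maxConcurrent: number): string[][] {
  if (maxConcurrent < 1) {
    throw new Error("maxConcurrent must be at least 1");
  }

  const remainingDeps = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    remainingDeps.set(task.id, task.dependsOn.length);
    dependents.set(task.id, []);
  }
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      const list = dependents.get(dep);
      if (list === undefined) {
        // Which EC-ORCH code applies here is not stated in the source, so none is used.
        throw new Error(`Task ${task.id} depends on unknown task ${dep}`);
      }
      list.push(task.id);
    }
  }

  let ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const waves: string[][] = [];
  let scheduled = 0;

  while (ready.length > 0) {
    const wave = ready.slice(0, maxConcurrent);
    ready = ready.slice(maxConcurrent);
    waves.push(wave);
    scheduled += wave.length;
    for (const id of wave) {
      for (const next of dependents.get(id) ?? []) {
        const left = (remainingDeps.get(next) ?? 0) - 1;
        remainingDeps.set(next, left);
        if (left === 0) ready.push(next);
      }
    }
  }

  // Tasks that never became ready are part of (or blocked by) a cycle.
  if (scheduled !== tasks.length) {
    throw new PlanError("EC-ORCH-008", "task plan contains a dependency cycle");
  }
  return waves;
}

// Usage with made-up task IDs and the implement-wave cap of 3.
const demoPlan: PlannedTask[] = [
  { id: "T-ORCH-001", dependsOn: [] },
  { id: "T-ORCH-002", dependsOn: [] },
  { id: "T-ORCH-003", dependsOn: ["T-ORCH-001"] },
  { id: "T-ORCH-004", dependsOn: ["T-ORCH-001", "T-ORCH-002"] },
  { id: "T-ORCH-005", dependsOn: ["T-ORCH-003"] },
];
console.log(planWaves(demoPlan, 3));
```

With the cap lowered to 1 the same plan degrades to one task per wave, which is a cheap way to exercise the ordering guarantee separately from the concurrency cap.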