diff --git a/README.md b/README.md index 626e3497e..598200502 100644 --- a/README.md +++ b/README.md @@ -1,153 +1,63 @@ # Spacedock -Spacedock runs agent work through defined stages, so you can delegate in batches and make only the calls that need your judgment. +Spacedock runs agent work through defined stages so you can delegate in batches and only weigh in on the calls that need your judgment. -The first officer coordinates the flow: it dispatches workers to advance each work item and surfaces approval-worthy decisions to you, the captain, so batches move forward without pulling you into every session. +You queue up the work, the agents move each item through its stages, and you get pulled in at approval gates with a stage report (findings, verdicts, artifacts, anomalies) ready for a yes or a no. No raw output dumps, no babysitting one chat at a time. -**You want Spacedock if:** +## Is this for me? -- **You're a human tired of context-switching** between agent sessions to make approval decisions. Spacedock batches the decisions an agent wants to hand back to you and presents each with evidence, so you approve or redirect without re-loading context. -- **You're an agent delegating repeatable work** and want a structured place to queue up approval-worthy decisions for your human without interrupting them for every tiny step. +**You triage Gmail every morning.** Spacedock can fetch your inbox, sort receipts toward a tax folder, archive newsletters, and surface anything that smells like customer support back to you with a proposed reply. You approve the batch; it executes. -## What's Different +**You are planning two weeks in Japan.** Spacedock can research neighborhoods, draft an itinerary, and stop at the decisions that need you (which hotel in Kyoto, which day trip out of Tokyo). Once bookings are locked, the next stage produces a packing list and a daily run sheet. 
-- **Approval gates with structured evidence.** Every gate comes with a stage report: findings, verdicts, artifacts, anomalies. You approve, redirect, or bounce back faster than sifting through raw output or a sprawling log. -- **Adversarial review gates.** Review stages can be configured to push back rather than rubber-stamp. They target sycophancy, thin evidence, and work that looks busy without proving its claim. Work clears the gate when it survives the challenge. -- **Plan in batches, decide as work flows back.** Queue multiple work items at once; agents advance each through its stages independently while you handle approvals as they surface. -- **The workflow learns with you.** The first officer helps you adjust it when patterns emerge: a stage that never fires, a gate that keeps bouncing the same issue back, a schema field that always ends up empty. -- **Isolation when needed.** Stages that touch shared state run in their own git worktree; lightweight stages run inline. You declare which is which, and the first officer enforces it. -- **Work doesn't die at the context limit.** When an agent runs out of context, Spacedock swaps in a successor that carries forward what's in flight. Nothing gets lost in the handoff. +**Your inbound code-review queue keeps piling up.** Spacedock can pull each open PR, run an adversarial review, queue the verdict for a thumbs up or down, and post the approved review to GitHub. -## Quick Start +## What is Spacedock? -**Prerequisites:** Claude Code or Codex CLI. +A workflow is a directory of markdown work item files plus a README that defines the stages, the schema, and the gates. There are three roles: Captain (you), First Officer (orchestrator), Ensign (worker). The First Officer reads the workflow README, dispatches Ensigns for items ready to advance, and pauses at gates to ask the Captain to approve or reject. (Rejection can bounce the item back to an earlier stage for revision; details in `docs/USAGE.md`.) 
-### Claude Code +Spacedock is not a chat agent and not a single-skill loop. Gates present structured evidence so the Captain decides on findings, not transcript. Review gates can be adversarial: they push back instead of rubber-stamping. The Captain queues many work items and decides as each surfaces, instead of running one session at a time. When a pattern emerges (a stage that never fires, a gate that keeps bouncing), `/spacedock:refit` adjusts the workflow without losing local mods. Stages that touch shared state run in their own git worktree; lighter stages run inline. When an Ensign hits the context limit, a successor picks up the in-flight state from the markdown files and carries on. -1. Install the plugin: - - ```bash - claude plugin marketplace add clkao/spacedock && claude plugin install spacedock - ``` - -2. Commission a workflow with your own mission prompt: - - ```bash - claude --agent spacedock:first-officer "/commission " - ``` - -3. Or start from one of these example workflows — copy and run: - - **Email triage:** - ```bash - claude --agent spacedock:first-officer "/commission Email triage: fetch, categorize, and act on Gmail inbox. Entity: a batch of up to 50 emails. Stages: intake (use gws-cli, triage in:inbox and read email body if necessary, categorize, propose action per email, output as table) → approval (Captain reviews proposal) -> execute (carry out approved actions, do not mark as read). Use gws-cli (https://github.com/googleworkspace/cli/tree/main/skills/gws-gmail), GOOGLE_WORKSPACE_CLI_CONFIG_DIR=~/.config/gws/ for different accounts. Walk me through gws-cli setup if not already done." 
- ``` - - **[Superpowers](https://github.com/obra/superpowers)-style dev task workflow:** - ```bash - claude --agent spacedock:first-officer "/commission Dev task workflow: superpowers-style design → plan → implement → review with ## Design and ## Implementation Plan inlined in the entity body (no separate spec/plan files), implement on isolated worktrees with strict TDD, design and review gated for approval." - ``` - -### Codex CLI - -1. Clone Spacedock and start Codex from the repo root: - - ```bash - git clone https://github.com/clkao/spacedock.git /path/to/spacedock - cd /path/to/spacedock - codex --enable multi_agent - ``` +## Quick start -2. Restart Codex if it was already open, then open `/plugins` and install **Spacedock** from the repo-local marketplace entry. +Prerequisites: [Claude Code](https://docs.claude.com/en/docs/claude-code) installed (Anthropic's CLI; runs on macOS, Linux, and Windows via WSL). For the email example: a working `gws-cli` for Gmail (setup notes: https://github.com/googleworkspace/cli/tree/main/skills/gws-gmail; this includes a one-time Google OAuth flow). - The authoritative Codex plugin manifest is `.codex-plugin/plugin.json`, and the authoritative local catalog is `.agents/plugins/marketplace.json`. That catalog points to `./plugins/spacedock`, which is a checked-in symlink to the repository root so Codex loads the real plugin package directly. +Once those are in place, the steps below take about five minutes. With a clean machine and no tools installed, plan on twenty minutes total. Nothing runs against your inbox until the First Officer pauses at a gate and you approve. -3. Prompt Codex to use the first-officer skill and commission your workflow: +1. Install the plugin: ```bash - Use the spacedock:first-officer skill to run /commission in this directory. 
+ claude plugin marketplace add clkao/spacedock && claude plugin install spacedock ``` - Legacy compatibility: older Codex setups can still expose `~/.agents/skills/spacedock` directly: +2. Commission an email triage workflow: ```bash - mkdir -p ~/.agents/skills - ln -s /path/to/spacedock/skills ~/.agents/skills/spacedock + claude --agent spacedock:first-officer "/spacedock:commission Email triage: fetch, categorize, and act on Gmail inbox. Entity: an email batch (up to 50 messages). Stages: intake (use gws-cli, triage in:inbox and read email body if necessary, categorize, propose action per email, output as table) then approval (Captain reviews proposal) then execute (carry out approved actions, do not mark as read). Use gws-cli (https://github.com/googleworkspace/cli/tree/main/skills/gws-gmail), GOOGLE_WORKSPACE_CLI_CONFIG_DIR=~/.config/gws/ for different accounts. Walk me through gws-cli setup if not already done." ``` - The `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json` files remain synchronized legacy mirrors of the Codex-first metadata for migration compatibility. - -> Codex multi-agent is experimental. The Claude Code path is the primary supported surface. +The First Officer commissions the workflow, dispatches an Ensign to gather your inbox, then pauses with a categorized proposal and waits for your approval before touching anything. -## What a Work Item Looks Like +### If you are a developer -```yaml ---- -id: 054 -title: Session debrief command -status: done ---- +Same install line. Commission with this mission instead: -Problem statement, design notes, acceptance criteria, and stage reports -all live in the body of this file as the work moves through stages. 
+```bash +claude --agent spacedock:first-officer "/spacedock:commission Dev task workflow: superpowers-style design then plan then implement then review with ## Design and ## Implementation Plan inlined in the entity body (no separate spec/plan files), implement on isolated worktrees with strict TDD, design and review gated for approval." ``` -See [a completed example](https://github.com/clkao/spacedock/blob/main/docs/plans/_archive/session-debrief.md) from Spacedock's own workflow. - -## Concepts - -| Concept | What it is | -|---------|------------| -| **Mission** | The purpose of the workflow: what it processes and what it delivers. | -| **Work item** | A single markdown file describing one thing being worked on: an email batch, a dev task, a draft. | -| **Workflow** | A directory of work items plus the README that defines stages, schema, and gates. | -| **Stage** | A named step a work item passes through (e.g. design, implement, review). | -| **Gate** | A pause point at a stage boundary where the captain approves, redirects, or bounces the work back. | - -*"I am the master of my fate, I am the captain of my soul."* -- William Ernest Henley, *Invictus* - -| Role | Who | -|---------|------------| -| **Captain** | You. You define the mission and make the calls at approval gates. | -| **First Officer** | The orchestrator agent that manages the workflow and reports to you at gates. | -| **Ensign** | The worker agent that moves a single item forward through one stage. | - -## How It Works - -The first officer reads the workflow README, checks work item statuses, and dispatches ensigns for items ready to advance. Stages that need isolation (typically implementation work with commits) run inside their own git worktree; lightweight stages (design, review, triage) run inline. At approval gates the first officer pauses and presents the ensign's stage report for your review: approve, redo with feedback, or reject. 
Rejected work automatically bounces back for revision in a fresh round of the earlier stage, with a hard cap so you never get stuck in an infinite loop. When you end a session, `/spacedock:debrief` captures what happened (commits, task state changes, decisions, open issues) into a record the next session picks up automatically (see [an example debrief](https://github.com/clkao/spacedock/blob/main/docs/plans/_debriefs/2026-04-09-01.md) from a real session). - -## What Gets Generated - -When you run `/spacedock:commission`, the following files are added to your workflow directory: - -- **`{dir}/README.md`**: workflow schema, stage definitions, and work item template -- **`{dir}/*.md`**: seed work item files -- **`{dir}/_mods/`**: local modifications carried across refits - -**Shipped by the Spacedock plugin:** - -- **`spacedock:first-officer`**: the orchestrator agent that reads workflow state and dispatches ensigns -- **`spacedock:ensign`**: the worker agent dispatched to do stage work -- **`skills/commission/bin/status`**: read and advance workflow state without switching to a separate tracking tool - -The generated workflow README is the single source of truth. The first officer reads it to know what stages exist, what quality criteria to enforce, and when to pause for your review. - -Workflows can extend their own behavior via markdown mod files (`_mods/*.md`) that declare hook handlers for lifecycle events like startup, idle, or merge. For example, the [`pr-merge` mod](docs/plans/_mods/pr-merge.md) opens a pull request automatically when a completed worktree branch is ready to land. - -When a new Spacedock release is available, use `/spacedock:refit` to upgrade your workflow scaffolding while keeping local modifications. +The First Officer commissions a generic dev workflow with four stages, opens a worktree for implementation, and pauses at design and review for your call. 
For deeper dev shapes (PR review queue, Linear ticket ship, cross-repo upgrade), see [`docs/EXAMPLES.md`](docs/EXAMPLES.md). -## Tips +## Codex CLI -- **Run Spacedock inside a sandbox.** Recommended: [agent-safehouse](https://github.com/eugene1g/agent-safehouse) (macOS), [packnplay](https://github.com/obra/packnplay), a devcontainer, or a VM. -- **Talk directly to an ensign.** Claude Code supports agent team chat: while a dispatched ensign is running, you can `Shift+Up` / `Shift+Down` to switch panes and give the ensign feedback directly instead of routing everything through the first officer. +The Codex CLI path is supported but experimental. See the Codex section of [`docs/USAGE.md`](docs/USAGE.md) for the setup steps and the plugin manifest layout. -## Use Cases +## Where to go next -- **Email triage**: classify and route incoming messages with AI agents, escalate to a human at review gates -- **Dev task workflow**: [superpowers](https://github.com/obra/superpowers)-style design -> plan -> implement -> review with approval gates -- **Content publishing**: manage drafts through editing, review, and publication stages -- **Research workflows**: process papers or data through analysis, synthesis, and validation -- **Dogfooding Spacedock's own development.** Spacedock is self-hosted. Its own development runs on a plain text workflow at [`docs/plans/`](docs/plans/). Run `skills/commission/bin/status --workflow-dir docs/plans` to see the current state. +- [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md): a guided first run, end to end. +- [`docs/USAGE.md`](docs/USAGE.md): the mental model, the YAML schema, and stage flags. +- [`docs/EXAMPLES.md`](docs/EXAMPLES.md): eight worked examples across household, knowledge work, and development. +- [`docs/PROMPTS.md`](docs/PROMPTS.md): the fill-in-the-blank Initiating Prompt template and persona variants. 
## License diff --git a/docs/EXAMPLES.md b/docs/EXAMPLES.md new file mode 100644 index 000000000..e08df70ac --- /dev/null +++ b/docs/EXAMPLES.md @@ -0,0 +1,274 @@ +# Examples + +This cookbook has eight worked examples. Each one names the audience, the recurring pain it removes, the mission string to paste, the stages, the gates, and what success looks like after two weeks of use. + +Pick the closest example, adapt the mission text to your situation, and paste it into the First Officer. + +## 1. Email triage + +**Who this is for**: anyone with a Gmail inbox that needs daily attention. + +**Recurring pain it removes**: opening Gmail every morning to triage by hand, missing important messages, and replying to the same kinds of emails over and over. + +### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Email triage: fetch, categorize, and act on Gmail inbox. Entity: an email batch (up to 50 messages). Stages: intake (use gws-cli, triage in:inbox and read email body if necessary, categorize, propose action per email, output as table) then approval (Captain reviews proposal) then execute (carry out approved actions, do not mark as read). Use gws-cli (https://github.com/googleworkspace/cli/tree/main/skills/gws-gmail), GOOGLE_WORKSPACE_CLI_CONFIG_DIR=~/.config/gws/ for different accounts. Walk me through gws-cli setup if not already done." +``` + +### Stages + +| Stage | What the Ensign does | What the gate decides | +| --- | --- | --- | +| `intake` | Pulls `in:inbox`, reads bodies where the subject is ambiguous, categorizes, and writes a proposed action per email into a table. | None. Stage exit is automatic. | +| `approval` | Renders the proposal table. | `gate: true`, `feedback-to: intake`. Captain approves; to send rows back, edit the work-item file and reject. | +| `execute` | Carries out the approved actions (file, archive, draft reply). Does not mark as read. | Terminal. 
| + +### What success looks like + +Morning triage drops to one approval pass per batch. Receipts get routed to tax folders without manual sorting, newsletters get archived, and only the messages that need a real response remain in the inbox. If you keep correcting categorizations on rejection, the workflow's prompt evolves with your edits and the approval pass shortens. + +## 2. Trip planning + +**Who this is for**: someone planning a multi-week or complex trip. + +**Recurring pain it removes**: research scattered across browser tabs, the itinerary buried in a doc that never gets reviewed, and bookings done in a rush at the last minute. + +### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Trip planning: shape one trip per entity (destination plus dates). Stages: research (collect neighborhoods, sights, transit notes, weather windows into the entity body) then itinerary (draft a day-by-day plan with decision points called out) then decisions (gate: Captain picks lodging, day trips, and dining priorities) then booking (parked: the Captain executes bookings off-platform; mark which bookings to make, do not actually book) then packing (generate a packing list from the locked itinerary). Use the entity body as the working document; do not create side files." +``` + +### Stages + +| Stage | What the Ensign does | Flags and gate | +| --- | --- | --- | +| `research` | Gathers neighborhoods, sights, transit, and weather notes. Writes them into the entity body. | None. | +| `itinerary` | Drafts a day-by-day plan with decision points highlighted. | None. | +| `decisions` | Surfaces the lodging and day-trip choices. | `gate: true`. Captain picks. | +| `booking` | Lists what to book (links, times, confirmation numbers field empty). | `parked: true` (captain-facing marker). Captain books off-platform, pastes confirmations back, then transitions the entity to `packing`. 
| +| `packing` | Generates a packing list keyed off climate windows and the locked itinerary. | Terminal. | + +### What success looks like + +The itinerary is finalized in two short sessions instead of three weeks of tab-juggling. Decisions are made on evidence (neighborhood notes, transit times) instead of guesswork. The packing list is automatic and aware of weather, dress codes, and travel days. + +## 3. Tax and finance prep + +**Who this is for**: a freelancer or household preparing tax filings or a quarterly finance review. + +**Recurring pain it removes**: receipts and statements scattered across email and folders, categorizing transactions is mind-numbing, and deductions get missed. + +### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Tax and finance prep: one entity per tax year or quarter. Stages: intake (collect documents from a designated folder, list what is present and flag what is missing; stay parked while missing documents trickle in) then categorize (Ensign categorizes line items into expense buckets; Captain corrects edge cases inline) then deductions-review (gate: Captain reviews the proposed deductions list with rationale per item; rejection bounces back to categorize) then summary (produce a clean export bundle for the accountant). Inputs live in ~/Documents/tax/; outputs go into the entity body." +``` + +### Stages + +| Stage | What the Ensign does | Flags and gate | +| --- | --- | --- | +| `intake` | Lists every document found in the year folder, names what is missing (W-2, 1099-NEC, brokerage statements, charitable receipts). | `parked: true` (captain-facing marker). Captain re-runs intake as documents arrive and transitions to `categorize` when the list is complete. | +| `categorize` | Bins line items into expense categories with confidence notes on edge cases. | None. | +| `deductions-review` | Proposes deductions with one-line rationale per item. | `gate: true`, `feedback-to: categorize`. 
Rejection bounces to `categorize`. | +| `summary` | Builds a clean accountant-ready export (CSV plus a one-pager). | Terminal. | + +### What success looks like + +Filing season collapses from a marathon weekend into three approval passes spread across two weeks. Nothing falls through the cracks because the workflow knows exactly what is missing and parks itself until the document shows up. The accountant gets a tidy bundle instead of a shoebox. + +## 4. Content publishing + +**Who this is for**: anyone who publishes regularly (a newsletter, a blog, an internal update). + +**Recurring pain it removes**: drafts stall mid-edit, fact-checking gets skipped under deadline, and the publishing checklist lives in someone's head. + +### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Content publishing: one entity per piece (essay or newsletter issue). Stages: idea (capture the angle and source notes in the entity body) then draft (Ensign produces a first draft from the notes) then edit (Captain edits in the entity body) then fact-check (gate: Ensign verifies claims and flags anything unsourced; rejection bounces back to edit) then publish (Captain hits publish; Ensign prepares social posts and updates the entity to terminal). Working text lives in the entity body." +``` + +### Stages + +| Stage | What the Ensign does | Flags and gate | +| --- | --- | --- | +| `idea` | Captures the angle, the audience, and source notes. | None. | +| `draft` | Produces a first draft from the idea notes. | None. | +| `edit` | Captain rewrites in the entity body. Ensign is idle. | None. | +| `fact-check` | Verifies claims, flags unsourced statements. | `gate: true`, `feedback-to: edit`. Rejection routes back to `edit`. | +| `publish` | Captain publishes. Ensign drafts social posts. | Terminal. | + +### What success looks like + +The regular cadence sticks because nothing is in the Captain's head. Fact errors are caught before publish instead of after. 
The backlog of half-finished drafts shrinks because every piece is in a known stage with a known next move. + +## 5. Research synthesis + +**Who this is for**: a researcher or analyst ingesting papers, transcripts, or interview notes. + +**Recurring pain it removes**: source material piles up, synthesis happens once at the end and badly, and cross-references between sources are missed. + +### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Research synthesis: one entity per research thread (a question plus its sources). Stages: intake (list sources, attach abstracts and provenance) then summarize (Ensign produces a summary per source with quoted evidence) then cross-reference (Ensign identifies overlaps, agreements, and contradictions across sources) then synthesis (gate: Captain reviews the synthesis; rejection routes back to cross-reference with one-line feedback) then write-up (Ensign drafts a handoff-ready write-up). Working notes live in the entity body." +``` + +### Stages + +| Stage | What the Ensign does | Flags and gate | +| --- | --- | --- | +| `intake` | Lists every source, attaches abstracts and citation info. | None. | +| `summarize` | One summary per source with quoted evidence so claims are traceable. | None. | +| `cross-reference` | Surfaces overlaps and contradictions between sources. | None. | +| `synthesis` | Pulls the cross-reference notes into a single argument. | `gate: true`, `feedback-to: cross-reference`. Rejection routes back to `cross-reference`. | +| `write-up` | Drafts a handoff-ready write-up the Captain can pass on. | Terminal. | + +### What success looks like + +A question that used to take a quiet weekend gets shaped over a week of approval passes. Contradictions surface before the write-up, not after a reviewer points them out. The final write-up traces every claim back to a source. + +## 6. 
Household admin + +**Who this is for**: someone running a household: bills, renewals, kids' school paperwork, appointments. + +**Recurring pain it removes**: things slip until they become urgent, the same items recur every year but nothing tracks them, and appointments stack on the same Tuesday. + +### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Household admin: one entity per admin item (a renewal, an appointment, a form). Stages: intake (Captain or an inbox mod creates items) then triage (Ensign proposes priority and deadline) then action (gate: Captain approves the proposed action) then follow-up (parked until a date or a reply) then closed (terminal). Keep the entity body short; this is a tracker, not a doc." +``` + +### Stages + +| Stage | What the Ensign does | Flags and gate | +| --- | --- | --- | +| `intake` | Captain or a mod creates new items. | None. | +| `triage` | Proposes priority and a deadline based on the item type. | None. | +| `action` | Lists the proposed action (call, file, schedule). | `gate: true`. Captain approves. | +| `follow-up` | Waits for a reply or a date. | `parked: true` (captain-facing marker). Captain transitions to `closed` when the item is resolved. | +| `closed` | Item resolved. | Terminal. | + +### What success looks like + +The household runs on the workflow instead of on memory. Renewals get handled a week early because the workflow surfaces them on its own clock. Appointments stop colliding because triage proposes a deadline up front instead of letting items pile onto a single day. + +## 7. Job search + +**Who this is for**: someone running an active job search across one or several open roles. + +**Recurring pain it removes**: resumes and cover letters get written in a panic, follow-ups are forgotten, and interviewing momentum is lost between weeks. 
+ +### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Job search: one entity per role (a company plus a job posting). Stages: intake (capture posting text, contact name, deadline) then tailor (Ensign drafts a resume and cover letter tuned to the posting; Captain edits in the entity body) then apply (gate: Captain confirms send) then follow-up (parked until response or a follow-up date) then interview (Captain logs notes per round into the entity body) then outcome (terminal: offer, rejection, or withdrawn). Working text lives in the entity body." +``` + +### Stages + +| Stage | What the Ensign does | Flags and gate | +| --- | --- | --- | +| `intake` | Captures posting, contact, deadline. | None. | +| `tailor` | Drafts a resume and cover letter tuned to the posting. Captain edits. | None. | +| `apply` | Final review of the materials. | `gate: true`. Captain confirms send. | +| `follow-up` | Waits for a reply or a follow-up date. | `parked: true` (captain-facing marker). Captain transitions to `interview` on response or back to `intake` when the trail goes cold. | +| `interview` | Captain logs notes round by round into the entity body. | None. | +| `outcome` | Offer, rejection, or withdrawn. | Terminal. | + +### What success looks like + +The search runs in parallel across many roles without dropping any. Per-role materials accumulate as a library you can reuse on the next round. Follow-ups happen on time because the workflow surfaces parked items on their follow-up date. + +## 8. Software development + +Three developer workflows. They share a shape: the entity is one unit of work, the implementation stages run on isolated worktrees, and review is a fresh adversarial pass instead of self-review. + +All three assume `gh` (GitHub CLI) is installed and authenticated; the PR-review queue uses `gh pr review`, the ticket-ship workflow uses the shipped `pr-merge` mod which calls `gh pr create`, and the cross-repo workflow uses both. 
The `pr-merge` mod will not push or open a PR without explicit Captain approval; expect a confirmation prompt before any write to GitHub. + +### PR review queue + +**Who this is for**: a developer who is regularly added as a requested reviewer on GitHub PRs. + +**Recurring pain it removes**: the queue piles up silently, reviews end up rubber-stamped under time pressure, and rejected PRs do not get a real second pass. + +#### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission PR review queue for PRs where I am set as a requested reviewer. Entity: a PR awaiting my review. Auto-intake is provided by a hand-authored mod at _mods/pr-review-intake.md. The mod runs on the First Officer's idle hook with a self-imposed 20-minute minimum between GitHub polls, creates entities for new PRs, and auto-archives entities whose PRs are merged, closed, converted to draft, or whose review request was removed. Stages: intake (auto-populated by the mod; multiple entities can sit here simultaneously while waiting their turn) then review (concurrency: 1, only one PR is reviewed at a time; run an adversarial review skill; assume the worst, look for hidden brittleness, verify test coverage; output severity-tagged findings into the entity body) then verdict (gate: Captain approves the verdict APPROVE or REQUEST_CHANGES or NEEDS_DEEPER_REVIEW; on rejection bounce back to review with one-line feedback for a fresh adversarial pass) then posted (terminal: an Ensign here posts the approved review to GitHub via gh pr review). Use worktree on review for branch inspection. Set id-style to slug so entity filenames can be {owner}-{repo}-pr-{number}. Decline the pr-merge mod when offered; this workflow does not create PRs." +``` + +> Heads up: commission cannot scaffold new mods. It only copies pre-shipped ones. The `pr-review-intake.md` mod referenced above has to be authored by hand and dropped into `{workflow-dir}/_mods/` after commission finishes. 
Order does not matter; the First Officer re-scans `_mods/` on every loop. + +#### Stages + +| Stage | What the Ensign does | Flags and gate | +| --- | --- | --- | +| `intake` | Mod-populated. Many PRs can sit here. | None. | +| `review` | Runs an adversarial review skill, writes severity-tagged findings into the entity body. | `worktree: true`, `concurrency: 1`. | +| `verdict` | Surfaces the proposed verdict for Captain approval. | `gate: true`, `feedback-to: review`. Rejection bounces to `review` with feedback for a fresh pass. | +| `posted` | An Ensign posts the approved review to GitHub via `gh pr review --approve` or `--request-changes`. | Terminal. | + +#### What success looks like + +The review queue clears on a daily pass. Adversarial re-runs happen automatically on rejection instead of by hand. Nothing sits in your queue silently because the mod auto-archives PRs that no longer need you. + +### Linear ticket ship workflow + +**Who this is for**: a developer shipping Linear tickets end to end. + +**Recurring pain it removes**: tickets stretch across multiple sessions, the Linear ticket stays in Todo while you ship code locally, and review feels stale because it ran in the same context as implementation. + +#### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Linear ticket ship workflow: one entity per Linear ticket assigned to me. Auto-intake is provided by a hand-authored mod at _mods/linear-intake.md. 
Stages: intake (mod-populated, captain-curated, gate, concurrency: 100; the mod creates the entity but never auto-promotes) then triage (gate: classify the ticket, pick the affected repo, escalate if cross-repo) then design (gate: write Design and Acceptance Criteria into the entity body) then implement (worktree, concurrency: 1, TDD; mod transitions Linear to In Progress on stage entry) then review (worktree, fresh, gate, feedback-to: implement; dispatch a separate Ensign for an adversarial review) then ship (parked: open the PR; mod transitions Linear to In Review when the PR field is set) then merged (terminal; pr-merge mod advances when the PR lands on main; mod transitions Linear to Done). Accept the pr-merge mod when offered." +``` + +> Heads up: the `linear-intake.md` mod is hand-authored, like the PR review intake mod above. Commission only copies pre-shipped mods (today that means `pr-merge` only). Drop your `linear-intake.md` into `{workflow-dir}/_mods/` after commission finishes. + +#### Stages + +| Stage | Role | Flags and gate | +| --- | --- | --- | +| `intake` | Mod creates entities from Linear. | `gate: true`, captain-curated. (Concurrency is unbounded in practice: intake entities have no worktree, so the `concurrency` flag is a documentation marker, not enforced.) | +| `triage` | Classify, pick the affected repo. | `gate: true`. | +| `design` | Write Design and Acceptance Criteria into the entity body. | `gate: true`. | +| `implement` | TDD on an isolated branch. | `worktree: true`, `concurrency: 1`. Mod sets Linear to In Progress. | +| `review` | A fresh Ensign runs an adversarial review. | `worktree: true`, `fresh: true`, `gate: true`, `feedback-to: implement`. | +| `ship` | Open the PR. | `parked: true` (captain-facing marker). The pr-merge mod transitions the entity to `merged` when the PR merges; the mod is what does the work, not the `parked` flag. | +| `merged` | PR merged. | Terminal. Mod sets Linear to Done. 
| + +#### What success looks like + +Tickets ship and Linear stays in sync because the mod updates the ticket state on every stage entry. Review reads the diff cold on a fresh worktree, so it catches problems an in-session reviewer would skim past. Multiple PRs can be in flight and the workflow holds the state for each. + +### Cross-repo upgrade coordination + +**Who this is for**: a developer coordinating a dependency or framework upgrade that touches an upstream package and one or more downstream consumers. + +**Recurring pain it removes**: pairing notes get lost across sessions, the consumer breaks because the upstream PR was not actually published, and verification ends up running in the same context as implementation. + +#### Mission + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Cross-repo upgrade coordination: one entity per upgrade initiative (for example MUI v7 to v9, axios to fetch, Jest to Vitest). Stages: scope (gate: list every consumer call site, propose a phased plan) then upstream (worktree in the OSS package repo; implement, ship a PR, must merge and publish before consumer work begins) then downstream (worktree in the consumer repo; pull the new version, fix breakages, ship a paired PR) then verify (gate, fresh: run full test suites in both repos; rejection routes back to downstream) then done (terminal). Park between upstream merge and downstream start to wait on publish. Accept the pr-merge mod when offered for both implementation stages." +``` + +#### Stages + +| Stage | What happens | Flags and gate | +| --- | --- | --- | +| `scope` | List call sites, propose a phased plan. | `gate: true`. | +| `upstream` | Implement and ship in the OSS package repo. Must merge and publish before `downstream` starts. | `worktree: true`. The pr-merge mod opens the PR and advances this entity to the next stage when the PR merges. | +| `downstream` | Pull the new version, fix breakages, ship a paired PR. 
Captain holds this stage open until the upstream package version is live on the registry. | `worktree: true`, `parked: true`. The Captain transitions out when ready. | +| `verify` | Run full test suites in both repos. | `gate: true`, `fresh: true`, `feedback-to: downstream`. Rejection routes to `downstream`. | +| `done` | Both PRs merged, both suites green. | Terminal. | + +#### What success looks like + +Pairing notes live in the entity body, so they survive sessions and context limits. Consumer work waits for the upstream package to publish because the downstream stage is parked and the Captain only transitions out when the version is live. Verification reads both repos cold because it dispatches a fresh Ensign. diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md new file mode 100644 index 000000000..5e6c9fa7a --- /dev/null +++ b/docs/GETTING_STARTED.md @@ -0,0 +1,135 @@ +# Getting started + +This guide has two complete walkthroughs (email triage and pull request review). Pick whichever fits your work. Each walkthrough takes about five minutes once your tools are already installed; budget twenty minutes total if you also need to set up `gws-cli` or fresh-clone a repo. The mental model lives in [`USAGE.md`](USAGE.md); the cookbook of more examples lives in [`EXAMPLES.md`](EXAMPLES.md). + +A few terms used below before they get formal definitions in `USAGE.md`: a "stage" is a named step a work item passes through; a "gate" is a pause point at a stage boundary where you decide approve or reject; a "worktree" is an isolated git working directory the agent uses for code-bearing work; a "mod" is an optional markdown file in `_mods/` that extends a workflow with lifecycle hooks. + +## Before you start + +- Claude Code installed. See https://docs.claude.com/en/docs/claude-code. +- A sandbox is recommended. On macOS try [agent-safehouse](https://github.com/clkao/agent-safehouse); elsewhere a devcontainer or a VM works. 
+- For the email walkthrough: a configured `gws-cli` for the Gmail account you want triaged. Setup notes at https://github.com/googleworkspace/cli/tree/main/skills/gws-gmail. +- For the developer walkthrough: a git repository to point Spacedock at. + +## Walkthrough 1: Email triage + +### Step 1: Install the plugin + +```bash +claude plugin marketplace add clkao/spacedock && claude plugin install spacedock +``` + +### Step 2: Commission the workflow + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Email triage: fetch, categorize, and act on Gmail inbox. Entity: an email batch (up to 50 messages). Stages: intake (use gws-cli, triage in:inbox and read email body if necessary, categorize, propose action per email, output as table) then approval (Captain reviews proposal) then execute (carry out approved actions, do not mark as read). Use gws-cli (https://github.com/googleworkspace/cli/tree/main/skills/gws-gmail), GOOGLE_WORKSPACE_CLI_CONFIG_DIR=~/.config/gws/ for different accounts. Walk me through gws-cli setup if not already done." +``` + +The mission describes the entity (a batch of up to 50 emails), the stages (intake, approval, execute), the tool to use (`gws-cli`), and the constraint that execute must not mark messages as read. Commission turns that prose into a workflow directory plus a README that the First Officer will read on every loop. Nothing executes against your inbox at this point. The workflow files appear on disk, and that is it until the First Officer dispatches the first Ensign. + +### Step 3: Watch the First Officer start up + +The First Officer reads the new workflow README, prints the stage list it found, scaffolds the work-item file for the first batch, and dispatches an Ensign to the intake stage. You will see output that names the workflow, lists the stages it found, and announces the dispatch. 
The exact wording varies by Claude Code version; an illustrative shape: + +``` +First Officer (illustrative) + workflow: email-triage + stages: intake then approval then execute + dispatching ensign to intake (entity: batch-001) +``` + +The Ensign then runs `gws-cli`, walks your inbox, and writes its findings into the batch markdown file. + +### Step 4: Your first gate + +When intake finishes, the First Officer pauses at the approval gate and shows you the proposal the Ensign produced. A trimmed example: + +``` +batch-001 intake report + +| from | subject | category | proposed action | +|-------------------|--------------------------|----------|-----------------------| +| stripe@stripe.com | Receipt #4421 | archive | move to Receipts 2026 | +| pat@acme.co | RE: contract redlines | reply | draft a 3-line reply | +| security@aws.com | Unusual sign-in detected | escalate | surface to Captain | + +approve or reject? +``` + +You answer with `approve` or `reject`. To recategorize a row before approving, edit the batch markdown file directly and then approve. To send the batch back for a fresh pass with your feedback, reject; the next intake Ensign reads your feedback note from the work-item file. + +### Step 5: Approve and execute + +On approval, the First Officer dispatches an Ensign to the execute stage. That Ensign runs each approved action through `gws-cli`: archive the receipt, draft the reply in your Drafts folder, leave the escalation untouched. The batch file gets a closing report (what ran, what skipped, any failures), and the entity moves to the terminal stage. + +For the batch to bounce back to intake on rejection (rather than exit), the workflow's `approval` stage needs `feedback-to: intake` in its YAML. If commission did not add it, edit the generated `{workflow-dir}/README.md` and add the flag. The next intake Ensign reads your feedback note from the work-item file and produces a revised proposal instead of starting fresh. 
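The check described above (does every gate have somewhere to bounce to?) can be sketched as a few lines of Python. This is a minimal illustration, not part of Spacedock: the function name `gates_without_bounce_target` and the list-of-dicts layout are assumptions made for the example, and the stage list is a trimmed guess at what commission might generate for the email walkthrough. Only the flag names `gate` and `feedback-to` come from the workflow YAML.

```python
# Hypothetical lint: flag gated stages that have no `feedback-to` target.
# The dicts mirror the flag names used in the workflow YAML; the function
# name and data layout are illustrative, not a Spacedock API.

def gates_without_bounce_target(stages):
    """Return names of stages with gate: true but no feedback-to flag."""
    return [
        stage["name"]
        for stage in stages
        if stage.get("gate") and not stage.get("feedback-to")
    ]


# Email triage stages as described in the walkthrough (assumed shape),
# before the flag is added by hand.
email_triage = [
    {"name": "intake"},
    {"name": "approval", "gate": True},
    {"name": "execute"},
]

print(gates_without_bounce_target(email_triage))  # ['approval']

# After adding the flag, the approval gate has a stage to bounce back to.
email_triage[1]["feedback-to"] = "intake"
print(gates_without_bounce_target(email_triage))  # []
```

In a real workflow the same inspection is done by eye: open the generated `{workflow-dir}/README.md` and confirm each stage with `gate: true` also names a `feedback-to:` stage.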
+ +### Step 6: End the session + +When you are done for the day: + +``` +/spacedock:debrief +``` + +Debrief captures the commits, state transitions, decisions, and any open issues into a structured record. Tomorrow, opening the same workflow picks up exactly where you left off because the state lives in the markdown files, not in chat history. + +## Walkthrough 2: Pull request review + +### Step 1: Install the plugin + +```bash +claude plugin marketplace add clkao/spacedock && claude plugin install spacedock +``` + +### Step 2: Commission the workflow + +```bash +claude --agent spacedock:first-officer "/spacedock:commission Dev task workflow: superpowers-style design then plan then implement then review with ## Design and ## Implementation Plan inlined in the entity body (no separate spec/plan files), implement on isolated worktrees with strict TDD, design and review gated for approval." +``` + +The mission asks for four stages (design, plan, implement, review), inlined design and plan sections in each entity, isolated worktrees for implementation, and approval gates around design and review. + +### Step 3: Watch the First Officer start up + +The First Officer scaffolds the workflow directory, prints the stage list, and waits for you to seed work-item files (one per ticket or PR you want shipped). You can drop a markdown file into the entities directory by hand, or ask the First Officer to create one from a Linear ticket or PR URL. Once an entity exists, the First Officer dispatches an Ensign to the design stage. 
+ +``` +First Officer (illustrative) + workflow: dev-task + stages: design then plan then implement then review + no entities yet; seed one to begin +``` + +### Step 4: Your first gate + +When the design Ensign finishes, the First Officer pauses at the design gate and shows you the proposed `## Design` section inlined in the entity body: the problem statement, the chosen approach, the tradeoffs considered, and the test contracts the implement stage will be held to. You answer `approve` or `reject`. To push back on a tradeoff before approving, edit the entity body and then approve. To send the design back for a fresh pass with your feedback, reject. + +### Step 5: Approve and execute + +On approval, the entity advances through plan and then into implement. The implement stage runs inside an isolated git worktree so the working tree of your main checkout is untouched. The Ensign writes failing tests first, makes them pass, and commits in small increments. When implement finishes, the review gate fires: an adversarial review Ensign reads the diff and either signs off or files specific objections you decide on. + +If you reject review, the entity bounces back to implement with the objection text baked in, and the next Ensign starts from there. This works because the dev commission writes `feedback-to: implement` on the review gate by default; verify the YAML if you want to be sure, since rejection without `feedback-to:` has no bounce target. + +### Step 6: End the session + +``` +/spacedock:debrief +``` + +Same flow as the email walkthrough: the next session reads the markdown and resumes whichever entities were mid-flight. + +## Common first-run gotchas + +- The first commission does not actually execute work; it scaffolds the workflow directory and README. Work starts on the next loop or when you re-invoke the First Officer. 
+- If a stage that should run in a worktree complains about uncommitted changes, commit or stash them first; Spacedock will not silently overwrite local edits. +- Approval gates pause the First Officer; the workflow does not advance until the Captain answers. +- If you want to bounce a stage back with feedback, reject (do not approve), and Spacedock will re-dispatch the previous stage with your feedback baked in. +- Commission cannot scaffold custom mods. It can only copy pre-shipped ones (currently just `pr-merge`). Custom mods are authored by hand in `_mods/`. +- The plugin is the source of truth for stage flags; the generated `{workflow-dir}/README.md` controls per-workflow behavior. If commission gets the YAML flags wrong, edit the YAML. A running First Officer holds the workflow in memory from when it booted, so close and reopen the session to pick up your edit. + +## Where to go next + +- [`USAGE.md`](USAGE.md) for the mental model and the YAML schema. +- [`EXAMPLES.md`](EXAMPLES.md) for the remaining workflows: six non-developer examples (trip planning, taxes, content publishing, research synthesis, household admin, job search) plus the developer cluster (PR review queue, Linear ticket ship, cross-repo upgrade coordination). +- [`PROMPTS.md`](PROMPTS.md) for an Initiating Prompt template that asks Claude to look at your recurring work and propose workflows shaped to it. diff --git a/docs/PROMPTS.md b/docs/PROMPTS.md new file mode 100644 index 000000000..4c6c3d903 --- /dev/null +++ b/docs/PROMPTS.md @@ -0,0 +1,255 @@ +# Initiating prompts + +Paste one of the prompts in this doc into Claude Code where Spacedock is checked out. Claude reads the project, asks you about your recurring work, and proposes two or three Spacedock workflows shaped to your work. For the mental model, see `USAGE.md`. For copy-paste workflow examples, see `EXAMPLES.md`. 
+ +A heads up before pasting: prompts that mine your local Claude history (the developer variant) point Claude at `~/.claude/projects/`. Those directories hold every project session you have ever run. If you are using a hosted or shared Claude environment with logging, treat that history as content you are sharing with whoever sees the session. Skip the history paragraph if that is a concern. + +## The template (fill in the blanks) + +```markdown +I have Spacedock checked out locally at ``. Please read its +`README.md` and `docs/USAGE.md` so you understand what it is and how it works. +(If you are running in an environment without local filesystem access, ask me +to paste the contents of those two files into the chat instead.) + +Here is the recurring work I want help with. List three to six items, each one +or two sentences with enough detail to be useful (volumes, tools, sensitivities): + + +- +- +- + + + +You may mine my local Claude Code session history for patterns. Claude stores +per-project session histories under `~/.claude/projects/` by default. Look at +directories that match my active work: +- +- + +Limit the scan to the last three months. Skip this paragraph entirely if I left +it blank. + + +Based on what you read and what I told you, please: + +1. Tell me which of my recurring items would actually pay off as a Spacedock + workflow, and which would not. +2. Propose two or three example commissions as full copy-paste mission strings + I can hand to `/spacedock:commission`. +3. Discuss which one to start with and why. +4. Call out anything that should NOT be a Spacedock workflow (one-shot work, + single skill calls, things that do not have natural pause points). +5. End with one concrete next step I can do in the next ten minutes. +``` + +## Notes on making this work + +1. Be specific about your recurring work. "Email" is not enough. 
"Triage 60 to 100 work emails every morning, route receipts to a tax folder, escalate customer-support smell to myself with a proposed reply" is. +2. Name constraints. Time budget per session, sensitivity rules (do not actually book things, do not actually file taxes), tools you already have set up (`gws-cli`, `gh`, Linear MCP, Notion MCP). +3. If you have local history, point Claude at it. Otherwise skip that paragraph; the prompt still works without it. +4. Ask Claude to start small. One workflow run for two weeks beats four workflows on day one. +5. Read `EXAMPLES.md` after Claude proposes a mission. Compare its mission string against the example closest to your persona to sanity-check stage names and flags. +6. Tell Claude to spell out `feedback-to:` on every gate that should bounce back on rejection. Without that flag, a rejection has no defined bounce target and the work item stalls. + +## Variant: Developer + +```markdown +I have Spacedock checked out locally at ``. Please read its +`README.md` and `docs/USAGE.md` first. + +My recurring developer work: +- Feature development on two or three active repos. +- Code refactoring passes when a module gets unwieldy. +- Code quality improvements (typing, tests, lint debt). +- Pull request reviews, both mine and others'. + +Mine my local Claude history for patterns. Claude Code stores per-project +session histories under `~/.claude/projects/`. Look at directories matching my +active repos. The directory names encode the absolute path with slashes +replaced by dashes; keep the prefix that matches my layout: + +- `~/.claude/projects/-Users--repos--` +- `~/.claude/projects/-Users--repos--` +- `~/.claude/projects/-Users--repos--` + +Or, if my active repos are: , find the matching directories +yourself. + +Limit the scan to the last three months. I care about what kinds of work I +actually do repeatedly, not one-off sessions. + +Then please: + +1. 
Give me directions on how to use Spacedock effectively for this kind of work. +2. Propose two or three example commissions as full copy-paste mission strings. +3. Specifically propose at least one workflow that uses worktrees for isolation + on a code-bearing stage, and at least one that uses an adversarial review + gate (`fresh: true` + `gate: true` + `feedback-to: `) so review + does not run in the same context as implementation. +4. On every gate that should bounce on rejection, name `feedback-to: ` + explicitly so the rejection routes back instead of exiting. +5. Suggest which workflow to start with and why. +6. Call out anything I do that should stay a one-shot skill call, not a workflow. +7. End with one concrete next step. +``` + +## Variant: Email triager + +```markdown +I have Spacedock checked out locally at ``. Please read its +`README.md` and `docs/USAGE.md` first. + +My recurring work: +- Triage Gmail every morning. Roughly messages, mix of work, customer + support, vendors, and newsletters. +- Reply to a long tail of customer-support emails that need a real answer. +- Archive newsletters I do not act on. +- Sort receipts into a folder for monthly bookkeeping. + +Tools I have set up: `gws-cli` for Gmail. + +Constraints: +- Do not auto-reply. Drafts only. +- Surface a proposal for the Captain (me) to approve before anything is sent or + archived in bulk. +- Sensitive senders (named contacts, my manager, anything tagged Important) go + to a manual queue, not the auto-archive. +- On every gate that should bounce on rejection, name `feedback-to: ` + explicitly. A rejection without `feedback-to:` has no bounce target. + +Please: + +1. Propose one commission as a full copy-paste mission string to start with. +2. Propose one optional second commission for later if the first works. +3. Tell me which stages should have gates and why. +4. End with one concrete next step. +``` + +## Variant: Trip planner + +```markdown +I have Spacedock checked out locally at ``. 
Please read its +`README.md` and `docs/USAGE.md` first. + +My recurring work: +- Plan multi-week trips a few times a year. +- Research destinations (neighborhoods, transit, food, day trips). +- Draft itineraries that the first day of the trip will not immediately break. +- Identify the booking decisions that need to happen and in what order. +- Build per-trip packing lists from the locked itinerary. + +Constraints: +- Do not actually book anything. Ever. +- Collect options with prices and tradeoffs, then surface a decision pass for + the Captain (me) to make the call. +- Keep one work-item file per trip so I can pick the same trip up next weekend. +- On every gate that should bounce on rejection, name `feedback-to: ` + explicitly so rejection has a defined bounce target. + +Please: + +1. Propose one commission as a full copy-paste mission string for my next + planned trip: . +2. Propose a small variation of the same workflow for shorter trips (a long + weekend), since the booking-decision stage probably collapses. +3. Tell me which stages should be gates. +4. End with one concrete next step. +``` + +## Variant: Household and finance + +```markdown +I have Spacedock checked out locally at ``. Please read its +`README.md` and `docs/USAGE.md` first. + +My recurring work: +- Track recurring bills and subscription renewals so nothing surprises me. +- Categorize transactions for year-end tax prep. +- Manage kids' school paperwork, forms, and appointments. +- Review household admin once a week so the backlog does not grow. + +Constraints: +- Do not pay anything. Do not file anything. +- Produce a clear summary I review before I act. +- Anything financial goes through an explicit Captain-approval gate. +- On every gate that should bounce on rejection, name `feedback-to: ` + explicitly so rejection has a defined bounce target. + +Please: + +1. Propose one commission as a full copy-paste mission string focused on the + next concrete pain: . +2. 
Tell me which stages should be gates and which should be auto-advance. +3. Call out anything in my list that should stay a one-shot, not a workflow. +4. End with one concrete next step I can do in ten minutes. +``` + +## Variant: Content creator + +```markdown +I have Spacedock checked out locally at ``. Please read its +`README.md` and `docs/USAGE.md` first. + +My recurring work: +- Capture ideas as they come in (notes, voice memos, links). +- Draft pieces from captured ideas. +- Edit drafts (structure pass, line edit, tighten). +- Fact-check claims and links before publishing. +- Publish to . +- Post to social after publishing. + +My publishing cadence: . + +Constraints: +- Do not publish without an approval gate. The Captain (me) signs off on the + final draft before anything goes live. +- Fact-check stage runs against a clean Ensign so it cannot rubber-stamp the + drafting Ensign's claims. +- On every gate that should bounce on rejection, name `feedback-to: ` + explicitly so rejection has a defined bounce target. + +Please: + +1. Propose one commission as a full copy-paste mission string that fits my + actual cadence. +2. Tell me which stages need gates and which need a fresh Ensign. +3. End with one concrete next step. +``` + +## Variant: Researcher + +```markdown +I have Spacedock checked out locally at ``. Please read its +`README.md` and `docs/USAGE.md` first. + +My recurring work: +- Ingest papers and transcripts on an active research thread. +- Summarize per source: claims, methods, evidence quality, where it fits. +- Cross-reference across sources to find agreements, conflicts, and gaps. +- Draft write-ups for handoff (to coauthors, to a blog post, to a memo). + +My current research thread: . + +Constraints: +- The synthesis pass must be an explicit Captain-approval gate. Do not let the + workflow ship a write-up without me reviewing the cross-reference output. +- Per-source summaries can auto-advance; synthesis cannot. 
+- On every gate that should bounce on rejection, name `feedback-to: ` + explicitly so rejection has a defined bounce target. + +Please: + +1. Propose one commission as a full copy-paste mission string for my current + thread. +2. Tell me which stages need gates and which benefit from a fresh Ensign. +3. End with one concrete next step. +``` + +## After Claude responds + +1. Read `EXAMPLES.md` for the closest worked example. Compare Claude's proposed mission against it side by side and adjust stage names or flags that drift. The Household and finance variant straddles two examples; compare against both example 3 (Tax and finance prep) and example 6 (Household admin). +2. Commission the one workflow Claude recommends. Run it for two weeks before adding a second. +3. Edit the generated `{workflow-dir}/README.md` directly if a flag is wrong. A running First Officer holds the workflow in memory from boot, so close the session and reopen to pick up the edit. diff --git a/docs/USAGE.md b/docs/USAGE.md new file mode 100644 index 000000000..29fe3bf3f --- /dev/null +++ b/docs/USAGE.md @@ -0,0 +1,186 @@ +# How Spacedock works + +Spacedock has three roles. The Captain is you. The First Officer is the orchestrator agent that reads the workflow and decides what to do next. The Ensign is a worker agent dispatched to move a single work item through one stage. The basic loop: the First Officer reads the workflow, dispatches Ensigns to advance work items, and pauses at gates to ask the Captain for a call. + +## When Spacedock helps and when it does not + +Spacedock is a batch and approval layer that sits on top of skills. It does not replace skills. It pays off when work has natural pause points where you would want to glance at output before letting an agent move on, when work spans sessions so you come back tomorrow to the same item, or when you would otherwise re-run the same skill manually several times against your own output (the adversarial re-review pattern). 
+ +For one-shots, keep using ordinary skills. Looking up a Slack thread, creating a worktree, managing plugins, running `/clear` between thoughts: none of these need a workflow. Reach for Spacedock when there is a stream of similar work items moving through the same shape, or when a single item has enough phases that you want a record of what happened at each one. + +## Vocabulary + +| Concept | Plain English | +|---------|---------------| +| Mission | The purpose of the workflow: what it processes and what it delivers. | +| Work item | A single markdown file representing one thing being worked on (an email batch, a trip, a ticket, a draft). | +| Workflow | A directory of work items plus a README that defines stages, schema, and gates. | +| Stage | A named step a work item passes through (intake, design, review, etc.). | +| Gate | A pause point at a stage boundary where the Captain approves or rejects. | + +| Role | Who | +|------|-----| +| Captain | You. Defines the mission and makes the calls at approval gates. | +| First Officer | Orchestrator agent. Reads the workflow, dispatches Ensigns, surfaces gates. | +| Ensign | Worker agent. Moves a single work item through one stage. | + +## The work-item file + +A work-item file is markdown with YAML frontmatter. Here is a concrete example: + +```yaml +--- +id: 054 +title: Session debrief command +status: design +worktree: +pr: +verdict: +--- + +## Context +Background, links, what brought this in. + +## Design +Sketched approach, alternatives considered, the choice. + +## Acceptance criteria +- What "done" looks like. +- What must NOT regress. + +## Captain feedback +(filled in when a gate rejects and bounces back) + +## Stage reports +(Ensigns append their work summaries here as the item moves through stages) +``` + +The `status` field drives the stage. The First Officer reads it to know what to dispatch next and which gate, if any, applies. + +The body grows as the item moves through stages. 
Each Ensign appends to it. Nothing is lost across sessions because the file holds the state, not the agent's memory. + +## Stages and the YAML schema + +Each workflow's `README.md` has a YAML block defining stages. Here is a representative block: + +```yaml +stages: + defaults: + worktree: false + concurrency: 2 + states: + - name: backlog + initial: true + - name: design + gate: true + - name: implementation + worktree: true + concurrency: 1 + - name: validation + worktree: true + fresh: true + feedback-to: implementation + - name: ship + parked: true + - name: done + terminal: true +``` + +| Flag | What it does | Default | +| --- | --- | --- | +| `initial: true` | Where new work items land when created. | false | +| `gate: true` | First Officer pauses and asks the Captain to approve or reject before advancing. | false | +| `worktree: true` | Stage runs inside an isolated git worktree. | false | +| `concurrency: N` | Maximum simultaneously-active worktree dispatches into this stage. Has no effect on stages without `worktree: true`. | 2 | +| `fresh: true` | Dispatches a brand-new Ensign with no prior session context (the manual `/clear` between phases). | false | +| `feedback-to: ` | On rejection at a gate, status routes back to the named stage with the Captain's feedback included in the next Ensign's prompt. | absent | +| `parked: true` | Captain-facing convention marking a stage that is expected to wait on an external signal (PR merge, reply, an out-of-band action). The runtime does not enforce parking; a parked stage advances when the Captain or a mod transitions the entity out of it. | false | +| `terminal: true` | End of the workflow. | false | +| `agent: ` | Override the default agent (`spacedock:ensign`) for this stage. Useful for routing a stage to a specialized agent like `spacedock:first-officer` for orchestration work. | `spacedock:ensign` | +| `model: ` | Force a specific Claude model for the Ensign at this stage (e.g. `haiku`, `sonnet`, `opus`). 
Inherits from `stages.defaults.model` if set, otherwise uses the Ensign's default. | inherits | + +The YAML is the artifact. The commission mission string is the spec. Running `/spacedock:commission` writes the YAML from the mission. If commission gets a flag wrong, edit the YAML by hand. The First Officer reads the workflow README at startup; a running session uses its in-memory copy of the workflow, so hand edits take effect on the next First Officer boot (close the session and reopen). The status binary always re-reads from disk, so `status --boot` and friends pick up the edit immediately. + +Set `feedback-to:` on any gate that should bounce work back to an earlier stage on rejection. Without `feedback-to:`, a rejection has no defined bounce target. On rejection, the Captain gives a one-line reason at the gate prompt; longer feedback goes in the entity body under `## Captain feedback` before rejecting. The next Ensign reads both. The runtime caps feedback cycles at three rounds per stage; after that the entity escalates rather than looping forever. + +The workflow README also carries an `id-style:` frontmatter field, set by commission, that chooses how new work items get their IDs: `sequential` (zero-padded numbers, the default), `sd-b32` (short collision-resistant IDs for collaborative or worktree-heavy workflows), or `slug` (kebab-case derived from titles or external identifiers like a Linear ticket or GitHub PR number). Stage names must match `[a-z0-9][a-z0-9-]*[a-z0-9]` (lowercase, kebab-case, no underscores); `status --validate` enforces this. + +The workflow directory itself is wherever you ran `/spacedock:commission` from. It is a normal directory inside your project; you can move it, copy it, commit it, or delete it. Worktrees live at `.worktrees/-` under the repo root and clean up on terminal merge. 
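To see what the stage-name rule accepts, here is a small Python sketch that applies the quoted pattern with a whole-string match. It only mirrors the pattern as written above; `status --validate` remains the actual check, and the helper name `is_valid_stage_name` is an assumption made for the example. One consequence worth noticing: taken literally, the pattern needs at least two characters, so a single-letter name would not match. Whether the real validator behaves the same way is not confirmed here.

```python
import re

# Pattern copied from the text above. As written it requires at least two
# characters: one for the leading class and one for the trailing class.
STAGE_NAME = re.compile(r"[a-z0-9][a-z0-9-]*[a-z0-9]")


def is_valid_stage_name(name):
    """True when the whole name matches the quoted stage-name pattern."""
    return STAGE_NAME.fullmatch(name) is not None


for candidate in ["review", "pr-merge", "reviewing_now", "Review", "-draft"]:
    print(candidate, is_valid_stage_name(candidate))
# review True
# pr-merge True
# reviewing_now False
# Review False
# -draft False
```

The underscore, the capital letter, and the leading hyphen are each enough to fail the match, which matches the "lowercase, kebab-case, no underscores" description.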
+ +## Approval gates and adversarial review + +When a stage has `gate: true`, the First Officer pauses, presents the Ensign's stage report (findings, verdicts, artifacts, anomalies), and asks the Captain to approve or reject. You have three responses: + +1. Approve as-is. The next stage runs. +2. Edit the entity body, then approve. Your edits carry forward and the next stage uses them. +3. Reject. If `feedback-to: ` is set on this gate, the item routes back to that prior stage with your one-line gate-prompt reason and any `## Captain feedback` you added to the entity body. Without `feedback-to:`, the rejection has no defined bounce target. + +Adversarial review is a stage configured to push back instead of confirm. Combine `gate: true`, `fresh: true`, and `feedback-to:` on a review stage. A clean Ensign reads the work cold, the Captain can challenge thin evidence, and rejection re-dispatches with a stronger frame. The intent is to replace the manual loop of rerunning a review skill with progressively stronger language: one stage, three flags, repeatable. + +## Refit and iteration + +Workflows are not write-once. Run a workflow for two weeks. Note which stages never fire and which gates keep bouncing the same kind of issue back. Then either edit the README YAML by hand or run `/spacedock:refit` for a guided pass. + +A few tips: + +- Use `gate: true` sparingly. Only at decision points where the agent has actually been wrong (verdicts, classifications, scope), not for things you would rubber-stamp. +- Keep stage names as buckets, not verbs. Good: `review`, `validation`, `merged`. Not good: `reviewing_now`, `awaiting_validation`. +- Four to six stages per workflow is the sweet spot. TDD does not need to be split into red, green, refactor stages. A single `implement` stage is fine. + +## Sessions, debrief, and context limits + +State lives in the work-item markdown files, not in the Ensign's session. 
When an Ensign runs out of context, Spacedock dispatches a successor that picks up from the file. + +At the end of a working session, run `/spacedock:debrief` to record what happened: commits, status changes, decisions, open issues. The next session reads the debrief and continues from there. + +The work item, not the session, is the unit of state. You can come back next week and the workflow still knows what is in flight. + +## Mods at a glance + +Mods are markdown files in `{workflow-dir}/_mods/` that declare hook handlers for lifecycle events like startup, idle, or merge. The canonical example is `mods/pr-merge.md`, which opens a pull request automatically when a completed worktree branch is ready to land. Mods extend a workflow without changing the core. Heads up: `/spacedock:commission` cannot scaffold new mods. It only copies pre-shipped mods from the plugin into `{workflow-dir}/_mods/`. Custom mods (Linear sync, GitHub PR intake, and so on) are authored by hand. See the PR review queue and Linear ticket ship examples in `EXAMPLES.md` for the patterns. + +## Codex CLI + +Spacedock works in Codex CLI through the multi-agent path, which is currently experimental. The Claude Code path is the primary supported surface. + +```bash +git clone https://github.com/clkao/spacedock.git /path/to/spacedock +cd /path/to/spacedock +``` + +Then start Codex with multi-agent support enabled, and install Spacedock from the repo-local marketplace entry. The catalog lives at `.agents/plugins/marketplace.json` and points at `./plugins/spacedock`, which is a checked-in symlink to the repository root so Codex loads the real plugin package directly. The authoritative plugin manifest is `.codex-plugin/plugin.json`. The exact Codex install command varies by version; see your Codex docs for the current plugin install path. 
+ +Legacy fallback: older Codex setups that predate the repo-local marketplace can still expose Spacedock by manually symlinking the skills directory: + +```bash +mkdir -p ~/.agents/skills +ln -s /path/to/spacedock/skills ~/.agents/skills/spacedock +``` + +The `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json` files remain as synchronized legacy mirrors of the Codex-first metadata for migration compatibility. + +Once installed, prompt Codex to use the first-officer skill: + +``` +Use the spacedock:first-officer skill to run /spacedock:commission in this directory. +``` + +## Running Spacedock safely + +- Run Spacedock inside a sandbox. Recommended options: `agent-safehouse` (macOS), `packnplay`, a devcontainer, or a VM. +- Approve at gates with care. Approval is irreversible: the next stage executes as soon as you say yes. If you are not sure, reject; the bounce flow is the recovery mechanism. +- Run `git status` before approving a stage that ran in a worktree if you suspect uncommitted local changes. +- Stop a running workflow by closing the Claude Code session (Ctrl-D or `/quit`). The First Officer halts; work-item files keep the in-flight state for next time. + +## Cost, data, and recovery + +- **Cost.** Each Ensign dispatch is a Claude session. A workflow that runs many Ensigns (intake every few minutes, fresh adversarial review, successors on context limit) will spend tokens at the rate of those sessions. Use `model: haiku` on lightweight stages to cap cost; reserve Sonnet or Opus for stages that need it. +- **Data.** Inbox-, calendar-, and document-touching workflows send the data they read to Claude as part of the Ensign's session. Treat anything in a work-item file or in a stage report as something that has been read by Claude. If your organization restricts what may be sent to third-party LLMs, scope your workflows accordingly. 
+- **OAuth scopes.** Tools like `gws-cli` ask for Google OAuth scopes during their own setup, not through Spacedock. Read the tool's setup notes before authorising. Revocation is in your Google account's third-party access settings, not in Spacedock. +- **Mistakes.** Ensigns can be wrong. Build in protection at the workflow level: keep the destructive stage (send, file, book, publish) gated; let the Ensign propose the action and only execute on the Captain's approval. Email triage in `EXAMPLES.md` is shaped this way: `intake` writes a proposal, `approval` gates, only `execute` touches Gmail. If an action you approved turns out to be wrong, recover with the touched system's own tools (Gmail trash, `git revert`, the upstream service's undo). +- **Multiple workflows.** Spacedock does not orchestrate across workflows. Each is its own directory with its own First Officer session. Open one at a time. + +## Where to go next + +- `EXAMPLES.md` for eight worked examples (household, knowledge work, and three developer workflows). +- `PROMPTS.md` for an Initiating Prompt template that asks Claude to look at your recurring work and propose workflows shaped to it. diff --git a/docs/superpowers/specs/2026-05-20-readme-refactor-newcomer-friendly.md b/docs/superpowers/specs/2026-05-20-readme-refactor-newcomer-friendly.md new file mode 100644 index 000000000..0e60f837b --- /dev/null +++ b/docs/superpowers/specs/2026-05-20-readme-refactor-newcomer-friendly.md @@ -0,0 +1,137 @@ +--- +title: README refactor for newcomers (developer and non-developer) +date: 2026-05-20 +status: approved +--- + +# Goal + +Replace the current developer-leaning `README.md` with a doc set that lets a first-time reader (developer or not) answer two questions fast: + +1. What is Spacedock? +2. How do I use it? + +The current README assumes the reader is a working developer already comfortable with Claude Code, plugins, and the Star Trek metaphor. 
New users repeatedly trip on "what is this actually for" before they ever reach the install step. Spacedock is general purpose. Email triage, trip planning, tax and finance prep, content publishing, research synthesis, household admin, and job search all fit. The docs should show that range up front. + +# Audience + +Lead with non-developer framing; cover developers second. Both audiences get worked examples. The Star Trek terms (Captain, First Officer, Ensign) stay; they get explained once in the README and then used freely. + +# Doc set + +Five files. Cross-linked. Each one has a single clear job. + +## `README.md` + +Target length: roughly 120 lines. + +Sections (in order): + +1. One-sentence positioning that does not assume the reader is a developer. +2. "Is this for me?" with three short scenarios: a household example, a knowledge-worker example, a developer example. The point is to broadcast range. +3. "What is Spacedock?" Two paragraphs. Introduces the Captain / First Officer / Ensign metaphor exactly once with the plain-English equivalent in parentheses. Names what is different about it (approval gates with evidence, adversarial review, batching, learning workflow, isolation, work surviving context limits). +4. Five-minute quick start. Install Claude Code plugin, run one commission example, see output. The default example is the universal email triage case (works for almost anyone). Below that, a developer quick start for users who want to skip ahead. +5. Where to go next: GETTING_STARTED, USAGE, EXAMPLES, PROMPTS. +6. Licence. + +What the README does NOT do: explain stage flags, explain the YAML schema, list every concept, walk through Codex setup, document mods. Those live in USAGE. + +## `docs/GETTING_STARTED.md` + +Target length: roughly 180 lines. + +A walkthrough for the very first run. 
Picks one universal example (email triage) and one developer example (PR review) and shows them work end to end, including: + +- Install +- Commission with the example mission string (copy-paste) +- What you should see in the terminal as the First Officer starts +- The first gate (Captain decision point) with sample output +- What happens after approval, after rejection +- How to end the session and resume it tomorrow (`/spacedock:debrief`) +- Common first-run gotchas + +The whole point is: someone runs through this once, they have done it, they have value. No mental model required first. + +## `docs/USAGE.md` + +Target length: roughly 250 lines. + +The mental model and reference. Sections: + +1. When Spacedock helps and when it does not (paraphrased from the Notion design guide). +2. Vocabulary: Mission, Work item, Workflow, Stage, Gate. Captain, First Officer, Ensign. Plain-English first, jargon second. +3. The work-item file. What goes in the frontmatter, what goes in the body, how it evolves through stages. +4. Stages and the YAML schema. Each stage flag explained with one concrete sentence: `gate`, `worktree`, `fresh`, `feedback-to`, `parked`, `terminal`, `initial`, `concurrency`. Real example block at the end. +5. Approval gates and the adversarial review pattern. How rejection feedback flows back to the previous stage. +6. Refit and iteration. Workflows are not write-once. After two weeks of use, edit the YAML by hand or run `/spacedock:refit`. +7. Sessions, debrief, and context limits. Why work does not die at the context limit. +8. Mods at a glance. Pointer to the pr-merge mod as the canonical example. Note that mods are author-by-hand only; commission does not generate them. +9. Codex CLI path (short). + +## `docs/EXAMPLES.md` + +Target length: roughly 400 lines. + +The cookbook. Eight worked examples. 
Each example has the same shape: + +- Who this is for (one sentence) +- The recurring pain it removes +- The mission string (copy-paste, fenced) +- The stages and what each gate decides +- What success looks like after two weeks of use + +Examples in order: + +1. Email triage (Gmail via gws-cli, escalate-to-human gate) +2. Trip planning (research, itinerary draft, booking checkpoint, packing list) +3. Tax and finance prep (document intake, categorize, deductions review, summary for accountant) +4. Content publishing (idea capture, draft, edit, fact-check gate, publish) +5. Research synthesis (paper or source ingest, summarize, cross-reference, write-up) +6. Household admin (recurring bills, renewals, appointments, parental-school paperwork) +7. Job search (role intake, tailor materials, apply, follow-up cadence) +8. Developer track: PR review queue, Linear ticket ship workflow, cross-repo upgrade coordination (each presented compactly; deep-linked to the Notion design guide content) + +## `docs/PROMPTS.md` + +Target length: roughly 200 lines. + +Three parts: + +1. The fill-in-the-blank Initiating Prompt template. Designed to be pasted into Claude Code (or Codex). Asks Claude to read the local Spacedock repo, ask discovery questions about the user's recurring work, and recommend two or three workflows tailored to that work. +2. Notes on how to make it produce good answers (be specific about the recurring work, give Claude permission to look at recent history if the user keeps logs, name constraints like time budget). +3. Six worked variants. Each variant is a complete copy-paste prompt that personalises the template for: developer (the original Notion variant, sanitised), email triager, trip planner, household and finance, content creator, researcher. + +# Style guardrails + +These apply to every doc. Reviewer #2 enforces them. + +- Zero em-dashes (the `—` character). Use a period, a comma, parentheses, or a colon. +- No emoji in body copy. 
+- ASCII quotes (`'` and `"`), not curly quotes. +- No `->` arrow where the word `to` works. +- Sentence case headings. Not Title Case Everywhere. +- Banned filler words: `robust`, `leverage`, `utilize`, `delve`, `in essence`, `comprehensive`, `seamless`, `powerful`, `cutting-edge`, `unlock`, `empower`, `streamline`, `harness`, `realm`, `landscape`, `journey`, `navigate the`, `dive deep`, `at the end of the day`. +- No reflexive `However` / `Moreover` / `Furthermore` paragraph openers. +- No closing summary paragraph that restates the section. +- Show, do not claim. If something is "easy" or "powerful," demonstrate it instead of saying so. +- Star Trek terms (Captain, First Officer, Ensign) introduced once in `README.md` with their plain-English equivalents in parentheses, then used freely in every doc afterwards. + +# Branching and PR + +- Branch: `docs/readme-refactor-newcomer-friendly` (already created from `main`). +- PR targets the `gcko/spacedock` fork, not the `clkao/spacedock` upstream. + +# Process + +1. Five writer agents in parallel, each handed: this spec, the relevant Notion excerpt, the existing README, and the style guardrails. Each writes one of the five docs. +2. Three review agents in parallel, each handed: all five drafts. + - Clarity reviewer (does a first-time reader get it? is TTV under five minutes?) + - AI-tell hygiene reviewer (em-dash sweep, banned word sweep, structural tells) + - Accuracy reviewer (technical claims cross-checked against the repo) +3. Integrate review fixes. Manual `grep -n "—"` final sweep across all five files to verify zero em-dashes. Commit. + +# Out of scope + +- Rewriting the existing in-repo docs under `docs/superpowers/`, `docs/plans/`, `docs/research/`. They are working artefacts, not user-facing. +- Updating CONTRIBUTING.md or AGENTS.md (none of those changes are needed for the new-user path). +- Building a website. Just markdown in the repo. 
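
The style guardrails and the final sweep in the process section lend themselves to a small script. What follows is a sketch under stated assumptions, not the tooling the review agents actually use: the `sweep` function and its output shape are hypothetical, and it covers only three of the guardrails (em-dashes, curly quotes, and the banned filler list).

```python
import re

EM_DASH = "\u2014"
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"

# Banned filler list copied from the style guardrails.
BANNED = (
    "robust", "leverage", "utilize", "delve", "in essence", "comprehensive",
    "seamless", "powerful", "cutting-edge", "unlock", "empower", "streamline",
    "harness", "realm", "landscape", "journey", "navigate the", "dive deep",
    "at the end of the day",
)

# Whole-word, case-insensitive match; inflected forms are not caught.
_BANNED_RE = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in BANNED) + r")\b",
    re.IGNORECASE,
)


def sweep(text):
    """Return (line_number, problem) pairs for one document's text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if EM_DASH in line:
            findings.append((lineno, "em-dash"))
        if any(ch in line for ch in CURLY_QUOTES):
            findings.append((lineno, "curly quote"))
        for match in _BANNED_RE.finditer(line):
            findings.append((lineno, "banned word: " + match.group(0).lower()))
    return findings


if __name__ == "__main__":
    sample = "A robust plan\nA plain line\n"
    for lineno, problem in sweep(sample):
        print(f"{lineno}: {problem}")
```

A script like this would also flag the spec itself, since the spec quotes the banned words in order to ban them, so it is better pointed at the five user-facing docs only.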
diff --git a/tests/test_codex_plugin_packaging.py b/tests/test_codex_plugin_packaging.py index 2b6a8e3d5..cb475aeb8 100644 --- a/tests/test_codex_plugin_packaging.py +++ b/tests/test_codex_plugin_packaging.py @@ -99,15 +99,19 @@ def test_release_script_uses_codex_files_as_authority_and_updates_legacy_mirrors def test_docs_and_skill_surfaces_describe_codex_authority_and_legacy_compatibility(): readme = read_text("README.md") + usage = read_text("docs/USAGE.md") + user_docs = readme + "\n" + usage commission = read_text("skills/commission/SKILL.md") refit = read_text("skills/refit/SKILL.md") debrief = read_text("skills/debrief/SKILL.md") - assert ".codex-plugin/plugin.json" in readme - assert ".agents/plugins/marketplace.json" in readme - assert "plugins/spacedock" in readme - assert "~/.agents/skills/spacedock" in readme - assert "legacy" in readme.lower() + # README hands off Codex setup detail to docs/USAGE.md, so the + # required strings can live in either user-facing doc. + assert ".codex-plugin/plugin.json" in user_docs + assert ".agents/plugins/marketplace.json" in user_docs + assert "plugins/spacedock" in user_docs + assert "~/.agents/skills/spacedock" in user_docs + assert "legacy" in user_docs.lower() for text in (commission, refit, debrief): assert ".codex-plugin/plugin.json" in text