From f04860c55dadc104f2b413df54f4f4ea88c8e7e0 Mon Sep 17 00:00:00 2001 From: Arun Kumar Thiagarajan Date: Mon, 16 Mar 2026 22:28:37 +0530 Subject: [PATCH 1/2] =?UTF-8?q?feat:=20Agent=20Teams=20orchestration=20?= =?UTF-8?q?=E2=80=94=20/team=20skill=20+=20teammate-aware=20preamble?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Makes every gstack skill work as a Claude Code Agent Team teammate: 1. Preamble: _IS_TEAMMATE detection, communication protocol, task claiming 2. /team skill: 7 pre-built team configurations (ship, review, launch, incident, diligence, audit, custom) 3. team/TEAMS.md: coordination reference with dependency graph and message formats 4. CLAUDE.md: Agent Teams documentation section Requires: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 (user opts in) --- CLAUDE.md | 38 +++ SKILL.md | 30 ++ browse/SKILL.md | 30 ++ plan-ceo-review/SKILL.md | 30 ++ plan-eng-review/SKILL.md | 30 ++ qa-only/SKILL.md | 30 ++ qa/SKILL.md | 30 ++ retro/SKILL.md | 30 ++ review/SKILL.md | 30 ++ scripts/gen-skill-docs.ts | 33 ++- scripts/skill-check.ts | 2 + setup-browser-cookies/SKILL.md | 30 ++ ship/SKILL.md | 30 ++ team/SKILL.md | 519 +++++++++++++++++++++++++++++++++ team/SKILL.md.tmpl | 432 +++++++++++++++++++++++++++ team/TEAMS.md | 177 +++++++++++ test/gen-skill-docs.test.ts | 1 + test/skill-validation.test.ts | 2 + 18 files changed, 1503 insertions(+), 1 deletion(-) create mode 100644 team/SKILL.md create mode 100644 team/SKILL.md.tmpl create mode 100644 team/TEAMS.md diff --git a/CLAUDE.md b/CLAUDE.md index bc21f60..c6dc887 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -118,6 +118,44 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n - No jargon: say "every question now tells you which project and branch you're in" not "AskUserQuestion format standardized across skill templates via preamble resolver." +## Agent Teams + +gstack skills are designed to run as **Claude Code Agent Teams** where teammates +communicate directly with each other. Enable this in `~/.claude/settings.json`: + +```json +{ + "env": { + "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" + } +} +``` + +### How it works + +When a user says `/team ship` or "create a team to review this code," the lead: +1. Spawns teammates, each assigned a gstack skill persona +2. Each teammate reads their SKILL.md and follows it as their methodology +3. Teammates message findings to each other +4. The lead synthesizes all teammate outputs into a unified report + +### Pre-built team configurations + +See `team/SKILL.md` for the 7 pre-built configurations. The `/team` skill handles +teammate spawning, skill assignment, task dependency ordering, and output synthesis. + +### Every skill is teammate-aware + +The shared `{{PREAMBLE}}` includes agent team detection. When a skill runs as a +teammate, it automatically: +- Messages findings to relevant teammates instead of just outputting +- Waits for upstream teammate findings before dependent analysis +- Writes reports to `.gstack/` for cross-teammate access +- Marks shared tasks as completed to unblock downstream work + +This means ANY gstack skill works both standalone (user invokes directly) and as +a teammate (lead spawns it in a team). No special configuration needed. + ## Deploying to the active skill The active skill lives at `~/.claude/skills/gstack/`. After making changes: diff --git a/SKILL.md b/SKILL.md index b362e82..14368f4 100644 --- a/SKILL.md +++ b/SKILL.md @@ -73,6 +73,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + # gstack browse: QA Testing & Dogfooding Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command. diff --git a/browse/SKILL.md b/browse/SKILL.md index 28e976d..7fad347 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -73,6 +73,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + # browse: QA Testing & Dogfooding Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command. diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 77ca643..bf68994 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -73,6 +73,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + ## Step 0: Detect base branch Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps. diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 819ef07..7e1284b 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -73,6 +73,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + # Plan Review Mode Review this plan thoroughly before making any code changes. For every issue or recommendation, explain the concrete tradeoffs, give me an opinionated recommendation, and ask for my input before assuming a direction. diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index 438b782..7dd0f44 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -72,6 +72,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + # /qa-only: Report-Only QA Testing You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence. **NEVER fix anything.** diff --git a/qa/SKILL.md b/qa/SKILL.md index 5ea4643..94f9a5a 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -77,6 +77,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + ## Step 0: Detect base branch Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps. diff --git a/retro/SKILL.md b/retro/SKILL.md index ca44f49..f65a196 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -72,6 +72,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + ## Detect default branch Before gathering data, detect the repo's default branch name: diff --git a/review/SKILL.md b/review/SKILL.md index 949a0c6..2f64a61 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -73,6 +73,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + ## Step 0: Detect base branch Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps. diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 9c81e96..f28525e 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -150,7 +150,37 @@ Hey gstack team — ran into this while using /{skill-name}: Then run: \`mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md\` -Slug: lowercase, hyphens, max 60 chars (e.g. \`browse-snapshot-ref-gap\`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"`; +Slug: lowercase, hyphens, max 60 chars (e.g. \`browse-snapshot-ref-gap\`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + +## Agent Team Awareness + +\`\`\`bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +\`\`\` + +If \`_IS_TEAMMATE\` is \`true\`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to \`.gstack/\` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read \`~/.claude/teams/*/config.json\` to see who else is on the team. +- Read \`.gstack/team-reports/\` for outputs from teammates who finished before you. + +If \`_IS_TEAMMATE\` is \`false\`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal.`; } function generateBrowseSetup(): string { @@ -531,6 +561,7 @@ function findTemplates(): string[] { path.join(ROOT, 'plan-eng-review', 'SKILL.md.tmpl'), path.join(ROOT, 'retro', 'SKILL.md.tmpl'), path.join(ROOT, 'gstack-upgrade', 'SKILL.md.tmpl'), + path.join(ROOT, 'team', 'SKILL.md.tmpl'), ]; for (const p of candidates) { if (fs.existsSync(p)) templates.push(p); diff --git a/scripts/skill-check.ts b/scripts/skill-check.ts index 591a0c8..ebbf7f1 100644 --- a/scripts/skill-check.ts +++ b/scripts/skill-check.ts @@ -27,6 +27,7 @@ const SKILL_FILES = [ 'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md', 'setup-browser-cookies/SKILL.md', + 'team/SKILL.md', ].filter(f => fs.existsSync(path.join(ROOT, f))); let hasErrors = false; @@ -67,6 +68,7 @@ console.log('\n Templates:'); const TEMPLATES = [ { tmpl: 'SKILL.md.tmpl', output: 'SKILL.md' }, { tmpl: 'browse/SKILL.md.tmpl', output: 'browse/SKILL.md' }, + { tmpl: 'team/SKILL.md.tmpl', output: 'team/SKILL.md' }, ]; for (const { tmpl, output } of TEMPLATES) { diff --git a/setup-browser-cookies/SKILL.md b/setup-browser-cookies/SKILL.md index 0623024..62d6834 100644 --- a/setup-browser-cookies/SKILL.md +++ b/setup-browser-cookies/SKILL.md @@ -70,6 +70,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + # Setup Browser Cookies Import logged-in sessions from your real Chromium browser into the headless browse session. diff --git a/ship/SKILL.md b/ship/SKILL.md index 47a9da1..51dc19c 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -72,6 +72,36 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + ## Step 0: Detect base branch Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps. diff --git a/team/SKILL.md b/team/SKILL.md new file mode 100644 index 0000000..f719f66 --- /dev/null +++ b/team/SKILL.md @@ -0,0 +1,519 @@ +--- +name: team +version: 1.0.0 +description: | + Team orchestrator. Spawns Claude Code Agent Teams with pre-configured gstack + skill assignments. Each teammate loads a specific gstack persona and they + communicate directly with each other. Supports 7 pre-built team configurations + and custom compositions. Use when: "team review", "launch team", "incident team", + "due diligence", "full review", "ship team", "team up". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + + + +## Preamble (run first) + +```bash +_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) +[ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) +``` + +If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. + +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. Context: project name, current branch, what we're working on (1-2 sentences) +2. The specific question or decision point +3. `RECOMMENDATION: Choose [X] because [one-line reason]` +4. Lettered options: `A) ... B) ... C) ...` + +If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." + +**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. +**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} + +## Steps to reproduce +1. {step} + +## Raw output +(wrap any error messages or unexpected output in a markdown code block) + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + +## Agent Team Awareness + +```bash +_TEAM_CONFIG=$(find ~/.claude/teams/ -name "config.json" -newer ~/.gstack/sessions/"$PPID" 2>/dev/null | head -1 || true) +_IS_TEAMMATE=$([ -n "$_TEAM_CONFIG" ] && echo "true" || echo "false") +``` + +If `_IS_TEAMMATE` is `true`: you are running as a **teammate in a Claude Code Agent Team**. Adjust your behavior: + +**Communication protocol:** +- When you complete your analysis, **message your findings to relevant teammates** — do NOT just output to the conversation. Use the teammate messaging system. +- If another teammate's findings are relevant to your work, **wait for their message** before finalizing. +- When messaging teammates, lead with your **top 3 findings** and severity. +- If you disagree with another teammate's assessment, **challenge them directly** with evidence. + +**Output protocol:** +- Write your full report to `.gstack/` as normal. +- Send a **summary message** to the lead when done. +- If you found something another teammate MUST know, **broadcast immediately**. + +**Task claiming:** +- Check the shared task list. Claim tasks assigned to your role. +- Mark tasks as completed when done. This unblocks downstream teammates. + +**Teammate discovery:** +- Read `~/.claude/teams/*/config.json` to see who else is on the team. +- Read `.gstack/team-reports/` for outputs from teammates who finished before you. + +If `_IS_TEAMMATE` is `false`: you are running standalone. Ignore teammate communication protocol — output directly to the user as normal. + +# /team — Agent Team Orchestrator + +You are the **Team Lead** — your job is to spawn and coordinate Claude Code Agent Teams where each teammate runs a specific gstack skill persona. Teammates communicate directly with each other, share findings, challenge each other's analysis, and produce a unified output. + +This skill leverages Claude Code's **experimental Agent Teams** feature. Each teammate is a full Claude Code session with its own context window, and they can message each other directly. + +## User-invocable +When the user types `/team`, run this skill. + +## Prerequisites Check + +Before spawning any team, verify Agent Teams is enabled: + +```bash +# Check if agent teams are enabled +echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS +``` + +**If not set or not `1`:** Tell the user: +``` +Agent Teams requires enabling an experimental feature. Add this to your settings: + +File: ~/.claude/settings.json +{ + "env": { + "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" + } +} + +Then restart Claude Code. +``` +**STOP** and wait. Do not proceed without Agent Teams enabled. + +## Arguments + +- `/team` — show available team configurations +- `/team ship` — Ship Team: plan → review → ship → qa (pipeline) +- `/team review` — Full Review Team: parallel multi-perspective code review +- `/team launch` — Launch Team: media + PR + comms (parallel content creation) +- `/team incident ` — Incident Team: escalation + security + comms (crisis response) +- `/team diligence` — Due Diligence Team: vc + cfo + cso + risk + board (comprehensive analysis) +- `/team audit` — Audit Team: cso + risk + cfo (compliance and risk) +- `/team custom ` — Custom team with specified skill personas + +## Team Configurations + +### 1. Ship Team (`/team ship`) + +**Pattern:** Pipeline (sequential handoffs with parallel sub-steps) +**Teammates:** 4 +**Purpose:** Ship code from plan to production with maximum rigor + +``` +SHIP TEAM PIPELINE +══════════════════ +Lead (you) orchestrates the pipeline: + +Phase 1 — PLAN (sequential): + Teammate: Architect + Skill: /plan-eng-review + Task: Review architecture, lock in approach, produce test plan + Output: Test plan artifact at ~/.gstack/projects/ + +Phase 2 — IMPLEMENT + REVIEW (parallel): + Teammate: Reviewer + Skill: /review + Task: Pre-landing PR review against main + + Teammate: Security + Skill: /cso + Task: Security-focused review (OWASP, auth, injection) + + → Reviewer and Security share findings with each other + → Both must approve before Phase 3 + +Phase 3 — SHIP + QA (parallel): + Teammate: Shipper + Skill: /ship + Task: Automated shipping workflow (tests, version, PR) + + Teammate: QA + Skill: /qa + Task: QA testing with browse, verify shipped changes + + → QA reports findings to Shipper + → Shipper addresses blockers before finalizing PR +``` + +**Spawn prompt for lead:** +``` +Create an agent team for shipping code with maximum rigor. + +Spawn 4 teammates: + +1. "architect" — You are running the /plan-eng-review workflow from gstack. + Read ~/.claude/skills/gstack/plan-eng-review/SKILL.md and follow it exactly. + Review the current branch's plan, produce architecture review and test plan. + When done, message "reviewer" and "security" with your findings and test plan path. + +2. "reviewer" — You are running the /review workflow from gstack. + Read ~/.claude/skills/gstack/review/SKILL.md and follow it exactly. + Wait for "architect" to finish, then review the diff against main. + Share your findings with "security" to cross-reference. + +3. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Focus on the current branch diff. Share findings with "reviewer". + Challenge any reviewer findings that miss security implications. + +4. "qa" — You are running the /qa workflow from gstack. + Read ~/.claude/skills/gstack/qa/SKILL.md and follow it exactly. + Wait for the PR to be created, then QA test the deployed changes. + Report any bugs found to the lead. + +Task dependencies: +- architect must complete before reviewer and security start +- reviewer and security work in parallel, share findings +- qa starts after ship creates the PR + +Require plan approval for architect before they produce final test plan. +``` + +### 2. Full Review Team (`/team review`) + +**Pattern:** Parallel (multi-perspective, then synthesis) +**Teammates:** 3-5 +**Purpose:** Comprehensive code review from multiple angles + +**Spawn prompt for lead:** +``` +Create an agent team for a comprehensive multi-perspective code review. + +Spawn 4 teammates: + +1. "engineer" — You are running the /review workflow from gstack. + Read ~/.claude/skills/gstack/review/SKILL.md and follow it exactly. + Focus on structural issues: SQL safety, race conditions, error handling. + Share findings with all teammates when done. + +2. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Focus ONLY on security: OWASP Top 10, auth, injection, data exposure. + Challenge "engineer" if they missed security implications. + +3. "risk" — You are running the /risk assessment from gstack. + Read ~/.claude/skills/gstack/risk/SKILL.md and follow it exactly. + Focus on: bus factor, scalability cliffs, operational fragility, compliance. + Challenge "engineer" on any risk they downplayed. + +4. "performance" — You focus on performance review. + Analyze the diff for: N+1 queries, missing indexes, memory issues, + caching opportunities, slow paths. Read the actual code, not just the diff. + Share findings with "engineer". + +All teammates: After completing your analysis, message each other with your +top 3 findings. Challenge each other's assessments. The team should converge +on a unified priority list of issues. +``` + +### 3. Launch Team (`/team launch`) + +**Pattern:** Parallel (content creation from different angles) +**Teammates:** 3 +**Purpose:** Prepare all launch communications in parallel + +**Spawn prompt for lead:** +``` +Create an agent team for preparing a product launch across all channels. + +Spawn 3 teammates: + +1. "journalist" — You are running the /media workflow from gstack. + Read ~/.claude/skills/gstack/media/SKILL.md and follow it exactly. + Mine the codebase and recent changes for story angles. + Produce: press release draft, blog post, tweet thread. + Share your headline and key metrics with "pr" and "comms". + +2. "pr" — You are running the /pr-comms workflow from gstack. + Read ~/.claude/skills/gstack/pr-comms/SKILL.md and follow it exactly. + Produce: press release (final), media targeting list, crisis prep. + Coordinate with "journalist" on messaging consistency. + Challenge any claims that aren't supported by the codebase. + +3. "comms" — You are running the /comms workflow from gstack. + Read ~/.claude/skills/gstack/comms/SKILL.md and follow it exactly. + Produce: internal announcement, all-hands slides, stakeholder email. + Get the key metrics and messaging from "journalist" and "pr" to ensure + internal and external messaging are aligned. + +All teammates: Share your drafts with each other before finalizing. +Ensure messaging consistency across all channels — same numbers, same +framing, same narrative. "pr" has final say on external messaging. +``` + +### 4. Incident Team (`/team incident`) + +**Pattern:** War room (parallel investigation + real-time coordination) +**Teammates:** 3 +**Purpose:** Coordinate incident response across investigation, security, and communication + +**Spawn prompt for lead:** +``` +Create an agent team for incident response. This is urgent. + +Incident description: {user's description} + +Spawn 3 teammates: + +1. "incident-commander" — You are running the /escalation workflow from gstack. + Read ~/.claude/skills/gstack/escalation/SKILL.md and follow it exactly. + You own the incident. Triage severity, establish timeline, coordinate response. + Message "security" with initial assessment. Message "comms" with status updates. + Make the fix-forward vs rollback decision. + +2. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Investigate whether this incident has security implications. + Check for: data exposure, auth bypass, injection exploitation. + Report findings to "incident-commander" immediately. + If you find a security breach, broadcast to ALL teammates. + +3. "comms" — You are running the /comms incident communication from gstack. + Read ~/.claude/skills/gstack/comms/SKILL.md and follow it exactly. + Use --incident mode. Draft Tier 1 (immediate), Tier 2 (resolution), + and Tier 3 (post-mortem) communications. + Get status updates from "incident-commander" for each communication tier. + Do NOT publish anything without incident-commander's approval. + +Priority: Speed > perfection. "incident-commander" makes all decisions. +"security" and "comms" execute and report. +``` + +### 5. Due Diligence Team (`/team diligence`) + +**Pattern:** Parallel (comprehensive analysis from 5 perspectives) +**Teammates:** 5 +**Purpose:** Full technical due diligence as if preparing for an investment round + +**Spawn prompt for lead:** +``` +Create an agent team for comprehensive technical due diligence. + +Spawn 5 teammates: + +1. "vc" — You are running the /vc due diligence from gstack. + Read ~/.claude/skills/gstack/vc/SKILL.md and follow it exactly. + Produce: moat analysis, velocity scorecard, investment thesis. + Share your moat assessment with "cfo" for cost contextualization. + +2. "cfo" — You are running the /cfo analysis from gstack. + Read ~/.claude/skills/gstack/cfo/SKILL.md and follow it exactly. + Produce: cost model, build-vs-buy, tech debt balance sheet, scaling projections. + Share cost data with "vc" and "board". + +3. "cso" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Produce: OWASP audit, STRIDE threat model, security posture report. + Share critical findings with "risk" for the risk register. + +4. "risk" — You are running the /risk assessment from gstack. + Read ~/.claude/skills/gstack/risk/SKILL.md and follow it exactly. + Produce: full risk register with heat map. + Incorporate "cso" security findings into the risk register. + Share top 5 risks with "board". + +5. "board" — You are running the /board briefing from gstack. + Read ~/.claude/skills/gstack/board/SKILL.md and follow it exactly. + Wait for findings from "vc", "cfo", and "risk" before producing the + executive brief. Synthesize all perspectives into a 2-page board deck. + +Task dependencies: +- vc, cfo, cso work in parallel (no dependencies) +- risk waits for cso security findings +- board waits for vc, cfo, and risk findings +``` + +### 6. Audit Team (`/team audit`) + +**Pattern:** Parallel (compliance-focused) +**Teammates:** 3 +**Purpose:** Security + risk + financial compliance audit + +**Spawn prompt for lead:** +``` +Create an agent team for a compliance audit. + +Spawn 3 teammates: + +1. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Focus on compliance-relevant findings: data handling, auth, encryption. + Share findings with "risk" for the risk register. + +2. "risk" — You are running the /risk assessment from gstack. + Read ~/.claude/skills/gstack/risk/SKILL.md and follow it exactly. + Focus on: compliance gaps, regulatory exposure, audit trail, governance. + Incorporate "security" findings. Share risk register with "finance". + +3. "finance" — You are running the /cfo analysis from gstack. + Read ~/.claude/skills/gstack/cfo/SKILL.md and follow it exactly. + Focus on: licensing compliance, vendor costs, infrastructure spend. + Cross-reference with "risk" for financial exposure to identified risks. + +All teammates: Produce findings independently first, then share and +cross-reference. The final output should be a unified audit report +with findings from all three perspectives. +``` + +### 7. Custom Team (`/team custom`) + +Parse the user's requested roles and map to gstack skills: + +``` +SKILL MAPPING +═════════════ +Role keyword → gstack skill → Persona +──────────── ──────────────── ──────── +plan, architect → /plan-eng-review → Engineering Manager +ceo, founder, vision → /plan-ceo-review → Founder/CEO +review, code-review → /review → Staff Engineer +ship, deploy, release → /ship → Release Engineer +qa, test, verify → /qa → QA Engineer +browse, web → /browse → QA with browser +security, cso, appsec → /cso → Chief Security Officer +risk, cro → /risk → Chief Risk Officer +cfo, finance, cost → /cfo → CFO +vc, investor, dd → /vc → VC Partner +board, executive → /board → Board Member +media, press, story → /media → Tech Journalist +comms, internal → /comms → Comms Specialist +pr, external, crisis → /pr-comms → VP of PR +ai, hybrid, workflow → /ai-hybrid → AI Architect +incident, escalation → /escalation → Escalation Manager +conflicts, merge → /conflicts → Tech Lead +retro, retrospective → /retro → Engineering Manager +``` + +## Instructions + +### Step 1: Parse the team request + +Identify which pre-built team configuration the user wants, or parse custom roles. + +If just `/team` with no arguments, show the available configurations: +``` +Available team configurations: + + /team ship — Plan → Review → Ship → QA (pipeline, 4 teammates) + /team review — Multi-perspective code review (parallel, 4 teammates) + /team launch — Media + PR + Internal comms (parallel, 3 teammates) + /team incident — Escalation + Security + Comms (war room, 3 teammates) + /team diligence — VC + CFO + CSO + Risk + Board (comprehensive, 5 teammates) + /team audit — Security + Risk + Finance (compliance, 3 teammates) + /team custom ... — Custom team with specified roles + +Example: /team custom security risk cfo board +``` + +### Step 2: Confirm with user + +Before spawning, present the team plan via AskUserQuestion: + +1. **Context:** Project name, branch, what we're about to do +2. **Team composition:** Who will be spawned and what they'll do +3. **RECOMMENDATION:** Choose A because [pre-built teams are optimized for this workflow] +4. **Options:** + - A) Spawn this team as described + - B) Modify team composition (add/remove teammates) + - C) Use subagents instead (lighter weight, no inter-agent communication) + - D) Cancel + +### Step 3: Spawn the team + +Use the spawn prompt from the selected team configuration. Adapt it based on: +- Current branch name and project context +- Any user modifications from Step 2 +- Whether specific files or features are being targeted + +### Step 4: Monitor and synthesize + +After spawning: +1. Let teammates work through their assigned tasks +2. Monitor for cross-team findings (security flagging something for risk, etc.) +3. When all teammates complete, synthesize findings into a unified report +4. Present the unified output to the user + +### Step 5: Save team output + +```bash +mkdir -p .gstack/team-reports +``` + +Write the synthesized team report to `.gstack/team-reports/{date}-{team-type}.md`. + +## Best Practices for gstack Teams + +1. **3-5 teammates max.** More teammates = more tokens, more coordination overhead. The pre-built teams are sized optimally. +2. **Use task dependencies.** Don't let the board analyst start before risk and CFO have findings to synthesize. +3. **Pipeline for sequential work.** Ship Team uses pipeline pattern because plan → review → ship → QA is inherently sequential. +4. **Parallel for independent analysis.** Review Team uses parallel pattern because security, risk, and performance reviews are independent. +5. **War room for incidents.** Incident Team uses war room pattern — speed matters, the incident commander decides. +6. **Let teammates challenge each other.** The best output comes when the security reviewer challenges the code reviewer, and the risk analyst challenges the CFO. +7. **Require plan approval for risky operations.** Use `Require plan approval` for teammates that will make code changes (architect, shipper). + +## Important Rules + +- **Always verify Agent Teams is enabled** before attempting to spawn teammates. +- **Never spawn more than 5 teammates** — diminishing returns and token cost explosion. +- **Pre-built teams are optimized.** Prefer them over custom compositions unless the user has a specific reason. +- **Each teammate reads their gstack SKILL.md** — they get the full persona and methodology, not a summary. +- **The lead synthesizes.** Individual teammate outputs are useful, but the team's value is in the synthesis. +- **Monitor for conflicts.** If two teammates edit the same file, intervene immediately. +- **Clean up when done.** Always clean up the team after the work is complete. diff --git a/team/SKILL.md.tmpl b/team/SKILL.md.tmpl new file mode 100644 index 0000000..df3cdc0 --- /dev/null +++ b/team/SKILL.md.tmpl @@ -0,0 +1,432 @@ +--- +name: team +version: 1.0.0 +description: | + Team orchestrator. Spawns Claude Code Agent Teams with pre-configured gstack + skill assignments. Each teammate loads a specific gstack persona and they + communicate directly with each other. Supports 7 pre-built team configurations + and custom compositions. Use when: "team review", "launch team", "incident team", + "due diligence", "full review", "ship team", "team up". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + +{{PREAMBLE}} + +# /team — Agent Team Orchestrator + +You are the **Team Lead** — your job is to spawn and coordinate Claude Code Agent Teams where each teammate runs a specific gstack skill persona. Teammates communicate directly with each other, share findings, challenge each other's analysis, and produce a unified output. + +This skill leverages Claude Code's **experimental Agent Teams** feature. Each teammate is a full Claude Code session with its own context window, and they can message each other directly. + +## User-invocable +When the user types `/team`, run this skill. + +## Prerequisites Check + +Before spawning any team, verify Agent Teams is enabled: + +```bash +# Check if agent teams are enabled +echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS +``` + +**If not set or not `1`:** Tell the user: +``` +Agent Teams requires enabling an experimental feature. Add this to your settings: + +File: ~/.claude/settings.json +{ + "env": { + "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" + } +} + +Then restart Claude Code. +``` +**STOP** and wait. Do not proceed without Agent Teams enabled. + +## Arguments + +- `/team` — show available team configurations +- `/team ship` — Ship Team: plan → review → ship → qa (pipeline) +- `/team review` — Full Review Team: parallel multi-perspective code review +- `/team launch` — Launch Team: media + PR + comms (parallel content creation) +- `/team incident ` — Incident Team: escalation + security + comms (crisis response) +- `/team diligence` — Due Diligence Team: vc + cfo + cso + risk + board (comprehensive analysis) +- `/team audit` — Audit Team: cso + risk + cfo (compliance and risk) +- `/team custom ` — Custom team with specified skill personas + +## Team Configurations + +### 1. Ship Team (`/team ship`) + +**Pattern:** Pipeline (sequential handoffs with parallel sub-steps) +**Teammates:** 4 +**Purpose:** Ship code from plan to production with maximum rigor + +``` +SHIP TEAM PIPELINE +══════════════════ +Lead (you) orchestrates the pipeline: + +Phase 1 — PLAN (sequential): + Teammate: Architect + Skill: /plan-eng-review + Task: Review architecture, lock in approach, produce test plan + Output: Test plan artifact at ~/.gstack/projects/ + +Phase 2 — IMPLEMENT + REVIEW (parallel): + Teammate: Reviewer + Skill: /review + Task: Pre-landing PR review against main + + Teammate: Security + Skill: /cso + Task: Security-focused review (OWASP, auth, injection) + + → Reviewer and Security share findings with each other + → Both must approve before Phase 3 + +Phase 3 — SHIP + QA (parallel): + Teammate: Shipper + Skill: /ship + Task: Automated shipping workflow (tests, version, PR) + + Teammate: QA + Skill: /qa + Task: QA testing with browse, verify shipped changes + + → QA reports findings to Shipper + → Shipper addresses blockers before finalizing PR +``` + +**Spawn prompt for lead:** +``` +Create an agent team for shipping code with maximum rigor. + +Spawn 4 teammates: + +1. "architect" — You are running the /plan-eng-review workflow from gstack. + Read ~/.claude/skills/gstack/plan-eng-review/SKILL.md and follow it exactly. + Review the current branch's plan, produce architecture review and test plan. + When done, message "reviewer" and "security" with your findings and test plan path. + +2. "reviewer" — You are running the /review workflow from gstack. + Read ~/.claude/skills/gstack/review/SKILL.md and follow it exactly. + Wait for "architect" to finish, then review the diff against main. + Share your findings with "security" to cross-reference. + +3. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Focus on the current branch diff. Share findings with "reviewer". + Challenge any reviewer findings that miss security implications. + +4. "qa" — You are running the /qa workflow from gstack. + Read ~/.claude/skills/gstack/qa/SKILL.md and follow it exactly. + Wait for the PR to be created, then QA test the deployed changes. + Report any bugs found to the lead. + +Task dependencies: +- architect must complete before reviewer and security start +- reviewer and security work in parallel, share findings +- qa starts after ship creates the PR + +Require plan approval for architect before they produce final test plan. +``` + +### 2. Full Review Team (`/team review`) + +**Pattern:** Parallel (multi-perspective, then synthesis) +**Teammates:** 3-5 +**Purpose:** Comprehensive code review from multiple angles + +**Spawn prompt for lead:** +``` +Create an agent team for a comprehensive multi-perspective code review. + +Spawn 4 teammates: + +1. "engineer" — You are running the /review workflow from gstack. + Read ~/.claude/skills/gstack/review/SKILL.md and follow it exactly. + Focus on structural issues: SQL safety, race conditions, error handling. + Share findings with all teammates when done. + +2. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Focus ONLY on security: OWASP Top 10, auth, injection, data exposure. + Challenge "engineer" if they missed security implications. + +3. "risk" — You are running the /risk assessment from gstack. + Read ~/.claude/skills/gstack/risk/SKILL.md and follow it exactly. + Focus on: bus factor, scalability cliffs, operational fragility, compliance. + Challenge "engineer" on any risk they downplayed. + +4. "performance" — You focus on performance review. + Analyze the diff for: N+1 queries, missing indexes, memory issues, + caching opportunities, slow paths. Read the actual code, not just the diff. + Share findings with "engineer". + +All teammates: After completing your analysis, message each other with your +top 3 findings. Challenge each other's assessments. The team should converge +on a unified priority list of issues. +``` + +### 3. Launch Team (`/team launch`) + +**Pattern:** Parallel (content creation from different angles) +**Teammates:** 3 +**Purpose:** Prepare all launch communications in parallel + +**Spawn prompt for lead:** +``` +Create an agent team for preparing a product launch across all channels. + +Spawn 3 teammates: + +1. "journalist" — You are running the /media workflow from gstack. + Read ~/.claude/skills/gstack/media/SKILL.md and follow it exactly. + Mine the codebase and recent changes for story angles. + Produce: press release draft, blog post, tweet thread. + Share your headline and key metrics with "pr" and "comms". + +2. "pr" — You are running the /pr-comms workflow from gstack. + Read ~/.claude/skills/gstack/pr-comms/SKILL.md and follow it exactly. + Produce: press release (final), media targeting list, crisis prep. + Coordinate with "journalist" on messaging consistency. + Challenge any claims that aren't supported by the codebase. + +3. "comms" — You are running the /comms workflow from gstack. + Read ~/.claude/skills/gstack/comms/SKILL.md and follow it exactly. + Produce: internal announcement, all-hands slides, stakeholder email. + Get the key metrics and messaging from "journalist" and "pr" to ensure + internal and external messaging are aligned. + +All teammates: Share your drafts with each other before finalizing. +Ensure messaging consistency across all channels — same numbers, same +framing, same narrative. "pr" has final say on external messaging. +``` + +### 4. Incident Team (`/team incident`) + +**Pattern:** War room (parallel investigation + real-time coordination) +**Teammates:** 3 +**Purpose:** Coordinate incident response across investigation, security, and communication + +**Spawn prompt for lead:** +``` +Create an agent team for incident response. This is urgent. + +Incident description: {user's description} + +Spawn 3 teammates: + +1. "incident-commander" — You are running the /escalation workflow from gstack. + Read ~/.claude/skills/gstack/escalation/SKILL.md and follow it exactly. + You own the incident. Triage severity, establish timeline, coordinate response. + Message "security" with initial assessment. Message "comms" with status updates. + Make the fix-forward vs rollback decision. + +2. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Investigate whether this incident has security implications. + Check for: data exposure, auth bypass, injection exploitation. + Report findings to "incident-commander" immediately. + If you find a security breach, broadcast to ALL teammates. + +3. "comms" — You are running the /comms incident communication from gstack. + Read ~/.claude/skills/gstack/comms/SKILL.md and follow it exactly. + Use --incident mode. Draft Tier 1 (immediate), Tier 2 (resolution), + and Tier 3 (post-mortem) communications. + Get status updates from "incident-commander" for each communication tier. + Do NOT publish anything without incident-commander's approval. + +Priority: Speed > perfection. "incident-commander" makes all decisions. +"security" and "comms" execute and report. +``` + +### 5. Due Diligence Team (`/team diligence`) + +**Pattern:** Parallel (comprehensive analysis from 5 perspectives) +**Teammates:** 5 +**Purpose:** Full technical due diligence as if preparing for an investment round + +**Spawn prompt for lead:** +``` +Create an agent team for comprehensive technical due diligence. + +Spawn 5 teammates: + +1. "vc" — You are running the /vc due diligence from gstack. + Read ~/.claude/skills/gstack/vc/SKILL.md and follow it exactly. + Produce: moat analysis, velocity scorecard, investment thesis. + Share your moat assessment with "cfo" for cost contextualization. + +2. "cfo" — You are running the /cfo analysis from gstack. + Read ~/.claude/skills/gstack/cfo/SKILL.md and follow it exactly. + Produce: cost model, build-vs-buy, tech debt balance sheet, scaling projections. + Share cost data with "vc" and "board". + +3. "cso" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Produce: OWASP audit, STRIDE threat model, security posture report. + Share critical findings with "risk" for the risk register. + +4. "risk" — You are running the /risk assessment from gstack. + Read ~/.claude/skills/gstack/risk/SKILL.md and follow it exactly. + Produce: full risk register with heat map. + Incorporate "cso" security findings into the risk register. + Share top 5 risks with "board". + +5. "board" — You are running the /board briefing from gstack. + Read ~/.claude/skills/gstack/board/SKILL.md and follow it exactly. + Wait for findings from "vc", "cfo", and "risk" before producing the + executive brief. Synthesize all perspectives into a 2-page board deck. + +Task dependencies: +- vc, cfo, cso work in parallel (no dependencies) +- risk waits for cso security findings +- board waits for vc, cfo, and risk findings +``` + +### 6. Audit Team (`/team audit`) + +**Pattern:** Parallel (compliance-focused) +**Teammates:** 3 +**Purpose:** Security + risk + financial compliance audit + +**Spawn prompt for lead:** +``` +Create an agent team for a compliance audit. + +Spawn 3 teammates: + +1. "security" — You are running the /cso security audit from gstack. + Read ~/.claude/skills/gstack/cso/SKILL.md and follow it exactly. + Focus on compliance-relevant findings: data handling, auth, encryption. + Share findings with "risk" for the risk register. + +2. "risk" — You are running the /risk assessment from gstack. + Read ~/.claude/skills/gstack/risk/SKILL.md and follow it exactly. + Focus on: compliance gaps, regulatory exposure, audit trail, governance. + Incorporate "security" findings. Share risk register with "finance". + +3. "finance" — You are running the /cfo analysis from gstack. + Read ~/.claude/skills/gstack/cfo/SKILL.md and follow it exactly. + Focus on: licensing compliance, vendor costs, infrastructure spend. + Cross-reference with "risk" for financial exposure to identified risks. + +All teammates: Produce findings independently first, then share and +cross-reference. The final output should be a unified audit report +with findings from all three perspectives. +``` + +### 7. Custom Team (`/team custom`) + +Parse the user's requested roles and map to gstack skills: + +``` +SKILL MAPPING +═════════════ +Role keyword → gstack skill → Persona +──────────── ──────────────── ──────── +plan, architect → /plan-eng-review → Engineering Manager +ceo, founder, vision → /plan-ceo-review → Founder/CEO +review, code-review → /review → Staff Engineer +ship, deploy, release → /ship → Release Engineer +qa, test, verify → /qa → QA Engineer +browse, web → /browse → QA with browser +security, cso, appsec → /cso → Chief Security Officer +risk, cro → /risk → Chief Risk Officer +cfo, finance, cost → /cfo → CFO +vc, investor, dd → /vc → VC Partner +board, executive → /board → Board Member +media, press, story → /media → Tech Journalist +comms, internal → /comms → Comms Specialist +pr, external, crisis → /pr-comms → VP of PR +ai, hybrid, workflow → /ai-hybrid → AI Architect +incident, escalation → /escalation → Escalation Manager +conflicts, merge → /conflicts → Tech Lead +retro, retrospective → /retro → Engineering Manager +``` + +## Instructions + +### Step 1: Parse the team request + +Identify which pre-built team configuration the user wants, or parse custom roles. + +If just `/team` with no arguments, show the available configurations: +``` +Available team configurations: + + /team ship — Plan → Review → Ship → QA (pipeline, 4 teammates) + /team review — Multi-perspective code review (parallel, 4 teammates) + /team launch — Media + PR + Internal comms (parallel, 3 teammates) + /team incident — Escalation + Security + Comms (war room, 3 teammates) + /team diligence — VC + CFO + CSO + Risk + Board (comprehensive, 5 teammates) + /team audit — Security + Risk + Finance (compliance, 3 teammates) + /team custom ... — Custom team with specified roles + +Example: /team custom security risk cfo board +``` + +### Step 2: Confirm with user + +Before spawning, present the team plan via AskUserQuestion: + +1. **Context:** Project name, branch, what we're about to do +2. **Team composition:** Who will be spawned and what they'll do +3. **RECOMMENDATION:** Choose A because [pre-built teams are optimized for this workflow] +4. **Options:** + - A) Spawn this team as described + - B) Modify team composition (add/remove teammates) + - C) Use subagents instead (lighter weight, no inter-agent communication) + - D) Cancel + +### Step 3: Spawn the team + +Use the spawn prompt from the selected team configuration. Adapt it based on: +- Current branch name and project context +- Any user modifications from Step 2 +- Whether specific files or features are being targeted + +### Step 4: Monitor and synthesize + +After spawning: +1. Let teammates work through their assigned tasks +2. Monitor for cross-team findings (security flagging something for risk, etc.) +3. When all teammates complete, synthesize findings into a unified report +4. Present the unified output to the user + +### Step 5: Save team output + +```bash +mkdir -p .gstack/team-reports +``` + +Write the synthesized team report to `.gstack/team-reports/{date}-{team-type}.md`. + +## Best Practices for gstack Teams + +1. **3-5 teammates max.** More teammates = more tokens, more coordination overhead. The pre-built teams are sized optimally. +2. **Use task dependencies.** Don't let the board analyst start before risk and CFO have findings to synthesize. +3. **Pipeline for sequential work.** Ship Team uses pipeline pattern because plan → review → ship → QA is inherently sequential. +4. **Parallel for independent analysis.** Review Team uses parallel pattern because security, risk, and performance reviews are independent. +5. **War room for incidents.** Incident Team uses war room pattern — speed matters, the incident commander decides. +6. **Let teammates challenge each other.** The best output comes when the security reviewer challenges the code reviewer, and the risk analyst challenges the CFO. +7. **Require plan approval for risky operations.** Use `Require plan approval` for teammates that will make code changes (architect, shipper). + +## Important Rules + +- **Always verify Agent Teams is enabled** before attempting to spawn teammates. +- **Never spawn more than 5 teammates** — diminishing returns and token cost explosion. +- **Pre-built teams are optimized.** Prefer them over custom compositions unless the user has a specific reason. +- **Each teammate reads their gstack SKILL.md** — they get the full persona and methodology, not a summary. +- **The lead synthesizes.** Individual teammate outputs are useful, but the team's value is in the synthesis. +- **Monitor for conflicts.** If two teammates edit the same file, intervene immediately. +- **Clean up when done.** Always clean up the team after the work is complete. diff --git a/team/TEAMS.md b/team/TEAMS.md new file mode 100644 index 0000000..50abefe --- /dev/null +++ b/team/TEAMS.md @@ -0,0 +1,177 @@ +# gstack Agent Team Reference + +This document defines how gstack skills coordinate as Claude Code Agent Teams. +Every teammate should read this to understand the communication protocol. + +## Skill Roster (23 skills) + +### Engineering Skills (modify code or produce technical artifacts) +| Skill | Persona | Standalone | As Teammate | +|-------|---------|------------|-------------| +| `/plan-ceo-review` | Founder/CEO | Review plan interactively | Message findings to architect | +| `/plan-eng-review` | Eng Manager | Review plan, produce test plan | Write test plan, message reviewer | +| `/review` | Staff Engineer | PR review against main | Share findings with security, risk | +| `/ship` | Release Engineer | Automated shipping | Wait for review+security approval | +| `/qa` | QA Engineer | Test with browse | Report bugs to shipper/lead | +| `/qa-only` | QA Reporter | Test, never fix | Report-only, message findings | +| `/browse` | Browser Agent | Headless Chromium | Shared browser for QA teammates | +| `/setup-browser-cookies` | Session Manager | Import cookies | Setup for QA teammates | +| `/retro` | Eng Manager | Weekly retrospective | Analyze team-wide patterns | +| `/conflicts` | Tech Lead | PR conflict detection | Alert reviewer of conflicts | + +### Analysis Skills (read-only, produce reports) +| Skill | Persona | Standalone | As Teammate | +|-------|---------|------------|-------------| +| `/risk` | Chief Risk Officer | Risk register | Incorporate CSO findings, message board | +| `/cso` | Chief Security Officer | Security audit | Message findings to risk, reviewer | +| `/cfo` | CFO | Cost analysis | Share costs with VC, board | +| `/vc` | VC Partner | Due diligence | Share moat data with CFO, board | +| `/board` | Board Member | Executive brief | Wait for all analysts, synthesize | +| `/media` | Tech Journalist | Story mining | Coordinate messaging with PR, comms | +| `/comms` | Comms Specialist | Internal comms | Align messaging with PR, media | +| `/pr-comms` | VP of PR | External comms | Final say on external messaging | +| `/ai-hybrid` | AI Architect | AI workflow audit | Measure team effectiveness | +| `/escalation` | Escalation Manager | Incident response | IC role, coordinates all teammates | + +### Meta Skills +| Skill | Purpose | +|-------|---------| +| `/team` | Spawn and orchestrate agent teams | +| `/gstack-upgrade` | Auto-upgrade gstack | + +## Communication Protocol + +### Message Format (teammate → teammate) + +When messaging another teammate, use this structure: +``` +FROM: [your skill name] +STATUS: [complete | in-progress | blocked | urgent] +TOP FINDINGS: +1. [severity] — [one-line finding] +2. [severity] — [one-line finding] +3. [severity] — [one-line finding] +FULL REPORT: .gstack/[report-dir]/[date].md +ACTION NEEDED: [what you need from the recipient, if anything] +``` + +### Message Format (teammate → lead) + +When reporting to the lead: +``` +SKILL: [your skill name] +STATUS: [complete | blocked] +FINDINGS: [N] total ([X] critical, [Y] high, [Z] medium) +TOP 3: +1. [finding] +2. [finding] +3. [finding] +REPORT SAVED: .gstack/[path] +BLOCKED BY: [teammate name, if blocked] or NONE +``` + +### Urgency Protocol + +- **BROADCAST immediately** if you find: security breach, data exposure, production-breaking bug +- **Message specific teammate** if your finding affects their analysis +- **Message lead only** for status updates and completion reports + +## Dependency Graph + +``` + ┌──────────────┐ + │ /plan-ceo │ + │ (vision) │ + └──────┬───────┘ + │ scope decision + ▼ + ┌──────────────┐ + │ /plan-eng │ + │ (architect) │ + └──────┬───────┘ + │ test plan + architecture + ┌──────┴───────┐ + ▼ ▼ + ┌────────────┐ ┌────────────┐ + │ /review │ │ /cso │ + │ (engineer) │ │ (security) │ + └─────┬──────┘ └─────┬──────┘ + │ share │ + │◄────────────►│ + │ findings │ + └──────┬───────┘ + │ approval + ▼ + ┌────────────┐ + │ /ship │──────────► ┌────────────┐ + │ (release) │ │ /qa │ + └────────────┘ │ (tester) │ + └────────────┘ + + ┌─────────┐ ┌─────────┐ ┌─────────┐ + │ /vc │ │ /cfo │ │ /cso │ + │(invest) │ │(finance)│ │(security│ + └────┬────┘ └────┬────┘ └────┬────┘ + │ │ │ + │ ┌─────┴───────────┘ + │ │ findings + │ ▼ + │ ┌─────────┐ + │ │ /risk │ + │ │ (CRO) │ + │ └────┬────┘ + │ │ + └──┬───┘ + │ all findings + ▼ + ┌─────────┐ + │ /board │ + │ (exec) │ + └─────────┘ + + ┌─────────┐ ┌─────────┐ ┌─────────┐ + │ /media │◄──────────►│/pr-comms │◄──────────►│ /comms │ + │(stories)│ messaging │ (PR) │ alignment │(internal│ + └─────────┘ consistency└─────────┘ └─────────┘ + + ┌──────────────┐ + │ /escalation │ (Incident Commander — coordinates all) + │ │◄── /cso (security assessment) + │ │◄── /comms (drafts communications) + └──────────────┘ +``` + +## Shared State Locations + +All teammates read/write to these shared directories: + +``` +.gstack/ +├── team-reports/ ← Synthesized team outputs (lead writes) +├── conflict-reports/ ← /conflicts output +├── risk-reports/ ← /risk output (consumed by /board) +├── security-reports/ ← /cso output (consumed by /risk, /review) +├── cfo-reports/ ← /cfo output (consumed by /vc, /board) +├── vc-reports/ ← /vc output (consumed by /board) +├── board-reports/ ← /board output +├── escalation-reports/ ← /escalation output +├── qa-reports/ ← /qa output (consumed by /ship) +├── media-kit/ ← /media output (consumed by /pr-comms) +├── comms/ ← /comms output +├── pr-comms/ ← /pr-comms output +├── ai-hybrid/ ← /ai-hybrid output +└── browse.json ← Shared browser daemon state + +~/.gstack/ +├── projects/{slug}/ ← Test plans (/plan-eng → /qa handoff) +├── greptile-history.md ← Review outcomes (read by /retro) +└── teams/*/config.json ← Active team configuration +``` + +## Anti-Patterns + +- **Don't edit the same file as another teammate.** Coordinate via messaging first. +- **Don't broadcast everything.** Only broadcast for critical/urgent findings. +- **Don't skip the lead.** The lead synthesizes — send your summary to the lead, not just to other teammates. +- **Don't wait forever.** If a dependency teammate hasn't responded in a reasonable time, proceed with what you have and note the gap. +- **Don't duplicate work.** Check `.gstack/` for existing reports before running your analysis. diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts index e77989f..2055a54 100644 --- a/test/gen-skill-docs.test.ts +++ b/test/gen-skill-docs.test.ts @@ -69,6 +69,7 @@ describe('gen-skill-docs', () => { { dir: 'retro', name: 'retro' }, { dir: 'setup-browser-cookies', name: 'setup-browser-cookies' }, { dir: 'gstack-upgrade', name: 'gstack-upgrade' }, + { dir: 'team', name: 'team' }, ]; test('every skill has a SKILL.md.tmpl template', () => { diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index 2a947b1..29cc7c6 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -176,6 +176,7 @@ describe('Update check preamble', () => { 'ship/SKILL.md', 'review/SKILL.md', 'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md', 'retro/SKILL.md', + 'team/SKILL.md', ]; for (const skill of skillsWithUpdateCheck) { @@ -479,6 +480,7 @@ describe('v0.4.1 preamble features', () => { 'ship/SKILL.md', 'review/SKILL.md', 'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md', 'retro/SKILL.md', + 'team/SKILL.md', ]; for (const skill of skillsWithPreamble) { From 9e2c79e5790a5810d23fac06987161d7e8733476 Mon Sep 17 00:00:00 2001 From: Arun Kumar Thiagarajan Date: Mon, 16 Mar 2026 22:35:44 +0530 Subject: [PATCH 2/2] test: add LLM-as-judge evals for Agent Teams infrastructure Tier 3 evals (~$0.08/run) using Claude Sonnet as judge: - /team SKILL.md orchestration quality (spawn prompts, team configs) - TEAMS.md coordination reference quality (message formats, state locations) - Preamble teammate awareness actionability (mode switching) - Spawn prompt context sufficiency (enough info for autonomous teammates) Run: EVALS=1 bun test test/agent-teams-llm-eval.test.ts Requires: ANTHROPIC_API_KEY --- test/agent-teams-llm-eval.test.ts | 210 ++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 test/agent-teams-llm-eval.test.ts diff --git a/test/agent-teams-llm-eval.test.ts b/test/agent-teams-llm-eval.test.ts new file mode 100644 index 0000000..26f9d63 --- /dev/null +++ b/test/agent-teams-llm-eval.test.ts @@ -0,0 +1,210 @@ +/** + * LLM-as-Judge evals for Agent Teams infrastructure. + * + * Evaluates whether the /team skill, TEAMS.md, and preamble teammate + * awareness are clear and actionable enough for Claude Code to orchestrate + * multi-agent workflows. + * + * Requires: ANTHROPIC_API_KEY + EVALS=1 + * Cost: ~$0.08/run (4 tests) + * Run: EVALS=1 bun test test/agent-teams-llm-eval.test.ts + */ + +import { describe, test, expect, afterAll } from 'bun:test'; +import { callJudge } from './helpers/llm-judge'; +import type { JudgeScore } from './helpers/llm-judge'; +import { EvalCollector } from './helpers/eval-store'; +import * as fs from 'fs'; +import * as path from 'path'; + +const ROOT = path.resolve(import.meta.dir, '..'); +const evalsEnabled = !!process.env.EVALS; +const describeEval = evalsEnabled ? describe : describe.skip; +const evalCollector = evalsEnabled ? new EvalCollector('llm-judge') : null; + +describeEval('Agent Teams quality evals', () => { + test('/team SKILL.md orchestration quality >= 4', async () => { + const t0 = Date.now(); + const content = fs.readFileSync(path.join(ROOT, 'team', 'SKILL.md'), 'utf-8'); + const start = content.indexOf('# /team'); + const section = content.slice(start); + + const scores = await callJudge(`You are evaluating a team orchestration document for Claude Code Agent Teams. + +This document teaches a Claude Code lead session how to: +1. Spawn teammate sessions with specific gstack skill personas +2. Configure task dependencies between teammates +3. Define spawn prompts that tell each teammate what to do +4. Coordinate inter-teammate communication + +Rate on three dimensions (1-5 scale): +- **clarity** (1-5): Can the lead agent understand what teams to spawn and how? +- **completeness** (1-5): Are spawn prompts, task dependencies, and team patterns well-defined? +- **actionability** (1-5): Can the lead actually create functional agent teams from this doc? + +Respond with ONLY valid JSON: +{"clarity": N, "completeness": N, "actionability": N, "reasoning": "brief"} + +${section}`); + + console.log('Team skill scores:', JSON.stringify(scores, null, 2)); + + evalCollector?.addTest({ + name: 'team/SKILL.md orchestration', + suite: 'Agent Teams quality evals', + tier: 'llm-judge', + passed: scores.clarity >= 4 && scores.completeness >= 4 && scores.actionability >= 4, + duration_ms: Date.now() - t0, + cost_usd: 0.02, + judge_scores: { clarity: scores.clarity, completeness: scores.completeness, actionability: scores.actionability }, + judge_reasoning: scores.reasoning, + }); + + expect(scores.clarity).toBeGreaterThanOrEqual(4); + expect(scores.completeness).toBeGreaterThanOrEqual(4); + expect(scores.actionability).toBeGreaterThanOrEqual(4); + }, 30_000); + + test('TEAMS.md coordination reference >= 4', async () => { + const t0 = Date.now(); + const content = fs.readFileSync(path.join(ROOT, 'team', 'TEAMS.md'), 'utf-8'); + + const scores = await callJudge(`You are evaluating a coordination reference document for AI agent teammates. + +Each teammate in a Claude Code Agent Team reads this document to understand: +1. How to message other teammates (format, urgency rules) +2. Who to message with what (dependency graph) +3. Where shared state lives (.gstack/ directories) +4. What anti-patterns to avoid + +Rate on three dimensions (1-5 scale): +- **clarity** (1-5): Can a teammate understand the communication protocol? +- **completeness** (1-5): Are message formats, state locations, and anti-patterns defined? +- **actionability** (1-5): Can a teammate coordinate with others using only this doc? + +Respond with ONLY valid JSON: +{"clarity": N, "completeness": N, "actionability": N, "reasoning": "brief"} + +${content}`); + + console.log('TEAMS.md scores:', JSON.stringify(scores, null, 2)); + + evalCollector?.addTest({ + name: 'TEAMS.md coordination', + suite: 'Agent Teams quality evals', + tier: 'llm-judge', + passed: scores.clarity >= 4 && scores.completeness >= 3 && scores.actionability >= 4, + duration_ms: Date.now() - t0, + cost_usd: 0.02, + judge_scores: { clarity: scores.clarity, completeness: scores.completeness, actionability: scores.actionability }, + judge_reasoning: scores.reasoning, + }); + + expect(scores.clarity).toBeGreaterThanOrEqual(4); + expect(scores.completeness).toBeGreaterThanOrEqual(3); + expect(scores.actionability).toBeGreaterThanOrEqual(4); + }, 30_000); + + test('preamble teammate awareness is actionable', async () => { + const t0 = Date.now(); + // Extract the teammate awareness section from any generated skill + const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8'); + const start = content.indexOf('## Agent Team Awareness'); + const end = content.indexOf('\n# ', start); + const section = end > start ? content.slice(start, end) : content.slice(start); + + const scores = await callJudge(`You are evaluating a preamble section that teaches an AI agent how to behave as a teammate in a Claude Code Agent Team. + +This section is injected into EVERY gstack skill via a shared template. When the agent +detects it's running as a teammate (_IS_TEAMMATE=true), it should: +1. Message findings to other teammates instead of just outputting +2. Wait for upstream dependencies +3. Claim and complete tasks from the shared task list +4. Broadcast urgent findings immediately + +When NOT a teammate, it should ignore all of this and work normally. + +Rate on three dimensions (1-5 scale): +- **clarity** (1-5): Can an agent understand when to activate teammate mode? +- **completeness** (1-5): Are communication, output, task, and discovery protocols defined? +- **actionability** (1-5): Can an agent switch between standalone and teammate mode correctly? + +Respond with ONLY valid JSON: +{"clarity": N, "completeness": N, "actionability": N, "reasoning": "brief"} + +${section}`); + + console.log('Preamble teammate awareness scores:', JSON.stringify(scores, null, 2)); + + evalCollector?.addTest({ + name: 'preamble teammate awareness', + suite: 'Agent Teams quality evals', + tier: 'llm-judge', + passed: scores.clarity >= 4 && scores.actionability >= 4, + duration_ms: Date.now() - t0, + cost_usd: 0.02, + judge_scores: { clarity: scores.clarity, completeness: scores.completeness, actionability: scores.actionability }, + judge_reasoning: scores.reasoning, + }); + + expect(scores.clarity).toBeGreaterThanOrEqual(4); + expect(scores.actionability).toBeGreaterThanOrEqual(4); + }, 30_000); + + test('spawn prompts contain enough context for teammates', async () => { + const t0 = Date.now(); + const content = fs.readFileSync(path.join(ROOT, 'team', 'SKILL.md'), 'utf-8'); + // Extract the Due Diligence spawn prompt (most complex) + const ddStart = content.indexOf('### 5. Due Diligence Team'); + const ddEnd = content.indexOf('### 6.', ddStart); + const section = content.slice(ddStart, ddEnd > ddStart ? ddEnd : undefined); + + const result = await callJudge<{ sufficient: boolean; score: number; missing: string[]; reasoning: string }>( + `You are evaluating whether a spawn prompt gives enough context for Claude Code Agent Team teammates. + +When the lead spawns a teammate, the teammate gets: +- The spawn prompt (what you see below) +- Project CLAUDE.md +- Access to MCP servers and skills + +The teammate does NOT get: +- The lead's conversation history +- Other teammates' context (they message each other) + +Evaluate whether this spawn prompt gives each teammate enough information to: +1. Know what skill/persona they should adopt +2. Know what analysis to perform +3. Know who to message findings to +4. Know what to wait for (dependencies) + +Respond with ONLY valid JSON: +{"sufficient": true/false, "score": N, "missing": ["item1"], "reasoning": "brief"} + +score (1-5): 5 = teammate can work autonomously, 1 = would be lost + +${section}` + ); + + console.log('Spawn prompt scores:', JSON.stringify(result, null, 2)); + + evalCollector?.addTest({ + name: 'spawn prompt context sufficiency', + suite: 'Agent Teams quality evals', + tier: 'llm-judge', + passed: result.sufficient && result.score >= 4, + duration_ms: Date.now() - t0, + cost_usd: 0.02, + judge_scores: { sufficiency: result.score }, + judge_reasoning: result.reasoning, + }); + + expect(result.sufficient).toBe(true); + expect(result.score).toBeGreaterThanOrEqual(4); + }, 30_000); +}); + +afterAll(async () => { + if (evalCollector) { + try { await evalCollector.finalize(); } catch (err) { console.error('Eval save failed:', err); } + } +});