
feat: add executable Tier 2 Agent Teams patterns#195

Merged
maystudios merged 1 commit into main from worktree-agent-a63098aa on Mar 25, 2026

Conversation

@maystudios
Owner

Summary

  • Replace prose-only Tier 2 descriptions in execute.md section 6.3 with concrete TeamCreate/SendMessage call syntax for the competitive implementation debate pattern
  • Add three complete Tier 2 workflow patterns to maxsim-batch/SKILL.md: competitive implementation (debate), multi-reviewer code review (cross-checking), and collaborative debugging (adversarial hypothesis testing)
  • Each pattern includes: TeamCreate call, teammate spawn with role prompts, SendMessage exchange, verifier resolution, and Tier 1 graceful degradation fallback
  • Remove "planned but not yet implemented" disclaimer from SKILL.md

Addresses PROJECT.md §7.2 audit gap (Parallelism PARTIAL 1-3).

Test plan

  • Unit tests pass (550/550)
  • No lint regressions from template changes
  • Tool call syntax matches agent-teams-guide.md API reference (SendMessage parameters: type, recipient, content, summary)
  • Graceful degradation section preserved and enhanced with per-pattern fallbacks
  • Tier selection table retained unchanged
  • Manual review: confirm patterns are consistent with AGENTS.md Tier 2 architecture

🤖 Generated with Claude Code

…xsim-batch SKILL.md

Replace prose descriptions of Tier 2 competitive implementation with
concrete TeamCreate/SendMessage call syntax in execute.md section 6.3.
Add three complete Tier 2 workflow patterns to maxsim-batch SKILL.md:
competitive implementation (debate), multi-reviewer code review
(cross-checking), and collaborative debugging (adversarial hypothesis
testing). Each pattern includes TeamCreate, teammate spawn, SendMessage
exchange, verifier resolution, and Tier 1 graceful degradation fallback.
Removes "planned but not yet implemented" disclaimer.

Addresses PROJECT.md §7.2 audit gap (Parallelism PARTIAL 1-3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 25, 2026 22:00
@maystudios maystudios merged commit 135c5e3 into main Mar 25, 2026
3 checks passed
@github-actions
Contributor

🎉 This PR is included in version 5.15.0 🎉

The release is available on:

Your semantic-release bot 📦🚀


Copilot AI left a comment


Pull request overview

This PR upgrades Tier 2 Agent Teams guidance from prose to concrete, copy/pasteable workflow patterns so orchestrators can apply TeamCreate/SendMessage-based collaboration (with Tier 1 fallbacks) during execution, reviews, and debugging.

Changes:

  • Replaces the Tier 2 “debate” description in execute.md with a step-by-step TeamCreate + teammate spawn + SendMessage critique + verifier selection flow (with Tier 1 fallback).
  • Adds three complete Tier 2 patterns to maxsim-batch/SKILL.md (competitive implementation, multi-reviewer cross-checking, collaborative debugging), including activation checks and per-pattern Tier 1 degradations.
  • Removes the “planned but not yet implemented” disclaimer from SKILL.md.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

Files changed:

  • templates/workflows/execute.md — Adds concrete Tier 2 competitive debate steps (TeamCreate/SendMessage/verifier) and an explicit Tier 1 fallback path.
  • templates/skills/maxsim-batch/SKILL.md — Documents executable Tier 2 Agent Teams patterns, an activation check, and strengthened graceful-degradation guidance.


Comment on lines +280 to +323
```
SendMessage({
type: "message",
recipient: "competitor-b",
content: "Review competitor-a's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.",
summary: "Requesting adversarial review of competitor-a's work"
})

SendMessage({
type: "message",
recipient: "competitor-a",
content: "Review competitor-b's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.",
summary: "Requesting adversarial review of competitor-b's work"
})
```
Each teammate responds with a structured critique. This fights LLM anchoring bias -- the first plausible answer does not automatically win.

**Step 2d -- Verifier selects winner:**
Spawn a fresh verifier agent (NOT a team member) to evaluate both implementations and both critiques:
```
Agent(
subagent_type: "verifier",
model: "{verifier_model}",
prompt: "
You are judging a competitive implementation. Two (or three) agents each implemented the same task independently, then reviewed each other's work adversarially.

## Implementations
- competitor-a (CONSERVATIVE): {summary or path to worktree-a}
- competitor-b (INNOVATIVE): {summary or path to worktree-b}

## Critiques
- competitor-b's critique of competitor-a: {critique-b-of-a}
- competitor-a's critique of competitor-b: {critique-a-of-b}

## Selection Criteria (in priority order)
1. Correctness -- does it satisfy all success criteria?
2. Test coverage -- are edge cases tested?
3. Code quality -- readability, maintainability, consistency with codebase
4. Simplicity -- prefer fewer abstractions when correctness is equal

Output exactly: WINNER: competitor-{a|b|c}
Followed by a justification paragraph.
"
)
```
Copy link

Copilot AI Mar 25, 2026


Step 2c says “each reviews the others’ work”, but the example SendMessage exchange only covers competitor-a ↔ competitor-b. If competitor-c is spawned (critical tasks), it would receive no critique request and its feedback won’t be available to the verifier. Expand the debate phase to include competitor-c (e.g., round-robin critiques) or explicitly state that Tier 2 debate is only for 2 competitors.

Suggested change
```
SendMessage({
type: "message",
recipient: "competitor-b",
content: "Review competitor-a's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.",
summary: "Requesting adversarial review of competitor-a's work"
})
SendMessage({
type: "message",
recipient: "competitor-a",
content: "Review competitor-b's implementation. Identify weaknesses, edge cases missed, and potential issues. Be adversarial -- find real problems, not style preferences. Report: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns.",
summary: "Requesting adversarial review of competitor-b's work"
})
```
Each teammate responds with a structured critique. This fights LLM anchoring bias -- the first plausible answer does not automatically win.
**Step 2d -- Verifier selects winner:**
Spawn a fresh verifier agent (NOT a team member) to evaluate both implementations and both critiques:
```
Agent(
subagent_type: "verifier",
model: "{verifier_model}",
prompt: "
You are judging a competitive implementation. Two (or three) agents each implemented the same task independently, then reviewed each other's work adversarially.
## Implementations
- competitor-a (CONSERVATIVE): {summary or path to worktree-a}
- competitor-b (INNOVATIVE): {summary or path to worktree-b}
## Critiques
- competitor-b's critique of competitor-a: {critique-b-of-a}
- competitor-a's critique of competitor-b: {critique-a-of-b}
## Selection Criteria (in priority order)
1. Correctness -- does it satisfy all success criteria?
2. Test coverage -- are edge cases tested?
3. Code quality -- readability, maintainability, consistency with codebase
4. Simplicity -- prefer fewer abstractions when correctness is equal
Output exactly: WINNER: competitor-{a|b|c}
Followed by a justification paragraph.
"
)
```

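The round-robin fix the reviewer asks for is mechanical: generate one critique request per ordered (reviewer, author) pair so the pattern scales from 2 to 3 competitors without hand-written duplication. A minimal Python sketch (the `critique_requests` helper and the dict payload shape mirror the SendMessage fields quoted above, but the helper itself is hypothetical):

```python
from itertools import permutations

def critique_requests(competitors):
    """Build one adversarial review request per ordered (reviewer, author)
    pair, so every competitor critiques every other one (round-robin)."""
    requests = []
    for reviewer, author in permutations(competitors, 2):
        requests.append({
            "type": "message",
            "recipient": reviewer,
            "content": (
                f"Review {author}'s implementation. Be adversarial: "
                "(1) correctness issues, (2) missing edge cases, "
                "(3) maintainability concerns."
            ),
            "summary": f"Requesting adversarial review of {author}",
        })
    return requests

# Two competitors yield 2 requests; three yield 6 (every ordered pair).
reqs = critique_requests(["competitor-a", "competitor-b", "competitor-c"])
```

Each returned dict would then be passed to one SendMessage call, and every critique forwarded to the verifier.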
Comment on lines +319 to +320
Output exactly: WINNER: competitor-{a|b|c}
Followed by a justification paragraph.

Copilot AI Mar 25, 2026


The verifier prompt instructs “Output exactly: WINNER: competitor-{a|b|c}” and also asks for a justification paragraph. “Output exactly” is ambiguous/contradictory here and can cause downstream parsers to fail if they expect a single-line output. Clarify that the first line must be exactly the WINNER line, followed by free-form justification.

Suggested change
Output exactly: WINNER: competitor-{a|b|c}
Followed by a justification paragraph.
On the first line, output exactly one of: WINNER: competitor-a, WINNER: competitor-b, or WINNER: competitor-c
After that first line, output a justification paragraph explaining your choice.

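The parsing concern the reviewer raises can be made concrete: with the clarified contract (first line is exactly the WINNER line, everything after is free-form justification), a downstream parser stays trivial. A sketch, assuming the verdict arrives as a single string (`parse_verdict` is a hypothetical helper, not part of the repo):

```python
import re

WINNER_RE = re.compile(r"^WINNER: (competitor-[abc])$")

def parse_verdict(output):
    """Split a verifier verdict: the FIRST line must be exactly
    'WINNER: competitor-{a|b|c}'; the rest is the justification."""
    first, _, rest = output.partition("\n")
    match = WINNER_RE.match(first.strip())
    if not match:
        raise ValueError(f"Unparseable verdict line: {first!r}")
    return match.group(1), rest.strip()

winner, why = parse_verdict("WINNER: competitor-b\nBetter edge-case tests.")
```

Anchoring the regex with `^...$` means any extra prose on the first line fails loudly instead of silently picking a winner.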
Comment on lines +300 to +301
subagent_type: "verifier",
model: "{verifier_model}",

Copilot AI Mar 25, 2026


This section mixes tool-call argument styles (earlier in the file Agent(...) is shown with subagent_type="executor" and run_in_background=true, but this new snippet uses subagent_type: "verifier", model: "...", commas, etc.). Since this is meant to be executable syntax, please standardize on one argument format within execute.md to avoid copy/paste errors.

Suggested change
subagent_type: "verifier",
model: "{verifier_model}",
subagent_type="verifier",
model="{verifier_model}",

Comment on lines +117 to +118
TeamCreate(team_name: "probe-{timestamp}", description: "availability check")
# If probe fails, set TIER=1 and log reason

Copilot AI Mar 25, 2026


The Tier 2 activation “probe TeamCreate (lightweight — create and immediately clean up)” example never shows the cleanup step. As written it will leave probe teams under ~/.claude/teams/ and ~/.claude/tasks/ on every run. Add an explicit TeamDelete step (or a deterministic probe name + delete) so the probe is actually lightweight/idempotent.

Suggested change
TeamCreate(team_name: "probe-{timestamp}", description: "availability check")
# If probe fails, set TIER=1 and log reason
probe_name = "probe-tier2-activation"
TeamCreate(team_name: probe_name, description: "availability check")
TeamDelete(team_name: probe_name)
# If probe fails at any step, set TIER=1 and log reason

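The probe-plus-cleanup flow suggested above can be sketched as a small function. Here the real TeamCreate/TeamDelete tool calls are injected as callables so the fallback logic is testable; the function and its signature are illustrative, not part of the repo:

```python
def probe_tier2(team_create, team_delete, log):
    """Tier 2 availability probe: create a throwaway team, then always
    attempt cleanup. Any create failure degrades to Tier 1 rather than
    aborting the run. team_create/team_delete stand in for the real
    TeamCreate/TeamDelete tool calls."""
    probe = "probe-tier2-activation"  # deterministic name -> idempotent
    try:
        team_create(team_name=probe, description="availability check")
    except Exception as exc:
        log(f"Tier 2 unavailable, falling back to Tier 1: {exc}")
        return 1
    finally:
        try:
            team_delete(team_name=probe)
        except Exception:
            pass  # best-effort cleanup; a stale probe team is harmless
    return 2
```

The `finally` block guarantees the delete is attempted on both paths, so repeated runs never accumulate probe teams.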
Comment on lines +163 to +191
```
// (Optional -- critical tasks only) Teammate C -- defensive approach
Spawn teammate "competitor-c" with prompt:
"Implement {task_description} using approach: DEFENSIVE.
Maximize error handling, edge case coverage, and robustness.
Work in isolation until the review phase.
Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}.
Success criteria: {criteria from plan}.
When done, commit your work and report RESULT: PASS or RESULT: FAIL."
Model: {executor_model}
```

**Step 3 -- Adversarial critique via SendMessage:**
After all teammates complete, each reviews the others' implementations:

```
SendMessage({
type: "message",
recipient: "competitor-b",
content: "Review competitor-a's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
summary: "Requesting adversarial review of competitor-a"
})

SendMessage({
type: "message",
recipient: "competitor-a",
content: "Review competitor-b's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
summary: "Requesting adversarial review of competitor-b"
})
```

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern 1 allows spawning an optional competitor-c for critical tasks, but the SendMessage critique examples only request reviews between competitor-a and competitor-b. If competitor-c participates, include critique exchanges involving competitor-c (and pass those critiques into the verifier prompt), or state that the debate pattern is strictly 2-way.

Comment on lines +177 to +183
```
SendMessage({
type: "message",
recipient: "competitor-b",
content: "Review competitor-a's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
summary: "Requesting adversarial review of competitor-a"
})
```

Copilot AI Mar 25, 2026


These Tier 2 patterns use the SendMessage({ type, recipient, content, summary }) schema, but docs/spec/agent-teams-research.md documents a newer v2.1.75+ schema using to/message/summary and calls out a breaking change. To avoid shipping “executable” examples that may be wrong depending on runtime version, please reconcile the repo docs (pick one schema + version guard, or note both with guidance on which to use).

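Until the docs are reconciled, one pragmatic option the reviewer hints at is an adapter between the two payload shapes. A sketch translating the older `{type, recipient, content, summary}` shape into the `{to, message, summary}` field names described for v2.1.75+ (the exact newer schema is taken on faith from the repo docs quoted above, so treat this mapping as an assumption, not the authoritative API):

```python
def to_v2_message(msg):
    """Translate the older SendMessage payload ({type, recipient,
    content, summary}) into the newer v2.1.75+ field names
    ({to, message, summary}) mentioned in agent-teams-research.md.
    Field mapping is an assumption pending doc reconciliation."""
    return {
        "to": msg["recipient"],
        "message": msg["content"],
        "summary": msg.get("summary", ""),
    }

v2 = to_v2_message({
    "type": "message",
    "recipient": "competitor-b",
    "content": "Review competitor-a's implementation.",
    "summary": "Requesting adversarial review of competitor-a",
})
```

A version guard in the docs (old schema below version X, new schema at or above) would remove the ambiguity entirely.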