Commit 135c5e3

Merge pull request #195 from maystudios/worktree-agent-a63098aa
feat: add executable Tier 2 Agent Teams patterns
2 parents 5674fae + 8460d8a commit 135c5e3

2 files changed (409 additions, 13 deletions). Shown below: templates/skills/maxsim-batch/SKILL.md (320 additions, 7 deletions).
Agent Teams (available since Claude Code v2.1.32, Feb 2026) enable inter-agent communication for workflows that require debate, cross-checking, or collaborative problem-solving. MaxsimCLI sets `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` during install and registers `TeammateIdle` and `TaskCompleted` quality-gate hooks.

**Current status:** Infrastructure is in place (env var, hooks). Tier 2 workflow patterns (competitive implementation, multi-reviewer code review, collaborative debugging) are defined below with executable `TeamCreate`/`SendMessage` call syntax. All workflows gracefully degrade to Tier 1 subagents when Agent Teams are unavailable. See PROJECT.md §7.2 for the authoritative specification.
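
For reference, the quality-gate hook registration might look like this in Claude Code's settings file (a sketch only, assuming the standard hooks schema; the `maxsim-hooks` command is a hypothetical stand-in for whatever MaxsimCLI actually installs):

```json
{
  "hooks": {
    "TeammateIdle": [
      { "hooks": [{ "type": "command", "command": "maxsim-hooks teammate-idle" }] }
    ],
    "TaskCompleted": [
      { "hooks": [{ "type": "command", "command": "maxsim-hooks task-completed" }] }
    ]
  }
}
```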

### Tier Selection Logic

MaxsimCLI chooses the tier automatically based on the workflow:

| Workflow | Tier | Reason |
| --- | --- | --- |
| Collaborative debugging | Tier 2 (Agent Teams) | Hypotheses need adversarial testing |
| Architecture exploration | Tier 2 (Agent Teams) | Requires discussion |

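The tier choice above can be sketched as a small dispatch function (illustrative only -- the workflow labels and the `select_tier` helper are assumptions, not real MaxsimCLI internals):

```bash
# Hypothetical sketch of the tier dispatch described in the table.
# Workflow names are illustrative labels, not MaxsimCLI identifiers.
select_tier() {
  case "$1" in
    competitive-implementation|multi-reviewer-review|collaborative-debugging|architecture-exploration)
      echo 2 ;;  # needs inter-agent communication -> Agent Teams
    *)
      echo 1 ;;  # independent subagents suffice
  esac
}

select_tier collaborative-debugging   # prints 2
```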
### Tier 2 Activation Check

Before using any Tier 2 pattern, verify availability:

```bash
# 1. Check env var
[ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" = "1" ] || { echo "Tier 2 unavailable: env var not set"; TIER=1; }
```

```
# 2. Probe TeamCreate (lightweight -- create and immediately clean up)
TeamCreate(team_name: "probe-{timestamp}", description: "availability check")
# If probe fails, set TIER=1 and log reason
```

If either check fails, skip to the Graceful Degradation section below. Do not attempt Tier 2 patterns.
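
Taken together, the two checks amount to a single gate. A shell sketch (the `probe_team_create` stub stands in for the `TeamCreate` probe above, which is not a shell command):

```bash
# Hypothetical gate combining both availability checks. Prints the tier
# to use; diagnostic reasons go to stderr.
probe_team_create() { return 1; }  # stub -- replace with a real probe

pick_tier() {
  if [ "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" != "1" ]; then
    echo "Tier 2 unavailable: env var not set" >&2; echo 1; return
  fi
  if ! probe_team_create; then
    echo "Tier 2 unavailable: TeamCreate probe failed" >&2; echo 1; return
  fi
  echo 2
}

TIER=$(pick_tier 2>/dev/null)
```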
122+
123+
---
124+
125+
### Pattern 1 -- Competitive Implementation (Debate)
126+
127+
**Use when:** A task is marked `critical` and `config.execution.competitive_enabled` is `true`. Multiple agents implement the same task independently, then adversarially critique each other's work before a neutral verifier selects the winner.
128+
129+
**Flow:** `TeamCreate` --> spawn 2-3 competitors --> each implements independently --> `SendMessage` critiques --> fresh verifier judges --> winner selected.
130+
131+
**Step 1 -- Create the competition team:**
132+
```
133+
TeamCreate(
134+
team_name: "competition-phase-{N}-task-{id}",
135+
description: "Competitive implementation: {task_description}. Each teammate implements independently, then reviews the others adversarially."
136+
)
137+
```
138+
139+
**Step 2 -- Spawn competing teammates:**
140+
Spawn 2 teammates minimum, 3 for tasks labeled `critical`. Each gets a distinct approach directive and the full task context.
141+
142+
```
143+
// Teammate A -- conservative approach
144+
Spawn teammate "competitor-a" with prompt:
145+
"Implement {task_description} using approach: CONSERVATIVE.
146+
Prefer existing patterns, minimal new abstractions, conventional solutions.
147+
Work in isolation until the review phase.
148+
Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}.
149+
Success criteria: {criteria from plan}.
150+
When done, commit your work and report RESULT: PASS or RESULT: FAIL."
151+
Model: {executor_model}
152+
153+
// Teammate B -- innovative approach
154+
Spawn teammate "competitor-b" with prompt:
155+
"Implement {task_description} using approach: INNOVATIVE.
156+
Optimize for performance and elegance, explore novel patterns where justified.
157+
Work in isolation until the review phase.
158+
Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}.
159+
Success criteria: {criteria from plan}.
160+
When done, commit your work and report RESULT: PASS or RESULT: FAIL."
161+
Model: {executor_model}
162+
163+
// (Optional -- critical tasks only) Teammate C -- defensive approach
164+
Spawn teammate "competitor-c" with prompt:
165+
"Implement {task_description} using approach: DEFENSIVE.
166+
Maximize error handling, edge case coverage, and robustness.
167+
Work in isolation until the review phase.
168+
Phase: {N}, Plan: {id}, Issue: #{phase_issue_number}.
169+
Success criteria: {criteria from plan}.
170+
When done, commit your work and report RESULT: PASS or RESULT: FAIL."
171+
Model: {executor_model}
172+
```
173+

**Step 3 -- Adversarial critique via SendMessage:**
After all teammates complete, each reviews the others' implementations:

```
SendMessage({
  type: "message",
  recipient: "competitor-b",
  content: "Review competitor-a's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
  summary: "Requesting adversarial review of competitor-a"
})

SendMessage({
  type: "message",
  recipient: "competitor-a",
  content: "Review competitor-b's implementation. Be adversarial: (1) correctness issues, (2) missing edge cases, (3) maintainability concerns. Find real problems, not style preferences.",
  summary: "Requesting adversarial review of competitor-b"
})
```

**Step 4 -- Fresh verifier selects winner:**
Spawn a verifier agent (NOT a team member) to evaluate both implementations and both critiques:

```
Agent(
  subagent_type: "verifier",
  model: "{verifier_model}",
  prompt: "
    Judge a competitive implementation. Agents implemented the same task independently, then critiqued each other.

    Implementations: competitor-a (CONSERVATIVE), competitor-b (INNOVATIVE)
    Critiques: {critique summaries}

    Selection criteria (priority order):
    1. Correctness -- satisfies all success criteria
    2. Test coverage -- edge cases tested
    3. Code quality -- readability, codebase consistency
    4. Simplicity -- fewer abstractions when correctness is equal

    Output: WINNER: competitor-{a|b|c}
    Followed by justification.
  "
)
```

Discard the losing worktree branch. Merge the winner via the standard flow.
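
The discard-and-merge step can be sketched in shell. Two assumptions, both illustrative: the verifier emits the `WINNER: competitor-x` line described in Step 4, and worktrees/branches are named after the competitors:

```bash
# Hypothetical cleanup after the verifier's verdict: extract the winner
# from a "WINNER: competitor-x" line, drop losing worktrees, merge winner.
parse_winner() { sed -n 's/^WINNER: \(competitor-[a-c]\).*/\1/p' | head -n 1; }

cleanup_and_merge() {
  winner="$1"
  for c in competitor-a competitor-b competitor-c; do
    [ "$c" = "$winner" ] && continue
    git worktree remove --force ".worktrees/$c" 2>/dev/null || true
    git branch -D "$c" 2>/dev/null || true
  done
  git merge --no-ff "$winner"  # standard merge flow
}
```

Usage would look like `cleanup_and_merge "$(parse_winner < verifier_report.txt)"`.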

**Tier 1 fallback:** Spawn 2 independent executor subagents via `Agent(isolation: "worktree", run_in_background: true)` with different approach prompts. After both complete, spawn a verifier to compare. No inter-agent messaging -- the verifier reads both outputs directly.

---

### Pattern 2 -- Multi-Reviewer Code Review (Cross-Checking)

**Use when:** A PR or implementation requires review from multiple specialist perspectives that must challenge each other's findings.

**Flow:** `TeamCreate` --> spawn 3 specialist reviewers --> each reviews independently --> `SendMessage` to share findings --> each reviewer challenges the others' findings --> coordinator synthesizes a unified report.

**Step 1 -- Create the review team:**
```
TeamCreate(
  team_name: "review-phase-{N}-task-{id}",
  description: "Multi-dimensional code review: {description}. Reviewers share and cross-check findings."
)
```

**Step 2 -- Spawn specialist reviewers:**
```
// Security reviewer
Spawn teammate "reviewer-security" with prompt:
  "Review the implementation for security concerns: authentication, authorization, input validation, injection risks, token handling, data exposure.
  Files to review: {file list or PR reference}.
  Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number).
  When done, send your findings to reviewer-performance and reviewer-tests."
Model: {executor_model}

// Performance reviewer
Spawn teammate "reviewer-performance" with prompt:
  "Review the implementation for performance concerns: N+1 queries, missing indexes, unnecessary allocations, caching opportunities, algorithmic complexity.
  Files to review: {file list or PR reference}.
  Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number).
  When done, send your findings to reviewer-security and reviewer-tests."
Model: {executor_model}

// Test coverage reviewer
Spawn teammate "reviewer-tests" with prompt:
  "Review the implementation for test coverage: missing edge cases, untested error paths, assertion quality, flaky test patterns, coverage gaps.
  Files to review: {file list or PR reference}.
  Report findings as: CRITICAL / WARNING / INFO with evidence (file path + line number).
  When done, send your findings to reviewer-security and reviewer-performance."
Model: {executor_model}
```

**Step 3 -- Share and cross-check findings:**
After each reviewer completes their initial review, they share findings with the others via `SendMessage`:

```
// Each reviewer sends findings to the other two
SendMessage({
  type: "message",
  recipient: "reviewer-performance",
  content: "My security findings: {findings list}. Do any of these conflict with your performance findings? Are there performance optimizations that would introduce security risks?",
  summary: "Security reviewer sharing findings for cross-check"
})
```

Each reviewer then challenges the others' findings:
```
SendMessage({
  type: "message",
  recipient: "reviewer-security",
  content: "Reviewing your security findings: Finding #2 (SQL injection risk in query builder) -- I confirmed this also causes a performance issue due to string concatenation in a hot path. Finding #4 (token expiry) -- this is a false positive; the token refresh middleware handles this case. Evidence: {file}:{line}.",
  summary: "Performance reviewer challenging security findings"
})
```

**Step 4 -- Coordinator synthesizes report:**
The team lead (or a fresh agent) collects all findings and cross-check results, then produces a unified review:

```
Agent(
  subagent_type: "verifier",
  model: "{verifier_model}",
  prompt: "
    Synthesize a unified code review from three specialist reviewers.

    Security findings: {security reviewer's final findings}
    Performance findings: {performance reviewer's final findings}
    Test coverage findings: {test reviewer's final findings}
    Cross-check disputes: {list of challenged findings and resolutions}

    Produce a single review report:
    - CRITICAL items (must fix before merge)
    - WARNING items (should fix, not blocking)
    - INFO items (suggestions)
    - Disputed findings and resolution
  "
)
```

Post the unified report as a GitHub comment on the relevant issue.
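
Posting the report could use the GitHub CLI (a sketch; the report body is a placeholder, and `$phase_issue_number` mirrors the `{phase_issue_number}` placeholder used in the spawn prompts):

```bash
# Hypothetical posting step: write the unified review to a temp file,
# then attach it to the phase issue with the GitHub CLI (gh).
report_file=$(mktemp)
cat > "$report_file" <<'EOF'
## Unified Code Review
- CRITICAL items (must fix before merge): ...
- WARNING items (should fix, not blocking): ...
- INFO items (suggestions): ...
- Disputed findings and resolution: ...
EOF

# Guarded so the sketch degrades to a notice where gh is unavailable.
if command -v gh >/dev/null 2>&1; then
  gh issue comment "$phase_issue_number" --body-file "$report_file" || true
else
  echo "gh unavailable -- post $report_file manually" >&2
fi
```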

**Tier 1 fallback:** Spawn 3 independent reviewer subagents via `Agent(run_in_background: true)`. Each produces its own report. The orchestrator merges reports manually -- no cross-checking between reviewers. Less thorough but fully functional.

---

### Pattern 3 -- Collaborative Debugging (Adversarial Hypothesis Testing)

**Use when:** A bug's root cause is unclear and multiple hypotheses need to be tested simultaneously. Each investigator pursues a different theory and actively tries to disprove the others.

**Flow:** `TeamCreate` --> spawn 2-3 investigators --> each pursues a different hypothesis --> `SendMessage` to share evidence and challenge other hypotheses --> hypothesis that survives adversarial testing wins --> fix implemented by the confirmed investigator.

**Step 1 -- Create the investigation team:**
```
TeamCreate(
  team_name: "debug-phase-{N}-task-{id}",
  description: "Adversarial debugging: {bug description}. Investigators pursue competing hypotheses and challenge each other's evidence."
)
```

**Step 2 -- Spawn investigators with distinct hypotheses:**
Derive hypotheses from the bug symptoms, error logs, and codebase analysis.

```
// Investigator A -- hypothesis: race condition
Spawn teammate "investigator-a" with prompt:
  "Bug: {bug description with symptoms and error output}.
  Your hypothesis: RACE CONDITION in {suspected component}.
  Investigate this hypothesis:
  1. Find evidence supporting or refuting it
  2. Write a reproducer test if possible
  3. If confirmed, draft a fix
  4. Share evidence with other investigators via SendMessage
  5. Actively challenge other investigators' hypotheses with counter-evidence"
Model: {executor_model}

// Investigator B -- hypothesis: configuration error
Spawn teammate "investigator-b" with prompt:
  "Bug: {bug description with symptoms and error output}.
  Your hypothesis: CONFIGURATION ERROR in {suspected component}.
  Investigate this hypothesis:
  1. Find evidence supporting or refuting it
  2. Write a reproducer test if possible
  3. If confirmed, draft a fix
  4. Share evidence with other investigators via SendMessage
  5. Actively challenge other investigators' hypotheses with counter-evidence"
Model: {executor_model}

// Investigator C -- hypothesis: data corruption
Spawn teammate "investigator-c" with prompt:
  "Bug: {bug description with symptoms and error output}.
  Your hypothesis: DATA CORRUPTION in {suspected component}.
  Investigate this hypothesis:
  1. Find evidence supporting or refuting it
  2. Write a reproducer test if possible
  3. If confirmed, draft a fix
  4. Share evidence with other investigators via SendMessage
  5. Actively challenge other investigators' hypotheses with counter-evidence"
Model: {executor_model}
```

**Step 3 -- Evidence sharing and adversarial challenges:**
Investigators share findings and challenge each other via `SendMessage`:

```
// Investigator A shares evidence
SendMessage({
  type: "message",
  recipient: "investigator-b",
  content: "Evidence for race condition hypothesis: I found unsynchronized access to {resource} at {file}:{line}. The timing window is ~50ms under load. This contradicts your configuration hypothesis because the config values are correct -- the issue only manifests under concurrent access. Can you disprove this?",
  summary: "Investigator-a sharing race condition evidence, challenging config hypothesis"
})

// Investigator B responds with counter-evidence
SendMessage({
  type: "message",
  recipient: "investigator-a",
  content: "Your race condition evidence is plausible, but I found that the same symptom occurs on single-threaded test runs. See: {test output}. This suggests the root cause is upstream of the concurrent access point. My config hypothesis: the timeout value at {file}:{line} defaults to 0 when the env var is missing.",
  summary: "Investigator-b providing counter-evidence to race condition hypothesis"
})
```

**Step 4 -- Resolution:**
The team lead evaluates which hypothesis survived adversarial testing:

```
Agent(
  subagent_type: "verifier",
  model: "{verifier_model}",
  prompt: "
    Evaluate competing debugging hypotheses.

    Hypothesis A (race condition): {evidence summary, challenges received, responses}
    Hypothesis B (configuration): {evidence summary, challenges received, responses}
    Hypothesis C (data corruption): {evidence summary, challenges received, responses}

    Determine:
    1. Which hypothesis best explains ALL symptoms?
    2. Which hypothesis survived adversarial challenge?
    3. Is the proposed fix correct and complete?

    Output: CONFIRMED: investigator-{a|b|c} -- {hypothesis name}
    Followed by: evidence that confirms, evidence that was disproven, recommended fix.
  "
)
```

The confirmed investigator's fix is merged. Other worktree branches are discarded.

**Tier 1 fallback:** Spawn 2-3 independent debugging subagents via `Agent(isolation: "worktree", run_in_background: true)`. Each investigates a different hypothesis and reports findings. The orchestrator compares reports without inter-agent debate. Less adversarial but still tests multiple hypotheses in parallel.

---

### Graceful Degradation

If Agent Teams are unavailable (env var `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` not set, unsupported plan, or feature not yet stable), MaxsimCLI falls back to Tier 1 subagents for all workflows. Inform the user with this exact message:

> "Competitive mode: using Tier 1 subagents (Agent Teams not available or not required for this strategy). Each executor works independently; verifier selects the best result."

The user is informed but not blocked. All workflows remain fully functional via Tier 1. Each pattern above includes a specific Tier 1 fallback that preserves the core workflow (parallel execution + verifier selection) without inter-agent messaging.

## Limits