docs: improve test agent autonomy and orchestration guidance #11561
Adds comprehensive guidance for autonomous testing without requiring "babysitting":

**New Sections Added**:
- Autonomy and Task Completion Principles
  - Complete all sub-tasks before reporting
  - Understanding scope and planning
  - Making autonomous decisions
  - Execution discipline with TodoWrite tracking
- Anti-Patterns to Avoid
  - Don't ask permission for obvious next steps
  - Don't provide mid-stream progress reports
  - Don't stop when encountering issues
  - Don't ask what to test or require clarification on standard procedures
- Example: Autonomous Testing Session
  - Real-world example based on HyperEVM testing
  - Shows correct vs incorrect autonomous execution
  - Demonstrates TodoWrite usage for task tracking

**Updated Workflow**:
- Emphasizes parsing full scope before starting
- TodoWrite for tracking progress (not reporting)
- Execute ALL tests before reporting
- Report only once when entire scope is complete

**Context**: Based on release v1.993.0 testing session where guidance was needed to complete all requested sub-tasks. This update ensures future test sessions run autonomously without requiring intermediate direction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
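The "execute ALL tests before reporting, report only once" workflow can be sketched in a few lines. This is only an illustration, not the agent's actual mechanism: `Todo` and `run_session` are hypothetical names, and the tracking list merely stands in for the role TodoWrite plays in the guidance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Todo:
    """One sub-task in the requested scope (hypothetical structure)."""
    name: str
    run: Callable[[], str]   # returns a short result summary
    status: str = "pending"  # pending -> passed | failed
    detail: str = ""


def run_session(todos: list[Todo]) -> str:
    """Run every sub-task, then build a single final report."""
    for todo in todos:
        try:
            todo.detail = todo.run()
            todo.status = "passed"
        except Exception as exc:
            # Record the issue and keep going instead of stopping to ask.
            todo.status = "failed"
            todo.detail = str(exc)

    # The report is assembled only after the whole scope has been attempted.
    return "\n".join(f"[{t.status}] {t.name}: {t.detail}" for t in todos)
```

Statuses are updated as work proceeds (tracking), but nothing is returned to the caller until every item has been attempted (reporting), which mirrors the distinction the workflow draws.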
Adds comprehensive guidance on using subagents for orchestration during complex multi-feature testing, especially for release PRs.

**New Section: Orchestration with Subagents**
- When to use subagents vs direct execution
- Orchestration pattern (parse, break down, launch, monitor, aggregate, report)
- Benefits of subagent approach (parallel execution, specialized expertise, better context)
- Example orchestration for multi-PR release testing
- Guidance on crafting subagent task prompts

**Updated Examples**
- Example 1: Direct execution for single-feature testing (swap slippage UI)
- Example 2: Orchestration approach for release PR testing (v1.993.0)
  - Shows domain breakdown (Tron, assets, HyperEVM, Thor/Maya, Ledger)
  - Demonstrates parallel subagent launches
  - Shows aggregation and comprehensive reporting

**Updated Anti-Patterns**
- Direct execution anti-patterns (asking permission, mid-stream reports)
- Orchestration anti-patterns (sequential launches, partial reporting)

**Key Principle**:
- Direct execution: Simple, focused tests on single features
- Orchestration: Complex, multi-domain release testing with parallel subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
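As a rough sketch of the orchestration pattern (break down by domain, launch in parallel, aggregate, report once), the following uses `asyncio.gather`. It is an assumption-laden illustration: `run_subagent` is a hypothetical stand-in for launching a real subagent, and the prompts passed in are placeholders rather than the prompts from the guidance file.

```python
import asyncio


async def run_subagent(domain: str, prompt: str) -> dict:
    """Hypothetical stand-in for launching one subagent."""
    await asyncio.sleep(0)  # placeholder for the subagent's real work
    return {"domain": domain, "prompt": prompt, "status": "passed"}


async def orchestrate(scope: dict[str, str]) -> str:
    """Launch one subagent per domain in parallel, then aggregate once."""
    tasks = [run_subagent(domain, prompt) for domain, prompt in scope.items()]
    # return_exceptions=True returns a failing domain's exception as its
    # result instead of raising, so the other results are still aggregated.
    results = await asyncio.gather(*tasks, return_exceptions=True)

    lines = []
    for domain, result in zip(scope, results):
        if isinstance(result, Exception):
            lines.append(f"{domain}: failed ({result})")
        else:
            lines.append(f"{domain}: {result['status']}")
    return "\n".join(lines)
```

Calling `asyncio.run(orchestrate({"Tron": "...", "Ledger": "..."}))` returns one combined report. Launching the subagents one at a time and reporting after each would correspond to the "sequential launches, partial reporting" anti-patterns listed above.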
📝 Walkthrough
Extensive documentation updates to the test agent guidance file, adding new sections on autonomy principles, scope understanding, autonomous decision-making, subagent orchestration patterns, execution discipline, anti-pattern examples, and refined workflow instructions for comprehensive test agent behavior.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
.claude/commands/test-agent.md (1)
109-109: Add language specifiers to fenced code blocks for consistency.
Several code blocks lack language specifiers. While readability isn't materially affected, adding specifiers improves Markdown linting compliance. For pseudo-code examples, use a generic language like `plaintext` or `bash` as appropriate.

🔎 Example fixes

````diff
-Example Orchestration:
-```
+Example Orchestration:
+```plaintext
````

````diff
-[Launching subagent 1: Tron Testing]
-Prompt: "Test Tron TX parsing fixes...
+[Launching subagent 1: Tron Testing]
+```plaintext
+Prompt: "Test Tron TX parsing fixes...
+```
````

Also applies to: 472-472, 498-498, 515-515, 537-537, 546-546
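For locating such blocks without running the linter, a small script can approximate the check. This is a simplified sketch and not markdownlint's implementation of MD040: the function name `unlabeled_fences` is made up for illustration, and it handles only backtick and tilde fences with basic open/close matching.

```python
import re

# An opening or closing fence: up to 3 spaces, then 3+ backticks or tildes.
FENCE = re.compile(r"^\s{0,3}(`{3,}|~{3,})(.*)$")


def unlabeled_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences with no language."""
    missing = []
    open_marker = None  # fence string that opened the current block, if any
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_marker is None:
            open_marker = marker
            if not info:
                missing.append(number)
        elif (
            marker[0] == open_marker[0]
            and len(marker) >= len(open_marker)
            and not info
        ):
            open_marker = None  # closing fence for the current block
    return missing
```

Running it over the contents of `.claude/commands/test-agent.md` should list the opening-fence lines that still need a specifier such as `plaintext`.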
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.claude/commands/test-agent.md
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10323
File: src/pages/RFOX/components/Stake/components/StakeSummary.tsx:112-114
Timestamp: 2025-08-22T13:00:44.879Z
Learning: NeOMakinG prefers to keep PR changes minimal and focused on the core objectives, avoiding cosmetic or defensive code improvements that aren't directly related to the PR scope, even when they would improve robustness.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10128
File: .cursor/rules/error-handling.mdc:266-274
Timestamp: 2025-07-29T10:35:22.059Z
Learning: NeOMakinG prefers less nitpicky suggestions on documentation and best practices files, finding overly detailed suggestions on minor implementation details (like console.error vs logger.error) too granular for cursor rules documentation.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10380
File: src/pages/Dashboard/components/AccountList/AccountTable.tsx:60-0
Timestamp: 2025-09-02T08:34:08.157Z
Learning: NeOMakinG prefers code review comments to focus only on actual PR changes, not pre-existing code issues, unless there are critical security or correctness concerns directly related to the new functionality.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10234
File: src/components/MultiHopTrade/hooks/useGetTradeQuotes/hooks/useTrackTradeQuotes.ts:42-86
Timestamp: 2025-08-08T11:41:22.794Z
Learning: NeOMakinG prefers not to include refactors in move-only PRs; such suggestions should be deferred to follow-up issues instead of being applied within the same PR.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10380
File: src/components/Table/Table.theme.ts:177-180
Timestamp: 2025-09-02T12:38:46.940Z
Learning: NeOMakinG prefers to defer technical debt and CSS correctness issues (like improper hover selectors) to follow-up PRs when the current PR is already large and focused on major feature implementation, even when the issues are valid from a usability/technical perspective.
📚 Learning: 2025-11-24T21:20:04.979Z
Learnt from: CR
Repo: shapeshift/web PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T21:20:04.979Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Write tests for critical business logic
Applied to files:
.claude/commands/test-agent.md
📚 Learning: 2025-11-24T21:20:04.979Z
Learnt from: CR
Repo: shapeshift/web PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T21:20:04.979Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Use descriptive test names that explain behavior
Applied to files:
.claude/commands/test-agent.md
🪛 markdownlint-cli2 (0.18.1)
.claude/commands/test-agent.md
109-109: Fenced code blocks should have a language specified (MD040, fenced-code-language)
472-472: Fenced code blocks should have a language specified (MD040, fenced-code-language)
498-498: Fenced code blocks should have a language specified (MD040, fenced-code-language)
515-515: Fenced code blocks should have a language specified (MD040, fenced-code-language)
537-537: Fenced code blocks should have a language specified (MD040, fenced-code-language)
546-546: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🔇 Additional comments (3)
.claude/commands/test-agent.md (3)
18-175: Excellent clarity on autonomous operation and orchestration patterns.
The new sections on autonomy principles, subagent orchestration, and anti-patterns are well-structured and provide concrete, actionable guidance. The Do/Don't patterns and task breakdown examples effectively communicate expectations for agent behavior without ambiguity.
198-208: Workflow steps align well with autonomy principles.
The updated workflow section reinforces the "parse full scope" → "execute all" → "report once" model introduced in the autonomy section. The progression is logical and the emphasis on completing entire scope before reporting is consistent.
441-552: Examples effectively illustrate autonomous and orchestrated testing workflows.
The Direct Execution and Orchestration examples are concrete, realistic, and clearly demonstrate the expected behavior patterns. The subagent prompts and result aggregation format provide practical templates for implementation.
0xApotheosis
left a comment

Summary
Improves the test agent documentation to reduce the need for "babysitting" during testing sessions and adds comprehensive guidance on using subagents for orchestration during complex multi-feature testing.
Problem
During the release v1.993.0 testing session, the test agent required intermediate guidance and direction to complete all requested sub-tasks. The agent would stop mid-execution to ask for permission or clarification on obvious next steps, rather than completing the entire scope autonomously.
Solution
This PR adds two major improvements to `.claude/commands/test-agent.md`:
1. Autonomy and Task Completion Principles
New Sections:
Anti-Patterns to Avoid:
2. Orchestration with Subagents
New Section: Guidance on when and how to use subagents for orchestration:
When to Use Subagents:
Orchestration Pattern:
Benefits:
3. Comprehensive Examples
Example 1: Direct Execution (Single Feature)
Example 2: Orchestration (Release PR Testing)
Updated Anti-Patterns:
Key Principles Established
Benefits
Testing
N/A - Documentation-only change. Future testing sessions will validate the improved guidance.
Related
🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com
Summary by CodeRabbit