
@NeOMakinG (Collaborator) commented Jan 2, 2026

Summary

Improves the test agent documentation to reduce the need for "babysitting" during testing sessions and adds comprehensive guidance on using subagents for orchestration during complex multi-feature testing.

Problem

During the release v1.993.0 testing session, the test agent required intermediate guidance and direction to complete all requested sub-tasks. The agent would stop mid-execution to ask for permission or clarification on obvious next steps, rather than completing the entire scope autonomously.

Solution

This PR makes three main additions to .claude/commands/test-agent.md:

1. Autonomy and Task Completion Principles

New Sections:

  • Complete All Sub-Tasks Before Reporting: Clear expectations for executing ALL requested tasks before reporting
  • Understanding Scope: How to parse multi-part requests and plan execution
  • Making Autonomous Decisions: What to decide independently vs when to ask
  • Execution Discipline: Using TodoWrite for tracking without mid-stream reporting

Anti-Patterns to Avoid:

  • 🚫 Asking permission for obvious next steps
  • 🚫 Mid-stream progress reports
  • 🚫 Stopping when encountering issues
  • 🚫 Asking what to test or requiring clarification on standard procedures
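The "complete all sub-tasks" principle and these anti-patterns boil down to one control-flow rule: record a failure, keep going, and report once at the end. The TypeScript sketch below illustrates that rule under the assumption that each sub-task can be modeled as a synchronous `run` function. `TestTask`, `runAllTasks`, `buildReport`, and the task names are hypothetical illustrations. They are not part of test-agent.md or of the TodoWrite tool.

```typescript
// Hypothetical sketch: these types and helpers are illustrative only and are
// not defined in test-agent.md or by any Claude Code tool.
type TaskStatus = 'passed' | 'failed'

interface TaskResult {
  name: string
  status: TaskStatus
  notes: string
}

interface TestTask {
  name: string
  run: () => void // throws when the check fails
}

// Run every task in order. A failing task is recorded and execution moves on,
// instead of stopping to ask the user what to do next.
function runAllTasks(tasks: TestTask[]): TaskResult[] {
  const results: TaskResult[] = []
  for (const task of tasks) {
    try {
      task.run()
      results.push({ name: task.name, status: 'passed', notes: '' })
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      results.push({ name: task.name, status: 'failed', notes: message })
    }
  }
  return results
}

// Build the single end-of-session report only after every task has run.
function buildReport(results: TaskResult[]): string {
  const passedCount = results.filter(r => r.status === 'passed').length
  const lines = results.map(r =>
    r.status === 'passed' ? `✅ ${r.name}` : `❌ ${r.name}: ${r.notes}`,
  )
  return [`${passedCount}/${results.length} passed`, ...lines].join('\n')
}

// Usage with hypothetical sub-tasks for the swap slippage UI example.
const sessionReport = buildReport(
  runAllTasks([
    { name: 'Open swap page', run: () => {} },
    { name: 'Set custom slippage', run: () => { throw new Error('input not found') } },
    { name: 'Verify slippage shown in quote', run: () => {} },
  ]),
)
console.log(sessionReport) // first line printed: 2/3 passed
```

The second task fails, yet the third still runs and all three outcomes reach the single report.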

2. Orchestration with Subagents

New Section: Guidance on when and how to use subagents for orchestration:

When to Use Subagents:

  • Testing release PRs with multiple features
  • Comprehensive feature validation across multiple areas
  • Long-running test sessions that may hit context limits
  • Complex testing requiring specialized expertise
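These criteria amount to a routing decision between direct execution and orchestration. The sketch below is a rough, hypothetical encoding of that decision. The `TestRequest` fields and the "more than one PR or domain, or a long session" rule are simplifying assumptions, not thresholds stated in the PR.

```typescript
// Hypothetical sketch: the request shape and the routing rule are
// illustrative assumptions, not rules taken from test-agent.md.
interface TestRequest {
  prCount: number // PRs in scope (a release PR bundles several)
  domains: string[] // e.g. ['swaps', 'sends', 'UI']
  expectedLongSession: boolean // likely to approach context limits
}

type ExecutionMode = 'direct' | 'orchestrate'

function chooseExecutionMode(req: TestRequest): ExecutionMode {
  const distinctDomains = new Set(req.domains).size
  // Multiple features, multiple areas, or a long session: fan out to subagents.
  if (req.prCount > 1 || distinctDomains > 1 || req.expectedLongSession) {
    return 'orchestrate'
  }
  // A single, focused feature: execute directly.
  return 'direct'
}

console.log(chooseExecutionMode({ prCount: 1, domains: ['swaps'], expectedLongSession: false })) // direct
console.log(
  chooseExecutionMode({
    prCount: 5,
    domains: ['Tron', 'assets', 'HyperEVM', 'Thor/Maya', 'Ledger'],
    expectedLongSession: true,
  }),
) // orchestrate
```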

Orchestration Pattern:

  1. Parse the full testing scope from the user request
  2. Break it down into logical testing domains (e.g., swaps, sends, UI, performance)
  3. Launch a specialized subagent for each domain using the Task tool
  4. Monitor subagent progress and results
  5. Aggregate findings into a comprehensive report
  6. Post the final report once all subagents complete
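These six steps map onto a fan-out/fan-in shape. The following TypeScript sketch rests on some stated assumptions. `SubagentRunner` is a hypothetical stand-in for launching a subagent via the Task tool. The real tool is invoked by the agent, not through a TypeScript function. The report format is also illustrative. Parallelism comes from starting every runner before awaiting any of them.

```typescript
// Hypothetical sketch: SubagentRunner stands in for whatever actually launches
// a subagent (the Task tool in Claude Code). Its signature is an assumption.
interface SubagentFinding {
  domain: string
  passed: boolean
  summary: string
}

type SubagentRunner = (domain: string, prompt: string) => Promise<SubagentFinding>

// Steps 3-6: launch one subagent per domain at once, wait for all of them
// (standing in for "monitor"), then aggregate into one report.
async function orchestrate(
  prompts: Record<string, string>, // domain name -> subagent prompt
  runSubagent: SubagentRunner,
): Promise<string> {
  const domains = Object.keys(prompts)
  const findings = await Promise.all(
    domains.map(domain =>
      // A subagent that crashes becomes a failed finding instead of
      // rejecting the whole batch and hiding the other domains' results.
      runSubagent(domain, prompts[domain]).catch(
        (err: unknown): SubagentFinding => ({
          domain,
          passed: false,
          summary: `subagent error: ${String(err)}`,
        }),
      ),
    ),
  )
  const passedCount = findings.filter(f => f.passed).length
  const lines = findings.map(f => `${f.passed ? '✅' : '❌'} ${f.domain}: ${f.summary}`)
  // The report is produced exactly once, after every subagent has finished.
  return [`Release test report: ${passedCount}/${findings.length} domains passed`, ...lines].join('\n')
}

// Usage with the Example 2 domains and a stubbed runner (prompts are placeholders).
const stubRunner: SubagentRunner = async domain => ({
  domain,
  passed: true,
  summary: 'stubbed result',
})

orchestrate(
  {
    Tron: 'Test Tron TX parsing fixes',
    assets: '(placeholder prompt)',
    HyperEVM: '(placeholder prompt)',
    'Thor/Maya': '(placeholder prompt)',
    Ledger: '(placeholder prompt)',
  },
  stubRunner,
).then(report => console.log(report))
```

Converting each rejected subagent into a failed finding mirrors the "don't stop when encountering issues" rule. A crashed domain still appears in the final report rather than aborting the others.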

Benefits:

  • Parallel test execution for faster results
  • Specialized expertise per domain (frontend, security, performance)
  • Better context management (each subagent has fresh context)
  • Clearer separation of concerns
  • Easier to debug individual test domains

3. Comprehensive Examples

Example 1: Direct Execution (Single Feature)

  • Testing swap slippage UI
  • Shows focused, single-feature testing approach

Example 2: Orchestration (Release PR Testing)

  • Testing release v1.993.0 with 5 PRs
  • Demonstrates domain breakdown (Tron, assets, HyperEVM, Thor/Maya, Ledger)
  • Shows parallel subagent launches with specific prompts
  • Demonstrates result aggregation and comprehensive reporting

Updated Anti-Patterns:

  • Direct execution anti-patterns (for single-feature testing)
  • Orchestration anti-patterns (for multi-PR release testing)

Key Principles Established

  1. Direct Execution: Simple, focused tests on single features - execute directly
  2. Orchestration: Complex, multi-domain release testing - use parallel subagents
  3. No Mid-Stream Check-Ins: Complete entire scope before reporting
  4. Autonomous Decision-Making: Choose test parameters independently
  5. TodoWrite for Tracking: Track progress internally, not for user updates

Benefits

  1. Reduced Babysitting: Test agent operates autonomously without requiring intermediate direction
  2. Better Scalability: Subagent orchestration enables parallel testing of complex releases
  3. Clearer Patterns: Examples show both direct execution and orchestration approaches
  4. Improved Context Management: Subagents help long test sessions avoid hitting context limits
  5. Higher Quality Testing: Specialized subagents for different domains (frontend, security, etc.)

Testing

N/A - Documentation-only change. Future testing sessions will validate the improved guidance.

Related

  • Based on feedback from the release v1.993.0 testing session (PR #11548, "chore: release v1.993.0")
  • Complements existing test scenario bank and testing workflows

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Summary by CodeRabbit

  • Documentation
    • Updated test agent documentation with comprehensive guidance on autonomous task execution and decision-making workflows.
    • Added detailed examples and orchestration patterns for improved clarity on agent capabilities.
    • Included CLI command reference for easier agent usage.


NeOMakinG and others added 2 commits January 2, 2026 12:19
Adds comprehensive guidance for autonomous testing without requiring "babysitting":

**New Sections Added**:
- Autonomy and Task Completion Principles
  - Complete all sub-tasks before reporting
  - Understanding scope and planning
  - Making autonomous decisions
  - Execution discipline with TodoWrite tracking

- Anti-Patterns to Avoid
  - Don't ask permission for obvious next steps
  - Don't provide mid-stream progress reports
  - Don't stop when encountering issues
  - Don't ask what to test or require clarification on standard procedures

- Example: Autonomous Testing Session
  - Real-world example based on HyperEVM testing
  - Shows correct vs incorrect autonomous execution
  - Demonstrates TodoWrite usage for task tracking

**Updated Workflow**:
- Emphasizes parsing full scope before starting
- TodoWrite for tracking progress (not reporting)
- Execute ALL tests before reporting
- Report only once when entire scope is complete

**Context**:
Based on release v1.993.0 testing session where guidance was needed to complete
all requested sub-tasks. This update ensures future test sessions run autonomously
without requiring intermediate direction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Adds comprehensive guidance on using subagents for orchestration during complex
multi-feature testing, especially for release PRs.

**New Section: Orchestration with Subagents**
- When to use subagents vs direct execution
- Orchestration pattern (parse, break down, launch, monitor, aggregate, report)
- Benefits of subagent approach (parallel execution, specialized expertise, better context)
- Example orchestration for multi-PR release testing
- Guidance on crafting subagent task prompts

**Updated Examples**
- Example 1: Direct execution for single-feature testing (swap slippage UI)
- Example 2: Orchestration approach for release PR testing (v1.993.0)
  - Shows domain breakdown (Tron, assets, HyperEVM, Thor/Maya, Ledger)
  - Demonstrates parallel subagent launches
  - Shows aggregation and comprehensive reporting

**Updated Anti-Patterns**
- Direct execution anti-patterns (asking permission, mid-stream reports)
- Orchestration anti-patterns (sequential launches, partial reporting)

**Key Principle**:
- Direct execution: Simple, focused tests on single features
- Orchestration: Complex, multi-domain release testing with parallel subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@NeOMakinG NeOMakinG requested a review from a team as a code owner January 2, 2026 11:22
@coderabbitai bot (Contributor) commented Jan 2, 2026

📝 Walkthrough

Extensive documentation updates to the test agent guidance file, adding new sections on autonomy principles, scope understanding, autonomous decision-making, subagent orchestration patterns, execution discipline, anti-pattern examples, and refined workflow instructions for comprehensive test agent behavior.

Changes

  • Cohort / File(s): Test Agent Documentation (.claude/commands/test-agent.md)
  • Summary: Added comprehensive guidance sections: Autonomy and Task Completion Principles with explicit Do/Don't guidelines; Understanding Scope workflow (parsing, task lists, planning, dependencies); Making Autonomous Decisions guidelines; Execution Discipline with subagent orchestration patterns; Anti-Patterns section with Bad/Good examples; Extended Available Tools section; Updated Test Scenario Bank and running workflow to emphasize parse-plan-execute-validate-report cadence; Large example Autonomous Testing Session with Direct Execution and Orchestration scenarios; new How to Use This Agent CLI commands section.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hops with glee through test commands new,
Autonomy and scope — now crystal true,
Subagents coordinating with finesse,
No more mid-stream chaos, just success!
This rabbit grins: orchestration's blessed!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The pull request title 'docs: improve test agent autonomy and orchestration guidance' accurately reflects the main changes: extensive additions to test agent documentation focusing on autonomy principles and orchestration with subagents.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.claude/commands/test-agent.md (1)

109-109: Add language specifiers to fenced code blocks for consistency.

Several code blocks lack language specifiers. While readability isn't materially affected, adding specifiers improves Markdown linting compliance. For pseudo-code examples, use a generic language like plaintext or bash as appropriate.

🔎 Example fixes
 Example Orchestration:
-```
+```plaintext

 [Launching subagent 1: Tron Testing]
-```
+```plaintext
 Prompt: "Test Tron TX parsing fixes...

Also applies to: 472-472, 498-498, 515-515, 537-537, 546-546

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between 3ec9533 and 5c6de4e.

📒 Files selected for processing (1)
  • .claude/commands/test-agent.md
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10323
File: src/pages/RFOX/components/Stake/components/StakeSummary.tsx:112-114
Timestamp: 2025-08-22T13:00:44.879Z
Learning: NeOMakinG prefers to keep PR changes minimal and focused on the core objectives, avoiding cosmetic or defensive code improvements that aren't directly related to the PR scope, even when they would improve robustness.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10128
File: .cursor/rules/error-handling.mdc:266-274
Timestamp: 2025-07-29T10:35:22.059Z
Learning: NeOMakinG prefers less nitpicky suggestions on documentation and best practices files, finding overly detailed suggestions on minor implementation details (like console.error vs logger.error) too granular for cursor rules documentation.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10380
File: src/pages/Dashboard/components/AccountList/AccountTable.tsx:60-0
Timestamp: 2025-09-02T08:34:08.157Z
Learning: NeOMakinG prefers code review comments to focus only on actual PR changes, not pre-existing code issues, unless there are critical security or correctness concerns directly related to the new functionality.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10234
File: src/components/MultiHopTrade/hooks/useGetTradeQuotes/hooks/useTrackTradeQuotes.ts:42-86
Timestamp: 2025-08-08T11:41:22.794Z
Learning: NeOMakinG prefers not to include refactors in move-only PRs; such suggestions should be deferred to follow-up issues instead of being applied within the same PR.
Learnt from: NeOMakinG
Repo: shapeshift/web PR: 10380
File: src/components/Table/Table.theme.ts:177-180
Timestamp: 2025-09-02T12:38:46.940Z
Learning: NeOMakinG prefers to defer technical debt and CSS correctness issues (like improper hover selectors) to follow-up PRs when the current PR is already large and focused on major feature implementation, even when the issues are valid from a usability/technical perspective.
📚 Learning: 2025-11-24T21:20:04.979Z
Learnt from: CR
Repo: shapeshift/web PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T21:20:04.979Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Write tests for critical business logic

Applied to files:

  • .claude/commands/test-agent.md
📚 Learning: 2025-11-24T21:20:04.979Z
Learnt from: CR
Repo: shapeshift/web PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T21:20:04.979Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Use descriptive test names that explain behavior

Applied to files:

  • .claude/commands/test-agent.md
🪛 markdownlint-cli2 (0.18.1)
.claude/commands/test-agent.md

109-109, 472-472, 498-498, 515-515, 537-537, 546-546: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (3)
.claude/commands/test-agent.md (3)

18-175: Excellent clarity on autonomous operation and orchestration patterns.

The new sections on autonomy principles, subagent orchestration, and anti-patterns are well-structured and provide concrete, actionable guidance. The Do/Don't patterns and task breakdown examples effectively communicate expectations for agent behavior without ambiguity.


198-208: Workflow steps align well with autonomy principles.

The updated workflow section reinforces the "parse full scope" → "execute all" → "report once" model introduced in the autonomy section. The progression is logical and the emphasis on completing entire scope before reporting is consistent.


441-552: Examples effectively illustrate autonomous and orchestrated testing workflows.

The Direct Execution and Orchestration examples are concrete, realistic, and clearly demonstrate the expected behavior patterns. The subagent prompts and result aggregation format provide practical templates for implementation.

@0xApotheosis (Member) left a comment

All looks sane. Gave the Claude skill a run as a sanity check, and it works as expected. Get in!


@0xApotheosis 0xApotheosis merged commit ac9065a into develop Jan 5, 2026
4 checks passed
@0xApotheosis 0xApotheosis deleted the improve-test-agent-autonomy branch January 5, 2026 08:40