feat: Claude Code native skill mode for agent-loop orchestration #283
…216)

Adds a Claude Code skill that lets users run the multi-agent review loop inside an interactive Claude Code session without calling `claude -p` for Claude turns. External agents (Codex, Gemini) are still invoked as subprocesses.

New files:
- helpers/validate_response.py: validates structured protocol responses using the existing library entry points (_validate_plan_review_response, _validate_review_response, _validate_coder_followup_response, etc.)
- helpers/state_manager.py: manages local session state and resume via _resume_plan_round/_resume_pr_round; accepts --reviewers and --head-sha/--pr as required by the existing resume APIs
- helpers/run_external.py: runs codex/gemini CLIs; --dry-run writes a canned approved plan_review stub for testing
- helpers/gh_ops.py: wraps gh CLI for issue/PR comment operations
- helpers/demo_loop.py: standalone dry-run demo (no live Claude/GitHub calls)
- SKILL.md: step-by-step skill orchestration instructions for Claude
- docs/skill_mode.md: architecture table, billing guidance, limitations
- tests/test_skill_helpers.py: unit tests for all helper CLIs
- tests/test_skill_loop.py: subprocess integration test for demo_loop.py

Fixes #216

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implementation: Claude Code native skill mode for agent-loop orchestration
PR #283 implements the approved plan for issue #216.
What was built
New files (all in the feature branch
Acceptance criteria status
Key design decisions
Tests:
-- Anthropic Claude
Review verdict: Blocking
The new skill scaffold is additive and the focused helper tests pass locally with python3, but the prototype is not merge-ready: skill-posted comments are not written with the metadata that the resume path requires, and plan-revision validation is weaker than the existing orchestrator protocol.
Blocking issues
-- OpenAI Codex
Review verdict: Approved
The PR successfully implements the Claude Code native skill mode with robust CLI helpers, clear orchestration instructions in SKILL.md, and comprehensive test coverage. It correctly preserves the existing headless path while providing a session-aware orchestration model for interactive Claude users.
-- Google Gemini
…_revision validation

Addresses Codex blocking item-1 from PR #283 round 1 review:

1. Add state_manager attach-metadata subcommand that builds PostedRoundMetadata and calls _attach_round_metadata before posting, so skill-posted comments carry AGENT_LOOP_META and can be found by build-resume's _resume_plan_round / _resume_pr_round. Update demo_loop.py and SKILL.md to use this step.
2. Fix validate_response.py plan_revision validation to check that prior_plan_item_dispositions only reference item IDs present in the context ledger, matching the check in the headless orchestrator's _validate_plan_revision_response (orchestrator.py:1281). Add regression tests.
3. New tests: test_attach_metadata_produces_valid_agent_loop_meta and test_attach_metadata_reviewer_found_by_resume verify that skill-posted rounds are found by _resume_plan_round; test_plan_revision_with_unknown_prior_item_rejected and test_plan_revision_with_known_items_accepted cover the ledger check.

Refs #216

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
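For readers following the ledger check described in item 2, here is a minimal sketch of the idea, not the code in validate_response.py or orchestrator.py. The function names, the `item_id` key, and the error wording are assumptions made for illustration; only the `prior_plan_item_dispositions` field name comes from the commit message.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def unknown_prior_item_ids(
    dispositions: Iterable[Mapping[str, Any]],
    ledger_item_ids: Iterable[str],
) -> list[str]:
    """Return disposition item IDs missing from the ledger, in first-seen order."""
    known = set(ledger_item_ids)
    unknown: list[str] = []
    for entry in dispositions:
        # "item_id" is a hypothetical key; the real protocol field may differ.
        item_id = str(entry.get("item_id", ""))
        if item_id not in known and item_id not in unknown:
            unknown.append(item_id)
    return unknown


def validate_plan_revision(
    response: Mapping[str, Any],
    ledger_item_ids: Iterable[str],
) -> None:
    """Raise ValueError when the response references items outside the ledger."""
    dispositions = response.get("prior_plan_item_dispositions", [])
    bad = unknown_prior_item_ids(dispositions, ledger_item_ids)
    if bad:
        raise ValueError("unknown prior plan item IDs: " + ", ".join(bad))
```

A response whose dispositions reference only ledger items passes silently, while a single unrecognized ID is enough to reject the whole revision. That mirrors the two regression tests named in item 3.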
Coder follow-up
Addressed both blocking issues from Codex round 1: added AGENT_LOOP_META attachment to skill-posted comments via state_manager attach-metadata, and strengthened plan_revision validation to reject unknown prior item IDs against the context ledger.
Addressed items
Remaining items
Tests run
-- Anthropic Claude
Review verdict: Approved
The round-2 changes add metadata attachment for skill-posted comments and bring plan-revision validation in line with the existing orchestrator checks. I locally ran the focused skill tests with python3; the PR checks are still pending per the supplied metadata.
Prior unresolved item dispositions
-- OpenAI Codex
Review verdict: Approved
The Claude Code native skill mode implementation is complete and resolves the prior blocking concerns. The new helper CLIs (state_manager, validate_response, run_external, gh_ops) provide a robust orchestration path that maintains structured protocol integrity and session resume compatibility with the headless CLI. Unit and integration tests confirm that skill-posted comments carry the necessary metadata for reconstruction and that plan-revision validation correctly enforces prior item ID checks.
Prior unresolved item dispositions
-- Google Gemini
GitHub PR checks are still pending for PR #283. Reviewer approvals do not make this PR merge-ready until GitHub PR checks are green, or the PR explicitly states that only a local subset passed.
-- coding-review-agent-loop
Summary
Fixes #216
Adds a Claude Code skill that runs the multi-agent review loop inside an interactive Claude Code session, eliminating the need for `claude -p` subprocess invocations for Claude turns. This addresses the billing concern raised in #216 (Anthropic Agent SDK credit pool separation effective June 15, 2026).

- `helpers/validate_response.py`: validates structured protocol responses via existing library entry points (`_validate_plan_review_response`, `_validate_review_response`, etc.), accepting an optional `--context-file` JSON with reviewer identity, prior items, and human-requirements context
- `helpers/state_manager.py`: manages local session state outside git checkouts (`~/.local/state/coding-review-agent-loop/skill-sessions/`); `build-resume` accepts `--reviewers` and `--head-sha`/`--pr` so it can correctly call `_resume_plan_round(comments, configured_reviewers=...)` and `_resume_pr_round(comments, head_sha=..., configured_reviewers=...)` with the required inputs
- `helpers/run_external.py`: invokes Codex/Gemini CLI subprocesses; `--dry-run` writes a canned approved `plan_review` stub for offline testing
- `helpers/gh_ops.py`: wraps `gh` CLI for issue/PR comment operations with `--dry-run` support
- `helpers/demo_loop.py`: standalone script demonstrating a minimal Claude-host + Codex-dry-run loop without any live API calls
- `SKILL.md`: step-by-step orchestration instructions for Claude inside a Claude Code session
- `docs/skill_mode.md`: architecture table, billing guidance (non-prescriptive), limitations, install instructions
- `tests/test_skill_helpers.py`: unit tests covering validate_response (valid inputs accepted, missing markers rejected, unknown prior item IDs rejected), state_manager round-trips, run_external dry-run stub
- `tests/test_skill_loop.py`: subprocess integration test asserting demo_loop exits 0, stdout contains both `validation passed:` lines, and local session JSON records `last_completed_step=post_review`

The existing headless `agent-loop` CLI path is completely unchanged.

Test plan
- `python -m pytest tests/test_skill_helpers.py tests/test_skill_loop.py -v`: 8 passed
- `python -m pytest tests/test_agent_loop.py -q`: 716 passed (no regressions)
- `python -m helpers.demo_loop --issue 123 --repo demo/repo` runs end-to-end with both validation steps passing

Architecture notes
- `~/.local/state/coding-review-agent-loop/skill-sessions/{repo-slug}/{issue}.json`: never dirties a git checkout
- `state_manager build-resume --reviewers` is required (not optional) per plan item-8 resolution: the existing `_resume_plan_round` and `_resume_pr_round` APIs both require `configured_reviewers`
- `--head-sha` or `--pr` is likewise required; the helper fetches `headRefOid` via `gh pr view` when only `--pr` is provided
- `AGENT_LOOP_META` markers written by the skill are identical to the headless CLI format, enabling mixed-mode resume

🤖 Generated with Claude Code
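As an illustration of the session-state layout in the architecture notes, the sketch below resolves a per-issue JSON path under the state directory and writes it via a temporary file plus `os.replace`. It is not the code in `helpers/state_manager.py`: the function names and the `/` to `__` slug rule are assumptions, and only the directory layout and the `last_completed_step` field come from the PR description.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path

# Directory taken from the PR description; everything below it is illustrative.
STATE_ROOT = (
    Path.home() / ".local" / "state" / "coding-review-agent-loop" / "skill-sessions"
)


def session_path(repo: str, issue: int, root: Path = STATE_ROOT) -> Path:
    """Map owner/name plus an issue number to {root}/{repo-slug}/{issue}.json."""
    # Assumption: the slug replaces "/" with "__"; the real helper may differ.
    slug = repo.replace("/", "__")
    return root / slug / f"{issue}.json"


def save_session(path: Path, state: dict) -> None:
    """Write state to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def load_session(path: Path) -> dict:
    """Return the stored state, or an empty dict when no session exists yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Keeping the file outside the repository means a resumed session never shows up in `git status`, and the rename step keeps an interrupted write from leaving a half-written JSON file behind.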