Skip to content

feat: Claude Code native skill mode for agent-loop orchestration#283

Merged
wwind123 merged 2 commits into
mainfrom
feat/216-skill-mode
Jun 8, 2026
Merged

feat: Claude Code native skill mode for agent-loop orchestration#283
wwind123 merged 2 commits into
mainfrom
feat/216-skill-mode

Conversation

@wwind123

@wwind123 wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes #216

Adds a Claude Code skill that runs the multi-agent review loop inside an interactive Claude Code session, eliminating the need for claude -p subprocess invocations for Claude turns. This addresses the billing concern raised in #216 (Anthropic Agent SDK credit pool separation effective June 15, 2026).

  • helpers/validate_response.py: validates structured protocol responses via existing library entry points (_validate_plan_review_response, _validate_review_response, etc.), accepting an optional --context-file JSON with reviewer identity, prior items, and human-requirements context
  • helpers/state_manager.py: manages local session state outside git checkouts (~/.local/state/coding-review-agent-loop/skill-sessions/); build-resume accepts --reviewers and --head-sha/--pr so it can correctly call _resume_plan_round(comments, configured_reviewers=...) and _resume_pr_round(comments, head_sha=..., configured_reviewers=...) with the required inputs
  • helpers/run_external.py: invokes Codex/Gemini CLI subprocesses; --dry-run writes a canned approved plan_review stub for offline testing
  • helpers/gh_ops.py: wraps gh CLI for issue/PR comment operations with --dry-run support
  • helpers/demo_loop.py: standalone script demonstrating a minimal Claude-host + Codex-dry-run loop without any live API calls
  • SKILL.md: step-by-step orchestration instructions for Claude inside a Claude Code session
  • docs/skill_mode.md: architecture table, billing guidance (non-prescriptive), limitations, install instructions
  • tests/test_skill_helpers.py: unit tests covering validate_response (valid inputs accepted, missing markers rejected, unknown prior item IDs rejected), state_manager round-trips, run_external dry-run stub
  • tests/test_skill_loop.py: subprocess integration test asserting demo_loop exits 0, stdout contains both validation passed: lines, and local session JSON records last_completed_step=post_review

The existing headless agent-loop CLI path is completely unchanged.

Test plan

  • python -m pytest tests/test_skill_helpers.py tests/test_skill_loop.py -v — 8 passed
  • python -m pytest tests/test_agent_loop.py -q — 716 passed (no regressions)
  • python -m helpers.demo_loop --issue 123 --repo demo/repo runs end-to-end with both validation steps passing

Architecture notes

  • State stored at ~/.local/state/coding-review-agent-loop/skill-sessions/{repo-slug}/{issue}.json — never dirties a git checkout
  • state_manager build-resume --reviewers is required (not optional) per plan item-8 resolution: the existing _resume_plan_round and _resume_pr_round APIs both require configured_reviewers
  • For PR-flow sessions, --head-sha or --pr is likewise required; the helper fetches headRefOid via gh pr view when only --pr is provided
  • GitHub comment AGENT_LOOP_META markers written by the skill are identical to the headless CLI format, enabling mixed-mode resume

🤖 Generated with Claude Code

…216)

Adds a Claude Code skill that lets users run the multi-agent review loop
inside an interactive Claude Code session without calling `claude -p` for
Claude turns. External agents (Codex, Gemini) are still invoked as subprocesses.

New files:
- helpers/validate_response.py: validates structured protocol responses
  using the existing library entry points (_validate_plan_review_response,
  _validate_review_response, _validate_coder_followup_response, etc.)
- helpers/state_manager.py: manages local session state and resume via
  _resume_plan_round/_resume_pr_round; accepts --reviewers and --head-sha/--pr
  as required by the existing resume APIs
- helpers/run_external.py: runs codex/gemini CLIs; --dry-run writes a
  canned approved plan_review stub for testing
- helpers/gh_ops.py: wraps gh CLI for issue/PR comment operations
- helpers/demo_loop.py: standalone dry-run demo (no live Claude/GitHub calls)
- SKILL.md: step-by-step skill orchestration instructions for Claude
- docs/skill_mode.md: architecture table, billing guidance, limitations
- tests/test_skill_helpers.py: unit tests for all helper CLIs
- tests/test_skill_loop.py: subprocess integration test for demo_loop.py

Fixes #216

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wwind123

wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

Implementation: Claude Code native skill mode for agent-loop orchestration

PR #283 implements the approved plan for issue #216.

What was built

New files (all in the feature branch feat/216-skill-mode):

File Purpose
helpers/validate_response.py CLI to validate structured protocol responses using the existing library (_validate_plan_review_response, _validate_review_response, parse_plan_state, etc.) with optional --context-file JSON supplying reviewer identity and prior items
helpers/state_manager.py Manages local session state outside git checkouts; build-resume accepts --reviewers (required) and --head-sha/--pr so it correctly calls _resume_plan_round(comments, configured_reviewers=...) and _resume_pr_round(comments, head_sha=..., configured_reviewers=...) — resolves item-8
helpers/run_external.py Invokes Codex/Gemini CLIs as subprocesses; --dry-run writes a canned approved plan_review stub
helpers/gh_ops.py gh CLI wrapper for issue/PR comment operations with --dry-run support
helpers/demo_loop.py Standalone dry-run demo: Claude stub plan → validate_response → Codex dry-run review → validate_responsegh_ops --dry-runstate_manager write-session
SKILL.md Step-by-step orchestration instructions for Claude inside a Claude Code session
docs/skill_mode.md Architecture table, billing guidance (non-prescriptive), limitations, install instructions
tests/test_skill_helpers.py Unit tests: valid inputs accepted, missing markers rejected, unknown prior item IDs rejected, state_manager round-trips, run_external dry-run stub
tests/test_skill_loop.py Integration test: python -m helpers.demo_loop --issue 88888 --repo demo/skill-loop-test exits 0, stdout contains both validation passed: lines, session JSON records last_completed_step=post_review

Acceptance criteria status

  • ✅ Concrete architecture documented in docs/skill_mode.md
  • ✅ Minimal Claude Code skill scaffold in SKILL.md + helpers/
  • ✅ Demo loop where Claude is host/coder and Codex is invoked externally (demo_loop.py), without calling claude -p for Claude's turn
  • ✅ Structured protocol validation preserved and reusing existing library entry points
  • ✅ Limitations, billing assumptions, and terms-sensitive boundaries documented
  • ✅ Existing headless CLI path unchanged (716 original tests still pass)

Key design decisions

  • State location: ~/.local/state/coding-review-agent-loop/skill-sessions/{repo-slug}/{issue}.json — never dirties a git checkout
  • --reviewers required by build-resume: resolves plan item-8; the existing _resume_plan_round and _resume_pr_round APIs both require configured_reviewers to reconstruct completed reviewer records
  • AGENT_LOOP_META compatibility: markers written by the skill are identical to the headless CLI format, enabling mixed-mode resume

Tests:

  • python -m pytest tests/test_skill_helpers.py tests/test_skill_loop.py -v → 8 passed
  • python -m pytest tests/test_agent_loop.py -q → 716 passed (no regressions)

-- Anthropic Claude

@wwind123

wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

Review verdict: Blocking

The new skill scaffold is additive and the focused helper tests pass locally with python3, but the prototype is not merge-ready because skill-posted comments are not written with the metadata that its resume path requires, and plan-revision validation is weaker than the existing orchestrator protocol.

Blocking issues

  • SKILL.md:74 and SKILL.md:118 route the host plan and reviewer output directly through helpers/gh_ops.py, whose post commands only read the file and pass it to gh as-is (helpers/gh_ops.py:47 and helpers/gh_ops.py:84). No helper or documented step constructs PostedRoundMetadata or calls _attach_round_metadata before posting, while the resume helper later depends on AGENT_LOOP_META records (helpers/state_manager.py:85, docs/skill_mode.md:42). As a result, comments produced by the skill cannot be resumed by state_manager build-resume, and the documented mixed-mode compatibility with the headless orchestrator is false. Add a metadata-backed comment preparation/posting path for coder plan/revision and reviewer comments, with tests that a skill-posted round is found by build-resume.
  • helpers/validate_response.py:136 validates plan_revision with validate_structured_plan_revision only, so it accepts prior_plan_item_dispositions for IDs that are not in the context ledger. The existing orchestrator path rejects those via _validate_plan_revision_response by comparing dispositions to unresolved_items (src/coding_review_agent_loop/orchestrator.py:1281). This breaks the requirement that the skill preserve the same structured protocol validation for host Claude plan revisions and can let invalid carried-item dispositions be posted. Route plan_revision through the existing validator or equivalent ledger check, and add a regression test with an unknown prior item ID.

-- OpenAI Codex

@wwind123

wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

Review verdict: Approved

The PR successfully implements the Claude Code native skill mode with robust CLI helpers, clear orchestration instructions in SKILL.md, and comprehensive test coverage. It correctly preserves the existing headless path while providing a session-aware orchestration model for interactive Claude users.

-- Google Gemini

…_revision validation

Addresses Codex blocking item-1 from PR #283 round 1 review:

1. Add state_manager attach-metadata subcommand that builds PostedRoundMetadata
   and calls _attach_round_metadata before posting, so skill-posted comments carry
   AGENT_LOOP_META and can be found by build-resume's _resume_plan_round /
   _resume_pr_round. Update demo_loop.py and SKILL.md to use this step.

2. Fix validate_response.py plan_revision validation to check that
   prior_plan_item_dispositions only reference item IDs present in the context
   ledger, matching the check in the headless orchestrator's
   _validate_plan_revision_response (orchestrator.py:1281). Add regression tests.

3. New tests: test_attach_metadata_produces_valid_agent_loop_meta and
   test_attach_metadata_reviewer_found_by_resume verify that skill-posted rounds
   are found by _resume_plan_round; test_plan_revision_with_unknown_prior_item_rejected
   and test_plan_revision_with_known_items_accepted cover the ledger check.

Refs #216

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wwind123

wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

Coder follow-up

Addressed both blocking issues from Codex round 1: added AGENT_LOOP_META attachment to skill-posted comments via state_manager attach-metadata, and strengthened plan_revision validation to reject unknown prior item IDs against the context ledger.

Addressed items

  • item-1: Blocking issue from OpenAI Codex, round 1: The new skill scaffold is additive and the focused helper tests pass locally with python3, but th...
    • Resolution: Two sub-issues resolved: (1) Added state_manager attach-metadata subcommand that builds PostedRoundMetadata and calls _attach_round_metadata before posting, so skill-posted comments carry AGENT_LOOP_META and are found by build-resume. Updated demo_loop.py to use attach-metadata for both the coder plan and reviewer comments, and updated SKILL.md with the required step. New tests test_attach_metadata_produces_valid_agent_loop_meta and test_attach_metadata_reviewer_found_by_resume directly call _resume_plan_round on fake comments produced by attach-metadata and assert the round is found with 1 completed reviewer. (2) Fixed validate_response.py plan_revision kind to run the same ledger check as the headless orchestrator's _validate_plan_revision_response (orchestrator.py:1281): after validate_structured_plan_revision, check that all prior_plan_item_dispositions item IDs are in the allowed_ids set from the context file's prior_items, and raise UnknownPriorItemDispositionError for any unknown IDs. Added regression tests test_plan_revision_with_unknown_prior_item_rejected and test_plan_revision_with_known_items_accepted.

Remaining items

  • None.

Tests run

  • python -m pytest tests/test_skill_helpers.py tests/test_skill_loop.py -v → 12 passed
  • python -m pytest tests/test_agent_loop.py -q → 716 passed

-- Anthropic Claude

@wwind123

wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

Review verdict: Approved

Approved. The round-2 changes add metadata attachment for skill-posted comments and bring plan-revision validation in line with the existing orchestrator checks. I locally ran the focused skill tests with python3; the PR checks are still pending per the supplied metadata.

Prior unresolved item dispositions

  • [item-1] Blocking issue from OpenAI Codex, round 1: The new skill scaffold is additive and the focused helper tests pass locally with python3, but th... -> resolved

-- OpenAI Codex

@wwind123

wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

Review verdict: Approved

The Claude Code native skill mode implementation is complete and resolves the prior blocking concerns. The new helper CLIs (state_manager, validate_response, run_external, gh_ops) provide a robust orchestration path that maintains structured protocol integrity and session resume compatibility with the headless CLI. Unit and integration tests confirm that skill-posted comments carry the necessary metadata for reconstruction and that plan-revision validation correctly enforces prior item ID checks.

Prior unresolved item dispositions

  • [item-1] Blocking issue from OpenAI Codex, round 1: The new skill scaffold is additive and the focused helper tests pass locally with python3, but th... -> resolved

-- Google Gemini

@wwind123

wwind123 commented Jun 8, 2026

Copy link
Copy Markdown
Owner Author

GitHub PR checks are still pending for PR #283.

Reviewer approvals do not make this PR merge-ready until GitHub PR checks are green, or the PR explicitly states that only a local subset passed.

  • Pending checks: test (in_progress)
  • Required status checks are not configured on the PR base branch.

-- coding-review-agent-loop

@wwind123 wwind123 merged commit 3a93d92 into main Jun 8, 2026
1 check passed
@wwind123 wwind123 deleted the feat/216-skill-mode branch June 8, 2026 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Explore Claude Code native skill mode for agent-loop orchestration

1 participant