diff --git a/.claude/commands/ingest-review.md b/.claude/commands/ingest-review.md index 14f3dc0..3f7f0a4 100644 --- a/.claude/commands/ingest-review.md +++ b/.claude/commands/ingest-review.md @@ -11,33 +11,70 @@ outputs: # Ingest PR Review Feedback -Process review comments from a merged PR and update agent memory. +Parse structured review findings from TaskRun records and update agent memory. ## Instructions 1. Get the PR number from `$ARGUMENTS`. If empty, ask the user. -2. Fetch PR data: +2. Query the database for reviewer TaskRun records linked to this PR: ```bash - gh pr view --json comments,reviews,body,title - gh api repos///pulls//comments + # Find the reviewer run's handoff data + python3 -c " + import asyncio, json, os + os.environ.setdefault('SOVA_DATABASE_URL', 'sqlite+aiosqlite://.claude/sova.db') + from sova.db.session import init_db, get_session + from sova.db.models import TaskRun + from sqlalchemy import select + + async def main(): + await init_db(run_migrations=False) + async with get_session() as session: + stmt = select(TaskRun).where( + TaskRun.pr_number == , + TaskRun.role.in_(['reviewer', 'command:review-pr']), + TaskRun.status == 'done', + ).order_by(TaskRun.id.desc()).limit(1) + run = (await session.execute(stmt)).scalar_one_or_none() + if not run or not run.handoff_json: + print('NO_FINDINGS') + return + print(json.dumps(run.handoff_json, indent=2)) + + asyncio.run(main()) + " + ``` + +3. If the output is `NO_FINDINGS`, report "No reviewer findings found for PR #N" and stop. + +4. Parse the `pending_findings` array from the handoff JSON. Each finding has: + - `file`: file path + - `line`: line number + - `severity`: 1-10 score + - `category`: type of issue (bug, style, performance, etc.) + - `description`: what the issue is + - `suggestion`: how to fix it + +5. Also fetch external review comments (CodeRabbit, human reviewers): + ```bash + gh pr view --json reviews,comments --jq '.reviews[] | {author: .author.login, state: .state, body: .body}' ``` -3. Analyze review comments and extract lessons: - - **Patterns to always follow** -- things reviewers praised or requested - - **Mistakes to avoid** -- bugs caught, missing edge cases, style violations - - **Style preferences** -- formatting, naming, structural preferences - - **Test coverage gaps** -- missing assertions, untested scenarios +6. Classify findings into memory categories: + - **Severity >= 7**: likely a "common_mistake" -- check `.claude/agent-memory/cookbook.md` for existing entries + - **Severity 4-6 with "style" or "naming" category**: "style preference" + - **Repeated patterns across findings**: "review pattern" worth codifying + - **Test-related findings**: "test coverage gap" -4. Read existing memory file: +7. Read existing memory file: - `.claude/agent-memory/cookbook.md` -5. Update `.claude/agent-memory/cookbook.md`: +8. Update `.claude/agent-memory/cookbook.md`: - Append new findings under the matching domain section (no duplicates) - - If a mistake has appeared before, add it to the "Common Mistakes" section with `[Nx]` count - - If a finding is high-impact, add it to `MEMORY.md` + - If a mistake already exists, increment its `[Nx]` count + - If a finding is high-impact (severity >= 8), also add it to `.claude/agent-memory/MEMORY.md` -6. Report what was learned and which files were updated. +9. Report what was learned and which files were updated. ## Cross-References @@ -47,4 +84,5 @@ Process review comments from a merged PR and update agent memory. ## Rules - Only record actionable, specific lessons -- not generic advice +- Parse structured data from DB records; do NOT use LLM summarization - NEVER use emojis in any output diff --git a/.claude/rules/architecture.md b/.claude/rules/architecture.md index 8b301e3..888022e 100644 --- a/.claude/rules/architecture.md +++ b/.claude/rules/architecture.md @@ -14,7 +14,7 @@ SOVA has four main components: - `core/state.py` -- 17-state TaskStatus StrEnum with transition validation - `core/context.py` -- ExecutionContext dataclass threading state through steps - `core/output.py` -- OutputWriter for per-run DB-backed output persistence, read_lines, retention cleanup -- `core/steps/` -- 29 BaseStep implementations with execute/validate_output/can_skip. Four pipeline variants: +- `core/steps/` -- 26 BaseStep implementations with execute/validate_output/can_skip. Four pipeline variants: - **Developer pipeline** (16 steps): sync -> assess -> create_worktree -> capture_baseline -> develop -> simplify -> self_review -> commit -> validate -> push -> create_pr -> wait_for_external_reviews -> address_external_findings -> monitor_ci -> extract_memory -> handoff_to_reviewer - **Address-review pipeline** (9 steps): rebase -> address_review -> commit -> validate -> push -> monitor_ci -> resolve_external_reviews -> extract_memory -> handoff_to_user - **Researcher pipeline** (4 steps): fetch_task -> research -> spec -> extract_memory @@ -115,12 +115,12 @@ The project's full name is **SOVA** (Software Orchestration Via Agents). - **Auto-handoff must clear handoff file before spawning next agent**: `_process_auto_handoff()` in `agent_handoff.py` calls `handoff_service.clear_handoff()` before spawning the next agent via `start_agent()` or `start_command()`. Without this, the newly spawned agent or a concurrent dashboard poll may re-process the same handoff, leading to duplicate agent spawns. The clear must happen synchronously before the spawn call, not after. - **Seed cross-agent data before clearing the handoff file**: when spawning an agent that needs data from the previous agent's handoff (e.g., review findings for address-review), extract the data before `clear_handoff()` and write it to the new TaskRun's `handoff_json` via `start_agent(handoff_findings=...)`. Without this, the address-review step can't find findings when: (1) the handoff file was cleared, (2) `resume_run_id` points to a non-reviewer run, and (3) the issue-level DB fallback fails. File: `sova/dashboard/services/agent_db.py:_seed_handoff_findings()`. - **Adapter ABC contract**: `TaskAdapter` in `sova/adapters/base.py` defines 18 abstract methods: `list_tasks`, `get_task`, `get_state`, `transition_state`, `assign`, `add_label`, `remove_label`, `_do_post_comment`, `_do_post_pr_comment`, `_do_post_pr_review`, `_do_edit_body`, `link_pr`, `get_pr_reviews`, `_do_create_issue`, `get_available_transitions`, `list_milestones`, `create_milestone`, `set_milestone`, plus `github_user` field (constructor param). Public posting methods (`post_comment`, `post_pr_comment`, `post_pr_review`, `edit_body`, `create_issue`) are concrete Template Methods that run the egress filter before delegating to `_do_*` implementations. When adding new agent capabilities that interact with the tracker, add the method to the ABC first, then implement in `GitHubAdapter`. Factory: `create_adapter(type, repo, github_user, project_number)`. -- **LLM for user-facing outputs with structured fallback**: workflow steps producing user-facing content (PR descriptions, review comments) use a focused LLM call (Sonnet, ~$0.01) with structured context (commit log, diff stats, issue body). Fallback MUST preserve available data -- build a structured body from already-fetched data rather than discarding to a bare stub. Pattern: `_generate_pr_body()` + `_build_fallback_body()` in `create_pr.py`, `_run_review()` in `reviewer.py`. +- **Deterministic PR body from structured data**: `CreatePRStep._build_pr_body()` builds PR descriptions deterministically from commits, diff stats, issue body excerpt (truncated to 500 chars), and Closes link -- no LLM call. The reviewer role's `_run_review()` is the only user-facing LLM call in the pipeline (structured output at the right abstraction level). - **Rebase with LLM conflict resolution**: `rebase_with_conflict_resolution()` in `sova/git/rebase.py` rebases onto the latest base branch and uses the LLM to resolve merge conflicts. Loop: detect conflicted files, invoke LLM to fix markers + `git add`, `git rebase --continue`. Aborts on LLM failure or after `max_attempts` (default 3) so the worktree is never left in a broken rebase state. `RebaseStep` in the address-review pipeline runs this before `AddressReviewStep`. `PushStep` uses `--force-with-lease` when `ctx.pr_number` is set (post-rebase history rewrite). - **Pydantic request models must be at module scope in `app.py`**: `from __future__ import annotations` makes all type annotations lazy strings. FastAPI resolves annotation strings by searching the module's global namespace. Pydantic `BaseModel` subclasses defined inside nested functions (e.g., `_setup_multi_project()`) are not in module globals, so FastAPI falls back to treating the parameter as a query param, producing 422 errors. Always define request/response models at module level. File: `sova/dashboard/app.py`. - **Thin re-export wrappers during module splits**: when splitting a large module into submodules (e.g., `control_service.py` -> `agent_lifecycle.py` + `agent_output.py` + `agent_recovery.py`), convert the original module to a thin re-export facade (`from agent_lifecycle import X`) rather than deleting it. This preserves backward compatibility for all existing imports in routers, tests, and commands. Delete the old module only after all imports are migrated. - **Never non-visible overflow on containers with popout children**: the sidebar `