Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 52 additions & 14 deletions .claude/commands/ingest-review.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,33 +11,70 @@ outputs:

# Ingest PR Review Feedback

Process review comments from a merged PR and update agent memory.
Parse structured review findings from TaskRun records and update agent memory.

## Instructions

1. Get the PR number from `$ARGUMENTS`. If empty, ask the user.

2. Fetch PR data:
2. Query the database for reviewer TaskRun records linked to this PR:
```bash
gh pr view <PR_NUMBER> --json comments,reviews,body,title
gh api repos/<OWNER>/<REPO>/pulls/<PR_NUMBER>/comments
# Find the reviewer run's handoff data
python3 -c "
import asyncio, json, os
os.environ.setdefault('SOVA_DATABASE_URL', 'sqlite+aiosqlite://.claude/sova.db')
from sova.db.session import init_db, get_session
from sova.db.models import TaskRun
from sqlalchemy import select

async def main():
Comment thread
xsovad06 marked this conversation as resolved.
await init_db(run_migrations=False)
async with get_session() as session:
stmt = select(TaskRun).where(
TaskRun.pr_number == <PR_NUMBER>,
TaskRun.role.in_(['reviewer', 'command:review-pr']),
TaskRun.status == 'done',
).order_by(TaskRun.id.desc()).limit(1)
run = (await session.execute(stmt)).scalar_one_or_none()
if not run or not run.handoff_json:
print('NO_FINDINGS')
return
print(json.dumps(run.handoff_json, indent=2))

asyncio.run(main())
"
```

3. If the output is `NO_FINDINGS`, report "No reviewer findings found for PR #N" and stop.

4. Parse the `pending_findings` array from the handoff JSON. Each finding has:
- `file`: file path
- `line`: line number
- `severity`: 1-10 score
- `category`: type of issue (bug, style, performance, etc.)
- `description`: what the issue is
- `suggestion`: how to fix it

5. Also fetch external review comments (CodeRabbit, human reviewers):
```bash
gh pr view <PR_NUMBER> --json reviews,comments --jq '.reviews[] | {author: .author.login, state: .state, body: .body}'
```

3. Analyze review comments and extract lessons:
- **Patterns to always follow** -- things reviewers praised or requested
- **Mistakes to avoid** -- bugs caught, missing edge cases, style violations
- **Style preferences** -- formatting, naming, structural preferences
- **Test coverage gaps** -- missing assertions, untested scenarios
6. Classify findings into memory categories:
- **Severity >= 7**: likely a "common_mistake" -- check `.claude/agent-memory/cookbook.md` for existing entries
- **Severity 4-6 with "style" or "naming" category**: "style preference"
- **Repeated patterns across findings**: "review pattern" worth codifying
- **Test-related findings**: "test coverage gap"

4. Read existing memory file:
7. Read existing memory file:
- `.claude/agent-memory/cookbook.md`

5. Update `.claude/agent-memory/cookbook.md`:
8. Update `.claude/agent-memory/cookbook.md`:
- Append new findings under the matching domain section (no duplicates)
- If a mistake has appeared before, add it to the "Common Mistakes" section with `[Nx]` count
- If a finding is high-impact, add it to `MEMORY.md`
- If a mistake already exists, increment its `[Nx]` count
- If a finding is high-impact (severity >= 8), also add it to `.claude/agent-memory/MEMORY.md`

6. Report what was learned and which files were updated.
9. Report what was learned and which files were updated.

## Cross-References

Expand All @@ -47,4 +84,5 @@ Process review comments from a merged PR and update agent memory.
## Rules

- Only record actionable, specific lessons -- not generic advice
- Parse structured data from DB records; do NOT use LLM summarization
- NEVER use emojis in any output
6 changes: 3 additions & 3 deletions .claude/rules/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ SOVA has four main components:
- `core/state.py` -- 17-state TaskStatus StrEnum with transition validation
- `core/context.py` -- ExecutionContext dataclass threading state through steps
- `core/output.py` -- OutputWriter for per-run DB-backed output persistence, read_lines, retention cleanup
- `core/steps/` -- 29 BaseStep implementations with execute/validate_output/can_skip. Four pipeline variants:
- `core/steps/` -- 26 BaseStep implementations with execute/validate_output/can_skip. Four pipeline variants:
- **Developer pipeline** (16 steps): sync -> assess -> create_worktree -> capture_baseline -> develop -> simplify -> self_review -> commit -> validate -> push -> create_pr -> wait_for_external_reviews -> address_external_findings -> monitor_ci -> extract_memory -> handoff_to_reviewer
- **Address-review pipeline** (9 steps): rebase -> address_review -> commit -> validate -> push -> monitor_ci -> resolve_external_reviews -> extract_memory -> handoff_to_user
- **Researcher pipeline** (4 steps): fetch_task -> research -> spec -> extract_memory
Expand Down Expand Up @@ -115,12 +115,12 @@ The project's full name is **SOVA** (Software Orchestration Via Agents).
- **Auto-handoff must clear handoff file before spawning next agent**: `_process_auto_handoff()` in `agent_handoff.py` calls `handoff_service.clear_handoff()` before spawning the next agent via `start_agent()` or `start_command()`. Without this, the newly spawned agent or a concurrent dashboard poll may re-process the same handoff, leading to duplicate agent spawns. The clear must happen synchronously before the spawn call, not after.
- **Seed cross-agent data before clearing the handoff file**: when spawning an agent that needs data from the previous agent's handoff (e.g., review findings for address-review), extract the data before `clear_handoff()` and write it to the new TaskRun's `handoff_json` via `start_agent(handoff_findings=...)`. Without this, the address-review step can't find findings when: (1) the handoff file was cleared, (2) `resume_run_id` points to a non-reviewer run, and (3) the issue-level DB fallback fails. File: `sova/dashboard/services/agent_db.py:_seed_handoff_findings()`.
- **Adapter ABC contract**: `TaskAdapter` in `sova/adapters/base.py` defines 18 abstract methods: `list_tasks`, `get_task`, `get_state`, `transition_state`, `assign`, `add_label`, `remove_label`, `_do_post_comment`, `_do_post_pr_comment`, `_do_post_pr_review`, `_do_edit_body`, `link_pr`, `get_pr_reviews`, `_do_create_issue`, `get_available_transitions`, `list_milestones`, `create_milestone`, `set_milestone`, plus `github_user` field (constructor param). Public posting methods (`post_comment`, `post_pr_comment`, `post_pr_review`, `edit_body`, `create_issue`) are concrete Template Methods that run the egress filter before delegating to `_do_*` implementations. When adding new agent capabilities that interact with the tracker, add the method to the ABC first, then implement in `GitHubAdapter`. Factory: `create_adapter(type, repo, github_user, project_number)`.
- **LLM for user-facing outputs with structured fallback**: workflow steps producing user-facing content (PR descriptions, review comments) use a focused LLM call (Sonnet, ~$0.01) with structured context (commit log, diff stats, issue body). Fallback MUST preserve available data -- build a structured body from already-fetched data rather than discarding to a bare stub. Pattern: `_generate_pr_body()` + `_build_fallback_body()` in `create_pr.py`, `_run_review()` in `reviewer.py`.
- **Deterministic PR body from structured data**: `CreatePRStep._build_pr_body()` builds PR descriptions deterministically from commits, diff stats, issue body excerpt (truncated to 500 chars), and Closes link -- no LLM call. The reviewer role's `_run_review()` is the only user-facing LLM call in the pipeline (structured output at the right abstraction level).
- **Rebase with LLM conflict resolution**: `rebase_with_conflict_resolution()` in `sova/git/rebase.py` rebases onto the latest base branch and uses the LLM to resolve merge conflicts. Loop: detect conflicted files, invoke LLM to fix markers + `git add`, `git rebase --continue`. Aborts on LLM failure or after `max_attempts` (default 3) so the worktree is never left in a broken rebase state. `RebaseStep` in the address-review pipeline runs this before `AddressReviewStep`. `PushStep` uses `--force-with-lease` when `ctx.pr_number` is set (post-rebase history rewrite).
- **Pydantic request models must be at module scope in `app.py`**: `from __future__ import annotations` makes all type annotations lazy strings. FastAPI resolves annotation strings by searching the module's global namespace. Pydantic `BaseModel` subclasses defined inside nested functions (e.g., `_setup_multi_project()`) are not in module globals, so FastAPI falls back to treating the parameter as a query param, producing 422 errors. Always define request/response models at module level. File: `sova/dashboard/app.py`.
- **Thin re-export wrappers during module splits**: when splitting a large module into submodules (e.g., `control_service.py` -> `agent_lifecycle.py` + `agent_output.py` + `agent_recovery.py`), convert the original module to a thin re-export facade (`from agent_lifecycle import X`) rather than deleting it. This preserves backward compatibility for all existing imports in routers, tests, and commands. Delete the old module only after all imports are migrated.
- **Never non-visible overflow on containers with popout children**: the sidebar `<nav>` and its scrollable child div contain absolutely-positioned children (notification panel, CSS hover tooltips) that extend beyond their boundary at `left: 100%`. ANY non-visible overflow value (`hidden`, `auto`, `scroll`) on ANY ancestor creates a clipping boundary for positioned descendants. CSS spec: when one axis is non-visible, the other is computed to `auto` even if explicitly set to `visible` -- so `overflow-y: auto; overflow-x: visible` does NOT work (x becomes `auto` too). Use `max-width` and opacity transitions on individual child elements instead of overflow on the container. This applies to any fixed-position sidebar or panel that contains popout menus or tooltips.
- **Automatic memory extraction**: `ExtractMemoryStep` runs before every handoff step in both pipelines. `ReviewerRole.execute()` calls `extract_memories()` directly (reviewer doesn't use WorkflowEngine). A single Haiku LLM call (~$0.005-0.01) extracts 0-5 reusable learnings from run context (role, task, files changed, step summaries, review findings). Results are stored to the Memory DB table via `memory.store()` with deduplication (title similarity check against existing memories in same category). Confirmation counters (`[confirmed: N]` in content field) track reuse; memories auto-promote to "shared" tier at N=3. Extraction is fully non-fatal: failures are logged but never block the pipeline. Module: `sova/knowledge/extraction.py`.
- **Memory extraction is a no-op**: `ExtractMemoryStep` still runs before every handoff step in both pipelines (step slot preserved for future rule-based extraction), but `extract_memories()` returns immediately without calling the LLM. `ReviewerRole._extract_review_memories()` is also a no-op. Use the human-reviewed `/extract-knowledge` command for knowledge capture. Infrastructure (`_build_extraction_prompt`, `_parse_extraction_response`, `_deduplicate_and_store`) is retained in `sova/knowledge/extraction.py` for future use.
- **Issue Lifecycle Control**: `IssueLifecycle` is the "spine" connecting all `TaskRun` records for a single issue into a unified journey with 6 phases (`LifecyclePhase` enum: development, post_pr, review, address_review, integrate, post_merge). `LifecyclePhaseRecord` tracks each phase execution (status, cost, attempt counter, linked TaskRun). Phase transitions are advisory (warnings, not strict enforcement) -- matches `--force` philosophy. The `post_pr` phase is passive (no TaskRun; inferred from PR creation). `lifecycle_service.py` provides CRUD, phase transitions, and backward-compatible reconstruction from pre-existing TaskRuns via `build_lifecycle_view()`. Integration hooks in `agent_lifecycle.py` (`_link_run_to_lifecycle`, `_finalize_lifecycle_phase`) are non-fatal side effects. Dashboard UI at `/lifecycle/{issue_number}` shows a phase rail with live polling.
- **Doc counts drift after refactors**: step count, service count, adapter method count, CLI subcommand list, and test count go stale after module splits, service extractions, or ABC changes. After any structural refactor, run actual counts (`find`, `grep -c`, `wc -l`) and update AGENTS.md, architecture.md, and CLAUDE.md. The `/review` command checks doc freshness automatically.
- **Stale references persist after file/feature renames**: when a file is deleted, renamed, or a concept changes (e.g., "PAK" -> "SOVA"), references survive in commands, rules, issue bodies, and vision docs. After any rename or deletion, run `grep -rn "OLD_NAME" . --include="*.md"` to find stale references across all markdown files. Also check GitHub Issues and VISION.md for outdated naming.
Expand Down
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ sova/
KNOWLEDGE.md # 4-tier knowledge management system
templates/ # Project scaffolding templates
deploy/ # systemd + launchd service files
tests/ # pytest suite (2800+ tests)
tests/ # pytest suite (2900+ tests)
docs/
VISION.md # Product vision and roadmap
ARCHITECTURE.md # Architecture overview (points to .claude/rules/)
Expand Down
8 changes: 5 additions & 3 deletions docs/error-handling-guidelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -180,13 +180,15 @@ Status updates are conditional (once terminal, never overwrite); cost updates ar

When LLM calls fail for user-facing outputs, always provide a structured fallback from available data. **Never discard to a bare stub.**

PR body generation uses a deterministic template (`_build_pr_body()`) -- no LLM call. For steps that still use conditional LLM calls (validate, monitor_ci, rebase), the pattern is:

```python
try:
result = await invoke(prompt, model="sonnet", cwd=ctx.working_dir, timeout=120)
result = await invoke(prompt, model=model, cwd=ctx.working_dir, timeout=120)
return result.text
except RuntimeError:
log.warning("step.create_pr.body_generation_failed", fallback="structured")
return self._build_fallback_body(ctx, task_title, commit_log, diff_stat)
log.warning("step.operation_failed", fallback="structured")
return build_structured_fallback(available_data)
```

### JSON Parsing with Substring Extraction
Expand Down
4 changes: 2 additions & 2 deletions docs/pipeline-determinism.html
Original file line number Diff line number Diff line change
Expand Up @@ -782,10 +782,10 @@ <h2>Determinism Opportunities</h2>
</tr>
<tr>
<td>PR body generation</td>
<td><span class="step-tag tag-llm" style="display:inline">llm</span></td>
<td><span class="step-tag tag-code" style="display:inline">code</span></td>
<td class="arrow">--></td>
<td><span class="step-tag tag-code" style="display:inline">code</span></td>
<td>Structured template from commits + diff stats + issue body. Fallback already exists in <code>_build_fallback_body()</code>; promote it to primary.</td>
<td>Deterministic template via <code>_build_pr_body()</code>: commits + diff stats + issue body excerpt (500 char limit). Done.</td>
</tr>
<tr>
<td>Commit message (step 7)</td>
Expand Down
87 changes: 13 additions & 74 deletions sova/core/steps/create_pr.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""Step 9: Create PR -- generate a rich description and open a pull request."""
"""Step 9: Create PR -- generate a structured description and open a pull request."""

from __future__ import annotations

Expand All @@ -9,61 +9,22 @@
from sova.core.context import ExecutionContext
from sova.core.steps.base import BaseStep, GateCheckResult, StepResult
from sova.git import operations as git_ops
from sova.llm.client import invoke
from sova.utils.formatting import truncate
from sova.utils.logging import get_logger
from sova.utils.shell import run

log = get_logger(component="step.create_pr")

_PLACEHOLDER = "(none)"

_ISSUE_BODY_EXCERPT_LIMIT = 500

_CONVENTIONAL_RE = re.compile(
r"^(feat|fix|refactor|test|docs|chore|ci)"
r"(?:\([^)]*\))?"
r":\s*",
)

_PR_BODY_PROMPT_BASE = """\
Generate a pull request description for the changes below. Output ONLY the \
markdown body (no fences, no commentary). Use this structure:

## Summary
1-3 bullet points: WHAT changed and WHY.

## Changes
Brief description of each logical change grouped by area.

## Review guidance
What should a reviewer focus on? Any trade-offs or shortcuts?

## Test plan
How were these changes verified?
"""

_PR_BODY_ISSUE_SECTION = """
Closes #{issue_number}

---
Issue #{issue_number}: {issue_title}

{issue_body}

"""

_PR_BODY_NO_ISSUE_SECTION = """
---
Task: {issue_title}

"""

_PR_BODY_CONTEXT = """\
Commits on this branch:
{commit_log}

Files changed:
{diff_stat}
"""


def _build_pr_title(task_title: str, issue_number: str | None) -> str:
"""Build a PR title from the task title, avoiding double conventional prefixes.
Expand Down Expand Up @@ -169,41 +130,12 @@ async def _generate_pr_body(self, ctx: ExecutionContext, task_title: str) -> str
),
)

issue_body = ctx.task.body if ctx.task else ""
commit_log = log_result.stdout.strip() if log_result.success else "(unavailable)"
diff_stat = diff_result.stdout.strip() if diff_result.success else "(unavailable)"

if ctx.has_issue:
middle = _PR_BODY_ISSUE_SECTION.format(
issue_number=ctx.issue_number,
issue_title=task_title,
issue_body=issue_body or "(no description)",
)
else:
middle = _PR_BODY_NO_ISSUE_SECTION.format(issue_title=task_title)

prompt = (
_PR_BODY_PROMPT_BASE
+ middle
+ _PR_BODY_CONTEXT.format(
commit_log=commit_log,
diff_stat=diff_stat,
)
)

try:
result = await invoke(prompt, model="sonnet", cwd=ctx.working_dir, timeout=120)
ctx.add_cost(result.cost_usd)
body = result.text
if ctx.has_issue and f"#{ctx.issue_number}" not in body:
body += f"\n\nCloses #{ctx.issue_number}"
return body
except RuntimeError:
log.warning("step.create_pr.body_generation_failed", fallback="structured")
return self._build_fallback_body(ctx, task_title, commit_log, diff_stat)
return self._build_pr_body(ctx, task_title, commit_log, diff_stat)

@staticmethod
def _build_fallback_body(ctx: ExecutionContext, task_title: str, commit_log: str, diff_stat: str) -> str:
def _build_pr_body(ctx: ExecutionContext, task_title: str, commit_log: str, diff_stat: str) -> str:
lines = [
"## Summary",
"",
Expand All @@ -213,6 +145,13 @@ def _build_fallback_body(ctx: ExecutionContext, task_title: str, commit_log: str
if ctx.has_issue:
lines.append(f"Closes #{ctx.issue_number}")
lines.append("")

issue_body = (ctx.task.body if ctx.task else "") or ""
stripped_body = issue_body.strip()
if stripped_body:
excerpt = truncate(stripped_body, max_length=_ISSUE_BODY_EXCERPT_LIMIT)
lines.extend(["## Context", "", excerpt, ""])

Comment thread
xsovad06 marked this conversation as resolved.
lines.extend(
[
"## Commits",
Expand Down
Loading
Loading