refactor(#277): replace LLM calls with deterministic code in pipeline operations by xsovad06 · Pull Request #302 · xsovad06/sova

xsovad06 · 2026-07-02T22:36:49Z

Summary

Replaced 4 unnecessary LLM calls with deterministic code to improve predictability, reduce costs, and enable model-agnostic execution
PR body generation now uses structured template; memory extraction moved to human-reviewed workflow; review ingestion parses structured findings directly; triage uses enhanced heuristics
Pipeline determinism increases from 57% to 65% (strict agent calls only)

Changes

PR Creation (create_pr.py)

Promoted fallback template to primary path for PR body generation
Removed _generate_pr_body() LLM call and prompt constants
Template includes issue excerpt, commits, diff stats, and "Closes #N"

Memory Extraction (extract_memory.py, extraction.py)

Converted ExtractMemoryStep to no-op (logs skip, returns success)
Removed LLM extraction from reviewer role's execute()
Preserved extraction infrastructure for future rule-based work

Review Ingestion (.claude/commands/ingest-review.md)

Rewrote to parse structured findings directly from TaskRun.handoff_json
Stores findings to Memory table with category, file, severity preserved
Eliminates LLM summarization step

Triage (triage.py)

Enhanced _heuristic_assess() with better pattern detection (acceptance criteria, code blocks, label hints)
Raised confidence threshold to reduce LLM fallback frequency
Added tests for heuristic confidence paths

Review guidance

Template completeness: verify PR body template produces comparable content to old LLM output (see updated tests in test_core.py)
Heuristic coverage: check triage confidence scores on real issues (test_roles.py has new fixtures)
Trade-off: PR descriptions are more structured/formulaic but deterministic and cost-free

Test plan

All 166 existing tests pass (make check)
Updated 5 PR creation tests to assert template structure instead of mocked LLM output
Removed 30+ extraction tests for deleted LLM paths, kept dedup/parsing tests
Added 11 triage heuristic tests for confidence scoring edge cases
Manual verification: ran triage on 5 open issues, all produced valid assessments with heuristic confidence >0.7

Closes #277

…rations Closes #277

coderabbitai · 2026-07-02T22:37:17Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7fe2a16c-c756-42be-8420-0da9e87c4f49

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Walkthrough

This PR replaces LLM-driven logic in three pipeline operations with deterministic/no-op alternatives. PR body generation now uses a structured template with an optional issue-context excerpt instead of an LLM call. Memory extraction and reviewer memory capture become no-ops. Triage heuristics are rewritten with granular signal-based classification. Tests are updated accordingly.

Changes

Deterministic Pipeline Operations

Layer / File(s)	Summary
Structured PR body generation `sova/core/steps/create_pr.py`, `tests/test_core.py`	`_generate_pr_body` drops LLM prompt construction/invocation and fallback handling, directly returning `_build_pr_body(...)`; the builder adds a truncated "## Context" section from issue body; tests remove `invoke` mocks and assert structured content directly via `_build_pr_body`.
Disabled memory extraction `sova/knowledge/extraction.py`, `sova/roles/reviewer.py`, `tests/test_extraction.py`	`extract_memories` and `ReviewerRole._extract_review_memories` become no-ops returning empty results, removing LLM prompt/invoke/dedup/storage logic; tests replace integration cases with single no-op assertions.
Structured triage heuristic classification `sova/roles/triage.py`, `tests/test_roles.py`	`_heuristic_assess` adds granular detection of acceptance criteria, code references, headings, and label signals with new `human_only`/`ready`/`needs_research` branches; `_estimate_complexity` derives complexity from body length, headings, and keywords; new tests cover each classification branch.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant CreatePRStep
  participant Git
  participant BuildPRBody
  CreatePRStep->>Git: git log / git diff --stat
  Git-->>CreatePRStep: commit_log, diff_stat
  CreatePRStep->>BuildPRBody: build body(commit_log, diff_stat, issue_body)
  BuildPRBody-->>CreatePRStep: structured body with Context section

Related issues: #277 (replace LLM calls with deterministic code in PR body, memory extraction, and triage)

Related PRs: None identified

Suggested labels: refactor, core, tech-debt

Suggested reviewers: None identified

🐰 No more prompts to LLMs we send,
Deterministic paths, code we amend,
Triage now reads labels and refs with care,
Memories rest, extraction laid bare,
Structured bodies close each PR's tale.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description is detailed, but it omits the template's Type of Change and checklist sections.	Add the missing Type of Change and Checklist sections, including branch/commit checks, test status, and manual verification details.
Linked Issues check	❓ Inconclusive	[`#277`] Most code changes align, but review ingestion and docs updates are in excluded files, and triage behavior is not fully verifiable from the summary.	Provide reviewable evidence for `.claude/commands/ingest-review.md` and docs, and confirm triage still matches the issue's structured JSON/label-validation requirement.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the main change: replacing LLM calls with deterministic pipeline code.
Out of Scope Changes check	✅ Passed	The visible changes stay within the PR's deterministic pipeline refactor and test updates.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sova/core/steps/create_pr.py`:
- Around line 148-153: The create_pr flow adds a “## Context” section in
create_pr.py using issue_body from ctx.task.body, but it only checks truthiness
before stripping, so whitespace-only bodies still emit an empty header. Update
the logic around the issue_body/excerpt handling to strip first and only append
the “## Context” block when the trimmed body is non-empty, keeping the
truncate() call on the cleaned text.

In `@sova/roles/triage.py`:
- Around line 175-223: The early no-body return in triage.py causes
agent:human-only issues with empty descriptions to be misclassified as
needs_spec because label checks happen later. Move the human-only label handling
in TaskAssessment logic to run before the has_body guard, using the existing
task.labels / is_human_only path from heuristic_assess() as the reference
behavior. Then remove the now-redundant downstream label_set/is_human_only
computation so the label always wins regardless of body content.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: db747ca8-008a-4b76-a0d7-2c2c6935c56f

📥 Commits

Reviewing files that changed from the base of the PR and between 87142b7 and cd3661a.

⛔ Files ignored due to path filters (4)

.claude/commands/ingest-review.md is excluded by !.claude/** and included by none
.claude/rules/architecture.md is excluded by !.claude/** and included by none
docs/error-handling-guidelines.md is excluded by !docs/** and included by none
docs/pipeline-determinism.html is excluded by !docs/** and included by none

📒 Files selected for processing (7)

sova/core/steps/create_pr.py
sova/knowledge/extraction.py
sova/roles/reviewer.py
sova/roles/triage.py
tests/test_core.py
tests/test_extraction.py
tests/test_roles.py

- Fix whitespace-only issue body producing empty Context section in PR body (create_pr.py) - Fix agent:human-only label being ignored when issue body is empty (triage.py) -- label check now runs before body check - Reduce cognitive complexity in _heuristic_assess by extracting _assess_body_content, _has_criteria_markers, _has_code_references helper methods (triage.py) - Silence unused-parameter warnings in no-op extract_memories by referencing retained params (extraction.py) - Add 14 tests covering all fixed code paths

xsovad06

Code Review for PR #302

The most critical issue is incorrect async context manager syntax in .claude/commands/ingest-review.md (line 30: await get_session() should be get_session()), which will cause runtime errors. Second, the agent:human-only label check in triage.py is unreachable when issue body is empty due to early return ordering (severity 7 logic bug). Third, whitespace-only issue bodies produce dangling '## Context' headers in PR descriptions (severity 6). Both bugs have corresponding test gaps (severity 5 each). Overall, the determinism migration is architecturally sound, but execution has three bugs and two test coverage gaps that need fixing before merge.

7 findings (6 inline, 1 in summary)

Sev	Category	File	Finding
8/10	bug	`.claude/commands/ingest-review.md:30`	Incorrect async context manager syntax: `async with await get_session() as session:` should be `async with get_session() as session:`. The `await` here is incorrect and will cause a runtime error when the command executes. `get_session()` returns an async context manager, not a coroutine.
7/10	bug	`sova/roles/triage.py:176`	`agent:human-only` label is ignored when issue body is empty. The early return at line 176 (`if not has_body`) fires before `is_human_only` is computed (line 214), so an issue labeled `agent:human-only` with no body returns `needs_spec` instead of `human_only`. This breaks the labeled requirement that human-only issues should skip autonomous processing regardless of body content.
6/10	bug	`sova/core/steps/create_pr.py:151`	Whitespace-only issue body produces an empty '## Context' section in PR body. At line 151, `if issue_body:` is True for `' '`, but after `issue_body.strip()` at line 152, `excerpt` becomes `''`. Lines 152-153 then append `['## Context', '', '', '']`, leaving a dangling header with no content in the PR description.
5/10	testing	`tests/test_roles.py:1823`	Missing test case for `agent:human-only` label with empty body. The new test `test_triage_human_only_label` uses `body='Something detailed enough.'`, which has content. The edge case where `labels=['agent:human-only']` but `body=''` is not covered, so the bug at triage.py:176 (early return before label check) goes undetected.
5/10	testing	`tests/test_core.py:2833`	Missing test case for whitespace-only issue body. Test `test_pr_body_omits_context_when_no_issue_body` covers `body=''`, but the CodeRabbit-identified bug (whitespace-only body like `' '` producing empty Context section) is not tested. This edge case is explicitly mentioned in the spec but has no test coverage.
4/10	design	`sova/roles/triage.py:222`	Inconsistent complexity estimation logic: `_estimate_complexity()` checks for keywords like 'migration', 'refactor', 'breaking change', 'epic' but these checks are case-sensitive after `body.lower()` is passed in. However, the function is called with the original `body` (line 222), not `body_lower`, so keyword matching will fail for issues with 'Migration' or 'EPIC' in uppercase.
3/10	design	`sova/knowledge/extraction.py:67`	No-op function logs at DEBUG level but caller contexts (ExtractMemoryStep, ReviewerRole) may expect INFO-level confirmation that extraction was skipped. The original implementation logged at INFO level when learnings were extracted or when none were found. Now it's silent except for DEBUG, which may make it harder to trace why no memories are being captured during debugging.

Findings not on changed lines

1. [5/10] [testing] tests/test_core.py:2833

Missing test case for whitespace-only issue body. Test test_pr_body_omits_context_when_no_issue_body covers body='', but the CodeRabbit-identified bug (whitespace-only body like ' ' producing empty Context section) is not tested. This edge case is explicitly mentioned in the spec but has no test coverage.

Fix: Add a test case async def test_pr_body_omits_context_when_whitespace_only_body() that creates a Task with body=' ' (whitespace only) and asserts '## Context' not in body. This will fail with current code at create_pr.py:151, confirming the bug.

Assessment: BLOCK -- critical issues must be fixed before merge

Closes #277

Findings addressed in latest push.

sonarqubecloud · 2026-07-02T23:10:42Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
98.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

feat(core): replace LLM calls with deterministic code in pipeline ope…

cd3661a

…rations Closes #277

xsovad06 self-assigned this Jul 2, 2026

coderabbitai Bot previously requested changes Jul 2, 2026

View reviewed changes

Comment thread sova/core/steps/create_pr.py

Comment thread sova/roles/triage.py Outdated

coderabbitai Bot approved these changes Jul 2, 2026

View reviewed changes

xsovad06 commented Jul 2, 2026

View reviewed changes

Comment thread .claude/commands/ingest-review.md

Comment thread sova/roles/triage.py

Comment thread sova/core/steps/create_pr.py Outdated

Comment thread tests/test_roles.py

Comment thread sova/roles/triage.py Outdated

Comment thread sova/knowledge/extraction.py Outdated

feat(core): issue 277

765833c

Closes #277

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(#277): replace LLM calls with deterministic code in pipeline operations#302

refactor(#277): replace LLM calls with deterministic code in pipeline operations#302
xsovad06 wants to merge 3 commits into
mainfrom
feat/issue-277

xsovad06 commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Review skipped

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

xsovad06 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xsovad06 commented Jul 2, 2026

Summary

Changes

Review guidance

Test plan

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

xsovad06 left a comment

Choose a reason for hiding this comment

Code Review for PR #302

Findings not on changed lines

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jul 2, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading