refactor(#277): replace LLM calls with deterministic code in pipeline operations#302

Open: xsovad06 wants to merge 3 commits into main from feat/issue-277

xsovad06 (Owner) commented Jul 2, 2026

Summary

  • Replaced 4 unnecessary LLM calls with deterministic code to improve predictability, reduce costs, and enable model-agnostic execution
  • PR body generation now uses structured template; memory extraction moved to human-reviewed workflow; review ingestion parses structured findings directly; triage uses enhanced heuristics
  • Pipeline determinism increases from 57% to 65% (strict agent calls only)

Changes

PR Creation (create_pr.py)

  • Promoted fallback template to primary path for PR body generation
  • Removed _generate_pr_body() LLM call and prompt constants
  • Template includes issue excerpt, commits, diff stats, and "Closes #N"
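
As a rough illustration of the template path, the sketch below assembles a body from the four inputs listed above. The function name `build_pr_body`, its parameters, the section headings other than "## Context", and the sample values are assumptions for illustration, not the code in create_pr.py.

```python
from textwrap import shorten


def build_pr_body(issue_number: int, issue_body: str, commits: list[str],
                  diff_stat: str, max_excerpt: int = 300) -> str:
    """Assemble a PR description from fixed sections; no model call involved."""
    lines = ["## Summary", ""]
    lines += [f"- {subject}" for subject in commits]

    excerpt = issue_body.strip()
    if excerpt:  # skip the section for empty or whitespace-only issue bodies
        lines += ["", "## Context", "",
                  shorten(excerpt, width=max_excerpt, placeholder="...")]

    lines += ["", "## Changes", "", diff_stat.strip(), "",
              f"Closes #{issue_number}"]
    return "\n".join(lines)


# Illustrative inputs only; real values would come from git and the issue.
body = build_pr_body(
    issue_number=277,
    issue_body="Replace LLM calls with deterministic code.",
    commits=["refactor: use template for PR body"],
    diff_stat=" 3 files changed",
)
print(body)
```

Because the output depends only on the arguments, the same inputs always yield the same description, which is the property the tests can assert on.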

Memory Extraction (extract_memory.py, extraction.py)

  • Converted ExtractMemoryStep to no-op (logs skip, returns success)
  • Removed LLM extraction from reviewer role's execute()
  • Preserved extraction infrastructure for future rule-based work
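
A no-op step of this kind might look roughly like the following. `StepResult`, the `run` signature, and the logger name are hypothetical stand-ins; the real step interface in this repository may differ (it may be async, for example).

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("extract_memory_sketch")  # hypothetical logger name


@dataclass
class StepResult:
    """Hypothetical result type; the real pipeline type may differ."""
    success: bool
    detail: str = ""


class ExtractMemoryStep:
    """Keeps the step in the pipeline but performs no extraction."""

    name = "extract_memory"

    def run(self, ctx: object) -> StepResult:
        # Log the skip so the pipeline trace still shows the step ran.
        logger.info("Skipping memory extraction: handled by human-reviewed workflow")
        return StepResult(success=True, detail="skipped")


result = ExtractMemoryStep().run(ctx=None)
```

Keeping the step as a no-op, rather than deleting it, leaves the pipeline definition unchanged while the extraction logic is reworked.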

Review Ingestion (.claude/commands/ingest-review.md)

  • Rewrote to parse structured findings directly from TaskRun.handoff_json
  • Stores findings to Memory table with category, file, severity preserved
  • Eliminates LLM summarization step
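
The direct-parsing idea can be sketched as below. The JSON shape (a top-level `findings` list with `category`, `file`, `severity`, and `finding` keys) and the `MemoryRecord` type are assumptions for illustration; the actual `handoff_json` schema and Memory table columns are not shown in this PR.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical row shape for the Memory table."""
    category: str
    file: str
    severity: int
    content: str


def findings_to_records(handoff_json: str) -> list[MemoryRecord]:
    """Map structured review findings to records without any summarization."""
    payload = json.loads(handoff_json)
    return [
        MemoryRecord(
            category=finding.get("category", "unknown"),
            file=finding.get("file", ""),
            severity=int(finding.get("severity", 0)),
            content=finding.get("finding", ""),
        )
        for finding in payload.get("findings", [])
    ]


# Sample payload under the assumed schema.
sample = json.dumps({"findings": [{
    "category": "bug",
    "file": "sova/roles/triage.py",
    "severity": 7,
    "finding": "label check unreachable when body is empty",
}]})
records = findings_to_records(sample)
```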

Triage (triage.py)

  • Enhanced _heuristic_assess() with better pattern detection (acceptance criteria, code blocks, label hints)
  • Raised confidence threshold to reduce LLM fallback frequency
  • Added tests for heuristic confidence paths
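
To make the signal-based scoring concrete, here is a minimal sketch that turns the three signal types above into a confidence value. The regular expressions, the weights, and the `HINT_LABELS` set are all assumptions, not the values used in triage.py. How the score is compared against the confidence threshold is not shown in this PR, so the sketch stops at the score.

```python
import re

# Assumed patterns: an "acceptance criteria" phrase or Markdown task-list items.
CRITERIA_RE = re.compile(r"acceptance criteria|^\s*[-*] \[[ xX]\]",
                         re.IGNORECASE | re.MULTILINE)
# Assumed patterns: fenced code blocks or inline code spans.
CODE_RE = re.compile(r"```|`[^`\n]+`")
# Hypothetical labels treated as a hint that the issue is well categorized.
HINT_LABELS = frozenset({"bug", "enhancement"})


def heuristic_confidence(body: str, labels: list[str]) -> float:
    """Score an issue in tenths so the result is free of float drift."""
    score = 4  # baseline for any non-trivial issue (assumed weight)
    if CRITERIA_RE.search(body):
        score += 3
    if CODE_RE.search(body):
        score += 2
    if HINT_LABELS & set(labels):
        score += 1
    return min(score, 10) / 10


weak = heuristic_confidence("Fix it", [])
strong = heuristic_confidence(
    "## Acceptance criteria\n- [ ] `foo()` returns 1", ["bug"])
```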

Review guidance

  • Template completeness: verify PR body template produces comparable content to old LLM output (see updated tests in test_core.py)
  • Heuristic coverage: check triage confidence scores on real issues (test_roles.py has new fixtures)
  • Trade-off: PR descriptions are more structured/formulaic but deterministic and cost-free

Test plan

  • All 166 existing tests pass (make check)
  • Updated 5 PR creation tests to assert template structure instead of mocked LLM output
  • Removed 30+ extraction tests for deleted LLM paths, kept dedup/parsing tests
  • Added 11 triage heuristic tests for confidence scoring edge cases
  • Manual verification: ran triage on 5 open issues, all produced valid assessments with heuristic confidence >0.7

Closes #277

xsovad06 self-assigned this Jul 2, 2026

coderabbitai Bot commented Jul 2, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7fe2a16c-c756-42be-8420-0da9e87c4f49

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR replaces LLM-driven logic in three pipeline operations with deterministic/no-op alternatives. PR body generation now uses a structured template with an optional issue-context excerpt instead of an LLM call. Memory extraction and reviewer memory capture become no-ops. Triage heuristics are rewritten with granular signal-based classification. Tests are updated accordingly.

Changes

Deterministic Pipeline Operations

| Layer / File(s) | Summary |
| --- | --- |
| Structured PR body generation (sova/core/steps/create_pr.py, tests/test_core.py) | _generate_pr_body drops LLM prompt construction/invocation and fallback handling, directly returning _build_pr_body(...); the builder adds a truncated "## Context" section from issue body; tests remove invoke mocks and assert structured content directly via _build_pr_body. |
| Disabled memory extraction (sova/knowledge/extraction.py, sova/roles/reviewer.py, tests/test_extraction.py) | extract_memories and ReviewerRole._extract_review_memories become no-ops returning empty results, removing LLM prompt/invoke/dedup/storage logic; tests replace integration cases with single no-op assertions. |
| Structured triage heuristic classification (sova/roles/triage.py, tests/test_roles.py) | _heuristic_assess adds granular detection of acceptance criteria, code references, headings, and label signals with new human_only/ready/needs_research branches; _estimate_complexity derives complexity from body length, headings, and keywords; new tests cover each classification branch. |
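
For readers unfamiliar with this style of estimate, a complexity heuristic driven by body length, headings, and keywords could be sketched as follows. The keyword tuple matches the keywords named in the review further down; the length and heading cut-offs and the low/medium/high labels are assumptions, not the values in triage.py. The sketch lowercases the body itself so keyword matching does not depend on what the caller passes in.

```python
import re

COMPLEX_KEYWORDS = ("migration", "refactor", "breaking change", "epic")


def estimate_complexity(body: str) -> str:
    """Classify an issue body as low, medium, or high complexity."""
    text = body.lower()  # normalize here so 'EPIC' and 'Migration' match
    headings = len(re.findall(r"^#{1,6}\s", body, re.MULTILINE))

    # Cut-offs below are illustrative assumptions.
    if any(k in text for k in COMPLEX_KEYWORDS) or len(body) > 2000 or headings >= 4:
        return "high"
    if len(body) > 500 or headings >= 2:
        return "medium"
    return "low"


small = estimate_complexity("Fix typo in README")
large = estimate_complexity("EPIC: rewrite the scheduler")
```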

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant CreatePRStep
  participant Git
  participant BuildPRBody
  CreatePRStep->>Git: git log / git diff --stat
  Git-->>CreatePRStep: commit_log, diff_stat
  CreatePRStep->>BuildPRBody: build body(commit_log, diff_stat, issue_body)
  BuildPRBody-->>CreatePRStep: structured body with Context section

Related issues: #277 (replace LLM calls with deterministic code in PR body, memory extraction, and triage)

Related PRs: None identified

Suggested labels: refactor, core, tech-debt

Suggested reviewers: None identified

🐰 No more prompts to LLMs we send,
Deterministic paths, code we amend,
Triage now reads labels and refs with care,
Memories rest, extraction laid bare,
Structured bodies close each PR's tale.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description is detailed, but it omits the template's Type of Change and checklist sections. | Add the missing Type of Change and Checklist sections, including branch/commit checks, test status, and manual verification details. |
| Linked Issues check | ❓ Inconclusive | [#277] Most code changes align, but review ingestion and docs updates are in excluded files, and triage behavior is not fully verifiable from the summary. | Provide reviewable evidence for .claude/commands/ingest-review.md and docs, and confirm triage still matches the issue's structured JSON/label-validation requirement. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly matches the main change: replacing LLM calls with deterministic pipeline code. |
| Out of Scope Changes check | ✅ Passed | The visible changes stay within the PR's deterministic pipeline refactor and test updates. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai Bot previously requested changes Jul 2, 2026

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sova/core/steps/create_pr.py`:
- Around line 148-153: The create_pr flow adds a “## Context” section in
create_pr.py using issue_body from ctx.task.body, but it only checks truthiness
before stripping, so whitespace-only bodies still emit an empty header. Update
the logic around the issue_body/excerpt handling to strip first and only append
the “## Context” block when the trimmed body is non-empty, keeping the
truncate() call on the cleaned text.

In `@sova/roles/triage.py`:
- Around line 175-223: The early no-body return in triage.py causes
agent:human-only issues with empty descriptions to be misclassified as
needs_spec because label checks happen later. Move the human-only label handling
in TaskAssessment logic to run before the has_body guard, using the existing
task.labels / is_human_only path from heuristic_assess() as the reference
behavior. Then remove the now-redundant downstream label_set/is_human_only
computation so the label always wins regardless of body content.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: db747ca8-008a-4b76-a0d7-2c2c6935c56f

📥 Commits

Reviewing files that changed from the base of the PR and between 87142b7 and cd3661a.

⛔ Files ignored due to path filters (4)
  • .claude/commands/ingest-review.md is excluded by !.claude/** and included by none
  • .claude/rules/architecture.md is excluded by !.claude/** and included by none
  • docs/error-handling-guidelines.md is excluded by !docs/** and included by none
  • docs/pipeline-determinism.html is excluded by !docs/** and included by none
📒 Files selected for processing (7)
  • sova/core/steps/create_pr.py
  • sova/knowledge/extraction.py
  • sova/roles/reviewer.py
  • sova/roles/triage.py
  • tests/test_core.py
  • tests/test_extraction.py
  • tests/test_roles.py

Comment thread sova/core/steps/create_pr.py
Comment thread sova/roles/triage.py Outdated
- Fix whitespace-only issue body producing empty Context section in PR
  body (create_pr.py)
- Fix agent:human-only label being ignored when issue body is empty
  (triage.py) -- label check now runs before body check
- Reduce cognitive complexity in _heuristic_assess by extracting
  _assess_body_content, _has_criteria_markers, _has_code_references
  helper methods (triage.py)
- Silence unused-parameter warnings in no-op extract_memories by
  referencing retained params (extraction.py)
- Add 14 tests covering all fixed code paths

xsovad06 (Owner, Author) left a comment

Code Review for PR #302

The most critical issue is incorrect async context manager syntax in .claude/commands/ingest-review.md (line 30: await get_session() should be get_session()), which will cause runtime errors. Second, the agent:human-only label check in triage.py is unreachable when issue body is empty due to early return ordering (severity 7 logic bug). Third, whitespace-only issue bodies produce dangling '## Context' headers in PR descriptions (severity 6). Both bugs have corresponding test gaps (severity 5 each). Overall, the determinism migration is architecturally sound, but execution has three bugs and two test coverage gaps that need fixing before merge.

7 findings (6 inline, 1 in summary)

| Sev | Category | File | Finding |
| --- | --- | --- | --- |
| 8/10 | bug | .claude/commands/ingest-review.md:30 | Incorrect async context manager syntax: async with await get_session() as session: should be async with get_session() as session:. The await here is incorrect and will cause a runtime error when the command executes. get_session() returns an async context manager, not a coroutine. |
| 7/10 | bug | sova/roles/triage.py:176 | agent:human-only label is ignored when issue body is empty. The early return at line 176 (if not has_body) fires before is_human_only is computed (line 214), so an issue labeled agent:human-only with no body returns needs_spec instead of human_only. This breaks the labeled requirement that human-only issues should skip autonomous processing regardless of body content. |
| 6/10 | bug | sova/core/steps/create_pr.py:151 | Whitespace-only issue body produces an empty '## Context' section in PR body. At line 151, if issue_body: is True for ' ', but after issue_body.strip() at line 152, excerpt becomes ''. Lines 152-153 then append ['## Context', '', '', ''], leaving a dangling header with no content in the PR description. |
| 5/10 | testing | tests/test_roles.py:1823 | Missing test case for agent:human-only label with empty body. The new test test_triage_human_only_label uses body='Something detailed enough.', which has content. The edge case where labels=['agent:human-only'] but body='' is not covered, so the bug at triage.py:176 (early return before label check) goes undetected. |
| 5/10 | testing | tests/test_core.py:2833 | Missing test case for whitespace-only issue body. Test test_pr_body_omits_context_when_no_issue_body covers body='', but the CodeRabbit-identified bug (whitespace-only body like ' ' producing empty Context section) is not tested. This edge case is explicitly mentioned in the spec but has no test coverage. |
| 4/10 | design | sova/roles/triage.py:222 | Inconsistent complexity estimation logic: _estimate_complexity() checks for lowercase keywords such as 'migration', 'refactor', 'breaking change', and 'epic', so the checks only match when a lowercased body is passed in. The function is called with the original body (line 222), not body_lower, so keyword matching will fail for issues with 'Migration' or 'EPIC' in uppercase. |
| 3/10 | design | sova/knowledge/extraction.py:67 | No-op function logs at DEBUG level but caller contexts (ExtractMemoryStep, ReviewerRole) may expect INFO-level confirmation that extraction was skipped. The original implementation logged at INFO level when learnings were extracted or when none were found. Now it's silent except for DEBUG, which may make it harder to trace why no memories are being captured during debugging. |
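
The 8/10 finding can be reproduced in isolation. The sketch below assumes `get_session` is built with `contextlib.asynccontextmanager`, consistent with the statement above that it returns an async context manager rather than a coroutine; the dict standing in for a session is purely illustrative.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def get_session():
    """Stand-in for the real session factory (assumed shape)."""
    session = {"open": True}
    try:
        yield session
    finally:
        session["open"] = False


async def correct() -> bool:
    # The context manager object is entered directly by `async with`.
    async with get_session() as session:
        return session["open"]


async def incorrect() -> str:
    try:
        # `await` is applied to the context manager object, which is not awaitable.
        async with await get_session() as session:
            return str(session["open"])
    except TypeError as exc:
        return f"TypeError: {exc}"


ok = asyncio.run(correct())
err = asyncio.run(incorrect())
print(ok, err)
```

The exact wording of the TypeError message varies between Python versions, so only the exception type should be relied on.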

Findings not on changed lines

1. [5/10] [testing] tests/test_core.py:2833

Missing test case for whitespace-only issue body. Test test_pr_body_omits_context_when_no_issue_body covers body='', but the CodeRabbit-identified bug (whitespace-only body like ' ' producing empty Context section) is not tested. This edge case is explicitly mentioned in the spec but has no test coverage.

Fix: Add a test case async def test_pr_body_omits_context_when_whitespace_only_body() that creates a Task with body=' ' (whitespace only) and asserts '## Context' not in body. This will fail with current code at create_pr.py:151, confirming the bug.
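
A self-contained version of the suggested test might look like this. `context_section` is a hypothetical stand-in for the relevant part of the PR body builder, with the strip applied before the emptiness check; the real test would be async and would construct a Task as described above.

```python
def context_section(issue_body: str) -> list[str]:
    """Return the '## Context' block, or nothing for blank bodies."""
    excerpt = issue_body.strip()  # strip first, then test for emptiness
    if not excerpt:
        return []
    return ["## Context", "", excerpt, ""]


def test_pr_body_omits_context_when_whitespace_only_body() -> None:
    body = "\n".join(context_section("   "))
    assert "## Context" not in body


def test_pr_body_keeps_context_when_body_has_text() -> None:
    body = "\n".join(context_section("  Replace LLM calls.  "))
    assert "## Context" in body
    assert "Replace LLM calls." in body


test_pr_body_omits_context_when_whitespace_only_body()
test_pr_body_keeps_context_when_body_has_text()
```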


Assessment: BLOCK -- critical issues must be fixed before merge

Comment thread .claude/commands/ingest-review.md
Comment thread sova/roles/triage.py
Comment thread sova/core/steps/create_pr.py Outdated
Comment thread tests/test_roles.py
Comment thread sova/roles/triage.py Outdated
Comment thread sova/knowledge/extraction.py Outdated
xsovad06 dismissed coderabbitai[bot]’s stale review on July 2, 2026 at 23:06

Findings addressed in latest push.

sonarqubecloud Bot commented Jul 2, 2026

Development

Successfully merging this pull request may close these issues.

refactor(core): replace LLM calls with deterministic code in pipeline operations
