feat: add convergence exit condition to peer review loop#63
Open
simonabler wants to merge 3 commits intomainfrom
Open
feat: add convergence exit condition to peer review loop#63simonabler wants to merge 3 commits intomainfrom
simonabler wants to merge 3 commits intomainfrom
Conversation
Prevents the coder↔reviewer loop from spinning indefinitely by implementing three deterministic exit conditions (in priority order): 1. Max rework cycles (hard ceiling) → ESCALATE_TESTING [existing] 2. Diff-delta convergence: if SequenceMatcher ratio of successive git diffs >= 0.97 after ≥2 rework cycles → ESCALATE_CONVERGENCE → tester. The coder is no longer making meaningful changes; the reviewer is going in circles. Hand off to the objective tester. 3. Second+ rework cycle → REWORK_VIA_TESTER: run tester before sending back to coder so a deterministic pass/fail can short-circuit further subjective reviewer loops. 4. First rework → CODING (fast path, no tester overhead). Implementation: - TodoItem.last_coder_diff: git diff HEAD snapshot stored by coder after each pass, used by _check_convergence on the next cycle. - GraphState.convergence_detected: surfaced to UI/logs for visibility. - _check_convergence(): difflib.SequenceMatcher comparison helper. Addresses: STRAT-OC-003b (feedback loop without convergence check) Tests: 20 new tests in tests/test_convergence.py covering _check_convergence unit tests and all four peer_review routing paths.
…001)
Addresses findings:
STRAT-COMP-001 Unsupervised chain with silent error propagation
STRAT-COMP-004 Silent errors across data boundaries
STRAT-SI-001a Unvalidated error boundary planner → coder
STRAT-SI-002 Silent error swallowing in catch-all handler
Design
------
Four validate_node instances (one per major handoff) sit between agents
in the graph. Each enforces a named contract — a set of invariants that
MUST hold before the downstream node runs. On failure, state is routed
to error_handler_node (terminal, clean) instead of silently propagating
bad state through the chain.
Handoff contracts:
planner_to_coder — todo_items not empty, valid task types,
current item in bounds + has description,
no unhandled error_message on state
coder_to_peer_review — coder result not empty, item in bounds
peer_review_to_learn — peer review notes not empty (when reviewing)
tester_to_decide — test result not empty
New files:
app/core/nodes/validation.py — Contract, ValidationFailure, check
functions, validate_node, error_handler_node
tests/test_validation.py — 52 tests covering all checks,
validate_node contracts, error_handler, state fields, routing
Orchestrator changes:
- _make_validate_node(contract) factory injects contract name
- _route_after_validate(next_node) routes pass→next or fail→error
- Four validate_* nodes wired between planner/coder/reviewer/tester
- error_handler node added with edge to END
Planner fix (STRAT-SI-002):
- LLM invocation now wrapped with explicit except blocks
- BudgetExceededException and generic Exception both set
error_message + phase=STOPPED instead of propagating silently
Test updates:
- 5 existing test files updated to expect validation node names
in routing assertions (e.g. 'validate_planner_to_coder' not 'coder')
Addresses findings:
STRAT-DC-003 Unobserved decision point in delegation chain
STRAT-SI-007 Output aggregation loses source attribution
Design
------
Every coder pass and every coder-assignment decision is now recorded
in append-only audit logs on GraphState, making the full delegation
chain observable and every output attributable.
New state models (app/core/state.py):
CoderOutput — provenance envelope: agent, item_id, iteration,
rework_cycle, timestamp, result_summary (300 chars),
verdict (back-filled by peer_review)
CoderDecision — audit record: trigger, from_coder, to_coder,
item_id, iteration, rework_cycle, timestamp, reason
New GraphState fields:
agent_output_log — list[CoderOutput], append-only
coder_decision_log — list[CoderDecision], append-only
Both survive checkpoint/resume (Pydantic model_dump round-trip).
Instrumentation:
coder.py — emits CoderDecision (trigger: item_start|rework) and
CoderOutput at the end of each pass; both appended to
their respective logs in the returned updates dict
reviewer.py — stamps verdict back onto the last CoderOutput after
peer review, so each output record carries its final
disposition (APPROVE / REWORK / ESCALATE_*)
committer.py — emits CoderDecision (trigger: item_advance) when
advancing to the next item, recording the switch
between coder_a and coder_b
Tests: 24 tests in tests/test_provenance.py covering model fields,
checkpoint round-trips, coder_node instrumentation, reviewer verdict
stamping, and committer item-advance decision logging.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Prevents the coder↔reviewer loop from spinning indefinitely by implementing three deterministic exit conditions (in priority order):
Implementation:
Addresses: STRAT-OC-003b (feedback loop without convergence check)
Tests: 20 new tests in tests/test_convergence.py covering _check_convergence unit tests and all four peer_review routing paths.