Skip to content

feat: add convergence exit condition to peer review loop#63

Open
simonabler wants to merge 3 commits intomainfrom
feat/convergence-exit-condition
Open

feat: add convergence exit condition to peer review loop#63
simonabler wants to merge 3 commits intomainfrom
feat/convergence-exit-condition

Conversation

@simonabler
Copy link
Copy Markdown
Owner

Prevents the coder↔reviewer loop from spinning indefinitely by implementing three deterministic exit conditions (in priority order):

  1. Max rework cycles (hard ceiling) → ESCALATE_TESTING [existing]
  2. Diff-delta convergence: if SequenceMatcher ratio of successive git diffs >= 0.97 after ≥2 rework cycles → ESCALATE_CONVERGENCE → tester. The coder is no longer making meaningful changes; the reviewer is going in circles. Hand off to the objective tester.
  3. Second+ rework cycle → REWORK_VIA_TESTER: run tester before sending back to coder so a deterministic pass/fail can short-circuit further subjective reviewer loops.
  4. First rework → CODING (fast path, no tester overhead).

Implementation:

  • TodoItem.last_coder_diff: git diff HEAD snapshot stored by coder after each pass, used by _check_convergence on the next cycle.
  • GraphState.convergence_detected: surfaced to UI/logs for visibility.
  • _check_convergence(): difflib.SequenceMatcher comparison helper.

Addresses: STRAT-OC-003b (feedback loop without convergence check)

Tests: 20 new tests in tests/test_convergence.py covering _check_convergence unit tests and all four peer_review routing paths.

claude added 3 commits March 23, 2026 17:45
Prevents the coder↔reviewer loop from spinning indefinitely by
implementing three deterministic exit conditions (in priority order):

1. Max rework cycles (hard ceiling) → ESCALATE_TESTING  [existing]
2. Diff-delta convergence: if SequenceMatcher ratio of successive
   git diffs >= 0.97 after ≥2 rework cycles → ESCALATE_CONVERGENCE
   → tester. The coder is no longer making meaningful changes; the
   reviewer is going in circles. Hand off to the objective tester.
3. Second+ rework cycle → REWORK_VIA_TESTER: run tester before
   sending back to coder so a deterministic pass/fail can
   short-circuit further subjective reviewer loops.
4. First rework → CODING (fast path, no tester overhead).

Implementation:
- TodoItem.last_coder_diff: git diff HEAD snapshot stored by coder
  after each pass, used by _check_convergence on the next cycle.
- GraphState.convergence_detected: surfaced to UI/logs for visibility.
- _check_convergence(): difflib.SequenceMatcher comparison helper.

Addresses: STRAT-OC-003b (feedback loop without convergence check)

Tests: 20 new tests in tests/test_convergence.py covering
_check_convergence unit tests and all four peer_review routing paths.
…001)

Addresses findings:
  STRAT-COMP-001  Unsupervised chain with silent error propagation
  STRAT-COMP-004  Silent errors across data boundaries
  STRAT-SI-001a   Unvalidated error boundary planner → coder
  STRAT-SI-002    Silent error swallowing in catch-all handler

Design
------
Four validate_node instances (one per major handoff) sit between agents
in the graph. Each enforces a named contract — a set of invariants that
MUST hold before the downstream node runs. On failure, state is routed
to error_handler_node (terminal, clean) instead of silently propagating
bad state through the chain.

Handoff contracts:
  planner_to_coder       — todo_items not empty, valid task types,
                           current item in bounds + has description,
                           no unhandled error_message on state
  coder_to_peer_review   — coder result not empty, item in bounds
  peer_review_to_learn   — peer review notes not empty (when reviewing)
  tester_to_decide       — test result not empty

New files:
  app/core/nodes/validation.py  — Contract, ValidationFailure, check
    functions, validate_node, error_handler_node
  tests/test_validation.py      — 52 tests covering all checks,
    validate_node contracts, error_handler, state fields, routing

Orchestrator changes:
  - _make_validate_node(contract) factory injects contract name
  - _route_after_validate(next_node) routes pass→next or fail→error
  - Four validate_* nodes wired between planner/coder/reviewer/tester
  - error_handler node added with edge to END

Planner fix (STRAT-SI-002):
  - LLM invocation now wrapped with explicit except blocks
  - BudgetExceededException and generic Exception both set
    error_message + phase=STOPPED instead of propagating silently

Test updates:
  - 5 existing test files updated to expect validation node names
    in routing assertions (e.g. 'validate_planner_to_coder' not 'coder')
Addresses findings:
  STRAT-DC-003  Unobserved decision point in delegation chain
  STRAT-SI-007  Output aggregation loses source attribution

Design
------
Every coder pass and every coder-assignment decision is now recorded
in append-only audit logs on GraphState, making the full delegation
chain observable and every output attributable.

New state models (app/core/state.py):
  CoderOutput   — provenance envelope: agent, item_id, iteration,
                  rework_cycle, timestamp, result_summary (300 chars),
                  verdict (back-filled by peer_review)
  CoderDecision — audit record: trigger, from_coder, to_coder,
                  item_id, iteration, rework_cycle, timestamp, reason

New GraphState fields:
  agent_output_log    — list[CoderOutput], append-only
  coder_decision_log  — list[CoderDecision], append-only
  Both survive checkpoint/resume (Pydantic model_dump round-trip).

Instrumentation:
  coder.py      — emits CoderDecision (trigger: item_start|rework) and
                  CoderOutput at the end of each pass; both appended to
                  their respective logs in the returned updates dict
  reviewer.py   — stamps verdict back onto the last CoderOutput after
                  peer review, so each output record carries its final
                  disposition (APPROVE / REWORK / ESCALATE_*)
  committer.py  — emits CoderDecision (trigger: item_advance) when
                  advancing to the next item, recording the switch
                  between coder_a and coder_b

Tests: 24 tests in tests/test_provenance.py covering model fields,
checkpoint round-trips, coder_node instrumentation, reviewer verdict
stamping, and committer item-advance decision logging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants