fix: harden partial runtime proof guidance by mohanagy · Pull Request #518 · mohanagy/madar

mohanagy · 2026-06-08T08:54:44Z

Summary

- Harden partial runtime answer contracts so that partial slices explicitly state "not enough evidence" and name the missing phases.
- Emit `answer_ready` barriers for medium-confidence partial explain slices instead of dropping guidance entirely.
- Inject strict missing-phase proof guidance only for degraded or not-ready backend runtime packs, with regression coverage for degraded vs. ready prompt behavior.
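As a rough illustration of the first point, the uncertainty messaging could be sketched as below. This is not the code in `src/runtime/retrieve.ts`: the `SliceCoverage` and `UncertaintyInput` types, the `buildUncertaintyNotes` helper, the exact message wording, and the phase names in the usage note are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; not the actual madar types.
type SliceCoverage = "full" | "partial";

interface UncertaintyInput {
  coverage: SliceCoverage;
  missingPhases: string[];
}

// Returns the uncertainty notes for a runtime answer contract.
// A partial slice always says "not enough evidence"; when specific
// phases are known to be missing, they are named in the note.
function buildUncertaintyNotes(input: UncertaintyInput): string[] {
  if (input.coverage !== "partial") {
    return [];
  }
  if (input.missingPhases.length > 0) {
    const phases = input.missingPhases.join(", ");
    return [`Not enough evidence: missing runtime phases: ${phases}.`];
  }
  return ["Not enough evidence to claim full runtime certainty for this partial slice."];
}
```

Under these assumptions, a partial slice missing the made-up phases `persist` and `notify` yields one note that names both, while a partial slice with no identified missing phases falls back to the generic wording.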

Testing

- `npm run test:run -- tests/unit/compare-native-agent.test.ts tests/unit/answer-ready-explain-pack.test.ts tests/unit/retrieve-production-correctness.test.ts tests/unit/compare.test.ts`
- `npm run typecheck`
- `npm run build`
- `CI=1 npm run test:run`

Benchmark note

A fresh non-isolated local diagnostic rerun still does not get the public six-repo runtime suite to 6/6 full wins. formbricks remains the strongest row, while documenso, twenty, and novu still hit readiness blockers and dub / cal-diy still need separate prompt-contract work.

Summary by CodeRabbit

Bug Fixes
- Improved detection and reporting of missing execution phases during partial runtime analysis
- Enhanced uncertainty messaging to include specific missing phase details when available
- Refined handling of degraded readiness states with better phase visibility
New Features
- Extended medium-confidence explanation support with appropriate uncertainty caveats
- Added targeted follow-up prompts to guide users toward addressing execution gaps

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-06-08T08:54:58Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: acc5f4cf-e0ba-48ce-b0ec-0819c5f1064c

📥 Commits

Reviewing files that changed from the base of the PR and between 00acdb3 and 497d018.

📒 Files selected for processing (6)

src/infrastructure/compare.ts
src/runtime/context-pack.ts
src/runtime/retrieve.ts
tests/unit/answer-ready-explain-pack.test.ts
tests/unit/compare-native-agent.test.ts
tests/unit/retrieve-production-correctness.test.ts

📝 Walkthrough

This PR refines how missing runtime phases are surfaced in explain-slice readiness assessments and compare prompt guidance. It enables answer-ready summaries for explain slices at medium confidence when partial, injects missing-phase follow-up prompts, and conditions strict runtime-proof guidance on degraded benchmark readiness with identified missing phases.
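The readiness gating described here can be sketched roughly as follows. Only the condition itself (`benchmarkReadiness.status !== 'ready'` combined with a non-empty missing-phase list) comes from this PR; the `BenchmarkReadiness` and `StrictProofOptions` shapes, the status literals other than `'ready'`, and the `strictProofOptionsSketch` name are assumptions for illustration.

```typescript
// Hypothetical shapes; the real types in src/infrastructure/compare.ts may differ.
interface BenchmarkReadiness {
  status: "ready" | "degraded" | "not_ready";
}

interface StrictProofOptions {
  enabled: boolean;
  missingPhases: string[];
}

// Computes the missing phases up front, then enables strict-proof guidance
// only when readiness is not 'ready' AND at least one phase is missing.
function strictProofOptionsSketch(
  readiness: BenchmarkReadiness,
  missingPhases: readonly string[],
): StrictProofOptions {
  const phases = [...new Set(missingPhases)]; // de-duplicate, keep first-seen order
  const enabled = readiness.status !== "ready" && phases.length > 0;
  return { enabled, missingPhases: enabled ? phases : [] };
}
```

In this sketch a ready pack never receives the strict guidance, even if phases are listed, which mirrors the degraded vs. ready regression coverage mentioned in the summary.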

Changes

Missing-phase explain readiness and prompt guidance

| Layer / File(s) | Summary |
| --- | --- |
| Compare prompt generation: missing-phase triggers and readiness gating (`src/infrastructure/compare.ts`) | `answerContractInstructions` now relies on `do_not_claim` containing `full_runtime_certainty_when_slice_is_partial` to add the missing-phase instruction; `strictRuntimeProofPromptOptions` computes `missingPhases` upfront and enables strict-proof options when `benchmarkReadiness.status !== 'ready'` and missing phases exist. |
| Answer-ready generation for medium-confidence explain slices with missing-phase details (`src/runtime/context-pack.ts`) | `generateAnswerReadyFromExecutionSlice` now generates explain-ready results at `medium` confidence when the slice is `partial`, with `stop_condition` and `allowed_followups` branching on the partial barrier state and the missing-phase list. |
| Uncertainty notes with missing-phase specifics (`src/runtime/retrieve.ts`) | `buildRuntimeGenerationAnswerContract` now includes missing-phase names in `uncertainty_notes` when phases are absent, or uses a generic "not enough evidence" note when no specific phases are missing. |
| Unit tests for missing-phase explain and readiness gating (`tests/unit/answer-ready-explain-pack.test.ts`, `tests/unit/compare-native-agent.test.ts`, `tests/unit/retrieve-production-correctness.test.ts`) | Validates medium-confidence explain answer-ready generation with missing-phase follow-ups, strict runtime-proof guidance injection under degraded vs. ready readiness states, and missing-phase uncertainty messaging. |
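The `src/runtime/context-pack.ts` row can be pictured with a minimal sketch like the one below. It assumes that high-confidence slices were already eligible before this PR and that everything else was dropped. The `ExplainSlice` and `AnswerReady` shapes, the `answerReadySketch` name, and every string value for `stop_condition` and `allowed_followups` are hypothetical, chosen only to show the branching.

```typescript
// Hypothetical types and string values; not the real context-pack.ts contract.
type Confidence = "high" | "medium" | "low";

interface ExplainSlice {
  confidence: Confidence;
  partial: boolean;
  missingPhases: string[];
}

interface AnswerReady {
  stop_condition: string;
  allowed_followups: string[];
}

// Emits an answer-ready barrier for high-confidence slices and, as in this PR,
// for medium-confidence slices that are partial. Other slices get no barrier.
function answerReadySketch(slice: ExplainSlice): AnswerReady | null {
  const isPartialMedium = slice.confidence === "medium" && slice.partial;
  if (slice.confidence !== "high" && !isPartialMedium) {
    return null;
  }
  if (!slice.partial) {
    return { stop_condition: "answer_now", allowed_followups: [] };
  }
  // Partial barrier: follow-ups target the named phases when known,
  // otherwise fall back to a generic evidence-gathering follow-up.
  const followups =
    slice.missingPhases.length > 0
      ? slice.missingPhases.map((phase) => `locate_missing_phase:${phase}`)
      : ["gather_more_runtime_evidence"];
  return { stop_condition: "answer_with_partial_caveat", allowed_followups: followups };
}
```

The point of the branching is that a partial slice still produces guidance, but the guidance carries a caveat and points at the gaps rather than signalling a complete answer.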

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

- Strict runtime-proof mode for flow benchmarks #514: Implements the strict runtime-proof behavior for missing phases, including the core logic for `strictRuntimeProofPromptOptions`, missing-phase messaging in `buildRuntimeGenerationAnswerContract`, and readiness-based gating with `forceForDegradedReadiness` and partial explain handling.

Possibly related PRs

- mohanagy/madar#364: Modifies `src/infrastructure/compare.ts` and `tests/unit/compare-native-agent.test.ts` to handle degraded compare behavior; this PR directly extends that degraded-readiness handling to gate strict missing-phase runtime-proof guidance.
- mohanagy/madar#394: Earlier implementation of answer-ready and stop-condition logic for explain slices; this PR builds on that foundation by extending it to medium confidence with partial barriers and missing-phase follow-ups.
- mohanagy/madar#271: Overhauls runtime phase taxonomy and missing-phase computation; this PR uses the computed `missingPhases` to drive uncertainty messaging and prompt guidance.

Poem

🐰 A rabbit hops through explain's darkened wood,
finds medium-phase maps where bright once stood—
when phases vanish, guides now clearly say,
"what's missing here? Let's seek a better way."
With degraded readiness and cautious proof,
we answer wisely from each slanted roof. 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'fix: harden partial runtime proof guidance' clearly describes the main change: hardening partial runtime proof guidance with improved messaging and conditional logic. |
| Description check | ✅ Passed | The PR description includes a clear Summary section explaining the key changes, a Testing section with specific test commands and build verification, and a Benchmark note providing context. All required template sections are adequately covered. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


fix: harden partial runtime proof guidance

497d018

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mohanagy merged commit 2bee5ef into next Jun 8, 2026
7 checks passed

mohanagy deleted the benchmark-full-win-followup branch June 8, 2026 09:03

This was referenced Jun 8, 2026

- fix: harden explain-runtime proof gates #519 (merged)
- Merge next into main for 0.28.0 release #521 (merged)

mohanagy merged 1 commit into next from benchmark-full-win-followup
