
fix: harden partial runtime proof guidance #518

Merged
mohanagy merged 1 commit into next from benchmark-full-win-followup on Jun 8, 2026


mohanagy (Owner) commented Jun 8, 2026

Summary

  • harden partial runtime answer contracts so partial slices explicitly say "not enough evidence" and name the missing phases
  • emit answer_ready barriers for medium-confidence partial explain slices instead of dropping guidance entirely
  • inject strict missing-phase proof guidance only for degraded/not-ready backend runtime packs, with regression coverage for degraded vs ready prompt behavior
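A minimal TypeScript sketch of the second bullet, assuming a simplified slice shape. The `ExplainSlice` and `AnswerReady` types, the `answerReadyForExplainSlice` helper, and the stop-condition and follow-up strings are all hypothetical; this is not madar's actual `generateAnswerReadyFromExecutionSlice`. It also assumes that non-partial medium-confidence slices still produce no answer-ready result.

```typescript
// Hypothetical, simplified types; field names are illustrative only.
type Confidence = "high" | "medium" | "low";

interface ExplainSlice {
  confidence: Confidence;
  partial: boolean;        // true when some runtime phases were not observed
  missingPhases: string[]; // phase names are illustrative
}

interface AnswerReady {
  stop_condition: string;
  allowed_followups: string[];
}

// Sketch: instead of dropping guidance for a medium-confidence partial slice,
// return an answer-ready result guarded by a partial-evidence barrier.
function answerReadyForExplainSlice(slice: ExplainSlice): AnswerReady | null {
  if (slice.confidence === "low") {
    return null; // too little evidence to answer at all
  }
  if (slice.confidence === "medium" && !slice.partial) {
    return null; // assumption: medium confidence only qualifies when partial
  }
  if (!slice.partial) {
    return { stop_condition: "answer_from_slice", allowed_followups: [] };
  }
  // Partial slice: stop at a barrier and point follow-ups at the known gaps.
  return {
    stop_condition: "stop_at_partial_barrier",
    allowed_followups:
      slice.missingPhases.length > 0
        ? slice.missingPhases.map((phase) => `Inspect the missing ${phase} phase`)
        : ["Look for more runtime evidence before claiming the full flow"],
  };
}
```

For example, a medium-confidence partial slice with `missingPhases: ["persistence"]` gets the barrier stop condition and a single follow-up naming that phase, rather than no guidance at all.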

Testing

  • npm run test:run -- tests/unit/compare-native-agent.test.ts tests/unit/answer-ready-explain-pack.test.ts tests/unit/retrieve-production-correctness.test.ts tests/unit/compare.test.ts
  • npm run typecheck
  • npm run build
  • CI=1 npm run test:run

Benchmark note

  • A fresh non-isolated local diagnostic rerun still does not get the public six-repo runtime suite to 6/6 full wins. formbricks remains the strongest row, while documenso, twenty, and novu still hit readiness blockers and dub / cal-diy still need separate prompt-contract work.

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection and reporting of missing execution phases during partial runtime analysis
    • Enhanced uncertainty messaging to include specific missing phase details when available
    • Refined handling of degraded readiness states with better phase visibility
  • New Features

    • Extended medium-confidence explanation support with appropriate uncertainty caveats
    • Added targeted follow-up prompts to guide users toward addressing execution gaps
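The uncertainty-messaging fix can be pictured with a small TypeScript sketch. The `PartialAnswerContract` type, the `buildPartialAnswerContract` helper, and the exact note wording are hypothetical; only the `do_not_claim` token `full_runtime_certainty_when_slice_is_partial` and the "not enough evidence" phrasing come from this PR.

```typescript
// Hypothetical contract shape; only the do_not_claim token is taken from the PR.
interface PartialAnswerContract {
  do_not_claim: string[];
  uncertainty_notes: string[];
}

// Sketch: a partial slice always says "not enough evidence", and names the
// missing phases when any are known.
function buildPartialAnswerContract(missingPhases: readonly string[]): PartialAnswerContract {
  const note =
    missingPhases.length > 0
      ? `Not enough evidence for the full runtime flow; missing phases: ${missingPhases.join(", ")}.`
      : "Not enough evidence for the full runtime flow.";
  return {
    do_not_claim: ["full_runtime_certainty_when_slice_is_partial"],
    uncertainty_notes: [note],
  };
}
```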

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
coderabbitai (bot) commented Jun 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: acc5f4cf-e0ba-48ce-b0ec-0819c5f1064c

📥 Commits

Reviewing files that changed from the base of the PR and between 00acdb3 and 497d018.

📒 Files selected for processing (6)
  • src/infrastructure/compare.ts
  • src/runtime/context-pack.ts
  • src/runtime/retrieve.ts
  • tests/unit/answer-ready-explain-pack.test.ts
  • tests/unit/compare-native-agent.test.ts
  • tests/unit/retrieve-production-correctness.test.ts

📝 Walkthrough


This PR refines how missing runtime phases are surfaced in explain-slice readiness assessments and compare prompt guidance. It enables answer-ready summaries for explain slices at medium confidence when partial, injects missing-phase follow-up prompts, and conditions strict runtime-proof guidance on degraded benchmark readiness with identified missing phases.
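A minimal TypeScript sketch of the readiness gate, assuming a simplified pack shape. The `RuntimePackSketch` and `StrictProofOptionsSketch` types, the status values other than `'ready'`, and the `strictProofOptions` helper are hypothetical; the condition itself (`benchmarkReadiness.status !== 'ready'` and at least one missing phase) follows the walkthrough.

```typescript
// Illustrative status values; only "ready" is named in the PR.
type ReadinessStatus = "ready" | "degraded" | "not_ready";

interface RuntimePackSketch {
  benchmarkReadiness: { status: ReadinessStatus };
  missingPhases: string[];
}

interface StrictProofOptionsSketch {
  strictRuntimeProof: boolean;
  missingPhases: string[];
}

// Sketch: read the missing phases up front, then enable strict proof guidance
// only for packs that are not ready and actually have gaps.
function strictProofOptions(pack: RuntimePackSketch): StrictProofOptionsSketch {
  const missingPhases = pack.missingPhases;
  const strictRuntimeProof =
    pack.benchmarkReadiness.status !== "ready" && missingPhases.length > 0;
  return { strictRuntimeProof, missingPhases };
}
```

Under these assumptions a degraded pack with gaps gets the strict guidance, while a ready pack, or a degraded pack with no identified gaps, does not.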

Changes

Missing-phase explain readiness and prompt guidance

  • Compare prompt generation: missing-phase triggers and readiness gating
    File(s): src/infrastructure/compare.ts
    answerContractInstructions now keys the missing-phase instruction on do_not_claim containing full_runtime_certainty_when_slice_is_partial; strictRuntimeProofPromptOptions computes missingPhases up front and enables the strict-proof options only when benchmarkReadiness.status !== 'ready' and missing phases exist.
  • Answer-ready generation for medium-confidence explain slices with missing-phase details
    File(s): src/runtime/context-pack.ts
    generateAnswerReadyFromExecutionSlice now generates explain-ready results at medium confidence when the slice is partial, with stop_condition and allowed_followups branching on the partial barrier state and the missing-phase list.
  • Uncertainty notes with missing-phase specifics
    File(s): src/runtime/retrieve.ts
    buildRuntimeGenerationAnswerContract now includes the missing-phase names in uncertainty_notes when phases are absent, or uses a generic "not enough evidence" note when no specific phases are missing.
  • Unit tests for missing-phase explain and readiness gating
    File(s): tests/unit/answer-ready-explain-pack.test.ts, tests/unit/compare-native-agent.test.ts, tests/unit/retrieve-production-correctness.test.ts
    Validates medium-confidence explain answer-ready generation with missing-phase follow-ups, strict runtime-proof guidance injection under degraded vs. ready readiness states, and missing-phase uncertainty messaging.
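The `answerContractInstructions` change keys the missing-phase instruction on a `do_not_claim` entry. A rough TypeScript sketch, assuming a hypothetical `ContractSketch` shape, a hypothetical `answerContractInstructionsSketch` helper, and illustrative instruction wording; only the `full_runtime_certainty_when_slice_is_partial` token comes from this PR.

```typescript
// Token taken from the PR; everything else here is illustrative.
const PARTIAL_SLICE_TOKEN = "full_runtime_certainty_when_slice_is_partial";

// Hypothetical contract shape.
interface ContractSketch {
  do_not_claim: string[];
  missing_phases: string[];
}

// Sketch: render one prompt line per forbidden claim, and add the
// missing-phase instruction only when the partial-slice token is present.
function answerContractInstructionsSketch(contract: ContractSketch): string[] {
  const lines = contract.do_not_claim.map((claim) => `Do not claim: ${claim}.`);
  if (contract.do_not_claim.indexOf(PARTIAL_SLICE_TOKEN) !== -1) {
    const phases =
      contract.missing_phases.length > 0
        ? ` Name the missing phases: ${contract.missing_phases.join(", ")}.`
        : "";
    lines.push(`Say "not enough evidence" for the full runtime flow.${phases}`);
  }
  return lines;
}
```

Keying on the contract token, rather than re-deriving partiality in the prompt builder, keeps the decision in one place: whatever marks a slice as partial also controls the prompt wording.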

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • Strict runtime-proof mode for flow benchmarks #514: Implements the strict runtime-proof behavior for missing phases, including the core logic for strictRuntimeProofPromptOptions, missing-phase messaging in buildRuntimeGenerationAnswerContract, and readiness-based gating with forceForDegradedReadiness and partial explain handling.

Possibly related PRs

  • mohanagy/madar#364: Modifies src/infrastructure/compare.ts and tests/unit/compare-native-agent.test.ts to handle degraded compare behavior; this PR directly extends that degraded-readiness handling to gate strict missing-phase runtime-proof guidance.
  • mohanagy/madar#394: Earlier implementation of answer-ready and stop-condition logic for explain slices; this PR builds on that foundation by extending it to medium confidence with partial barriers and missing-phase follow-ups.
  • mohanagy/madar#271: Overhauls runtime phase taxonomy and missing-phase computation; this PR uses the computed missingPhases to drive uncertainty messaging and prompt guidance.
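One plausible reading of how `missingPhases` is derived is an ordered difference between the phases a flow is expected to cover and the phases actually observed in the slice. A TypeScript sketch under that assumption; the phase names, `EXPECTED_PHASES`, `computeMissingPhases`, and `isPartialSlice` are hypothetical and not the taxonomy introduced in mohanagy/madar#271.

```typescript
// Illustrative phase taxonomy; the real names and ordering may differ.
const EXPECTED_PHASES: readonly string[] = ["entrypoint", "handler", "service", "persistence", "response"];

// Sketch: keep expected phases in their canonical order and drop those observed.
function computeMissingPhases(expected: readonly string[], observed: readonly string[]): string[] {
  return expected.filter((phase) => observed.indexOf(phase) === -1);
}

// A slice is partial whenever at least one expected phase is missing.
function isPartialSlice(expected: readonly string[], observed: readonly string[]): boolean {
  return computeMissingPhases(expected, observed).length > 0;
}
```

For example, observing only `entrypoint`, `handler`, and `response` yields `["service", "persistence"]`, which is the kind of list the uncertainty notes and follow-up prompts above would name.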

Poem

🐰 A rabbit hops through explain's darkened wood,
finds medium-phase maps where bright once stood—
when phases vanish, guides now clearly say,
"what's missing here? Let's seek a better way."
With degraded readiness and cautious proof,
we answer wisely from each slanted roof. 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The PR title 'fix: harden partial runtime proof guidance' clearly describes the main change: hardening partial runtime proof guidance with improved messaging and conditional logic.
  • Description check: ✅ Passed. The PR description includes a clear Summary section explaining the key changes, a Testing section with specific test commands and build verification, and a Benchmark note providing context. All required template sections are adequately covered.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


mohanagy merged commit 2bee5ef into next Jun 8, 2026
7 checks passed
mohanagy deleted the benchmark-full-win-followup branch June 8, 2026 09:03