Merge next into main for 0.28.0 release by mohanagy · Pull Request #521 · mohanagy/madar

mohanagy · 2026-06-10T09:18:57Z

Summary

Merge the current next line into main for the next stable release.
Includes strict runtime-proof benchmark gating, public TypeScript explain-runtime receipts, benchmark-suite hardening, and agent/retrieval fixes.
Brings the checked-in 6/6 legacy public TypeScript explain-runtime full_win receipts into the stable branch.

Notes

This PR should merge before the stacked release metadata PR.
Release metadata, version bump, README, and changelog updates are kept in the follow-up PR so the release commit is easy to review.

Verification

CI on the merged PR #520 ("fix: complete proof-backed public explain-runtime full-win rows") was green across Ubuntu, macOS, and Windows on Node 20 and 22.
Release PR will run package/version/docs verification separately.

Summary by CodeRabbit

New Features
- Added runtime proof validation framework for benchmarks, including obligation-based evidence assessment across multiple repositories.
- Extended benchmark suite with scoped workspace support (graphRoot) for monorepo configurations.
- Implemented isolated execution profiles for reproducible benchmark runs with fail-fast authentication checks.

Documentation
- Updated benchmark methodology with guidance on repository scoping, execution behavior, and isolation mode.
- Expanded benchmark results documentation with new public "explain-runtime" trials across multiple repositories.
- Refined claims and evidence mapping with stricter validation requirements.

Bug Fixes & Improvements
- Fixed file-stem collision handling in code extraction for better identifier stability.
- Enhanced TypeScript compiler options discovery per workspace subdirectory.
- Improved Next.js convention detection for nested workspace layouts.

Fix benchmark Pages npm links, add git-backed public benchmark rows, and harden suite workspace preparation for public repos. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Publish isolated public explain-runtime receipts for documenso, formbricks, dub, cal-diy, and novu, plus a scoped Twenty benchmark receipt and the docs/tests that link to them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Downgrade non-answer or not-ready compare results to not_measured, require direct evidence for the public explain-runtime gates, surface benchmark outcomes in suite summaries, refresh the final rerun receipts, and remove the superseded invalid 12:xx receipt bundles. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
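
The downgrade rule in this commit message can be illustrated with a minimal sketch. The `CompareResult` shape, the `gateOutcome` name, and the `partial_win`, `loss`, and `not_ready` values are assumptions made for illustration; only `full_win`, `ready`, and `not_measured` appear in this PR, and the real report schema is not shown here.

```typescript
// Hypothetical shapes: the actual compare report schema is not part of this page.
type Outcome = "full_win" | "partial_win" | "loss" | "not_measured";

interface CompareResult {
  outcome: Outcome; // outcome the comparison initially computed
  answered: boolean; // false when the agent produced a non-answer
  readiness: "ready" | "not_ready";
}

// A non-answer or a not-ready result is reported as not_measured,
// so it can never be counted as a win or as a loss.
function gateOutcome(result: CompareResult): Outcome {
  if (!result.answered || result.readiness !== "ready") {
    return "not_measured";
  }
  return result.outcome;
}
```

Under these assumptions, a result that computed `full_win` but was not ready still surfaces as `not_measured` in a suite summary.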

Write native_agent-prompt.txt from the baseline prompt instead of the Madar prompt and refresh the published public-repo receipt artifacts to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sync

…ime-proof Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: enforce strict runtime proof for benchmarks

Add safe relative graphRoot support to benchmark suite repos so large public monorepo rows can generate, install, warm up, and compare from scoped graph roots instead of oversized repo roots. Also reset unsafe repo-local agent config at both the copied parent workspace and scoped root to preserve benchmark isolation for scoped runs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
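
One way to read "safe relative graphRoot" is a check that rejects absolute paths and any path that climbs out of the repo root. The sketch below is an assumption about what such a check could look like, not the validation in `src/infrastructure/benchmark/suite.ts`; the function name and error messages are hypothetical.

```typescript
// Hypothetical helper: returns the graphRoot as "/"-joined segments, or throws.
// An empty string means the repo root itself.
function resolveSafeGraphRoot(graphRoot: string): string {
  // Reject POSIX absolute paths, UNC paths, and Windows drive-letter paths.
  if (/^[\\\/]/.test(graphRoot) || /^[A-Za-z]:/.test(graphRoot)) {
    throw new Error(`graphRoot must be relative: ${graphRoot}`);
  }
  // Split on either separator and drop empty and "." segments.
  const segments = graphRoot
    .split(/[\\\/]+/)
    .filter((segment) => segment !== "" && segment !== ".");
  // Any ".." segment could escape the copied workspace, so refuse it outright.
  if (segments.some((segment) => segment === "..")) {
    throw new Error(`graphRoot must not escape the repo root: ${graphRoot}`);
  }
  return segments.join("/");
}
```

Refusing every `..` segment is stricter than resolving the path and comparing prefixes, but it keeps the rule easy to audit in a repo manifest.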

Move runtime Claude/Cursor state out of the tracked isolation fixture, fail fast when the isolated benchmark profile is not authenticated, and print the exact runtime-profile login command required for measured runs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Allow the isolation launcher tests to override the CLI path so CI can exercise the auth-preflight branches without depending on a prebuilt dist tree. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: support scoped benchmark repo roots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: harden partial runtime proof guidance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: harden explain-runtime proof gates

- recover out-of-scope runtime-proof obligation evidence: the first missing-obligation recovery loop now materializes graph nodes outside the initial slice scope, matching the phase-recovery loop
- drop dangling stdio relationships: compaction now filters relationships against the retained matched-node ID set before the cap
- keep hand-written top-level lib/ TypeScript source discoverable while still hard-ignoring compiled lib output (js/cjs/mjs/d.ts), restoring dub apps/web/lib middleware evidence
- carry same-turn retrieve persistence, prompt-contract targeting, routing/tool/latency scoring, SPI cache invalidation, Express and nested Next.js SPI detection fixes from review follow-ups
- refresh all six public TypeScript explain-runtime legacy receipts (documenso, formbricks, dub, twenty, cal-diy, novu) with proof-backed full_win bundles generated sequentially from the final binary, and point suite README, claims-and-evidence, and docs tests at them
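
The "drop dangling stdio relationships" item describes an ordering: filter against the retained node IDs first, then apply the cap. A minimal sketch of that ordering follows; the `GraphNode` and `Relationship` shapes and the `compactRelationships` name are hypothetical and stand in for whatever `src/runtime/stdio/tools.ts` actually uses.

```typescript
// Hypothetical shapes for illustration only.
interface GraphNode {
  id: string;
}

interface Relationship {
  from: string;
  to: string;
}

// Keep only relationships whose endpoints both survived compaction,
// and only then truncate to the cap.
function compactRelationships(
  retained: GraphNode[],
  relationships: Relationship[],
  cap: number,
): Relationship[] {
  const retainedIds = new Set(retained.map((node) => node.id));
  return relationships
    .filter((rel) => retainedIds.has(rel.from) && retainedIds.has(rel.to))
    .slice(0, cap);
}
```

Filtering before the cap matters because a dangling relationship would otherwise consume a slot that a valid one could have used.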

- thread rootPath through runtime-proof recovery so recovered branch steps emit workspace-relative source files (no mixed path formats in receipts)
- use real primary-path boundaries in recovery phase-coverage scoring instead of an empty boundary list
- include focused bash follow-ups in prompt-contract follow-up input extraction, matching focused-call classification
- activate preserveFinalRuntimeEntrypointContextPreview by removing the self-excluding kept-key filter
- decide file-stem uniqueness on normalized ids and disambiguate deterministic collisions (foo-bar.ts vs foo_bar.ts)
- include the module stem in the Express analysis cache validity check
- fail fast when the benchmark suite is missing the built CLI
- regenerate all six public explain-runtime legacy receipts with the final binary; every report is full_win/ready with consistent workspace-relative evidence paths
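
The file-stem item above (foo-bar.ts vs foo_bar.ts) can be sketched as follows. The normalization rule, the `#` suffix, and both function names are assumptions chosen for illustration; the extraction code in this PR may normalize and disambiguate differently.

```typescript
// Hypothetical normalization: strip the last extension, lowercase,
// and collapse every run of non-alphanumeric characters to "_".
function normalizeStem(fileName: string): string {
  const stem = fileName.replace(/\.[^.]+$/, "");
  return stem.toLowerCase().replace(/[^a-z0-9]+/g, "_");
}

// Assign one id per bare file name. Uniqueness is decided on the normalized
// stem; colliding names get a suffix based on sorted order, so the result
// does not depend on the order in which files were discovered.
function assignStemIds(fileNames: string[]): Map<string, string> {
  const groups = new Map<string, string[]>();
  fileNames.forEach((name) => {
    const key = normalizeStem(name);
    const group = groups.get(key) ?? [];
    group.push(name);
    groups.set(key, group);
  });

  const ids = new Map<string, string>();
  groups.forEach((group, key) => {
    const sorted = group.slice().sort();
    sorted.forEach((name, index) => {
      // "#" never appears in a normalized stem, so suffixed ids cannot
      // collide with the id of another file.
      ids.set(name, sorted.length === 1 ? key : `${key}#${index + 1}`);
    });
  });
  return ids;
}
```

With this sketch, `foo-bar.ts` and `foo_bar.ts` both normalize to `foo_bar`, so they receive `foo_bar#1` and `foo_bar#2`, while a unique file such as `index.ts` keeps the plain id `index`.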

fix: complete proof-backed public explain-runtime full-win rows

coderabbitai · 2026-06-10T09:19:14Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Benchmark receipts, runtime-proof contracts, isolation behavior, and graph/extraction plumbing were updated together. The PR also refreshes explain-runtime benchmark artifacts and tests for multiple repos.

Changes

Benchmark runtime-proof and suite wiring

| Layer / File(s) | Summary |
| --- | --- |
| Docs and receipt refresh: `docs/benchmarks/suite/README.md`, `docs/benchmarks/suite/methodology.md`, `docs/benchmarks/suite/repos.json`, `docs/benchmarks/suite/runtime-proof.json`, `docs/benchmarks/suite/results/...`, `docs/claims-and-evidence.md` | Benchmark docs, claims guidance, runtime-proof config, and generated receipt artifacts are updated for the newer warm-cache explain-runtime runs. |
| Suite manifest and isolation wiring: `docs/benchmarks/suite/isolation/run-isolated.sh`, `src/infrastructure/benchmark/suite.ts`, `tests/unit/benchmark-suite.test.ts`, `tests/unit/benchmark-suite-isolation-docs.test.ts` | Repo manifests, isolation launcher behavior, and benchmark suite workspace preparation are extended for scoped graph roots, CLI shims, and isolated Claude profile handling. |
| Runtime-proof contracts and retrieval: `src/contracts/context-pack.ts`, `src/contracts/retrieval-gate.ts`, `src/contracts/runtime-proof.ts`, `src/infrastructure/benchmark/runtime-proof.ts`, `src/runtime/retrieval-gate.ts`, `src/runtime/retrieve/slicing.ts`, `src/runtime/runtime-proof.ts`, `src/runtime/context-pack.ts`, `tests/unit/answer-ready-explain-pack.test.ts`, `tests/unit/benchmark-suite.test.ts` | Runtime-proof schema, benchmark profile loading, strict retrieval overrides, retrieval slicing, and answer-ready gating are added for explain-runtime questions. |
| Compare/report runtime-proof plumbing: `src/infrastructure/compare.ts`, `src/infrastructure/context-pack-command.ts`, `src/runtime/stdio/tools.ts` | Compare trace parsing, runtime-proof-aware readiness, prompt generation, and report merging now carry strict explain-runtime evidence through native-agent outputs. |
| Extraction and SPI graph pipeline: `src/pipeline/extract.ts`, `src/pipeline/extract/core.ts`, `src/pipeline/extract/cross-file.ts`, `src/pipeline/extract/frameworks/*`, `src/pipeline/extract/generic.ts`, `src/pipeline/extract/non-code.ts`, `src/pipeline/extract/python-rationale.ts`, `src/pipeline/spi/build.ts`, `src/pipeline/spi/cache.ts`, `src/pipeline/spi/framework-nextjs.ts`, `src/pipeline/spi/projector.ts`, `src/shared/source-discovery.ts` | File-stem-based extraction IDs, cross-file resolution, SPI projection, and SPI cache/config discovery are updated across the extraction and graph-building pipeline. |
| Runtime context pack shaping: `src/infrastructure/context-pack-command.ts` | Answer-ready explain packs and expandable previews are reworked around entrypoint-focused execution-spine context for runtime generation. |
| Runtime stdio tool overrides: `src/runtime/stdio/tools.ts` | The stdio retrieve/context_pack tools now apply strict runtime-proof overrides and emit focused payloads when the matched benchmark profile requires them. |
| Runtime source discovery and tests: `src/shared/source-discovery.ts`, `tests/fixtures/go-semantic-workspace/cmd/chi/main.go`, `tests/unit/benchmark-suite-docs.test.ts`, `tests/unit/benchmark-suite-isolation-docs.test.ts`, `tests/unit/benchmark-suite.test.ts` | Runtime source discovery, helper fixtures, and tests are updated to assert the new benchmark suite, runtime-proof, and isolation behaviors. |

Sequence Diagram(s)

sequenceDiagram
  participant CompareTrace
  participant RuntimeProof
  participant ContextPack
  participant Report

  CompareTrace->>RuntimeProof: buildRuntimeProofAssessment(profile, candidates)
  RuntimeProof-->>CompareTrace: obligations, missing_obligations
  CompareTrace->>ContextPack: buildNativeAgentPrompt(strictRuntimeProof)
  CompareTrace->>Report: mergeCompareReportPackWithTraceFollowUps(...)
  Report-->>CompareTrace: answer_contract, execution_slice, readiness
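
The first exchange in the diagram, where an assessment returns `obligations` and `missing_obligations`, might be approximated as below. This is a sketch under stated assumptions: the interfaces, the `assessObligations` name, and the rule that only direct evidence counts are illustrative, and none of this is the signature or body of `buildRuntimeProofAssessment`.

```typescript
// Hypothetical shapes; the real contracts live in src/contracts/runtime-proof.ts.
interface EvidenceCandidate {
  sourceFile: string; // workspace-relative file the evidence came from
  obligationIds: string[]; // obligations this candidate claims to cover
  direct: boolean; // true when the evidence is direct rather than inferred
}

interface ProofProfile {
  obligations: string[];
}

interface ProofAssessment {
  obligations: string[];
  missing_obligations: string[];
}

// An obligation counts as proven only when at least one direct candidate covers it.
function assessObligations(
  profile: ProofProfile,
  candidates: EvidenceCandidate[],
): ProofAssessment {
  const proven = new Set<string>();
  candidates.forEach((candidate) => {
    if (!candidate.direct) return;
    candidate.obligationIds.forEach((id) => proven.add(id));
  });
  return {
    obligations: profile.obligations.slice(),
    missing_obligations: profile.obligations.filter((id) => !proven.has(id)),
  };
}
```

In this reading, a non-empty `missing_obligations` list is what keeps a report from being treated as ready.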

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

mohanagy/madar#517: Overlaps on scoped graphRoot benchmark wiring and the isolation launcher changes in docs/benchmarks/suite/* and src/infrastructure/benchmark/suite.ts.
mohanagy/madar#519: Shares the runtime-proof contract and strict explain-runtime gating work across runtime-proof.json, compare/retrieval plumbing, and benchmark readiness checks.
mohanagy/madar#518: Also changes src/runtime/context-pack.ts to alter partial explain answer-ready handling and missing-phase stop conditions.

Poem

I nibble receipts beneath the moon, 🐇
With graph roots scoped and profiles tuned.
I hop through proofs, both strict and bright,
And stash the clues in share-safe light.
Thump thump — the suite now knows the way!

Merge pull request #521 from mohanagy/next

mohanagy and others added 25 commits June 7, 2026 21:47

feat: expand benchmark suite targets

d98c33e

Fix benchmark Pages npm links, add git-backed public benchmark rows, and harden suite workspace preparation for public repos. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: pin benchmark suite git refs

0d0a5c8

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: publish public benchmark receipts

8161801

Publish isolated public explain-runtime receipts for documenso, formbricks, dub, cal-diy, and novu, plus a scoped Twenty benchmark receipt and the docs/tests that link to them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: keep baseline prompt artifacts clean

def3292

Write native_agent-prompt.txt from the baseline prompt instead of the Madar prompt and refresh the published public-repo receipt artifacts to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: enforce strict runtime proof for benchmarks

c967d3b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge pull request #516 from mohanagy/main

9aae83d

sync

Merge remote-tracking branch 'origin/next' into issue-514-strict-runt…

d47aceb

…ime-proof Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: use rescoped benchmark project root

e197a8d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge pull request #515 from mohanagy/issue-514-strict-runtime-proof

5ae0032

fix: enforce strict runtime proof for benchmarks

test: decouple isolation launcher checks from dist

d65f0e4

Allow the isolation launcher tests to override the CLI path so CI can exercise the auth-preflight branches without depending on a prebuilt dist tree. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: sandbox isolation profile root in launcher tests

b6c9af6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge pull request #517 from mohanagy/issue-514-scoped-benchmark-roots

00acdb3

feat: support scoped benchmark repo roots

fix: harden partial runtime proof guidance

497d018

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge pull request #518 from mohanagy/benchmark-full-win-followup

2bee5ef

fix: harden partial runtime proof guidance

fix: harden explain-runtime proof gates

55be19c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: relax implement timeout race budget

ac9980f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge pull request #519 from mohanagy/benchmark-proof-completeness

92bd488

fix: harden explain-runtime proof gates

test: make graph-scope and duplicate-stem path assertions platform-aware

2f02a15

fix: honor MADAR_BENCH_CLI_PATH override so suite tests run before build

6b091dd
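
The `MADAR_BENCH_CLI_PATH` override named in the commit above, combined with the fail-fast behavior when the built CLI is missing, could be sketched like this. Only the environment variable name comes from this PR; the function name, its parameters, and the error text are assumptions, and the existence check is injected so the sketch has no filesystem dependency.

```typescript
// Hypothetical resolver: prefer the override, fall back to the default
// build output, and fail fast if neither points at an existing file.
function resolveBenchCliPath(
  env: Record<string, string | undefined>,
  defaultCliPath: string,
  fileExists: (candidate: string) => boolean,
): string {
  const override = env["MADAR_BENCH_CLI_PATH"];
  const candidate =
    override !== undefined && override.trim() !== "" ? override : defaultCliPath;
  if (!fileExists(candidate)) {
    throw new Error(
      `benchmark CLI not found at ${candidate}; build the CLI or set MADAR_BENCH_CLI_PATH`,
    );
  }
  return candidate;
}
```

A test can pass a stub path through `env` and a `fileExists` that always returns true, which is roughly how suite tests could run before any build output exists.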

Merge pull request #520 from mohanagy/benchmark-public-full-win

bf5e70c

fix: complete proof-backed public explain-runtime full-win rows

mohanagy mentioned this pull request Jun 10, 2026

chore: prepare 0.28.0 release #522

Merged

mohanagy merged commit 2199d48 into main Jun 10, 2026
6 of 7 checks passed

mohanagy added a commit that referenced this pull request Jun 10, 2026

Merge pull request #523 from mohanagy/main

e1dc751

Merge pull request #521 from mohanagy/next
