Merge next into main for 0.28.0 release #521

Merged: mohanagy merged 25 commits into main from next on Jun 10, 2026
mohanagy (Owner) commented Jun 10, 2026

Summary

  • Merge the current next line into main for the next stable release.
  • Includes strict runtime-proof benchmark gating, public TypeScript explain-runtime receipts, benchmark-suite hardening, and agent/retrieval fixes.
  • Brings the checked-in 6/6 legacy public TypeScript explain-runtime full_win receipts into the stable branch.

Notes

  • This PR should merge before the stacked release metadata PR.
  • Release metadata, version bump, README, and changelog updates are kept in the follow-up PR so the release commit is easy to review.

Verification

Summary by CodeRabbit

  • New Features

    • Added runtime proof validation framework for benchmarks, including obligation-based evidence assessment across multiple repositories.
    • Extended benchmark suite with scoped workspace support (graphRoot) for monorepo configurations.
    • Implemented isolated execution profiles for reproducible benchmark runs with fail-fast authentication checks.
  • Documentation

    • Updated benchmark methodology with guidance on repository scoping, execution behavior, and isolation mode.
    • Expanded benchmark results documentation with new public "explain-runtime" trials across multiple repositories.
    • Refined claims and evidence mapping with stricter validation requirements.
  • Bug Fixes & Improvements

    • Fixed file-stem collision handling in code extraction for better identifier stability.
    • Enhanced TypeScript compiler options discovery per workspace subdirectory.
    • Improved Next.js convention detection for nested workspace layouts.
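
The file-stem collision fix above is described in the commit notes as deciding uniqueness on normalized ids and disambiguating deterministic collisions (foo-bar.ts vs foo_bar.ts). A minimal sketch of that idea follows. The function names, the normalization rule, and the `__N` suffix scheme are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical sketch: normalizeStem, assignStemIds, and the "__N" suffix are illustrative only.
function normalizeStem(fileName: string): string {
  const base = fileName.replace(/\.[^.]+$/, ""); // drop the final extension
  // Collapse every run of non-alphanumeric characters into a single "_".
  return base.toLowerCase().replace(/[^a-z0-9]+/g, "_");
}

function assignStemIds(fileNames: string[]): Map<string, string> {
  // Group files by normalized id; sorting first keeps suffix assignment deterministic.
  const groups = new Map<string, string[]>();
  fileNames.slice().sort().forEach((name) => {
    const id = normalizeStem(name);
    const group = groups.get(id) ?? [];
    group.push(name);
    groups.set(id, group);
  });

  const ids = new Map<string, string>();
  groups.forEach((names, id) => {
    names.forEach((name, index) => {
      // Unique stems keep the plain id; colliding stems get a stable numeric suffix.
      ids.set(name, names.length === 1 ? id : `${id}__${index + 1}`);
    });
  });
  return ids;
}
```

Because normalization never produces a double underscore, a suffixed id such as `foo_bar__1` cannot clash with the normalized id of another file in this sketch.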

mohanagy and others added 25 commits June 7, 2026 21:47
Fix benchmark Pages npm links, add git-backed public benchmark rows, and harden suite workspace preparation for public repos.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Publish isolated public explain-runtime receipts for documenso, formbricks, dub, cal-diy, and novu, plus a scoped Twenty benchmark receipt and the docs/tests that link to them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Downgrade non-answer or not-ready compare results to not_measured, require direct evidence for the public explain-runtime gates, surface benchmark outcomes in suite summaries, refresh the final rerun receipts, and remove the superseded invalid 12:xx receipt bundles.
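
The downgrade rule in this commit can be sketched as a small gate. This is an illustration under assumptions: the `CompareResult` shape and the `answered` and `ready` fields are hypothetical, and only the `full_win` and `not_measured` labels come from the PR text.

```typescript
// Hypothetical result shape; the project's real compare report may differ.
interface CompareResult {
  outcome: string;   // e.g. "full_win"
  answered: boolean; // assumed flag: the agent produced an actual answer
  ready: boolean;    // assumed flag: the readiness gate passed
}

// A result only keeps its outcome when it is both answered and ready;
// anything else is reported as "not_measured" instead of a win or loss.
function gateOutcome(result: CompareResult): string {
  return result.answered && result.ready ? result.outcome : "not_measured";
}
```

The point of the design is that an unanswered or not-ready run never contributes a win or a loss to the suite summary.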

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Write native_agent-prompt.txt from the baseline prompt instead of the Madar prompt and refresh the published public-repo receipt artifacts to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ime-proof

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix: enforce strict runtime proof for benchmarks
Add safe relative graphRoot support to benchmark suite repos so large public monorepo rows can generate, install, warm up, and compare from scoped graph roots instead of oversized repo roots.

Also reset unsafe repo-local agent config at both the copied parent workspace and scoped root to preserve benchmark isolation for scoped runs.
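
One way to read "safe relative graphRoot" is a check that the configured value cannot escape the copied repo workspace. The sketch below is an assumption about what such a check might look like. The function name and the exact rejection rules are hypothetical and are not taken from src/infrastructure/benchmark/suite.ts.

```typescript
// Hypothetical validator: rejects absolute paths and any ".." segment.
function normalizeGraphRoot(graphRoot: string): string {
  // Reject POSIX-absolute, backslash-rooted, and drive-letter paths.
  if (/^([\\/]|[A-Za-z]:)/.test(graphRoot)) {
    throw new Error(`graphRoot must be relative: ${graphRoot}`);
  }
  // Split on either separator and drop empty and "." segments.
  const segments = graphRoot
    .split(/[\\/]+/)
    .filter((segment) => segment !== "" && segment !== ".");
  // Deliberately strict: even "a/../b" is rejected, although it stays inside the repo.
  if (segments.indexOf("..") !== -1) {
    throw new Error(`graphRoot must not contain "..": ${graphRoot}`);
  }
  return segments.length === 0 ? "." : segments.join("/");
}
```

A caller would join the returned value onto the workspace root, so a row scoped to `apps/web` generates and compares from that subdirectory rather than the whole monorepo.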

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move runtime Claude/Cursor state out of the tracked isolation fixture, fail fast when the isolated benchmark profile is not authenticated, and print the exact runtime-profile login command required for measured runs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow the isolation launcher tests to override the CLI path so CI can exercise the auth-preflight branches without depending on a prebuilt dist tree.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix: harden partial runtime proof guidance
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix: harden explain-runtime proof gates
- recover out-of-scope runtime-proof obligation evidence: the first
  missing-obligation recovery loop now materializes graph nodes outside
  the initial slice scope, matching the phase-recovery loop
- drop dangling stdio relationships: compaction now filters
  relationships against the retained matched-node ID set before the cap
- keep hand-written top-level lib/ TypeScript source discoverable while
  still hard-ignoring compiled lib output (js/cjs/mjs/d.ts), restoring
  dub apps/web/lib middleware evidence
- carry same-turn retrieve persistence, prompt-contract targeting,
  routing/tool/latency scoring, SPI cache invalidation, Express and
  nested Next.js SPI detection fixes from review follow-ups
- refresh all six public TypeScript explain-runtime legacy receipts
  (documenso, formbricks, dub, twenty, cal-diy, novu) with proof-backed
  full_win bundles generated sequentially from the final binary, and
  point suite README, claims-and-evidence, and docs tests at them
- thread rootPath through runtime-proof recovery so recovered branch
  steps emit workspace-relative source files (no mixed path formats in
  receipts)
- use real primary-path boundaries in recovery phase-coverage scoring
  instead of an empty boundary list
- include focused bash follow-ups in prompt-contract follow-up input
  extraction, matching focused-call classification
- activate preserveFinalRuntimeEntrypointContextPreview by removing the
  self-excluding kept-key filter
- decide file-stem uniqueness on normalized ids and disambiguate
  deterministic collisions (foo-bar.ts vs foo_bar.ts)
- include the module stem in the Express analysis cache validity check
- fail fast when the benchmark suite is missing the built CLI
- regenerate all six public explain-runtime legacy receipts with the
  final binary; every report is full_win/ready with consistent
  workspace-relative evidence paths
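
The "drop dangling stdio relationships" item in the list above turns on ordering: filter against the retained node IDs first, then apply the cap. A minimal sketch, assuming hypothetical `GraphNode` and `Relationship` shapes and a hypothetical `compactRelationships` helper:

```typescript
// Hypothetical shapes; the real stdio payload types may differ.
interface GraphNode {
  id: string;
}

interface Relationship {
  from: string;
  to: string;
  type: string;
}

function compactRelationships(
  retained: GraphNode[],
  relationships: Relationship[],
  cap: number,
): Relationship[] {
  const retainedIds = new Set(retained.map((node) => node.id));
  return relationships
    // Keep only edges whose endpoints both survived compaction.
    .filter((rel) => retainedIds.has(rel.from) && retainedIds.has(rel.to))
    // Apply the cap afterwards so dangling edges cannot use up the budget.
    .slice(0, cap);
}
```

If the cap ran first, a dangling edge at the front of the list could crowd out a valid one, and the later filter would leave fewer relationships than the cap allows.
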
fix: complete proof-backed public explain-runtime full-win rows
coderabbitai (Bot) commented Jun 10, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Benchmark receipts, runtime-proof contracts, isolation behavior, and graph/extraction plumbing were updated together. The PR also refreshes explain-runtime benchmark artifacts and tests for multiple repos.

Changes

Benchmark runtime-proof and suite wiring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Docs and receipt refresh | docs/benchmarks/suite/README.md, docs/benchmarks/suite/methodology.md, docs/benchmarks/suite/repos.json, docs/benchmarks/suite/runtime-proof.json, docs/benchmarks/suite/results/..., docs/claims-and-evidence.md | Benchmark docs, claims guidance, runtime-proof config, and generated receipt artifacts are updated for the newer warm-cache explain-runtime runs. |
| Suite manifest and isolation wiring | docs/benchmarks/suite/isolation/run-isolated.sh, src/infrastructure/benchmark/suite.ts, tests/unit/benchmark-suite.test.ts, tests/unit/benchmark-suite-isolation-docs.test.ts | Repo manifests, isolation launcher behavior, and benchmark suite workspace preparation are extended for scoped graph roots, CLI shims, and isolated Claude profile handling. |
| Runtime-proof contracts and retrieval | src/contracts/context-pack.ts, src/contracts/retrieval-gate.ts, src/contracts/runtime-proof.ts, src/infrastructure/benchmark/runtime-proof.ts, src/runtime/retrieval-gate.ts, src/runtime/retrieve/slicing.ts, src/runtime/runtime-proof.ts, src/runtime/context-pack.ts, tests/unit/answer-ready-explain-pack.test.ts, tests/unit/benchmark-suite.test.ts | Runtime-proof schema, benchmark profile loading, strict retrieval overrides, retrieval slicing, and answer-ready gating are added for explain-runtime questions. |
| Compare/report runtime-proof plumbing | src/infrastructure/compare.ts, src/infrastructure/context-pack-command.ts, src/runtime/stdio/tools.ts | Compare trace parsing, runtime-proof-aware readiness, prompt generation, and report merging now carry strict explain-runtime evidence through native-agent outputs. |
| Extraction and SPI graph pipeline | src/pipeline/extract.ts, src/pipeline/extract/core.ts, src/pipeline/extract/cross-file.ts, src/pipeline/extract/frameworks/*, src/pipeline/extract/generic.ts, src/pipeline/extract/non-code.ts, src/pipeline/extract/python-rationale.ts, src/pipeline/spi/build.ts, src/pipeline/spi/cache.ts, src/pipeline/spi/framework-nextjs.ts, src/pipeline/spi/projector.ts, src/shared/source-discovery.ts | File-stem-based extraction IDs, cross-file resolution, SPI projection, and SPI cache/config discovery are updated across the extraction and graph-building pipeline. |
| Runtime context pack shaping | src/infrastructure/context-pack-command.ts | Answer-ready explain packs and expandable previews are reworked around entrypoint-focused execution-spine context for runtime generation. |
| Runtime stdio tool overrides | src/runtime/stdio/tools.ts | The stdio retrieve/context_pack tools now apply strict runtime-proof overrides and emit focused payloads when the matched benchmark profile requires them. |
| Runtime source discovery and tests | src/shared/source-discovery.ts, tests/fixtures/go-semantic-workspace/cmd/chi/main.go, tests/unit/benchmark-suite-docs.test.ts, tests/unit/benchmark-suite-isolation-docs.test.ts, tests/unit/benchmark-suite.test.ts | Runtime source discovery, helper fixtures, and tests are updated to assert the new benchmark suite, runtime-proof, and isolation behaviors. |

Sequence Diagram(s)

sequenceDiagram
  participant CompareTrace
  participant RuntimeProof
  participant ContextPack
  participant Report

  CompareTrace->>RuntimeProof: buildRuntimeProofAssessment(profile, candidates)
  RuntimeProof-->>CompareTrace: obligations, missing_obligations
  CompareTrace->>ContextPack: buildNativeAgentPrompt(strictRuntimeProof)
  CompareTrace->>Report: mergeCompareReportPackWithTraceFollowUps(...)
  Report-->>CompareTrace: answer_contract, execution_slice, readiness
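
The first two arrows in the diagram describe an assessment that takes a profile and candidates and returns obligations plus missing_obligations. The sketch below illustrates that shape only. It is not the signature of `buildRuntimeProofAssessment`; the `Obligation` and `Candidate` types, the `filePattern` field, and the `direct` flag are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real contracts are in src/contracts/runtime-proof.ts and may differ.
interface Obligation {
  id: string;
  filePattern: RegExp; // assumed: which source files can satisfy this obligation (no "g" flag)
}

interface Candidate {
  sourceFile: string;
  direct: boolean; // assumed: true when the evidence is direct rather than inferred
}

interface Assessment {
  obligations: { id: string; satisfied: boolean }[];
  missing_obligations: string[];
}

function assessObligations(profile: Obligation[], candidates: Candidate[]): Assessment {
  // An obligation counts as satisfied only when a direct candidate matches its pattern.
  const obligations = profile.map((obligation) => ({
    id: obligation.id,
    satisfied: candidates.some(
      (candidate) => candidate.direct && obligation.filePattern.test(candidate.sourceFile),
    ),
  }));
  const missing_obligations = obligations
    .filter((obligation) => !obligation.satisfied)
    .map((obligation) => obligation.id);
  return { obligations, missing_obligations };
}
```

In this sketch a non-empty `missing_obligations` list is the signal a caller would use to hold back an answer-ready verdict or to trigger a recovery pass.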

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • mohanagy/madar#517: Overlaps on scoped graphRoot benchmark wiring and the isolation launcher changes in docs/benchmarks/suite/* and src/infrastructure/benchmark/suite.ts.
  • mohanagy/madar#519: Shares the runtime-proof contract and strict explain-runtime gating work across runtime-proof.json, compare/retrieval plumbing, and benchmark readiness checks.
  • mohanagy/madar#518: Also changes src/runtime/context-pack.ts to alter partial explain answer-ready handling and missing-phase stop conditions.

Poem

I nibble receipts beneath the moon, 🐇
With graph roots scoped and profiles tuned.
I hop through proofs, both strict and bright,
And stash the clues in share-safe light.
Thump thump — the suite now knows the way!


mohanagy merged commit 2199d48 into main on Jun 10, 2026
6 of 7 checks passed
mohanagy added a commit that referenced this pull request Jun 10, 2026
Merge pull request #521 from mohanagy/next