fix: Cat9b None-vs-0.0, flat_rarity_mode flag, spec caveats, starter corpus, v0.2.0 by jphein · Pull Request #28 · M0nkeyFl0wer/multipass-structural-memory-eval

jphein · 2026-05-23T20:11:35Z

Summary

Ports five fixes from techempower-org/multipass-structural-memory-eval#14 (fork issues #5-10) that shipped on the fork but were never PR'd upstream. All are small, self-contained, and independently testable.

Cat 9b None vs 0.0 semantic fix — call_through_rate now returns None when the adapter declares no harness manifest, distinguishing "not measured" from "every probe failed" (techempower-org/multipass-structural-memory-eval#7)
flat_rarity_mode flag on gap detection — surfaces when the rarity fallback fires on ≤2-sized components, so consumers know scores are in the flat (misleading) regime (techempower-org/multipass-structural-memory-eval#8)
Spec v8 status caveats — adds a status banner and 🚧 markers for the 6 CLI commands referenced in the spec but not yet implemented (techempower-org/multipass-structural-memory-eval#6)
standard_v0_1/questions.yaml + AUTHORING.md — reference questions and authoring guide for the starter corpus (techempower-org/multipass-structural-memory-eval#9)
Version bump 0.0.1 → 0.2.0: reflects the actual feature surface after PRs #1 (feat(cat9): minimum-viable Category 9 scaffolding + 9b call-through success) through #7 (feat(adapters): RlmAdapter + Qwen-7B/Llama-70B baselines) (techempower-org/multipass-structural-memory-eval#10, techempower-org/multipass-structural-memory-eval#5)
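To illustrate the None-vs-0.0 distinction in the Cat 9b item, here is a minimal sketch. The signature and the `describe` helper are hypothetical and are not taken from the repository. Only the semantics are from this PR: `None` when no harness manifest is declared, `0.0` when probes ran and all failed.

```python
from typing import Optional, Sequence


def call_through_rate(manifest: Sequence[str], successes: int) -> Optional[float]:
    """Hypothetical stand-in for the Cat 9b metric (signature is an assumption).

    manifest  -- probe names the adapter declares (empty means no harness manifest)
    successes -- number of probes that called through successfully
    """
    if not manifest:
        # Nothing was declared, so nothing was measured: report None, not 0.0.
        return None
    return successes / len(manifest)


def describe(rate: Optional[float]) -> str:
    """Render the metric so that 'not measured' is never confused with 0%."""
    if rate is None:
        return "not measured"
    return f"{rate:.0%} of probes succeeded"
```

Consumers should test `rate is None` rather than truthiness, since `0.0` is also falsy in Python and `if not rate:` would merge the two cases again.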

Test plan

pytest tests/test_gap_detection.py tests/test_harness_integration.py -v — 22 passed, 1 skipped (Betti-1 optional dep)
ruff check on all changed files — clean
New flat_rarity_mode tests: flagging, no-candidates, format-report text
New Cat 9b test: test_cat9b_empty_manifest_reports_not_applicable — verifies None return
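
The flat_rarity_mode cases listed above (flagging, no-candidates, format-report text) can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's gap-detection code. `GapReport`, `build_report`, and `format_report` are hypothetical names, and the trigger condition (every candidate component has size ≤ 2) is a guess at what "the rarity fallback fires on ≤2-sized components" means.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GapReport:
    candidates: List[int]    # hypothetical: component size behind each gap candidate
    flat_rarity_mode: bool   # True when scores come from the flat fallback regime


def build_report(component_sizes: List[int], small: int = 2) -> GapReport:
    # Assumption: the flag is set only when there are candidates and all of
    # them sit on components of size <= `small`.
    flat = bool(component_sizes) and all(s <= small for s in component_sizes)
    return GapReport(candidates=list(component_sizes), flat_rarity_mode=flat)


def format_report(report: GapReport) -> str:
    lines = [f"{len(report.candidates)} gap candidate(s)"]
    if report.flat_rarity_mode:
        lines.append("WARNING: flat_rarity_mode is set; rarity scores are in the flat regime")
    return "\n".join(lines)
```

With no candidates the sketch leaves the flag unset, which mirrors the "no-candidates" test name, though the real behaviour may differ.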

🫏 Generated with Claude Code

…tarter corpus, v0.2.0

Ports five fixes from #14 (fork issues #5-10) that were not yet PR'd upstream:

- Cat 9b call_through_rate returns None (not 0.0) when the adapter declares no harness manifest, distinguishing "not measured" from "every probe failed" (#7)
- gap_detection surfaces flat_rarity_mode flag when the rarity fallback fires on ≤2-sized components, so consumers know scores are in the flat regime (#8)
- sme_spec_v8.md adds status caveat and 🚧 markers for unimplemented CLI commands (snapshot, run, compare, calibrate, viz, report) (#6)
- standard_v0_1/questions.yaml + AUTHORING.md: reference questions and authoring guide for the starter corpus (#9)
- Version bump 0.0.1 → 0.2.0 (#10, #5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 23, 2026 20:11

jphein wants to merge 1 commit into M0nkeyFl0wer:main from techempower-org:fix/fork-issues-5-10
