
fix: Cat9b None-vs-0.0, flat_rarity_mode flag, spec caveats, starter corpus, v0.2.0#28

Open
jphein wants to merge 1 commit into M0nkeyFl0wer:main from techempower-org:fix/fork-issues-5-10


@jphein (Contributor) commented May 23, 2026

Summary

Ports five fixes from techempower-org/multipass-structural-memory-eval#14 (fork issues #5-10) that shipped on the fork but were never PR'd upstream. All are small, self-contained, and independently testable.

  • Cat 9b None vs 0.0 semantic fix: call_through_rate now returns None when the adapter declares no harness manifest, distinguishing "not measured" from "every probe failed" (techempower-org/multipass-structural-memory-eval#7)
  • flat_rarity_mode flag on gap detection: set when the rarity fallback fires on components of size ≤2, so consumers know the scores are in the flat (misleading) regime (techempower-org/multipass-structural-memory-eval#8)
  • Spec v8 status caveats: adds a status banner and 🚧 markers for the 6 CLI commands referenced in the spec but not yet implemented (snapshot, run, compare, calibrate, viz, report) (techempower-org/multipass-structural-memory-eval#6)
  • standard_v0_1/questions.yaml + AUTHORING.md — reference questions and authoring guide for the starter corpus (techempower-org/multipass-structural-memory-eval#9)
  • Version bump 0.0.1 → 0.2.0: reflects the actual feature surface after PRs #1 (feat(cat9): minimum-viable Category 9 scaffolding + 9b call-through success) through #7 (feat(adapters): RlmAdapter + Qwen-7B/Llama-70B baselines) (techempower-org/multipass-structural-memory-eval#10, techempower-org/multipass-structural-memory-eval#5)
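The Cat 9b change is easiest to see with a small model of the metric. This is a hypothetical sketch, not the repository's code: the call_through_rate signature shown here, representing the manifest as a list of entry-point names, the probe_results mapping, and the format_rate helper are all illustrative assumptions. It only shows why returning None for an empty manifest keeps "not measured" apart from a genuine 0.0.

```python
from typing import Mapping, Optional, Sequence


def call_through_rate(
    manifest: Sequence[str],
    probe_results: Mapping[str, bool],
) -> Optional[float]:
    """Hypothetical Cat 9b metric.

    manifest: harness entry points the adapter declares (may be empty).
    probe_results: entry point -> whether its probe call went through.
    """
    # Nothing declared means nothing was probed: "not measured", not 0%.
    if not manifest:
        return None
    # Declared entry points with no recorded result count as failures.
    passed = sum(1 for name in manifest if probe_results.get(name, False))
    return passed / len(manifest)


def format_rate(rate: Optional[float]) -> str:
    # Keep the two states visibly distinct in reports.
    return "n/a (no harness manifest)" if rate is None else f"{rate:.0%}"
```

With this shape, call_through_rate([], {}) yields None while a manifest whose every probe fails yields 0.0, so downstream aggregation can skip unmeasured adapters instead of averaging in a false zero.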

Test plan

  • pytest tests/test_gap_detection.py tests/test_harness_integration.py -v — 22 passed, 1 skipped (Betti-1 optional dep)
  • ruff check on all changed files — clean
  • New flat_rarity_mode tests: flagging, no-candidates, format-report text
  • New Cat 9b test: test_cat9b_empty_manifest_reports_not_applicable — verifies None return
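The three new flat_rarity_mode test cases (flagging, no-candidates, format-report text) can be pictured against a sketch like the one below. Everything here is an assumption for illustration: summarize_gaps, GapSummary, the wording of format_report, and representing candidates as a mapping from component id to component size are hypothetical. Only the size-≤2 trigger comes from the PR text, and the real gap detector's rarity scoring is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Mapping

# Per the PR text, the rarity fallback fires on components of size <= 2.
FLAT_COMPONENT_MAX_SIZE = 2


@dataclass
class GapSummary:
    # Hypothetical result type: ids of gap-candidate components.
    candidates: List[str] = field(default_factory=list)
    # Candidates whose rarity score would come from the flat fallback.
    fallback_components: List[str] = field(default_factory=list)
    # True when at least one candidate hit the fallback.
    flat_rarity_mode: bool = False


def summarize_gaps(candidate_sizes: Mapping[str, int]) -> GapSummary:
    """Flag the flat regime given candidate component id -> component size."""
    fallback = sorted(
        cid for cid, size in candidate_sizes.items()
        if size <= FLAT_COMPONENT_MAX_SIZE
    )
    return GapSummary(
        candidates=sorted(candidate_sizes),
        fallback_components=fallback,
        # No candidates means no fallback, so the flag stays False.
        flat_rarity_mode=bool(fallback),
    )


def format_report(summary: GapSummary) -> str:
    lines = [f"Gap candidates: {len(summary.candidates)}"]
    if summary.flat_rarity_mode:
        # Surface the flag in human-readable output, not just in the data.
        lines.append(
            "flat_rarity_mode: rarity fallback fired on components of size <= 2 "
            f"({', '.join(summary.fallback_components)}); scores are flat and may mislead."
        )
    return "\n".join(lines)
```

Keeping the flag as an explicit boolean, rather than leaving consumers to infer it from identical scores, lets them check the regime without re-deriving component sizes.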

🫏 Generated with Claude Code


Ports five fixes from #14 (fork issues #5-10) that were not yet PR'd upstream:

- Cat 9b call_through_rate returns None (not 0.0) when the adapter
  declares no harness manifest — distinguishes "not measured" from
  "every probe failed" (#7)
- gap_detection surfaces flat_rarity_mode flag when the rarity fallback
  fires on ≤2-sized components, so consumers know scores are in the
  flat regime (#8)
- sme_spec_v8.md adds status caveat and 🚧 markers for unimplemented
  CLI commands (snapshot, run, compare, calibrate, viz, report) (#6)
- standard_v0_1/questions.yaml + AUTHORING.md — reference questions
  and authoring guide for the starter corpus (#9)
- Version bump 0.0.1 → 0.2.0 (#10, #5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 23, 2026 20:11
