Skip to content

test+refactor: adapter contract testkit, registry allowlist, test coverage (#8, #18, #19, #20)#30

Merged
M0nkeyFl0wer merged 5 commits into
M0nkeyFl0wer:mainfrom
techempower-org:upstream/code-quality-issues
May 26, 2026
Merged

test+refactor: adapter contract testkit, registry allowlist, test coverage (#8, #18, #19, #20)#30
M0nkeyFl0wer merged 5 commits into
M0nkeyFl0wer:mainfrom
techempower-org:upstream/code-quality-issues

Conversation

@jphein
Copy link
Copy Markdown
Contributor

@jphein jphein commented May 24, 2026

Summary

Addresses four open code-quality issues from this repo:

146 new tests total, all passing.

Test plan

  • python -m pytest tests/test_adapter_contract.py tests/test_adapter_harness_manifest_contract.py tests/test_cli_adapter_forwarding.py tests/test_graph_mapping.py tests/test_multi_hop.py tests/test_ontology_coherence.py -v
  • Verify no regressions in existing tests: python -m pytest tests/ -x -q
  • Spot-check that sme-eval --adapter flat --db /tmp/test still resolves through the new registry

🫏 Generated with Claude Code

jphein and others added 5 commits May 24, 2026 14:45
…yFl0wer#19)

The ABC default returning [] was added in 3754e8a as part of Cat 9
scaffolding, but the upstream issue tracks two follow-throughs that
were still missing:

- Module docstring + class docstring in `base.py` claimed "Two
  optional" methods; promote to "Three optional" so readers see
  `get_harness_manifest` as part of the contract.
- Spec § Adapter Interface (sme_spec_v8.md) listed only
  `get_flat_retrieval` and `get_ontology_source` as optional. Add
  `get_harness_manifest` alongside them with its default-list rationale.

Adds tests/test_adapter_harness_manifest_contract.py: every shipped
adapter must resolve `get_harness_manifest` (either inherited or
overridden) and the ABC default must return []. Class-level checks
avoid instantiating adapters with heavy constructors. Closes the
silent-AttributeError class entirely for any future adapter that
forgets to override.
…Fl0wer#8)

Parametric test module verifying adapters conform to the SMEAdapter
ABC: query returns QueryResult, graph snapshot is internally
consistent, ingest accepts corpus shape. Covers FlatBaselineAdapter
and a MockAdapter; other adapters opt in by registering a factory.
…er#20)

Replace 8-branch if-chain with _AdapterSpec registry. Each adapter
declares its accepted kwargs; unknown kwargs are silently dropped.
Makes the PR #7 class of drop-list-drift regression structurally
impossible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…keyFl0wer#18)

Three new test files for measurement-critical modules that had zero
direct tests. Covers hop-bucket grouping, Cat 8 sub-tests, and the
shared graph projection helper. Prevents silent regressions in
published category readings.
Remove HindsightAdapter, Mem0Adapter, OmegaAdapter, RlmAdapter imports
since those adapter files don't exist in upstream repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@M0nkeyFl0wer
Copy link
Copy Markdown
Owner

Tested locally in a worktree: ruff clean, 338 passed / 15 skipped, the 146-new-test claim verifies exactly.

The _AdapterSpec dataclass with accepts: frozenset[str] is the structural fix #20 was asking for — drop-list drift is now impossible by construction, which is the right shape (vs. the maintenance-burden shape of "be more careful with future drop lists"). The 27-test contract testkit is the abstraction the "same shape three times" pattern wanted, and the get_harness_manifest() ABC + regression tests close out the silent-AttributeError class from #19.

One forward-looking note (not blocking): when new adapters land in subsequent PRs (#32 adds two — random + oracle), the contract testkit needs them parametrized in or the "27 conformance tests" claim quietly stops being true. Worth a one-line note in CONTRIBUTING about that. I'll add it.

Merging this first. Everything else in the stack will benefit from rebasing onto post-#30 main.

@M0nkeyFl0wer
Copy link
Copy Markdown
Owner

One coordination note as I work through the stack — five PRs in this wave (#31, #37, #38, #39, #40) touch `sme/categories/ingestion_integrity.py` or `sme/categories/gap_detection.py`. If they're all branched from pre-#30 main, serial merge will hit conflicts from the third PR onward. Would you be up for rebasing the stack onto post-#30 main in one pass once this lands, so the rest land cleanly? Happy to do it from this end if easier — your call.

Also flagging: when #32 lands (RandomRetrievalAdapter + OracleRetrievalAdapter), the contract testkit parametrize list needs both adapters added or the "27 conformance tests" invariant silently degrades. Mentioned it on #32 too — flagging here since this is the foundation that enforces the invariant.

@M0nkeyFl0wer M0nkeyFl0wer merged commit e5ca6e3 into M0nkeyFl0wer:main May 26, 2026
3 checks passed
@M0nkeyFl0wer
Copy link
Copy Markdown
Owner

Following up on the coordination point I raised on Monday — now that the #31/#33/#34/#35/#36/#37 fixes are all green and ready, the file-overlap question is back on the front burner. Five of the seven open PRs touch sme/categories/ingestion_integrity.py or sme/categories/gap_detection.py, so serial merge will hit conflicts from the second PR onward.

I'm not precious about the order; mostly I want to confirm a sequence so we don't end up doing rebase coordination cold-cache next week. Tentative proposal, push back if you'd prefer different:

  1. Paired bootstrap CIs and Benjamini-Hochberg FDR correction #36 (paired bootstrap + BH-FDR) — new sme/stats/ module, no file overlap with anything else, foundation for downstream readings. Land first.
  2. Cost-per-correct headline metric with versioned pricing (HELM methodology) #34 (cost-per-correct) — new sme/eval/pricing.py, no overlap. Land second.
  3. Cat 7.b: query latency distribution (YCSB percentiles) #33 (Cat 7.b latency) — new sme/categories/latency.py, no overlap. Land third.
  4. LongMemEval judge: K-replicate variance with majority voting #35 (K-replicate judge) — touches sme/eval/longmemeval_judge.py only, no overlap with this group. Land fourth.
  5. Cat 4: surface per-ID edge degree in collision report #31 (Cat 4 collision degrees) — first ingestion_integrity.py touch in this wave. Land fifth.
  6. Cat 4c: edge count alongside component count, sparse annotation #37 (Cat 4c edge counts) — second ingestion_integrity.py touch. Rebases onto Cat 4: surface per-ID edge degree in collision report #31.
  7. feat(cat5): break out isolates by entity_type #38feat(cat5): representative cycles alongside Betti-1 #39feat(cat5): component-size distribution and non-trivial summaries #40 (Cat 5 gap_detection improvements) — all touch gap_detection.py, each rebases onto the previous.

That keeps the rebase work bounded to the actual file overlaps (#37 onto #31, and the #38/#39/#40 stack). Open to a different order if you have a preference — happy to be wrong about which to land first.

Separately, #32 (random+oracle) I'm leaving as its own conversation — see the comment I just left there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants