feat(memory): surface recall scores and add a graded relevance floor by prakashUXtech · Pull Request #249 · qbtrix/soul-protocol

prakashUXtech · 2026-05-21T09:22:25Z

What

soul recall scores memories two separate ways, and the two never meet. Each store's search() keeps a candidate if it shares any token with the query, so one shared word is enough to win a slot. BM25 gets computed too, but only inside compute_activation as the spread term, so it never reaches a result. And recall --json has no score field at all.

This PR puts the activation score on every recalled entry and turns the binary gate into a graded cutoff you can configure.

Why

Issue #247 audited the session-start soul recall mandate. Recall already drops zero-overlap matches, so a vague query comes back empty instead of padded. The part #247 flagged is the floor itself. Today it is a binary gate: any token overlap, however slight, takes a slot. Setting a graded threshold means the score has to reach the result first, and right now it is stuck inside the scorer.

Changes

- activation.py: activation_breakdown() returns the activation total along with its component terms. compute_activation() now calls through to it, so existing callers see no change.
- recall.py: RecallEngine.recall() scores each candidate once, writes the activation score to recall_score, and drops candidates below relevance_floor. The floor checks raw token overlap rather than the activation total, so a recency or emotion boost cannot carry an off-topic memory past it.
- search.py: DEFAULT_RELEVANCE_FLOOR and passes_relevance_floor() replace the bare score > 0.0 literal in each store's search().
- types.py: adds MemoryEntry.recall_score, a runtime-only field that stays out of the .soul file.
- cli/main.py: recall --json emits a score per result, and a --min-relevance flag exposes the floor.
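The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR. The Memory class, the token_overlap() helper, the three term names on ActivationBreakdown, and the free-standing recall() function are all hypothetical stand-ins, and raw overlap is used in place of the real BM25 spread term. Overlap is assumed to be the fraction of query tokens found in the memory text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ActivationBreakdown:
    # Hypothetical term names; the PR only says the total comes with its terms.
    spread: float
    recency: float
    emotion: float

    @property
    def total(self) -> float:
        return self.spread + self.recency + self.emotion


@dataclass
class Memory:
    # Hypothetical stand-in for MemoryEntry.
    id: str
    text: str
    recency: float = 0.0
    emotion: float = 0.0
    recall_score: Optional[float] = None  # runtime-only, never persisted


def token_overlap(query: str, text: str) -> float:
    """Assumed metric: fraction of query tokens that appear in the text."""
    q = set(query.lower().split())
    if not q:
        return 0.0
    return len(q & set(text.lower().split())) / len(q)


def activation_breakdown(query: str, memory: Memory) -> ActivationBreakdown:
    # Overlap stands in for the BM25 spread term in this sketch.
    return ActivationBreakdown(
        spread=token_overlap(query, memory.text),
        recency=memory.recency,
        emotion=memory.emotion,
    )


def compute_activation(query: str, memory: Memory) -> float:
    # Delegates to the breakdown, so callers that want a float are unchanged.
    return activation_breakdown(query, memory).total


def recall(query: str, memories: List[Memory], relevance_floor: float = 0.0) -> List[Memory]:
    results = []
    for m in memories:
        overlap = token_overlap(query, m.text)
        # The floor looks at raw overlap only, so boosts cannot rescue a miss.
        if overlap <= 0.0 or overlap < relevance_floor:
            continue
        m.recall_score = activation_breakdown(query, m).total  # scored once
        results.append(m)
    results.sort(key=lambda m: m.recall_score, reverse=True)
    return results
```

In this sketch a memory with a large recency or emotion boost but no shared tokens is never returned, while boosts can still reorder the memories that do pass the floor.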

The default floor stays 0.0, so existing recalls behave the same as before. A higher cutoff is opt-in through relevance_floor or --min-relevance. Raising the default would quietly drop matches that current callers expect to see.
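One way to read the floor semantics is sketched below. The names DEFAULT_RELEVANCE_FLOOR and passes_relevance_floor() come from this PR, but the signature and body here are assumptions. They are inferred from two statements in this PR: the default of 0.0 keeps the old score > 0.0 behaviour, and the follow-up commit describes the comparison as inclusive with a negative floor clamped to the strict-positive gate.

```python
DEFAULT_RELEVANCE_FLOOR = 0.0


def passes_relevance_floor(score: float, floor: float = DEFAULT_RELEVANCE_FLOOR) -> bool:
    """Assumed behaviour, not the actual implementation in search.py."""
    if floor <= 0.0:
        # A zero or negative floor falls back to the historical gate:
        # any positive overlap passes, zero overlap does not.
        return score > 0.0
    # A positive floor is inclusive: a score equal to the floor is kept.
    return score >= floor
```

Under this reading, a floor of 1.0 still keeps a perfect match, and only a floor just above 1.0 rejects everything.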

Testing

Smoke test (tests/test_cli/test_recall_score.py): soul recall --json emits a score field per result, a recall returns scored results, and --min-relevance drops a weak match.

End-to-end (tests/test_retrieval.py::TestRecallScoresAndFloor): populate a store, query it, then check that every result carries a score, results come back ordered by score, and a weak match below the floor is dropped while the strong match stays.

Results:

uv run pytest tests/test_retrieval.py -q — 25 passed
uv run pytest tests/ -q --ignore=tests/e2e — 3067 passed, 1 skipped (the skip is a known pre-existing failure, unrelated to this change)

Docs updated in this PR: memory-architecture.md (recall behavior, new "Recall scores and the graded relevance floor" section), cli-reference.md (--min-relevance flag and the score JSON field), api-reference.md (relevance_floor parameter), and the GAP-ANALYSIS.md recall row.

Closes #247

Recall had two scoring mechanisms that never met. Each store's search() gated candidates on a binary token-overlap check (score > 0.0), so a single shared word earned a result slot. BM25 was computed too, but only inside compute_activation as the spread term, so it never reached a result, and recall --json had no score field at all.

This plumbs the activation score onto every recalled entry and turns the binary gate into a graded, configurable cutoff.

- activation.py: add ActivationBreakdown and activation_breakdown(), which returns the activation total alongside its terms. compute_activation() delegates to it, so existing callers are unchanged.
- recall.py: RecallEngine.recall() scores each candidate once, writes the activation score to the entry's recall_score field, and drops candidates whose token overlap falls below relevance_floor. The floor checks raw overlap, not the activation total, so recency or emotion cannot lift an off-topic memory past it.
- search.py: DEFAULT_RELEVANCE_FLOOR and passes_relevance_floor() replace the bare score > 0.0 literal in each store's search().
- types.py: MemoryEntry gains recall_score, a runtime-only field (exclude=True) so query-dependent scores stay out of the .soul file.
- cli/main.py: soul recall --json emits a score per result; a new --min-relevance flag exposes the graded floor.

The default floor is 0.0, which keeps the historical "any overlap" behaviour; raising the default would silently drop matches existing callers expect. The graded cutoff is opt-in via relevance_floor / --min-relevance.

Closes #247

github-actions · 2026-05-21T09:22:37Z

Issues (must fix)

No evidence of local testing found. Please include terminal output or screenshots.

Heads up

Large PR detected (801 lines across 16 files). Consider splitting into smaller PRs.

Please update your PR to address these points.

github-actions · 2026-05-21T09:22:37Z

Security scan: review needed

Potentially dangerous code patterns detected in changed files. A maintainer should verify these are intentional and safe.

src/soul_protocol/runtime/memory/manager.py

2282:            search_strategy: Optional SearchStrategy for pluggable retrieval (v0.2.2).

src/soul_protocol/runtime/memory/recall.py

33:#   Access timestamps are updated on retrieval (strengthens future recall).

src/soul_protocol/runtime/soul.py

833:            search_strategy: Optional SearchStrategy for pluggable retrieval (v0.2.2).
1050:            search_strategy: Optional SearchStrategy for pluggable retrieval (v0.2.2).
1321:    def last_retrieval(self) -> RetrievalTrace | None:

…nce floor

A graph-augmented candidate enters the result set because it matched a related graph entity term, not the original user query. The relevance floor was then checked against the original query, where that candidate's token overlap is typically ~0, so any positive relevance_floor silently dropped every graph-augmented result and neutered graph augmentation.

recall() now captures the candidate IDs the text search produced before graph augmentation runs, and applies the floor only to those. Graph-augmented entries were already validated by the entity-term search and pass through regardless of the floor.

Also:

- passes_relevance_floor() docstring spells out the boundary semantics: the comparison is inclusive (score >= floor) and a negative floor is clamped to the historical strict-positive gate rather than honoured.
- Note the recall_score trade-off at its assignment: it mutates the live store object, which is safe (exclude=True, no .soul corruption) but can race across concurrent recalls. Copying only there would be a half-measure since the access-metadata reinforcement loop mutates the same shared objects on purpose.
- test_floor_above_all_matches_returns_empty keeps relevance_floor=1.01 with a comment: the floor is inclusive, so 1.0 still keeps a perfect match; rejecting even a perfect match needs a floor just above 1.0.

Tests: add TestGraphAugmentationExemptFromFloor. A graph-augmented entry with zero query overlap survives a 0.5 floor, while a weak direct match is still dropped (the exemption is scoped, not a blanket bypass).

Docs: memory-architecture.md and api-reference.md note the exemption and the inclusive-comparison semantics.
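The scoped exemption described in this commit might look something like the sketch below. It is an illustration under assumptions, not the code in recall.py: apply_floor_to_text_hits() and its parameters are hypothetical, candidates are represented by bare IDs, and the per-candidate overlap is passed in as a precomputed mapping.

```python
from typing import Dict, List, Set


def apply_floor_to_text_hits(
    candidate_ids: List[str],
    text_hit_ids: Set[str],
    overlap_by_id: Dict[str, float],
    relevance_floor: float,
) -> List[str]:
    """Drop weak direct matches while letting graph-augmented IDs through.

    text_hit_ids is the set captured from the text search *before* graph
    augmentation added more candidates (hypothetical parameter name).
    """
    kept = []
    for cid in candidate_ids:
        if cid in text_hit_ids:
            # Direct text matches must meet the floor (inclusive comparison).
            if overlap_by_id.get(cid, 0.0) < relevance_floor:
                continue
        # IDs outside text_hit_ids came from graph augmentation and are
        # exempt: they matched an entity term, not the user query.
        kept.append(cid)
    return kept
```

For example, with candidates "strong" (overlap 1.0), "weak" (overlap 0.25) and a graph-augmented "graph" (overlap 0.0), a floor of 0.5 keeps "strong" and "graph" and drops "weak", which mirrors the case the new test class describes.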

github-actions Bot added the needs-work label May 21, 2026

prakashUXtech wants to merge 2 commits into dev from feat/recall-graded-floor
