memory/kb: widen FTS5 snippet window 24 → 60 + nudge agent to follow up with fs_read #309

Merged
FMXExpress merged 1 commit into main from claude/memory-snippet-window on Jun 18, 2026

@FMXExpress

Owner

Summary

Addresses the actionable finding from PR #308's LOCOMO bench results:

FTS5-bounded snippet window often clips the actual answer line (snippet R@10 ≈ 0.75). That's an actionable finding — either widen snippet windows in PasClaw.Memory.Index or train the agent to follow up with fs_read on retrieved citations more aggressively.

This PR does both: the two fixes are complementary, not alternatives.

Bench delta

Re-running the PR #308 harness against the bundled alice_synthetic persona with this branch applied:

metric         before (width=24)   after (width=60)
snippet R@10   0.75                1.0
snippet R@1    0.375               0.875
doc R@10       1.0                 1.0 (unchanged)

Doc-level recall was already optimal: the layer found the right file but did not show the right line in the snippet. Widening the window closes that gap on this synthetic persona. The remaining snippet R@1 of 0.875 (1 miss out of 8) is a BM25 ranking artifact, not a snippet-width issue, and is left for a follow-up that tunes the hybrid ranker.
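
For readers checking the arithmetic: with the 8 questions implied by the 1-in-8 miss noted above, the reported values correspond to 6/8 = 0.75, 3/8 = 0.375 and 7/8 = 0.875. The following is a minimal Python sketch of a recall@k calculation, assuming one gold answer line per question. The function name `recall_at_k` and the toy data are illustrative and are not taken from the PR #308 harness.

```python
def recall_at_k(ranked_hits, k):
    """Fraction of questions whose answer appears in the top-k hits.

    ranked_hits: one list per question, each a list of booleans in rank
    order (True means that snippet contains the gold answer line).
    """
    if not ranked_hits:
        raise ValueError("need at least one question")
    found = sum(1 for hits in ranked_hits if any(hits[:k]))
    return found / len(ranked_hits)


# Toy run shaped like the "after" column: 8 questions, 7 answered by the
# top hit, 1 answered only by the second hit.
after = [[True]] * 7 + [[False, True]]
print(recall_at_k(after, 1))   # 0.875
print(recall_at_k(after, 10))  # 1.0
```

A question that is only answered at rank 2 counts toward R@10 but not R@1, which is the shape of the residual ranking miss described above.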

Changes

1. PasClaw.Memory.Index — lift the magic number, widen to 60

Adds FTS5_SNIPPET_TOKENS = 60 as an interface-section const so PasClaw.KB.Index can reuse it (memory_search and kb_search should agree on snippet width — they share a tokenizer and they share the bench). 64 is FTS5's hard ceiling per call; 60 leaves a sliver of slack for the «...» highlight markup. Both Search() overloads (FPC sqldb and Delphi FireDAC) now interpolate IntToStr(FTS5_SNIPPET_TOKENS).

2. PasClaw.KB.Index — reuse the same const

Two snippet(kb_fts, 2, ...) queries now reference FTS5_SNIPPET_TOKENS (picked up from the existing uses PasClaw.Memory.Index). The uses comment is updated so the cross-unit dependency is grep-able.
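
The shared-constant idea from changes 1 and 2 can be sketched as follows. This is Python rather than the project's Pascal, and it is an illustration only: the helper `snippet_expr`, the table name `memory_fts`, its column index, and the exact marker strings are assumptions. Only `kb_fts`, column 2, the value 60 and the limit of 64 come from the text above.

```python
FTS5_SNIPPET_TOKENS = 60       # shared width, value taken from this PR
FTS5_MAX_SNIPPET_TOKENS = 64   # upper bound FTS5 accepts for snippet()


def snippet_expr(table, column, tokens=FTS5_SNIPPET_TOKENS):
    """Build a snippet() SQL expression for an FTS5 table.

    The marker strings and the ellipsis below are assumed, not confirmed.
    """
    if not 0 < tokens <= FTS5_MAX_SNIPPET_TOKENS:
        raise ValueError(f"snippet width must be in 1..{FTS5_MAX_SNIPPET_TOKENS}")
    return f"snippet({table}, {column}, '«', '»', '...', {tokens})"


# kb_fts and column 2 come from the PR text; memory_fts is a hypothetical name.
print(snippet_expr("kb_fts", 2))
print(snippet_expr("memory_fts", 2))
```

Because both call sites read the same constant, changing the width in one place keeps memory_search and kb_search aligned, and the range check catches a value above the FTS5 limit before it reaches SQLite.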

3. PasClaw.Agent.Prompt.BuildRulesSection — nudge for citation follow-up

Rule #5 (the memory rule) now explicitly tells the model that memory_search / kb_search return bounded snippets, and that when the cited file is right but the answer line might fall just outside the window, it should follow up with fs_read (or kb_get for the KB) on the cited path. The rule frames a snippet that shows the right file but not quite the right line as a hit, not a miss, which drives the agent to iterate on citations instead of giving up.
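
The behaviour the rule asks for can be pictured as a small control flow. The Python sketch below is hypothetical: in PasClaw the decision is made by the model following the prompt, not by code like this. The callables `search`, `read_file` and `looks_complete` stand in for memory_search, fs_read and the model's own judgement, and the file content is toy data.

```python
def recall(query, search, read_file, looks_complete):
    """Return (text, trajectory) for a recall-shaped question."""
    trajectory = ["search"]
    hits = search(query)
    if not hits:
        return None, trajectory
    top = hits[0]
    if looks_complete(top["snippet"]):
        return top["snippet"], trajectory
    # Right file, wrong line: treat the snippet as a hit and read the cited path.
    trajectory.append("read")
    return read_file(top["path"]), trajectory


# Stub tools standing in for memory_search and fs_read (toy data).
FILES = {"memory/notes.md": "Deploy checklist and history...\n\nDeploy window: Tuesday\n"}


def fake_search(query):
    return [{"path": "memory/notes.md", "snippet": "Deploy checklist and history..."}]


text, trajectory = recall(
    "deploy checklist",
    fake_search,
    FILES.__getitem__,
    lambda snippet: "window" in snippet.lower(),
)
print(trajectory)  # ['search', 'read']
```

The point of the sketch is the second branch: an incomplete snippet leads to one extra read of the cited path instead of ending the lookup.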

Why both fixes

  • Wider window lifts the common case — short answer lines that used to fall just outside a 24-token window now show up in the first hit. Cheap and zero-token cost to the model.
  • Prompt nudge handles long-context documents where even 60 tokens can't centre on the right line (e.g. a multi-paragraph note where the relevant fact is several sentences from the search term that anchored the hit). The model now knows to read the underlying file instead of treating the snippet as the final answer.
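
The effect of the window width can be reproduced in isolation with SQLite's FTS5 `snippet()` function. The sketch below uses Python's `sqlite3` module and assumes the underlying SQLite build has FTS5 enabled. The table `notes_fts`, its columns, the bracket markers and the document text are toy assumptions and do not mirror PasClaw's schema.

```python
import sqlite3


def best_snippet(conn, query, tokens):
    """Return the snippet of the top-ranked hit, limited to `tokens` tokens."""
    row = conn.execute(
        f"SELECT snippet(notes_fts, 1, '[', ']', '...', {int(tokens)}) "
        "FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank LIMIT 1",
        (query,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(path, body)")

# 45 tokens in total: the search term is the first token and the answer
# line sits about 40 tokens later.
filler = " ".join(f"filler{i}" for i in range(40))
body = f"Serialization notes. {filler} Final decision: cbor"
conn.execute("INSERT INTO notes_fts VALUES (?, ?)", ("notes/format.md", body))

narrow = best_snippet(conn, "serialization", 24)
wide = best_snippet(conn, "serialization", 60)
print("cbor" in narrow, "cbor" in wide)
```

A document longer than 60 tokens between the anchor term and the answer would still be clipped at width 60, which is the case the prompt nudge is meant to cover.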

Test plan


Generated by Claude Code

… up with fs_read

The historical snippet width of 24 tokens routinely clipped the actual
answer line out of FTS5 snippets returned by memory_search / kb_search.
LOCOMO bench numbers on the workspace memory layer:

  snippet R@10: 0.75 -> 1.0
  snippet R@1:  0.375 -> 0.875
  doc R@10:     1.0 (unchanged, was already optimal)

Two changes:

1. Lift the magic 24 into a named interface-section const
   FTS5_SNIPPET_TOKENS = 60 in PasClaw.Memory.Index, and reuse it from
   PasClaw.KB.Index so memory_search and kb_search stay aligned. 64 is
   FTS5's hard ceiling; 60 leaves slack for the highlight markers.

2. Extend rule #5 in PasClaw.Agent.Prompt.BuildRulesSection to tell the
   model that bounded snippets are an index hit, not a final answer --
   if the cited file is right but the answer line might fall just
   outside the window, follow up with fs_read (or kb_get) on the cited
   path before giving up.

The two fixes are complementary: a wider window lifts recall on the
common case, and the prompt nudge handles long-context documents where
even 60 tokens may not centre on the right line.

@FMXExpress merged commit 36c63a2 into main on Jun 18, 2026

FMXExpress pushed a commit that referenced this pull request on Jun 19, 2026
…ntical, validates PR #309

After fixing the fixture-side bug (commit 01dac9f -- changed staged
prior-session log from .ndjson to .md since PasClaw's SyncDir only
indexes Markdown), re-ran the prior-session shootout. 4 cells:
baseline + lean-edit + stock + max-build.

Result
======

  profile     turns  tools                                trajectory
  baseline      2    8  (no memory_search)              fs_write only*
  lean-edit     4    9  (has memory_search)             search -> read -> write
  stock         4   13                                  search -> read -> write
  max-build     4   17                                  search -> read -> write

  * driver artifact -- the subagent read the staged .md file with its
    own (Claude Code) Read tool and short-circuited the turn loop.
    Fair baseline would be ~5-6 turns.

Three real findings
===================

1. memory_search works on .md files when SyncDir's lazy indexing path
   runs. No `pasclaw memory provision` needed -- the first search
   call triggers the index build automatically. The earlier "memory
   search returns nothing" finding was a fixture file-extension bug,
   not a PasClaw bug.

2. PR #309 (FTS5 snippet width 24 -> 60 + Rule 5 fs_read follow-up)
   is doing its job in the wild. Even at 60-token snippets, the
   snippet on this file truncated before reaching the "Final
   decision: cbor" line -- the query terms ("serialization format
   storage") matched earlier paragraphs and 60 tokens didn't extend
   to the decision sentence. EVERY agent (lean-edit, stock,
   max-build) followed Rule 5 correctly: when the snippet shows the
   right file but not the right line, follow up with fs_read on the
   cited path. Exact behavior the rule trains for. PR #309 wasn't
   "fix the symptom"; the snippet-widening helps and the rule
   handles the residual cases.

3. With memory_search present, profile differences disappear on
   recall-shaped tasks. lean-edit, stock, and max-build all picked
   the same tools in the same order. The 2895-byte/turn max-build
   premium buys ZERO recall-task advantage over lean-edit. Honest
   memory_search savings vs no-memory_search is roughly 1-2 turns
   (fair baseline 5-6 turns vs all-memory-equipped 4 turns).
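
Finding 1 above (Markdown-only indexing, built lazily on the first search) can be illustrated with a toy model. This Python sketch is an assumption-laden stand-in, not PasClaw's SyncDir: the class `LazyMarkdownIndex`, the recursive `*.md` scan, the substring match and the file names are all hypothetical.

```python
import tempfile
from pathlib import Path


class LazyMarkdownIndex:
    """Toy index that only sees *.md files and builds itself on first use."""

    def __init__(self, root):
        self.root = Path(root)
        self._docs = None  # built on first search

    def _build(self):
        self._docs = {
            str(p.relative_to(self.root)): p.read_text(encoding="utf-8")
            for p in self.root.rglob("*.md")
        }

    def search(self, term):
        if self._docs is None:
            self._build()  # no separate provisioning step needed
        needle = term.lower()
        return sorted(path for path, text in self._docs.items() if needle in text.lower())


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "prior-session.ndjson").write_text('{"note": "decision: cbor"}', encoding="utf-8")
    Path(tmp, "prior-session.md").write_text("Final decision: cbor", encoding="utf-8")
    hits = LazyMarkdownIndex(tmp).search("cbor")
    print(hits)  # ['prior-session.md']
```

The .ndjson file contains the same term but never enters the index, which matches the fixture bug described in finding 1.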

Cumulative verdict for memory_search
====================================

PRESENT in all of: lean-edit, lean-stock, lean-build, stock,
                   low-token, max-build, all-on
ABSENT in:        baseline (vector_search_enabled=false strips it)
                  security (same)

If you're choosing between lean-edit and max-build, memory_search is
NOT a differentiator -- they both have it. If you're stripping all
the way to baseline (or security in some configs), losing
memory_search costs you 1-2 turns on recall tasks.