memory: dedup facts within a session by normalized message text#137

Open
truffle-dev wants to merge 1 commit into
ghostwright:mainfrom
truffle-dev:fix/memory-within-session-dedup

@truffle-dev
Contributor

Closes #84 (partial — within-session floor, not cross-session).

A single user message that matched both correction and preference
patterns produced two facts; identical messages repeated within a
session produced N facts. Track normalized text (message.toLowerCase().trim())
in a per-call Set and skip both pattern checks if the key is already
seen. First-match wins on pattern order, so correction takes precedence
over preference for the same message.
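
A minimal sketch of the loop described above, assuming a simplified extractor. The `Fact` shape, the `PATTERNS` table, its regexes, and the `extractFacts` name are hypothetical stand-ins; only the normalization (`message.toLowerCase().trim()`), the per-call `Set`, and the correction-before-preference ordering come from this description.

```typescript
// Hypothetical shapes: the real extractor's types and patterns are not shown in this PR description.
type FactKind = "correction" | "preference";

interface Fact {
  kind: FactKind;
  text: string;
}

// Illustrative patterns only. Order matters: the first matching entry wins,
// so correction takes precedence over preference.
const PATTERNS: ReadonlyArray<{ kind: FactKind; regex: RegExp }> = [
  { kind: "correction", regex: /\b(?:no|actually|that's wrong)\b/i },
  { kind: "preference", regex: /\b(?:i prefer|i like|always use)\b/i },
];

function extractFacts(userMessages: readonly string[]): Fact[] {
  // Created per call, so the dedup scope is exactly one pass over one session.
  const seen = new Set<string>();
  const facts: Fact[] = [];

  for (const message of userMessages) {
    const key = message.toLowerCase().trim();
    if (seen.has(key)) continue; // repeated utterance: skip both pattern checks
    seen.add(key);

    // find() stops at the first match, so one message yields at most one fact.
    const match = PATTERNS.find((p) => p.regex.test(message));
    if (match) facts.push({ kind: match.kind, text: message.trim() });
  }
  return facts;
}
```

With these stand-in patterns, a message such as "No, actually I prefer tabs" matches both entries but yields only a correction fact, and a second copy that differs only in case or surrounding whitespace is skipped before any pattern runs.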

This is the floor I proposed in #84 (comment): direction 2 (within-session dedup), narrowed to one PR. The cross-session accumulation visible in the issue (the same Slack fragment appearing 4-5 times in the rendered "Known Facts" section) sits downstream of this change and will need a SemanticStore.findExactDuplicate check on storeFact, which I'd take on in a follow-up if the direction is right.
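
For illustration only, here is one way the follow-up check could look. This PR does not add `findExactDuplicate`; the in-memory class, the `StoredFact` shape, both method signatures, and the reuse of lowercase + trim normalization are all assumptions, not the project's actual `SemanticStore` API.

```typescript
// Hypothetical in-memory stand-in: the real SemanticStore and its backend are not shown here.
interface StoredFact {
  id: number;
  text: string;
}

// Assumed to mirror the within-session key: lowercase, then trim.
function normalize(text: string): string {
  return text.toLowerCase().trim();
}

class InMemorySemanticStore {
  private readonly facts: StoredFact[] = [];
  private nextId = 1;

  // Assumed signature: returns the stored fact whose normalized text
  // equals the candidate's normalized text, or undefined if none exists.
  findExactDuplicate(text: string): StoredFact | undefined {
    const key = normalize(text);
    return this.facts.find((f) => normalize(f.text) === key);
  }

  // Returns the existing fact instead of inserting a second copy.
  storeFact(text: string): StoredFact {
    const existing = this.findExactDuplicate(text);
    if (existing) return existing;

    const fact: StoredFact = { id: this.nextId++, text };
    this.facts.push(fact);
    return fact;
  }

  count(): number {
    return this.facts.length;
  }
}
```

Because the store outlives any single extraction call, a check like this would cover repeats across sessions, which the per-call `Set` cannot see.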

Four new tests:

  • duplicate identical user messages produce a single fact
  • a message matching both correction and preference patterns produces a single fact
  • dedup is case- and whitespace-insensitive within a session

The existing 8 tests still pass.

12 pass / 0 fail / 32 expect() calls. bunx tsc --noEmit clean, bunx biome check clean on the two touched files.

Single message matching both correction and preference patterns
produced two facts; identical messages repeated within a session
produced N facts. Track normalized text (lowercase + trim) in a
per-call Set and skip both pattern checks if the key is already
seen. First-match wins on the pattern order.

The Slack-fragment accumulation in ghostwright#84 is partly cross-session,
which this doesn't address, but bounding the per-session contribution
to one fact per unique utterance is the floor before any
cross-session check earns its complexity.

Signed-off-by: truffle <truffleagent@gmail.com>

Development

Successfully merging this pull request may close these issues.

memory: heuristic fact extractor promotes raw Slack fragments to Known Facts
