-
Notifications
You must be signed in to change notification settings - Fork 695
feat(proposal-a): Phase 1 recall governance (Issue #569) #597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
12aa085
32d29c8
14100f9
6084f73
421f637
d68ee69
5bb2137
36a978f
c6668eb
718fd74
ec4d643
2a8d0e5
0b4aeff
b371f0c
d3d0b71
8990c9d
d33fe19
45247f2
84ddffa
3d9f27d
64767ee
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,6 +46,7 @@ import { | |
| import { | ||
| extractReflectionLearningGovernanceCandidates, | ||
| extractInjectableReflectionMappedMemoryItems, | ||
| isRecallUsed, | ||
| } from "./src/reflection-slices.js"; | ||
| import { createReflectionEventId } from "./src/reflection-event-store.js"; | ||
| import { buildReflectionMappedMetadata } from "./src/reflection-mapped-metadata.js"; | ||
|
|
@@ -225,6 +226,22 @@ interface PluginConfig { | |
| skipLowValue?: boolean; | ||
| maxExtractionsPerHour?: number; | ||
| }; | ||
| feedback?: { | ||
| /** Boost importance when a recalled memory is used (default: 0.05) */ | ||
| boostOnUse?: number; | ||
| /** Penalty when a recalled memory is not used after consecutive misses (default: 0.03) */ | ||
| penaltyOnMiss?: number; | ||
| /** Extra boost when user explicitly confirms a recalled memory is correct (default: 0.15) */ | ||
| boostOnConfirm?: number; | ||
| /** Extra penalty when user explicitly corrects a non-recalled memory (default: 0.10) */ | ||
| penaltyOnError?: number; | ||
| /** Minimum recall (injection) count before penalty applies to this memory (default: 2) */ | ||
| minRecallCountForPenalty?: number; | ||
| /** Keywords indicating user confirmation of a recalled memory */ | ||
| confirmKeywords?: string[]; | ||
| /** Keywords indicating user correction/error for a non-recalled memory */ | ||
| errorKeywords?: string[]; | ||
| }; | ||
| } | ||
|
|
||
| type ReflectionThinkLevel = "off" | "minimal" | "low" | "medium" | "high"; | ||
|
|
@@ -2006,6 +2023,32 @@ const memoryLanceDBProPlugin = { | |
| const autoCapturePendingIngressTexts = new Map<string, string[]>(); | ||
| const autoCaptureRecentTexts = new Map<string, string[]>(); | ||
|
|
||
| // ======================================================================== | ||
| // Proposal A Phase 1: Recall Usage Tracking Hooks | ||
| // ======================================================================== | ||
| // Track pending recalls per session for usage scoring | ||
| type PendingRecallEntry = { | ||
| recallIds: string[]; | ||
| responseText: string; | ||
| injectedAt: number; | ||
| /** Summary text lines actually injected into the prompt, used for usage detection. */ | ||
| injectedSummaries: string[]; | ||
| }; | ||
| // P0-1 fix: pendingRecall TTL-based cleanup to prevent unbounded memory growth. | ||
| // Entries older than 10 minutes are cleaned up on each set() call. | ||
| const PENDING_RECALL_MAX_AGE_MS = 10 * 60 * 1000; // 10 minutes | ||
| function cleanupPendingRecall(): void { | ||
| const now = Date.now(); | ||
| for (const [key, entry] of pendingRecall.entries()) { | ||
| if (now - entry.injectedAt > PENDING_RECALL_MAX_AGE_MS) { | ||
| pendingRecall.delete(key); | ||
| } | ||
| } | ||
| } | ||
| const pendingRecall = new Map<string, PendingRecallEntry>(); | ||
| // Clean up on module load (handles re-registration edge cases) | ||
| cleanupPendingRecall(); | ||
|
|
||
| const logReg = isCliMode() ? api.logger.debug : api.logger.info; | ||
| logReg( | ||
| `memory-lancedb-pro@${pluginVersion}: plugin registered (db: ${resolvedDbPath}, model: ${config.embedding.model || "text-embedding-3-small"}, smartExtraction: ${smartExtractor ? 'ON' : 'OFF'})` | ||
|
|
@@ -2469,7 +2512,9 @@ const memoryLanceDBProPlugin = { | |
| const nextBadRecallCount = staleInjected | ||
| ? meta.bad_recall_count + 1 | ||
| : meta.bad_recall_count; | ||
| const shouldSuppress = nextBadRecallCount >= 3 && minRepeated > 0; | ||
| // P2 fix: suppress threshold aligned with scoring path (>= 2). After 2 bad recalls, | ||
| // both the scoring penalty and suppression kick in simultaneously. | ||
| const shouldSuppress = nextBadRecallCount >= 2 && minRepeated > 0; | ||
| await store.patchMetadata( | ||
| item.id, | ||
| { | ||
|
|
@@ -2496,6 +2541,27 @@ const memoryLanceDBProPlugin = { | |
| `memory-lancedb-pro: injecting ${selected.length} memories into context for agent ${agentId}`, | ||
| ); | ||
|
|
||
| // Create or update pendingRecall for this turn so the feedback hook | ||
| // (which runs in the NEXT turn's before_prompt_build after agent_end) | ||
| // sees a matching pair: Turn N recallIds + Turn N responseText. | ||
| // agent_end will write responseText into this same pendingRecall | ||
| // entry (only updating responseText, never clearing recallIds). | ||
| // Include agentId in the key so different agents in the same session do not overwrite each other's pendingRecall. | ||
| const sessionKeyForRecall = `${ctx?.sessionKey || ctx?.sessionId || "default"}:${agentId ?? ""}`; | ||
| // Bug 1 fix: also store the injected summary lines so the feedback hook | ||
| // can detect usage even when the agent doesn't use stock phrases or IDs | ||
| // but directly incorporates the memory content into the response. | ||
| // P1 fix: store summary content only (without prefix) for accurate matching. | ||
| const injectedSummaries = selected.map((item) => item.summary); | ||
| // P0-1 fix: run TTL cleanup before each set to prevent unbounded growth | ||
| cleanupPendingRecall(); | ||
| pendingRecall.set(sessionKeyForRecall, { | ||
| recallIds: selected.map((item) => item.id), | ||
| responseText: "", // Will be populated by agent_end | ||
| injectedAt: Date.now(), | ||
| injectedSummaries, | ||
| }); | ||
|
|
||
| return { | ||
| prependContext: | ||
| `<relevant-memories>\n` + | ||
|
|
@@ -2539,6 +2605,15 @@ const memoryLanceDBProPlugin = { | |
| recallHistory.delete(sessionId); | ||
| turnCounter.delete(sessionId); | ||
| lastRawUserMessage.delete(sessionId); | ||
| // P3 fix: clean all pendingRecall entries for this session. | ||
| // pendingRecall keys use format: sessionKey (or sessionKey:agentId with composite key). | ||
| // P1 fix: use sessionId only when sessionKey is absent to avoid clearing unrelated sessions. | ||
| const sessionKeyToClean = ctx?.sessionKey ?? sessionId; | ||
| for (const key of pendingRecall.keys()) { | ||
| if (key === sessionId || key.startsWith(`${sessionId}:`) || (sessionKeyToClean && key.startsWith(`${sessionKeyToClean}:`))) { | ||
| pendingRecall.delete(key); | ||
| } | ||
| } | ||
| } | ||
| // Also clean by channelId/conversationId if present (shared cache key) | ||
| const cacheKey = ctx?.channelId || ctx?.conversationId || ""; | ||
|
|
@@ -2891,8 +2966,205 @@ const memoryLanceDBProPlugin = { | |
| }; | ||
|
|
||
| api.on("agent_end", agentEndAutoCaptureHook); | ||
|
|
||
| // ======================================================================== | ||
| // Proposal A Phase 1: agent_end hook - Store response text for usage tracking | ||
| // ======================================================================== | ||
| // NOTE: Only writes responseText to an EXISTING pendingRecall entry created | ||
| // by before_prompt_build (auto-recall). Does NOT create a new entry. | ||
| // This ensures recallIds (written by auto-recall in the same turn) and | ||
| // responseText (written here) remain paired for the feedback hook. | ||
| api.on("agent_end", (event: any, ctx: any) => { | ||
| // Use same key format as auto-recall hook (sessionKey:agentId) so we update the right entry. | ||
| const agentIdForKey = resolveHookAgentId(ctx?.agentId, (event as any).sessionKey); | ||
| const sessionKey = `${ctx?.sessionKey || ctx?.sessionId || "default"}:${agentIdForKey ?? ""}`; | ||
| if (!sessionKey) return; | ||
|
|
||
| // Get the last message content | ||
| let lastMsgText: string | null = null; | ||
| if (event.messages && Array.isArray(event.messages)) { | ||
| const lastMsg = event.messages[event.messages.length - 1]; | ||
| if (lastMsg && typeof lastMsg === "object") { | ||
| const msgObj = lastMsg as Record<string, unknown>; | ||
| lastMsgText = extractTextContent(msgObj.content); | ||
| } | ||
| } | ||
|
|
||
| // Only update an existing pendingRecall entry — do NOT create one. | ||
| // This preserves recallIds written by auto-recall earlier in this turn. | ||
| const existing = pendingRecall.get(sessionKey); | ||
| if (existing && lastMsgText && lastMsgText.trim().length > 0) { | ||
| existing.responseText = lastMsgText; | ||
| } | ||
| }, { priority: 20 }); | ||
|
|
||
| // ======================================================================== | ||
| // Proposal A Phase 1: before_prompt_build hook (priority 5) - Score recalls | ||
| // ======================================================================== | ||
| api.on("before_prompt_build", async (event: any, ctx: any) => { | ||
| // Use same key format as auto-recall hook (sessionKey:agentId) so we read the right entry. | ||
| const agentIdForKey = resolveHookAgentId(ctx?.agentId, (event as any).sessionKey); | ||
| const sessionKey = `${ctx?.sessionKey || ctx?.sessionId || "default"}:${agentIdForKey ?? ""}`; | ||
| // P2 fix: also cleanup on read path to handle idle sessions that never trigger set() | ||
| cleanupPendingRecall(); | ||
| const pending = pendingRecall.get(sessionKey); | ||
| if (!pending) return; | ||
|
|
||
| // Guard: only score if responseText has substantial content | ||
| const responseText = pending.responseText; | ||
| if (!responseText || responseText.length <= 24) { | ||
| // Skip scoring for empty or very short responses | ||
| // Bug 5 fix: also clear pendingRecall so the next turn does not | ||
| // re-trigger feedback on stale recallIds / old responseText. | ||
| pendingRecall.delete(sessionKey); | ||
| return; | ||
| } | ||
|
|
||
| // Read recall IDs directly from pendingRecall (populated by auto-recall's | ||
| // before_prompt_build hook from the PREVIOUS turn). This replaces the | ||
| // broken regex-based parsing of prependContext which never matched the | ||
| // actual [category:scope] format used by auto-recall injection. | ||
| const injectedIds = pending.recallIds ?? []; | ||
|
|
||
| // Bug 1 fix: also retrieve the injected summary lines so isRecallUsed can | ||
| // detect when the agent directly incorporates memory content into the response. | ||
| const injectedSummaries = pending.injectedSummaries ?? []; | ||
|
|
||
| // Check if any recall was actually used by checking if the response contains reference to the injected content | ||
| // This is a heuristic - we check if the response shows awareness of injected memories | ||
| let usedRecall = false; | ||
| if (injectedIds.length > 0 || injectedSummaries.length > 0) { | ||
| // Use the real isRecallUsed function from reflection-slices | ||
| usedRecall = isRecallUsed(responseText, injectedIds, injectedSummaries); | ||
| } | ||
|
|
||
| // Read feedback config values with defaults | ||
| // P2 fix: coerce to Number and use ?? to preserve explicit zero values. | ||
| // P2 fix: use nullish coalescing to allow 0 as a valid config value. | ||
| const fb = config.feedback ?? {}; | ||
| const boostOnUse = Number(fb.boostOnUse ?? 0) ?? 0.05; | ||
| const penaltyOnMiss = Number(fb.penaltyOnMiss ?? 0) ?? 0.03; | ||
| const boostOnConfirm = Number(fb.boostOnConfirm ?? 0) ?? 0.15; | ||
| const penaltyOnError = Number(fb.penaltyOnError ?? 0) ?? 0.10; | ||
| const minRecallCountForPenalty = fb.minRecallCountForPenalty ?? 2; | ||
| const confirmKeywords = fb.confirmKeywords ?? ["correct", "right", "yes", "confirmed", "exactly", "對", "沒錯", "正確", "確認", "好的"]; | ||
| const errorKeywords = fb.errorKeywords ?? ["wrong", "incorrect", "not right", "that's wrong", "error", "mistake", "fix it", "change that", "改成", "改為", "不是這樣", "不對", "錯了"]; | ||
|
|
||
| // event.prompt is a plain string in the current hook contract (confirmed by codebase usage). | ||
| // We extract the user's last message from event.messages array instead. | ||
| let userPromptText = ""; | ||
| try { | ||
| if (event.messages && Array.isArray(event.messages)) { | ||
| for (let i = event.messages.length - 1; i >= 0; i--) { | ||
| const msg = event.messages[i]; | ||
| if (msg && msg.role === "user" && typeof msg.content === "string" && msg.content.trim().length > 0) { | ||
| userPromptText = msg.content.trim(); | ||
| break; | ||
| } | ||
| if (msg && msg.role === "user" && Array.isArray(msg.content)) { | ||
| // Handle array-form content | ||
| const text = extractTextContent(msg.content); | ||
| if (text && text.trim().length > 0) { | ||
| userPromptText = text.trim(); | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } catch (_e) { | ||
| userPromptText = ""; | ||
| } | ||
|
|
||
| // Helper: check if text contains any of the keywords (case-insensitive) | ||
| const containsKeyword = (text: string, keywords: string[]): boolean => | ||
| keywords.some((kw) => text.toLowerCase().includes(kw.toLowerCase())); | ||
|
|
||
| // Score the recall - update importance based on usage | ||
| // Score each recall individually — do NOT compute a single usedRecall for the whole batch. | ||
| // Bug 1 fix (P1): when auto-recall injects multiple memories, the agent may use only some of them. | ||
| // Scoring them all with one decision corrupts ranking: unused memories get boosted, used ones get penalized. | ||
| if (injectedIds.length > 0) { | ||
| try { | ||
| // Build lookup: recallId -> injected summary text for this specific recall | ||
| const summaryMap = new Map<string, string>(); | ||
| for (let i = 0; i < injectedIds.length; i++) { | ||
| if (injectedSummaries[i]) { | ||
| summaryMap.set(injectedIds[i], injectedSummaries[i]); | ||
| } | ||
| } | ||
|
|
||
| for (const recallId of injectedIds) { | ||
| const summaryText = summaryMap.get(recallId) ?? ""; | ||
| // Score this specific recall independently | ||
| const usedRecall = isRecallUsed( | ||
| responseText, | ||
| [recallId], | ||
| summaryText ? [summaryText] : [], | ||
| ); | ||
|
|
||
| const entry = await store.getById(recallId, undefined); | ||
| if (!entry) continue; | ||
| const meta = parseSmartMetadata(entry.metadata, entry); | ||
| const currentImportance = meta.importance ?? entry.importance ?? 0.5; | ||
|
|
||
| // P1 fix (Codex): check errorKeywords BEFORE usedRecall. | ||
| // If user explicitly corrects, that overrides the heuristic usage detection. | ||
| // Also set last_confirmed_use_at here to prevent injection path's staleInjected | ||
| // from double-counting in the same turn. | ||
| const hasError = containsKeyword(userPromptText, errorKeywords); | ||
| if (hasError) { | ||
| if ((meta.injected_count || 0) >= minRecallCountForPenalty) { | ||
| await store.update(recallId, { importance: Math.max(0.1, currentImportance - penaltyOnError) }, undefined); | ||
| } | ||
| await store.patchMetadata(recallId, { bad_recall_count: (meta.bad_recall_count || 0) + 1, last_confirmed_use_at: Date.now() }, undefined); | ||
| } else if (usedRecall) { | ||
| // Pure positive use: boost importance | ||
| let newImportance = Math.min(1.0, currentImportance + boostOnUse); | ||
| if (containsKeyword(userPromptText, confirmKeywords)) { | ||
| newImportance = Math.min(1.0, newImportance + boostOnConfirm); | ||
| } | ||
| await store.update(recallId, { importance: newImportance }, undefined); | ||
| await store.patchMetadata(recallId, { last_confirmed_use_at: Date.now(), bad_recall_count: 0 }, undefined); | ||
| } else { | ||
| // P1 fix: align scoring penalty threshold with injection increment. | ||
| // Silent miss: apply penalty if badCount >= 1 (injection path handles increment). | ||
| const badCount = meta.bad_recall_count || 0; | ||
| if (badCount >= 1 && (meta.injected_count || 0) >= minRecallCountForPenalty) { | ||
| await store.update(recallId, { importance: Math.max(0.1, currentImportance - penaltyOnMiss) }, undefined); | ||
| } | ||
| // No increment here - injection path already increments via staleInjected. | ||
| } | ||
| } | ||
| } catch (err) { | ||
| api.logger.warn(`memory-lancedb-pro: recall usage scoring failed: ${String(err)}`); | ||
| } finally { | ||
| pendingRecall.delete(sessionKey); | ||
| } | ||
| } | ||
| }, { priority: 5 }); | ||
| } | ||
|
|
||
| // ======================================================================== | ||
| // Proposal A Phase 1: session_end hook - Clean up pending recalls | ||
| // ======================================================================== | ||
| api.on("session_end", (_event: any, ctx: any) => { | ||
| // P1 fix: clean all pendingRecall entries for this session, including composite keys. | ||
| // When autoCapture is false, the auto-capture session_end (priority 10) is skipped, | ||
| // so this hook must handle composite keys (sessionKey:agentId) as well. | ||
| const sessionId = ctx?.sessionId || ""; | ||
| const sessionKey = ctx?.sessionKey || ""; | ||
| for (const key of pendingRecall.keys()) { | ||
| if ( | ||
| key === sessionKey || | ||
| key === sessionId || | ||
| key.startsWith(`${sessionKey}:`) || | ||
| key.startsWith(`${sessionId}:`) | ||
| ) { | ||
| pendingRecall.delete(key); | ||
| } | ||
| } | ||
| }, { priority: 20 }); | ||
|
|
||
| // ======================================================================== | ||
| // Integrated Self-Improvement (inheritance + derived) | ||
| // ======================================================================== | ||
|
|
@@ -4045,6 +4317,12 @@ export function parsePluginConfig(value: unknown): PluginConfig { | |
| : 30, | ||
| } | ||
| : { skipLowValue: false, maxExtractionsPerHour: 30 }, | ||
| // Bug 3 fix: parse and return the feedback config block so deployments | ||
| // that specify custom feedback values actually take effect instead of | ||
| // falling back to hardcoded defaults. | ||
| feedback: typeof cfg.feedback === "object" && cfg.feedback !== null | ||
| ? { ...(cfg.feedback as Record<string, unknown>) } | ||
|
Comment on lines
+4323
to
+4324
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This shallow copy keeps raw feedback values without type coercion, but the scoring path later performs numeric arithmetic with these fields. If a deployment provides values as strings (common with env-driven config), Useful? React with 👍 / 👎.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both issues fixed in commit 3d9f27d: P1 - autoCapture closing brace: The P2 - feedback config type coercion: Added |
||
| : {}, | ||
| }; | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
if (config.autoCapture !== false)block that starts earlier is no longer closed here, so it now stays open until the end ofregister(around line 4033). That makes unrelated hook registration (self-improvement, reflection, lifecycle/backup setup, etc.) conditional onautoCapture, so deployments withautoCapture: falsewill silently lose those features.Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both issues fixed in commit 3d9f27d:
P1 - autoCapture closing brace: The
if (config.autoCapture !== false)block (agent_end auto-capture, line 2625) was missing its closing brace. Added}after the Phase 1 before_prompt_build hook (priority 5) to properly close the block. Self-improvement, reflection, and lifecycle hooks now run independently of autoCapture.P2 - feedback config type coercion: Added
Number()coercion for all numeric feedback config values to prevent string concatenation.