Plugin Version
1.1.0
OpenClaw Version
2026.3.28
Bug Description
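With smartExtraction: true and an ordinary setting of extractMinMessages: 2, auto-capture sends almost every conversation to the regex fallback, which writes raw text. As a result, every memory in the store is dirty data (l0_abstract == text, with no LLM distillation).
The code has two accumulation paths that try to reach extractMinMessages, and both fail: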
───
Path A: autoCaptureSeenTextCount diffing (broken)
// index.ts lines 2169-2176
const previousSeenCount = autoCaptureSeenTextCount.get(sessionKey) ?? 0;
let newTexts = eligibleTexts; // ← on each agent_end, eligibleTexts holds only the current event's messages, not the accumulated history
if (pendingIngressTexts.length > 0) {
  newTexts = pendingIngressTexts;
} else if (previousSeenCount > 0 && eligibleTexts.length > previousSeenCount) {
  newTexts = eligibleTexts.slice(previousSeenCount); // ← never runs, because eligibleTexts.length === previousSeenCount === 1
}
autoCaptureSeenTextCount.set(sessionKey, eligibleTexts.length); // ← overwritten with 1 every time, so the diffing never takes effect
In the single-turn DM scenario:
• Event 1: eligibleTexts=1, previousSeenCount=0 → newTexts=1 → smart extraction skipped (needs ≥2)
• Event 2: eligibleTexts=1, previousSeenCount=1 → 1 > 1 is false → newTexts=1 → skipped again
Supporting logs:
08:44:28 smart-extractor: extracted 3 candidates ← accumulation worked once (across sessions, or under some specific pattern)
08:46:41 regex fallback found 1 capturable text(s) ← everything afterwards goes through regex
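The Path A behaviour can be reproduced outside the plugin with a small sketch. The selection logic is copied from the snippet above; the wrapper function `selectNewTexts`, the `seenCount` map, and the sample messages are hypothetical stand-ins, and the sketch assumes each agent_end event delivers exactly one eligible text, as in the single-turn DM case.

```typescript
// Hypothetical stand-in for autoCaptureSeenTextCount in index.ts.
const seenCount = new Map<string, number>();

// Same branch structure as the snippet from index.ts, wrapped in a function
// so it can be called once per simulated agent_end event.
function selectNewTexts(
  sessionKey: string,
  eligibleTexts: string[],
  pendingIngressTexts: string[] = [],
): string[] {
  const previousSeenCount = seenCount.get(sessionKey) ?? 0;
  let newTexts = eligibleTexts;
  if (pendingIngressTexts.length > 0) {
    newTexts = pendingIngressTexts;
  } else if (previousSeenCount > 0 && eligibleTexts.length > previousSeenCount) {
    newTexts = eligibleTexts.slice(previousSeenCount);
  }
  // The counter is overwritten with the per-event length, not a running total.
  seenCount.set(sessionKey, eligibleTexts.length);
  return newTexts;
}

const extractMinMessages = 2;
const firstEvent = selectNewTexts("s1", ["hello"]);
const secondEvent = selectNewTexts("s1", ["I prefer dark mode"]);

// Neither event reaches the threshold, so smart extraction is skipped twice.
console.log(
  firstEvent.length >= extractMinMessages,
  secondEvent.length >= extractMinMessages,
); // prints: false false
```

Because the stored count is replaced rather than incremented, the number of events in a session has no effect on whether the threshold is met.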
───
Path B: pendingIngressTexts cross-message accumulation (fails on cold start)
// message_received hook: accumulation entry point
const conversationKey = buildAutoCaptureConversationKeyFromIngress(channelId, conversationId);
queue.push(normalized); // ← ingress message sent by the user
// agent_end hook: consumption exit point
const conversationKey = buildAutoCaptureConversationKeyFromSessionKey(sessionKey); // ← format: "agent:::"
const pendingIngressTexts = autoCapturePendingIngressTexts.get(conversationKey) ?? [];
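Problem: when pendingIngressTexts.length > 0, the pending queue replaces the current texts, but that substitution can only be meaningful when previousSeenCount > 0 (otherwise the queue only ever holds the single ingress message that just arrived).
Moreover, the pending queue is only "considered" when previousSeenCount > 0 && eligibleTexts.length > previousSeenCount. The first conversation never has a previousSeenCount, so it always uses eligibleTexts and can never reach 2.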
───
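Result
| Conversation pattern | eligibleTexts | smartExtraction | regex fallback | Result |
| --- | --- | --- | --- | --- |
| Single-turn DM (1 user msg) | 1 | ❌ Skipped (<2) | ✅ Triggered | ⚠️ Dirty data |
| Multi-turn, history accumulated successfully | ≥2 | ✅ Triggered | ❌ Not triggered | ✅ Normal |
| LLM extraction fails | ≥2 | ❌ Failed | ✅ Triggered | ⚠️ Dirty data |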
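The three outcomes above can be written as one decision function. This is only a model of the observed behaviour, not the plugin's code: `captureOutcome`, its parameters, and the `CaptureOutcome` type are hypothetical names introduced for illustration.

```typescript
// Hypothetical model of the capture decision described in the table.
type CapturePath = "smart-extraction" | "regex-fallback";

interface CaptureOutcome {
  path: CapturePath;
  dirty: boolean; // true when raw text is stored without LLM distillation
}

function captureOutcome(
  textCount: number,
  extractMinMessages: number,
  llmSucceeds: boolean,
): CaptureOutcome {
  // Below the threshold, smart extraction is skipped outright.
  if (textCount < extractMinMessages) {
    return { path: "regex-fallback", dirty: true };
  }
  // At or above the threshold, a failed LLM call also ends in the fallback.
  if (!llmSucceeds) {
    return { path: "regex-fallback", dirty: true };
  }
  return { path: "smart-extraction", dirty: false };
}

console.log(captureOutcome(1, 2, true).path); // prints: regex-fallback
```

Two of the three rows end in the fallback, and only the first is caused by the threshold check.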
───
Logs:
memory-pro: smart-extractor: extracted 3 candidate(s) ← smart extraction succeeded
memory-pro: smart-extractor: created [cases] Memory-lanceDB-pro dirty data issue
memory-pro: smart-extractor: created [preferences] Model preference: Yunwu GPT-4o
memory-pro: smart-extracted 2 created, 0 merged, 1 skipped ← normal
regex fallback found 1 capturable text(s) ← single-turn DM falls into the fallback
memory-lancedb-pro: auto-captured 1 memories for agent main in scope agent:main ← dirty data
Expected Behavior
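Change the semantics of extractMinMessages
Change extractMinMessages from "number of eligible texts per turn" to "minimum number of conversation rounds that must accumulate before smart extraction triggers", and do real cumulative counting at the session level instead of relying on the per-event diffing hack.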
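One possible shape for that session-level counting is sketched below. Everything here is hypothetical: `RoundAccumulator`, `addRound`, and `SessionState` do not exist in the plugin, and the sketch assumes that texts from rounds below the threshold are buffered rather than written through the regex fallback, which the proposal itself leaves open.

```typescript
// Hypothetical per-session state: rounds seen so far plus their buffered texts.
interface SessionState {
  rounds: number;
  texts: string[];
}

class RoundAccumulator {
  private readonly sessions = new Map<string, SessionState>();
  private readonly minRounds: number;

  constructor(minRounds: number) {
    this.minRounds = minRounds;
  }

  // Record one completed round. Returns every buffered text once minRounds
  // rounds have accumulated (and resets the session); otherwise returns null.
  addRound(sessionKey: string, texts: string[]): string[] | null {
    const state = this.sessions.get(sessionKey) ?? { rounds: 0, texts: [] };
    state.rounds += 1;
    state.texts.push(...texts);
    if (state.rounds < this.minRounds) {
      this.sessions.set(sessionKey, state);
      return null;
    }
    this.sessions.delete(sessionKey);
    return state.texts;
  }
}

// With a threshold of 2, two single-message rounds are enough to trigger.
const accumulator = new RoundAccumulator(2);
const afterRound1 = accumulator.addRound("s1", ["hello"]);
const afterRound2 = accumulator.addRound("s1", ["I prefer dark mode"]);
console.log(afterRound1 === null, afterRound2?.length); // prints: true 2
```

Counting rounds instead of comparing per-event lengths means a single-turn DM session reaches the threshold on its second message, which the current diffing never allows.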
Steps to Reproduce
See above.
Error Logs / Screenshots
Embedding Provider
None
OS / Platform
No response