Bug: extractMinMessages=2 + broken autoCaptureSeenTextCount accumulation → every single-turn conversation falls into the regex fallback and pollutes the whole store with dirty data #417

### Plugin Version

1.1.0

### OpenClaw Version

2026.3.28

### Bug Description

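With smartExtraction: true and the normal setting extractMinMessages: 2, auto-capture sends almost every conversation to the regex fallback, which writes raw text. As a result, every memory in the store is dirty data (l0_abstract == text, with no LLM distillation).
The code has two accumulation paths that try to reach extractMinMessages, and both fail: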

───

Path A: autoCaptureSeenTextCount diffing (broken)

// index.ts lines 2169-2176
const previousSeenCount = autoCaptureSeenTextCount.get(sessionKey) ?? 0;
let newTexts = eligibleTexts;  // ← on each agent_end, eligibleTexts holds only the current event's messages, not the accumulated history
if (pendingIngressTexts.length > 0) {
  newTexts = pendingIngressTexts;
} else if (previousSeenCount > 0 && eligibleTexts.length > previousSeenCount) {
  newTexts = eligibleTexts.slice(previousSeenCount);  // ← never fires, because eligibleTexts.length === previousSeenCount === 1
}
autoCaptureSeenTextCount.set(sessionKey, eligibleTexts.length);  // ← overwritten with 1 every time, so the diffing never works

In the single-turn DM scenario:

• Event 1: eligibleTexts=1, previousSeenCount=0 → newTexts=1 → smart extraction is skipped (needs ≥2)
• Event 2: eligibleTexts=1, previousSeenCount=1 → 1 > 1 is false → newTexts=1 → skipped again
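The two events above can be reproduced in isolation. The following is a minimal sketch, not the plugin's actual code: it copies the diffing logic quoted from index.ts into a standalone function. The names `simulateAgentEnd` and `EXTRACT_MIN_MESSAGES`, the session key string, and the assumption that smart extraction is gated on `newTexts.length >= extractMinMessages` are all illustrative.

```typescript
// Hypothetical standalone reproduction; only the diffing branch mirrors the quoted code.
const EXTRACT_MIN_MESSAGES = 2; // assumed to correspond to extractMinMessages: 2
const autoCaptureSeenTextCount = new Map<string, number>();

function simulateAgentEnd(
  sessionKey: string,
  eligibleTexts: string[],
  pendingIngressTexts: string[] = [],
): { newTexts: string[]; smartExtraction: boolean } {
  const previousSeenCount = autoCaptureSeenTextCount.get(sessionKey) ?? 0;
  let newTexts = eligibleTexts;
  if (pendingIngressTexts.length > 0) {
    newTexts = pendingIngressTexts;
  } else if (previousSeenCount > 0 && eligibleTexts.length > previousSeenCount) {
    newTexts = eligibleTexts.slice(previousSeenCount);
  }
  // The stored count is the size of the current event, not a running total.
  autoCaptureSeenTextCount.set(sessionKey, eligibleTexts.length);
  // Assumption: smart extraction runs only when enough new texts are available.
  return { newTexts, smartExtraction: newTexts.length >= EXTRACT_MIN_MESSAGES };
}

// Two consecutive single-turn DM events in the same session.
const first = simulateAgentEnd("agent:main:dm:1", ["hello"]);
const second = simulateAgentEnd("agent:main:dm:1", ["I prefer GPT-4o"]);
console.log(first.smartExtraction, second.smartExtraction); // false false
```

Under these assumptions, the threshold is only reached when a single event already carries two or more eligible texts, which matches the behaviour described in this report.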

Supporting logs:

08:44:28 smart-extractor: extracted 3 candidates  ← accumulation worked once (cross-session or a specific pattern)
08:46:41 regex fallback found 1 capturable text(s)  ← everything after that goes through regex

───

Path B: pendingIngressTexts cross-message accumulation (fails on cold start)

// message_received hook: where texts are accumulated
const conversationKey = buildAutoCaptureConversationKeyFromIngress(channelId, conversationId);
queue.push(normalized);  // ← ingress message sent by the user

// agent_end hook: where texts are consumed
const conversationKey = buildAutoCaptureConversationKeyFromSessionKey(sessionKey);  // ← format: "agent:<agentId>:<channelId>:<conversationId>"
const pendingIngressTexts = autoCapturePendingIngressTexts.get(conversationKey) ?? [];

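Problem: when pendingIngressTexts.length > 0, the pending queue replaces the current texts, but this can only be meaningful when previousSeenCount > 0. Otherwise the pending queue only ever contains the one ingress message that just arrived.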

In addition, the pending queue is only "considered" when previousSeenCount > 0 && eligibleTexts.length > previousSeenCount. The first conversation never has a previousSeenCount, so it always uses eligibleTexts and never reaches 2.

───

Result

| Conversation pattern                | eligibleTexts | smartExtraction | regex fallback  | Result        |
| ----------------------------------- | ------------- | --------------- | --------------- | ------------- |
| Single-turn DM (1 user msg)         | 1             | ❌ skipped (<2)  | ✅ triggered     | ⚠️ dirty data |
| Multi-turn, accumulation succeeded  | ≥2            | ✅ triggered     | ❌ not triggered | ✅ normal      |
| LLM extraction failed               | ≥2            | ❌ failed        | ✅ triggered     | ⚠️ dirty data |
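To estimate how much of a store is affected by the "dirty data" rows above, the l0_abstract == text condition from this report can be checked directly. This is a hedged sketch: the `MemoryRecord` shape, its `id` field, and the `findDirtyRecords` helper are hypothetical, and only the `text` and `l0_abstract` field names come from the report.

```typescript
// Hypothetical record shape; the real schema may contain more fields.
interface MemoryRecord {
  id: string;
  text: string;
  l0_abstract: string;
}

// A record counts as dirty when its abstract is just the raw text,
// meaning no LLM distillation took place.
function findDirtyRecords(records: MemoryRecord[]): MemoryRecord[] {
  return records.filter((r) => r.l0_abstract.trim() === r.text.trim());
}

const sample: MemoryRecord[] = [
  { id: "m1", text: "I prefer GPT-4o", l0_abstract: "I prefer GPT-4o" },
  { id: "m2", text: "Long raw conversation text", l0_abstract: "Model preference: GPT-4o" },
];
console.log(findDirtyRecords(sample).map((r) => r.id)); // [ 'm1' ]
```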

───

Logs:
memory-pro: smart-extractor: extracted 3 candidate(s)   ← smart extraction succeeded
memory-pro: smart-extractor: created [cases] Memory-lanceDB-pro dirty data issue
memory-pro: smart-extractor: created [preferences] Model preference: Yunwu GPT-4o
memory-pro: smart-extracted 2 created, 0 merged, 1 skipped  ← normal
regex fallback found 1 capturable text(s)                  ← single-turn DM falls into the fallback
memory-lancedb-pro: auto-captured 1 memories for agent main in scope agent:main  ← dirty data

 

### Expected Behavior

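Change the semantics of extractMinMessages.
Redefine extractMinMessages from "the number of eligible texts per round" to "the minimum number of conversation rounds to accumulate before smart extraction triggers", and do real accumulation counting at the session level instead of relying on the per-event diffing hack.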
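One possible shape for the session-level accumulation proposed above is sketched below. It is only an illustration of the idea, not a patch against the plugin: `SessionBuffer`, `sessionBuffers`, and `accumulateRound` are hypothetical names, and each call is assumed to correspond to one agent_end event (one conversation round).

```typescript
// Hypothetical per-session buffer that survives across agent_end events.
interface SessionBuffer {
  rounds: number;
  texts: string[];
}

const sessionBuffers = new Map<string, SessionBuffer>();

/**
 * Records one conversation round for a session.
 * Returns all buffered texts (and resets the buffer) once the number of
 * accumulated rounds reaches extractMinMessages; otherwise returns null.
 */
function accumulateRound(
  sessionKey: string,
  roundTexts: string[],
  extractMinMessages: number,
): string[] | null {
  const buffer = sessionBuffers.get(sessionKey) ?? { rounds: 0, texts: [] };
  buffer.rounds += 1;
  buffer.texts.push(...roundTexts);
  if (buffer.rounds < extractMinMessages) {
    sessionBuffers.set(sessionKey, buffer);
    return null; // not enough rounds yet: keep buffering
  }
  sessionBuffers.delete(sessionKey);
  return buffer.texts; // hand the accumulated texts to smart extraction
}

const round1 = accumulateRound("agent:main:dm:1", ["hello"], 2);
const round2 = accumulateRound("agent:main:dm:1", ["I prefer GPT-4o"], 2);
console.log(round1, round2); // null [ 'hello', 'I prefer GPT-4o' ]
```

With this design a single-turn DM would wait for the next round instead of being written as raw text. What should happen to texts that never reach the threshold (for example, when a session ends after one round) is left open here.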

### Steps to Reproduce

See the Bug Description above.

### Error Logs / Screenshots

```shell

```

### Embedding Provider

None

### OS / Platform

_No response_
