
fix: isOwnedByAgent derived ownership (#448) #522

Open
jlin53882 wants to merge 2 commits into CortexReach:master from jlin53882:fix/issue-448-v2

Conversation

@jlin53882
Contributor

@jlin53882 jlin53882 commented Apr 4, 2026

Summary

Fixes `isOwnedByAgent` in `src/reflection-store.ts` so that the main agent's `derived` items are no longer inherited by other agents via the `owner === 'main'` fallback, preventing context bleed between agents.

Also fixes a P1 bug where the `_initialized` flag was set before `register()` completed: if initialization threw, the plugin would become permanently broken until process restart.

Changes

| File | Change |
| --- | --- |
| `src/reflection-store.ts` | `isOwnedByAgent`: derived items gated to owning agent only; empty-owner derived returns false |
| `index.ts` | `_initialized` flag moved to end of successful `register()` |

Testing

  • Unit tests for isOwnedByAgent: passed
  • No new test failures introduced

Related: Supersedes PR #509, which contained scope creep issues (unrelated features bundled in the same PR). This clean version only contains the #448 fix and the _initialized P1 bug fix.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@jlin53882
Contributor Author

Review Claw 🦞: PR description

This PR is a clean version of #509 with all scope-creep content removed.

Background (Issue #448)

`isOwnedByAgent()` in `src/reflection-store.ts` hard-codes `owner === 'main'` as a fallback, so every sub-agent incorrectly inherits the main agent's `derived` reflection lines, which causes context bleed.

Fix details

1. `src/reflection-store.ts`: core fix to isOwnedByAgent()

```diff
 function isOwnedByAgent(metadata, agentId) {
   const owner = ...
+  const itemKind = metadata.itemKind;
+  if (itemKind === 'derived') {
+    if (!owner) return false; // derived items with an empty owner are never visible
+    return owner === agentId; // derived items are visible only to their owner
+  }
+  if (!owner) return true; // invariant/legacy/mapped keep the main fallback
   return owner === agentId || owner === 'main';
 }
```

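Behavior comparison:

| Kind | owner | Before fix | After fix |
| --- | --- | --- | --- |
| derived | 'main' | Visible to any agent ❌ | Visible only when agentId='main' ✅ |
| derived | 'agent-x' | Visible to any agent ❌ | Visible only to agent-x ✅ |
| derived | '' | Visible to any agent ❌ | Never visible ✅ |
| invariant/legacy/mapped | any | Main fallback kept ✅ | Main fallback kept ✅ |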
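The ownership rule in the comparison above can be sketched as a small standalone function. This is only an illustration under stated assumptions: the `ReflectionMetadata` shape and the way `owner` is read are hypothetical (the diff elides that line), and the real code in `src/reflection-store.ts` may differ.

```typescript
// Hypothetical metadata shape; the real type in src/reflection-store.ts may differ.
interface ReflectionMetadata {
  owner?: string;    // agent id that produced the item; empty or missing = no owner
  itemKind?: string; // "derived", "invariant", or absent for legacy/mapped rows
}

function isOwnedByAgent(metadata: ReflectionMetadata, agentId: string): boolean {
  // Assumption: owner is a plain string field (the diff elides how it is read).
  const owner = (metadata.owner ?? "").trim();

  if (metadata.itemKind === "derived") {
    // Derived items never use the main fallback: an empty owner is invisible,
    // and any other owner must match the requesting agent exactly.
    if (!owner) return false;
    return owner === agentId;
  }

  // Invariant/legacy/mapped items keep the original main fallback.
  if (!owner) return true;
  return owner === agentId || owner === "main";
}

// "After fix" column of the comparison, evaluated for a few requesting agents.
const visibility = {
  mainDerivedForSubAgent: isOwnedByAgent({ owner: "main", itemKind: "derived" }, "agent-x"),   // false
  mainDerivedForMain: isOwnedByAgent({ owner: "main", itemKind: "derived" }, "main"),          // true
  agentXDerivedForAgentX: isOwnedByAgent({ owner: "agent-x", itemKind: "derived" }, "agent-x"), // true
  agentXDerivedForAgentY: isOwnedByAgent({ owner: "agent-x", itemKind: "derived" }, "agent-y"), // false
  ownerlessDerived: isOwnedByAgent({ owner: "", itemKind: "derived" }, "main"),                // false
  mainInvariantForSubAgent: isOwnedByAgent({ owner: "main", itemKind: "invariant" }, "agent-x"), // true
};
```

The only behavioral change is the early `derived` branch; every other kind still reaches the original two return statements.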

2. `index.ts`: _initialized P1 bug fix

```diff
- _initialized = true; // before parsePluginConfig() (wrong)
+ _initialized = true; // after register() completes successfully (correct)
```

Reason: if `parsePluginConfig()` throws, the flag is already true, so every later `register()` call returns immediately at the guard and the plugin has no way to recover on its own.
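A minimal sketch of the flag-ordering fix, assuming a simplified `register()` that takes a raw config object. The `PluginConfig` shape and the body of `parsePluginConfig()` here are hypothetical stand-ins, not the plugin's real code.

```typescript
// Hypothetical config shape, for illustration only.
interface PluginConfig {
  dbPath: string;
}

let _initialized = false;
let activeConfig: PluginConfig | null = null;

// Stand-in for the real parsePluginConfig(): throws on invalid input.
function parsePluginConfig(raw: { dbPath?: string }): PluginConfig {
  if (!raw.dbPath) throw new Error("dbPath is required");
  return { dbPath: raw.dbPath };
}

// Returns true when this call performed the registration, false when skipped.
function register(rawConfig: { dbPath?: string }): boolean {
  if (_initialized) return false;              // guard: already registered
  activeConfig = parsePluginConfig(rawConfig); // may throw; the flag is still false
  // ... remaining setup would run here ...
  _initialized = true;                         // reached only if every step succeeded
  return true;
}

// A failed first attempt no longer poisons later attempts.
let firstAttemptFailed = false;
try {
  register({});
} catch {
  firstAttemptFailed = true;
}
const retrySucceeded = register({ dbPath: "/tmp/memory" });   // true
const laterCallSkipped = !register({ dbPath: "/tmp/memory" }); // true
```

Had the flag been set on the first line of `register()`, the retry would have returned at the guard without ever parsing the valid config.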

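Test verification

  • Unit tests: 23/23 passed
  • No new test failures

Out of scope for this PR

The following items were originally in #509 and have all been removed; each will get its own PR later:

  • import-markdown CLI
  • autoRecallExcludeAgents
  • rerankTimeoutMs
  • README rewrite
  • recallMode parsing

Supersedes PR #509 (closed)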

@AliceLJY
Collaborator

AliceLJY commented Apr 5, 2026

Hi @jlin53882, the cli-smoke check is failing. Please fix CI before review.

Collaborator

@rwmjhb rwmjhb left a comment

Review: fix: isOwnedByAgent derived ownership (#448)

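The leak of the main agent's derived items to other agents in multi-agent setups is a real bug. However, the implementation has several problems:

Must Fix

  1. Idempotency guard timing is wrong: _initialized is set before onStart completes, so if initialization throws, subsequent register() calls are blocked permanently.

  2. WeakSet → boolean regression risk: the WeakSet was added to fix a regression where a second register() call with a new API instance was silently skipped. Switching to a module-level boolean loses per-instance awareness and may reintroduce that bug.

  3. Missing tests: the itemKind=derived branch of isOwnedByAgent has no test coverage.

Questions

  • Can register() be called multiple times with different API instances during the plugin lifecycle? If so, a boolean guard is not enough.
  • Is the EADDRINUSE crash (port 11434) an environment issue or something the tests introduced?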

@jlin53882
Contributor Author

Update: WeakSet.clear() Issue

The WeakSet.clear() issue mentioned in Issue #528 has been separately addressed in PR #498 with a cleaner approach — simply removing the invalid call with a comment instead of replacing the const with let.

PR #498:
#498

No additional changes needed in this PR for the WeakSet.clear issue.

@jlin53882 jlin53882 force-pushed the fix/issue-448-v2 branch 4 times, most recently from a589c0f to fcf23f5 Compare April 5, 2026 15:09
@jlin53882
Contributor Author

Response to Review

Thank you for the detailed review.

Must Fix 1 & 2: _initialized timing + WeakSet

We agree both issues are real. In this update:

  • WeakSet is already restored from upstream (upstream fix: remove invalid WeakSet.clear() call from resetRegistration() #498 fix). The PR now uses WeakSet<OpenClawPluginApi> for per-instance tracking (_registeredApis.has(api) guard).
  • _initialized = true is now set only at the very end of successful register() initialization (after all setup including api.registerService), wrapped in try/catch — so if init throws, _initialized stays false and a future instance can retry.
try {
    // ... all initialization ...
    // All initialization completed successfully: mark success.
    _initialized = true;
} catch (err) {
    // init failed: _initialized stays false, next instance can retry
    throw err;
}

Must Fix 3: Missing test coverage

Added test/isOwnedByAgent.test.mjs with 11 test cases covering:

  • derived: main→sub-agent invisible (core fix), agent-x→agent-x visible, agent-x→agent-y invisible, empty owner → completely invisible
  • invariant: main fallback preserved
  • legacy/mapped: main fallback preserved

Question: register() with different API instances

Yes — WeakSet is the correct mechanism here. Each distinct OpenClawPluginApi instance is tracked separately in the WeakSet, so a second register(newApi) call with a different API instance will not be blocked. This is the design from upstream PR #365.
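To make the per-instance point concrete, here is a small sketch contrasting a module-level boolean guard with a WeakSet guard. The one-field `OpenClawPluginApi` interface and the two `registerWith...` helpers are hypothetical simplifications, not the plugin's actual code.

```typescript
// Hypothetical minimal API surface; the real OpenClawPluginApi is much larger.
interface OpenClawPluginApi {
  readonly id: string;
}

// Guard 1: module-level boolean. It cannot tell API instances apart.
let _initialized = false;
const servedByBoolean: string[] = [];
function registerWithBoolean(api: OpenClawPluginApi): void {
  if (_initialized) return; // any later instance is silently skipped
  _initialized = true;
  servedByBoolean.push(api.id);
}

// Guard 2: WeakSet keyed by the API object. Only repeat calls with the
// same instance are skipped, and entries do not keep instances alive.
const _registeredApis = new WeakSet<OpenClawPluginApi>();
const servedByWeakSet: string[] = [];
function registerWithWeakSet(api: OpenClawPluginApi): void {
  if (_registeredApis.has(api)) return;
  _registeredApis.add(api);
  servedByWeakSet.push(api.id);
}

const firstApi: OpenClawPluginApi = { id: "api-1" };
const secondApi: OpenClawPluginApi = { id: "api-2" };

for (const api of [firstApi, firstApi, secondApi]) {
  registerWithBoolean(api);
  registerWithWeakSet(api);
}
// servedByBoolean is ["api-1"]; servedByWeakSet is ["api-1", "api-2"]
```

With the boolean guard the second instance never gets its hooks registered, which is the regression the reviewers describe.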

Question: EADDRINUSE crash (port 11434)

Environment issue — unrelated to this PR.


Additional note

This PR is based on the latest upstream/master (including your PR #530 WeakSet.clear fix). All upstream features (registerMemoryRuntime, GLOBAL_REFLECTION_LOCK, REFLECTION_SERIAL_GUARD, etc.) are fully preserved — only +93 lines added, zero deleted.

@rwmjhb
Collaborator

rwmjhb commented Apr 6, 2026

Review Summary

Automated multi-round review (7 rounds, Claude + Codex adversarial). Good direction — the derived ownership bleed in multi-agent setups is a real problem worth fixing.

Must Fix

  1. WeakSet → boolean regression — The WeakSet was deliberately added to fix a prior regression where a second register() call on a new API instance was silently skipped. Replacing it with a module-level boolean reintroduces that per-instance-blindness. This needs justification or an alternative approach.

  2. Idempotency guard timing — Duplicate register() calls before onStart completes bypass the guard entirely because _initialized is set before plugin init finishes.

  3. CI cli-smoke failure — Build is not passing. Please clarify whether this is caused by the WeakSet→boolean change or is pre-existing.

  4. EADDRINUSE on port 11434 — Full test suite crashes before completing. Likely environmental but needs confirmation.

Nice to Have

  • No tests covering the itemKind=derived ownership paths in isOwnedByAgent
  • Optional chaining removed from api.logger.debug — could throw if logger is undefined
  • Ownership fix incomplete for legacy combined reflection rows

Questions

Please address the must-fix items. Once resolved, this is ready to merge.

@jlin53882
Contributor Author

Response to Review

Thank you for the detailed review. Please see my responses below.

Must Fix 1 & 2 — Already fixed in latest commit (fcf23f5)

Both issues were present in an earlier version of this PR. The latest commit (fcf23f5) on fix/issue-448-v2 has addressed both:

  • WeakSet restored: WeakSet<OpenClawPluginApi> is fully restored from upstream (fix: remove invalid WeakSet.clear() call from resetRegistration() #498 fix). Per-instance tracking with _registeredApis.has(api) is working correctly.
  • Idempotency guard timing fixed: _initialized = true is now set only at the very end of successful register() initialization (after all setup including api.registerService), wrapped in try/catch. If init throws, _initialized stays false and a future instance can retry.

If you reviewed an earlier version of this PR, please re-review the latest commit — it should show the WeakSet is properly restored and the timing issue is resolved.

Must Fix 3 — CI cli-smoke failure

The CI failure on cjk-recursion-regression.test.mjs is pre-existing and environmental, not caused by this PR:

  • The error (synthetic_chunk_failure from mock embedder on port 127.0.0.1:44073) is a transient test environment issue
  • The test itself shows PASSED — the failure is due to stderr output causing non-zero exit code even though all assertions pass
  • We verified locally: cjk-recursion-regression.test.mjs does NOT fail locally
  • This PR only adds 93 lines and deletes 0 — it does not touch the embedder or test infrastructure

Must Fix 4 — EADDRINUSE port 11434

Confirmed as environmental — full test suite crash before completing, unrelated to this PR.

Questions

Issue #448 confirmed by maintainers?
Yes — your opening statement in the review ("the derived ownership bleed in multi-agent setups is a real problem worth fixing") confirms Issue #448 is a valid bug. This PR fixes it.

register() lifecycle — can it be called with different API instances?
Yes. The WeakSet design from upstream PR #365 was specifically added for this reason — to track each distinct OpenClawPluginApi instance independently, preventing the "second register() on a new API instance being silently skipped" regression.

Nice to Have

Optional chaining on api.logger.debug — Already present in the code (api.logger.debug?.(...)). No issue here.

Legacy combined reflection ownership fix — The buildDerivedCandidates legacy fallback (line 349-351) only triggers when the new format has zero derived entries. Legacy entries also go through the isOwnedByAgent pre-filter at line 248, so legacy fallback only exposes a sub-agent's own legacy derived items (not main's). The memory-reflection-item format is the primary path; legacy is a graceful degradation that will naturally fade as new format entries accumulate.


Summary

All Must Fix items are addressed in commit fcf23f5. CI failures are environmental, not caused by this PR. Ready for re-review whenever you're available.

@jlin53882 jlin53882 force-pushed the fix/issue-448-v2 branch 2 times, most recently from 48eecb7 to fcf23f5 Compare April 6, 2026 06:14
@win4r
Collaborator

win4r commented Apr 6, 2026

@claude

@claude

claude bot commented Apr 6, 2026

Claude Code is working…

I'll analyze this and get back to you.

View job run

jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 10, 2026
…ortexReach#448)

Fixes 3 issues in PR CortexReach#522:

1. Bug 1: the same API instance can retry after register() fails
   - _registeredApis changed from WeakSet to Map
   - Initialization is wrapped in try-catch; .set(api, true) runs only after success
   - The catch block does not call .set(), which allows a retry after failure

2. Bug 2: resetRegistration() actually clears state
   - A WeakSet cannot be cleared; with a Map, .clear() can be called
   - Adds _getRegisteredApisForTest() for use in tests

3. Bug 3: isOwnedByAgent fails closed on malformed itemKind
   - When type=memory-reflection-item, only invariant/derived are valid
   - Invalid itemKind values (such as weird-kind, an empty string, or a number) → return false
   - Fixes main's derived items leaking to sub-agents

New tests:
- test/isOwnedByAgent.test.mjs (19 tests)
- test/register-reset.test.mjs (17 tests)
@jlin53882
Contributor Author

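Additional notes

After the original PR #522, I added the following fixes (commits cb32130 + efad29d):

1. Bug 1 fix: register() can be retried after failure

Problem: the original code used a WeakSet, so once register failed, the same API instance could not retry.

Fix:

  • _registeredApis changed from WeakSet to Map<OpenClawPluginApi, boolean>
  • Previously .add(api) ran at the very start of register; now .set(api, true) runs at the end of the try block, only after initialization succeeds
  • If initialization fails (catch), the Map records nothing and that API instance can try again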

2. Bug 2 fix: resetRegistration() really resets

Problem: a WeakSet has no clear(), so resetRegistration() was effectively an empty function.

Fix:

  • _registeredApis.clear() can now actually clear the registration state
  • Added a _getRegisteredApisForTest() export for use in tests
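The two fixes above (record only on success, and a reset that really clears) might look roughly like the sketch below. The `shouldFail` switch, the return type of `_getRegisteredApisForTest()`, and the trimmed-down `OpenClawPluginApi` are assumptions made for the example.

```typescript
// Hypothetical minimal API object; shouldFail simulates an init error.
interface OpenClawPluginApi {
  readonly id: string;
  shouldFail: boolean;
}

const _registeredApis = new Map<OpenClawPluginApi, boolean>();

// Returns true when this call registered the instance, false when skipped.
function register(api: OpenClawPluginApi): boolean {
  if (_registeredApis.has(api)) return false;
  // Stand-in for the real initialization; a throw here leaves the Map untouched.
  if (api.shouldFail) throw new Error(`init failed for ${api.id}`);
  _registeredApis.set(api, true); // recorded only after init succeeded
  return true;
}

function resetRegistration(): void {
  _registeredApis.clear(); // Map supports clear(); WeakSet does not
}

// Assumed signature for the test-only accessor.
function _getRegisteredApisForTest(): ReadonlyMap<OpenClawPluginApi, boolean> {
  return _registeredApis;
}

const api: OpenClawPluginApi = { id: "gateway", shouldFail: true };
let failedOnce = false;
try {
  register(api);
} catch {
  failedOnce = true;
}
api.shouldFail = false;
const retried = register(api);     // true: same instance may retry
const duplicate = register(api);   // false: already registered
const sizeBeforeReset = _getRegisteredApisForTest().size; // 1
resetRegistration();
const sizeAfterReset = _getRegisteredApisForTest().size;  // 0
```

One trade-off worth noting: a `Map` holds its keys strongly, so registered API objects stay reachable until `resetRegistration()` runs, whereas a `WeakSet` lets them be garbage-collected.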

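3. Bug 3: isOwnedByAgent fail-closed (already included in the original PR #522)

Problem: when itemKind is an unexpected value (neither "derived" nor "invariant"), the check fails open (returns true).

Fix:

  • Now only itemKind === "derived" | "invariant" takes the corresponding logic
  • Any other invalid itemKind returns false (fail-closed)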
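A possible shape for the fail-closed check, based on the commit message's statement that only `invariant` and `derived` are valid when `type` is `memory-reflection-item`. The metadata interface is hypothetical, and treating rows of any other `type` as legacy rows that keep the main fallback is an assumption of this sketch.

```typescript
// Hypothetical metadata shape; itemKind is `unknown` because stored rows
// may contain malformed values (numbers, empty strings, ...).
interface ReflectionMetadata {
  type?: string;
  owner?: string;
  itemKind?: unknown;
}

function isOwnedByAgent(metadata: ReflectionMetadata, agentId: string): boolean {
  const owner = (metadata.owner ?? "").trim();
  const kind = metadata.itemKind;

  // Derived items: visible only to a non-empty, exactly matching owner.
  if (kind === "derived") return owner !== "" && owner === agentId;

  // Fail closed: a memory-reflection-item whose kind is not a known value
  // is treated as invisible instead of falling through to the main fallback.
  if (metadata.type === "memory-reflection-item" && kind !== "invariant") {
    return false;
  }

  // Invariant items and (assumed) legacy rows keep the main fallback.
  if (!owner) return true;
  return owner === agentId || owner === "main";
}

const item = "memory-reflection-item";
const checks = {
  weirdKind: isOwnedByAgent({ type: item, owner: "main", itemKind: "weird-kind" }, "main"), // false
  emptyKind: isOwnedByAgent({ type: item, owner: "main", itemKind: "" }, "main"),           // false
  numericKind: isOwnedByAgent({ type: item, owner: "main", itemKind: 42 }, "main"),         // false
  invariant: isOwnedByAgent({ type: item, owner: "main", itemKind: "invariant" }, "agent-x"), // true
  legacyRow: isOwnedByAgent({ owner: "main" }, "agent-x"),                                  // true
};
```

Failing closed means a corrupted row is hidden from every agent, including its owner, which trades some recall for a guarantee that malformed data cannot bleed across agents.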

4. Test files

Test results

36 tests, 0 failures ✅

@jlin53882
Contributor Author

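@AliceLJY I just ran an adversarial pass with Codex, which surfaced some hidden bugs that I have now fixed. The latest commit, efad29d, has been pushed. When you have time, could you review it again and check whether I have missed anything else?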

Collaborator

@rwmjhb rwmjhb left a comment

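Thanks for this PR. Both problems are real: the isOwnedByAgent() fallback leaks derived entries across agents, and setting _initialized too early makes a failed registration unrecoverable.

Must fix (2 items)

MR1 + F2: WeakSet → boolean reintroduces the per-instance blind spot

The WeakSet was introduced explicitly to fix the regression where a second register() call on a new API instance was silently skipped. With a module-level boolean, register() calls from different API instances cannot be told apart, so the original regression returns. In addition, the current guard only becomes active once onStart runs, so duplicate register() calls before onStart can still bypass it.

If the early _initialized problem only shows up when initialization throws, consider moving _initialized = true to after onStart returns successfully, while keeping the WeakSet to handle the multi-instance case.

EF1: cli-smoke CI failure

The cli-smoke test is failing. The root cause needs to be confirmed before merging: is it caused by the WeakSet→boolean change, or is it environmental?


Suggested fixes (non-blocking)

  • F3: the itemKind=derived path of isOwnedByAgent has no new test coverage
  • MR2: ownership checks for legacy combined reflection rows are still not fixed

One question

The EADDRINUSE port 11434 crash looks environmental (an Ollama port conflict) rather than code-introduced. Can you confirm that the CI environment has ruled out this interference?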

@rwmjhb
Collaborator

rwmjhb commented Apr 11, 2026

Re-review on efad29d

Reviewed commit efad29d. The isOwnedByAgent() fix for itemKind=derived in reflection-store.ts is correct and addresses the multi-agent context bleed. The _initialized timing fix direction is also right. However, the implementation of the timing fix introduces a regression.

Must Fix

MR1 — WeakSet → boolean re-introduces a known regression
The WeakSet for _registeredApis was added specifically to fix a regression where calling register() with a different (new) API instance after plugin start would be silently skipped. A module-level boolean cannot distinguish between instances — once _initialized = true, any subsequent register() call from a new API instance is blocked forever for the lifetime of the process.

Your stated goal (allow retry after register() failure) is valid, but the fix discards a deliberate design. Please either:

  • Keep the WeakSet but move _registeredApis.add(api) to after onStart() completes successfully, so a failed registration isn't recorded and can be retried
  • Or document a clear lifecycle guarantee: "register() is called at most once per process, a new API instance is never passed after start" — if that's the actual contract, a boolean is fine

F2 — Idempotency guard has a race window before onStart
The _initialized flag is only set inside onStart. Two concurrent register() calls arriving before onStart completes both pass the guard. Consider setting a "registration in progress" sentinel before the async work begins.
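One way to read the "registration in progress" suggestion is to remember the in-flight attempt per API instance, so a concurrent call joins it instead of starting a second initialization. The sketch below assumes an async `initialize()` stand-in and a hypothetical one-field `PluginApi`; it is not the plugin's actual startup path.

```typescript
// Hypothetical minimal API object.
interface PluginApi {
  readonly id: string;
}

const _registered = new WeakSet<PluginApi>();
const _inFlight = new WeakMap<PluginApi, Promise<void>>(); // the sentinel
let initCalls = 0;

// Stand-in for the real async setup (for example, work done in onStart).
async function initialize(api: PluginApi): Promise<void> {
  initCalls += 1;
  await Promise.resolve();
  if (api.id === "") throw new Error("api id is required");
}

function register(api: PluginApi): Promise<void> {
  if (_registered.has(api)) return Promise.resolve(); // already done
  const pending = _inFlight.get(api);
  if (pending) return pending; // concurrent call joins the running attempt

  const attempt = initialize(api).then(
    () => {
      _inFlight.delete(api);
      _registered.add(api); // recorded only after success
    },
    (err: unknown) => {
      _inFlight.delete(api); // failure clears the sentinel so a retry is possible
      throw err;
    },
  );
  _inFlight.set(api, attempt); // set before any async work completes
  return attempt;
}

const api: PluginApi = { id: "gateway" };
const first = register(api);
const second = register(api);     // arrives before the first attempt finishes
const joined = first === second;  // true: both callers share one attempt
```

Because the sentinel is stored synchronously, there is no window in which two callers can both pass the guard, and a failed attempt still leaves the instance free to retry.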

EF1 — CI cli-smoke check is failing
Please confirm whether this is caused by the WeakSet→boolean change or is pre-existing/environmental. If environmental, include a note in the PR; if code-caused, fix before merge.

EF2 — Test suite terminates with EADDRINUSE on port 11434
Likely environmental (Ollama port conflict), but it prevents a clean test run. Confirm this is not masking test failures from this PR.

Nice to Have

  • F3: No tests cover the new itemKind=derived ownership paths in reflection-store.ts. A unit test for the isOwnedByAgent() branch split would prevent future regressions.
  • MR2: The ownership fix doesn't handle legacy "combined" reflection rows (pre-split format). If those rows exist in production stores, they'll still bleed. Document the known limitation or extend the fix.

The reflection-store.ts change is the right fix for the right problem. Resolve the WeakSet regression concern and the CI failures, and this is ready to merge.

@jlin53882
Contributor Author

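Reply to Reviewer

Thanks for the review! Responses to the points raised:

MR1 + F2: WeakSet → boolean

The current implementation uses Map<API, boolean> inside register():

  • _registeredApis.set(api, true) runs only after the try block finishes successfully
  • If init fails (catch), nothing is set, so that API instance can retry
  • This fixes the "cannot retry after failure" problem while keeping per-instance tracking

If the reviewer is still concerned about the regression, we can discuss it further.

F3: itemKind=derived test coverage

test/isOwnedByAgent.test.mjs and test/register-reset.test.mjs already include the relevant tests. Please see the latest commit.

EF1: cli-smoke CI failure

This failure appears to be environmental (port 11434 is occupied by Ollama), not introduced by the code.


Please confirm whether these explanations answer your questions, or let us know what further changes you would like.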

@jlin53882
Contributor Author

Status update

The code has been updated and new commits pushed. The PR now contains 2 commits:

  1. d22dc11 - isOwnedByAgent fail-closed for malformed itemKind
  2. e63add1 - register retry with Map + resetRegistration clear

Main changes:

  • _registeredApis changed from WeakSet to Map
  • register() calls _registeredApis.set(api, true) only after success
  • resetRegistration() now calls _registeredApis.clear()

CI status:

  • storage-and-schema: ✅
  • version-sync: ✅
  • core-regression: ✅
  • llm-clients-and-auth: ✅
  • packaging-and-workflow: ✅
  • cli-smoke: ❌ (environmental: port 11434 is occupied by Ollama, not a code issue)

Is there anything else that needs to be changed? Thanks!

4 participants