
🔴 P0 rename bug: after a rename, messages cannot be delivered to the node under its new alias (root cause + Docker 5-case repro + proposal) #146

@s2agi


🔴 P0 rename bug: after a rename, messages cannot be delivered to the node under its new alias

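Vincent /goal (telegram 5387, 2026-05-17 07:13 Beijing time): "There are lots of rename bugs. After a rename, messages simply cannot be sent to the renamed node. We need a complete plan plus test cases, fully tested on Docker."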

🩹 Symptoms

  • The user runs anet node rename old-alias new-alias
  • After the rename, commhub send_task / send_message to new-alias fails (or is routed to the old alias)
  • The actual SSE session may still be registered under old-alias, and the channel session_id is not updated

🔍 Suspected root causes (pending verification by SDK马)

  1. SSE session_id not refreshed: the sse_sessions key net_<networkId>:<alias> is alias-keyed, so after a rename the server-side session is still attached to the old alias
  2. Stale commhub.db sessions table: rename only changes the alias display name and does not cascade the update to the channel routing table
  3. agent-node config.json: rename only moves the directory .anet/nodes/old/ → .anet/nodes/new/ and does not re-register with the hub
  4. Client-side alias cache: the sender's alias→address resolution cache is stale
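Root cause 1 can be made concrete with a minimal in-memory model. It assumes, without confirmation, that the server keeps an sse_sessions map keyed by net_<networkId>:<alias> and that rename today only touches the display name. Every type and function name below is hypothetical and is not taken from the commhub code.

```typescript
// Hypothetical model of an alias-keyed SSE session store (not the real commhub implementation).
type Session = { sessionId: string; displayName: string };

// Key format taken from the issue: net_<networkId>:<alias>
function sessionKey(networkId: string, alias: string): string {
  return `net_${networkId}:${alias}`;
}

const sseSessions = new Map<string, Session>();

function register(networkId: string, alias: string, sessionId: string): void {
  sseSessions.set(sessionKey(networkId, alias), { sessionId, displayName: alias });
}

// Suspected current behavior: the display name changes, the map key does not.
function renameDisplayOnly(networkId: string, oldAlias: string, newAlias: string): void {
  const s = sseSessions.get(sessionKey(networkId, oldAlias));
  if (s) s.displayName = newAlias;
}

// Routing resolves by whatever alias the sender used.
function route(networkId: string, alias: string): Session | undefined {
  return sseSessions.get(sessionKey(networkId, alias));
}

register("n1", "before", "sse-123");
renameDisplayOnly("n1", "before", "after");
console.log(route("n1", "after"));               // → undefined: the new alias resolves to nothing
console.log(route("n1", "before")?.displayName); // → after: the session still sits under the old key
```

In this model, send_task to after has no session to deliver to. The same shape applies to root causes 2 and 4 if the routing table or the sender's cache is keyed by alias.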

📋 Repro plan (测试马, Docker)

Environment: node:24-alpine + commhub-server@latest (0.8.2) + agent-node@latest (2.4.0)

5-case test matrix:

| # | Scenario / action | Expected |
|---|---|---|
| 1 | rename before→after while the agent is running (resume), then send_task to after | ✅ received |
| 2 | rename before→after while the agent is stopped, then send_task to after | ✅ received (after restart) |
| 3 | after the rename, the sender does not reload its alias cache; send_task to after | an explicit error (not a silent timeout) |
| 4 | rename a purely-created node (#110 RFC §4.1 known bug) | a clear error message |
| 5 | after the rename, check the dashboard | dashboard SSE actually shows the new alias (after, not before) |

Output per case: the rename command + the send_task command + curl /api/sessions verification + commhub_get_all_status verification + a Playwright screenshot of the dashboard
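Encoding the Expected column as data keeps the pass/fail call consistent across cases. This is a minimal sketch. It assumes each run has already been reduced to one observed outcome: delivered to some alias, an error with a message, or a timeout. Outcome, matrix and judge are hypothetical names, and the actual anet / commhub commands are deliberately left out. Case 5 is a visual dashboard check and is not modeled here.

```typescript
// Hypothetical encoding of cases 1-4 of the rename matrix (illustrative names only).
type Outcome =
  | { kind: "delivered"; alias: string }
  | { kind: "error"; message: string }
  | { kind: "timeout" };

type Expectation = "delivered-to-after" | "explicit-error";

const matrix: Record<number, { scenario: string; expect: Expectation }> = {
  1: { scenario: "rename while agent running (resume)", expect: "delivered-to-after" },
  2: { scenario: "rename while agent stopped, then restart", expect: "delivered-to-after" },
  3: { scenario: "sender does not reload alias cache", expect: "explicit-error" },
  4: { scenario: "rename purely-created node (#110)", expect: "explicit-error" },
};

function judge(caseId: number, observed: Outcome): { pass: boolean; reason: string } {
  const c = matrix[caseId];
  if (!c) return { pass: false, reason: `unknown case ${caseId}` };
  // A silent timeout never passes: case 3 explicitly asks for a visible error instead.
  if (observed.kind === "timeout") return { pass: false, reason: "silent timeout" };
  if (c.expect === "delivered-to-after") {
    // Delivery to the old alias ("before") counts as a failure, not a pass.
    const ok = observed.kind === "delivered" && observed.alias === "after";
    return { pass: ok, reason: ok ? "delivered to after" : `unexpected ${observed.kind}` };
  }
  // explicit-error: any non-blank error message is accepted.
  const ok = observed.kind === "error" && observed.message.trim().length > 0;
  return { pass: ok, reason: ok ? "explicit error" : `unexpected ${observed.kind}` };
}

console.log(judge(1, { kind: "delivered", alias: "after" }).pass); // → true
console.log(judge(3, { kind: "timeout" }).reason);                 // → silent timeout
```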

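🛠 Proposal (to be proposed by SDK马; complexity still to be assessed)

  • A: Server-side cascade: the rename command triggers a server-side update of the session_id key (hot refresh)
  • B: Client-side force-refresh: after the rename, automatically run anet node restart to trigger re-registration
  • C: Hybrid: the server exposes a /api/sessions/rename endpoint, and the CLI calls it and also forces a re-register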
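The server-side half of options A and C amounts to re-keying every alias-keyed structure together and rejecting bad renames explicitly. This is a minimal in-memory sketch. It assumes the session store and the routing table both use the net_<networkId>:<alias> key. SessionStore, rename and resolve are hypothetical names, not the real commhub API or the /api/sessions/rename handler.

```typescript
// Hypothetical server-side cascade rename (options A / C); not the real commhub code.
type Session = { sessionId: string; alias: string };
type RenameResult = { ok: true } | { ok: false; error: string };

class SessionStore {
  // Stand-in for sse_sessions, keyed by net_<networkId>:<alias>
  private sessions = new Map<string, Session>();
  // Stand-in for the channel routing table: same key -> sessionId
  private routes = new Map<string, string>();

  private key(networkId: string, alias: string): string {
    return `net_${networkId}:${alias}`;
  }

  register(networkId: string, alias: string, sessionId: string): void {
    const k = this.key(networkId, alias);
    this.sessions.set(k, { sessionId, alias });
    this.routes.set(k, sessionId);
  }

  // Synchronous, so in Node's single-threaded event loop no other handler
  // can observe a state where only one of the two maps has been re-keyed.
  rename(networkId: string, oldAlias: string, newAlias: string): RenameResult {
    const oldKey = this.key(networkId, oldAlias);
    const newKey = this.key(networkId, newAlias);
    const s = this.sessions.get(oldKey);
    if (!s) return { ok: false, error: `no session for alias "${oldAlias}"` };
    if (oldAlias === newAlias) return { ok: true };
    if (this.sessions.has(newKey)) return { ok: false, error: `alias "${newAlias}" already in use` };
    this.sessions.delete(oldKey);
    this.sessions.set(newKey, { ...s, alias: newAlias });
    this.routes.delete(oldKey);
    this.routes.set(newKey, s.sessionId);
    return { ok: true };
  }

  resolve(networkId: string, alias: string): string | undefined {
    return this.routes.get(this.key(networkId, alias));
  }
}

const store = new SessionStore();
store.register("n1", "before", "sse-123");
console.log(store.rename("n1", "before", "after").ok); // → true
console.log(store.resolve("n1", "after"));             // → sse-123
console.log(store.resolve("n1", "before"));            // → undefined
```

If commhub.db is a SQL database (an assumption), the two updates would need to run inside one transaction rather than relying on synchronous in-memory code. Option B's re-register would then act as a client-side safety net on top of this.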

🎯 Ship path

  • v0.10.2 hotfix patch (if the root cause is simple, following the v0.10.1 PINNED-bump pattern)
  • OR fold into v0.11.0 as a P0 (if an architectural change is needed, e.g., making session_id persistent)

🟡 Pending dispatch confirmation from SDK马 + 测试马

  • SDK马: root cause investigation + proposal (~2-3h)
  • 测试马: Docker 5-case repro + verification after the fix (~2h)
  • 通信龙: 10-minute progress cron surfacing status to Vincent

📎 Related

  • #110 anet node rename fails on purely-created nodes (P2 known bug)
  • #84 anet CLI node rename: RFC-level proposal (CLOSED, shipped in v0.9.0 as RFC-010)
  • #62 Progress sync (10-minute cron reports)
  • #144 v0.11.0 stub (this bug fix can be folded in)

Author-Agent: 通信龙 (Vincent /goal 5387 P0 dispatch)

Labels: P0 (Critical: blocking users / security / data loss)