feat(memory): introduce group-chat memory scopes (personal/group/conversation) and a visibility model #2

Merged: piexian merged 6 commits into master from feature/group-chat-memory-scope on May 3, 2026

@piexian (Owner) commented May 3, 2026

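Overview

Introduces a three-layer memory scope model to resolve memory ownership in group chats. The PR also includes a module refactor and CI integration, and bumps the version to v0.3.0.

Main changes

Group-chat memory scopes

  • personal: a user's own memories, isolated by user_id and visible only to that user
  • group: memories shared by a group, isolated by session_id and visible to group members
  • conversation: temporary context for the current session, recalled only within that session
  • Visibility model: private (owner only) / group (shared within the group); a memory with multiple owners is automatically upgraded to group visibility
  • Group-chat recall automatically merges all three layers; private-chat recall returns only personal memories
  • Legacy memories (without a memory_scope field) are treated as personal for backward compatibility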
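The scope and visibility rules above can be sketched as follows. This is a minimal illustration, not the plugin's code: the enum values come from this description, while the function bodies, the `resolve_visibility` helper (hypothetical), and the exact "more than one distinct owner" upgrade condition are assumptions.

```python
from __future__ import annotations

from enum import Enum


class MemoryScope(str, Enum):
    PERSONAL = "personal"
    GROUP = "group"
    CONVERSATION = "conversation"


class MemoryVisibility(str, Enum):
    PRIVATE = "private"
    GROUP = "group"


def normalize_memory_scope(scope: str | None) -> MemoryScope:
    """Map a raw scope value to MemoryScope.

    Legacy records have no memory_scope field, so None, empty, or unknown
    values fall back to PERSONAL (the backward-compatibility rule).
    """
    try:
        return MemoryScope((scope or "").strip().lower())
    except ValueError:
        return MemoryScope.PERSONAL


def resolve_visibility(requested: str | None, owner_user_ids: list[str]) -> MemoryVisibility:
    """Hypothetical helper: a memory with several distinct owners is upgraded to group."""
    if len(set(owner_user_ids)) > 1:
        return MemoryVisibility.GROUP
    if requested == MemoryVisibility.GROUP.value:
        return MemoryVisibility.GROUP
    return MemoryVisibility.PRIVATE
```

Using `str`-mixin enums keeps the stored metadata as plain strings (`MemoryScope.GROUP == "group"`), which avoids the raw-string typos raised later in review.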

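Refactoring

  • Extracted prompts.py, which centralizes prompt templates, constants, and sanitize functions
  • _validate_command() unifies command-argument validation and removes duplicated handler code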
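A rough sketch of how a shared validator removes per-handler checks. The real signature of `_validate_command()` is not shown in this PR; `validate_command`, `handle_forget`, and the `/memory forget <uri>` usage string below are hypothetical.

```python
from __future__ import annotations


def validate_command(args: list[str], *, min_args: int, max_args: int, usage: str) -> str | None:
    """Return an error message if the arguments are invalid, else None."""
    if not min_args <= len(args) <= max_args:
        return f"Invalid arguments. Usage: {usage}"
    if any(not a.strip() for a in args):
        return f"Arguments must not be empty. Usage: {usage}"
    return None


def handle_forget(args: list[str]) -> str:
    # Each handler delegates validation instead of repeating the same checks.
    error = validate_command(args, min_args=1, max_args=1, usage="/memory forget <uri>")
    if error:
        return error
    return f"Forgetting {args[0]}"
```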

CI / configuration

  • New GitHub Actions workflow: ruff lint + format check + syntax compilation + metadata validation
  • extraction_min_content_length default changed from 500 to 150

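Files

| File | Changes |
|---|---|
| main.py | Group-chat scope extraction/recall, prompts moved to a separate module, unified command validation |
| memory_manager.py | Scope-aware recall (composite filter chain), visibility checks, deduplication |
| memory_protocol.py | MemoryScope enum, build_session_id, extended metadata fields |
| prompts.py | New: centralized prompt templates and constants |
| .github/workflows/code-quality.yml | New: CI pipeline |
| CHANGELOG.md / README.md | Documentation updates |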

Summary by Sourcery

Introduce a scoped memory model with visibility controls across personal, group, and conversation contexts, enhancing storage, retrieval, formatting, and extraction to better support group chats and richer metadata, while updating docs and versioning.

New Features:

  • Add multi-scope memory model (personal, group, conversation) with a visibility layer to distinguish private vs group-shared memories.
  • Extend memory metadata with scope, ownership, visibility, subject, entities, topics, and structured content fields to support richer recall and display.
  • Enhance LLM-based memory extraction to output scope- and subject-aware memories with entities/topics, including group-chat specific behavior.

Enhancements:

  • Update recall and listing logic to be scope-aware, merging personal, group, and conversation memories with visibility checks and de-duplication.
  • Improve memory formatting for injection and user display by grouping by scope and exposing key metadata such as scope, owners, and importance.
  • Add sender tracking to conversation snapshots and label conversation lines with sender information for better attribution.
  • Introduce configurable timeouts and limits for recall query optimization and memory listing scans to improve robustness in large/group chats.
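De-duplication by uri/text, as described above, could look like this minimal sketch. It assumes each recalled memory is a dict with a `text` key and a `metadata` dict carrying `uri`; the actual result shape in memory_manager.py may differ.

```python
from __future__ import annotations


def dedupe_memories(memories: list[dict]) -> list[dict]:
    """Keep the first occurrence of each uri or text, preserving input order.

    Because the filter chain runs personal before group and conversation,
    keeping the first hit favours the more specific scope.
    """
    seen_uris: set[str] = set()
    seen_texts: set[str] = set()
    result: list[dict] = []
    for mem in memories:
        uri = (mem.get("metadata") or {}).get("uri") or ""
        text = (mem.get("text") or "").strip()
        if (uri and uri in seen_uris) or (text and text in seen_texts):
            continue
        if uri:
            seen_uris.add(uri)
        if text:
            seen_texts.add(text)
        result.append(mem)
    return result
```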

Build:

  • Bump plugin metadata version to v0.3.0.

Documentation:

  • Revise README with installation instructions, new configuration options, and group-chat memory behavior, and document v0.3.0 changes in the changelog.

piexian added 2 commits May 3, 2026 18:07
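- Added three memory scopes: personal (per-user isolation), group (shared within a group), and conversation (temporary, per session)
- Added a visibility model: private (owner only) / group (shared within the group); memories with multiple owners are upgraded automatically
- memory_protocol: added the MemoryScope enum, build_session_id, and extended MemoryMetadata fields
- memory_manager: scope-aware recall (composite filter chain), visibility checks, legacy-format compatibility, deduplication
- main: extraction prompt now includes scope information; sender tracking; parsing of scope/subject/entities/topics
- Group-chat recall automatically merges personal + group + conversation memories; private-chat recall returns only personal memories
- Version bumped to v0.3.0

- Added prompts.py: centralizes MEMORY_EXTRACTION_PROMPT, RECALL_QUERY_PROMPT,
  sanitize_memory_content(), SENSITIVE_PATTERNS, and extraction-limit constants
- main.py: imports the prompts module, adds _validate_command() for unified command-argument validation,
  and simplifies utility functions such as _strip_json_fence/_normalize_contexts
- Added GitHub Actions CI: ruff lint + format check + syntax compilation + metadata validation
- extraction_min_content_length default changed from 500 → 150
- metadata.yaml: removed trailing whitespace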
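The "group chats merge three layers, private chats recall only personal" rule can be sketched as a filter chain. Everything here is an assumption for illustration: the `RecallFilter` shape, the `platform:id` ID format for `build_user_id`/`build_session_id`, and the filter keys. The PR's own `_build_recall_filters` returns tuples that also carry `owner_user_id` and `require_owner_list`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecallFilter:
    """One entry of the filter chain (hypothetical shape)."""

    filters: dict = field(default_factory=dict)
    # True for the query matching pre-v0.3.0 records, which lack memory_scope
    # and therefore need visibility post-filtering after retrieval.
    legacy_personal: bool = False


def build_user_id(platform_id: str, sender_id: str) -> str:
    return f"{platform_id}:{sender_id}"  # assumed ID format


def build_session_id(platform_id: str, session_id: str) -> str:
    return f"{platform_id}:{session_id}"  # assumed ID format


def build_recall_filters(
    platform_id: str, sender_id: str, session_id: str, is_group: bool, domain: str | None = None
) -> list[RecallFilter]:
    base = {"domain": domain} if domain else {}
    user_id = build_user_id(platform_id, sender_id)
    chain = [
        # Personal memories owned by the current speaker.
        RecallFilter({**base, "memory_scope": "personal", "owner_user_id": user_id}),
        # Legacy records: matched by user_id only.
        RecallFilter({**base, "user_id": user_id}, legacy_personal=True),
    ]
    if is_group:
        sid = build_session_id(platform_id, session_id)
        chain.append(RecallFilter({**base, "memory_scope": "group", "owner_session_id": sid}))
        chain.append(RecallFilter({**base, "memory_scope": "conversation", "owner_session_id": sid}))
    return chain
```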
Copilot AI review requested due to automatic review settings May 3, 2026 10:24
sourcery-ai Bot commented May 3, 2026

Reviewer's Guide

Introduce a scoped memory & visibility model (personal/group/conversation) for chat memories, make storage/recall/listing logic scope-aware with backwards compatibility, enrich memory metadata and formatting, and refactor prompts/command handling plus config & CI updates for v0.3.0.

Sequence diagram for scope-aware memory recall in group chat

sequenceDiagram
    actor User
    participant AstrBot
    participant SimpleLongMemoryPlugin as PluginMain
    participant MemoryManager
    participant VecDB

    User->>AstrBot: Send message in group chat
    AstrBot->>PluginMain: on_message(event)
    PluginMain->>MemoryManager: recall_memories(event, query, domain, top_k, all_users=false, memory_scope=None)

    activate MemoryManager
    MemoryManager->>MemoryManager: _build_recall_filters(event, global_memory, domain, memory_scope=None)
    Note over MemoryManager: Determine current_user_id
    MemoryManager->>MemoryManager: _scope_filter(scope=personal)
    MemoryManager->>MemoryManager: _legacy_personal_filter()
    MemoryManager->>MemoryManager: _scope_filter(scope=group)
    MemoryManager->>MemoryManager: _scope_filter(scope=conversation)

    loop For each filter in filters_list
        MemoryManager->>VecDB: retrieve(query, top_k, filters)
        VecDB-->>MemoryManager: results[] with metadata
        MemoryManager->>MemoryManager: _retrieve_with_filter(..., legacy_personal, owner_user_id, require_owner_list)
        MemoryManager->>MemoryManager: _is_visible_personal_memory(metadata, owner_user_id, require_owner_list)
        MemoryManager-->>MemoryManager: visible_memories_for_filter
    end

    MemoryManager->>MemoryManager: _dedupe_memories(all_visible_memories)
    MemoryManager-->>PluginMain: memories (personal + group + conversation)
    deactivate MemoryManager

    PluginMain->>PluginMain: format_memory_for_injection(memories)
    PluginMain-->>AstrBot: augmented prompt with grouped memories
    AstrBot-->>User: LLM reply with injected context

Sequence diagram for LLM-based memory extraction with scope and ownership

sequenceDiagram
    actor User
    participant AstrBot
    participant SimpleLongMemoryPlugin as PluginMain
    participant MemoryManager
    participant LLM

    User->>AstrBot: Chat message
    AstrBot->>PluginMain: on_llm_response(event, response)

    PluginMain->>PluginMain: _accumulate_request_snapshot(event, request)
    PluginMain->>PluginMain: _build_conversation_from_snapshots(event)
    PluginMain->>LLM: MEMORY_EXTRACTION_PROMPT with platform_id, session_type, session_id, sender_id, conversation
    LLM-->>PluginMain: extraction_result JSON

    PluginMain->>PluginMain: _parse_extracted_memories(extraction_result, session_type)
    Note over PluginMain: For each item
    PluginMain->>PluginMain: _normalize_extracted_scope(scope, session_type)
    PluginMain->>PluginMain: _normalize_subject_ids(subject or subjects)
    PluginMain->>PluginMain: _sanitize_string_list(entities, topics)

    loop For each validated memory
        PluginMain->>PluginMain: choose subject and subjects
        PluginMain->>PluginMain: owner_sender_ids = subjects if scope==personal else []
        PluginMain->>MemoryManager: store_memory(event, content, domain, memory_type, disclosure, importance, memory_scope=scope, subject=subject, entities=entities, topics=topics, owner_sender_ids=owner_sender_ids)
        activate MemoryManager
        MemoryManager->>MemoryManager: _event_scope_ids(event, owner_sender_ids[0] or sender)
        MemoryManager->>MemoryManager: _build_owner_user_ids(platform_id, owner_sender_ids)
        MemoryManager->>MemoryManager: _build_memory_metadata(...)
        MemoryManager->>VecDB: add_documents with metadata(memory_scope, owner_user_ids, owner_session_id, visibility, speaker_id, subject, entities, topics, memory_content)
        deactivate MemoryManager
    end

    PluginMain-->>AstrBot: extraction finished
    AstrBot-->>User: continues conversation
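The normalization and ownership steps in the extraction diagram can be sketched as below. The diagram confirms `owner_sender_ids = subjects if scope==personal else []` and the `owner_sender_ids[0] or sender` fallback; the other rules (unknown scopes default to personal, a `group` memory extracted in a private chat is dropped) and the `is_group` flag in place of `session_type` are assumptions.

```python
from __future__ import annotations

VALID_SCOPES = {"personal", "group", "conversation"}


def normalize_extracted_scope(scope: object, is_group: bool) -> str | None:
    """Return a usable scope, or None if the extracted memory should be dropped."""
    value = str(scope or "").strip().lower()
    if value not in VALID_SCOPES:
        # Missing or unknown scopes fall back to personal (assumption).
        return "personal"
    if value == "group" and not is_group:
        # A group memory from a private chat has no group to attach to.
        return None
    return value


def owner_sender_ids_for(scope: str, subjects: list[str], sender_id: str) -> tuple[list[str], str]:
    """Owners are the subjects only for personal memories; else fall back to the sender."""
    owner_sender_ids = list(subjects) if scope == "personal" else []
    primary_owner = owner_sender_ids[0] if owner_sender_ids else sender_id
    return owner_sender_ids, primary_owner
```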

Class diagram for scoped memory model and metadata

classDiagram
    class MemoryDomain {
        <<enumeration>>
        USER_PROFILE : str
        PREFERENCES : str
        FACTS : str
        EVENTS : str
        CONTEXT : str
    }

    class MemoryScope {
        <<enumeration>>
        PERSONAL : str
        GROUP : str
        CONVERSATION : str
    }

    class MemoryVisibility {
        <<enumeration>>
        PRIVATE : str
        GROUP : str
    }

    class MemoryMetadata {
        +str uri
        +str domain
        +str user_id
        +str platform_id
        +str sender_id
        +str umo
        +str session_type
        +str session_id
        +str created_at
        +str last_recalled_at
        +int recall_count
        +int importance
        +bool compressed
        +str memory_scope
        +str owner_user_id
        +list~str~ owner_user_ids
        +str owner_session_id
        +str visibility
        +str speaker_id
        +str subject
        +list~str~ entities
        +list~str~ topics
        +str memory_content
        +str impression
        +str migrated_from
        +str migrated_to
        +to_dict() dict~str, Any~
        +from_dict(data dict~str, Any~) MemoryMetadata
    }

    class MemoryManager {
        +dict config
        +store_memory(event AstrMessageEvent, content str, domain str, memory_type str, disclosure str, importance int, memory_scope str, visibility str, subject str, entities list~str~, topics list~str~, owner_sender_id str, owner_sender_ids list~str~) str
        +recall_memories(event AstrMessageEvent, query str, domain str, top_k int, all_users bool, memory_scope str) list~dict~
        +list_memories(event AstrMessageEvent, domain str, page int, page_size int, all_users bool) tuple
        +get_memory_by_uri(event AstrMessageEvent, uri str) dict~str, Any~
        +forget_memory(event AstrMessageEvent, uri str) bool
        -_event_scope_ids(event AstrMessageEvent, owner_sender_id str) tuple~UMOInfo, str, str~
        -_build_owner_user_ids(platform_id str, owner_sender_ids list~str~) list~str~
        -_scope_filter(event AstrMessageEvent, memory_scope str, global_memory bool) dict~str, Any~
        -_legacy_personal_filter(event AstrMessageEvent, global_memory bool) dict~str, Any~
        -_build_recall_filters(event AstrMessageEvent, global_memory bool, domain str, memory_scope str) list~tuple~dict~str, Any~, bool, str, bool~~
        -_retrieve_with_filter(query str, top_k int, filters dict~str, Any~, legacy_personal bool, owner_user_id str, require_owner_list bool) list~dict~
        -_is_visible_personal_memory(metadata dict~str, Any~, owner_user_id str, require_owner_list bool) bool
        -_dedupe_memories(memories list~dict~) list~dict~
        -_list_visible_user_documents(event AstrMessageEvent, domain str, page int, page_size int) list~dict~
        -_memory_list_scan_limit(page int, page_size int) int
    }

    class MemoryProtocolUtils {
        +normalize_memory_scope(scope str) str
        +build_user_id(platform_id str, sender_id str) str
        +build_session_id(platform_id str, session_id str) str
        +format_memory_content(content str, metadata MemoryMetadata) str
        +format_memory_for_injection(memories list~dict~, max_length int) str
        +format_memory_for_user(memories list~dict~, page int, page_size int, total int) str
    }

    MemoryMetadata --> MemoryScope : uses
    MemoryMetadata --> MemoryVisibility : uses
    MemoryMetadata --> MemoryDomain : uses

    MemoryManager --> MemoryMetadata : stores
    MemoryManager --> MemoryScope : filters_by
    MemoryManager --> MemoryVisibility : checks
    MemoryManager --> MemoryProtocolUtils : calls

    MemoryProtocolUtils --> MemoryMetadata : constructs
    MemoryProtocolUtils --> MemoryScope : reads
    MemoryProtocolUtils --> MemoryDomain : formats
    MemoryProtocolUtils --> MemoryVisibility : formats
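A trimmed-down sketch of `MemoryMetadata.from_dict` that tolerates legacy records. Only a subset of the diagram's fields is shown, and the defaults (personal scope, private visibility, seeding `owner_user_ids` from the legacy `user_id`) are assumptions consistent with the backward-compatibility rule, not the module's actual code.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class MemoryMetadata:
    uri: str = ""
    user_id: str = ""
    memory_scope: str = "personal"
    owner_user_ids: list[str] = field(default_factory=list)
    owner_session_id: str = ""
    visibility: str = "private"
    entities: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> MemoryMetadata:
        # Ignore unknown keys and explicit None values from older records.
        known = {f.name for f in fields(cls)}
        meta = cls(**{k: v for k, v in data.items() if k in known and v is not None})
        if not meta.memory_scope:
            meta.memory_scope = "personal"  # legacy records default to personal
        if not meta.owner_user_ids and meta.user_id:
            meta.owner_user_ids = [meta.user_id]  # legacy single owner
        return meta
```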

File-Level Changes

Add scoped memory & visibility model with enriched metadata and formatting.
  • Define MemoryScope, MemoryVisibility, MemoryDomain enums and helper utilities for scope normalization, session_id construction, list normalization, and richer MemoryMetadata with new fields (scope, owners, visibility, subject, entities, topics, memory_content).
  • Adjust metadata construction and pending write flushing to populate new scope/ownership/visibility fields, default legacy memories to personal scope, and persist additional structured fields.
  • Update formatting helpers to render structured memory text: scope/domain/visibility/subject/owners/entities/topics/importance, group injected memories by scope, and show scope in user-facing memory lists with improved content extraction.
  Files: memory_protocol.py, memory_manager.py

Make recall, listing, and visibility logic scope-aware for personal/group/conversation and legacy memories.
  • Introduce scope- and session-aware filters (_scope_filter, _legacy_personal_filter, _build_recall_filters) that combine personal, group, and conversation memories depending on session type and config, with an all_users mode bypassing per-user isolation.
  • Add _retrieve_with_filter and _is_visible_personal_memory to post-filter legacy or group-shared personal memories based on owner_user_id/owner_user_ids and visibility, and deduplicate recall results by uri/text.
  • Change list_memories to use a visibility-aware listing path in non-all_users mode, including group-visible personal memories, deduplicated and bounded by a configurable scan limit.
  Files: memory_manager.py

Enhance memory extraction to be scope-aware and subject/owner aware, leveraging new prompts and conversation snapshots.
  • Track sender_id in request snapshots and include sender labels when rebuilding conversation history, so extraction sees per-sender lines.
  • Extend MEMORY_EXTRACTION_PROMPT with platform/session metadata plus scope/subject/subjects/entities/topics fields and explicit rules for group vs personal/conversation scope selection.
  • Parse LLM extraction output with scope/subject/subjects/entities/topics, normalize by session type, drop invalid group-personal cases, and on storage compute subject(s), owner_sender_ids, and scope-specific subject defaults for personal/group/conversation memories.
  Files: main.py, prompts.py

Improve recall query optimization robustness and configuration plus general sanitization helpers.
  • Introduce configurable timeout (optimize_recall_query_timeout) for recall query optimization, clamping to a safe range and wrapping llm_generate in asyncio.wait_for with timeout handling.
  • Add helpers for sanitizing string lists, clamping timeouts, and normalizing extracted scopes/subjects/subject_ids and current speaker subjects for different scopes.
  Files: main.py, README.md, _conf_schema.json

Versioning, documentation, and CI/config updates for the new behavior.
  • Update README with new installation instructions, configuration options (max_memory_list_scan, optimize_recall_query_timeout), and a description of group chat behavior and scopes.
  • Document v0.3.0 changes in CHANGELOG including scope/visibility model, metadata fields, listing behavior, and configuration additions.
  • Bump plugin metadata version to v0.3.0 and add/adjust CI workflow for code quality checks (ruff, formatting, syntax, metadata).
  Files: README.md, CHANGELOG.md, metadata.yaml, .github/workflows/code-quality.yml, _conf_schema.json
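The post-filter performed by `_is_visible_personal_memory` might look like this sketch. The rule that a group-visible personal memory is shown only within the session recorded in `owner_session_id` is an assumption; the PR does not spell out the exact condition, and the real function takes `owner_user_id` and `require_owner_list` instead of the parameters below.

```python
from __future__ import annotations


def is_visible_personal_memory(
    metadata: dict, viewer_user_id: str, viewer_session_id: str | None = None
) -> bool:
    owners = metadata.get("owner_user_ids") or []
    if isinstance(owners, str):
        owners = [owners]
    # Legacy records only carry a single owner_user_id / user_id.
    legacy_owner = metadata.get("owner_user_id") or metadata.get("user_id")
    if viewer_user_id in owners or (not owners and legacy_owner == viewer_user_id):
        return True
    # Group-visible personal memories are shared with the session they came from.
    return (
        metadata.get("visibility") == "group"
        and viewer_session_id is not None
        and metadata.get("owner_session_id") == viewer_session_id
    )
```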

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In _list_visible_user_documents you hard-code limit=10000 for both personal and group queries; consider making this configurable or deriving it from page_size to avoid potential performance or memory issues on large datasets.
  • The visibility values ("private", "group") are used as raw strings in multiple places (e.g. normalize_visibility, _build_recall_filters, _list_visible_user_documents); introducing a small Visibility enum or constants similar to MemoryScope would reduce the risk of typos and improve readability.
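One way to address the hard-coded `limit=10000` is to derive the scan size from the requested page and cap it with the `max_memory_list_scan` option this PR adds. The sketch below is hypothetical: the default cap of 2000 and the 2x overscan factor (to leave room for rows removed by visibility filtering and de-duplication) are illustrative values, not the plugin's.

```python
DEFAULT_MAX_MEMORY_LIST_SCAN = 2000  # hypothetical default


def memory_list_scan_limit(
    page: int, page_size: int, max_scan: int = DEFAULT_MAX_MEMORY_LIST_SCAN, overscan: int = 2
) -> int:
    """Rows to fetch so that page `page` can still be filled after filtering."""
    page = max(1, page)
    page_size = max(1, page_size)
    needed = page * page_size * overscan
    # Never fetch more than the cap, but always at least one full page.
    return max(page_size, min(needed, max_scan))
```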
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_list_visible_user_documents` you hard-code `limit=10000` for both personal and group queries; consider making this configurable or deriving it from `page_size` to avoid potential performance or memory issues on large datasets.
- The visibility values (`"private"`, `"group"`) are used as raw strings in multiple places (e.g. `normalize_visibility`, `_build_recall_filters`, `_list_visible_user_documents`); introducing a small Visibility enum or constants similar to `MemoryScope` would reduce the risk of typos and improve readability.

## Individual Comments

### Comment 1
<location path="main.py" line_range="653-661" />
<code_context>
+                    DEFAULT_RECALL_QUERY_OPTIMIZATION_TIMEOUT,
+                )
+            )
+            llm_response = await asyncio.wait_for(
+                self.context.llm_generate(
+                    provider_id=provider_id,
</code_context>
<issue_to_address>
**issue (bug_risk):** Timeout handling is likely broken because `TimeoutError` won’t catch `asyncio.wait_for` timeouts.

Because this uses `asyncio.wait_for(...)` but catches the built-in `TimeoutError`, that `except` block will never run and you’ll fall through to the generic `Exception` handler instead. If `TimeoutError` isn’t imported, it will also raise `NameError` on the first timeout. Please catch `asyncio.TimeoutError` explicitly (and import it) or deliberately handle both exception types if that’s what you intend.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread main.py
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a three-layer memory scope model (personal, group, and conversation) to support group chat scenarios, along with enhanced metadata tracking for subjects, entities, and topics. Key updates include scope-aware recall logic, visibility controls, and a configurable timeout for retrieval optimization. Feedback highlights several improvement opportunities: ensuring the conversation scope is included in private chat recall, correcting visibility logic for group-shared memories, and parallelizing database retrievals using asyncio.gather to reduce latency. Additionally, it is recommended to move document filtering to the database level to prevent performance issues and to remove non-semantic metadata from vector text to enhance retrieval quality.

Comment thread memory_manager.py Outdated
Comment on lines +685 to +693
```python
scopes = (
    [normalize_memory_scope(memory_scope)]
    if memory_scope
    else [MemoryScope.PERSONAL]
)
if not memory_scope and parsed.session_type == "group":
    scopes.extend([MemoryScope.GROUP, MemoryScope.CONVERSATION])
elif not memory_scope and not global_memory:
    scopes.append(MemoryScope.CONVERSATION)
```

high

In private chats (parsed.session_type != "group"), if global_memory is enabled (default), the CONVERSATION scope is currently excluded from the default recall list. This means temporary conversation-specific context will be lost in private chats unless global_memory is disabled. CONVERSATION scope should likely be included in the default recall list for both group and private chats.

Suggested change:

```python
scopes = (
    [normalize_memory_scope(memory_scope)]
    if memory_scope
    else [MemoryScope.PERSONAL]
)
if not memory_scope:
    if parsed.session_type == "group":
        scopes.extend([MemoryScope.GROUP, MemoryScope.CONVERSATION])
    else:
        scopes.append(MemoryScope.CONVERSATION)
```

Comment thread memory_manager.py Outdated
Comment on lines +760 to +774
```python
def _is_visible_personal_memory(
    self,
    metadata: dict[str, Any],
    owner_user_id: str | None,
    require_owner_list: bool = False,
) -> bool:
    scope = metadata.get("memory_scope")
    if scope not in (None, "", MemoryScope.PERSONAL):
        return False
    owner_user_ids = metadata.get("owner_user_ids")
    if isinstance(owner_user_ids, list) and owner_user_ids:
        return owner_user_id in owner_user_ids if owner_user_id else True
    if require_owner_list:
        return False
    return metadata.get("owner_user_id", metadata.get("user_id")) == owner_user_id
```

high

The visibility logic in _is_visible_personal_memory appears to restrict group-shared memories (visibility: "group") to only those users listed in owner_user_ids. This contradicts the goal of making group memories visible to all members of the group. If a memory is marked with visibility: "group", it should be considered visible to the current user if the session filter (already applied at the DB level) matches.

```python
def _is_visible_personal_memory(
    self,
    metadata: dict[str, Any],
    owner_user_id: str | None,
    require_owner_list: bool = False,
) -> bool:
    scope = metadata.get("memory_scope")
    if scope not in (None, "", MemoryScope.PERSONAL):
        return False

    # If shared with group, it's visible to everyone in the session
    if metadata.get("visibility") == "group":
        return True

    owner_user_ids = metadata.get("owner_user_ids")
    if isinstance(owner_user_ids, list) and owner_user_ids:
        return owner_user_id in owner_user_ids if owner_user_id else True
    if require_owner_list:
        return False
    return metadata.get("owner_user_id", metadata.get("user_id")) == owner_user_id
```

Comment thread memory_manager.py Outdated
Comment on lines +659 to +670
```python
results = []
for filters, legacy_personal, owner_user_id, require_owner_list in filters_list:
    results.extend(
        await self._retrieve_with_filter(
            query,
            top_k,
            filters,
            legacy_personal=legacy_personal,
            owner_user_id=owner_user_id,
            require_owner_list=require_owner_list,
        )
    )
```

medium

Executing multiple sequential vector database retrievals in a loop can significantly increase latency. Consider using asyncio.gather to parallelize these requests.

Suggested change:

```python
tasks = [
    self._retrieve_with_filter(
        query,
        top_k,
        filters,
        legacy_personal=legacy_personal,
        owner_user_id=owner_user_id,
        require_owner_list=require_owner_list,
    )
    for filters, legacy_personal, owner_user_id, require_owner_list in filters_list
]
results_list = await asyncio.gather(*tasks)
results = [item for sublist in results_list for item in sublist]
```
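A self-contained sketch of the `asyncio.gather` fan-out suggested above, with a hypothetical `retrieve_with_filter` coroutine standing in for `MemoryManager._retrieve_with_filter` (its signature is simplified here and its results are fake). `gather` returns results in the order the awaitables were passed, so flattening yields the same ordering as the sequential loop.

```python
import asyncio
from typing import Any


async def retrieve_with_filter(
    query: str, top_k: int, filters: dict[str, Any]
) -> list[dict[str, Any]]:
    # Hypothetical stand-in: pretend each filter hits the vector DB after
    # a short delay and returns one result tagged with its scope.
    await asyncio.sleep(0.05)
    return [{"query": query, "scope": filters["memory_scope"]}][:top_k]


async def recall(
    query: str, top_k: int, filters_list: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    # Run one retrieval per filter concurrently; gather preserves input order.
    results_list = await asyncio.gather(
        *(retrieve_with_filter(query, top_k, f) for f in filters_list)
    )
    # Flatten the per-filter result lists into a single list.
    return [item for sublist in results_list for item in sublist]
```

By default the first exception raised by any retrieval propagates out of `gather`; passing `return_exceptions=True` would let the other filters' results survive one failing query, at the cost of type-checking each entry before flattening.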

Comment thread memory_manager.py Outdated
Comment on lines +980 to +983
```python
docs = await self.vec_db.document_storage.get_documents(
    metadata_filters=filters,
    limit=10000,
)
```

medium

Fetching up to 10,000 documents into memory just to perform pagination and visibility filtering in Python is inefficient and poses a performance risk as the database grows. If the underlying vector database supports complex metadata filters (like OR or IN), this logic should be moved to the database level. If not, consider a more reasonable limit or a streaming approach.
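One bounded alternative, and roughly the direction the PR later took with its `max_memory_list_scan` option (default 200), is to cap the scan and report when it was cut short. `memory_list_scan_limit` below mirrors the `_memory_list_scan_limit` logic quoted later in this thread; `paginate` and its fetch-one-extra truncation check are assumptions for illustration, not the plugin's implementation.

```python
from typing import Any


def memory_list_scan_limit(config: dict[str, Any], page: int, page_size: int) -> int:
    """Scan cap: the configured limit, raised if the requested page lies beyond it."""
    try:
        configured = int(config.get("max_memory_list_scan", 200))
    except (TypeError, ValueError):
        configured = 200
    configured = max(1, configured)
    needed = max(1, page) * max(1, page_size)
    return max(configured, needed)


def paginate(
    fetched: list[Any], page: int, page_size: int, scan_limit: int
) -> tuple[list[Any], int, bool]:
    """Hypothetical helper returning (page_items, total_scanned, truncated).

    Assumes the caller requested scan_limit + 1 records from storage, so
    receiving more than scan_limit means some records were not scanned.
    """
    truncated = len(fetched) > scan_limit
    scanned = fetched[:scan_limit]
    offset = (max(1, page) - 1) * page_size
    return scanned[offset : offset + page_size], len(scanned), truncated
```

Taking the `max` of the configured cap and `page * page_size` keeps deep pages fillable, while the `truncated` flag lets the command tell the user that the total is a lower bound.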

Comment thread memory_protocol.py Outdated
Comment on lines +266 to +282
```python
lines = [
    f"scope: {meta.memory_scope}",
    f"domain: {domain_label}",
    f"visibility: {meta.visibility}",
    f"memory: {content}",
]
if meta.subject:
    lines.append(f"subject: {meta.subject}")
if meta.owner_user_ids:
    lines.append(f"owners: {', '.join(meta.owner_user_ids)}")
if meta.disclosure:
    lines.append(f"recall_when: {meta.disclosure}")
if meta.entities:
    lines.append(f"entities: {', '.join(meta.entities)}")
if meta.topics:
    lines.append(f"topics: {', '.join(meta.topics)}")
lines.append(f"importance: {meta.importance}")
```

medium

Including non-semantic metadata like importance and visibility directly in the text content stored in the vector database can introduce noise and potentially degrade retrieval quality. These fields are better handled as metadata filters. Consider removing them from the formatted string while keeping them in the metadata dictionary.

Suggested change:

```python
lines = [
    f"scope: {meta.memory_scope}",
    f"domain: {domain_label}",
    f"memory: {content}",
]
if meta.subject:
    lines.append(f"subject: {meta.subject}")
if meta.owner_user_ids:
    lines.append(f"owners: {', '.join(meta.owner_user_ids)}")
if meta.disclosure:
    lines.append(f"recall_when: {meta.disclosure}")
if meta.entities:
    lines.append(f"entities: {', '.join(meta.entities)}")
if meta.topics:
    lines.append(f"topics: {', '.join(meta.topics)}")
```


Copilot AI left a comment


Pull request overview

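This PR introduces a three-layer memory scope model (personal/group/conversation) and a visibility model for group chats in the AstrBot simple long-term memory plugin, together with extended metadata fields, a recall filter pipeline, and updated injection/display formatting. It also bumps the version and updates the documentation.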

Changes:

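  • Adds metadata fields for memory scope and visibility, and applies scope-aware filtering and deduplication on the storage and recall paths.
  • Enhances the extraction prompt: injects session context and supports structured output fields such as scope/subjects/entities/topics.
  • Configuration and documentation updates: adds a retrieval-optimization timeout option, adjusts the default extraction length threshold, and bumps the version to v0.3.0.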

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

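| File | Description |
| --- | --- |
| prompts.py | Extends the memory-extraction prompt: adds session-scope information and structured output fields/rules. |
| metadata.yaml | Bumps the plugin version to v0.3.0. |
| memory_protocol.py | Adds scope/domain enums and metadata fields; adjusts memory-content formatting and groups injected memories by scope for display. |
| memory_manager.py | Adds scope filtering, visibility checks, a recall filter chain, deduplication, and reworked listing logic. |
| main.py | Extraction-result parsing supports scope/subjects/entities/topics; retrieval optimization gains a timeout; conversation snapshots record sender_id. |
| _conf_schema.json | Removes memory_domains; adds the optimize_recall_query_timeout option. |
| README.md | Documents group-chat memory scopes and the new configuration options. |
| CHANGELOG.md | Updates the v0.3.0 release notes. |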
Comments suppressed due to low confidence (1)

memory_manager.py:1645

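  • _flush_pending_writes() builds its semantic-dedup filter from user_id + memory_scope only, which makes deduplication too narrow for the group and conversation scopes: group memories in the same group written by different user_ids cannot be deduplicated against each other, so shared group memories may multiply after a rebuild/migration. Suggest choosing the filter dimension by scope: personal uses owner_user_id (or user_id), group uses owner_session_id, conversation uses umo (adding owner_session_id if necessary), and always include the is_memory_record/deprecated constraints.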
                filters: dict[str, Any] = {
                    "user_id": item["user_id"],
                    "memory_scope": item.get("memory_scope", MemoryScope.PERSONAL),
                    "is_memory_record": True,
                    "deprecated": False,
                }
                candidates = await write_kb.vec_db.retrieve(
                    query=content,
                    k=1,
                    metadata_filters=filters,
                )
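A sketch of the scope-dependent filter selection this comment proposes. The scope strings and the `owner_session_id`/`umo`/`owner_user_id` keys come from the comment; `build_dedup_filters` is a hypothetical name, and falling back to a `user_id` filter for legacy personal items is an assumption.

```python
from typing import Any

# Hypothetical scope names mirroring MemoryScope in memory_protocol.py.
PERSONAL, GROUP, CONVERSATION = "personal", "group", "conversation"


def build_dedup_filters(item: dict[str, Any]) -> dict[str, Any]:
    """Pick the dedup key by scope, always keeping the record/deprecated constraints."""
    scope = item.get("memory_scope") or PERSONAL
    filters: dict[str, Any] = {
        "memory_scope": scope,
        "is_memory_record": True,
        "deprecated": False,
    }
    if scope == GROUP:
        # Shared group memories dedup across all writers in the same group.
        filters["owner_session_id"] = item["owner_session_id"]
    elif scope == CONVERSATION:
        # Conversation context is keyed by the message origin (umo).
        filters["umo"] = item["umo"]
    elif item.get("owner_user_id"):
        # Personal memories dedup per owner.
        filters["owner_user_id"] = item["owner_user_id"]
    else:
        # Legacy personal items only carry user_id.
        filters["user_id"] = item["user_id"]
    return filters
```

With this shape, two group memories written by different members of the same group share one filter and can be deduplicated against each other.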


Comment thread main.py Outdated
Comment on lines +828 to +841
subject=subject,
entities=entities,
topics=topics,
owner_sender_ids=owner_sender_ids,
Comment thread memory_manager.py Outdated
Comment on lines +987 to +999
group_filters = {
"memory_scope": MemoryScope.PERSONAL,
"owner_session_id": build_session_id(parsed.platform_id, parsed.session_id),
"visibility": "group",
"is_memory_record": True,
"deprecated": False,
}
if domain:
group_filters["domain"] = domain
group_docs = await self.vec_db.document_storage.get_documents(
metadata_filters=group_filters,
limit=10000,
)
Comment thread memory_manager.py Outdated
Comment on lines +1003 to +1012
for doc in [*docs, *group_docs]:
metadata = _safe_parse_metadata(doc.get("metadata", {}))
uri = metadata.get("uri") or doc.get("text", "")
if uri in seen:
continue
if self._is_visible_personal_memory(
metadata,
current_user_id,
require_owner_list=doc in group_docs,
):
- memory_protocol: add MemoryVisibility enum (PRIVATE/GROUP) to replace bare strings
- memory_manager: use MemoryVisibility constants; add max_memory_list_scan config
  (caps how many visible memories are scanned in group chats) and the _memory_list_scan_limit() calculation
- main: fix TimeoutError -> asyncio.TimeoutError
- _conf_schema: add max_memory_list_scan (default 200, slider 20-2000)
@piexian
Owner Author

piexian commented May 3, 2026

@sourcery-ai review


@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • There are two very similar helpers for cleaning string lists (_normalize_string_list in memory_protocol.py and _sanitize_string_list in main.py); consider consolidating them into a shared utility to avoid subtle divergence in behavior over time.
  • The recall filter construction around _build_recall_filters / _scope_filter / _legacy_personal_filter has become quite complex; extracting some of the visibility/owner selection rules into smaller, clearly named helpers would make the overall retrieval logic easier to reason about and maintain.
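A sketch of what a single shared helper could look like. `normalize_string_list` is a hypothetical name; accepting a list or a comma-separated string, the 120-character cap, and dropping `"none"` placeholders are borrowed from the `_normalize_subject_ids` snippet and commit notes elsewhere in this PR, since the exact behavior of the two existing helpers is not shown here.

```python
from typing import Any


def normalize_string_list(value: Any, max_len: int = 120) -> list[str]:
    """Coerce a list/tuple or comma-separated string into unique, trimmed strings."""
    if value is None:
        return []
    raw = value if isinstance(value, (list, tuple)) else str(value).split(",")
    result: list[str] = []
    for item in raw:
        if item is None:
            continue
        text = str(item).strip()[:max_len]
        # Drop empties and the "none" placeholder an LLM may emit.
        if not text or text.lower() == "none":
            continue
        # Preserve first-seen order while de-duplicating.
        if text not in result:
            result.append(text)
    return result
```

Both `memory_protocol.py` and `main.py` could then import this one function, so entity, topic, and owner lists are cleaned identically on the write and read paths.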
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There are two very similar helpers for cleaning string lists (`_normalize_string_list` in `memory_protocol.py` and `_sanitize_string_list` in `main.py`); consider consolidating them into a shared utility to avoid subtle divergence in behavior over time.
- The recall filter construction around `_build_recall_filters` / `_scope_filter` / `_legacy_personal_filter` has become quite complex; extracting some of the visibility/owner selection rules into smaller, clearly named helpers would make the overall retrieval logic easier to reason about and maintain.

## Individual Comments

### Comment 1
<location path="memory_protocol.py" line_range="202-211" />
<code_context>
     def from_dict(cls, data: dict[str, Any]) -> MemoryMetadata:
         """Create an instance from a dict (extra keys are ignored; missing keys use default values)"""
         valid = {f.name for f in fields(cls)}
-        return cls(**{k: v for k, v in data.items() if k in valid})
+        values = {k: v for k, v in data.items() if k in valid}
+        values["memory_scope"] = normalize_memory_scope(values.get("memory_scope", ""))
+        values["owner_user_id"] = values.get("owner_user_id") or values.get(
+            "user_id", ""
+        )
+        values["owner_user_ids"] = _normalize_string_list(
+            values.get("owner_user_ids", [])
+        )
+        values["speaker_id"] = values.get("speaker_id") or values.get("sender_id", "")
+        values["entities"] = _normalize_string_list(values.get("entities", []))
+        values["topics"] = _normalize_string_list(values.get("topics", []))
+        return cls(**values)


</code_context>
<issue_to_address>
**suggestion (bug_risk):** Visibility field is not normalized, which may let unexpected values propagate.

In `MemoryMetadata.from_dict` you normalize several fields, but `visibility` is still taken directly from the input. Since downstream logic expects `MemoryVisibility.PRIVATE | GROUP`, arbitrary or misspelled strings here can silently bypass visibility checks. Please normalize `visibility` (e.g., via a helper) and default to `PRIVATE` when the value is invalid.

Suggested implementation:

```python
    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> MemoryMetadata:
        """Create an instance from a dict (extra keys are ignored; missing keys use default values)"""
        valid = {f.name for f in fields(cls)}
        values = {k: v for k, v in data.items() if k in valid}
        values["memory_scope"] = normalize_memory_scope(values.get("memory_scope", ""))
        values["owner_user_id"] = values.get("owner_user_id") or values.get(
            "user_id", ""
        )
        values["owner_user_ids"] = _normalize_string_list(
            values.get("owner_user_ids", [])
        )
        values["speaker_id"] = values.get("speaker_id") or values.get("sender_id", "")
        values["entities"] = _normalize_string_list(values.get("entities", []))
        values["topics"] = _normalize_string_list(values.get("topics", []))
        # Normalize visibility to a supported enum value; invalid or empty values fall back to PRIVATE
        values["visibility"] = normalize_visibility(
            values.get("visibility", MemoryVisibility.PRIVATE)
        )
        return cls(**values)

```

1. Add a `normalize_visibility` helper in `memory_protocol.py` (near `normalize_memory_scope` / `_normalize_string_list`) with logic similar to:
   - Accept `str | MemoryVisibility` (or `Any`) and return a string.
   - If the value is falsy, return `MemoryVisibility.PRIVATE`.
   - If it's already equal to one of the allowed enum values (`MemoryVisibility.PRIVATE`, `MemoryVisibility.GROUP`), return it directly.
   - If it's a string, normalize via `.strip().lower()` and map known aliases (e.g. `"private"`, `"group"`) to the enum values.
   - For anything else (unknown or malformed), default to `MemoryVisibility.PRIVATE`.
2. Ensure `normalize_visibility` is imported or available in the scope where `MemoryMetadata.from_dict` is defined (most likely same module, no extra imports needed).
3. If there are additional visibility-like fields elsewhere that rely on raw strings from external input, consider updating them to use `normalize_visibility` as well for consistency.
</issue_to_address>
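A sketch of the helper described in step 1, assuming `MemoryVisibility` is a `str`-backed enum with values `"private"` and `"group"` (the PR adds an enum with PRIVATE/GROUP members, but its exact definition is not shown here). Unknown input fails closed to `PRIVATE`, so a typo can only make a memory less visible, never more.

```python
from enum import Enum
from typing import Any


class MemoryVisibility(str, Enum):
    # Assumed shape of the enum added in this PR.
    PRIVATE = "private"
    GROUP = "group"


def normalize_visibility(value: Any) -> str:
    """Map arbitrary input to a supported visibility string, defaulting to PRIVATE."""
    if isinstance(value, MemoryVisibility):
        return value.value
    if isinstance(value, str):
        candidate = value.strip().lower()
        for member in MemoryVisibility:
            if candidate == member.value:
                return member.value
    # Falsy, unknown, or malformed values fail closed to the narrowest visibility.
    return MemoryVisibility.PRIVATE.value
```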


Comment thread memory_protocol.py

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.



Comment thread main.py Outdated
Comment on lines +77 to +82
def _normalize_subject_ids(value: Any) -> list[str]:
raw_values = value if isinstance(value, list) else str(value).split(",")
subjects = []
for item in raw_values:
subject = _normalize_subject_id(_sanitize_memory_content(str(item))[:120])
if subject and subject not in {"current_sender", "group", "conversation"}:
Comment thread memory_protocol.py Outdated
Comment on lines +274 to +289
f"scope: {meta.memory_scope}",
f"domain: {domain_label}",
f"visibility: {meta.visibility}",
f"memory: {content}",
]
if meta.subject:
lines.append(f"subject: {meta.subject}")
if meta.owner_user_ids:
lines.append(f"owners: {', '.join(meta.owner_user_ids)}")
if meta.disclosure:
lines.append(f"recall_when: {meta.disclosure}")
if meta.entities:
lines.append(f"entities: {', '.join(meta.entities)}")
if meta.topics:
lines.append(f"topics: {', '.join(meta.topics)}")
lines.append(f"importance: {meta.importance}")
Comment thread memory_manager.py Outdated
Comment on lines +1000 to +1006
group_filters = {
"memory_scope": MemoryScope.PERSONAL,
"owner_session_id": build_session_id(parsed.platform_id, parsed.session_id),
"visibility": MemoryVisibility.GROUP,
"is_memory_record": True,
"deprecated": False,
}
Comment thread memory_manager.py Outdated
Comment on lines +957 to +962
docs = await self._list_visible_user_documents(
event, domain, page=page, page_size=page_size
)
total = len(docs)
offset = (page - 1) * page_size
docs = docs[offset : offset + page_size]
Comment thread memory_manager.py Outdated
Comment on lines +1016 to +1025
for doc in [*docs, *group_docs]:
metadata = _safe_parse_metadata(doc.get("metadata", {}))
uri = metadata.get("uri") or doc.get("text", "")
if uri in seen:
continue
if self._is_visible_personal_memory(
metadata,
current_user_id,
require_owner_list=doc in group_docs,
):
Comment thread memory_manager.py
Comment on lines 1656 to 1662
# Semantic dedup: recall similar memories and skip the write when similarity is high
filters: dict[str, Any] = {
"user_id": item["user_id"],
"memory_scope": item.get("memory_scope", MemoryScope.PERSONAL),
"is_memory_record": True,
"deprecated": False,
}
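- memory_manager: recall and list now use asyncio.gather to query multiple filter sources in parallel
- memory_manager: fix _memory_list_scan_limit logic (max instead of min),
  so that enough records are scanned to fill the current page
- memory_manager: list_memories adds a truncated return value; the user is notified when the scan is truncated
- memory_manager: private chats also include the conversation scope, unifying behavior
- memory_manager: _is_visible_personal_memory adds a fast path for GROUP visibility
- memory_manager: the rebuild semantic-dedup filter now uses scope-specific keys (owner_session_id/umo)
- main: _normalize_subject_ids filters out None, empty values, and the string "none"
- main: standardize on memory_type=MemoryType.NORMAL, keeping its role separate from the domain field
- memory_protocol: add normalize_visibility; from_dict normalizes visibility automatically
- memory_protocol: trim the fields emitted by format_memory_content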

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.



Comment thread memory_manager.py
Comment on lines +592 to 599
visibility=visibility,
subject=subject,
entities=entities,
topics=topics,
memory_content=content,
owner_sender_ids=owner_sender_ids,
speaker_id=owner_sender_ids[0],
)
Comment thread memory_manager.py Outdated
Comment on lines +1081 to +1088
def _memory_list_scan_limit(self, page: int, page_size: int) -> int:
try:
configured = int(self.config.get("max_memory_list_scan", 200))
except (TypeError, ValueError):
configured = 200
configured = max(1, configured)
needed = max(1, page) * max(1, page_size)
return max(configured, needed)
Comment thread memory_protocol.py Outdated
Comment on lines +258 to +292
lines.append(f"entities: {', '.join(meta.entities)}")
if meta.topics:
lines.append(f"topics: {', '.join(meta.topics)}")
return "\n".join(lines)
Comment thread memory_manager.py
Comment on lines +697 to +701
if not memory_scope:
if parsed.session_type == "group":
scopes.extend([MemoryScope.GROUP, MemoryScope.CONVERSATION])
else:
scopes.append(MemoryScope.CONVERSATION)
Comment thread main.py Outdated
Comment on lines +591 to +597
subjects = _normalize_subject_ids(
item.get("subjects", item.get("subject", ""))
)
subject = subjects[0] if subjects else ""
if session_type == "group" and scope == MemoryScope.PERSONAL:
if not subjects:
continue
piexian added 2 commits May 3, 2026 19:57
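- memory_manager: remove _legacy_personal_filter / _is_visible_personal_memory;
  legacy records without memory_scope are no longer handled at runtime and must be backfilled via /memory rebuild
- memory_manager: simplify the signatures of _build_recall_filters and _retrieve_with_filter;
  they no longer take legacy_personal/owner_user_id/require_owner_list parameters
- memory_manager: add _normalize_rebuild_record_metadata(),
  which normalizes legacy records into the v0.3 metadata structure during rebuild (including memory_scope/visibility, etc.)
- memory_manager: add _delete_rebuild_source_records(), which scopes deletion by kb_id and
  handles both is_memory_record records and legacy-format records (uri + kb_id only)
- memory_manager: batched fetching during rebuild now uses a collect_memory_records() closure
  that fetches by is_memory_record and by deprecated=False separately, both scoped by kb_id
- memory_manager: the rebuild integrity check now counts records per kb_id
- main: _normalize_subject_ids now reads the subjects field first and falls back to subject
- README/CHANGELOG: add upgrade notes (upgrading from an older version requires running /memory rebuild)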
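A hedged sketch of the kind of normalization `_normalize_rebuild_record_metadata()` performs. The defaults (legacy records become `personal` and `private`, and `owner_user_id` falls back to `user_id`) follow the compatibility rules described elsewhere in this PR; the function body, and adding `is_memory_record`/`deprecated` defaults for legacy-format records, are assumptions.

```python
from typing import Any


def normalize_rebuild_record_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: upgrade a pre-v0.3 record's metadata in a copy."""
    meta = dict(metadata)  # never mutate the caller's dict
    # Records written before v0.3 have no memory_scope: treat them as personal.
    if not meta.get("memory_scope"):
        meta["memory_scope"] = "personal"
    # Anything other than a known visibility fails closed to private.
    if meta.get("visibility") not in ("private", "group"):
        meta["visibility"] = "private"
    # Older records only stored user_id; copy it into owner_user_id.
    meta["owner_user_id"] = meta.get("owner_user_id") or meta.get("user_id", "")
    # Legacy-format records (uri + kb_id only) may lack these flags entirely.
    meta.setdefault("is_memory_record", True)
    meta.setdefault("deprecated", False)
    return meta
```

Doing this once during `/memory rebuild` is what lets the runtime recall path drop its legacy filters: after the rebuild, every record carries an explicit scope and visibility.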
@piexian piexian merged commit 59ad9c4 into master May 3, 2026
4 checks passed
@piexian piexian deleted the feature/group-chat-memory-scope branch May 13, 2026 22:13