From 420d27dd8cbc130039f63c79e9a454416c50a0e5 Mon Sep 17 00:00:00 2001 From: withchao <993506633@qq.com> Date: Mon, 15 Jun 2026 14:50:52 +0800 Subject: [PATCH 1/2] feat: add incremental version sync documentation for IM systems --- docs/blog/golang/architectural/6.md | 175 ++++++++++++++++++ .../current/golang/architectural/6.md | 175 ++++++++++++++++++ 2 files changed, 350 insertions(+) create mode 100644 docs/blog/golang/architectural/6.md create mode 100644 i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/6.md diff --git a/docs/blog/golang/architectural/6.md b/docs/blog/golang/architectural/6.md new file mode 100644 index 0000000000..44aa781db9 --- /dev/null +++ b/docs/blog/golang/architectural/6.md @@ -0,0 +1,175 @@ +--- +title: 增量版本同步 +hide_title: true +sidebar_position: 6 +--- + +# 增量版本同步能力介绍 + +在 IM 系统里,消息实时送达只是体验的一部分。真正影响用户日常使用的,还有会话列表是否及时更新、好友关系是否一致、群成员变化是否可靠同步、多端登录后数据是否能快速恢复。 + +随着企业组织规模扩大,用户的好友、群组、会话和群成员数据会持续增长。如果每次登录、切换设备或网络恢复时都重新拉取全量数据,客户端启动会变慢,移动端流量和耗电会增加,服务端也会承受大量重复请求。 + +服务端和 SDK 提供了一套增量版本同步机制。它的目标很明确:让客户端只同步真正变化的数据,同时在弱网、离线、多端和异常数据场景下仍然保持最终一致。 + +## 面向大规模 IM 数据的同步能力 + +增量版本同步适用于 IM 中最常见的列表型数据: + +- 会话列表 +- 好友列表 +- 加入的群列表 +- 群成员列表 + +这些数据有一个共同特点:总量可能很大,但单次数据量变化通常非常小。例如新增一个好友、修改一个群成员角色、群成员昵称变化、会话信息更新等。传统全量同步会把整个列表重新拉一遍,而增量版本同步只关注这次变化本身。 + +对用户来说,表现为登录更快、列表刷新更及时、弱网恢复更自然。对服务端来说,则意味着更少的无效流量、更低的数据库压力和更可控的系统负载。 + +## 服务端与 SDK 协同完成同步 + +这套能力不是单靠服务端或 SDK 独立完成,而是由两端协同实现。 + +服务端负责维护权威数据和版本变化。每当好友、群成员等数据发生新增、更新或删除,或会话信息发生更新时,服务端都会推进对应数据集合的版本,并记录这次变化。SDK 则在客户端本地保存上次同步到的版本状态,并根据服务端返回的变化内容更新本地数据库。 + +这种协同可以做到: + +- 客户端不需要反复全量拉取已有数据。 +- 服务端可以准确判断客户端是否落后。 +- 在线通知丢失后,客户端仍然可以通过版本差异补齐。 +- 本地数据异常时,可以自动回到完整校准流程。 + +简单来说,服务端知道“现在最新是什么”,SDK 知道“我本地同步到哪里了”。两边通过版本状态对齐,就能高效完成数据追赶。 + +## 只同步变化,不重复搬运数据 + +增量版本同步最大的产品价值,是减少重复数据传输。 + +当一个用户拥有大量群组和会话时,真正发生变化的可能只是其中一两个条目。客户端不需要为了这一点变化重新下载整个列表,而是通过新增、更新等变化内容完成本地合并;对于支持删除语义的数据,也可以同步删除结果。 + +例如: + +- 新增好友时,只同步新增的好友信息。 +- 好友资料变化时,只同步变化的好友条目。 +- 会话信息变化时,只同步变化的会话条目。 +- 群成员角色或昵称变化时,只同步对应成员变化。 +- 
群成员排序变化时,只刷新必要的顺序信息。 + +这种方式对大型组织尤其重要。用户量越大、群越多、成员越多,增量同步带来的收益越明显。 + +## 登录和冷启动更快 + +IM 客户端启动时,用户最关心的是能不能尽快看到会话、好友和群组。全量同步会让启动流程被大量数据下载拖慢,尤其在移动网络、海外网络或企业私有化部署环境中更明显。 + +增量版本同步让 SDK 可以优先使用本地已有数据展示界面,再在后台检查服务端是否有新变化。大多数情况下,客户端只需要拉取少量变化即可完成刷新。 + +这带来的体验是: + +- 应用打开更快。 +- 列表展示更快。 +- 数据刷新更平滑。 +- 不会因为少量变化阻塞完整页面加载。 + +用户感知到的是“数据很快就出现了”,而不是每次都像重新安装后第一次同步那样等待。 + +## 弱网和离线恢复更可靠 + +移动端环境天然不稳定。用户可能切后台、断网、跨网络切换,也可能长时间离线后重新上线。如果同步机制只依赖在线通知,一旦通知丢失或乱序,客户端就可能漏掉变化。 + +增量版本同步不会把通知当作唯一依据。通知更像是一个提醒:告诉 SDK 可能有新版本需要检查。SDK 会根据本地保存的版本状态和服务端最新版本进行对比,判断是否可以直接应用变化,还是需要重新拉取缺失内容。 + +这意味着: + +- 漏掉一条通知不会导致数据永久不一致。 +- 离线很久后重新上线,也可以从上次版本继续追赶。 +- 网络抖动导致同步中断后,下次仍能恢复。 +- 客户端不会盲目信任过期或乱序通知。 + +对企业 IM 来说,这种能力非常关键。组织关系、群成员权限、会话状态都不能只依赖一次实时通知来保证正确。 + +## 多端数据保持一致 + +企业用户经常同时使用手机、桌面端、网页端等多个终端。不同终端上线时间不同、网络环境不同、本地缓存状态也不同。 + +增量版本同步为每个终端提供独立追赶服务端状态的能力。一个终端离线不会影响另一个终端;某个终端重新上线后,也不需要从头同步所有数据,而是根据自己本地保存的版本继续补齐。 + +这让多端体验更加稳定: + +- 手机端新增好友后,桌面端可以同步到变化。 +- 桌面端处理群成员变更后,移动端可以追上最新状态。 +- 某个端长时间离线后重新登录,也能恢复到服务端一致状态。 +- 不同端不会因为本地缓存差异而长期表现不一致。 + +## 自动识别异常并自我修复 + +增量同步不仅是性能优化,也是一套可靠性机制。 + +在真实环境中,客户端可能出现本地数据库损坏、缓存被清理、历史版本缺失、版本链变化等情况。服务端的历史变更日志也可能因为容量控制或清理策略无法无限保留。 + +在这些情况下,系统不会继续强行套用不完整的增量,而是会自动进入完整校准流程。 + +常见场景包括: + +- 客户端版本与服务端版本不连续。 +- 客户端保存的版本标识与服务端不匹配。 +- 服务端历史变更日志不足以补齐客户端缺口。 +- 客户端本地只有 ID,但缺少对应详情数据。 +- 列表顺序发生变化,需要重新校准顺序。 + +这套机制的价值在于:正常情况下轻量同步,异常情况下自动修复。它既追求效率,也守住数据正确性的底线。 + +## 支持按需补齐,减少等待 + +SDK 本地会保存同步对象的 ID 列表,这让客户端可以先按本地顺序进行分页展示。当某些详情数据缺失时,再按 ID 从服务端补齐。 + +这种方式特别适合群成员列表、好友列表等可能很长的数据。用户进入页面时,不需要一次性下载全部详情;翻页或访问到某段数据时,再补齐缺失项。 + +它带来的好处是: + +- 首屏加载更快。 +- 大列表分页更轻。 +- 本地缓存利用率更高。 +- 网络请求更贴近用户实际访问路径。 + +对移动端和 Web 端来说,这种体验会比一次性全量拉取更加自然。 + +## 服务端压力更可控 + +从服务端角度看,增量版本同步可以显著减少重复读取和重复传输。 + +服务端会维护每个同步集合的最新版本状态,并记录必要的变更日志。客户端请求同步时,服务端优先判断版本是否一致。如果一致,就不需要返回大量数据;如果落后,只返回必要变化;如果无法保证增量完整,再引导客户端全量校准。 + +这种模式让服务端可以更好地控制: + +- 数据库查询压力 +- 网络出口流量 +- 高峰期登录同步压力 +- 大群和大组织的数据同步成本 +- 历史变更日志的存储规模 + +对于私有化部署、企业大客户和高并发场景,这种可控性比单纯堆资源更有长期价值。 + +## 更适合企业级 IM 的长期演进 + +服务端和 SDK 是两个独立代码项目,也可以独立演进。增量版本同步在两端之间形成了一条稳定的能力边界。 + 
+服务端可以持续优化版本日志、缓存、清理策略和数据查询方式;SDK 可以持续优化本地数据库、同步调度、分页补齐和 UI 刷新体验。只要版本同步语义保持稳定,两端就能在各自方向上独立增强。 + +这对产品长期发展很重要。随着业务扩展,新的列表型数据也可以复用类似的同步思路,而不必每增加一种数据就重新设计一套同步机制。 + +## 产品价值总结 + +增量版本同步能力,解决的是 IM 系统中“数据越来越多,但同步必须越来越轻”的问题。 + +它带来的核心价值包括: + +- 更快的登录和启动体验。 +- 更少的流量和电量消耗。 +- 更稳定的弱网和离线恢复能力。 +- 更可靠的多端一致性。 +- 更可控的服务端资源消耗。 +- 更好的大群、大组织支持能力。 +- 异常情况下自动校准,避免长期数据不一致。 + +对于企业级 IM 产品来说,这不是一个单纯的技术优化点,而是一项直接影响用户体验、系统稳定性和规模化能力的基础功能。 + +通过服务端版本管理和 SDK 本地同步能力的配合,客户端在大多数情况下只同步变化,在必要时自动完整校准,从而在效率和可靠性之间取得平衡。这也是系统能够支撑多端、弱网、大规模组织关系和复杂群场景的重要基础。 diff --git a/i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/6.md b/i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/6.md new file mode 100644 index 0000000000..a24f0af2a5 --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/6.md @@ -0,0 +1,175 @@ +--- +title: Incremental Version Sync +hide_title: true +sidebar_position: 6 +--- + +# Incremental Version Sync Capability + +In an IM system, real-time message delivery is only one part of the user experience. Users also care about whether the conversation list is up to date, whether friend relationships remain consistent, whether group member changes are synchronized reliably, and whether data can recover quickly after logging in on multiple devices. + +As an organization grows, the amount of friend, group, conversation, and group member data keeps increasing. If the client has to pull all data again after every login, device switch, or network recovery, startup becomes slower, mobile data and battery usage increase, and the server has to process a large number of repeated requests. + +The server and SDK provide an incremental version sync mechanism. Its goal is straightforward: the client should synchronize only the data that actually changed, while still maintaining eventual consistency in weak network, offline, multi-device, and abnormal local data scenarios. 
+ +## Sync Built for Large-Scale IM Data + +Incremental version sync applies to the most common list-style data in IM systems: + +- Conversation lists +- Friend lists +- Joined group lists +- Group member lists + +These data sets share one important pattern: the total amount of data can be large, but the amount of data changed in a single update is usually very small. Examples include adding a friend, changing a group member role, updating a group member nickname, or updating conversation information. Traditional full sync would pull the entire list again, while incremental version sync focuses only on the actual change. + +For users, this means faster login, more timely list updates, and smoother recovery in weak network conditions. For the server, it means less redundant traffic, lower database pressure, and more predictable system load. + +## Server and SDK Work Together + +This capability is not implemented by the server or SDK alone. It depends on coordinated behavior on both sides. + +The server maintains authoritative data and version changes. When friend or group member data is inserted, updated, or deleted, or when conversation information is updated, the server advances the version of the corresponding data set and records the change. The SDK stores the last synchronized version state locally and updates the local database according to the changes returned by the server. + +This coordination makes it possible to: + +- Avoid repeatedly pulling data that already exists locally. +- Let the server accurately determine whether the client is behind. +- Allow the client to recover missing changes through version differences after online notifications are lost. +- Automatically fall back to a full correction flow when local data is abnormal. + +In simple terms, the server knows "what is current", and the SDK knows "where the local client stopped". By aligning through version state, both sides can catch up efficiently. 
+
+## Sync Changes Only, Avoid Repeated Data Transfer
+
+The main product value of incremental version sync is reducing repeated data transfer.
+
+When a user has many groups and conversations, the actual change may involve only one or two items. The client does not need to download the entire list again for such a small change. Instead, it merges the inserted and updated items into its local data; for data types with deletion semantics, deletions are synchronized as well.
+
+For example:
+
+- When a friend is added, only the new friend information is synchronized.
+- When friend information changes, only the changed friend item is synchronized.
+- When conversation information changes, only the changed conversation item is synchronized.
+- When a group member role or nickname changes, only the corresponding member change is synchronized.
+- When group member ordering changes, only the necessary ordering information is refreshed.
+
+This is especially important for large organizations. The more users, groups, and members there are, the more obvious the benefit of incremental sync becomes.
+
+## Faster Login and Cold Start
+
+When an IM client starts, users expect to see conversations, friends, and groups as quickly as possible. Full sync can slow down startup because it requires downloading a large amount of data, especially on mobile networks, cross-region networks, or private deployment environments.
+
+Incremental version sync allows the SDK to display existing local data first, then check in the background whether the server has new changes. In most cases, the client only needs to pull a small number of changes to complete the refresh.
+
+This improves the experience in several ways:
+
+- The app opens faster.
+- Lists appear faster.
+- Data refresh feels smoother.
+- Small changes do not block the full page from loading.
+ +From the user's perspective, the data appears quickly instead of making every startup feel like the first sync after a fresh installation. + +## More Reliable Recovery in Weak Networks and Offline Scenarios + +Mobile environments are naturally unstable. Users may switch the app to the background, lose network access, move between networks, or come back online after being offline for a long time. If synchronization depends only on online notifications, a lost or out-of-order notification can cause the client to miss changes. + +Incremental version sync does not treat notifications as the only source of truth. A notification is more like a trigger: it tells the SDK that a new version may need to be checked. The SDK compares the locally stored version state with the latest server version, then decides whether it can apply the change directly or needs to pull missing data again. + +This means: + +- Losing one notification does not cause permanent data inconsistency. +- A client that has been offline for a long time can continue catching up from its last version. +- If synchronization is interrupted by network jitter, the next attempt can still recover. +- The client does not blindly trust expired or out-of-order notifications. + +For enterprise IM, this is critical. Organization relationships, group member permissions, and conversation state should not rely on a single real-time notification for correctness. + +## Consistent Data Across Multiple Devices + +Enterprise users often use mobile, desktop, and web clients at the same time. Different devices may come online at different times, use different networks, and have different local cache states. + +Incremental version sync gives each device an independent way to catch up with server state. One offline device does not affect another device. When a device comes back online, it does not need to synchronize everything from the beginning; it can continue from its locally stored version. 
+
+This makes the multi-device experience more stable:
+
+- After a friend is added on mobile, the desktop client can synchronize the change.
+- After group member changes are handled on desktop, the mobile client can catch up.
+- A device that has been offline for a long time can recover to the server's current state after login.
+- Different devices do not remain inconsistent because of local cache differences.
+
+## Detect Abnormal States and Self-Heal
+
+Incremental sync is not only a performance optimization. It is also a reliability mechanism.
+
+In production, the client may encounter local database corruption, cleared caches, missing historical versions, or changes to the version chain. Server-side historical change logs may also be limited by capacity controls or cleanup policies.
+
+In these cases, the system does not force incomplete incremental changes onto the client. Instead, it automatically enters a full correction flow.
+
+Common scenarios include:
+
+- The client version is not continuous with the server version.
+- The version identifier stored by the client does not match the server.
+- Server-side historical change logs are not enough to fill the client's gap.
+- The client has an ID locally but lacks the corresponding detail data.
+- The list order has changed and must be recalibrated.
+
+The value of this mechanism is clear: normal cases stay lightweight, and abnormal cases repair themselves. It stays efficient without giving up correctness.
+
+## Fill Missing Details on Demand
+
+The SDK stores the ID list of synchronized objects locally. This allows the client to paginate and display data according to local ordering first. If some detail records are missing, the SDK can fetch them from the server by ID.
+
+This approach is especially useful for long group member lists and friend lists. When users enter a page, the client does not need to download every detail record at once.
It can fill missing items only when the user reaches that part of the list. + +The benefits are: + +- Faster first-screen loading. +- Lighter pagination for large lists. +- Better use of local cache. +- Network requests that better match the user's actual access path. + +For mobile and web clients, this feels more natural than pulling everything at once. + +## More Predictable Server Load + +From the server side, incremental version sync significantly reduces repeated reads and repeated data transfer. + +The server maintains the latest version state for each synchronized data set and records the necessary change logs. When the client requests synchronization, the server first checks whether the versions are already aligned. If they are aligned, there is no need to return a large data payload. If the client is behind, the server returns only the necessary changes. If complete incremental sync cannot be guaranteed, the server guides the client into full correction. + +This model helps the server control: + +- Database query pressure +- Network egress traffic +- Login-time sync pressure during traffic peaks +- Sync cost for large groups and large organizations +- Storage size of historical change logs + +For private deployments, enterprise customers, and high-concurrency scenarios, this predictability has more long-term value than simply adding resources. + +## Better for Long-Term Enterprise IM Evolution + +The server and SDK are independent code projects and can evolve independently. Incremental version sync forms a stable capability boundary between the two sides. + +The server can continue optimizing version logs, caches, cleanup strategies, and data query methods. The SDK can continue optimizing local databases, sync scheduling, on-demand detail filling, and UI refresh behavior. As long as the version sync semantics remain stable, both sides can improve independently. + +This matters for long-term product growth. 
As new business features are added, other list-style data can reuse a similar sync model instead of designing a new synchronization mechanism for every data type. + +## Product Value Summary + +Incremental version sync solves a core IM system problem: data keeps growing, but synchronization must stay lightweight. + +Its core value includes: + +- Faster login and startup experience. +- Lower data and battery consumption. +- More stable recovery in weak network and offline scenarios. +- More reliable multi-device consistency. +- More predictable server resource usage. +- Better support for large groups and large organizations. +- Automatic correction in abnormal cases to avoid long-term inconsistency. + +For enterprise IM products, this is not just a technical optimization. It is a foundational capability that directly affects user experience, system stability, and scalability. + +By combining server-side version management with SDK-side local synchronization, the client synchronizes only changes in most cases and automatically performs full correction when necessary. This balances efficiency and reliability, and it is an important foundation for supporting multiple devices, weak networks, large-scale organization relationships, and complex group scenarios. 
From dad28a5d3eda8f27f84038126ca02aac362d96e0 Mon Sep 17 00:00:00 2001 From: withchao <993506633@qq.com> Date: Mon, 15 Jun 2026 17:38:00 +0800 Subject: [PATCH 2/2] feat: add Write Diffusion Sync capability documentation --- docs/blog/golang/architectural/7.md | 209 ++++++++++++++++++ .../current/golang/architectural/7.md | 209 ++++++++++++++++++ 2 files changed, 418 insertions(+) create mode 100644 docs/blog/golang/architectural/7.md create mode 100644 i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/7.md diff --git a/docs/blog/golang/architectural/7.md b/docs/blog/golang/architectural/7.md new file mode 100644 index 0000000000..822f18bc60 --- /dev/null +++ b/docs/blog/golang/architectural/7.md @@ -0,0 +1,209 @@ +--- +title: 写扩散同步 +hide_title: true +sidebar_position: 7 +--- + +# 写扩散同步能力介绍 + +> 说明:写扩散同步是高级版 3.8.6 新增能力,开源免费版本不支持。需要同时使用支持该能力的服务端和 SDK,才能获得完整的同步优化效果。 + +在 IM 系统里,会话同步是一条非常核心的链路。用户登录、断线重连、应用从后台回到前台、多端切换时,SDK 都需要确认本地会话状态是否和服务端一致。 + +传统的增量版本同步已经能避免很多全量拉取,但当用户会话数量很多、群消息活跃、未读数和会话排序频繁变化时,客户端仍然可能需要比较较多会话状态。对于大型组织、客服场景、运营账号、机器人账号以及多群活跃用户来说,这类同步成本会被进一步放大。 + +写扩散同步就是为了解决这个问题:在服务端写入消息或会话状态发生变化时,提前把“哪些用户的哪些会话发生变化”记录下来。客户端同步时,不再优先扫描和比较完整会话集合,而是直接领取自己需要处理的变化清单。 + +简单来说,过去更像是“客户端问服务端:我有哪些会话变了?”,现在变成“服务端在变化发生时就把结果记好,客户端上线后直接拿走”。这让同步数据量进一步减少,同步速度也更快。 + +这项能力依赖服务端写入侧的变更标记、写扩散同步服务、消息写入链路、长连接混合同步入口,以及 SDK 本地快速同步处理。它不是单个接口或单个配置项,而是一套服务端与 SDK 协同的同步方案。 + +## 它解决了什么问题 + +会话列表看起来只是一个列表,但它背后包含很多需要同步的状态: + +- 会话是否有新消息。 +- 当前会话最新消息序号。 +- 会话最新活跃时间。 +- 已读序号和未读数。 +- 会话是否置顶。 +- 会话所在分组或相关用户模块是否变化。 + +在普通账号上,这些同步压力可能不明显。但在以下场景中,成本会迅速上升: + +- 一个用户加入大量群。 +- 一个企业账号拥有大量会话。 +- 大群消息非常活跃。 +- 用户长时间离线后重新登录。 +- 多端频繁切换,多个终端都需要追赶状态。 +- 会话列表很长,但真正变化的只是少数几个会话。 + +如果每次都围绕完整会话集合做比较,很多计算和数据传输其实是重复的。写扩散同步的价值,就是把同步目标从“检查所有会话”收缩到“处理已经确认发生变化的会话”。 + +## 写扩散的核心思路 + +写扩散同步的核心是“变化发生时就标记”。 + +当服务端处理消息写入、通知写入、已读序号变化或用户会话关系变化时,会把变化扩散到用户维度的同步记录中。这个记录并不是完整会话数据,而是轻量的变化索引:某个用户的某个会话需要同步。 + +这样一来,SDK 下次同步时可以直接按用户读取变化索引。服务端再根据这些变化会话返回必要的会话同步状态。客户端只处理这批会话,不需要为了少量变化重新比较完整列表。 + +这是一种非常适合 IM 的优化方式。因为 
IM 的会话总量可能很大,但单次真正变化的会话通常很少。写扩散同步正是利用了这个特点。 + +## 服务端侧能力 + +服务端在写扩散同步里承担“提前记录变化”的角色。 + +当消息进入存储链路后,服务端会识别消息所属会话和通知会话,并将这些会话作为同步变化写入写扩散链路。对于单聊、普通会话和适合写扩散的群会话,服务端可以将变化标记到相关用户名下。 + +对于群会话,服务端还会结合会话同步策略判断是否采用写扩散。小群或活跃度适中的群更适合写扩散,因为把变化提前标记给成员,后续每个成员同步时都可以直接拿到自己的变化清单。 + +服务端还提供了独立的写扩散同步服务,用来保存和读取用户维度的会话变化。它会维护每个用户的同步版本和变化集合,并在变化过多、版本不连续或同步状态不可靠时让客户端进入完整校准流程。 + +这样设计有几个明显好处: + +- 写入时就沉淀变化,减少后续读侧计算。 +- 同步时只返回变化会话,减少传输数据。 +- 每个用户都有自己的同步游标,适合多端独立追赶。 +- 异常情况下仍然可以回到完整校准,保证正确性。 + +## SDK 侧能力 + +SDK 在写扩散同步里承担“快速消费变化”的角色。 + +SDK 本地会保存会话同步状态,包括本地已经同步到的写扩散版本、会话同步元数据和待处理会话队列。当用户登录、重连或触发快速同步时,SDK 会通过混合同步链路一次性处理多类同步信息。 + +这条混合同步链路会尽量把多个同步动作合并在一次往返里完成,例如: + +- 拉取写扩散会话变化。 +- 同步用户模块变化。 +- 获取置顶会话信息。 +- 必要时进行快照校准。 + +当服务端返回写扩散变化后,SDK 会把变化会话写入本地待处理队列,再由后续消息同步和会话刷新流程消费。已经成功处理的会话会被确认,失败的会话会保留为待重试状态,避免因为网络中断或进程退出导致同步丢失。 + +这让 SDK 的同步行为更轻、更快,也更稳定。 + +## 与普通增量同步的区别 + +普通增量同步的重点是“按版本拉差异”。它已经比全量同步轻很多,但在会话规模很大时,仍然可能需要服务端根据版本、快照或本地状态做较多判断。 + +写扩散同步更进一步,把差异生成的时机提前到了写入阶段。 + +可以这样理解: + +- 普通增量同步:读的时候计算或查询差异。 +- 写扩散同步:写的时候记录差异,读的时候直接领取。 + +这个变化带来的收益很直接。对于客户端来说,同步请求返回的数据更少;对于服务端来说,登录和重连时的读侧压力更低;对于用户来说,会话刷新速度更快。 + +## 混合同步:不只是一条写扩散 + +这套会话同步并不是只依赖写扩散。它采用的是混合同步思路:能走写扩散就优先走写扩散,写扩散不够时再使用快照 hash、分页校准和完整校准作为兜底。 + +这套组合可以覆盖更多真实场景: + +- 写扩散有连续版本时,直接同步少量变化会话。 +- 本地会话状态可能偏离时,用快照校准恢复一致。 +- 会话数量很大时,用分段 hash 判断哪一段发生变化,只拉变化区域。 +- 新安装或本地数据异常时,走完整校准重新建立本地状态。 + +因此,写扩散同步并不是牺牲正确性换速度。它是在正常场景下尽量少同步,在异常场景下仍然能够回到可靠的恢复路径。 + +## 大群场景下的策略切换 + +写扩散并不适合所有群会话。 + +对于小群或中等规模群,把变化写给每个成员是划算的。成员数量有限,服务端写入时多做一点标记,换来的是每个成员后续同步都更快。 + +但对于特别大的群,如果每条消息都向所有成员写扩散,写入成本可能过高。这时系统会根据群规模、近期活跃度、消息量等因素切换同步策略。 + +大体上可以理解为: + +- 适合写扩散的会话:写入时标记到用户,用户同步时直接拿变化。 +- 超大或高活跃会话:减少写入侧扩散压力,更多依赖读侧校准和快照能力。 +- 活跃度下降后:可以再回到写扩散模式,让后续同步更轻。 + +这种策略切换让系统可以在“写入成本”和“同步速度”之间动态平衡,而不是对所有群采用同一种固定方案。 + +## 为什么同步数据量会进一步减少 + +写扩散同步减少数据量的原因主要有三点。 + +第一,变化范围更小。 +客户端不再围绕完整会话集合做同步,而是优先处理服务端已经标记好的变化会话。一个用户有几千个会话,但本轮只变了几个会话时,同步数据量可以明显下降。 + +第二,变化内容更轻。 +写扩散记录本身是轻量索引,真正返回给 SDK 的也是必要的会话同步状态,而不是完整会话列表或大量无变化数据。 + +第三,兜底更精细。 +即使需要校准,也不一定全量拉取。SDK 可以通过分段 hash 判断本地和服务端哪些区域一致,只有不一致的区域才需要进一步拉取。 
+ +这三个因素叠加后,登录、重连、前后台切换、多端追赶等场景都会更轻。 + +## 对用户体验的提升 + +写扩散同步最终改善的是用户能感知到的体验。 + +用户打开应用时,会话列表可以更快恢复到最新状态。网络不稳定时,SDK 不需要反复拉取大量会话数据。长时间离线后重新上线,客户端也可以优先追赶实际变化的会话,而不是对所有会话重新做一轮重比较。 + +对于大型企业用户,这种体验尤其明显: + +- 登录后会话列表刷新更快。 +- 未读数和最新消息状态更新更及时。 +- 多端切换后状态追赶更平滑。 +- 大量会话账号不容易在同步阶段卡顿。 +- 弱网下不必要的数据传输更少。 + +## 对服务端的价值 + +写扩散同步不仅优化客户端,也优化服务端。 + +传统同步压力往往集中在用户上线、重连、批量登录或网络恢复时。大量客户端同时询问“我有哪些会话变了”,服务端需要做比较、查询和组装结果。 + +写扩散把一部分工作前移到写入阶段,让读侧同步请求变得更简单。服务端可以直接根据用户维度的变化记录返回结果,从而降低同步高峰时的数据库和计算压力。 + +同时,服务端通过同步策略控制哪些会话适合写扩散,哪些会话更适合读侧校准,避免在超大群场景下把写入链路压得过重。 + +## 异常恢复能力 + +写扩散同步必须保证一件事:快,但不能丢正确性。 + +因此,这套能力保留了多层兜底: + +- 版本不连续时,进入完整校准。 +- 写扩散变化过多时,回退到快照同步。 +- 本地会话状态不可信时,使用快照校准修正。 +- 同步过程中失败的会话会进入待重试队列。 +- 下游处理成功后才确认状态,避免中途失败导致数据丢失。 + +这让写扩散同步具备自我修复能力。它不是只在理想网络下有效,而是能适应移动端常见的断网、重启、切后台和多端并发使用。 + +## 适用场景 + +写扩散同步尤其适合以下场景: + +- 企业组织会话数量多。 +- 用户加入大量群。 +- 账号经常多端登录。 +- 会话列表很长,但单次变化很少。 +- 客服、运营、机器人等账号需要快速追赶会话状态。 +- 私有化部署中需要降低登录和重连同步压力。 +- 对弱网恢复速度和会话一致性要求较高。 + +如果业务规模较小,普通增量同步已经可以满足大多数需求。但当会话规模、群规模和多端使用频率上来后,写扩散同步的收益会更加明显。 + +## 总结 + +写扩散同步是在原有增量版本同步基础上的进一步升级。它把“差异发现”从读侧前移到写侧,让服务端在消息和会话变化发生时就记录用户维度的变化索引,SDK 同步时直接领取并处理。 + +它的核心价值包括: + +- 进一步减少同步数据量。 +- 加快登录、重连和前后台恢复时的会话同步速度。 +- 降低服务端同步高峰压力。 +- 支持大规模会话和多端追赶。 +- 对异常场景保留快照、分页校准和完整校准兜底。 +- 通过策略切换兼顾小群写扩散效率和大群写入成本。 + +对于需要支撑大组织、大量会话、多端登录和更快同步体验的企业级场景,这项能力可以显著提升会话同步效率和系统稳定性。 diff --git a/i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/7.md b/i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/7.md new file mode 100644 index 0000000000..5b260ca7e1 --- /dev/null +++ b/i18n/en/docusaurus-plugin-content-docs-blog/current/golang/architectural/7.md @@ -0,0 +1,209 @@ +--- +title: Write Diffusion Sync +hide_title: true +sidebar_position: 7 +--- + +# Write Diffusion Sync Capability + +> Note: Write diffusion sync is a new capability in Advanced Edition 3.8.6. It is not supported in the open-source free edition. The server and SDK must both support this capability to achieve the complete end-to-end sync optimization. 
+
+In an IM system, conversation sync is a core part of the user experience. When users log in, reconnect, return from the background, or switch between devices, the SDK needs to confirm that the local conversation state is consistent with the server.
+
+Traditional incremental version sync already avoids many full pulls. However, the client may still need to compare a large amount of conversation state when a user has many conversations and active groups, and when unread counts and conversation ordering change constantly. For large organizations, customer service scenarios, operations accounts, bot accounts, and users active in many groups, this sync cost becomes even more noticeable.
+
+Write diffusion sync is designed to solve this problem. When messages are written or conversation state changes on the server, the system records at that moment which conversations changed for which users. During sync, the client no longer starts by scanning and comparing the full conversation set. Instead, it directly consumes the list of changes it needs to handle.
+
+In simple terms, the old model was closer to "the client asks the server which conversations changed." The new model is "the server records the result when the change happens, and the client picks it up after coming online." This further reduces the amount of sync data and makes synchronization faster.
+
+This capability depends on change marking on the server write path, the write diffusion sync service, the message write flow, the hybrid sync entry point on the persistent connection, and fast local sync processing in the SDK. It is not a single API or configuration item, but a complete sync solution implemented jointly by the server and SDK.
+
+## What It Solves
+
+A conversation list looks like a simple list, but it contains many states that need to stay synchronized:
+
+- Whether the conversation has new messages.
+- The latest message sequence in the conversation.
+- The latest active time of the conversation. +- Read sequence and unread count. +- Whether the conversation is pinned. +- Whether conversation groups or related user modules changed. + +For ordinary accounts, this pressure may not be obvious. In the following scenarios, however, the cost grows quickly: + +- A user joins a large number of groups. +- An enterprise account owns many conversations. +- Large groups are highly active. +- A user logs in again after being offline for a long time. +- Multiple devices switch frequently, and every device needs to catch up. +- The conversation list is long, but only a few conversations actually changed. + +If every sync still revolves around comparing the complete conversation set, much of the computation and data transfer is repeated work. The value of write diffusion sync is to shrink the sync target from "check all conversations" to "process conversations that are already known to have changed." + +## Core Idea + +The core idea of write diffusion sync is to mark changes when they happen. + +When the server processes message writes, notification writes, read sequence changes, or user-conversation relationship changes, it diffuses those changes into user-level sync records. These records are not full conversation data. They are lightweight change indexes that indicate a specific conversation needs to be synchronized for a specific user. + +As a result, the SDK can read the change index directly by user during the next sync. The server then returns the necessary conversation sync state based on those changed conversations. The client only processes this set of conversations and does not need to compare the full list again for a small number of changes. + +This optimization fits IM systems very well. The total number of conversations can be large, while the number of conversations that actually change in a single sync is usually small. Write diffusion sync takes advantage of exactly that pattern. 
+
+## Server-Side Capability
+
+In write diffusion sync, the server is responsible for recording changes in advance.
+
+After a message enters the storage path, the server identifies the conversation the message belongs to and the related notification conversation, then writes those conversations into the write diffusion path as sync changes. For one-to-one conversations, ordinary conversations, and group conversations suitable for write diffusion, the server can mark the changes under the related users.
+
+For group conversations, the server also evaluates the conversation sync policy to decide whether write diffusion should be used. Small groups or groups with moderate activity are well suited for this model: the server marks changes for members at write time, and each member later receives a ready-made change list during sync.
+
+The server also provides an independent write diffusion sync service that stores and reads user-level conversation changes. It maintains each user's sync version and change set. When there are too many changes, the versions are not continuous, or the sync state is unreliable, it directs the client into a full correction flow.
+
+This design brings several clear benefits:
+
+- Changes are captured at write time, reducing later read-side computation.
+- Sync returns only changed conversations, reducing data transfer.
+- Each user has an independent sync cursor, which suits multi-device catch-up.
+- Abnormal cases can still fall back to full correction to preserve correctness.
+
+## SDK-Side Capability
+
+In write diffusion sync, the SDK is responsible for consuming changes quickly.
+
+The SDK stores local conversation sync state, including the write diffusion version it has already synchronized, conversation sync metadata, and the pending conversation queue. When the user logs in, reconnects, or triggers fast sync, the SDK uses a hybrid sync path to process multiple types of sync information in a single pass.
+
+This hybrid sync path tries to combine multiple sync actions into a single round trip, such as:
+
+- Pulling write diffusion conversation changes.
+- Synchronizing user module changes.
+- Fetching pinned conversation information.
+- Performing snapshot correction when needed.
+
+After the server returns write diffusion changes, the SDK writes the changed conversations into the local pending queue, which later message sync and conversation refresh flows consume. Conversations that have been processed successfully are acknowledged, while failed ones remain queued for retry, so a network interruption or process exit does not lose changes.
+
+This makes SDK sync lighter, faster, and more stable.
+
+## Difference From Ordinary Incremental Sync
+
+Ordinary incremental sync focuses on pulling differences by version. It is already much lighter than full sync, but at large conversation counts the server may still have to do substantial work comparing versions, snapshots, or local state.
+
+Write diffusion sync goes one step further by moving difference generation to the write stage.
+
+One way to understand it is:
+
+- Ordinary incremental sync: calculate or query differences during reads.
+- Write diffusion sync: record differences during writes and consume them directly during reads.
+
+The benefit is direct. For the client, sync responses contain less data. For the server, read-side pressure during login and reconnect is lower. For users, conversation refresh becomes faster.
+
+## Hybrid Sync: More Than Write Diffusion
+
+This conversation sync approach does not rely only on write diffusion. It uses a hybrid strategy: use write diffusion first when possible, and fall back to snapshot hash comparison, paged correction, and full correction when write diffusion is not enough.
+
+This combination covers more real-world scenarios:
+
+- When write diffusion versions are continuous, synchronize a small number of changed conversations directly.
+- When local conversation state may have drifted, use snapshot correction to restore consistency. +- When the conversation count is large, use segmented hash comparison to identify changed ranges and pull only those ranges. +- For fresh installs or abnormal local data, use full correction to rebuild local state. + +Therefore, write diffusion sync does not trade correctness for speed. It minimizes sync work in normal cases while preserving reliable recovery paths in abnormal cases. + +## Strategy Switching for Large Groups + +Write diffusion is not suitable for every group conversation. + +For small and medium-sized groups, writing changes to each member is worthwhile. The member count is limited, and doing a little more work during writes makes later sync faster for every member. + +For very large groups, however, diffusing every message to every member may create too much write-side cost. In these cases, the system can switch sync strategies according to group size, recent activity, and message volume. + +At a high level: + +- Conversations suitable for write diffusion: mark changes under users at write time, then let users pull changes directly during sync. +- Very large or highly active conversations: reduce write-side diffusion pressure and rely more on read-side correction and snapshot capabilities. +- After activity decreases: return to write diffusion mode so later sync becomes lighter again. + +This strategy switching lets the system balance write cost and sync speed dynamically, instead of applying one fixed approach to every group. + +## Why Sync Data Is Further Reduced + +Write diffusion sync reduces data volume for three main reasons. + +First, the change scope is smaller. +The client no longer synchronizes around the complete conversation set. It first processes conversations that the server has already marked as changed. When a user has thousands of conversations but only a few changed in the current round, sync data can drop significantly. 

Second, the change content is lighter.
The write diffusion record itself is only a lightweight index, and the SDK receives just the conversation sync state it needs rather than a full conversation list or large amounts of unchanged data.

Third, fallback is more precise.
Even when correction is needed, it does not always require a full pull. The SDK can use segmented hash comparison to determine which ranges are consistent between the local client and the server, then pull only the inconsistent ranges.

Together, these factors make login, reconnect, foreground recovery, and multi-device catch-up much lighter.

## User Experience Improvements

Ultimately, write diffusion sync delivers improvements that users can feel.

When users open the app, the conversation list catches up to the latest state faster. On unstable networks, the SDK does not need to repeatedly pull large amounts of conversation data. After a long time offline, the client can prioritize the conversations that actually changed instead of re-comparing every conversation.

For large enterprise users, the improvement is especially clear:

- Conversation lists refresh faster after login.
- Unread counts and latest message states update more promptly.
- State catch-up is smoother when switching between devices.
- Accounts with many conversations are less likely to stall during sync.
- Less unnecessary data is transferred under weak network conditions.

## Value for the Server

Write diffusion sync benefits the server as well as the client.

With traditional sync, load tends to concentrate when users come online, reconnect, log in in batches, or recover from network interruptions. Many clients ask the server at the same time which conversations have changed, and the server has to compare, query, and assemble the results.

Write diffusion moves part of that work to the write stage, making read-side sync requests simpler.
The server can return results directly from user-level change records, reducing database and computation pressure during sync peaks.

At the same time, the server decides which conversations are suitable for write diffusion and which are better handled by read-side correction, so very large groups do not overload the write path.

## Abnormal Recovery

Write diffusion sync must guarantee one thing: it can be fast, but never at the expense of correctness.

For this reason, it keeps multiple fallback paths:

- Enter full correction when versions are not continuous.
- Fall back to snapshot sync when there are too many write diffusion changes.
- Use snapshot correction when local conversation state is unreliable.
- Keep failed conversations in a retry queue during sync.
- Acknowledge state only after downstream processing succeeds, so an interruption does not cause data loss.

This gives write diffusion sync the ability to recover on its own. It is not limited to ideal network conditions: it handles common mobile scenarios such as disconnection, app restarts, switching to the background, and concurrent use on multiple devices.

## Applicable Scenarios

Write diffusion sync is especially suitable for:

- Enterprise organizations with many conversations.
- Users who join many groups.
- Accounts that often log in on multiple devices.
- Long conversation lists where each sync changes only a few conversations.
- Customer service, operations, and bot accounts that need to catch up quickly.
- Private deployments that need to reduce login and reconnect sync pressure.
- Products with high requirements for weak-network recovery speed and conversation consistency.

At a small business scale, ordinary incremental sync already covers most needs. As the number of conversations, group sizes, and multi-device usage grow, however, the benefits of write diffusion sync become more pronounced.
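
The fallback paths listed under Abnormal Recovery can be read as a small decision procedure that runs before each sync round. The Go sketch below assumes two things:

- The server exposes its latest version and the oldest version still retained in its change log.
- The client reports its last synced version, plus a flag for local state it no longer trusts.

`SyncState`, `chooseMode`, and `maxIncremental` are hypothetical names, and the threshold value is a placeholder rather than a recommended setting.

```go
package main

import "fmt"

// Mode is the sync path chosen for one round. (Hypothetical names.)
type Mode string

const (
	ModeUpToDate    Mode = "up-to-date"
	ModeIncremental Mode = "write-diffusion-incremental"
	ModeSnapshot    Mode = "snapshot-correction"
	ModeFull        Mode = "full-correction"
)

// SyncState gathers what the decision needs from both sides. (Hypothetical.)
type SyncState struct {
	ClientVersion   uint64 // last version the SDK synced; 0 means fresh install
	ServerVersion   uint64 // latest version of the user's change log
	OldestRetained  uint64 // oldest version still kept in the server's change log
	LocalUnreliable bool   // SDK suspects its local conversation state has drifted
}

// chooseMode applies the fallback order: full correction when there is no
// usable starting point, the client is ahead of the server, or the version
// chain has a gap; snapshot correction when local state is suspect or the
// backlog is too large to replay; plain incremental sync otherwise.
func chooseMode(s SyncState, maxIncremental uint64) Mode {
	if s.ClientVersion == 0 {
		return ModeFull // fresh install: nothing local to build on
	}
	// The client needs entries ClientVersion+1 .. ServerVersion from the log.
	if s.ClientVersion > s.ServerVersion || s.ClientVersion+1 < s.OldestRetained {
		return ModeFull // versions are not continuous with the server's log
	}
	if s.LocalUnreliable {
		return ModeSnapshot
	}
	backlog := s.ServerVersion - s.ClientVersion
	if backlog == 0 {
		return ModeUpToDate
	}
	if backlog > maxIncremental {
		return ModeSnapshot // too many write diffusion changes to replay
	}
	return ModeIncremental
}

func main() {
	cases := []SyncState{
		{ClientVersion: 0, ServerVersion: 40, OldestRetained: 1},
		{ClientVersion: 40, ServerVersion: 40, OldestRetained: 1},
		{ClientVersion: 35, ServerVersion: 40, OldestRetained: 1},
		{ClientVersion: 10, ServerVersion: 900, OldestRetained: 5},
		{ClientVersion: 2, ServerVersion: 900, OldestRetained: 100},
		{ClientVersion: 35, ServerVersion: 40, OldestRetained: 1, LocalUnreliable: true},
	}
	for _, c := range cases {
		fmt.Println(c.ClientVersion, c.ServerVersion, chooseMode(c, 200))
	}
}
```

The continuity check runs before the backlog check on purpose. A client whose version predates the oldest retained log entry cannot be caught up from the log at all, so it goes straight to full correction. Only a continuous chain is worth measuring against the incremental threshold.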

## Summary

Write diffusion sync is a further upgrade on top of incremental version sync. It moves "difference discovery" from the read side to the write side, allowing the server to record user-level change indexes when messages and conversation states change, and allowing the SDK to consume those changes directly during sync.

Its core value includes:

- Further reducing sync data volume.
- Speeding up conversation sync during login, reconnect, and foreground recovery.
- Reducing server pressure during sync peaks.
- Supporting large-scale conversations and multi-device catch-up.
- Keeping snapshot, paged correction, and full correction as fallback paths for abnormal cases.
- Balancing small-group write diffusion efficiency and large-group write cost through strategy switching.

For enterprise scenarios that need to support large organizations, many conversations, multi-device login, and faster sync experiences, this capability can significantly improve conversation sync efficiency and system stability.