diff --git a/README.md b/README.md
index 0c640fd..a67de33 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ Fluxon is designed around these problems. It separates data-plane resources, obj
- **MQ (Elastic message queue)**: Decouples system dependencies and supports elastic message transport across heterogeneous resource pools
- **FS (`S3`-compatible file, object, and cache acceleration system)**: Unifies multi-form storage so one system can cache key-value, file, and object data, while supporting remote access, `S3` forwarding, and large-scale cross-cluster migration for AI data and model files
-
+
diff --git a/README_CN.md b/README_CN.md
index 3f978e7..a138d86 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -21,8 +21,7 @@ Fluxon 的设计正是围绕这些问题展开。它将数据面资源、对象
- **MQ(弹性消息队列)**:解耦系统依赖,支撑异构资源池之间的弹性消息传输
- **FS(兼容 `S3` 的文件、对象与缓存加速系统)**:统一键值、文件、对象三类缓存能力,并支持 AI 数据与模型文件的远端访问、`S3` 转发和跨集群大规模迁移
-
-
+
diff --git "a/fluxon_doc_cn/blog/blog_2_\344\270\200\346\254\241 AI \345\244\247 Payload \346\266\210\346\201\257\351\230\237\345\210\227\347\232\204\346\216\247\345\210\266\351\235\242\351\207\215\346\236\204.md" "b/fluxon_doc_cn/blog/blog_2_\344\270\200\346\254\241 AI \345\244\247 Payload \346\266\210\346\201\257\351\230\237\345\210\227\347\232\204\346\216\247\345\210\266\351\235\242\351\207\215\346\236\204.md"
new file mode 100644
index 0000000..f3e4f45
--- /dev/null
+++ "b/fluxon_doc_cn/blog/blog_2_\344\270\200\346\254\241 AI \345\244\247 Payload \346\266\210\346\201\257\351\230\237\345\210\227\347\232\204\346\216\247\345\210\266\351\235\242\351\207\215\346\236\204.md"
@@ -0,0 +1,164 @@
+# FluxonMQ: Rebuilding the Control Plane of a Message Queue for Large AI Payloads
+
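+Message queues in AI training and inference systems no longer carry business events of a few KB. In decoupled VAE training, data processing pipelines, multimodal intermediate-state handoff, and task handoff across resource pools, a producer often sends tensor Payloads of tens of MB or more. Consumers may join, leave, and scale dynamically, and may be spread across machines, resource pools, or sub-clusters. FluxonMQ serves this kind of workload: producers and consumers are decoupled through message semantics, while large Payloads keep using the shared memory and cross-node transfer paths of the Fluxon KV owner.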
+
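+In this design, the MQ layer owns message state and the KV data plane owns the Payload. Message state covers message visibility, in-flight ownership, commit acknowledgement, redelivery on failure, and cleanup acknowledgement. The Payload lives in memory and transfer paths managed by the KV owner, and a consumer that receives a message reads the data through its Payload key. This split is what lets the MQ carry large-object handoff.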
+
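+Early FluxonMQ used etcd to advance message state. After writing the Payload, the producer wrote the message's visible state to etcd; the consumer scanned and claimed messages from etcd, read the Payload, and then wrote its consumption progress back. The path is structurally clear and reuses etcd's consistency and lease capabilities. The problem shows up on the high-concurrency hot path: the ready, claim, inflight, offset, and commit records around every message all become control-plane reads and writes. Payload transfer still runs through the KV owner, but whether a message is discovered, claimed, and committed in time starts to be bounded by how fast etcd can advance state.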
+
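+This broker optimization targets exactly that control-plane hot path. etcd remains responsible for member discovery, leases, broker discovery, and long-lived channel metadata; the broker takes over per-message queuing, claiming, committing, requeue on failure, and cleanup acknowledgement; the KV owner keeps handling Payload storage and transfer. The split separates low-frequency cluster metadata from high-frequency queue state, so advancing a message becomes an in-memory state update in the broker instead of an operation against an external KV store.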
+
+
+
+## Baseline path: Payload in KV, state in the queue
+
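+The key to the early path is separating the Payload from message state. The producer first writes the large object into the KV owner, then writes message state pointing at the Payload into etcd. The consumer scans etcd for consumable messages, claims one to obtain the Payload key, reads the actual data from the KV owner, and writes its consumption progress back to etcd after processing. etcd stores only message state and progress, so it never bears the storage pressure of large objects.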
+
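+As the number of producers and consumers grows, advancing queue state becomes a more visible cost. To sustain throughput, consumers raise batch size and prefetch depth. Prefetch can issue lookups and claims earlier, but it does not reduce the control-plane operations on etcd; it only moves them earlier. Under high concurrency, how deep the local inflight window can be filled depends on whether etcd can keep completing visible-message lookup, claim, and commit quickly.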
+
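+The broker path moves this state advancement inside the broker. Before writing, the producer requests a reservation from the broker. A reservation is a placeholder for one write attempt: the broker returns a `reservation_id` and a `msg_id`, and records the Payload bytes the message is expected to occupy. Once the Payload is written to the KV owner successfully, the producer calls `publish` and the message enters the consumable queue. If the Payload write fails, the producer calls `abort` and the broker releases the placeholder and the byte budget. This ordering guarantees that a consumer can only see Payloads that were written successfully.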
+
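The write order described above can be sketched in a few lines of Python. This is a minimal in-memory model, not the Rust broker: `ToyBroker`, `produce`, the `payload/<id>` key format, and the dict standing in for the KV owner are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyBroker:
    """Hypothetical in-memory stand-in for the broker (not the real Rust API)."""

    payload_byte_capacity: int
    used_payload_bytes: int = 0
    next_reservation_id: int = 0
    pending: Dict[int, Tuple[str, int]] = field(default_factory=dict)
    visible: List[Tuple[int, str]] = field(default_factory=list)

    def reserve(self, payload_bytes: int) -> Tuple[int, str]:
        # Charge the byte budget up front so capacity is enforced before the KV write.
        if self.used_payload_bytes + payload_bytes > self.payload_byte_capacity:
            raise MemoryError("payload byte budget exhausted")
        rid = self.next_reservation_id
        self.next_reservation_id += 1
        key = f"payload/{rid}"  # assumed key format, for illustration only
        self.pending[rid] = (key, payload_bytes)
        self.used_payload_bytes += payload_bytes
        return rid, key

    def publish(self, rid: int) -> None:
        # Only published messages become visible to consumers.
        key, _ = self.pending.pop(rid)
        self.visible.append((rid, key))

    def abort(self, rid: int) -> None:
        # Release the placeholder and give the bytes back to the budget.
        _, payload_bytes = self.pending.pop(rid)
        self.used_payload_bytes -= payload_bytes


def produce(broker: ToyBroker, kv: Dict[str, bytes], payload: bytes,
            fail_put: bool = False) -> bool:
    """reserve -> KV put -> publish, with abort when the KV put fails."""
    rid, key = broker.reserve(len(payload))
    try:
        if fail_put:
            raise IOError("simulated KV put failure")
        kv[key] = payload
    except IOError:
        broker.abort(rid)
        return False
    broker.publish(rid)
    return True
```

In this model a failed put leaves neither a visible message nor a charged budget behind, which is the property the ordering is meant to provide.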
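+The consumer obtains messages through `fetch`. The broker moves the message from the consumable queue to in-flight and returns the Payload key. In-flight means the message has been taken by a consumer but its consumption is not yet confirmed. After reading the Payload and finishing processing, the consumer calls `commit`; only after this succeeds does the broker consider the message consumed. After the consumer returns the Payload, a Rust background task deletes the KV Payload asynchronously; once the delete completes, the internal cleanup path releases the broker's Payload byte budget. When a consumer fails, times out, or is cancelled, its uncommitted messages are put back into the consumable queue for later delivery.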
+
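The consumer-side transitions can be modelled the same way. The sketch below is a hypothetical `ToyQueue`, not the broker implementation; in particular, putting requeued messages back at the front of the queue is an assumption, since the text only says they return to the consumable queue.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class ToyQueue:
    """Hypothetical model of visible -> in-flight -> committed (not the real broker)."""

    def __init__(self) -> None:
        self.visible: Deque[str] = deque()
        self.inflight: Dict[str, str] = {}  # payload_key -> consumer_id
        self.committed: List[str] = []

    def publish(self, payload_key: str) -> None:
        self.visible.append(payload_key)

    def fetch(self, consumer_id: str) -> Optional[str]:
        # Move one message from visible to in-flight and hand out its Payload key.
        if not self.visible:
            return None
        key = self.visible.popleft()
        self.inflight[key] = consumer_id
        return key

    def commit(self, payload_key: str) -> bool:
        # A commit only succeeds for a message that is currently in-flight.
        if self.inflight.pop(payload_key, None) is None:
            return False
        self.committed.append(payload_key)
        return True

    def requeue(self, consumer_id: str) -> int:
        # Return every uncommitted message of a failed consumer to the queue,
        # keeping the original relative order (front placement is an assumption).
        keys = [k for k, c in self.inflight.items() if c == consumer_id]
        for k in reversed(keys):
            del self.inflight[k]
            self.visible.appendleft(k)
        return len(keys)
```

A consumer that dies after fetching two messages loses nothing here: `requeue` makes both visible again for the next `fetch`.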
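+This flow keeps per-message state advancement in broker memory. `fetch`, `commit`, and `requeue` call the broker over P2P RPC, and the broker updates local state before returning; cleanup remains only as an internal Rust path that maintains capacity accounting. etcd leaves the message hot path and handles only low-frequency duties such as membership, leases, and discovery.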
+
+## Process boundary of the broker
+
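+The broker runs as a separate process and maintains MQ queue state over the long term. Its lifecycle is independent of producers, consumers, and the KV owner. The master remains responsible for cluster control, leases, and owner management, while the broker handles high-frequency message queuing. Running the broker in its own process keeps the MQ hot path off the master and reduces coupling between master failures and MQ queue state.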
+
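+In the current implementation, the broker's underlying communication identity reuses external client, and no new closed runtime role is added. Its MQ business identity is marked with `fluxon_mq_component=broker` in member metadata. The broker does not register a segment, does not contribute shared memory, and does not own Payloads. Producers and consumers find the broker through broker discovery and then call it over P2P RPC.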
+
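+This boundary preserves Fluxon's existing communication-layer structure. The master does not treat the broker as a KV owner and wait for it to register a segment, and the P2P relay and external client admission rules can be reused as they are. MQ gains one control-plane process without extending the system with a new set of low-level roles.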
+
+## Implementation structure
+
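+The Rust-side broker state lives in `fluxon_rs/fluxon_mq/src/broker.rs`. This part follows the role boundaries of the KV design: `master` maintains the cluster control plane and routing, `owner` carries shared memory, object replicas, and cross-node transfer, and producers, consumers, and the broker all join as `external_client` without contributing owner capacity. The boundary is described in full in [KV 设计 1 - 概览与分层](../design/kv_1_概览与分层.md).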
+
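+The broker stores message control-plane state and Payload references. Payload bytes are still managed by the KV owner; the broker records only `payload_key`, `payload_bytes`, the message envelope, and the queue position.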
+
+```rust
+pub struct LocalBroker {
+    state: BrokerState, // in-memory broker state
+}
+
+struct BrokerState {
+    channels: HashMap<i64, ChannelState>, // queue state keyed by channel_id
+    payload_byte_capacity: u64, // broker-wide upper bound of the Payload byte budget
+    used_payload_bytes: u64, // Payload byte budget held by messages not yet released
+}
+
+struct ChannelState {
+    config: BrokerChannelConfig, // channel_id and capacity
+    next_reservation_id: u64, // reservation number, increasing within the channel
+    next_msg_by_producer: HashMap<String, i64>, // next msg_id for each producer_id
+    pending: HashMap, // messages reserved but not yet published
+    visible: VecDeque, // messages whose Payload is written and that a consumer can fetch
+    inflight: HashMap, // messages taken by a consumer but not yet committed
+    inflight_order: VecDeque, // order of inflight messages
+    cleanup: VecDeque, // messages committed and waiting for Payload cleanup
+    cleanup_inflight: HashMap, // messages handed to a cleanup task, awaiting internal cleanup ack
+    used_slots: i64, // message slots currently occupied in the channel
+    reserve_waiters: VecDeque, // reserve requests waiting because capacity is insufficient
+    fetch_waiters: VecDeque, // fetch requests waiting because too few messages are visible
+}
+```
+
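+A repeated `commit` no longer relies on a separate committed set. The broker checks `cleanup` and `cleanup_inflight` directly to decide whether the reservation has already had its first commit but is still awaiting cleanup; once cleanup finishes, the message lifecycle ends, and a further commit is handled as a commit for a nonexistent in-flight delivery.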
+
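A minimal sketch of that duplicate-commit rule, assuming reservations are tracked by integer id. `CommitTracker` and its string return values are hypothetical; the real broker reports the outcome through `BrokerCommitOutcome`.

```python
from typing import Set


class CommitTracker:
    """Hypothetical model: duplicates are detected from the cleanup sets alone."""

    def __init__(self) -> None:
        self.inflight: Set[int] = set()
        self.cleanup: Set[int] = set()
        self.cleanup_inflight: Set[int] = set()

    def commit(self, reservation_id: int) -> str:
        if reservation_id in self.inflight:
            # First commit: the message leaves in-flight and waits for cleanup.
            self.inflight.remove(reservation_id)
            self.cleanup.add(reservation_id)
            return "first_commit"
        if reservation_id in self.cleanup or reservation_id in self.cleanup_inflight:
            # Already committed once, cleanup still pending: a harmless repeat.
            return "duplicate"
        # Cleanup finished (or the id never existed): no in-flight delivery to commit.
        return "not_inflight"

    def take_cleanup(self, reservation_id: int) -> None:
        # A cleanup task picks the message up.
        self.cleanup.remove(reservation_id)
        self.cleanup_inflight.add(reservation_id)

    def cleanup_ack(self, reservation_id: int) -> None:
        # Payload deleted: the message lifecycle ends here.
        self.cleanup_inflight.discard(reservation_id)
```

No committed set is kept, so the tracker's memory is bounded by the messages that are still awaiting cleanup.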
+The main message structures used by broker RPC and the internal state machine are:
+
+```rust
+pub struct BrokerChannelConfig {
+    pub channel_id: i64, // channel identifier
+    pub capacity: i64, // upper bound of message slots in the channel
+}
+
+pub struct BrokerReserveRequest {
+    pub channel_id: i64, // target channel
+    pub producer_id: String, // producer identifier
+    pub category: MqCategory, // MPSC or MPMC sub-queue type
+    pub payload_bytes: u64, // Payload bytes this message is expected to occupy
+    pub now_ms: i64, // reserve time
+}
+
+pub struct BrokerFetchRequest {
+    pub channel_id: i64, // target channel
+    pub consumer_id: String, // consumer identifier
+    pub now_ms: i64, // fetch time
+}
+
+pub struct BrokerEnvelope {
+    pub channel_id: i64, // channel identifier
+    pub producer_id: String, // producer identifier
+    pub msg_id: i64, // message number, increasing per producer
+    pub reservation_id: u64, // reservation number of this write
+    pub payload_key: String, // Payload key in the KV owner
+    pub payload_bytes: u64, // bytes counted against the Payload byte budget
+    pub reserved_at_ms: i64, // reserve time
+    pub published_at_ms: Option<i64>, // publish time, empty until published
+}
+
+pub struct BrokerCommitOutcome {
+    pub first_commit: bool, // whether this commit is the first one to take effect
+    pub cleanup: Option, // cleanup task generated by the first commit
+}
+
+pub struct BrokerCommitBatchOutcome {
+    pub first_commit_count: usize, // number of first-time commits in the batch
+    pub cleanup: Vec, // cleanup tasks generated by the batch
+}
+```
+
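+State transitions can be simplified to one chain: `reserve` places a message in `pending`, `publish` moves it to `visible`, `fetch` moves it to `inflight`, and `commit` moves it to `cleanup`, where a cleanup task takes it into `cleanup_inflight` until the cleanup acknowledgement releases it.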
+
+
+
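+The producer hot path lives in `fluxon_rs/fluxon_mq/src/producer.rs`. On the broker path the write order is `reserve -> KV put -> publish`. Once `reserve` succeeds, the broker has generated the `payload_key` and charged `payload_bytes` against the budget; the producer then writes the actual Payload to the KV owner. Only after the KV write succeeds does `publish` move the message from `pending` to `visible`, so a consumer can only fetch messages whose Payload write has completed. If the KV write fails, the producer calls `abort` to release the reservation and the byte budget.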
+
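+When the channel is full or `payload_byte_capacity` is insufficient, the producer backs off and retries in the Rust hot path on `BrokerError::ChannelFull` or `BrokerError::PayloadBytesFull`. The retry happens at the broker reserve stage, and the wait condition comes directly from `used_slots` and `used_payload_bytes`, which tracks real queue state more closely than a fixed sleep in the outer Python layer.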
+
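The reserve-stage retry can be illustrated with a small Python loop. The real retry runs in Rust and its schedule is not described here, so the capped exponential backoff, the function `reserve_with_backoff`, and the two exception classes (named after the `BrokerError` variants) are assumptions made for this sketch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ChannelFull(Exception):
    """Stand-in for BrokerError::ChannelFull (hypothetical Python mirror)."""


class PayloadBytesFull(Exception):
    """Stand-in for BrokerError::PayloadBytesFull (hypothetical Python mirror)."""


def reserve_with_backoff(
    reserve: Callable[[], T],
    max_attempts: int = 5,
    base_sleep_s: float = 0.001,
    max_sleep_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `reserve` while the broker reports a capacity error.

    Only the two capacity errors are retried; any other exception propagates
    immediately. The delay doubles after each failed attempt up to `max_sleep_s`.
    """
    delay = base_sleep_s
    for attempt in range(max_attempts):
        try:
            return reserve()
        except (ChannelFull, PayloadBytesFull):
            if attempt + 1 == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_sleep_s)
    raise RuntimeError("max_attempts must be positive")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.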
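+The consumer hot path lives in `fluxon_rs/fluxon_mq/src/consumer.rs` and `fluxon_rs/fluxon_pyo3/src/mpsc.rs`. The consumer first obtains a `BrokerEnvelope` through the broker's `fetch`, then uses its `payload_key` to read the Payload from the KV owner. Once `commit` succeeds, the Payload is returned to the caller immediately; the Rust background cleanup task then deletes the KV Payload and releases the byte budget through the broker's internal cleanup acknowledgement. The Python layer mainly handles API wrapping, bench orchestration, and teardown; message advancement, backpressure waiting, and cleanup state have converged into the Rust broker path.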
+
+
+
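+The cleanup logic for the MPMC bench lives in `fluxon_py/tests/test_api_chan_mpmc/test_mpmc_simple_bench.py`. Teardown deletes the sub-MPSC channels of the current MPMC round and then deletes the Payload keys returned by the broker. Two kinds of resources must be handled here: the `used_payload_bytes` on the broker side and the real Payloads on the KV owner side. The former is released by `cleanup_ack`, `abort`, or `delete_channel`; the latter is released by a KV delete on each `payload_key`. Only when both are released can consecutive cases run without leftover data from the previous round occupying the owner pool or the broker byte budget.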
+
+## Performance results
+
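+The test environment is a single machine with a `100GB` owner pool, channel capacity `4096`, low-logging mode, and DLPack Payloads. The comparison is between etcd queue advancement and broker queue advancement, with both sides using the same producer, consumer, batch, prefetch, and Payload parameters.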
+
+
+
+| case | P/C (producers/consumers) | batch/prefetch | Payload | etcd MB/s | broker MB/s | change |
+| --- | ---: | ---: | --- | ---: | ---: | ---: |
+| 01 | 16/8 | 40/40 | 4.8MB | 7660.80 | 8010.24 | +4.6% |
+| 02 | 16/12 | 40/40 | 4.8MB | 7372.80 | 9496.80 | +28.8% |
+| 03 | 24/8 | 40/40 | 4.8MB | 7046.40 | 7350.24 | +4.3% |
+| 04 | 16/8 | 40/120 | 4.8MB | 6931.20 | 9791.52 | +41.3% |
+| 05 | 16/4 | 40/40 | 4.8MB | 7756.80 | 8294.40 | +6.9% |
+| 06 | 16/2 | 40/40 | 4.8MB | 6201.60 | 5875.20 | -5.3% |
+| 07 | 16/4 | 48/48 | 4.8MB | 7925.76 | 8155.68 | +2.9% |
+| 08 | 16/4 | 64/64 | 4.8MB | 7802.88 | 8382.24 | +7.4% |
+| 09 | 16/4 | 48/48 | 8MB | 12441.60 | 14153.60 | +13.8% |
+| 10 | 16/4 | 48/48 | 12MB | 17625.60 | 18356.40 | +4.1% |
+| 11 | 16/4 | 48/48 | 16MB | 22041.60 | 26102.40 | +18.4% |
+| 12 | 16/4 | 48/48 | 20MB | 26016.00 | 18222.00 | -30.0% |
+| 13 | 16/4 | 48/48 | 24MB | 29030.40 | 46552.80 | +60.4% |
+| 14 | 16/4 | 48/48 | 32MB | 34252.80 | 56624.00 | +65.3% |
+| 15 | 24/4 | 48/48 | 32MB | 42393.60 | 44067.20 | +3.9% |
+| 16 | 32/4 | 48/48 | 32MB | 35328.00 | 42198.40 | +19.4% |
+| 17 | 24/2 | 48/48 | 32MB | 17817.60 | 36969.60 | +107.5% |
+| 18 | 24/4 | 48/48 | 40MB | 51264.00 | 63656.00 | +24.2% |
+| 19 | 24/4 | 48/48 | 48MB | 54835.20 | 54451.20 | -0.7% |
+| 20 | 24/4 | 48/48 | 56MB | 57792.00 | 85254.40 | +47.5% |
+| 21 | 24/4 | 48/48 | 64MB | 48844.80 | 89952.00 | +84.2% |
+
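+With small Payloads, the broker's gain depends on how concurrency is arranged. `16p/12c b40/pf40 4.8MB` rises from `7372.80 MB/s` to `9496.80 MB/s`, a gain of `28.8%`; `16p/8c b40/pf120 4.8MB` rises from `6931.20 MB/s` to `9791.52 MB/s`, a gain of `41.3%`. What these points share is that more consumers or deeper prefetch place heavier demand on control-plane advancement, and the broker lets the local inflight window fill more steadily.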
+
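+With large Payloads, reducing control-plane stalls makes it easier for the data plane to stay saturated. `24MB` rises from `29030.40 MB/s` to `46552.80 MB/s`, `32MB` from `34252.80 MB/s` to `56624.00 MB/s`, `56MB` from `57792.00 MB/s` to `85254.40 MB/s`, and `64MB` from `48844.80 MB/s` to `89952.00 MB/s`. The best point on the pure etcd path is `24p/4c b48/pf48 dlpack 56MB` with a steady-state throughput of `57792.00 MB/s`; the best point on the broker path is `24p/4c b48/pf48 dlpack 64MB` with a steady-state throughput of `89952.00 MB/s`.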
+
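The change column follows from the two throughput columns as (broker - etcd) / etcd. The helper below is a small illustration of that arithmetic; `change_percent` is a hypothetical name and the inputs are rows copied from the table.

```python
def change_percent(etcd_mb_s: float, broker_mb_s: float) -> float:
    """Relative throughput change of the broker path over the etcd path, in percent."""
    return round((broker_mb_s - etcd_mb_s) / etcd_mb_s * 100, 1)


# Rows taken from the table: (etcd MB/s, broker MB/s)
rows = {
    "02 16/12 4.8MB": (7372.80, 9496.80),
    "04 16/8 pf120 4.8MB": (6931.20, 9791.52),
    "12 16/4 20MB": (26016.00, 18222.00),
    "21 24/4 64MB": (48844.80, 89952.00),
}

for name, (etcd, broker) in rows.items():
    print(f"{name}: {change_percent(etcd, broker):+.1f}%")
```

The same formula yields the negative entries, for example case 12, where the broker path is slower than the etcd path.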
+## Conclusion
+
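+The FluxonMQ broker optimization moves high-frequency per-message state advancement from etcd to the broker, while etcd keeps its membership, lease, discovery, and long-lived metadata duties and the KV owner keeps carrying the large-Payload data plane. The change brings the MQ control plane closer to runtime message state and lets Payload transfer keep reusing Fluxon's shared memory and cross-node data paths.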
+
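+In the single-machine test with a `100GB` owner pool, the etcd path peaks at `57.79GB/s` and the broker path at `89.95GB/s`. More importantly, queue advancement has changed from reads and writes against an external KV store into updates of an in-memory state machine, which gives a clearer basis for later work on multi-broker sharding, batched RPC, cross-node MQ, and finer-grained capacity governance.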
diff --git "a/fluxon_doc_cn/user_doc/\347\224\250\346\210\267 - 1 - \346\236\266\346\236\204\345\222\214\346\246\202\345\277\265.md" "b/fluxon_doc_cn/user_doc/\347\224\250\346\210\267 - 1 - \346\236\266\346\236\204\345\222\214\346\246\202\345\277\265.md"
index fc1dd8b..47313f0 100644
--- "a/fluxon_doc_cn/user_doc/\347\224\250\346\210\267 - 1 - \346\236\266\346\236\204\345\222\214\346\246\202\345\277\265.md"
+++ "b/fluxon_doc_cn/user_doc/\347\224\250\346\210\267 - 1 - \346\236\266\346\236\204\345\222\214\346\246\202\345\277\265.md"
@@ -8,7 +8,7 @@
### 系统全景架构
-
+
组件视角的全景图,用来定位各组件的职责和依赖关系。
diff --git a/fluxon_doc_en/user_doc/User - 1 - Architecture and Concepts.md b/fluxon_doc_en/user_doc/User - 1 - Architecture and Concepts.md
index f0a6417..2a8fd0a 100644
--- a/fluxon_doc_en/user_doc/User - 1 - Architecture and Concepts.md
+++ b/fluxon_doc_en/user_doc/User - 1 - Architecture and Concepts.md
@@ -8,7 +8,7 @@ This page explains the core concepts and config fields that appear throughout th
### System Overview
-
+
- Control plane / metadata: `etcd + Master` for members, leases, routing, and connection-state metadata
- Data plane: `shared memory + transfer engine` for same-host reuse and cross-node data transfer
diff --git a/fluxon_py/_api_ext_chan/mpmc.py b/fluxon_py/_api_ext_chan/mpmc.py
index 4ddbc1e..085e76c 100644
--- a/fluxon_py/_api_ext_chan/mpmc.py
+++ b/fluxon_py/_api_ext_chan/mpmc.py
@@ -96,18 +96,34 @@
LOCAL_MEMBER_ID_RANGE_SIZE = 32
MPMC_CREATE_LOCK_TTL_SECONDS = 10
MPMC_CREATE_LOCK_TIMEOUT_SECONDS = 10.0
+MPMC_CLEANUP_ETCD_TIMEOUT_SECONDS = 2.0
-def new_etcd_client(api: KvClient) -> Result[etcd3.Etcd3Client, ApiError]:
+def _close_lease_handle(handle: Optional[object], label: str) -> None:
+    if handle is None:
+        return
+    try:
+        handle.close()  # type: ignore[attr-defined]
+    except Exception as e:  # noqa: BLE001
+        logging.warning("failed to close lease handle %s: %s", label, e)
+
+
+def new_etcd_client(
+    api: KvClient, *, timeout_seconds: Optional[float] = None
+) -> Result[etcd3.Etcd3Client, ApiError]:
"""Create etcd client"""
etcd_config: List[str] = api.get_etcd_config()
first_address: str = etcd_config[0]
host: str
port_str: str
host, port_str = first_address.split(":")
- print(f"new_etcd_client: {host}:{port_str}")
try:
- client: etcd3.Etcd3Client = etcd3.client(host=host, port=int(port_str))
+        kwargs: Dict[str, Any] = {}
+        if timeout_seconds is not None:
+            kwargs["timeout"] = float(timeout_seconds)
+        client: etcd3.Etcd3Client = etcd3.client(
+            host=host, port=int(port_str), **kwargs
+        )
return Result.new_ok(client)
except Exception as e:
return Result.new_error(
@@ -136,8 +152,10 @@ def stable_revoke_lease(api: KvClient, lease_id: int) -> Result[OkNone, ApiError
endpoint = endpoints[0] if endpoints else None
errors: List[str] = []
- for attempt in range(3):
- client_res = new_etcd_client(api)
+ for attempt in range(2):
+ client_res = new_etcd_client(
+ api, timeout_seconds=MPMC_CLEANUP_ETCD_TIMEOUT_SECONDS
+ )
if not client_res.is_ok():
err = client_res.unwrap_error()
errors.append(str(err))
@@ -183,8 +201,10 @@ def stable_delete_ready_keys_for_member(
member_id_str = str(member_id)
errors: List[str] = []
- for attempt in range(3):
- client_res = new_etcd_client(api)
+ for attempt in range(2):
+ client_res = new_etcd_client(
+ api, timeout_seconds=MPMC_CLEANUP_ETCD_TIMEOUT_SECONDS
+ )
if not client_res.is_ok():
err = client_res.unwrap_error()
errors.append(str(err))
@@ -203,22 +223,7 @@ def stable_delete_ready_keys_for_member(
for key in keys_to_delete:
client.delete(key)
- # Verify: keys should be gone immediately after delete on the same prefix.
- remaining: List[bytes] = []
- for value, meta in client.get_prefix(prefix):
- if value is None:
- continue
- if value.decode() != member_id_str:
- continue
- remaining.append(meta.key)
-
- if len(remaining) == 0:
- return Result.new_ok(OK_NONE)
-
- errors.append(
- f"attempt={attempt}: remaining ready keys after delete: {remaining!r}"
- )
- time.sleep(0.1)
+ return Result.new_ok(OK_NONE)
except Exception as e: # noqa: BLE001
errors.append(f"attempt={attempt}: {e}")
time.sleep(0.1)
@@ -1802,19 +1807,16 @@ def close(self) -> Result[OkNone, ApiError]:
except Exception as e: # noqa: BLE001
logging.warning(f"MPMC channel {self.mpmc_id} stop_watching failed: {e}")
- # Drop PyLease handles to stop keepalive; etcd leases with
- # revoke_on_drop=False are intentionally not revoked.
- # Setting to None drops the PyO3 handle immediately in CPython,
- # which releases the underlying Rust RAII and unregisters from
- # the keepalive actor.
- if hasattr(self, "_lm_mpmc_member"):
- self._lm_mpmc_member = None # type: ignore[assignment]
- if hasattr(self, "_lm_mpmc_global"):
- self._lm_mpmc_global = None # type: ignore[assignment]
- if hasattr(self, "_lm_cluster_long"):
- self._lm_cluster_long = None # type: ignore[assignment]
- if hasattr(self, "_lm_kv_payload"):
- self._lm_kv_payload = None # type: ignore[assignment]
+ # Close lease handles explicitly so keepalive entries are unregistered
+ # before the owning KvClient starts shutting down.
+ _close_lease_handle(self._lm_mpmc_member, "mpmc_member")
+ self._lm_mpmc_member = None
+ _close_lease_handle(self._lm_mpmc_global, "mpmc_global")
+ self._lm_mpmc_global = None
+ _close_lease_handle(self._lm_cluster_long, "mpmc_cluster_long")
+ self._lm_cluster_long = None
+ _close_lease_handle(self._lm_kv_payload, "mpmc_kv_payload")
+ self._lm_kv_payload = None
# Return a minimal Ok result to satisfy the explicit Result API contract
return Result.new_ok(OK_NONE)
@@ -2025,6 +2027,12 @@ def _record_mpsc_producer(self, mpsc_producer: MPSCChanProducer):
def put_data(
self, value: Dict[str, Union[int, float, bool, str, bytes, DLPacked]]
+ ) -> Result[bool, ApiError]:
+ return self._put_data_impl(value)
+
+ def _put_data_impl(
+ self,
+ value: Dict[str, Union[int, float, bool, str, bytes, DLPacked]],
) -> Result[bool, ApiError]:
"""Put data to the MPMC channel.
@@ -2051,9 +2059,11 @@ def put_data(
)
)
+ capacity = int(self.chan_config["capacity"])
+ assert capacity > 0, f"invalid MPMC channel capacity: {capacity}"
+
# Do not hold _op_lock while performing network-heavy operations (count_prefix/put_data).
# Otherwise close() may block behind a long RPC and tests like MQ capacity+auto-clean can hang.
- capacity = int(self.chan_config["capacity"]) # validated upfront
while True:
if self.shutdown_ctl.closed:
return Result[bool, ApiError].new_error(
@@ -2159,6 +2169,20 @@ def put_data(
return Result[bool, ApiError].new_ok(True)
err = put_result.unwrap_error()
+ if isinstance(err, MessageBufferFullError):
+ blocking_observed_unix_ms = int(time.time() * 1000)
+ try:
+ candidate.record_blocking_put_observed(blocking_observed_unix_ms)
+ except Exception as e: # noqa: BLE001
+ logging.warning(
+ "MPMCChanProducer mpmc_id=%s failed to record broker backpressure on mpsc_id=%s producer_idx=%s: %s",
+ self.mpmc_id,
+ candidate.get_chan_id(),
+ candidate.get_producer_id(),
+ e,
+ )
+ time.sleep(0.02)
+ continue
logging.error(
"MPMCChanProducer mpmc_id=%s failed to put data on mpsc_id=%s producer_idx=%s: %s",
self.mpmc_id,
@@ -2352,12 +2376,25 @@ def __init__(
self.mpsc_consumer: Optional[MPSCChanConsumer] = None
self.bound_mpsc_id: Optional[str] = None
- # Get next available channel and bind to it
- fails=[]
- for i in range(10):
+ # Get next available channel and bind to it. Concurrent consumers may
+ # lose a claim/create race; retry those bounded authority-state races.
+ fails: List[ApiError] = []
+ max_bind_attempts = 10
+ for i in range(max_bind_attempts):
next_channel_result = self.mpmc_channel.get_next_available_channel(self.api, self.chan_config)
if not next_channel_result.is_ok():
- raise ValueError(f"Failed to get next available channel: {next_channel_result.unwrap_error()}")
+ err = next_channel_result.unwrap_error()
+ if isinstance(err, (ChanCreateError, ChanBindError)):
+ logging.warning(
+ "MPMC consumer failed to get next channel on attempt %s/%s; retrying: %s",
+ i + 1,
+ max_bind_attempts,
+ err,
+ )
+ fails.append(err)
+ time.sleep(0.1)
+ continue
+ raise ValueError(f"Failed to get next available channel: {err}")
next_channel = next_channel_result.unwrap()
if next_channel is None:
@@ -2380,13 +2417,15 @@ def __init__(
# claimed inside MPMCChannel return with _mpmc_ready_claimed=True.
res=self.mark_channel_ready(next_channel.get_chan_id())
if not res.is_ok():
- logging.warning(f"Failed to mark channel ready: {res.unwrap_error()}")
+ err = res.unwrap_error()
+ logging.warning(f"Failed to mark channel ready: {err}")
# Close the just-created/bound MPSC consumer to avoid dangling consumers
try:
next_channel.release_local_handle().unwrap()
except Exception as e:
logging.debug(f"close leaked MPSC consumer error: {e}")
- fails.append(res.unwrap_error())
+ fails.append(err)
+ time.sleep(0.1)
continue
if res.unwrap():
self.mpsc_consumer = next_channel
@@ -2402,12 +2441,15 @@ def __init__(
next_channel.release_local_handle().unwrap()
except Exception as e:
logging.debug(f"close leaked MPSC consumer error: {e}")
- fails.append("transaction failed")
+ fails.append(ChanBindError("ready channel claim transaction failed"))
+ time.sleep(0.1)
continue
else:
raise ValueError(f"Unexpected channel type: {type(next_channel)}")
- raise ValueError(f"Failed to mark channel ready with {len(fails)} fails: {fails}")
+ raise ValueError(
+ f"Failed to bind MPMC consumer after {max_bind_attempts} attempts: {fails}"
+ )
def request_shutdown(self) -> None:
if self.shutdown_ctl.closed:
@@ -2415,7 +2457,7 @@ def request_shutdown(self) -> None:
self.shutdown_ctl.closed = True
if self.mpsc_consumer is not None and hasattr(self.mpsc_consumer, "request_shutdown"):
self.mpsc_consumer.request_shutdown()
-
+
def get_chan_id(self) -> str:
"""
Get the channel id.
@@ -2431,6 +2473,18 @@ def get_consumer_id(self) -> str:
def get_data(
self, batch_size: int = 1, try_time: Optional[int] = None, prefetch_num: int = 0
+ ) -> Result[List[Dict[str, Union[int, float, bool, str, bytes, DLPacked]]], ApiError]:
+ del prefetch_num
+ return self._get_data_impl(
+ batch_size=batch_size,
+ try_time=try_time,
+ )
+
+ def _get_data_impl(
+ self,
+ *,
+ batch_size: int,
+ try_time: Optional[int],
) -> Result[List[Dict[str, Union[int, float, bool, str, bytes, DLPacked]]], ApiError]:
"""Get data from the bound MPSC channel.
@@ -2463,22 +2517,7 @@ def get_data(
# Get data from MPSC consumer (will automatically return producer info when MPSC acts as submodule)
from .mpsc import ConsumedMessage
- # # Map MPMC-level prefetch to per-MPSC prefetch: divide by active MPMC consumers, ceil, min divisor=1
- # try:
- # active_consumers = self.mpmc_channel._get_active_consumer_count()
- # except Exception as e: # noqa: BLE001
- # logging.warning(
- # f"[Unreachable] Failed to get active consumer count: {e}"
- # )
- # active_consumers = 0
-
- # # ceil division without importing math: (a + b - 1) // b
- # mapped_prefetch = 0
- # if prefetch_num > 0 and active_consumers > 0:
- # mapped_prefetch = (prefetch_num + active_consumers - 1) // active_consumers
- result = self.mpsc_consumer.get_data(
- batch_size, try_time, prefetch_num=prefetch_num
- )
+ result = self.mpsc_consumer.get_data(batch_size, try_time=try_time)
if not result.is_ok():
err = result.unwrap_error()
if self.shutdown_ctl.closed:
@@ -2548,6 +2587,18 @@ def close(self) -> Result[OkNone, ApiError]:
f"MPMCChanConsumer {self.get_consumer_id()} before_close on underlying MPSC consumer failed: {e}"
)
+ # Close the underlying MPSC consumer first so local keepalive/prefetch
+ # tasks stop before lease revoke and ready-key cleanup.
+ try:
+ if self.mpsc_consumer is not None:
+ self.mpsc_consumer.release_local_handle().unwrap()
+ except Exception as e: # noqa: BLE001
+ logging.warning(
+ f"MPMCChanConsumer {self.get_consumer_id()} failed to close underlying MPSC consumer: {e}"
+ )
+ finally:
+ self.mpsc_consumer = None
+
# Delete ready keys for this consumer (best-effort).
mpmc_id = self.mpmc_id
assert mpmc_id is not None, "MPMC channel ID is None"
@@ -2599,17 +2650,6 @@ def close(self) -> Result[OkNone, ApiError]:
f"MPMCChanConsumer {self.get_consumer_id()} failed to revoke member lease: {e}"
)
- # Close the underlying MPSC consumer and drop the handle.
- try:
- if self.mpsc_consumer is not None:
- self.mpsc_consumer.release_local_handle().unwrap()
- except Exception as e: # noqa: BLE001
- logging.warning(
- f"MPMCChanConsumer {self.get_consumer_id()} failed to close underlying MPSC consumer: {e}"
- )
- finally:
- self.mpsc_consumer = None
-
# Optional sub-component cleanup.
try:
if hasattr(self, 'rate_limiter') and self.rate_limiter is not None:
diff --git a/fluxon_py/_api_ext_chan/mpsc.py b/fluxon_py/_api_ext_chan/mpsc.py
index 1eeac76..7905c4e 100644
--- a/fluxon_py/_api_ext_chan/mpsc.py
+++ b/fluxon_py/_api_ext_chan/mpsc.py
@@ -8,10 +8,9 @@
Old Python implementations (ChanManager, etcd watchers, prefetch
queues) have been removed.
-Currently this shim focuses on wiring up leases and identities. Data
-path operations (`put_data`/`get_data`) are intentionally left as
-placeholders and should be implemented in Rust and exposed via
-`fluxon_pyo3` in follow-up work.
+Broker-backed data-path operations are the default public contract.
+The old direct MPSC data path is kept only behind private helpers for
+short-lived internal checks.
"""
from __future__ import annotations
@@ -55,6 +54,11 @@
logging = init_logger(__name__)
+MPSC_PREFETCH_TARGET_MAX = 256
+MPSC_KVCLIENT_KEEPALIVE_RETRY_SLEEP_SECONDS = 0.05
+MPSC_KVCLIENT_KEEPALIVE_RETRIES = 3
+_LEASE_BACKEND_CALLBACK_LOCKS: Dict[str, threading.Lock] = {}
+_LEASE_BACKEND_CALLBACK_LOCKS_GUARD = threading.Lock()
# ---------------------------------------------------------------------------
# Test-only GC close markers
@@ -269,6 +273,11 @@ def _ensure_kvclient_lease_backend(api: KvClient, cluster: str) -> Any:
message="KvClient must implement KvLeaseApi for MPSC payload lease",
)
+    with _LEASE_BACKEND_CALLBACK_LOCKS_GUARD:
+        callback_lock = _LEASE_BACKEND_CALLBACK_LOCKS.setdefault(
+            cluster, threading.Lock()
+        )
+
def allocate_cb(ttl_seconds: int) -> int:
"""Bridge to KvLeaseApi.allocate_lease for the given TTL.
@@ -279,7 +288,8 @@ def allocate_cb(ttl_seconds: int) -> int:
Do NOT raise ApiError dataclasses here (they are not Exceptions) to
avoid PyErr(TypeError: exceptions must derive from BaseException).
"""
- res = api.allocate_lease(int(ttl_seconds))
+ with callback_lock:
+ res = api.allocate_lease(int(ttl_seconds))
if not res.is_ok():
# Raise a real Python Exception so PyO3 converts it to Err(...)
raise RuntimeError(
@@ -297,8 +307,21 @@ def keepalive_cb(lease_id: int) -> None:
cause type conversion errors in PyO3. See logs: "exceptions must derive
from BaseException" when raising non-Exception ApiError values.
"""
- # Keepalive must not alter TTL; do not pass custom_ttl
- res = api.keepalive_lease(int(lease_id))
+        # Keepalive must not alter TTL; do not pass custom_ttl. The PyO3
+        # KvClient object uses mutable Rust borrows, so serialize callbacks
+        # from the lease actor to avoid re-entering the same client handle.
+        for attempt in range(MPSC_KVCLIENT_KEEPALIVE_RETRIES):
+            with callback_lock:
+                res = api.keepalive_lease(int(lease_id))
+            if res.is_ok():
+                _ = res.unwrap()
+                return None
+            err = res.unwrap_error()
+            if "Already mutably borrowed" in str(err) and attempt + 1 < MPSC_KVCLIENT_KEEPALIVE_RETRIES:
+                time.sleep(MPSC_KVCLIENT_KEEPALIVE_RETRY_SLEEP_SECONDS)
+                continue
+            break
+
if not res.is_ok():
err = res.unwrap_error()
# When the client is shutting down, background keepalive calls can race with the
@@ -311,9 +334,6 @@ def keepalive_cb(lease_id: int) -> None:
raise RuntimeError(
f"kvclient keepalive_lease failed for cluster={cluster}: {err}"
)
- # Success: consume Ok(None) to satisfy strict Result policy
- _ = res.unwrap()
- # Success path: return None explicitly to map to Rust ()
return None
# Inject kvclient allocate/keepalive callbacks while constructing LeaseBackendUid.
@@ -403,6 +423,11 @@ def new_consumer(
parent_mpmc_member_id_opt,
)
+    def delete_broker_channel(self, chan_id: str) -> list[str]:
+        if not isinstance(chan_id, str) or not chan_id.isdigit():
+            raise ValueError(f"invalid broker channel id: {chan_id!r}")
+        return list(self._inner.delete_broker_channel(int(chan_id)))
+
def close(self) -> None:
self._inner.close()
@@ -503,11 +528,13 @@ def __init__(
# through the Rust MPSC layer.
self._payload_lease_id = self._handle.payload_lease_id() # type: ignore[attr-defined]
+ self._handle.init_broker() # type: ignore[attr-defined]
+
# Expose chan_id for legacy call sites that accessed the attribute.
self.chan_id = self._chan_id
logging.info(
- "%s initialized via Rust MPSC: chan_id=%s, producer_idx=%s",
+ "%s initialized via Rust MPSC broker path: chan_id=%s, producer_idx=%s",
self.dbg_tag(),
self.get_chan_id(),
self.get_producer_id(),
@@ -543,6 +570,25 @@ def record_blocking_put_observed(self, unix_ms: int) -> None:
def put_data(
self, value: Dict[str, Union[int, float, bool, str, bytes, DLPacked]]
+ ) -> Result[bool, ApiError]:
+ return self._put_data_with_writer(
+ value,
+ self._handle.put_flat_dict_ptrs, # type: ignore[attr-defined]
+ )
+
+ def _put_data_legacy_for_internal_check(
+ self, value: Dict[str, Union[int, float, bool, str, bytes, DLPacked]]
+ ) -> Result[bool, ApiError]:
+ """Use the old direct MPSC write path for temporary internal checks only."""
+ return self._put_data_with_writer(
+ value,
+ self._handle.put_flat_dict_ptrs_legacy_for_internal_check, # type: ignore[attr-defined]
+ )
+
+ def _put_data_with_writer(
+ self,
+ value: Dict[str, Union[int, float, bool, str, bytes, DLPacked]],
+ writer: Any,
) -> Result[bool, ApiError]:
"""Put data into the channel via Rust backend.
@@ -576,7 +622,7 @@ def put_data(
dlpack_capsules: List[object] = []
try:
ptrs = _fluxon_kv.build_flat_dict_ptrs(value, keepalive, dlpack_capsules)
- self._handle.put_flat_dict_ptrs(ptrs) # type: ignore[attr-defined]
+ writer(ptrs)
except Exception as e: # pragma: no cover - thin shim
if _is_close_during_put_error(e):
self.shutdown_ctl.closed = True
@@ -608,6 +654,10 @@ def put_data(
# If Rust changes LeaseMgrError variants or mappings, update:
# 1) The LeaseMgrError mapping in py_error_from_kv_error;
# 2) The check here and its corresponding tests.
+ if e.__class__.__name__ == "MessageBufferFullError":
+ logging.debug("%s put_flat_dict_ptrs backpressured: %s", self.dbg_tag(), e)
+ return Result[bool, ApiError].new_error(e) # type: ignore[arg-type]
+
logging.error("%s put_flat_dict_ptrs failed: %s", self.dbg_tag(), e)
if isinstance(e, PayloadLeaseNotFoundError):
# Mark closed and best-effort notify Rust side to stop callbacks/holds.
@@ -817,11 +867,12 @@ def __init__(
else:
self._handle.init_payload_callback(self._build_get_payload()) # type: ignore[attr-defined]
self._handle.init_delete_callback(self._build_delete_callback()) # type: ignore[attr-defined]
+ self._handle.init_broker() # type: ignore[attr-defined]
# Guard to make close idempotent without relying on None checks.
self._closed_local: bool = False
logging.info(
- "%s initialized via Rust MPSC: chan_id=%s, consumer_idx=%s, payload_backend=%s",
+ "%s initialized via Rust MPSC broker path: chan_id=%s, consumer_idx=%s, payload_backend=%s",
self._dbg_tag,
self._chan_id,
self._consumer_id,
@@ -1080,38 +1131,144 @@ def get_data(
List[Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]],
ApiError,
]:
- """Unified prefetch-first get API.
+ return self._get_data_broker(
+ batch_size=batch_size,
+ try_time=try_time,
+ prefetch_num=prefetch_num,
+ )
+
+ def _get_data_legacy_for_internal_check(
+ self,
+ batch_size: int = 1,
+ try_time: Optional[int] = None,
+ prefetch_num: int = 0,
+ ) -> Result[
+ List[Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]],
+ ApiError,
+ ]:
+ """Use the old prefetch MPSC read path for temporary internal checks only."""
+ return self._get_data_legacy_prefetch(
+ batch_size=batch_size,
+ try_time=try_time,
+ prefetch_num=prefetch_num,
+ )
+
+ def _get_data_broker(
+ self,
+ *,
+ batch_size: int,
+ try_time: Optional[int],
+ prefetch_num: int,
+ ) -> Result[
+ List[Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]],
+ ApiError,
+ ]:
+ """Get data via the broker-backed public path."""
+ timeout_ms = self._get_timeout_ms(try_time)
+ prefetch_target = min(
+ batch_size + max(prefetch_num, 0),
+ MPSC_PREFETCH_TARGET_MAX,
+ )
+ try:
+ batch = self._handle.get_batch( # type: ignore[attr-defined]
+ batch_size,
+ prefetch_target,
+ timeout_ms,
+ )
+ except Exception as e:
+ if self.shutdown_ctl.closed:
+ api_err: ApiError = ChannelClosedError(
+ message="Consumer is closed.",
+ channel_id=self._chan_id,
+ )
+ elif isinstance(e, ApiError):
+ api_err = e
+ else:
+ api_err = MqGetDataUnknownError.from_exception(
+ e, channel_id=self._chan_id, consumer_id=self._consumer_id
+ )
+ if isinstance(api_err, (MessageConsumptionNoNewMessageError, ChannelClosedError)):
+ logging.debug("%s get_batch finished without payload: %s", self.dbg_tag(), api_err)
+ else:
+ logging.error("%s get_batch failed: %s", self.dbg_tag(), api_err)
+ return Result[
+ List[
+ Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]
+ ],
+ ApiError,
+ ].new_error(api_err)
+
+ if not batch:
+ return Result[
+ List[
+ Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]
+ ],
+ ApiError,
+ ].new_error(
+ MessageConsumptionNoNewMessageError("No message available")
+ )
+
+ return Result(batch)
+
+ def _get_data_legacy_prefetch(
+ self,
+ *,
+ batch_size: int,
+ try_time: Optional[int],
+ prefetch_num: int,
+ ) -> Result[
+ List[Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]],
+ ApiError,
+ ]:
+ """Get data through the old direct MPSC prefetch path."""
+ timeout_ms = self._get_timeout_ms(try_time)
+
+ return self._get_data_with_fetcher(
+ batch_size=batch_size,
+ fetch_one=lambda prefetch_target, _timeout_ms: self._handle.get_one_legacy_for_internal_check( # type: ignore[attr-defined]
+ prefetch_target,
+ timeout_ms,
+ ),
+ prefetch_target=min(
+ batch_size + max(prefetch_num, 0),
+ MPSC_PREFETCH_TARGET_MAX,
+ ),
+ timeout_ms=timeout_ms,
+ )
+
+ def _get_timeout_ms(self, try_time: Optional[int]) -> Optional[int]:
+ if try_time is None:
+ return None
+ t_sec = try_time if try_time > 0 else 1
+ timeout_ms = int(t_sec * 1000)
+ assert timeout_ms > 0
+ return timeout_ms
+
+ def _get_data_with_fetcher(
+ self,
+ *,
+ batch_size: int,
+ fetch_one: Any,
+ prefetch_target: int = 0,
+ timeout_ms: Optional[int] = None,
+ ) -> Result[
+ List[Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]],
+ ApiError,
+ ]:
+ """Common get loop used by broker and internal legacy checks.
Semantics:
- If it returns Ok([...]), each element is from a successful get_one call.
- - If any get_one in this batch raises an error, the entire batch fails and
- returns Err(ApiError) immediately (no "partial success" Ok list).
-
- The window size is mapped to `batch_size + prefetch_num`, so the underlying
- Rust actor maintains a local prefetch queue of that size.
+ - NoNewMessage/ChannelClosed only fail the call when the batch is still empty.
+ Already-consumed items must be returned to avoid losing partial progress.
+ - Payload/decode/unknown errors still fail immediately.
"""
- prefetch_target = batch_size + max(prefetch_num, 0)
-
- # Inline minimal fetch loop with explicit prefetch_target to keep
- # ChannelConsumer.try_get_data signature aligned while still
- # honoring the calculated window size here.
results: List[Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]] = []
- # try_time is seconds in Python; Rust get_one expects milliseconds.
- timeout_ms: Optional[int]
- if try_time is None:
- timeout_ms = None
- else:
- # Compatibility: try_time must not be 0; if callers pass 0, treat it as 1 second.
- t_sec = try_time if try_time > 0 else 1
- timeout_ms = int(t_sec * 1000)
- assert timeout_ms > 0
-
+
for _ in range(batch_size):
try:
- # Pass timeout_ms (converted from try_time seconds) to Rust.
- obj = self._handle.get_one(prefetch_target, timeout_ms) # type: ignore[attr-defined]
+ obj = fetch_one(prefetch_target, timeout_ms)
except Exception as e:
- logging.error("%s get_one failed: %s", self.dbg_tag(), e)
# Rust is expected to raise an extension-layer ApiError. To avoid carrying
# arbitrary Exception types in Result, wrap non-ApiError into
# MqGetDataUnknownError to keep the error taxonomy narrow.
@@ -1126,6 +1283,12 @@ def get_data(
api_err = MqGetDataUnknownError.from_exception(
e, channel_id=self._chan_id, consumer_id=self._consumer_id
)
+ if isinstance(api_err, (MessageConsumptionNoNewMessageError, ChannelClosedError)):
+ logging.debug("%s get_one finished without payload: %s", self.dbg_tag(), api_err)
+ if results:
+ return Result(results)
+ else:
+ logging.error("%s get_one failed: %s", self.dbg_tag(), api_err)
return Result[
List[
Union[Dict[str, Union[int, float, bool, str, bytes, DLPacked]], ConsumedMessage]
diff --git a/fluxon_py/kvclient/fluxon.py b/fluxon_py/kvclient/fluxon.py
index 1325e3d..6a4dacc 100644
--- a/fluxon_py/kvclient/fluxon.py
+++ b/fluxon_py/kvclient/fluxon.py
@@ -299,6 +299,9 @@ def __init__(self, config: FluxonKvClientConfig):
self._client: Optional[fluxon_pyo3.KvClient] = None
self._config = config
self._init_error: Optional[ApiError] = None
+ self._client_op_lock = threading.RLock()
+ self._closing = False
+ self._closed = False
cluster_name = config.fluxonkv_spec_cluster_name
self._blocking_put_outer_total_log_window = _BlockingPutOuterTotalLogWindow(
f"FluxonKVCacheStore[{cluster_name}]"
@@ -776,20 +779,31 @@ def instance_key(self) -> Result[str, ApiError]:
def close(self) -> Result[OkNone, ApiError]:
"""Close and tear down the store."""
try:
- # Backend returns a Result; MUST be explicitly consumed to avoid
- # leaking an unconsumed Result that triggers __del__ assertion.
- res = self._client.close()
- if not res.is_ok():
- # Propagate backend error (already an ApiError)
- return Result.new_error(res.unwrap_error())
- # Consume Ok(None-like) to satisfy strict consumption policy
- _ = res.unwrap()
- unregister_store_from_cleanup(self)
- # English note:
- # After a successful close, clear the backend handle to prevent any further calls and
- # allow deterministic resource release without relying on Python GC timing.
- self._client = None
+ with self._client_op_lock:
+ if self._closed:
+ return Result.new_ok(OkNone())
+ self._closing = True
+ if self._client is None:
+ self._closed = True
+ unregister_store_from_cleanup(self)
+ return Result.new_ok(OkNone())
+ # Backend returns a Result; MUST be explicitly consumed to avoid
+ # leaking an unconsumed Result that triggers __del__ assertion.
+ res = self._client.close()
+ if not res.is_ok():
+ # Propagate backend error (already an ApiError)
+ return Result.new_error(res.unwrap_error())
+ # Consume Ok(None-like) to satisfy strict consumption policy
+ _ = res.unwrap()
+ unregister_store_from_cleanup(self)
+ # English note:
+ # After a successful close, clear the backend handle to prevent any further calls and
+ # allow deterministic resource release without relying on Python GC timing.
+ self._client = None
+ self._closed = True
return Result.new_ok(OkNone())
+ except KeyboardInterrupt as e:
+ return Result.new_error(GeneralError(f"Store close interrupted: {str(e)}"))
except Exception as e:
return Result.new_error(GeneralError(f"Failed to close client: {str(e)}"))
@@ -892,7 +906,10 @@ def metrics_snapshot(self) -> MetricSnapshot:
# --- Fluxon-kv lease helpers (synchronous) ---
def allocate_lease(self, ttl_seconds: int) -> Result[int, ApiError]:
try:
- inner = self._client.allocate_lease(ttl_seconds)
+ with self._client_op_lock:
+ if self._closing or self._closed or self._client is None:
+ return Result.new_error(GeneralError("allocate_lease called after store close started"))
+ inner = self._client.allocate_lease(ttl_seconds)
if not inner.is_ok():
return Result.new_error(inner.unwrap_error())
lease_id = inner.unwrap()
@@ -903,7 +920,10 @@ def allocate_lease(self, ttl_seconds: int) -> Result[int, ApiError]:
def keepalive_lease(self, lease_id: int) -> Result[OkNone, ApiError]:
try:
- inner = self._client.keepalive_lease(lease_id, "kvclient")
+ with self._client_op_lock:
+ if self._closing or self._closed or self._client is None:
+ return Result.new_ok(OkNone())
+ inner = self._client.keepalive_lease(lease_id, "kvclient")
if not inner.is_ok():
return Result.new_error(inner.unwrap_error())
# Success returns a None-like sentinel from PyO3; normalize to OkNone
diff --git a/fluxon_py/runtime/__init__.py b/fluxon_py/runtime/__init__.py
index 692b741..fda3b65 100644
--- a/fluxon_py/runtime/__init__.py
+++ b/fluxon_py/runtime/__init__.py
@@ -8,6 +8,10 @@
"run_kv_master_service_blocking",
"start_kv_master_process",
"start_kv_master_process_with_config_b64",
+ "run_broker_blocking",
+ "run_broker_service_blocking",
+ "start_broker_process",
+ "start_broker_process_with_config_b64",
"run_owner_kvclient_blocking",
"run_owner_kvclient_service_blocking",
"start_owner_kvclient_process",
@@ -37,6 +41,10 @@
"run_kv_master_service_blocking": ("start_master", "run_kv_master_service_blocking"),
"start_kv_master_process": ("start_master", "start_kv_master_process"),
"start_kv_master_process_with_config_b64": ("start_master", "start_kv_master_process_with_config_b64"),
+ "run_broker_blocking": ("start_broker", "run_kv_broker_blocking"),
+ "run_broker_service_blocking": ("start_broker", "run_kv_broker_service_blocking"),
+ "start_broker_process": ("start_broker", "start_kv_broker_process"),
+ "start_broker_process_with_config_b64": ("start_broker", "start_kv_broker_process_with_config_b64"),
"run_owner_kvclient_blocking": ("start_owner_kvclient", "run_owner_kvclient_blocking"),
"run_owner_kvclient_service_blocking": ("start_owner_kvclient", "run_owner_kvclient_service_blocking"),
"start_owner_kvclient_process": ("start_owner_kvclient", "start_owner_kvclient_process"),
diff --git a/fluxon_py/runtime/start_broker.py b/fluxon_py/runtime/start_broker.py
new file mode 100644
index 0000000..dd7a70e
--- /dev/null
+++ b/fluxon_py/runtime/start_broker.py
@@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import argparse
+import subprocess
+from pathlib import Path
+import yaml
+
+from fluxon_py.tool import import_fluxon_pyo3_local
+
+from .process_runner import (
+    bind_current_process_parent_death_sigterm,
+    build_runtime_singleton_spec,
+    RuntimeConfigInput,
+    decode_runtime_config_b64,
+    encode_runtime_config_b64,
+    resolve_runtime_config_path,
+    run_singleton_process,
+    start_python_module_process,
+    start_python_module_process_with_config_b64,
+)
+
+
+BROKER_MODULE_NAME = "fluxon_py.runtime.start_broker"
+STOP_EXISTING_BROKER_TIMEOUT_SECONDS = 30
+BROKER_RUNTIME_CONFIG_FILENAME = "kv_broker.runtime.yaml"
+
+
+def run_kv_broker_blocking(
+    *,
+    workdir: Path,
+    config: RuntimeConfigInput | None = None,
+    config_path: Path | None = None,
+) -> None:
+    resolved_workdir = workdir.resolve()
+    resolved_config = resolve_runtime_config_path(
+        workdir=resolved_workdir,
+        runtime_config_filename=BROKER_RUNTIME_CONFIG_FILENAME,
+        config=config,
+        config_path=config_path,
+    )
+    singleton_spec = build_runtime_singleton_spec(
+        module_name=BROKER_MODULE_NAME,
+        entrypoint_path=Path(__file__),
+        workdir=workdir,
+    )
+    run_singleton_process(
+        config_path=resolved_config,
+        singleton_spec=singleton_spec,
+        stop_timeout_seconds=STOP_EXISTING_BROKER_TIMEOUT_SECONDS,
+        start_fn=lambda: run_kv_broker_service_blocking(
+            config_path=resolved_config,
+            workdir=resolved_workdir,
+        ),
+    )
+
+
+def run_kv_broker_service_blocking(*, config_path: Path, workdir: Path) -> None:
+    fluxon_pyo3 = import_fluxon_pyo3_local()
+    result = fluxon_pyo3.run_broker_blocking(str(config_path))
+    if not result.is_ok():
+        raise RuntimeError(f"run_broker_blocking failed: {result.unwrap_error()}")
+
+    _ = result.unwrap()
+
+
+def run_kv_broker_service_blocking_from_yaml_text(*, config_yaml: str) -> None:
+    config = yaml.safe_load(config_yaml)
+    if not isinstance(config, dict):
+        raise TypeError(f"broker config must decode to dict, got {type(config).__name__}")
+    fluxon_pyo3 = import_fluxon_pyo3_local()
+    result = fluxon_pyo3.run_broker_blocking(config)
+    if not result.is_ok():
+        raise RuntimeError(f"run_broker_blocking failed: {result.unwrap_error()}")
+
+    _ = result.unwrap()
+
+
+def start_kv_broker_process(
+    *,
+    workdir: Path | None = None,
+    config: RuntimeConfigInput | None = None,
+    config_path: Path | None = None,
+    log_path: Path | None = None,
+) -> subprocess.Popen[bytes]:
+    if config_path is None and isinstance(config, dict) and workdir is None:
+        return start_kv_broker_process_with_config_b64(config=config, log_path=log_path)
+    if workdir is None:
+        raise ValueError("workdir is required unless config is a dict and config_path is not provided")
+    resolved_workdir = workdir.resolve()
+    resolved_config = resolve_runtime_config_path(
+        workdir=resolved_workdir,
+        runtime_config_filename=BROKER_RUNTIME_CONFIG_FILENAME,
+        config=config,
+        config_path=config_path,
+    )
+    return start_python_module_process(
+        module_name=BROKER_MODULE_NAME,
+        config_path=resolved_config,
+        workdir=resolved_workdir,
+        extra_cli_args=(),
+        log_path=log_path,
+    )
+
+
+def start_kv_broker_process_with_config_b64(
+    *,
+    config: dict,
+    log_path: Path | None = None,
+) -> subprocess.Popen[bytes]:
+    return start_python_module_process_with_config_b64(
+        module_name=BROKER_MODULE_NAME,
+        config_b64=encode_runtime_config_b64(config),
+        extra_cli_args=(),
+        log_path=log_path,
+    )
+
+
+def main() -> None:
+    bind_current_process_parent_death_sigterm()
+    parser = argparse.ArgumentParser(description="Start Fluxon KV broker (blocking)")
+    parser.add_argument("-c", "--config", type=Path, required=False, help="Path to broker YAML config")
+    parser.add_argument("-w", "--workdir", type=Path, required=False, help="Working directory")
+    parser.add_argument("--config-b64", required=False, help="Base64-encoded YAML config")
+    args = parser.parse_args()
+    if args.config_b64 is not None:
+        # Keep the same config transport contract as other runtime entrypoints.
+        run_kv_broker_service_blocking_from_yaml_text(
+            config_yaml=decode_runtime_config_b64(args.config_b64)
+        )
+        return
+    if args.config is None or args.workdir is None:
+        raise ValueError("--config and --workdir are required when --config-b64 is not used")
+    run_kv_broker_blocking(config=args.config, workdir=args.workdir)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/fluxon_py/tests/test_api_chan_mpmc/test_api_chan_mpmc_base.py b/fluxon_py/tests/test_api_chan_mpmc/test_api_chan_mpmc_base.py
index f992c2d..8135242 100644
--- a/fluxon_py/tests/test_api_chan_mpmc/test_api_chan_mpmc_base.py
+++ b/fluxon_py/tests/test_api_chan_mpmc/test_api_chan_mpmc_base.py
@@ -45,6 +45,7 @@ def _find_project_root(start: Path) -> Path:
sys.path.insert(0, str(PROJECT_ROOT))
from typing import Dict, List, Optional, Tuple
+from types import SimpleNamespace
import etcd3
@@ -649,6 +650,17 @@ def scenario_dynamic_producer_consumer(
recovered_consumers: List[str] = []
test_mpmc_id: Optional[str] = None
+ def _print_process_log_tail(log_file: str, *, max_lines: int = 200) -> None:
+ print(f"=== subprocess log tail: {log_file} ===", flush=True)
+ try:
+ with open(log_file, "rb") as handle:
+ lines = handle.readlines()[-max_lines:]
+ for raw in lines:
+ print(raw.decode("utf-8", "replace").rstrip("\n"), flush=True)
+ except Exception as exc: # noqa: BLE001
+ print(f"failed to read subprocess log {log_file}: {exc}", flush=True)
+ print(f"=== end subprocess log tail: {log_file} ===", flush=True)
+
def fail_fast_on_subprocess_error(*, process_type_filter: Optional[str] = None) -> None:
for identifier, (process_type, proc, log_file) in process_handles_by_id.items():
if process_type_filter is not None and process_type != process_type_filter:
@@ -657,6 +669,7 @@ def fail_fast_on_subprocess_error(*, process_type_filter: Optional[str] = None)
if rc is None:
continue
if rc != 0:
+ _print_process_log_tail(log_file)
raise RuntimeError(
f"{process_type} {identifier} exited early with return code {rc}. "
f"Check log file for details: {log_file}"
@@ -680,6 +693,7 @@ def wait_all_of_type(process_type: str, *, timeout_s: int) -> None:
print(f"{ptype} {identifier} completed successfully")
print(f"Log file: {log_file}")
continue
+ _print_process_log_tail(log_file)
raise RuntimeError(
f"{ptype} {identifier} failed with return code {proc.returncode}."
f" Check log file for details: {log_file}"
@@ -1399,6 +1413,33 @@ def test_mpmc_dynamic_suite() -> None:
run_with_argmatrix(_test_mpmc_dynamic_suite_once)
+def test_mpmc_get_data_prefetch_is_per_consumer_not_divided() -> None:
+ calls: List[Tuple[int, Optional[int], int]] = []
+
+ class _DummyInnerConsumer:
+ def get_data(
+ self,
+ batch_size: int,
+ try_time: Optional[int] = None,
+ prefetch_num: int = 0,
+ ) -> Result[List[Dict[str, object]], ApiError]:
+ calls.append((batch_size, try_time, prefetch_num))
+ return Result.new_ok([])
+
+ consumer = object.__new__(MPMCChanConsumer)
+ consumer.shutdown_ctl = mpsc.MqShutdownCtl()
+ consumer.mpmc_id = "123"
+ consumer.mpmc_channel = SimpleNamespace(
+ _get_active_consumer_count=lambda: 8,
+ )
+ consumer.mpsc_consumer = _DummyInnerConsumer()
+
+ res = consumer.get_data(batch_size=40, try_time=2, prefetch_num=40)
+
+ assert res.is_ok()
+ assert calls == [(40, 2, 40)]
+
+
if __name__ == "__main__":
diff --git a/fluxon_py/tests/test_api_chan_mpmc/test_mpmc_simple_bench.py b/fluxon_py/tests/test_api_chan_mpmc/test_mpmc_simple_bench.py
index a29c46f..903ba7f 100644
--- a/fluxon_py/tests/test_api_chan_mpmc/test_mpmc_simple_bench.py
+++ b/fluxon_py/tests/test_api_chan_mpmc/test_mpmc_simple_bench.py
@@ -52,10 +52,12 @@ def _find_project_root(start: Path) -> Path:
from fluxon_py import FluxonKvClientConfig, new_store # noqa: E402
from fluxon_py.api_error import ( # noqa: E402
ChannelClosedError,
+ KeyNotFoundError,
MessageConsumptionNoNewMessageError,
ProducerClosedError,
)
from fluxon_py.api_ext_chan import ChanType # noqa: E402
+from fluxon_py._api_ext_chan.mpsc import MpscContext # noqa: E402
from fluxon_py.kvclient import KvClientType # noqa: E402
from fluxon_py.kvclient.nonzerocopy_encode import DLPackBytesView # noqa: E402
from fluxon_py.logging import init_logger # noqa: E402
@@ -382,17 +384,23 @@ def _run_one_case(
)
_put_etcd_key(stop_key, b"1")
time.sleep(SUMMARY_STOP_GRACE_SECONDS)
- _signal_live_processes(worker_processes, signum=signal.SIGINT)
try:
_wait_for_processes_exit(worker_processes, timeout_seconds=WORKER_EXIT_TIMEOUT_SECONDS)
except RuntimeError as err:
+ _signal_live_processes(worker_processes, signum=signal.SIGINT)
logging.warning("[bench] worker shutdown timeout bench_id=%s error=%s", bench_id, err)
+ raise
else:
- _warn_if_worker_exited_nonzero(worker_processes, bench_id=bench_id)
+ _raise_if_worker_exited_nonzero(worker_processes, bench_id=bench_id)
finally:
_terminate_processes(worker_processes)
_delete_etcd_key(stop_key)
_clear_etcd_prefix(f"{SUMMARY_KEY_PREFIX}{bench_id}/")
+ if bootstrap_store is not None and bootstrap_producer is not None:
+ _best_effort_delete_case_broker_channels(
+ store=bootstrap_store,
+ mpmc_id=str(bootstrap_producer.get_chan_id()),
+ )
if bootstrap_producer is not None:
_best_effort_close(bootstrap_producer, role="bootstrap_producer")
_best_effort_close(bootstrap_store, role="bootstrap_store")
@@ -985,18 +993,18 @@ def _index_summaries_by_consumer_id(summaries: list[dict[str, Any]]) -> dict[str
return indexed
-def _warn_if_worker_exited_nonzero(processes: list[subprocess.Popen[str]], *, bench_id: str) -> None:
+def _raise_if_worker_exited_nonzero(processes: list[subprocess.Popen[str]], *, bench_id: str) -> None:
+ failures: list[str] = []
for proc in processes:
return_code = proc.poll()
if return_code is None:
continue
if return_code != 0:
- logging.warning(
- "[bench] worker exited non-zero during teardown bench_id=%s pid=%s code=%s",
- bench_id,
- proc.pid,
- return_code,
- )
+ failures.append(f"pid={proc.pid} code={return_code}")
+ if failures:
+ raise RuntimeError(
+ f"worker exited non-zero during teardown bench_id={bench_id}: {', '.join(failures)}"
+ )
def _maybe_write_consumer_summary(
@@ -1215,6 +1223,71 @@ def _clear_etcd_prefix(prefix: str) -> None:
etcd_client.delete(meta.key)
+def _best_effort_delete_case_broker_channels(*, store: Any, mpmc_id: str) -> None:
+ if not isinstance(mpmc_id, str) or not mpmc_id.isdigit():
+ logging.warning("[bench] skip broker cleanup for invalid mpmc_id=%r", mpmc_id)
+ return
+
+ channels_key = f"/mpmc_channels/{mpmc_id}/mpsc_channels"
+ try:
+ with etcd3.client(ETCD_HOST, ETCD_PORT) as etcd_client:
+ raw, _ = etcd_client.get(channels_key)
+ if raw is None:
+ return
+ loaded = json.loads(raw.decode("utf-8"))
+ if not isinstance(loaded, list):
+ raise TypeError(f"{channels_key} must contain a list, got {type(loaded).__name__}")
+
+ ctx = MpscContext(store)
+ payload_key_count = 0
+ payload_delete_ok = 0
+ payload_delete_failed = 0
+ try:
+ for chan_id in loaded:
+ if not isinstance(chan_id, str) or not chan_id.isdigit():
+ raise ValueError(f"invalid sub-MPSC channel id in {channels_key}: {chan_id!r}")
+ payload_keys = ctx.delete_broker_channel(chan_id)
+ payload_key_count += len(payload_keys)
+ for payload_key in payload_keys:
+ res = store.remove(payload_key)
+ if res.is_ok():
+ _ = res.unwrap()
+ payload_delete_ok += 1
+ continue
+ err = res.unwrap_error()
+ if isinstance(err, KeyNotFoundError):
+ payload_delete_ok += 1
+ continue
+ payload_delete_failed += 1
+ logging.warning(
+ "[bench] broker payload cleanup failed key=%s err=%s",
+ payload_key,
+ err,
+ )
+ finally:
+ ctx.close()
+ logging.info(
+ "[bench] deleted broker channels for mpmc_id=%s count=%s payload_keys=%s payload_delete_ok=%s payload_delete_failed=%s",
+ mpmc_id,
+ len(loaded),
+ payload_key_count,
+ payload_delete_ok,
+ payload_delete_failed,
+ )
+ print(
+ "BENCH_BROKER_CLEANUP "
+ f"mpmc_id={mpmc_id} channels={len(loaded)} payload_keys={payload_key_count} "
+ f"payload_delete_ok={payload_delete_ok} payload_delete_failed={payload_delete_failed}",
+ flush=True,
+ )
+ except Exception as err: # noqa: BLE001
+ logging.warning(
+ "[bench] broker channel cleanup failed for mpmc_id=%s: %s",
+ mpmc_id,
+ err,
+ )
+
+
def _best_effort_close(obj: Any, *, role: str) -> None:
close_res = obj.close()
if close_res.is_ok():
diff --git a/fluxon_py/tests/test_api_chan_mpsc/test_api_chan_mpsc_base.py b/fluxon_py/tests/test_api_chan_mpsc/test_api_chan_mpsc_base.py
index 884c748..f40a046 100644
--- a/fluxon_py/tests/test_api_chan_mpsc/test_api_chan_mpsc_base.py
+++ b/fluxon_py/tests/test_api_chan_mpsc/test_api_chan_mpsc_base.py
@@ -30,6 +30,8 @@
ChanKeyNotFoundError,
ChanMessageConsumptionError,
ChanMessageProduceError,
+ ChannelClosedError,
+ MessageConsumptionNoNewMessageError,
ConsumerRegistrationError,
ProducerRegistrationError,
)
@@ -54,6 +56,7 @@
from fluxon_py._api_ext_chan.mpsc import ( # noqa: E402
_new_produce_offset_of_all_producer_key,
)
+from fluxon_py._api_ext_chan import mpsc # noqa: E402
from fluxon_py.logging import init_logger # noqa: E402
from fluxon_py.tests.test_lib import ( # noqa: E402
KV_SVC_IP,
@@ -1601,6 +1604,119 @@ def test_mpsc_channel_suite() -> None:
run_with_argmatrix(_test_mpsc_channel_suite_once)
+def test_mpsc_get_data_clamps_prefetch_target() -> None:
+ consumer = object.__new__(MPSCChanConsumer)
+ consumer.shutdown_ctl = mpsc.MqShutdownCtl()
+ consumer._chan_id = "1"
+ consumer._consumer_id = "2"
+ consumer._dbg_tag = "[MPSCChanConsumer chan_id=1 consumer_idx=2]"
+ consumer._closed_local = True
+
+ observed_targets: List[int] = []
+
+ class _DummyHandle:
+ def get_one_legacy_for_internal_check(
+ self,
+ prefetch_target: int,
+ timeout_ms: Optional[int],
+ ) -> Dict[str, bytes]:
+ observed_targets.append(prefetch_target)
+ return {"payload": b"x"}
+
+ consumer._handle = _DummyHandle()
+
+ res = consumer._get_data_legacy_for_internal_check(
+ batch_size=40,
+ try_time=1,
+ prefetch_num=400,
+ )
+
+ assert res.is_ok()
+ assert observed_targets
+ assert all(target == mpsc.MPSC_PREFETCH_TARGET_MAX for target in observed_targets)
+
+
+def test_mpsc_get_data_returns_partial_batch_on_no_message() -> None:
+ consumer = object.__new__(MPSCChanConsumer)
+ consumer.shutdown_ctl = mpsc.MqShutdownCtl()
+ consumer._chan_id = "1"
+ consumer._consumer_id = "2"
+ consumer._dbg_tag = "[MPSCChanConsumer chan_id=1 consumer_idx=2]"
+ consumer._closed_local = True
+
+ class _DummyHandle:
+ def get_batch(
+ self,
+ batch_size: int,
+ prefetch_target: int,
+ timeout_ms: Optional[int],
+ ) -> List[Dict[str, bytes]]:
+ del batch_size, prefetch_target, timeout_ms
+ return [{"payload": b"x"}]
+
+ consumer._handle = _DummyHandle()
+
+ res = consumer.get_data(batch_size=8, try_time=1, prefetch_num=0)
+
+ assert res.is_ok()
+ assert res.unwrap() == [{"payload": b"x"}]
+
+
+def test_mpsc_get_data_returns_partial_batch_on_channel_closed() -> None:
+ consumer = object.__new__(MPSCChanConsumer)
+ consumer.shutdown_ctl = mpsc.MqShutdownCtl()
+ consumer._chan_id = "1"
+ consumer._consumer_id = "2"
+ consumer._dbg_tag = "[MPSCChanConsumer chan_id=1 consumer_idx=2]"
+ consumer._closed_local = True
+
+ class _DummyHandle:
+ def get_batch(
+ self,
+ batch_size: int,
+ prefetch_target: int,
+ timeout_ms: Optional[int],
+ ) -> List[Dict[str, bytes]]:
+ del batch_size, prefetch_target, timeout_ms
+ return [{"payload": b"x"}]
+
+ consumer._handle = _DummyHandle()
+
+ res = consumer.get_data(batch_size=8, try_time=1, prefetch_num=0)
+
+ assert res.is_ok()
+ assert res.unwrap() == [{"payload": b"x"}]
+
+
+def test_mpsc_get_data_broker_passes_prefetch_target_to_batch() -> None:
+ consumer = object.__new__(MPSCChanConsumer)
+ consumer.shutdown_ctl = mpsc.MqShutdownCtl()
+ consumer._chan_id = "1"
+ consumer._consumer_id = "2"
+ consumer._dbg_tag = "[MPSCChanConsumer chan_id=1 consumer_idx=2]"
+ consumer._closed_local = True
+
+ observed: List[int] = []
+
+ class _DummyHandle:
+ def get_batch(
+ self,
+ batch_size: int,
+ prefetch_target: int,
+ timeout_ms: Optional[int],
+ ) -> List[Dict[str, bytes]]:
+ del batch_size, timeout_ms
+ observed.append(prefetch_target)
+ return [{"payload": b"x"}]
+
+ consumer._handle = _DummyHandle()
+
+ res = consumer.get_data(batch_size=40, try_time=1, prefetch_num=400)
+
+ assert res.is_ok()
+ assert observed == [mpsc.MPSC_PREFETCH_TARGET_MAX]
+
+
def test_new_or_bind_unique_key_namespace_collision() -> None:
setup_test_environment(logging)
env = create_channel_env()
diff --git a/fluxon_py/tests/test_lib.py b/fluxon_py/tests/test_lib.py
index 9be7003..41e4557 100644
--- a/fluxon_py/tests/test_lib.py
+++ b/fluxon_py/tests/test_lib.py
@@ -173,10 +173,11 @@ def setup_test_environment(logger: Logger, print_config: bool = True):
# except RuntimeError as e:
# print(f"Failed to set start method to spawn: {e}, current start method: {multiprocessing.get_start_method()}")
- loglevel_str="DEBUG"
+ loglevel_str = os.environ.get("FLUXON_LOG") or os.environ.get("LOG_LEVEL") or "DEBUG"
+ loglevel_str = str(loglevel_str).upper()
os.environ["LOG_LEVEL"] = loglevel_str
os.environ["FLUXON_LOG"] = loglevel_str
- LOGGING_LEVEL= logging.DEBUG
+ LOGGING_LEVEL = getattr(logging, loglevel_str, logging.DEBUG)
update_log_level(loglevel_str)
print("=================================================")
@@ -190,7 +191,7 @@ def emit(self, record):
self.flush() # Flush immediately for every log record
handler = FlushStreamHandler(sys.stdout)
- handler.setLevel(logging.DEBUG)
+ handler.setLevel(LOGGING_LEVEL)
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
diff --git a/fluxon_py/tests/test_mq/test_example_ctrl_c_exit.py b/fluxon_py/tests/test_mq/test_example_ctrl_c_exit.py
index c1b3193..8242f77 100644
--- a/fluxon_py/tests/test_mq/test_example_ctrl_c_exit.py
+++ b/fluxon_py/tests/test_mq/test_example_ctrl_c_exit.py
@@ -51,6 +51,7 @@ def _find_project_root(start: Path) -> Path:
CHAN_CONFIG_TEST = {"capacity": 10, "ttl_seconds": 90, "weight": 1}
MASTER_SCRIPT = [sys.executable, "-m", "fluxon_py.runtime.start_master"]
+BROKER_SCRIPT = [sys.executable, "-m", "fluxon_py.runtime.start_broker"]
KVCLIENT_SCRIPT = [sys.executable, "-m", "fluxon_py.runtime.start_owner_kvclient"]
ETCD_BIN = PROJECT_ROOT / "fluxon_release" / "ext_images" / "etcd" / "etcd"
GREPTIME_BIN = PROJECT_ROOT / "fluxon_release" / "ext_images" / "greptime" / "greptime"
@@ -191,7 +192,7 @@ def _on_ctrlc(reason: str) -> None:
import yaml
from fluxon_py.api_ext_chan import ChanRole, ChanType, MPMCChanConsumer, new_or_bind_with_unique_key
-from fluxon_py.api_error import ChannelClosedError
+from fluxon_py.api_error import ChannelClosedError, MessageConsumptionNoNewMessageError
from fluxon_py.config import FluxonKvClientConfig
from fluxon_py.kvclient import new_store
from fluxon_py.logging import init_logger
@@ -279,6 +280,10 @@ def _on_ctrlc(reason: str) -> None:
if isinstance(err, ChannelClosedError):
logger.info("[consumer] close observed, exit loop")
break
+ if isinstance(err, MessageConsumptionNoNewMessageError):
+ if shutdown_requested.wait(0.2):
+ break
+ continue
raise SystemExit(f"get_data failed: {err}")
for raw in res.unwrap() or []:
payload = raw.get("payload", b"") if isinstance(raw, dict) else raw
@@ -463,6 +468,7 @@ def _build_example_config(
share_mem_path: str,
greptime_http_port: int,
master_port: int,
+ broker_port: int,
) -> dict[str, Any]:
capacity = max(128, int(CHAN_CONFIG_TEST["capacity"]))
ttl_seconds = max(90, int(CHAN_CONFIG_TEST["ttl_seconds"]))
@@ -475,6 +481,14 @@ def _build_example_config(
"log_dir": str((Path(share_mem_path).parent / "log" / "master").resolve()),
"monitoring": _monitoring_block(greptime_http_port=greptime_http_port),
},
+ "broker": {
+ "instance_key": f"example_ctrlc_broker_{unique_suffix}",
+ "fluxonkv_spec": {
+ "cluster_name": cluster_name,
+ "share_mem_path": share_mem_path,
+ "p2p_listen_port": broker_port,
+ },
+ },
"kvclient": {
"instance_key": f"example_ctrlc_owner_{unique_suffix}",
"contribute_to_cluster_pool_size": {"dram": 1073741824, "vram": {}},
@@ -589,6 +603,7 @@ def _start_local_stack(*, temp_root: Path, config_path: Path) -> list[tuple[subp
cluster_name = f"example_ctrlc_cluster_{unique_suffix}"
share_mem_path = str((temp_root / "sharemem").resolve())
master_port = _pick_free_port()
+ broker_port = _pick_free_port()
config = _build_example_config(
unique_suffix=unique_suffix,
cluster_name=cluster_name,
@@ -596,14 +611,17 @@ def _start_local_stack(*, temp_root: Path, config_path: Path) -> list[tuple[subp
share_mem_path=share_mem_path,
greptime_http_port=greptime_http_port,
master_port=master_port,
+ broker_port=broker_port,
)
config_path.write_text(
yaml.safe_dump(config, sort_keys=False),
encoding="utf-8",
)
master_config_path = temp_root / "master.yaml"
+ broker_config_path = temp_root / "broker.yaml"
kvclient_config_path = temp_root / "kvclient.yaml"
_write_runtime_subconfig(path=master_config_path, config=config, key="master")
+ _write_runtime_subconfig(path=broker_config_path, config=config, key="broker")
_write_runtime_subconfig(path=kvclient_config_path, config=config, key="kvclient")
master_proc = _spawn_logged(
@@ -643,8 +661,25 @@ def _start_local_stack(*, temp_root: Path, config_path: Path) -> list[tuple[subp
proc=kvclient_proc,
log_path=kvclient_log,
)
+
+ broker_log = temp_root / "log" / "broker.log"
+ broker_proc = _spawn_logged(
+ cmd=[
+ *BROKER_SCRIPT,
+ "-c",
+ str(broker_config_path),
+ "-w",
+ str((temp_root / "broker_work").resolve()),
+ ],
+ workdir=PROJECT_ROOT,
+ log_path=broker_log,
+ env=env,
+ )
+ time.sleep(2.0)
+ _require_process_running(broker_proc, label="broker", log_path=broker_log)
return [
(kvclient_proc, kvclient_log),
+ (broker_proc, broker_log),
(master_proc, master_log),
(etcd_proc, etcd_log),
(greptime_proc, greptime_log),
diff --git a/fluxon_rs/Cargo.lock b/fluxon_rs/Cargo.lock
index a4b0ecd..964cd8c 100644
--- a/fluxon_rs/Cargo.lock
+++ b/fluxon_rs/Cargo.lock
@@ -1230,6 +1230,7 @@ dependencies = [
"fluxon_commu",
"fluxon_framework",
"fluxon_framework_compiled",
+ "fluxon_mq",
"fluxon_observability",
"fluxon_util",
"futures",
@@ -1275,6 +1276,7 @@ version = "0.2.1"
dependencies = [
"anyhow",
"async-trait",
+ "bitcode",
"downcast-rs",
"etcd-client",
"fluxon_commu",
diff --git a/fluxon_rs/fluxon_commu/src/facade/p2p.rs b/fluxon_rs/fluxon_commu/src/facade/p2p.rs
index 8bcc169..79114f1 100644
--- a/fluxon_rs/fluxon_commu/src/facade/p2p.rs
+++ b/fluxon_rs/fluxon_commu/src/facade/p2p.rs
@@ -93,6 +93,19 @@ pub mod __hidden {
self.view.upgrade()
}
+ pub fn try_with_cluster_manager<R>(
+ &self,
+ f: impl FnOnce(&crate::cluster_manager::ClusterManager) -> R,
+ ) -> Option<R> {
+ let arc_view = self.view.upgrade()?;
+ unsafe {
+ let ptr =
+ std::ptr::NonNull::new(Arc::as_ptr(&arc_view) as *const _ as *mut _).unwrap();
+ let view_ref: &dyn P2pModuleViewTrait = ptr.as_ref();
+ Some(f(view_ref.cluster_manager()))
+ }
+ }
+
pub fn resource_registry(&self) -> &ResourceRegistry {
let arc_view = self.view.upgrade().expect(
"view of module P2pModule has been dropped when accessing resource registry",
@@ -489,11 +502,6 @@ impl P2pModule {
return true;
}
let view = self.module_view();
- let cm = view.cluster_manager();
- let self_info = cm.get_self_info();
- if self_info.node_role() != crate::NodeRole::External {
- return false;
- }
let snapshot = self.cached_tier_snapshot();
let Some(peer_gen) = snapshot.peer_gen(logical_peer) else {
return false;
@@ -501,24 +509,31 @@ impl P2pModule {
if !snapshot.is_send_ready_intra_effective(&peer_gen) {
return false;
}
- let Some(owner_id) = self_info
- .metadata
- .get(crate::META_KEY_SHARED_STORAGE_NODE_ID)
- else {
- return false;
- };
- if logical_peer.as_ref() == owner_id.as_str() {
- return false;
- }
- let Some(handle) = cm.ipc_bandwidth_attributor_handle() else {
- return false;
- };
- match direction {
- "tx" => handle.record_rx_bytes(bytes),
- "rx" => handle.record_tx_bytes(bytes),
- _ => return false,
- }
- true
+ view.try_with_cluster_manager(|cm| {
+ let self_info = cm.get_self_info();
+ if self_info.node_role() != crate::NodeRole::External {
+ return false;
+ }
+ let Some(owner_id) = self_info
+ .metadata
+ .get(crate::META_KEY_SHARED_STORAGE_NODE_ID)
+ else {
+ return false;
+ };
+ if logical_peer.as_ref() == owner_id.as_str() {
+ return false;
+ }
+ let Some(handle) = cm.ipc_bandwidth_attributor_handle() else {
+ return false;
+ };
+ match direction {
+ "tx" => handle.record_rx_bytes(bytes),
+ "rx" => handle.record_tx_bytes(bytes),
+ _ => return false,
+ }
+ true
+ })
+ .unwrap_or(false)
}
}
diff --git a/fluxon_rs/fluxon_commu/src/facade/transfer_engine.rs b/fluxon_rs/fluxon_commu/src/facade/transfer_engine.rs
index 878e5c6..e5353a5 100644
--- a/fluxon_rs/fluxon_commu/src/facade/transfer_engine.rs
+++ b/fluxon_rs/fluxon_commu/src/facade/transfer_engine.rs
@@ -74,12 +74,10 @@ impl ClosedLocalSegmentLeaseRegistry {
where
G: Send + Sync + 'static,
{
- let boxed = self
- .guards
- .lock()
- .await
- .remove(&handle)
- .ok_or_else(|| format!("closed sdk local segment lease handle {handle} not found"))?;
+ let boxed =
+ self.guards.lock().await.remove(&handle).ok_or_else(|| {
+ format!("closed sdk local segment lease handle {handle} not found")
+ })?;
boxed.downcast::<G>().map(|guard| *guard).map_err(|_| {
format!(
"closed sdk local segment lease handle {handle} has unexpected runtime guard type"
@@ -461,7 +459,7 @@ impl ClientTransferEngineCore {
len,
seg_guard,
)
- .await
+ .await
}
}
@@ -482,7 +480,11 @@ impl ClientTransferEngineCore {
let initial_local_segment_guard = match seg_guard {
Some(guard) => Some(guard),
None if runtime.supports_local_segment_transfer() => {
- let local_addr = if peer_src_or_target { target_addr } else { src_addr };
+ let local_addr = if peer_src_or_target {
+ target_addr
+ } else {
+ src_addr
+ };
match runtime.ensure_local_segment_guard(local_addr, None).await {
Ok(guard) => Some(guard),
Err(_) => None,
diff --git a/fluxon_rs/fluxon_commu_closed_sdk_consumer/src/lib.rs b/fluxon_rs/fluxon_commu_closed_sdk_consumer/src/lib.rs
index 6fab54e..caad34b 100644
--- a/fluxon_rs/fluxon_commu_closed_sdk_consumer/src/lib.rs
+++ b/fluxon_rs/fluxon_commu_closed_sdk_consumer/src/lib.rs
@@ -11,9 +11,9 @@ use fluxon_commu_contract::{
ClosedRuntimeCallRawObservedOutputView, ClosedRuntimeClusterEventStreamItem,
ClosedRuntimeClusterManagerCall, ClosedRuntimeClusterManagerResponse,
ClosedRuntimeClusterRdmaResolvedConfigStreamItem, ClosedRuntimeDesiredTransferPeer,
- ClosedRuntimeDispatchRequestView,
- ClosedRuntimeDispatchResponse, ClosedRuntimeDispatchTransportPolicy, ClosedRuntimeError,
- ClosedRuntimeHandle, ClosedRuntimeHostCallbackHandle, ClosedRuntimeP2pCall,
+ ClosedRuntimeDispatchRequestView, ClosedRuntimeDispatchResponse,
+ ClosedRuntimeDispatchTransportPolicy, ClosedRuntimeError, ClosedRuntimeHandle,
+ ClosedRuntimeHostCallbackHandle, ClosedRuntimeP2pCall,
ClosedRuntimeP2pCallRawObservedRequestView, ClosedRuntimeP2pResponse,
ClosedRuntimeP2pSendResponseRawRequestView, ClosedRuntimePeerGen, ClosedRuntimeRawSlice,
ClosedRuntimeRequest, ClosedRuntimeResponse, ClosedRuntimeTransferEngineCall,
@@ -491,11 +491,15 @@ impl WireBodyPartsOwner {
let (raw_lengths, raw_payload) = match raw_bytes.len() {
0 => (WireBodyRawLengths::Empty, WireBodyRawPayload::Empty),
1 => {
- let part = raw_bytes.into_iter().next().expect("single raw part missing");
- let len =
- u32::try_from(part.len()).map_err(|_| ClosedSdkConsumerError::RuntimeDecode {
+ let part = raw_bytes
+ .into_iter()
+ .next()
+ .expect("single raw part missing");
+ let len = u32::try_from(part.len()).map_err(|_| {
+ ClosedSdkConsumerError::RuntimeDecode {
detail: format!("wire raw part too large for u32 length: {}", part.len()),
- })?;
+ }
+ })?;
(
WireBodyRawLengths::Single([len]),
WireBodyRawPayload::Single(part),
@@ -849,8 +853,7 @@ fn decode_call_raw_observed_output_view(
return Err(ClosedSdkConsumerError::RuntimeDecode {
detail: format!(
"closed SDK call_raw_observed serialize_part overflow: serialize_len={} full_len={}",
- message_view.body.serialize_part.len,
- message_view.body.full_body.len,
+ message_view.body.serialize_part.len, message_view.body.full_body.len,
),
});
}
@@ -860,21 +863,19 @@ fn decode_call_raw_observed_output_view(
.ok_or_else(|| ClosedSdkConsumerError::RuntimeDecode {
detail: "closed SDK call_raw_observed raw_bytes length overflow".to_string(),
})?;
- let expected_full_len =
- message_view
- .body
- .serialize_part
- .len
- .checked_add(raw_total)
- .ok_or_else(|| ClosedSdkConsumerError::RuntimeDecode {
- detail: "closed SDK call_raw_observed body length overflow".to_string(),
- })?;
+ let expected_full_len = message_view
+ .body
+ .serialize_part
+ .len
+ .checked_add(raw_total)
+ .ok_or_else(|| ClosedSdkConsumerError::RuntimeDecode {
+ detail: "closed SDK call_raw_observed body length overflow".to_string(),
+ })?;
if expected_full_len != message_view.body.full_body.len {
return Err(ClosedSdkConsumerError::RuntimeDecode {
detail: format!(
"closed SDK call_raw_observed body length mismatch: expected={} full_len={}",
- expected_full_len,
- message_view.body.full_body.len,
+ expected_full_len, message_view.body.full_body.len,
),
});
}
@@ -923,9 +924,7 @@ fn decode_call_raw_observed_output_view(
frame_recv_done_ts_us: message_view.local_observe.frame_recv_done_ts_us,
dispatch_enqueued_ts_us: message_view.local_observe.dispatch_enqueued_ts_us,
dispatch_started_ts_us: message_view.local_observe.dispatch_started_ts_us,
- complete_pending_call_ts_us: message_view
- .local_observe
- .complete_pending_call_ts_us,
+ complete_pending_call_ts_us: message_view.local_observe.complete_pending_call_ts_us,
},
},
observe: fluxon_commu_contract::ClosedRuntimeRpcCallTransportObserveTrace {
@@ -1550,8 +1549,8 @@ async fn invoke_completion_async_with_keepalive(
) -> i32,
) -> Result<(i32, Bytes), ClosedSdkConsumerError> {
let (sender, receiver) = tokio::sync::oneshot::channel::<(i32, Bytes)>();
- let user_data = Box::into_raw(Box::new(RuntimeCompletionState { sender, keepalive }))
- .cast::();
+ let user_data =
+ Box::into_raw(Box::new(RuntimeCompletionState { sender, keepalive })).cast::();
let submit_status = submit(user_data, Some(runtime_completion_callback));
if submit_status != 0 {
unsafe {
@@ -2082,7 +2081,9 @@ pub async fn p2p_call_raw_observed(
)
.await?;
match status_code {
- FLUXON_COMMU_CLOSED_RUNTIME_RESULT_OK => decode_call_raw_observed_output_view(payload.as_ref()),
+ FLUXON_COMMU_CLOSED_RUNTIME_RESULT_OK => {
+ decode_call_raw_observed_output_view(payload.as_ref())
+ }
FLUXON_COMMU_CLOSED_RUNTIME_RESULT_ERR => {
let error = bitcode::decode::(payload.as_ref()).map_err(
|decode_error| ClosedSdkConsumerError::RuntimeDecode {
diff --git a/fluxon_rs/fluxon_fs/src/agent_service/transfer_agent.rs b/fluxon_rs/fluxon_fs/src/agent_service/transfer_agent.rs
index 1738ade..ca54a71 100644
--- a/fluxon_rs/fluxon_fs/src/agent_service/transfer_agent.rs
+++ b/fluxon_rs/fluxon_fs/src/agent_service/transfer_agent.rs
@@ -9,28 +9,23 @@ use std::time::{Duration, Instant};
use fluxon_fs_core::config::{
FS_AGENT_TRANSFER_STREAM_CLOSE_RPC_PATH, FS_AGENT_TRANSFER_STREAM_NEXT_RPC_PATH,
- FS_AGENT_TRANSFER_STREAM_OPEN_RPC_PATH,
- FS_MASTER_TRANSFER_SCHEDULER_HEARTBEAT_RPC_PATH, FS_MASTER_TRANSFER_SCHEDULER_RESULT_RPC_PATH,
- FluxonFsTransferBatchCollectInfoWire, FluxonFsTransferBatchKind,
- FluxonFsTransferCollectInfoKind, FluxonFsTransferDispositionWire,
- FluxonFsTransferFailedFileReasonKindWire,
- FluxonFsTransferReadStreamCloseWire, FluxonFsTransferReadStreamNextResultWire,
- FluxonFsTransferReadStreamNextWire, FluxonFsTransferReadStreamOpenResultWire,
- FluxonFsTransferReadStreamOpenWire,
- FluxonFsTransferSkipEntryKind, FluxonFsTransferSkipEntryWire,
- FluxonFsTransferManifestEntryWire, FluxonFsTransferManifestWire,
- FluxonFsTransferScanMode,
- FluxonFsTransferScanEventAckWire, FluxonFsTransferScanEventKindWire,
- FluxonFsTransferScanEventWire, FluxonFsTransferScanLaunchResultWire,
+ FS_AGENT_TRANSFER_STREAM_OPEN_RPC_PATH, FS_MASTER_TRANSFER_SCHEDULER_HEARTBEAT_RPC_PATH,
+ FS_MASTER_TRANSFER_SCHEDULER_RESULT_RPC_PATH, FluxonFsTransferBatchCollectInfoWire,
+ FluxonFsTransferBatchKind, FluxonFsTransferCollectInfoKind, FluxonFsTransferDispositionWire,
+ FluxonFsTransferFailedFileReasonKindWire, FluxonFsTransferManifestEntryWire,
+ FluxonFsTransferManifestWire, FluxonFsTransferReadStreamCloseWire,
+ FluxonFsTransferReadStreamNextResultWire, FluxonFsTransferReadStreamNextWire,
+ FluxonFsTransferReadStreamOpenResultWire, FluxonFsTransferReadStreamOpenWire,
FluxonFsTransferScanAssignmentWire, FluxonFsTransferScanBatchWire,
- FluxonFsTransferScanChildUnitWire, FluxonFsTransferScanFrontier,
+ FluxonFsTransferScanChildUnitWire, FluxonFsTransferScanEventAckWire,
+ FluxonFsTransferScanEventKindWire, FluxonFsTransferScanEventWire, FluxonFsTransferScanFrontier,
FluxonFsTransferScanFrontierDirEntry, FluxonFsTransferScanFrontierEntry,
- FluxonFsTransferScanResultWire,
- FluxonFsTransferSymlinkNoticeEntryWire, FluxonFsTransferWorkerCollectInfoResultWire,
- FluxonFsTransferWorkerAssignmentWire, FluxonFsTransferWorkerFileResultWire,
- FluxonFsTransferWorkerFailedFileResultWire,
- FluxonFsTransferWorkerHeartbeatResultWire, FluxonFsTransferWorkerHeartbeatTelemetryWire,
- FluxonFsTransferWorkerHeartbeatWire,
+ FluxonFsTransferScanLaunchResultWire, FluxonFsTransferScanMode, FluxonFsTransferScanResultWire,
+ FluxonFsTransferSkipEntryKind, FluxonFsTransferSkipEntryWire,
+ FluxonFsTransferSymlinkNoticeEntryWire, FluxonFsTransferWorkerAssignmentWire,
+ FluxonFsTransferWorkerCollectInfoResultWire, FluxonFsTransferWorkerFailedFileResultWire,
+ FluxonFsTransferWorkerFileResultWire, FluxonFsTransferWorkerHeartbeatResultWire,
+ FluxonFsTransferWorkerHeartbeatTelemetryWire, FluxonFsTransferWorkerHeartbeatWire,
FluxonFsTransferWorkerLaunchResultWire, FluxonFsTransferWorkerResultAckWire,
FluxonFsTransferWorkerResultWire, FluxonFsTransferWorkerStopReasonWire,
transfer_collect_info_output_relpath,
@@ -39,8 +34,8 @@ use fluxon_fs_core::retry::{
BackoffConfig, DEFAULT_WARN_INTERVAL_SECS, WarnConfig, next_backoff, should_warn,
};
use fluxon_kv::rpcresp_kvresult_convert::msg_and_error::{ApiError, KvError};
-use fluxon_kv::user_api::flat_dict::{FlatDict, FlatValue};
use fluxon_kv::user_api::FluxonUserApi;
+use fluxon_kv::user_api::flat_dict::{FlatDict, FlatValue};
use parking_lot::{Condvar, Mutex};
use super::{
@@ -202,16 +197,13 @@ fn transfer_scan_session_state() -> &'static Mutex<TransferScanSessionState> {
TRANSFER_SCAN_SESSION_STATE.get_or_init(|| Mutex::new(TransferScanSessionState::default()))
}
-fn cleanup_expired_transfer_scan_sessions(
- state: &mut TransferScanSessionState,
- now_unix_ms: i64,
-) {
- state
- .root_dir_listing_sessions
- .retain(|_, session| session.lease_expire_unix_ms <= 0 || session.lease_expire_unix_ms > now_unix_ms);
- state
- .subtree_streaming_sessions
- .retain(|_, session| session.lease_expire_unix_ms <= 0 || session.lease_expire_unix_ms > now_unix_ms);
+fn cleanup_expired_transfer_scan_sessions(state: &mut TransferScanSessionState, now_unix_ms: i64) {
+ state.root_dir_listing_sessions.retain(|_, session| {
+ session.lease_expire_unix_ms <= 0 || session.lease_expire_unix_ms > now_unix_ms
+ });
+ state.subtree_streaming_sessions.retain(|_, session| {
+ session.lease_expire_unix_ms <= 0 || session.lease_expire_unix_ms > now_unix_ms
+ });
}
fn same_root_continuation_scan_unit(
@@ -301,10 +293,7 @@ fn flush_pending_root_direct_files_batch(
return Ok(None);
}
let batch = build_direct_files_only_batch_from_entries_with_batch_id(
- direct_files_only_batch_id_for_partition(
- assignment,
- session.next_direct_files_batch_index,
- ),
+ direct_files_only_batch_id_for_partition(assignment, session.next_direct_files_batch_index),
assignment,
assignment.root_relpath.clone(),
std::mem::take(&mut session.pending_direct_files),
@@ -313,7 +302,8 @@ fn flush_pending_root_direct_files_batch(
)?;
session.pending_direct_bytes = 0;
session.next_direct_files_batch_index = session.next_direct_files_batch_index.saturating_add(1);
- session.emitted_direct_files_batch_count = session.emitted_direct_files_batch_count.saturating_add(1);
+ session.emitted_direct_files_batch_count =
+ session.emitted_direct_files_batch_count.saturating_add(1);
Ok(Some(batch))
}
@@ -414,7 +404,8 @@ fn open_transfer_root_dir_listing_session(
root_dir_abs: &str,
assignment: &FluxonFsTransferScanAssignmentWire,
) -> Result, FlatDict> {
- let dir_abs = safe_join_root(root_dir_abs, assignment.root_relpath.as_str()).map_err(resp_err_kverr)?;
+ let dir_abs =
+ safe_join_root(root_dir_abs, assignment.root_relpath.as_str()).map_err(resp_err_kverr)?;
let read_dir = match retry_after_target_path_chmod(
dir_abs.as_path(),
"root_read_dir",
@@ -458,7 +449,10 @@ fn take_transfer_root_dir_listing_session(
let now_unix_ms = chrono::Utc::now().timestamp_millis();
let mut state = transfer_scan_session_state().lock();
cleanup_expired_transfer_scan_sessions(&mut state, now_unix_ms);
- if let Some(mut session) = state.root_dir_listing_sessions.remove(assignment.scan_unit_id.as_str()) {
+ if let Some(mut session) = state
+ .root_dir_listing_sessions
+ .remove(assignment.scan_unit_id.as_str())
+ {
if session.job_id == assignment.job_id
&& session.scan_epoch == assignment.scan_epoch
&& session.root_relpath == assignment.root_relpath
@@ -507,7 +501,8 @@ fn open_transfer_subtree_streaming_session(
if is_relpath_skipped(&assignment.skip_entries, assignment.root_relpath.as_str()) {
return Ok(None);
}
- let dir_abs = safe_join_root(root_dir_abs, assignment.root_relpath.as_str()).map_err(resp_err_kverr)?;
+ let dir_abs =
+ safe_join_root(root_dir_abs, assignment.root_relpath.as_str()).map_err(resp_err_kverr)?;
let root_md = retry_after_target_path_chmod(
Path::new(root_dir_abs),
"subtree_stream_root_symlink_metadata",
@@ -790,7 +785,8 @@ fn collect_transfer_root_dir_listing_slice(
assignment: &FluxonFsTransferScanAssignmentWire,
deadline: Option,
) -> Result {
- let Some(mut session) = take_transfer_root_dir_listing_session(root_dir_abs, assignment)? else {
+ let Some(mut session) = take_transfer_root_dir_listing_session(root_dir_abs, assignment)?
+ else {
return Ok(TransferRootDirListingOutcome::Finished(
build_finished_empty_transfer_scan_result(assignment),
));
@@ -848,7 +844,8 @@ fn collect_transfer_root_dir_listing_slice(
};
scanned_entries = scanned_entries.saturating_add(1);
let name = ent.file_name().to_string_lossy().to_string();
- let child_relpath = normalize_child_relpath(assignment.root_relpath.as_str(), name.as_str());
+ let child_relpath =
+ normalize_child_relpath(assignment.root_relpath.as_str(), name.as_str());
if is_relpath_skipped(&assignment.skip_entries, child_relpath.as_str()) {
continue;
}
@@ -899,10 +896,12 @@ fn collect_transfer_root_dir_listing_slice(
let size = md.len().min(i64::MAX as u64) as i64;
session.root_visible_entries = true;
session.root_total_bytes = session.root_total_bytes.saturating_add(size);
- session.pending_direct_files.push(FluxonFsTransferScanFrontierEntry {
- relpath: child_relpath,
- size,
- });
+ session
+ .pending_direct_files
+ .push(FluxonFsTransferScanFrontierEntry {
+ relpath: child_relpath,
+ size,
+ });
session.pending_direct_bytes = session.pending_direct_bytes.saturating_add(size);
if should_flush_direct_batch(
assignment.batch_ready_bytes,
@@ -910,7 +909,9 @@ fn collect_transfer_root_dir_listing_slice(
session.pending_direct_files.len(),
session.pending_direct_empty_dirs.len(),
) {
- if let Some(batch) = flush_pending_root_direct_files_batch(assignment, &mut session)? {
+ if let Some(batch) =
+ flush_pending_root_direct_files_batch(assignment, &mut session)?
+ {
direct_files_only_batches.push(batch);
}
}
@@ -933,14 +934,18 @@ fn collect_transfer_root_dir_listing_slice(
session.pending_direct_files.len(),
session.pending_direct_empty_dirs.len(),
) {
- if let Some(batch) = flush_pending_root_direct_files_batch(assignment, &mut session)? {
+ if let Some(batch) =
+ flush_pending_root_direct_files_batch(assignment, &mut session)?
+ {
direct_files_only_batches.push(batch);
}
}
} else {
- session.direct_dirs.push(FluxonFsTransferScanFrontierDirEntry {
- relpath: child_relpath,
- });
+ session
+ .direct_dirs
+ .push(FluxonFsTransferScanFrontierDirEntry {
+ relpath: child_relpath,
+ });
}
}
}
@@ -1244,12 +1249,14 @@ impl TransferWorkerProgressWindow {
fn record_written_bytes_and_maybe_ramp(&self, bytes: i64, now_unix_ms: i64) {
let normalized = bytes.max(0);
self.window_bytes.fetch_add(normalized, Ordering::SeqCst);
- self.total_written_bytes.fetch_add(normalized, Ordering::SeqCst);
+ self.total_written_bytes
+ .fetch_add(normalized, Ordering::SeqCst);
self.maybe_ramp(now_unix_ms);
}
fn record_materialized_empty_dir(&self) {
- self.total_materialized_empty_dirs.fetch_add(1, Ordering::SeqCst);
+ self.total_materialized_empty_dirs
+ .fetch_add(1, Ordering::SeqCst);
}
fn total_materialized_empty_dirs(&self) -> i64 {
@@ -1298,8 +1305,9 @@ impl TransferWorkerProgressWindow {
}
if previous_goodput > 0 {
let delta = current_goodput.saturating_sub(previous_goodput);
- let improvement_percent =
- delta.saturating_mul(100).saturating_div(previous_goodput.max(1));
+ let improvement_percent = delta
+ .saturating_mul(100)
+ .saturating_div(previous_goodput.max(1));
if improvement_percent < self.policy.min_improvement_percent {
return;
}
@@ -1335,10 +1343,8 @@ impl TransferWorkerProgressWindow {
.saturating_mul(1000)
.saturating_div(window_elapsed_ms.max(1))
};
- self.peak_sample_goodput_bytes_per_sec.fetch_max(
- window_goodput_bytes_per_sec.max(0),
- Ordering::SeqCst,
- );
+ self.peak_sample_goodput_bytes_per_sec
+ .fetch_max(window_goodput_bytes_per_sec.max(0), Ordering::SeqCst);
Some(TransferWorkerThroughputSample {
window_started_unix_ms,
window_elapsed_ms,
@@ -1448,7 +1454,8 @@ impl TransferReadStreamActorOwned {
data: Vec::new(),
});
}
- self.fill_prefetch_queue().map_err(|err| self.cache_terminal_error(err))?;
+ self.fill_prefetch_queue()
+ .map_err(|err| self.cache_terminal_error(err))?;
let to_take = std::cmp::min(length as usize, (self.file_size - next_offset) as usize);
let buf = self
.take_prefetched_bytes(to_take)
@@ -1457,7 +1464,8 @@ impl TransferReadStreamActorOwned {
self.replay_offset = next_offset;
self.replay_data = buf.clone();
self.next_offset = next_offset.saturating_add(buf.len() as i64);
- self.fill_prefetch_queue().map_err(|err| self.cache_terminal_error(err))?;
+ self.fill_prefetch_queue()
+ .map_err(|err| self.cache_terminal_error(err))?;
Ok(FluxonFsTransferReadStreamNextResultWire {
stream_missing: false,
data: buf,
@@ -1598,7 +1606,11 @@ impl TransferReadStreamActorHandle {
struct TransferWorkerCoordinator<ReadChunkFn, CheckpointFn>
where
- ReadChunkFn: Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError>,
{
log_context: TransferWorkerLogContext,
@@ -1611,7 +1623,11 @@ where
impl<ReadChunkFn, CheckpointFn> TransferWorkerCoordinator<ReadChunkFn, CheckpointFn>
where
- ReadChunkFn: Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError>,
{
fn new(
@@ -1693,7 +1709,8 @@ where
}
fn progress_snapshot(&self) -> TransferWorkerProgressSnapshot {
- self.progress.snapshot(chrono::Utc::now().timestamp_millis())
+ self.progress
+ .snapshot(chrono::Utc::now().timestamp_millis())
}
fn stop(&self) {
@@ -1737,7 +1754,8 @@ impl TransferReadStreamRegistryHandle {
});
}
}
- let full_path = safe_join_root(root_dir_abs, open.relpath.as_str()).map_err(resp_err_kverr)?;
+ let full_path =
+ safe_join_root(root_dir_abs, open.relpath.as_str()).map_err(resp_err_kverr)?;
let file = open_file_with_target_path_chmod_retry(&full_path, "open_stream")?;
let md = file.metadata().map_err(resp_err_io)?;
let file_size = md.len().min(i64::MAX as u64) as i64;
@@ -1764,12 +1782,15 @@ impl TransferReadStreamRegistryHandle {
});
}
state.streams.insert(stream_id.clone(), entry);
- state.dedup_by_worker_file.insert(dedup_key, stream_id.clone());
+ state
+ .dedup_by_worker_file
+ .insert(dedup_key, stream_id.clone());
drop(state);
if let Err(resp) = TransferReadStreamActorHandle::start(stream_id.as_str(), actor) {
let mut state = self.state.lock();
state.streams.remove(stream_id.as_str());
- state.dedup_by_worker_file
+ state
+ .dedup_by_worker_file
.retain(|_, existing_stream_id| existing_stream_id != &stream_id);
return Err(resp);
}
@@ -1811,7 +1832,9 @@ impl TransferReadStreamRegistryHandle {
let Some(entry) = state.streams.remove(stream_id) else {
return;
};
- state.dedup_by_worker_file.retain(|_, existing_stream_id| existing_stream_id != stream_id);
+ state
+ .dedup_by_worker_file
+ .retain(|_, existing_stream_id| existing_stream_id != stream_id);
entry.close();
}
}
@@ -1894,20 +1917,22 @@ fn decode_transfer_stream_open_result_payload(
return Err(TransferWorkerRpcFailure::Fatal(resp.clone()));
}
Ok(FluxonFsTransferReadStreamOpenResultWire {
- stream_id: require_str(resp, "stream_id").map_err(resp_err_kverr).map_err(
- |err| {
+ stream_id: require_str(resp, "stream_id")
+ .map_err(resp_err_kverr)
+ .map_err(|err| {
invalid_transfer_rpc_response(format!(
"transfer read stream open response missing stream_id: err={}",
transfer_rpc_response_err_text(&err)
))
- },
- )?,
- size: require_i64(resp, "size").map_err(resp_err_kverr).map_err(|err| {
- invalid_transfer_rpc_response(format!(
- "transfer read stream open response missing size: err={}",
- transfer_rpc_response_err_text(&err)
- ))
- })?,
+ })?,
+ size: require_i64(resp, "size")
+ .map_err(resp_err_kverr)
+ .map_err(|err| {
+ invalid_transfer_rpc_response(format!(
+ "transfer read stream open response missing size: err={}",
+ transfer_rpc_response_err_text(&err)
+ ))
+ })?,
})
}
@@ -1980,11 +2005,15 @@ fn is_relpath_skipped(skip_entries: &[FluxonFsTransferSkipEntryWire], relpath: &
}
fn file_name_from_relpath(relpath: &str) -> Result<&str, FlatDict> {
- relpath.rsplit('/').next().filter(|v| !v.is_empty()).ok_or_else(|| {
- resp_err_kverr(KvError::Api(ApiError::InvalidArgument {
- detail: format!("relpath must contain file name: {}", relpath),
- }))
- })
+ relpath
+ .rsplit('/')
+ .next()
+ .filter(|v| !v.is_empty())
+ .ok_or_else(|| {
+ resp_err_kverr(KvError::Api(ApiError::InvalidArgument {
+ detail: format!("relpath must contain file name: {}", relpath),
+ }))
+ })
}
fn transfer_staging_dir_for_file(staging_prefix: &str, relpath: &str) -> String {
@@ -2106,8 +2135,12 @@ where
match attempt() {
Ok(value) => Ok(value),
Err(initial_err) if initial_err.kind() == ErrorKind::PermissionDenied => {
- let repair_dir =
- repair_permission_denied_dir_for_retry(repair_anchor, op, target_path, &initial_err)?;
+ let repair_dir = repair_permission_denied_dir_for_retry(
+ repair_anchor,
+ op,
+ target_path,
+ &initial_err,
+ )?;
attempt().map_err(|retry_err| {
resp_err_kverr(KvError::Api(ApiError::Unknown {
detail: format!(
@@ -2334,10 +2367,11 @@ fn collect_transfer_tree_with_deadline(
continue;
}
};
- out.symlink_notices.push(FluxonFsTransferSymlinkNoticeEntryWire {
- relpath: child_rel,
- link_target: link_target.to_string_lossy().to_string(),
- });
+ out.symlink_notices
+ .push(FluxonFsTransferSymlinkNoticeEntryWire {
+ relpath: child_rel,
+ link_target: link_target.to_string_lossy().to_string(),
+ });
continue;
}
if md.is_dir() {
@@ -2361,7 +2395,8 @@ fn collect_transfer_tree_with_deadline(
}
out.files.sort_by(|a, b| a.relpath.cmp(&b.relpath));
out.empty_dirs.sort();
- out.symlink_notices.sort_by(|a, b| a.relpath.cmp(&b.relpath));
+ out.symlink_notices
+ .sort_by(|a, b| a.relpath.cmp(&b.relpath));
Ok(out)
}
@@ -2574,10 +2609,8 @@ fn build_transfer_scan_events_for_result(
event_seq_no_start: i64,
result: FluxonFsTransferScanResultWire,
) -> (Vec, bool, i64) {
- let (child_scan_units, continue_locally) = split_same_root_continuation_from_child_scan_units(
- assignment,
- result.child_scan_units,
- );
+ let (child_scan_units, continue_locally) =
+ split_same_root_continuation_from_child_scan_units(assignment, result.child_scan_units);
if continue_locally {
let event = build_transfer_scan_event(
assignment,
@@ -2588,11 +2621,7 @@ fn build_transfer_scan_events_for_result(
result.full_dir_batches,
String::new(),
);
- return (
- vec![event],
- true,
- event_seq_no_start.saturating_add(1),
- );
+ return (vec![event], true, event_seq_no_start.saturating_add(1));
}
let mut next_event_seq_no = event_seq_no_start;
let mut events = Vec::new();
@@ -2620,11 +2649,7 @@ fn build_transfer_scan_events_for_result(
Vec::new(),
String::new(),
));
- (
- events,
- false,
- next_event_seq_no.saturating_add(1),
- )
+ (events, false, next_event_seq_no.saturating_add(1))
}
fn send_transfer_scan_event_once(
@@ -2637,10 +2662,7 @@ fn send_transfer_scan_event_once(
}
let event_json = serde_json::to_string(event)
.map_err(|e| format!("serialize transfer scan event failed: {}", e))?;
- let payload = FlatDict::from([(
- "scan_event_json".to_string(),
- FlatValue::String(event_json),
- )]);
+ let payload = FlatDict::from([("scan_event_json".to_string(), FlatValue::String(event_json))]);
let resp = api
.rpc_client()
.call(
@@ -2793,30 +2815,28 @@ fn run_transfer_scan_background_task(
}
let mut next_event_seq_no = 1_i64;
loop {
- let result = match build_transfer_scan_result_for_root_dir_abs(
- root_dir_abs.as_str(),
- &assignment,
- ) {
- Ok(v) => v,
- Err(resp) => {
- let failed = build_transfer_scan_event(
- &assignment,
- next_event_seq_no,
- FluxonFsTransferScanEventKindWire::Failed,
- Vec::new(),
- Vec::new(),
- Vec::new(),
- transfer_rpc_response_err_text(&resp),
- );
- let _ = send_transfer_scan_event_with_retry(
- api.as_ref(),
- master_id.as_str(),
- &mut assignment,
- &failed,
- );
- break;
- }
- };
+ let result =
+ match build_transfer_scan_result_for_root_dir_abs(root_dir_abs.as_str(), &assignment) {
+ Ok(v) => v,
+ Err(resp) => {
+ let failed = build_transfer_scan_event(
+ &assignment,
+ next_event_seq_no,
+ FluxonFsTransferScanEventKindWire::Failed,
+ Vec::new(),
+ Vec::new(),
+ Vec::new(),
+ transfer_rpc_response_err_text(&resp),
+ );
+ let _ = send_transfer_scan_event_with_retry(
+ api.as_ref(),
+ master_id.as_str(),
+ &mut assignment,
+ &failed,
+ );
+ break;
+ }
+ };
let (events, continue_locally, next_seq_no_after_events) =
build_transfer_scan_events_for_result(&assignment, next_event_seq_no, result);
next_event_seq_no = next_seq_no_after_events;
@@ -2922,17 +2942,14 @@ impl TransferScanRegistryHandle {
let assignment2 = assignment.clone();
let thread_name = format!("fluxon_fs_transfer_scan_{}", assignment.scan_task_id);
match thread::Builder::new().name(thread_name).spawn(move || {
- run_transfer_scan_background_task(
- registry,
- api2,
- master_id2,
- exports2,
- assignment2,
- );
+ run_transfer_scan_background_task(registry, api2, master_id2, exports2, assignment2);
}) {
Ok(_) => Ok(FluxonFsTransferScanLaunchResultWire::started()),
Err(err) => {
- self.state.lock().tasks.remove(assignment.scan_task_id.as_str());
+ self.state
+ .lock()
+ .tasks
+ .remove(assignment.scan_task_id.as_str());
Err(resp_err_kverr(KvError::Api(ApiError::Unknown {
detail: format!(
"spawn transfer scan thread failed: scan_task_id={} err={}",
@@ -3006,22 +3023,16 @@ impl TransferWorkerRegistryHandle {
let master_id2 = master_id.to_string();
let exports2 = exports.clone();
let assignment2 = assignment.clone();
- let thread_name = format!(
- "fluxon_fs_transfer_worker_{}",
- assignment.worker_task_id
- );
+ let thread_name = format!("fluxon_fs_transfer_worker_{}", assignment.worker_task_id);
match thread::Builder::new().name(thread_name).spawn(move || {
- run_transfer_worker_background_task(
- registry,
- api2,
- master_id2,
- exports2,
- assignment2,
- );
+ run_transfer_worker_background_task(registry, api2, master_id2, exports2, assignment2);
}) {
Ok(_) => Ok(FluxonFsTransferWorkerLaunchResultWire::started()),
Err(err) => {
- self.state.lock().tasks.remove(assignment.worker_task_id.as_str());
+ self.state
+ .lock()
+ .tasks
+ .remove(assignment.worker_task_id.as_str());
Err(resp_err_kverr(KvError::Api(ApiError::Unknown {
detail: format!(
"spawn transfer worker thread failed: worker_task_id={} err={}",
@@ -3298,10 +3309,7 @@ fn send_transfer_worker_result_once(
detail: format!("serialize transfer worker result failed: {}", e),
})))
})?;
- let payload = FlatDict::from([(
- "result_json".to_string(),
- FlatValue::String(result_json),
- )]);
+ let payload = FlatDict::from([("result_json".to_string(), FlatValue::String(result_json))]);
let resp = api
.rpc_client()
.call(
@@ -3396,7 +3404,10 @@ fn open_transfer_read_stream_via_rpc_once(
"relpath".to_string(),
FlatValue::String(file.relpath.clone()),
),
- ("initial_offset".to_string(), FlatValue::Int64(initial_offset)),
+ (
+ "initial_offset".to_string(),
+ FlatValue::Int64(initial_offset),
+ ),
]);
let resp = api
.rpc_client()
@@ -3518,25 +3529,25 @@ impl TransferWorkerRemoteControl {
loop {
self.before_heartbeat_retry_attempt()?;
let current_materialized_empty_dirs = self.progress.total_materialized_empty_dirs();
- match self
- .heartbeat
- .ensure_continue(
- force,
- current_materialized_empty_dirs,
- |heartbeat_unix_ms, _heartbeat_detail| {
- let progress_snapshot =
- self.progress.snapshot(chrono::Utc::now().timestamp_millis());
- let telemetry =
- Some(transfer_worker_telemetry_from_progress_snapshot(&progress_snapshot));
- send_transfer_worker_heartbeat_once(
- self.api.as_ref(),
- self.master_id.as_str(),
- &self.assignment,
- heartbeat_unix_ms,
- telemetry,
- )
- },
- ) {
+ match self.heartbeat.ensure_continue(
+ force,
+ current_materialized_empty_dirs,
+ |heartbeat_unix_ms, _heartbeat_detail| {
+ let progress_snapshot = self
+ .progress
+ .snapshot(chrono::Utc::now().timestamp_millis());
+ let telemetry = Some(transfer_worker_telemetry_from_progress_snapshot(
+ &progress_snapshot,
+ ));
+ send_transfer_worker_heartbeat_once(
+ self.api.as_ref(),
+ self.master_id.as_str(),
+ &self.assignment,
+ heartbeat_unix_ms,
+ telemetry,
+ )
+ },
+ ) {
Ok(()) => return Ok(()),
Err(TransferWorkerHeartbeatGateError::Terminal(err)) => return Err(err),
Err(TransferWorkerHeartbeatGateError::Retryable {
@@ -3631,10 +3642,7 @@ impl TransferWorkerRemoteControl {
)
}
- fn close_stream_with_retry(
- &self,
- stream_id: &str,
- ) -> Result<(), TransferWorkerExecutionError> {
+ fn close_stream_with_retry(&self, stream_id: &str) -> Result<(), TransferWorkerExecutionError> {
let api = self.api.clone();
let assignment = self.assignment.clone();
let op_detail = format!(
@@ -3744,9 +3752,9 @@ impl TransferWorkerRemoteControl {
if ack.accepted {
return Ok(());
}
- Err(TransferWorkerExecutionError::Stop(stop_reason_or_superseded(
- ack.stop_reason,
- )))
+ Err(TransferWorkerExecutionError::Stop(
+ stop_reason_or_superseded(ack.stop_reason),
+ ))
}
}
@@ -3825,10 +3833,12 @@ impl TransferWorkerHeartbeatGate {
mut heartbeat_op: HeartbeatOp,
) -> Result<(), TransferWorkerHeartbeatGateError>
where
- HeartbeatOp: FnMut(
- i64,
- &'static str,
- ) -> Result,
+ HeartbeatOp:
+ FnMut(
+ i64,
+ &'static str,
+ )
+ -> Result,
{
loop {
let (heartbeat_unix_ms, heartbeat_detail) = {
@@ -3875,15 +3885,13 @@ impl TransferWorkerHeartbeatGate {
state.heartbeat_inflight = false;
let result = match heartbeat_result {
Ok(heartbeat_result) if heartbeat_result.continue_running => {
- state.last_heartbeat_completed_unix_ms =
- chrono::Utc::now().timestamp_millis();
+ state.last_heartbeat_completed_unix_ms = chrono::Utc::now().timestamp_millis();
state.last_heartbeat_materialized_empty_dirs = current_materialized_empty_dirs;
state.granted_lease_expire_unix_ms = heartbeat_result.lease_expire_unix_ms;
Ok(())
}
Ok(heartbeat_result) => {
- state.last_heartbeat_completed_unix_ms =
- chrono::Utc::now().timestamp_millis();
+ state.last_heartbeat_completed_unix_ms = chrono::Utc::now().timestamp_millis();
state.last_heartbeat_materialized_empty_dirs = current_materialized_empty_dirs;
let reason = stop_reason_or_superseded(heartbeat_result.stop_reason);
state.terminal_state =
@@ -3933,7 +3941,8 @@ fn run_transfer_worker_background_task(
));
let dedup_expire_unix_ms = match control.ensure_continue(true) {
Ok(()) => {
- let dst_export_root = match exports.export_root_dir_abs(assignment.dst_export.as_str()) {
+ let dst_export_root = match exports.export_root_dir_abs(assignment.dst_export.as_str())
+ {
Ok(v) => v,
Err(err) => {
tracing::warn!(
@@ -3996,7 +4005,9 @@ fn run_transfer_worker_background_task(
},
{
let control = control.clone();
- move |file, read_offset, length| control.read_chunk_with_retry(file, read_offset, length)
+ move |file, read_offset, length| {
+ control.read_chunk_with_retry(file, read_offset, length)
+ }
},
) {
Ok(result) => {
@@ -4004,34 +4015,38 @@ fn run_transfer_worker_background_task(
if let Err(resp) =
cleanup_transfer_worker_attempt_artifacts(&dst_root, &assignment)
{
- log_transfer_worker_cleanup_failure("before_result_submit", &assignment, &resp);
- }
- match control.submit_result_with_retry(&result) {
- Ok(()) => control.dedup_expire_unix_ms(),
- Err(TransferWorkerExecutionError::Stop(reason)) => {
- tracing::info!(
- "transfer worker result submission stopped: job_id={} batch_id={} worker_id={} worker_task_id={} reason={:?}",
- assignment.job_id,
- assignment.batch_id,
- assignment.worker_id,
- assignment.worker_task_id,
- reason
+ log_transfer_worker_cleanup_failure(
+ "before_result_submit",
+ &assignment,
+ &resp,
);
- control.dedup_expire_unix_ms()
}
- Err(TransferWorkerExecutionError::Fatal(resp)) => {
- tracing::warn!(
- "transfer worker result submission failed: job_id={} batch_id={} worker_id={} worker_task_id={} resp={:?}",
- assignment.job_id,
- assignment.batch_id,
- assignment.worker_id,
- assignment.worker_task_id,
- resp
- );
- control.dedup_expire_unix_ms()
+ match control.submit_result_with_retry(&result) {
+ Ok(()) => control.dedup_expire_unix_ms(),
+ Err(TransferWorkerExecutionError::Stop(reason)) => {
+ tracing::info!(
+ "transfer worker result submission stopped: job_id={} batch_id={} worker_id={} worker_task_id={} reason={:?}",
+ assignment.job_id,
+ assignment.batch_id,
+ assignment.worker_id,
+ assignment.worker_task_id,
+ reason
+ );
+ control.dedup_expire_unix_ms()
+ }
+ Err(TransferWorkerExecutionError::Fatal(resp)) => {
+ tracing::warn!(
+ "transfer worker result submission failed: job_id={} batch_id={} worker_id={} worker_task_id={} resp={:?}",
+ assignment.job_id,
+ assignment.batch_id,
+ assignment.worker_id,
+ assignment.worker_task_id,
+ resp
+ );
+ control.dedup_expire_unix_ms()
+ }
}
}
- }
Err(TransferWorkerExecutionError::Stop(reason)) => {
control.close_all_streams();
if let Err(resp) =
@@ -4054,10 +4069,13 @@ fn run_transfer_worker_background_task(
if let Err(cleanup_resp) =
cleanup_transfer_worker_attempt_artifacts(&dst_root, &assignment)
{
- log_transfer_worker_cleanup_failure("after_fatal", &assignment, &cleanup_resp);
+ log_transfer_worker_cleanup_failure(
+ "after_fatal",
+ &assignment,
+ &cleanup_resp,
+ );
}
- if let Some((fatal_kind, fatal_message)) =
- classify_transfer_worker_fatal(&resp)
+ if let Some((fatal_kind, fatal_message)) = classify_transfer_worker_fatal(&resp)
{
match report_transfer_worker_fatal_once(
control.api.as_ref(),
@@ -4107,15 +4125,25 @@ fn run_transfer_worker_background_task(
}
}
Err(TransferWorkerExecutionError::Stop(reason)) => {
- let dst_export_root = exports.export_root_dir_abs(assignment.dst_export.as_str()).ok();
+ let dst_export_root = exports
+ .export_root_dir_abs(assignment.dst_export.as_str())
+ .ok();
let dst_root = dst_export_root.and_then(|dst_export_root| {
- safe_join_root(dst_export_root.as_str(), assignment.dst_root_relpath.as_str())
- .ok()
- .map(PathBuf::from)
+ safe_join_root(
+ dst_export_root.as_str(),
+ assignment.dst_root_relpath.as_str(),
+ )
+ .ok()
+ .map(PathBuf::from)
});
if let Some(dst_root) = dst_root {
- if let Err(resp) = cleanup_transfer_worker_attempt_artifacts(&dst_root, &assignment) {
- log_transfer_worker_cleanup_failure("before_execution_stop", &assignment, &resp);
+ if let Err(resp) = cleanup_transfer_worker_attempt_artifacts(&dst_root, &assignment)
+ {
+ log_transfer_worker_cleanup_failure(
+ "before_execution_stop",
+ &assignment,
+ &resp,
+ );
}
}
tracing::info!(
@@ -4129,11 +4157,16 @@ fn run_transfer_worker_background_task(
control.dedup_expire_unix_ms()
}
Err(TransferWorkerExecutionError::Fatal(resp)) => {
- let dst_export_root = exports.export_root_dir_abs(assignment.dst_export.as_str()).ok();
+ let dst_export_root = exports
+ .export_root_dir_abs(assignment.dst_export.as_str())
+ .ok();
let dst_root = dst_export_root.and_then(|dst_export_root| {
- safe_join_root(dst_export_root.as_str(), assignment.dst_root_relpath.as_str())
- .ok()
- .map(PathBuf::from)
+ safe_join_root(
+ dst_export_root.as_str(),
+ assignment.dst_root_relpath.as_str(),
+ )
+ .ok()
+ .map(PathBuf::from)
});
if let Some(dst_root) = dst_root {
if let Err(cleanup_resp) =
@@ -4509,7 +4542,9 @@ fn plan_transfer_subtree_batches(
total_bytes: 0,
root_is_empty: true,
mergeable_empty_dir_count: 1,
- mergeable_empty_dir_estimated_bytes: estimate_empty_dir_manifest_entry_bytes(root_relpath),
+ mergeable_empty_dir_estimated_bytes: estimate_empty_dir_manifest_entry_bytes(
+ root_relpath,
+ ),
direct_files_only_batches: Vec::new(),
full_dir_batches: Vec::new(),
child_scan_units: Vec::new(),
@@ -4540,8 +4575,7 @@ fn plan_transfer_subtree_batches(
let child_empty_dir_count = child_plan.mergeable_empty_dir_count;
let child_empty_dir_estimated_bytes =
child_plan.mergeable_empty_dir_estimated_bytes;
- if mergeable_empty_dir_count
- .saturating_add(child_empty_dir_count)
+ if mergeable_empty_dir_count.saturating_add(child_empty_dir_count)
> TRANSFER_MERGEABLE_EMPTY_DIR_BUDGET
|| mergeable_empty_dir_estimated_bytes
.saturating_add(child_empty_dir_estimated_bytes)
@@ -4716,7 +4750,11 @@ fn build_root_direct_files_only_batch_from_entries(
}
fn sort_transfer_scan_batches(batches: &mut [FluxonFsTransferScanBatchWire]) {
- batches.sort_by(|a, b| a.root_relpath.cmp(&b.root_relpath).then(a.batch_id.cmp(&b.batch_id)));
+ batches.sort_by(|a, b| {
+ a.root_relpath
+ .cmp(&b.root_relpath)
+ .then(a.batch_id.cmp(&b.batch_id))
+ });
}
fn build_full_dir_batch_for_mergeable_subtree(
@@ -4750,14 +4788,12 @@ fn build_transfer_scan_result_for_subtree_streaming_root_dir_abs(
root_dir_abs: &str,
assignment: &FluxonFsTransferScanAssignmentWire,
) -> Result {
- let Some(mut session) = take_transfer_subtree_streaming_session(root_dir_abs, assignment)? else {
+ let Some(mut session) = take_transfer_subtree_streaming_session(root_dir_abs, assignment)?
+ else {
return Ok(build_finished_empty_subtree_stream_result(assignment));
};
loop {
- if session
- .dir_stack
- .is_empty()
- {
+ if session.dir_stack.is_empty() {
let mut full_dir_batches = Vec::new();
if let Some(batch) = flush_pending_subtree_stream_batch(assignment, &mut session)? {
full_dir_batches.push(batch);
@@ -4776,7 +4812,9 @@ fn build_transfer_scan_result_for_subtree_streaming_root_dir_abs(
finished: true,
});
}
- if TransferScanDeadline::from_assignment(assignment).is_some_and(|deadline| deadline.reached()) {
+ if TransferScanDeadline::from_assignment(assignment)
+ .is_some_and(|deadline| deadline.reached())
+ {
let mut full_dir_batches = Vec::new();
if let Some(batch) = flush_pending_subtree_stream_batch(assignment, &mut session)? {
full_dir_batches.push(batch);
@@ -4808,7 +4846,10 @@ fn build_transfer_scan_result_for_subtree_streaming_root_dir_abs(
if should_flush_subtree_stream_batch(
assignment.batch_ready_bytes,
session.pending_bytes,
- session.pending_files.len().saturating_add(session.pending_symlink_notices.len()),
+ session
+ .pending_files
+ .len()
+ .saturating_add(session.pending_symlink_notices.len()),
session.pending_empty_dirs.len(),
) {
let batch = flush_pending_subtree_stream_batch(assignment, &mut session)?.unwrap();
@@ -4865,22 +4906,30 @@ fn build_transfer_scan_result_for_subtree_streaming_root_dir_abs(
});
} else if md.is_dir() {
frame.saw_visible_child = true;
- session.dir_stack.push(open_transfer_subtree_streaming_dir_frame(
- child_path,
- child_relpath,
- )?);
+ session
+ .dir_stack
+ .push(open_transfer_subtree_streaming_dir_frame(
+ child_path,
+ child_relpath,
+ )?);
} else if md.is_file() {
frame.saw_visible_child = true;
let size = md.len().min(i64::MAX as u64) as i64;
session.pending_bytes = session.pending_bytes.saturating_add(size);
session
.pending_files
- .push(FluxonFsTransferScanFrontierEntry { relpath: child_relpath, size });
+ .push(FluxonFsTransferScanFrontierEntry {
+ relpath: child_relpath,
+ size,
+ });
}
if should_flush_subtree_stream_batch(
assignment.batch_ready_bytes,
session.pending_bytes,
- session.pending_files.len().saturating_add(session.pending_symlink_notices.len()),
+ session
+ .pending_files
+ .len()
+ .saturating_add(session.pending_symlink_notices.len()),
session.pending_empty_dirs.len(),
) {
let batch = flush_pending_subtree_stream_batch(assignment, &mut session)?.unwrap();
@@ -4928,11 +4977,12 @@ pub(crate) fn build_transfer_scan_result_for_root_dir_abs(
);
}
let deadline = TransferScanDeadline::from_assignment(assignment);
- let root_listing = match collect_transfer_root_dir_listing_slice(root_dir_abs, assignment, deadline)? {
- TransferRootDirListingOutcome::Complete(v) => v,
- TransferRootDirListingOutcome::Finished(result) => return Ok(result),
- TransferRootDirListingOutcome::Partial(result) => return Ok(result),
- };
+ let root_listing =
+ match collect_transfer_root_dir_listing_slice(root_dir_abs, assignment, deadline)? {
+ TransferRootDirListingOutcome::Complete(v) => v,
+ TransferRootDirListingOutcome::Finished(result) => return Ok(result),
+ TransferRootDirListingOutcome::Partial(result) => return Ok(result),
+ };
let mut direct_files = root_listing.direct_files;
let mut direct_symlink_notices = root_listing.direct_symlink_notices;
let mut direct_empty_dirs = root_listing.direct_empty_dirs;
@@ -4976,7 +5026,10 @@ pub(crate) fn build_transfer_scan_result_for_root_dir_abs(
if (!direct_files.is_empty()
|| !direct_symlink_notices.is_empty()
|| !direct_empty_dirs.is_empty())
- && !direct_files_only_disposition_covers_root(assignment, assignment.root_relpath.as_str())
+ && !direct_files_only_disposition_covers_root(
+ assignment,
+ assignment.root_relpath.as_str(),
+ )
{
let mut next_partition_index = root_listing.emitted_direct_files_batch_count;
direct_files_only_batches.extend(build_partitioned_root_direct_files_only_batches(
@@ -4987,17 +5040,15 @@ pub(crate) fn build_transfer_scan_result_for_root_dir_abs(
direct_empty_dirs.clone(),
)?);
}
- child_scan_units.extend(
- direct_dirs[delegated_child_scan_unit_count..]
- .iter()
- .map(|entry| {
- new_child_scan_unit(
- entry.relpath.clone(),
- assignment.generation + 1,
- delegated_child_scan_mode(),
- )
- }),
- );
+ child_scan_units.extend(direct_dirs[delegated_child_scan_unit_count..].iter().map(
+ |entry| {
+ new_child_scan_unit(
+ entry.relpath.clone(),
+ assignment.generation + 1,
+ delegated_child_scan_mode(),
+ )
+ },
+ ));
child_scan_units.sort_by(|a, b| a.root_relpath.cmp(&b.root_relpath));
sort_transfer_scan_batches(&mut direct_files_only_batches);
return Ok(FluxonFsTransferScanResultWire {
@@ -5027,7 +5078,8 @@ pub(crate) fn build_transfer_scan_result_for_root_dir_abs(
let mut root_partitioned = root_listing.emitted_direct_files_batch_count > 0
|| direct_files_only_disposition_covers_root(assignment, assignment.root_relpath.as_str());
let mut mergeable_empty_dir_count = direct_empty_dirs.len();
- let mut mergeable_empty_dir_estimated_bytes = estimate_empty_dir_manifest_bytes(&direct_empty_dirs);
+ let mut mergeable_empty_dir_estimated_bytes =
+ estimate_empty_dir_manifest_bytes(&direct_empty_dirs);
for child_relpath in direct_dirs.iter().map(|entry| entry.relpath.clone()) {
let child_plan = plan_transfer_subtree_batches(
root_dir_abs,
@@ -5043,8 +5095,7 @@ pub(crate) fn build_transfer_scan_result_for_root_dir_abs(
let child_empty_dir_count = child_plan.mergeable_empty_dir_count;
let child_empty_dir_estimated_bytes =
child_plan.mergeable_empty_dir_estimated_bytes;
- if mergeable_empty_dir_count
- .saturating_add(child_empty_dir_count)
+ if mergeable_empty_dir_count.saturating_add(child_empty_dir_count)
> TRANSFER_MERGEABLE_EMPTY_DIR_BUDGET
|| mergeable_empty_dir_estimated_bytes
.saturating_add(child_empty_dir_estimated_bytes)
@@ -5083,7 +5134,10 @@ pub(crate) fn build_transfer_scan_result_for_root_dir_abs(
if (!direct_files.is_empty()
|| !direct_symlink_notices.is_empty()
|| !mergeable_empty_child_relpaths.is_empty())
- && !direct_files_only_disposition_covers_root(assignment, assignment.root_relpath.as_str())
+ && !direct_files_only_disposition_covers_root(
+ assignment,
+ assignment.root_relpath.as_str(),
+ )
{
let mut next_partition_index = root_listing.emitted_direct_files_batch_count;
direct_empty_dirs.extend(mergeable_empty_child_relpaths);
@@ -5212,10 +5266,11 @@ fn handle_transfer_scan_assignment(
assignment.generation,
assignment.known_dispositions.len(),
);
- let result = match build_transfer_scan_result_for_root_dir_abs(root_dir_abs.as_str(), &assignment) {
- Ok(v) => v,
- Err(resp) => return resp,
- };
+ let result =
+ match build_transfer_scan_result_for_root_dir_abs(root_dir_abs.as_str(), &assignment) {
+ Ok(v) => v,
+ Err(resp) => return resp,
+ };
encode_transfer_scan_result(&result, "transfer scan result")
}
@@ -5228,8 +5283,11 @@ fn prepare_transfer_file_streaming(
coordinator: &TransferWorkerCoordinator,
) -> Result
where
- ReadChunkFn:
- Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError>,
{
let staging_relpath = transfer_staging_file_relpath(staging_prefix, file.relpath.as_str())
@@ -5239,9 +5297,12 @@ where
.map_err(TransferWorkerExecutionError::fatal)?;
ensure_transfer_parent_dirs(dst_root, final_relpath.as_str())
.map_err(TransferWorkerExecutionError::fatal)?;
- let staging_abs = safe_join_root(dst_root.to_string_lossy().as_ref(), staging_relpath.as_str())
- .map_err(resp_err_kverr)
- .map_err(TransferWorkerExecutionError::fatal)?;
+ let staging_abs = safe_join_root(
+ dst_root.to_string_lossy().as_ref(),
+ staging_relpath.as_str(),
+ )
+ .map_err(resp_err_kverr)
+ .map_err(TransferWorkerExecutionError::fatal)?;
let mut dst_file = open_create_file_with_parent_dir_chmod_retry(&staging_abs)
.map_err(TransferWorkerExecutionError::fatal)?;
dst_file
@@ -5254,14 +5315,14 @@ where
let remaining = file.size.saturating_sub(copied);
let chunk = coordinator.read_chunk(file, copied, remaining.min(CHUNK_BYTES as i64))?;
if chunk.is_empty() {
- return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(KvError::Api(
- ApiError::InvalidArgument {
+ return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(
+ KvError::Api(ApiError::InvalidArgument {
detail: format!(
"transfer worker source ended before expected size: relpath={} expected={} copied={}",
file.relpath, file.size, copied
),
- },
- ))));
+ }),
+ )));
}
dst_file
.write_all(&chunk)
@@ -5271,13 +5332,14 @@ where
coordinator.record_written_bytes(chunk.len() as i64);
}
if copied != file.size {
- return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(KvError::Api(
- ApiError::InvalidArgument {
- detail: format!(
- "transfer worker size mismatch before staging completion: relpath={} expected={} actual={}",
- file.relpath, file.size, copied
- ),
- }))));
+ return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(
+ KvError::Api(ApiError::InvalidArgument {
+ detail: format!(
+ "transfer worker size mismatch before staging completion: relpath={} expected={} actual={}",
+ file.relpath, file.size, copied
+ ),
+ }),
+ )));
}
// The staged file is still invisible at this point, so one more checkpoint
// keeps supersession able to stop the worker before any later visible
@@ -5302,48 +5364,57 @@ fn execute_transfer_single_file(
coordinator: &TransferWorkerCoordinator,
) -> Result
where
- ReadChunkFn:
- Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError>,
{
coordinator.checkpoint_continue()?;
- let prepared = match prepare_transfer_file_streaming(dst_root, staging_prefix, file, coordinator) {
- Ok(v) => v,
- Err(TransferWorkerExecutionError::Fatal(resp)) => {
- if let Some(failed) = classify_transfer_failed_file(file, &resp) {
- let staging_relpath =
- transfer_staging_file_relpath(staging_prefix, file.relpath.as_str())
- .map_err(TransferWorkerExecutionError::fatal)?;
- let staging_abs = safe_join_root(
- dst_root.to_string_lossy().as_ref(),
- staging_relpath.as_str(),
- )
- .map_err(resp_err_kverr)
- .map_err(TransferWorkerExecutionError::fatal)?;
- match fs::remove_file(&staging_abs) {
- Ok(()) => {}
- Err(err) if err.kind() == ErrorKind::NotFound => {}
- Err(err) => return Err(TransferWorkerExecutionError::fatal(resp_err_io(err))),
+ let prepared =
+ match prepare_transfer_file_streaming(dst_root, staging_prefix, file, coordinator) {
+ Ok(v) => v,
+ Err(TransferWorkerExecutionError::Fatal(resp)) => {
+ if let Some(failed) = classify_transfer_failed_file(file, &resp) {
+ let staging_relpath =
+ transfer_staging_file_relpath(staging_prefix, file.relpath.as_str())
+ .map_err(TransferWorkerExecutionError::fatal)?;
+ let staging_abs = safe_join_root(
+ dst_root.to_string_lossy().as_ref(),
+ staging_relpath.as_str(),
+ )
+ .map_err(resp_err_kverr)
+ .map_err(TransferWorkerExecutionError::fatal)?;
+ match fs::remove_file(&staging_abs) {
+ Ok(()) => {}
+ Err(err) if err.kind() == ErrorKind::NotFound => {}
+ Err(err) => {
+ return Err(TransferWorkerExecutionError::fatal(resp_err_io(err)));
+ }
+ }
+ return Ok(TransferWorkerLaneOutcome::Failed(
+ TransferWorkerLaneFailedFileResult { result: failed },
+ ));
}
- return Ok(TransferWorkerLaneOutcome::Failed(
- TransferWorkerLaneFailedFileResult { result: failed },
- ));
+ return Err(TransferWorkerExecutionError::Fatal(resp));
}
- return Err(TransferWorkerExecutionError::Fatal(resp));
- }
- Err(err) => return Err(err),
- };
+ Err(err) => return Err(err),
+ };
coordinator.checkpoint_continue()?;
- let result = promote_prepared_transfer_file(dst_root, PreparedTransferFile {
- staging_relpath: prepared.staging_relpath.clone(),
- final_relpath: prepared.final_relpath.clone(),
- visible_size: prepared.visible_size,
- })
+ let result = promote_prepared_transfer_file(
+ dst_root,
+ PreparedTransferFile {
+ staging_relpath: prepared.staging_relpath.clone(),
+ final_relpath: prepared.final_relpath.clone(),
+ visible_size: prepared.visible_size,
+ },
+ )
.map_err(TransferWorkerExecutionError::fatal);
match result {
- Ok(result) => Ok(TransferWorkerLaneOutcome::Visible(TransferWorkerLaneFileResult {
- result,
- })),
+ Ok(result) => Ok(TransferWorkerLaneOutcome::Visible(
+ TransferWorkerLaneFileResult { result },
+ )),
Err(TransferWorkerExecutionError::Fatal(resp)) => {
if let Some(failed) = classify_transfer_failed_file(file, &resp) {
let staging_abs = safe_join_root(
@@ -5374,8 +5445,11 @@ fn execute_transfer_empty_dir(
coordinator: &TransferWorkerCoordinator,
) -> Result
where
- ReadChunkFn:
- Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError>,
{
coordinator.checkpoint_continue()?;
@@ -5393,11 +5467,14 @@ fn execute_transfer_worker_assignment_with_policy(
read_chunk: ReadChunkFn,
) -> Result
where
- ReadChunkFn:
- Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>
- + Send
- + Sync
- + 'static,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>
+ + Send
+ + Sync
+ + 'static,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError> + Send + Sync + 'static,
{
let policy = policy.normalized();
@@ -5424,11 +5501,14 @@ fn execute_transfer_worker_assignment_with_policy_and_progress Result
where
- ReadChunkFn:
- Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>
- + Send
- + Sync
- + 'static,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>
+ + Send
+ + Sync
+ + 'static,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError> + Send + Sync + 'static,
{
create_dir_all_with_parent_dir_chmod_retry(dst_root)
@@ -5436,9 +5516,11 @@ where
let manifest =
FluxonFsTransferManifestWire::decode_from_blob(assignment.manifest_blob.as_slice())
.map_err(|e| {
- TransferWorkerExecutionError::fatal(resp_err_kverr(KvError::Api(ApiError::InvalidArgument {
- detail: format!("decode transfer worker manifest failed: {}", e),
- })))
+ TransferWorkerExecutionError::fatal(resp_err_kverr(KvError::Api(
+ ApiError::InvalidArgument {
+ detail: format!("decode transfer worker manifest failed: {}", e),
+ },
+ )))
})?;
if transfer_manifest_is_empty_dirs_only_batch(&manifest, assignment.collect_infos.as_slice()) {
// Empty-dir-only batches never generate byte-based ramp-up signals, so
@@ -5664,10 +5746,16 @@ fn promote_prepared_transfer_file(
dst_root: &PathBuf,
file: PreparedTransferFile,
) -> Result {
- let staging_abs = safe_join_root(dst_root.to_string_lossy().as_ref(), file.staging_relpath.as_str())
- .map_err(resp_err_kverr)?;
- let final_abs = safe_join_root(dst_root.to_string_lossy().as_ref(), file.final_relpath.as_str())
- .map_err(resp_err_kverr)?;
+ let staging_abs = safe_join_root(
+ dst_root.to_string_lossy().as_ref(),
+ file.staging_relpath.as_str(),
+ )
+ .map_err(resp_err_kverr)?;
+ let final_abs = safe_join_root(
+ dst_root.to_string_lossy().as_ref(),
+ file.final_relpath.as_str(),
+ )
+ .map_err(resp_err_kverr)?;
rename_with_dst_parent_dir_chmod_retry(&staging_abs, &final_abs)?;
Ok(FluxonFsTransferWorkerFileResultWire {
relpath: file.final_relpath.clone(),
@@ -5694,15 +5782,15 @@ fn prepare_transfer_collect_info_materialization(
),
}))
})?;
- let staging_relpath = transfer_collect_info_staging_relpath(
- batch_id,
- worker_task_id,
- collect_info.collect_kind,
- )?;
+ let staging_relpath =
+ transfer_collect_info_staging_relpath(batch_id, worker_task_id, collect_info.collect_kind)?;
ensure_transfer_parent_dirs(dst_root, staging_relpath.as_str())?;
ensure_transfer_parent_dirs(dst_root, output_relpath.as_str())?;
- let staging_abs = safe_join_root(dst_root.to_string_lossy().as_ref(), staging_relpath.as_str())
- .map_err(resp_err_kverr)?;
+ let staging_abs = safe_join_root(
+ dst_root.to_string_lossy().as_ref(),
+ staging_relpath.as_str(),
+ )
+ .map_err(resp_err_kverr)?;
let mut dst_file = open_create_file_with_parent_dir_chmod_retry(&staging_abs)?;
dst_file.set_len(0).map_err(resp_err_io)?;
dst_file
@@ -5723,14 +5811,15 @@ fn transfer_collect_info_staging_relpath(
worker_task_id: &str,
collect_kind: FluxonFsTransferCollectInfoKind,
) -> Result {
- let output_relpath = transfer_collect_info_output_relpath(batch_id, collect_kind).map_err(|detail| {
- resp_err_kverr(KvError::Api(ApiError::InvalidArgument {
- detail: format!(
- "build transfer collect info output relpath failed: batch_id={} err={}",
- batch_id, detail
- ),
- }))
- })?;
+ let output_relpath =
+ transfer_collect_info_output_relpath(batch_id, collect_kind).map_err(|detail| {
+ resp_err_kverr(KvError::Api(ApiError::InvalidArgument {
+ detail: format!(
+ "build transfer collect info output relpath failed: batch_id={} err={}",
+ batch_id, detail
+ ),
+ }))
+ })?;
Ok(format!("{}.{}.fluxon.part", output_relpath, worker_task_id))
}
@@ -5750,9 +5839,12 @@ fn prune_empty_parent_dirs(mut current: PathBuf, root: &PathBuf) -> Result<(), F
Ok(())
}
-fn cleanup_attempt_staging_prefix(dst_root: &PathBuf, staging_prefix: &str) -> Result<(), FlatDict> {
- let staging_abs =
- safe_join_root(dst_root.to_string_lossy().as_ref(), staging_prefix).map_err(resp_err_kverr)?;
+fn cleanup_attempt_staging_prefix(
+ dst_root: &PathBuf,
+ staging_prefix: &str,
+) -> Result<(), FlatDict> {
+ let staging_abs = safe_join_root(dst_root.to_string_lossy().as_ref(), staging_prefix)
+ .map_err(resp_err_kverr)?;
match fs::remove_dir_all(&staging_abs) {
Ok(()) => {}
Err(err) if err.kind() == ErrorKind::NotFound => return Ok(()),
@@ -5865,7 +5957,8 @@ pub(crate) fn read_transfer_chunk_from_root_dir_abs(
return Ok(Vec::new());
}
let to_read = std::cmp::min(length, size - offset) as usize;
- f.seek(SeekFrom::Start(offset as u64)).map_err(resp_err_io)?;
+ f.seek(SeekFrom::Start(offset as u64))
+ .map_err(resp_err_io)?;
let mut buf = vec![0u8; to_read];
f.read_exact(&mut buf).map_err(resp_err_io)?;
Ok(buf)
@@ -5881,11 +5974,14 @@ pub(crate) fn execute_transfer_worker_assignment(
read_chunk: ReadChunkFn,
) -> Result
where
- ReadChunkFn:
- Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>
- + Send
- + Sync
- + 'static,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>
+ + Send
+ + Sync
+ + 'static,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError> + Send + Sync + 'static,
{
execute_transfer_worker_assignment_with_policy(
@@ -5943,12 +6039,19 @@ pub(super) fn handle_transfer_read(exports: &AgentExportsHandle, payload: FlatDi
Ok(v) => v,
Err(e) => return resp_err_kverr(e),
};
- let buf =
- match read_transfer_chunk_from_root_dir_abs(root_dir_abs.as_str(), relpath.as_str(), offset, length) {
- Ok(v) => v,
- Err(resp) => return resp,
- };
- resp_ok(BTreeMap::from([("data".to_string(), FlatValue::Bytes(buf))]))
+ let buf = match read_transfer_chunk_from_root_dir_abs(
+ root_dir_abs.as_str(),
+ relpath.as_str(),
+ offset,
+ length,
+ ) {
+ Ok(v) => v,
+ Err(resp) => return resp,
+ };
+ resp_ok(BTreeMap::from([(
+ "data".to_string(),
+ FlatValue::Bytes(buf),
+ )]))
}
pub(super) fn handle_transfer_stream_open(
@@ -6010,7 +6113,9 @@ pub(super) fn handle_transfer_worker(
Err(resp) => return resp,
};
match registry.launch_task(api, master_id, exports, assignment) {
- Ok(result) => encode_transfer_worker_launch_result(&result, "transfer worker launch result"),
+ Ok(result) => {
+ encode_transfer_worker_launch_result(&result, "transfer worker launch result")
+ }
Err(resp) => resp,
}
}
@@ -6086,15 +6191,15 @@ mod tests {
.collect()
}
- fn assert_all_child_scan_units_are_subtree_streaming(
- result: &FluxonFsTransferScanResultWire,
- ) {
- assert!(result
- .child_scan_units
- .iter()
- .all(|child| child.scan_mode == FluxonFsTransferScanMode::SubtreeStreaming));
- }
-
+ fn assert_all_child_scan_units_are_subtree_streaming(result: &FluxonFsTransferScanResultWire) {
+ assert!(
+ result
+ .child_scan_units
+ .iter()
+ .all(|child| child.scan_mode == FluxonFsTransferScanMode::SubtreeStreaming)
+ );
+ }
+
fn ok_bool(resp: &FlatDict) -> bool {
matches!(resp.get("ok"), Some(FlatValue::Bool(true)))
}
@@ -6106,10 +6211,7 @@ mod tests {
panic!("unexpected open result fatal decode error: {:?}", other)
}
Err(TransferWorkerRpcFailure::Retryable { detail }) => {
- panic!(
- "unexpected open result retryable decode error: {}",
- detail
- )
+ panic!("unexpected open result retryable decode error: {}", detail)
}
}
}
@@ -6121,10 +6223,7 @@ mod tests {
panic!("unexpected next result fatal decode error: {:?}", other)
}
Err(TransferWorkerRpcFailure::Retryable { detail }) => {
- panic!(
- "unexpected next result retryable decode error: {}",
- detail
- )
+ panic!("unexpected next result retryable decode error: {}", detail)
}
}
}
@@ -6140,10 +6239,7 @@ mod tests {
.collect()
}
- fn test_worker_assignment(
- relpath: &str,
- size: i64,
- ) -> FluxonFsTransferWorkerAssignmentWire {
+ fn test_worker_assignment(relpath: &str, size: i64) -> FluxonFsTransferWorkerAssignmentWire {
FluxonFsTransferWorkerAssignmentWire {
job_id: "job".to_string(),
batch_id: "batch".to_string(),
@@ -6158,12 +6254,13 @@ mod tests {
root_relpath: ".".to_string(),
staging_prefix: ".fluxon.stage/job/batch".to_string(),
lease_expire_unix_ms: 0,
- manifest_blob: build_transfer_manifest_blob(vec![
- FluxonFsTransferScanFrontierEntry {
+ manifest_blob: build_transfer_manifest_blob(
+ vec![FluxonFsTransferScanFrontierEntry {
relpath: relpath.to_string(),
size,
- },
- ], Vec::new())
+ }],
+ Vec::new(),
+ )
.unwrap(),
collect_infos: Vec::new(),
}
@@ -6183,7 +6280,11 @@ mod tests {
read_chunk: ReadChunkFn,
) -> TransferWorkerCoordinator
where
- ReadChunkFn: Fn(&FluxonFsTransferManifestEntryWire, i64, i64) -> Result, TransferWorkerExecutionError>,
+ ReadChunkFn: Fn(
+ &FluxonFsTransferManifestEntryWire,
+ i64,
+ i64,
+ ) -> Result, TransferWorkerExecutionError>,
CheckpointFn: Fn() -> Result<(), TransferWorkerExecutionError>,
{
let policy = Arc::new(TransferWorkerLanePolicy::production_default());
@@ -6217,16 +6318,19 @@ mod tests {
#[test]
fn build_transfer_manifest_blob_round_trips_entries() {
- let blob = build_transfer_manifest_blob(vec![
- FluxonFsTransferScanFrontierEntry {
- relpath: "a".to_string(),
- size: 1,
- },
- FluxonFsTransferScanFrontierEntry {
- relpath: "b/c".to_string(),
- size: 2,
- },
- ], vec!["empty".to_string()])
+ let blob = build_transfer_manifest_blob(
+ vec![
+ FluxonFsTransferScanFrontierEntry {
+ relpath: "a".to_string(),
+ size: 1,
+ },
+ FluxonFsTransferScanFrontierEntry {
+ relpath: "b/c".to_string(),
+ size: 2,
+ },
+ ],
+ vec!["empty".to_string()],
+ )
.unwrap();
let manifest = FluxonFsTransferManifestWire::decode_from_blob(&blob).unwrap();
assert_eq!(manifest.entry_count, 2);
@@ -6250,11 +6354,12 @@ mod tests {
#[test]
fn materialize_transfer_collect_info_writes_task_scoped_staging_then_output_file() {
let root = TempDir::new().unwrap();
- let collect_infos = build_symlink_collect_infos(vec![FluxonFsTransferSymlinkNoticeEntryWire {
- relpath: "root/link-file.bin".to_string(),
- link_target: "target/file.bin".to_string(),
- }])
- .unwrap();
+ let collect_infos =
+ build_symlink_collect_infos(vec![FluxonFsTransferSymlinkNoticeEntryWire {
+ relpath: "root/link-file.bin".to_string(),
+ link_target: "target/file.bin".to_string(),
+ }])
+ .unwrap();
let prepared = prepare_transfer_collect_info_materialization(
&root.path().to_path_buf(),
"batch-1",
@@ -6383,7 +6488,10 @@ mod tests {
&exports,
FlatDict::from([
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("offset".to_string(), FlatValue::Int64(2)),
("length".to_string(), FlatValue::Int64(3)),
]),
@@ -6398,7 +6506,10 @@ mod tests {
&exports,
FlatDict::from([
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("offset".to_string(), FlatValue::Int64(6)),
("length".to_string(), FlatValue::Int64(1)),
]),
@@ -6423,7 +6534,10 @@ mod tests {
&exports,
FlatDict::from([
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("offset".to_string(), FlatValue::Int64(1)),
("length".to_string(), FlatValue::Int64(3)),
]),
@@ -6446,9 +6560,15 @@ mod tests {
&exports,
FlatDict::from([
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("offset".to_string(), FlatValue::Int64(0)),
- ("length".to_string(), FlatValue::Int64(CHUNK_BYTES as i64 + 1)),
+ (
+ "length".to_string(),
+ FlatValue::Int64(CHUNK_BYTES as i64 + 1),
+ ),
]),
);
assert!(matches!(resp.get("ok"), Some(FlatValue::Bool(false))));
@@ -6470,7 +6590,10 @@ mod tests {
FlatValue::String("task-0".to_string()),
),
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("initial_offset".to_string(), FlatValue::Int64(0)),
]),
);
@@ -6551,7 +6674,10 @@ mod tests {
FlatValue::String("task-1".to_string()),
),
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("initial_offset".to_string(), FlatValue::Int64(0)),
]),
);
@@ -6584,7 +6710,10 @@ mod tests {
("length".to_string(), FlatValue::Int64(2)),
]),
);
- assert!(matches!(invalid_resp.get("ok"), Some(FlatValue::Bool(false))));
+ assert!(matches!(
+ invalid_resp.get("ok"),
+ Some(FlatValue::Bool(false))
+ ));
}
#[test]
@@ -6603,7 +6732,10 @@ mod tests {
FlatValue::String("task-2".to_string()),
),
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("initial_offset".to_string(), FlatValue::Int64(3)),
]),
);
@@ -6647,7 +6779,10 @@ mod tests {
FlatValue::String("task-3".to_string()),
),
("export".to_string(), FlatValue::String("src".to_string())),
- ("relpath".to_string(), FlatValue::String("f.bin".to_string())),
+ (
+ "relpath".to_string(),
+ FlatValue::String("f.bin".to_string()),
+ ),
("initial_offset".to_string(), FlatValue::Int64(3)),
]),
);
@@ -6718,7 +6853,10 @@ mod tests {
let result = decode_result_json(&resp);
assert!(result.finished);
assert!(result.direct_files_only_batches.is_empty());
- assert_eq!(child_scan_unit_roots(&result), vec!["root/child".to_string()]);
+ assert_eq!(
+ child_scan_unit_roots(&result),
+ vec!["root/child".to_string()]
+ );
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
}
@@ -6781,7 +6919,8 @@ mod tests {
}
#[test]
- fn handle_transfer_scan_assignment_groups_empty_children_into_direct_batch_without_direct_files() {
+ fn handle_transfer_scan_assignment_groups_empty_children_into_direct_batch_without_direct_files()
+ {
let root = TempDir::new().unwrap();
write_file(&root, "root/big/data.bin", b"12345");
fs::create_dir_all(root.path().join("root/empty-a")).unwrap();
@@ -6814,9 +6953,10 @@ mod tests {
assert_eq!(child_scan_unit_roots(&result), vec!["root/big".to_string()]);
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
- let direct_manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ let direct_manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert!(direct_manifest.entries.is_empty());
assert_eq!(
direct_manifest.empty_dir_relpaths,
@@ -6854,16 +6994,17 @@ mod tests {
assert_eq!(result.direct_files_only_batches.len(), 1);
assert_eq!(result.full_dir_batches.len(), 0);
assert_eq!(result.child_scan_units.len(), 1);
- assert_eq!(result.child_scan_units[0].root_relpath, "root/huge".to_string());
+ assert_eq!(
+ result.child_scan_units[0].root_relpath,
+ "root/huge".to_string()
+ );
let manifest = FluxonFsTransferManifestWire::decode_from_blob(
&result.direct_files_only_batches[0].manifest_blob,
)
.unwrap();
assert!(manifest.entries.is_empty());
assert!(!manifest.empty_dir_relpaths.is_empty());
- assert!(
- manifest.empty_dir_relpaths.len() <= TRANSFER_MERGEABLE_EMPTY_DIR_BUDGET
- );
+ assert!(manifest.empty_dir_relpaths.len() <= TRANSFER_MERGEABLE_EMPTY_DIR_BUDGET);
assert!(
estimate_empty_dir_manifest_bytes(&manifest.empty_dir_relpaths)
<= TRANSFER_MERGEABLE_EMPTY_DIR_ESTIMATED_BYTES_BUDGET
@@ -6879,10 +7020,10 @@ mod tests {
))
+ 1;
for idx in 0..child_count {
- fs::create_dir_all(root.path().join(format!(
- "root/branch-{idx:05}/{}",
- "x".repeat(200)
- )))
+ fs::create_dir_all(
+ root.path()
+ .join(format!("root/branch-{idx:05}/{}", "x".repeat(200))),
+ )
.unwrap();
}
let result = build_transfer_scan_result_for_root_dir_abs(
@@ -6908,10 +7049,12 @@ mod tests {
assert!(!result.finished);
assert!(result.direct_files_only_batches.is_empty());
assert!(!result.child_scan_units.is_empty());
- assert!(result
- .child_scan_units
- .iter()
- .any(|child| child.scan_mode == FluxonFsTransferScanMode::FullTree));
+ assert!(
+ result
+ .child_scan_units
+ .iter()
+ .any(|child| child.scan_mode == FluxonFsTransferScanMode::FullTree)
+ );
assert!(result.child_scan_units.iter().all(|child| {
child.scan_mode == FluxonFsTransferScanMode::FullTree
|| child.scan_mode == FluxonFsTransferScanMode::SubtreeStreaming
@@ -6963,10 +7106,16 @@ mod tests {
assert!(!continue_locally);
assert_eq!(next_event_seq_no, 9);
assert_eq!(events.len(), 2);
- assert_eq!(events[0].event_kind, FluxonFsTransferScanEventKindWire::Append);
+ assert_eq!(
+ events[0].event_kind,
+ FluxonFsTransferScanEventKindWire::Append
+ );
assert_eq!(events[0].event_seq_no, 7);
assert_eq!(events[0].full_dir_batches.len(), 1);
- assert_eq!(events[1].event_kind, FluxonFsTransferScanEventKindWire::Finished);
+ assert_eq!(
+ events[1].event_kind,
+ FluxonFsTransferScanEventKindWire::Finished
+ );
assert_eq!(events[1].event_seq_no, 8);
assert!(events[1].direct_files_only_batches.is_empty());
assert!(events[1].child_scan_units.is_empty());
@@ -6997,17 +7146,21 @@ mod tests {
skip_entries: Vec::new(),
};
- let first = build_transfer_scan_result_for_root_dir_abs(
- root.path().to_str().unwrap(),
- &assignment,
- )
- .unwrap();
+ let first =
+ build_transfer_scan_result_for_root_dir_abs(root.path().to_str().unwrap(), &assignment)
+ .unwrap();
assert!(!first.finished);
assert!(!first.direct_files_only_batches.is_empty());
assert!(first.full_dir_batches.is_empty());
assert_eq!(first.child_scan_units.len(), 1);
- assert_eq!(first.child_scan_units[0].scan_unit_id, assignment.scan_unit_id);
- assert_eq!(first.child_scan_units[0].root_relpath, assignment.root_relpath);
+ assert_eq!(
+ first.child_scan_units[0].scan_unit_id,
+ assignment.scan_unit_id
+ );
+ assert_eq!(
+ first.child_scan_units[0].root_relpath,
+ assignment.root_relpath
+ );
assert_eq!(first.child_scan_units[0].generation, assignment.generation);
let first_entry_count = first
.direct_files_only_batches
@@ -7019,7 +7172,10 @@ mod tests {
.len()
})
 .sum::<usize>();
- assert_eq!(first_entry_count, TRANSFER_SCAN_ROOT_LISTING_SLICE_ENTRY_LIMIT);
+ assert_eq!(
+ first_entry_count,
+ TRANSFER_SCAN_ROOT_LISTING_SLICE_ENTRY_LIMIT
+ );
let second_assignment = FluxonFsTransferScanAssignmentWire {
scan_task_id: "task-2".to_string(),
@@ -7086,7 +7242,8 @@ mod tests {
}
#[test]
- fn build_transfer_scan_result_root_direct_fanout_only_emits_child_scan_units_without_recursing() {
+ fn build_transfer_scan_result_root_direct_fanout_only_emits_child_scan_units_without_recursing()
+ {
let root = TempDir::new().unwrap();
write_file(&root, "root/direct.bin", b"abc");
write_file(&root, "root/child/payload.bin", b"xyz");
@@ -7114,14 +7271,18 @@ mod tests {
assert_eq!(result.direct_files_only_batches.len(), 1);
assert_eq!(result.child_scan_units.len(), 1);
assert!(result.full_dir_batches.is_empty());
- assert_eq!(result.child_scan_units[0].root_relpath, "root/child".to_string());
+ assert_eq!(
+ result.child_scan_units[0].root_relpath,
+ "root/child".to_string()
+ );
assert_eq!(
result.child_scan_units[0].scan_mode,
FluxonFsTransferScanMode::FullTree
);
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![FluxonFsTransferManifestEntryWire {
@@ -7133,7 +7294,8 @@ mod tests {
}
#[test]
- fn build_transfer_scan_result_directory_direct_fanout_only_emits_child_scan_units_without_recursing() {
+ fn build_transfer_scan_result_directory_direct_fanout_only_emits_child_scan_units_without_recursing()
+ {
let root = TempDir::new().unwrap();
write_file(&root, "root/child/direct.bin", b"abc");
write_file(&root, "root/child/grand/payload.bin", b"xyz");
@@ -7161,14 +7323,18 @@ mod tests {
assert_eq!(result.direct_files_only_batches.len(), 1);
assert_eq!(result.child_scan_units.len(), 1);
assert!(result.full_dir_batches.is_empty());
- assert_eq!(result.child_scan_units[0].root_relpath, "root/child/grand".to_string());
+ assert_eq!(
+ result.child_scan_units[0].root_relpath,
+ "root/child/grand".to_string()
+ );
assert_eq!(
result.child_scan_units[0].scan_mode,
FluxonFsTransferScanMode::FullTree
);
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![FluxonFsTransferManifestEntryWire {
@@ -7206,10 +7372,14 @@ mod tests {
.unwrap();
assert!(result.finished);
assert_eq!(result.direct_files_only_batches.len(), 1);
- assert_eq!(result.direct_files_only_batches[0].root_relpath, "root/child");
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ assert_eq!(
+ result.direct_files_only_batches[0].root_relpath,
+ "root/child"
+ );
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![FluxonFsTransferManifestEntryWire {
@@ -7218,7 +7388,10 @@ mod tests {
}]
);
assert!(manifest.empty_dir_relpaths.is_empty());
- assert_eq!(child_scan_unit_roots(&result), vec!["root/child/grand".to_string()]);
+ assert_eq!(
+ child_scan_unit_roots(&result),
+ vec!["root/child/grand".to_string()]
+ );
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
}
@@ -7253,10 +7426,14 @@ mod tests {
assert!(result.finished);
assert_eq!(result.direct_files_only_batches.len(), 1);
assert_eq!(result.child_scan_units.len(), 1);
- assert_eq!(result.child_scan_units[0].root_relpath, "root/child".to_string());
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ assert_eq!(
+ result.child_scan_units[0].root_relpath,
+ "root/child".to_string()
+ );
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![FluxonFsTransferManifestEntryWire {
@@ -7330,9 +7507,10 @@ mod tests {
assert_eq!(result.direct_files_only_batches.len(), 1);
assert!(result.child_scan_units.is_empty());
assert!(result.full_dir_batches.is_empty());
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![FluxonFsTransferManifestEntryWire {
@@ -7407,7 +7585,10 @@ mod tests {
assert!(result.finished);
assert_eq!(result.direct_files_only_batches.len(), 1);
assert_eq!(result.child_scan_units.len(), 1);
- assert_eq!(result.child_scan_units[0].root_relpath, "root/child-b".to_string());
+ assert_eq!(
+ result.child_scan_units[0].root_relpath,
+ "root/child-b".to_string()
+ );
assert_eq!(
result.child_scan_units[0].scan_mode,
FluxonFsTransferScanMode::FullTree
@@ -7447,9 +7628,10 @@ mod tests {
assert_eq!(result.direct_files_only_batches.len(), 1);
assert!(result.child_scan_units.is_empty());
assert!(result.full_dir_batches.is_empty());
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![FluxonFsTransferManifestEntryWire {
@@ -7490,9 +7672,10 @@ mod tests {
assert!(result.direct_files_only_batches.is_empty());
assert!(result.child_scan_units.is_empty());
assert_eq!(result.full_dir_batches.len(), 1);
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.full_dir_batches[0].manifest_blob)
- .unwrap();
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.full_dir_batches[0].manifest_blob,
+ )
+ .unwrap();
assert!(manifest.entries.is_empty());
assert_eq!(manifest.empty_dir_relpaths, vec!["root".to_string()]);
}
@@ -7533,7 +7716,8 @@ mod tests {
}
#[test]
- fn handle_transfer_scan_assignment_does_not_reaggregate_root_when_descendant_batch_is_durable() {
+ fn handle_transfer_scan_assignment_does_not_reaggregate_root_when_descendant_batch_is_durable()
+ {
let root = TempDir::new().unwrap();
write_file(&root, "root/direct.bin", b"abc");
write_file(&root, "root/big/data.bin", b"12345");
@@ -7579,21 +7763,29 @@ mod tests {
size: 3,
}]
);
- assert!(result
- .full_dir_batches
- .iter()
- .all(|batch| batch.root_relpath != "root"));
- assert!(result
- .full_dir_batches
- .iter()
- .all(|batch| batch.root_relpath != "root/big"));
- assert_eq!(child_scan_unit_roots(&result), vec!["root/small".to_string()]);
+ assert!(
+ result
+ .full_dir_batches
+ .iter()
+ .all(|batch| batch.root_relpath != "root")
+ );
+ assert!(
+ result
+ .full_dir_batches
+ .iter()
+ .all(|batch| batch.root_relpath != "root/big")
+ );
+ assert_eq!(
+ child_scan_unit_roots(&result),
+ vec!["root/small".to_string()]
+ );
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
}
#[test]
- fn handle_transfer_scan_assignment_honors_cross_generation_descendant_full_dir_during_restart() {
+ fn handle_transfer_scan_assignment_honors_cross_generation_descendant_full_dir_during_restart()
+ {
let root = TempDir::new().unwrap();
write_file(&root, "root/direct.bin", b"abc");
write_file(&root, "root/big/data.bin", b"12345");
@@ -7639,21 +7831,29 @@ mod tests {
size: 3,
}]
);
- assert!(result
- .full_dir_batches
- .iter()
- .all(|batch| batch.root_relpath != "root"));
- assert!(result
- .full_dir_batches
- .iter()
- .all(|batch| batch.root_relpath != "root/big"));
- assert_eq!(child_scan_unit_roots(&result), vec!["root/small".to_string()]);
+ assert!(
+ result
+ .full_dir_batches
+ .iter()
+ .all(|batch| batch.root_relpath != "root")
+ );
+ assert!(
+ result
+ .full_dir_batches
+ .iter()
+ .all(|batch| batch.root_relpath != "root/big")
+ );
+ assert_eq!(
+ child_scan_unit_roots(&result),
+ vec!["root/small".to_string()]
+ );
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
}
#[test]
- fn handle_transfer_scan_assignment_replays_descendant_current_layer_when_only_partial_descendant_direct_files_batch_is_durable() {
+ fn handle_transfer_scan_assignment_replays_descendant_current_layer_when_only_partial_descendant_direct_files_batch_is_durable()
+ {
let root = TempDir::new().unwrap();
write_file(&root, "root/child/a.bin", b"ab");
write_file(&root, "root/child/b.bin", b"cd");
@@ -7688,10 +7888,14 @@ mod tests {
assert!(result.child_scan_units.is_empty());
assert!(result.full_dir_batches.is_empty());
assert_eq!(result.direct_files_only_batches.len(), 1);
- assert_eq!(result.direct_files_only_batches[0].root_relpath, "root/child");
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ assert_eq!(
+ result.direct_files_only_batches[0].root_relpath,
+ "root/child"
+ );
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![
@@ -7737,7 +7941,10 @@ mod tests {
assert!(result.finished);
assert!(result.direct_files_only_batches.is_empty());
- assert_eq!(child_scan_unit_roots(&result), vec!["root/parent".to_string()]);
+ assert_eq!(
+ child_scan_unit_roots(&result),
+ vec!["root/parent".to_string()]
+ );
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
}
@@ -7772,9 +7979,10 @@ mod tests {
assert!(result.finished);
assert_eq!(result.direct_files_only_batches.len(), 1);
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.direct_files_only_batches[0].manifest_blob)
- .unwrap();
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.direct_files_only_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![FluxonFsTransferManifestEntryWire {
@@ -7782,7 +7990,10 @@ mod tests {
size: 10,
}]
);
- assert_eq!(child_scan_unit_roots(&result), vec!["root/child".to_string()]);
+ assert_eq!(
+ child_scan_unit_roots(&result),
+ vec!["root/child".to_string()]
+ );
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
}
@@ -7820,7 +8031,10 @@ mod tests {
assert!(ok_bool(&resp));
assert!(result.finished);
- assert_eq!(child_scan_unit_roots(&result), vec!["root/blocked".to_string()]);
+ assert_eq!(
+ child_scan_unit_roots(&result),
+ vec!["root/blocked".to_string()]
+ );
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
}
@@ -7870,7 +8084,11 @@ mod tests {
);
assert_eq!(
child_scan_unit_roots(&result),
- vec!["root/a".to_string(), "root/b".to_string(), "root/c".to_string()]
+ vec![
+ "root/a".to_string(),
+ "root/b".to_string(),
+ "root/c".to_string()
+ ]
);
assert_all_child_scan_units_are_subtree_streaming(&result);
assert!(result.full_dir_batches.is_empty());
@@ -7905,7 +8123,10 @@ mod tests {
);
let result = decode_result_json(&resp);
assert_eq!(result.full_dir_batches.len(), 1);
- assert_eq!(result.full_dir_batches[0].batch_kind, FluxonFsTransferBatchKind::SubtreeSlice);
+ assert_eq!(
+ result.full_dir_batches[0].batch_kind,
+ FluxonFsTransferBatchKind::SubtreeSlice
+ );
assert_eq!(result.full_dir_batches[0].collect_infos.len(), 1);
assert_eq!(
decode_symlink_notice_collect_blob(
@@ -7957,11 +8178,15 @@ mod tests {
assert!(result.direct_files_only_batches.is_empty());
assert!(result.child_scan_units.is_empty());
assert_eq!(result.full_dir_batches.len(), 1);
- assert_eq!(result.full_dir_batches[0].batch_kind, FluxonFsTransferBatchKind::SubtreeSlice);
+ assert_eq!(
+ result.full_dir_batches[0].batch_kind,
+ FluxonFsTransferBatchKind::SubtreeSlice
+ );
assert_eq!(result.full_dir_batches[0].root_relpath, "root".to_string());
- let manifest =
- FluxonFsTransferManifestWire::decode_from_blob(&result.full_dir_batches[0].manifest_blob)
- .unwrap();
+ let manifest = FluxonFsTransferManifestWire::decode_from_blob(
+ &result.full_dir_batches[0].manifest_blob,
+ )
+ .unwrap();
assert_eq!(
manifest.entries,
vec![
@@ -7982,7 +8207,9 @@ mod tests {
FluxonFsTransferCollectInfoKind::SymlinkNotice
);
let mut notices = decode_symlink_notice_collect_blob(
- direct_files_only_batch.collect_infos[0].collect_blob.as_slice()
+ direct_files_only_batch.collect_infos[0]
+ .collect_blob
+ .as_slice(),
);
notices.sort_by(|a, b| a.relpath.cmp(&b.relpath));
assert_eq!(
@@ -8004,16 +8231,17 @@ mod tests {
fn prepare_transfer_file_from_chunks_promotes_staged_file_to_final_path() {
let root = TempDir::new().unwrap();
let dst_root = root.path().to_path_buf();
- let coordinator = test_transfer_coordinator(
- || Ok(()),
- {
- let chunks = Arc::new(Mutex::new(vec![b"ab".to_vec(), b"cde".to_vec(), Vec::new()]));
- move |_file, _read_offset, _length| {
- let mut chunks = chunks.lock();
- Ok(chunks.remove(0))
- }
- },
- );
+ let coordinator = test_transfer_coordinator(|| Ok(()), {
+ let chunks = Arc::new(Mutex::new(vec![
+ b"ab".to_vec(),
+ b"cde".to_vec(),
+ Vec::new(),
+ ]));
+ move |_file, _read_offset, _length| {
+ let mut chunks = chunks.lock();
+ Ok(chunks.remove(0))
+ }
+ });
let prepared = prepare_transfer_file_streaming(
&dst_root,
".fluxon.stage/job/batch",
@@ -8038,32 +8266,31 @@ mod tests {
fs::read(root.path().join("dir/file.bin")).unwrap(),
b"abcde".to_vec()
);
- assert!(!root
- .path()
- .join(".fluxon.stage/job/batch/dir/file.bin/file.bin.fluxon.part")
- .exists());
+ assert!(
+ !root
+ .path()
+ .join(".fluxon.stage/job/batch/dir/file.bin/file.bin.fluxon.part")
+ .exists()
+ );
}
#[test]
fn prepare_transfer_file_from_chunks_truncates_existing_staging_file() {
let root = TempDir::new().unwrap();
let dst_root = root.path().to_path_buf();
- let stale_staging =
- root.path()
- .join(".fluxon.stage/job/batch/dir/file.bin/file.bin.fluxon.part");
+ let stale_staging = root
+ .path()
+ .join(".fluxon.stage/job/batch/dir/file.bin/file.bin.fluxon.part");
fs::create_dir_all(stale_staging.parent().unwrap()).unwrap();
fs::write(&stale_staging, b"stale-data").unwrap();
- let coordinator = test_transfer_coordinator(
- || Ok(()),
- {
- let chunks = Arc::new(Mutex::new(vec![b"xy".to_vec(), Vec::new()]));
- move |_file, _read_offset, _length| {
- let mut chunks = chunks.lock();
- Ok(chunks.remove(0))
- }
- },
- );
+ let coordinator = test_transfer_coordinator(|| Ok(()), {
+ let chunks = Arc::new(Mutex::new(vec![b"xy".to_vec(), Vec::new()]));
+ move |_file, _read_offset, _length| {
+ let mut chunks = chunks.lock();
+ Ok(chunks.remove(0))
+ }
+ });
let prepared = prepare_transfer_file_streaming(
&dst_root,
".fluxon.stage/job/batch",
@@ -8076,23 +8303,23 @@ mod tests {
.unwrap();
promote_prepared_transfer_file(&dst_root, prepared).unwrap();
- assert_eq!(fs::read(root.path().join("dir/file.bin")).unwrap(), b"xy".to_vec());
+ assert_eq!(
+ fs::read(root.path().join("dir/file.bin")).unwrap(),
+ b"xy".to_vec()
+ );
}
#[test]
fn prepare_transfer_file_from_chunks_rejects_size_mismatch_and_keeps_staging_file() {
let root = TempDir::new().unwrap();
let dst_root = root.path().to_path_buf();
- let coordinator = test_transfer_coordinator(
- || Ok(()),
- {
- let chunks = Arc::new(Mutex::new(vec![b"xy".to_vec(), Vec::new()]));
- move |_file, _read_offset, _length| {
- let mut chunks = chunks.lock();
- Ok(chunks.remove(0))
- }
- },
- );
+ let coordinator = test_transfer_coordinator(|| Ok(()), {
+ let chunks = Arc::new(Mutex::new(vec![b"xy".to_vec(), Vec::new()]));
+ move |_file, _read_offset, _length| {
+ let mut chunks = chunks.lock();
+ Ok(chunks.remove(0))
+ }
+ });
let err = prepare_transfer_file_streaming(
&dst_root,
".fluxon.stage/job/batch",
@@ -8275,15 +8502,10 @@ mod tests {
let file_bytes = b"hello".to_vec();
let assignment = test_worker_assignment("dir/file.bin", file_bytes.len() as i64);
- let result = execute_transfer_worker_assignment(
- &assignment,
- &dst_root,
- || Ok(()),
- {
- let file_bytes = file_bytes.clone();
- move |_file, _read_offset, _length| Ok(file_bytes.clone())
- },
- )
+ let result = execute_transfer_worker_assignment(&assignment, &dst_root, || Ok(()), {
+ let file_bytes = file_bytes.clone();
+ move |_file, _read_offset, _length| Ok(file_bytes.clone())
+ })
.unwrap();
assert_eq!(result.file_results.len(), 1);
@@ -8303,7 +8525,10 @@ mod tests {
create_dir_all_with_parent_dir_chmod_retry(&target).unwrap();
assert!(target.is_dir());
- assert_eq!(fs::metadata(&locked_parent).unwrap().permissions().mode() & 0o777, 0o777);
+ assert_eq!(
+ fs::metadata(&locked_parent).unwrap().permissions().mode() & 0o777,
+ 0o777
+ );
}
#[cfg(unix)]
@@ -8318,21 +8543,19 @@ mod tests {
let file_bytes = b"hello".to_vec();
let assignment = test_worker_assignment("dir/file.bin", file_bytes.len() as i64);
- let result = execute_transfer_worker_assignment(
- &assignment,
- &dst_root,
- || Ok(()),
- {
- let file_bytes = file_bytes.clone();
- move |_file, _read_offset, _length| Ok(file_bytes.clone())
- },
- )
+ let result = execute_transfer_worker_assignment(&assignment, &dst_root, || Ok(()), {
+ let file_bytes = file_bytes.clone();
+ move |_file, _read_offset, _length| Ok(file_bytes.clone())
+ })
.unwrap();
assert_eq!(result.file_results.len(), 1);
assert!(dst_root.is_dir());
assert_eq!(fs::read(dst_root.join("dir/file.bin")).unwrap(), file_bytes);
- assert_eq!(fs::metadata(&locked_parent).unwrap().permissions().mode() & 0o777, 0o777);
+ assert_eq!(
+ fs::metadata(&locked_parent).unwrap().permissions().mode() & 0o777,
+ 0o777
+ );
}
#[test]
@@ -8349,47 +8572,48 @@ mod tests {
let assignment = assignment.clone();
let heartbeat_attempts = heartbeat_attempts.clone();
move || {
- retry_transfer_worker_rpc_with_backoff(
- &assignment,
- "checkpoint",
- "test-checkpoint",
- BackoffConfig {
- initial_secs: 0,
- max_secs: 0,
- },
- WarnConfig {
- warn_interval_secs: 0,
- },
- || {
- let attempt =
- heartbeat_attempts.fetch_add(1, Ordering::SeqCst) + 1;
- if attempt < 3 {
- return Err(TransferWorkerRpcFailure::Retryable {
- detail: format!(
- "transient heartbeat failure attempt={}",
- attempt
- ),
- });
- }
- Ok(())
- },
- )
- .map_err(TransferWorkerExecutionError::fatal)
- }
+ retry_transfer_worker_rpc_with_backoff(
+ &assignment,
+ "checkpoint",
+ "test-checkpoint",
+ BackoffConfig {
+ initial_secs: 0,
+ max_secs: 0,
+ },
+ WarnConfig {
+ warn_interval_secs: 0,
+ },
+ || {
+ let attempt = heartbeat_attempts.fetch_add(1, Ordering::SeqCst) + 1;
+ if attempt < 3 {
+ return Err(TransferWorkerRpcFailure::Retryable {
+ detail: format!(
+ "transient heartbeat failure attempt={}",
+ attempt
+ ),
+ });
+ }
+ Ok(())
+ },
+ )
+ .map_err(TransferWorkerExecutionError::fatal)
+ }
},
{
let file_bytes = file_bytes.clone();
move |file, read_offset, _length| {
- if file.relpath != "dir/file.bin" {
- return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(KvError::Api(ApiError::InvalidArgument {
- detail: format!("unexpected file relpath: {}", file.relpath),
- }))));
- }
- if read_offset == 0 {
- return Ok(file_bytes.clone());
+ if file.relpath != "dir/file.bin" {
+ return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(
+ KvError::Api(ApiError::InvalidArgument {
+ detail: format!("unexpected file relpath: {}", file.relpath),
+ }),
+ )));
+ }
+ if read_offset == 0 {
+ return Ok(file_bytes.clone());
+ }
+ Ok(Vec::new())
}
- Ok(Vec::new())
- }
},
)
.unwrap();
@@ -8409,24 +8633,20 @@ mod tests {
let file_bytes = b"payload".to_vec();
let assignment = test_worker_assignment("dir/file.bin", file_bytes.len() as i64);
let read_attempts = Arc::new(AtomicUsize::new(0));
- let result = execute_transfer_worker_assignment(
- &assignment,
- &dst_root,
- || Ok(()),
- {
- let assignment = assignment.clone();
- let file_bytes = file_bytes.clone();
- let read_attempts = read_attempts.clone();
- move |file, read_offset, _length| {
+ let result = execute_transfer_worker_assignment(&assignment, &dst_root, || Ok(()), {
+ let assignment = assignment.clone();
+ let file_bytes = file_bytes.clone();
+ let read_attempts = read_attempts.clone();
+ move |file, read_offset, _length| {
if file.relpath != "dir/file.bin" {
- return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(KvError::Api(ApiError::InvalidArgument {
- detail: format!("unexpected file relpath: {}", file.relpath),
- }))));
+ return Err(TransferWorkerExecutionError::fatal(resp_err_kverr(
+ KvError::Api(ApiError::InvalidArgument {
+ detail: format!("unexpected file relpath: {}", file.relpath),
+ }),
+ )));
}
- let op_detail = format!(
- "test-read relpath={} offset={}",
- file.relpath, read_offset
- );
+ let op_detail =
+ format!("test-read relpath={} offset={}", file.relpath, read_offset);
retry_transfer_worker_rpc_with_backoff(
&assignment,
"read_chunk",
@@ -8443,10 +8663,7 @@ mod tests {
let attempt = read_attempts.fetch_add(1, Ordering::SeqCst) + 1;
if attempt < 3 {
return Err(TransferWorkerRpcFailure::Retryable {
- detail: format!(
- "transient read failure attempt={}",
- attempt
- ),
+ detail: format!("transient read failure attempt={}", attempt),
});
}
return Ok(file_bytes.clone());
@@ -8456,8 +8673,7 @@ mod tests {
)
.map_err(TransferWorkerExecutionError::fatal)
}
- },
- )
+ })
.unwrap();
assert_eq!(read_attempts.load(Ordering::SeqCst), 3);
@@ -8481,23 +8697,23 @@ mod tests {
{
let checkpoint_calls = checkpoint_calls.clone();
move || {
- let calls = checkpoint_calls.fetch_add(1, Ordering::SeqCst) + 1;
- if calls >= 4 {
- return Err(TransferWorkerExecutionError::Stop(
- FluxonFsTransferWorkerStopReasonWire::Superseded,
- ));
+ let calls = checkpoint_calls.fetch_add(1, Ordering::SeqCst) + 1;
+ if calls >= 4 {
+ return Err(TransferWorkerExecutionError::Stop(
+ FluxonFsTransferWorkerStopReasonWire::Superseded,
+ ));
+ }
+ Ok(())
}
- Ok(())
- }
},
{
let file_bytes = file_bytes.clone();
move |_file, read_offset, _length| {
- if read_offset == 0 {
- return Ok(file_bytes.clone());
+ if read_offset == 0 {
+ return Ok(file_bytes.clone());
+ }
+ Ok(Vec::new())
}
- Ok(Vec::new())
- }
},
);
assert!(matches!(
@@ -8567,12 +8783,7 @@ mod tests {
break;
}
if max_in_flight
- .compare_exchange(
- observed,
- current,
- Ordering::SeqCst,
- Ordering::SeqCst,
- )
+ .compare_exchange(observed, current, Ordering::SeqCst, Ordering::SeqCst)
.is_ok()
{
break;
@@ -8588,8 +8799,14 @@ mod tests {
assert_eq!(result.file_results.len(), 2);
assert!(max_in_flight.load(Ordering::SeqCst) >= 2);
- assert_eq!(fs::read(root.path().join("dir/a.bin")).unwrap(), b"xxx".to_vec());
- assert_eq!(fs::read(root.path().join("dir/b.bin")).unwrap(), b"xxx".to_vec());
+ assert_eq!(
+ fs::read(root.path().join("dir/a.bin")).unwrap(),
+ b"xxx".to_vec()
+ );
+ assert_eq!(
+ fs::read(root.path().join("dir/b.bin")).unwrap(),
+ b"xxx".to_vec()
+ );
}
#[test]
@@ -8597,29 +8814,25 @@ mod tests {
let root = TempDir::new().unwrap();
let dst_root = root.path().to_path_buf();
let file_bytes = b"hello".to_vec();
- let collect_infos = build_symlink_collect_infos(vec![FluxonFsTransferSymlinkNoticeEntryWire {
- relpath: "dir/link.bin".to_string(),
- link_target: "dir/file.bin".to_string(),
- }])
- .unwrap();
+ let collect_infos =
+ build_symlink_collect_infos(vec![FluxonFsTransferSymlinkNoticeEntryWire {
+ relpath: "dir/link.bin".to_string(),
+ link_target: "dir/file.bin".to_string(),
+ }])
+ .unwrap();
let assignment = FluxonFsTransferWorkerAssignmentWire {
collect_infos: collect_infos.clone(),
..test_worker_assignment("dir/file.bin", file_bytes.len() as i64)
};
- let result = execute_transfer_worker_assignment(
- &assignment,
- &dst_root,
- || Ok(()),
- {
- let file_bytes = file_bytes.clone();
- move |_file, read_offset, _length| {
- if read_offset == 0 {
- return Ok(file_bytes.clone());
- }
- Ok(Vec::new())
+ let result = execute_transfer_worker_assignment(&assignment, &dst_root, || Ok(()), {
+ let file_bytes = file_bytes.clone();
+ move |_file, read_offset, _length| {
+ if read_offset == 0 {
+ return Ok(file_bytes.clone());
}
- },
- )
+ Ok(Vec::new())
+ }
+ })
.unwrap();
assert_eq!(result.file_results.len(), 1);
@@ -8633,7 +8846,11 @@ mod tests {
"fluxon_collect_info/batches/batch/symlinks.jsonl"
);
assert_eq!(
- fs::read(root.path().join("fluxon_collect_info/batches/batch/symlinks.jsonl")).unwrap(),
+ fs::read(
+ root.path()
+ .join("fluxon_collect_info/batches/batch/symlinks.jsonl")
+ )
+ .unwrap(),
collect_infos[0].collect_blob
);
}
@@ -8643,16 +8860,19 @@ mod tests {
let root = TempDir::new().unwrap();
let dst_root = root.path().to_path_buf();
let assignment = FluxonFsTransferWorkerAssignmentWire {
- manifest_blob: build_transfer_manifest_blob(vec![
- FluxonFsTransferScanFrontierEntry {
- relpath: "dir/good.bin".to_string(),
- size: 5,
- },
- FluxonFsTransferScanFrontierEntry {
- relpath: "dir/bad.bin".to_string(),
- size: 5,
- },
- ], Vec::new())
+ manifest_blob: build_transfer_manifest_blob(
+ vec![
+ FluxonFsTransferScanFrontierEntry {
+ relpath: "dir/good.bin".to_string(),
+ size: 5,
+ },
+ FluxonFsTransferScanFrontierEntry {
+ relpath: "dir/bad.bin".to_string(),
+ size: 5,
+ },
+ ],
+ Vec::new(),
+ )
.unwrap(),
..test_worker_assignment("dir/good.bin", 5)
};
@@ -8816,20 +9036,16 @@ mod tests {
.unwrap();
let progress_heartbeat_count = Arc::new(AtomicUsize::new(0));
- gate.ensure_continue(
- false,
- TRANSFER_WORKER_HEARTBEAT_EMPTY_DIR_PROGRESS_COUNT,
- {
- let progress_heartbeat_count = progress_heartbeat_count.clone();
- move |_heartbeat_unix_ms, heartbeat_detail| {
- assert_eq!(heartbeat_detail, "empty_dir_progress");
- progress_heartbeat_count.fetch_add(1, Ordering::SeqCst);
- Ok(FluxonFsTransferWorkerHeartbeatResultWire::continue_running(
- chrono::Utc::now().timestamp_millis() + 60_000,
- ))
- }
- },
- )
+ gate.ensure_continue(false, TRANSFER_WORKER_HEARTBEAT_EMPTY_DIR_PROGRESS_COUNT, {
+ let progress_heartbeat_count = progress_heartbeat_count.clone();
+ move |_heartbeat_unix_ms, heartbeat_detail| {
+ assert_eq!(heartbeat_detail, "empty_dir_progress");
+ progress_heartbeat_count.fetch_add(1, Ordering::SeqCst);
+ Ok(FluxonFsTransferWorkerHeartbeatResultWire::continue_running(
+ chrono::Utc::now().timestamp_millis() + 60_000,
+ ))
+ }
+ })
.unwrap();
gate.ensure_continue(
@@ -8927,20 +9143,15 @@ mod tests {
let dst_root = root.path().to_path_buf();
let file_bytes = b"hello".to_vec();
let assignment = test_worker_assignment("dir/file.bin", file_bytes.len() as i64);
- let result = execute_transfer_worker_assignment(
- &assignment,
- &dst_root,
- || Ok(()),
- {
- let file_bytes = file_bytes.clone();
- move |_file, read_offset, _length| {
- if read_offset == 0 {
- return Ok(file_bytes.clone());
- }
- Ok(Vec::new())
+ let result = execute_transfer_worker_assignment(&assignment, &dst_root, || Ok(()), {
+ let file_bytes = file_bytes.clone();
+ move |_file, read_offset, _length| {
+ if read_offset == 0 {
+ return Ok(file_bytes.clone());
}
- },
- )
+ Ok(Vec::new())
+ }
+ })
.unwrap();
assert_eq!(result.file_results.len(), 1);
@@ -8948,7 +9159,10 @@ mod tests {
cleanup_transfer_worker_attempt_artifacts(&dst_root, &assignment).unwrap();
- assert_eq!(fs::read(root.path().join("dir/file.bin")).unwrap(), file_bytes);
+ assert_eq!(
+ fs::read(root.path().join("dir/file.bin")).unwrap(),
+ file_bytes
+ );
assert!(!root.path().join(".fluxon.stage").exists());
}
@@ -8957,11 +9171,12 @@ mod tests {
let root = TempDir::new().unwrap();
let dst_root = root.path().to_path_buf();
let file_bytes = b"hello".to_vec();
- let collect_infos = build_symlink_collect_infos(vec![FluxonFsTransferSymlinkNoticeEntryWire {
- relpath: "root/link-file.bin".to_string(),
- link_target: "target/file.bin".to_string(),
- }])
- .unwrap();
+ let collect_infos =
+ build_symlink_collect_infos(vec![FluxonFsTransferSymlinkNoticeEntryWire {
+ relpath: "root/link-file.bin".to_string(),
+ link_target: "target/file.bin".to_string(),
+ }])
+ .unwrap();
let assignment = FluxonFsTransferWorkerAssignmentWire {
collect_infos: collect_infos.clone(),
..test_worker_assignment("dir/file.bin", file_bytes.len() as i64)
@@ -8976,9 +9191,11 @@ mod tests {
let result = execute_transfer_worker_assignment(
&assignment,
&dst_root,
- || Err(TransferWorkerExecutionError::Stop(
- FluxonFsTransferWorkerStopReasonWire::Superseded,
- )),
+ || {
+ Err(TransferWorkerExecutionError::Stop(
+ FluxonFsTransferWorkerStopReasonWire::Superseded,
+ ))
+ },
{
let file_bytes = file_bytes.clone();
move |_file, read_offset, _length| {
@@ -8996,11 +9213,20 @@ mod tests {
FluxonFsTransferWorkerStopReasonWire::Superseded
))
));
- assert!(root.path().join(prepared_collect.staging_relpath.as_str()).exists());
+ assert!(
+ root.path()
+ .join(prepared_collect.staging_relpath.as_str())
+ .exists()
+ );
cleanup_transfer_worker_attempt_artifacts(&dst_root, &assignment).unwrap();
assert!(!root.path().join(".fluxon.stage").exists());
- assert!(!root.path().join(prepared_collect.staging_relpath.as_str()).exists());
+ assert!(
+ !root
+ .path()
+ .join(prepared_collect.staging_relpath.as_str())
+ .exists()
+ );
}
}
diff --git a/fluxon_rs/fluxon_fs/src/cache_controller.rs b/fluxon_rs/fluxon_fs/src/cache_controller.rs
index 8a0845c..13ce5a8 100644
--- a/fluxon_rs/fluxon_fs/src/cache_controller.rs
+++ b/fluxon_rs/fluxon_fs/src/cache_controller.rs
@@ -429,8 +429,8 @@ fn now_ms() -> i64 {
#[cfg(test)]
mod tests {
use super::*;
- use std::sync::mpsc;
use std::sync::atomic::{AtomicUsize, Ordering as AtomicOrdering};
+ use std::sync::mpsc;
use std::sync::{Condvar, Mutex};
use tokio::time::{Duration, sleep};
diff --git a/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs b/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs
index 827bb23..0866432 100644
--- a/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs
+++ b/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs
@@ -5344,10 +5344,9 @@ mod tests {
};
use crate::transfer::encode_transfer_manifest_blob_with_empty_dirs;
use fluxon_fs_core::config::{
- FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1,
- FS_EXPORT_DEFAULT_INLINE_BYTES_MAX_BYTES_V1,
- FS_EXPORT_DEFAULT_METADATA_CACHE_TTL_MS_V1,
FLUXON_FS_LOCAL_TRANSFER_CHECK_DST_EXPORT, FLUXON_FS_LOCAL_TRANSFER_CHECK_SRC_EXPORT,
+ FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1,
+ FS_EXPORT_DEFAULT_INLINE_BYTES_MAX_BYTES_V1, FS_EXPORT_DEFAULT_METADATA_CACHE_TTL_MS_V1,
FluxonFsAccessModel, FluxonFsAccessUser, FluxonFsExport, FluxonFsExportRoutingMode,
FluxonFsGlobalConfig, FluxonFsLocalTransferCheckJobSpecWire, FluxonFsRequestIdentity,
FluxonFsS3GatewayConfig, FluxonFsS3KvMissPolicy, FluxonFsS3PermissionAccount,
diff --git a/fluxon_rs/fluxon_kv/Cargo.toml b/fluxon_rs/fluxon_kv/Cargo.toml
index 22ff136..fe7c669 100644
--- a/fluxon_rs/fluxon_kv/Cargo.toml
+++ b/fluxon_rs/fluxon_kv/Cargo.toml
@@ -95,6 +95,7 @@ limit_thirdparty = { path = "../limit_thirdparty" }
fluxon_cli = { path = "../fluxon_cli" }
fluxon_util = { path = "../fluxon_util" }
fluxon_observability = { path = "../fluxon_observability" }
+fluxon_mq = { path = "../fluxon_mq" }
[build-dependencies]
tonic-build = { workspace = true }
fluxon_util = { path = "../fluxon_util" }
diff --git a/fluxon_rs/fluxon_kv/framework_init_steps.yaml b/fluxon_rs/fluxon_kv/framework_init_steps.yaml
index 923ae30..c90cd28 100644
--- a/fluxon_rs/fluxon_kv/framework_init_steps.yaml
+++ b/fluxon_rs/fluxon_kv/framework_init_steps.yaml
@@ -4,6 +4,8 @@ title: fluxon_kv init
variants:
- id: master
tags: [master]
+ - id: broker
+ tags: [broker, external]
- id: owner
tags: [owner]
- id: external
@@ -20,8 +22,8 @@ variants:
# - A step depends on a resource by declaring `deps: ["res:"]`.
resources:
- id: cluster_member_watch_ready
- tags: [master, owner, external]
- publish_tags: [master, owner, external]
+ tags: [master, broker, owner, external]
+ publish_tags: [master, broker, owner, external]
published_by: ClusterManager.step.1.init2
doc: |
- ClusterManager: member watch is established and continuous observation is available
@@ -56,8 +58,8 @@ resources:
# `Framework.step.0.attach_views`.
module_tags:
- ClusterManager: [master, owner, external]
- P2pModule: [master, owner, external]
+ ClusterManager: [master, broker, owner, external]
+ P2pModule: [master, broker, owner, external]
MasterSegManager: [master]
MasterKvRouter: [master]
MetricReporter: [master, owner, external]
diff --git a/fluxon_rs/fluxon_kv/src/client_seg_pool/mod.rs b/fluxon_rs/fluxon_kv/src/client_seg_pool/mod.rs
index 1aa6954..8c7cc78 100644
--- a/fluxon_rs/fluxon_kv/src/client_seg_pool/mod.rs
+++ b/fluxon_rs/fluxon_kv/src/client_seg_pool/mod.rs
@@ -237,10 +237,7 @@ impl ClientSegPool {
std::path::Path::new(share_mem_path).join(SIDE_TRANSFER_PEERS_DIRNAME)
}
- pub fn side_transfer_peer_file_path(
- share_mem_path: &str,
- side_id: &str,
- ) -> std::path::PathBuf {
+ pub fn side_transfer_peer_file_path(share_mem_path: &str, side_id: &str) -> std::path::PathBuf {
Self::side_transfer_peers_dir(share_mem_path).join(format!("{side_id}.json"))
}
@@ -399,17 +396,13 @@ impl ClientSegPool {
crate::rpcresp_kvresult_convert::msg_and_error::SharedMemError::MappingFailed {
path: String::new(),
len: map_len as u64,
- detail: "share_mem_path is empty; explicit configuration required"
- .to_string(),
+ detail: "share_mem_path is empty; explicit configuration required".to_string(),
},
));
}
let base_path = &share_mem_path;
- tracing::info!(
- "Using share_mem_path: {} for memory-mapped file",
- base_path
- );
+ tracing::info!("Using share_mem_path: {} for memory-mapped file", base_path);
std::fs::create_dir_all(base_path).map_err(|e| {
KvError::SharedMem(
crate::rpcresp_kvresult_convert::msg_and_error::SharedMemError::MappingFailed {
diff --git a/fluxon_rs/fluxon_kv/src/config.rs b/fluxon_rs/fluxon_kv/src/config.rs
index f9c7691..1577651 100644
--- a/fluxon_rs/fluxon_kv/src/config.rs
+++ b/fluxon_rs/fluxon_kv/src/config.rs
@@ -733,7 +733,7 @@ pub struct ClientConfig {
pub pprof_duration_seconds: Option,
pub redis_compat_listen_addr: Option,
pub fluxonkv_spec: FluxonKvSpec,
- pub share_mem_path: String, // Mandatory shared bundle path
+ pub share_mem_path: String, // Mandatory shared bundle path
pub large_file_paths: LargeFilePaths, // Mandatory large-file roots for logs and caches
pub test_spec_config: TestSpecConfig,
}
@@ -1170,13 +1170,15 @@ impl ClientConfigYaml {
} else {
let Some(large_file_paths_yaml) = self.fluxonkv_spec.large_file_paths.as_ref() else {
return Err(ConfigError::InvalidClientConfig {
- detail: "fluxonkv_spec.large_file_paths is required for owner mode"
- .to_string(),
+ detail: "fluxonkv_spec.large_file_paths is required for owner mode".to_string(),
}
.into_kverror());
};
LargeFilePaths {
- paths: verify_non_empty_root_path_list(&large_file_paths_yaml.0, "large_file_paths")?,
+ paths: verify_non_empty_root_path_list(
+ &large_file_paths_yaml.0,
+ "large_file_paths",
+ )?,
}
};
@@ -1647,7 +1649,9 @@ fluxonkv_spec:
.unwrap();
let err = cfg.verify().unwrap_err();
let text = format!("{err}");
- assert!(text.contains("fluxonkv_spec.large_file_paths is forbidden in zero-contribution mode"));
+ assert!(
+ text.contains("fluxonkv_spec.large_file_paths is forbidden in zero-contribution mode")
+ );
}
#[test]
@@ -1667,7 +1671,9 @@ fluxonkv_spec:
let logs_dir = large_file_paths.kv_logs_dir("test_cluster").unwrap();
assert_eq!(
logs_dir,
- first_root.join("child").join("test_cluster_cluster_kv_logs")
+ first_root
+ .join("child")
+ .join("test_cluster_cluster_kv_logs")
);
assert!(logs_dir.exists());
diff --git a/fluxon_rs/fluxon_kv/src/external_client_api/mod.rs b/fluxon_rs/fluxon_kv/src/external_client_api/mod.rs
index 9cb291f..b7715dd 100644
--- a/fluxon_rs/fluxon_kv/src/external_client_api/mod.rs
+++ b/fluxon_rs/fluxon_kv/src/external_client_api/mod.rs
@@ -865,8 +865,7 @@ impl ExternalInner {
return Ok(false);
}
- self.finish_owner_recover(&share_mem_path, payload)
- .await?;
+ self.finish_owner_recover(&share_mem_path, payload).await?;
Ok(true)
}
diff --git a/fluxon_rs/fluxon_kv/src/kv_test.rs b/fluxon_rs/fluxon_kv/src/kv_test.rs
index 5f0a9e2..910aac8 100644
--- a/fluxon_rs/fluxon_kv/src/kv_test.rs
+++ b/fluxon_rs/fluxon_kv/src/kv_test.rs
@@ -11,8 +11,9 @@
use crate::cluster_manager::ClusterManagerRdmaControlInit;
use crate::config::{
- ClientConfig, ContributeToClusterPoolSize, FluxonKvSpec, LargeFilePaths, MasterConfig, MonitoringConfig,
- ProtocolConfig, ProtocolType, TestSpecConfig, TestSpecTransportMode, TransferEngineType,
+ ClientConfig, ContributeToClusterPoolSize, FluxonKvSpec, LargeFilePaths, MasterConfig,
+ MonitoringConfig, ProtocolConfig, ProtocolType, TestSpecConfig, TestSpecTransportMode,
+ TransferEngineType,
};
use crate::run_master_with_test_overrides;
use crate::{ClientRunTestOverrides, MasterRunTestOverrides, run_client_with_test_overrides};
@@ -802,7 +803,6 @@ impl KvTestRoundOptions {
kv_test_run_scope()
)
}
-
}
#[derive(Clone, Debug)]
@@ -842,8 +842,7 @@ fn default_client_large_file_paths(
instance_key: &str,
contribute_to_cluster_pool_size: &ContributeToClusterPoolSize,
) -> LargeFilePaths {
- if contribute_to_cluster_pool_size.dram == 0
- && contribute_to_cluster_pool_size.vram.is_empty()
+ if contribute_to_cluster_pool_size.dram == 0 && contribute_to_cluster_pool_size.vram.is_empty()
{
return LargeFilePaths { paths: Vec::new() };
}
@@ -1381,7 +1380,10 @@ async fn key_meta_cache_check(
}
}
- tracing::info!("🔍 Starting PUT and GET in parallel: {}", parallel_unique_key);
+ tracing::info!(
+ "🔍 Starting PUT and GET in parallel: {}",
+ parallel_unique_key
+ );
for i in 0..10 {
let (put_client, other_client) = if i % 2 == 0 {
(client, client2)
@@ -1420,7 +1422,9 @@ async fn key_meta_cache_check(
}
assert!(
- put_client.client_kv_api().has_cached_key(parallel_unique_key),
+ put_client
+ .client_kv_api()
+ .has_cached_key(parallel_unique_key),
"put client should have immediate local cache metadata for key {} after put time {}",
parallel_unique_key,
i
diff --git a/fluxon_rs/fluxon_kv/src/lib.rs b/fluxon_rs/fluxon_kv/src/lib.rs
index edaa386..a7fd905 100644
--- a/fluxon_rs/fluxon_kv/src/lib.rs
+++ b/fluxon_rs/fluxon_kv/src/lib.rs
@@ -86,6 +86,10 @@ use external_client_api::{ExternalClientApi, ExternalClientApiNewArg};
use fluxon_commu::TransferBackendActivationMode;
use fluxon_framework::LogicalModule;
use fluxon_framework::{AnyResult, define_framework};
+use fluxon_mq::{
+ FLUXON_MQ_COMPONENT_BROKER_METADATA_VALUE, FLUXON_MQ_COMPONENT_METADATA_KEY,
+ register_broker_service,
+};
use master_kv_router::{MasterKvRouter, MasterKvRouterNewArg};
use master_seg_manager::MasterSegManager;
use metric_reporter::{
@@ -194,6 +198,11 @@ pub(crate) struct MasterRunTestOverrides {
pub transfer_backend_activation_mode: Option,
}
+#[derive(Clone, Debug)]
+pub(crate) struct BrokerRunTestOverrides {
+ pub rdma_control_init: ClusterManagerRdmaControlInit,
+}
+
/// Result of a unified `get` that carries the role-specific holder types.
#[derive(Clone)]
pub enum KvGetResult {
@@ -460,6 +469,12 @@ enum Commands {
#[arg(short = 'f', long = "config")]
config: Option,
},
+ /// Run as broker node
+ Broker {
+ /// Configuration file path
+ #[arg(short = 'f', long = "config")]
+ config: Option,
+ },
/// Run as client node
Client {
/// Configuration file path
@@ -1336,6 +1351,15 @@ pub async fn entry() -> Result<()> {
.await
.map_err(|e| anyhow::anyhow!("{}", e))?;
}
+ Commands::Broker { config } => {
+ let config_arg = config.map_or(ConfigArg::None, ConfigArg::File);
+ let (framework, _) = run_broker(config_arg).await?;
+ framework.wait_shutdown_signal().await;
+ framework
+ .shutdown()
+ .await
+ .map_err(|e| anyhow::anyhow!("{}", e))?;
+ }
Commands::Client { config } => {
let config_arg = config.map_or(ConfigArg::None, ConfigArg::File);
let (framework, _) = run_client(config_arg).await?;
@@ -1548,6 +1572,205 @@ pub async fn run_master(
run_master_impl(config_arg, None).await
}
+async fn run_broker_impl(
+ config_arg: ConfigArg,
+ test_overrides: Option,
+) -> Result<(Arc, ClientConfig)> {
+ #[cfg(unix)]
+ segfault_handler::install_sigsegv_classifier();
+
+ println!("Starting cache backend in BROKER mode");
+
+ let build_version = fluxon_util::git_version_build_record::get_current_git_commitid().unwrap();
+ let source_sha256 = fluxon_util::build_info::SOURCE_SHA256;
+ println!("Build version (git commit): {}", build_version);
+ println!("Build version (source-sha256): {}", source_sha256);
+
+ let config = load_client_config(config_arg)
+ .await
+ .map_err(|e| anyhow::anyhow!("Failed to load broker config: {}", e))?;
+
+ let dram = config.contribute_to_cluster_pool_size.dram;
+ let vram_is_zero = config
+ .contribute_to_cluster_pool_size
+ .vram
+ .values()
+ .all(|&v| v == 0);
+ if dram != 0 || !vram_is_zero {
+ anyhow::bail!(
+ "broker config must be a zero-contribution external-client config; instance_key={}",
+ config.instance_key
+ );
+ }
+ if matches!(
+ config.test_spec_config.side_transfer_role,
+ Some(SideTransferRole::Worker)
+ ) {
+ anyhow::bail!(
+ "broker config must not set test_spec_config.side_transfer_role=worker; instance_key={}",
+ config.instance_key
+ );
+ }
+
+ unsafe {
+ std::env::set_var(
+ "FLUXON_ENABLE_ICEORYX_LOGS",
+ if config.test_spec_config.enable_iceoryx_logs {
+ "1"
+ } else {
+ "0"
+ },
+ );
+ }
+
+ let config = bootstrap_zero_contribution_client_config(config).await?;
+
+ let kv_logs_dir = config
+ .large_file_paths
+ .kv_logs_dir(&config.cluster_name)
+ .map_err(|e| anyhow::anyhow!("invalid large_file_paths for broker kv logs: {}", e))?;
+ let observability_disabled = config.test_spec_config.disable_observability;
+ let greptime_tracing_rx = if observability_disabled {
+ fluxon_util::init_log(&kv_logs_dir, &config.instance_key);
+ None
+ } else {
+ let (greptime_tracing_layer, greptime_tracing_rx) =
+ fluxon_observability::greptime_otlp_tracing::new_tracing_layer(
+ crate::config::DEFAULT_OTLP_LOG_MAX_QUEUE_LINES,
+ );
+ fluxon_util::init_log_with_extra_layer(
+ &kv_logs_dir,
+ &config.instance_key,
+ greptime_tracing_layer,
+ );
+ Some(greptime_tracing_rx)
+ };
+ info!("Broker config: {:?}", config);
+ info!("Build version (git commit): {}", build_version);
+ info!("Build version (source-sha256): {}", source_sha256);
+
+ let mut metadata = HashMap::from([
+ ("external_client".to_string(), "true".to_string()),
+ (
+ FLUXON_MQ_COMPONENT_METADATA_KEY.to_string(),
+ FLUXON_MQ_COMPONENT_BROKER_METADATA_VALUE.to_string(),
+ ),
+ ("version".to_string(), build_version.clone()),
+ ]);
+ merge_startup_member_metadata(&mut metadata, HashMap::new())?;
+
+ let rdma_control_init = test_overrides
+ .as_ref()
+ .map(|overrides| overrides.rdma_control_init.clone())
+ .or_else(|| test_spec_config_rdma_control_init(Some(&config.test_spec_config)))
+ .unwrap_or_else(|| cluster_manager_rdma_control_init_from_config(&config));
+
+ let init_args = InitArgsBroker {
+ cluster_manager_arg: ClusterManagerNewArg {
+ etcd_endpoints: config.fluxonkv_spec.etcd_addresses.clone(),
+ cluster_name: config.cluster_name.clone(),
+ instance_name: Some(config.instance_key.clone()),
+ port: None,
+ metadata,
+ local_ipc_root: cluster_manager_local_ipc_root(
+ &config.share_mem_path,
+ &config.test_spec_config,
+ ),
+ rdma_control_init,
+ sub_cluster: config.fluxonkv_spec.sub_cluster.clone(),
+ network: None,
+ },
+ p2p_arg: P2pModuleNewArg::new(
+ config.fluxonkv_spec.p2p_listen_port,
+ tcp_thread_transport_tuning_from_test_spec_config(&config.test_spec_config),
+ config.test_spec_config.disable_crossowner_ipc,
+ config.test_spec_config.iceoryx_external_busy_poll,
+ )
+ .with_iceoryx_owner_client_busy_poll(config.test_spec_config.iceoryx_owner_client_busy_poll)
+ .with_user_rpc_sync_handler_thread_count(
+ config.test_spec_config.user_rpc_sync_handler_thread_count,
+ ),
+ metric_reporter_arg: MetricReporterNewArg {
+ test_spec_config: config.test_spec_config.clone(),
+ },
+ external_client_api_arg: ExternalClientApiNewArg {
+ share_mem_path: config.share_mem_path.clone(),
+ large_file_paths: config.large_file_paths.clone(),
+ expected_cluster_name: config.cluster_name.clone(),
+ expected_protocol_version: build_version.clone(),
+ enable_side_transfer: config.test_spec_config.enable_side_transfer,
+ short_circuit_put_payload_path: config.test_spec_config.short_circuit_put_payload_path,
+ },
+ };
+
+ let framework = Framework::new(format!(
+ "fluxon_kv.broker:{}:{}",
+ config.cluster_name, config.instance_key
+ ));
+ info!("Initializing broker framework...");
+
+ init_framework_broker(&framework, init_args)
+ .await
+ .map_err(|e| anyhow::anyhow!("Failed to initialize broker framework: {:#}", e))?;
+ register_broker_service(framework.p2p_view().clone(), 4096);
+
+ let framework = Arc::new(framework);
+
+ if !observability_disabled {
+ let otlp_cluster_name = config.cluster_name.clone();
+ let otlp_member_id = config.instance_key.clone();
+ let cm_view = framework.cluster_manager_view().clone();
+ let p2p_view = framework.p2p_view().clone();
+ let spawner = cm_view.clone();
+ let _ = spawner.spawn("wait_master_otlp_log_api_broker", async move {
+ let outcome = wait_master_observe_broadcast(
+ &cm_view,
+ std::time::Duration::from_secs(60),
+ std::time::Duration::from_secs(10),
+ )
+ .await;
+ let Some(cfg) = outcome.otlp_log_api() else {
+ warn!(
+ "Broker OTLP log exporter disabled: master metadata does not carry otlp_log_api"
+ );
+ return;
+ };
+
+ start_greptime_otlp_tracing_exporter_kv(
+ cm_view,
+ p2p_view,
+ Some(cfg),
+ greptime_tracing_rx,
+ &otlp_cluster_name,
+ fluxon_observability::types::FluxonMemberRole::Broker,
+ &otlp_member_id,
+ );
+ });
+ }
+
+ let shutdown_waiter = framework.cluster_manager_view().register_shutdown_waiter();
+ let kv_profiles_dir = config
+ .large_file_paths
+ .kv_profiles_dir(&config.cluster_name)
+ .map_err(|e| anyhow::anyhow!("invalid large_file_paths for broker kv profiles: {}", e))?;
+ profile::spawn_pprof_flamegraph_on_timeout_or_shutdown(
+ config.pprof_duration_seconds,
+ kv_profiles_dir,
+ config.cluster_name.clone(),
+ profile::PprofRole::Broker,
+ config.instance_key.clone(),
+ shutdown_waiter,
+ );
+
+ Ok((framework, config))
+}
+
+pub async fn run_broker(
+ config_arg: ConfigArg,
+) -> Result<(Arc, ClientConfig)> {
+ run_broker_impl(config_arg, None).await
+}
+
#[cfg(feature = "test_bins")]
pub(crate) async fn run_master_with_test_overrides(
config_arg: ConfigArg,
@@ -2736,8 +2959,8 @@ mod tests {
large_file_paths: crate::config::LargeFilePaths {
paths: vec![owner_large_root.to_string_lossy().into_owned()],
},
- protocol_version:
- fluxon_util::git_version_build_record::get_current_git_commitid().unwrap(),
+ protocol_version: fluxon_util::git_version_build_record::get_current_git_commitid()
+ .unwrap(),
write_ts: Some(chrono::Utc::now().timestamp_micros()),
};
let shared_meta_json = serde_json::to_string(&shared_meta).unwrap();
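Note on the `run_broker_impl` hunk above: the broker rejects any config that contributes DRAM or VRAM to the cluster pool before it bootstraps as a zero-contribution external client. The following is a minimal, standalone sketch of that guard, not the code from this change. `PoolContribution`, `ensure_zero_contribution`, the `String` key type of the VRAM map, and the `String` error type are all hypothetical stand-ins; the real function operates on `ClientConfig` and returns an `anyhow` error.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `contribute_to_cluster_pool_size`; the real
/// `ContributeToClusterPoolSize` type and its VRAM key type are not shown in this diff.
#[derive(Debug, Default)]
pub struct PoolContribution {
    /// Bytes of DRAM contributed to the cluster pool.
    pub dram: u64,
    /// Bytes of VRAM contributed per device (the key type is an assumption).
    pub vram: HashMap<String, u64>,
}

/// Returns `Ok(())` only when the node contributes no DRAM and no VRAM,
/// mirroring the guard near the top of `run_broker_impl`.
pub fn ensure_zero_contribution(
    pool: &PoolContribution,
    instance_key: &str,
) -> Result<(), String> {
    // An empty map and a map whose entries are all zero both count as zero VRAM.
    let vram_is_zero = pool.vram.values().all(|&v| v == 0);
    if pool.dram != 0 || !vram_is_zero {
        return Err(format!(
            "broker config must be a zero-contribution external-client config; instance_key={instance_key}"
        ));
    }
    Ok(())
}
```

Checking every VRAM entry with `all` accepts a config that lists devices with an explicit size of zero, whereas an `is_empty()` check alone would reject it.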
diff --git a/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs b/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs
index 5c20cc1..5d344c9 100755
--- a/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs
+++ b/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs
@@ -22,7 +22,8 @@ async fn test1_lease_expire_removes_keys() {
unsafe {
std::env::set_var("FLUXON_LOG", "debug");
}
- let (master_fw, client_fw) = start_master_and_client("lease_master_t1", "lease_client_t1").await;
+ let (master_fw, client_fw) =
+ start_master_and_client("lease_master_t1", "lease_client_t1").await;
let client_view = client_fw.client_kv_api_view();
wait_master_ready(&client_view).await;
@@ -82,7 +83,8 @@ async fn test2_rebind_to_new_lease_preserves_until_new_expire() {
unsafe {
std::env::set_var("FLUXON_LOG", "debug");
}
- let (master_fw, client_fw) = start_master_and_client("lease_master_t2", "lease_client_t2").await;
+ let (master_fw, client_fw) =
+ start_master_and_client("lease_master_t2", "lease_client_t2").await;
let client_view = client_fw.client_kv_api_view();
wait_master_ready(&client_view).await;
@@ -161,7 +163,8 @@ async fn test3_keepalive() {
unsafe {
std::env::set_var("FLUXON_LOG", "debug");
}
- let (master_fw, client_fw) = start_master_and_client("lease_master_t3", "lease_client_t3").await;
+ let (master_fw, client_fw) =
+ start_master_and_client("lease_master_t3", "lease_client_t3").await;
let client_view = client_fw.client_kv_api_view();
wait_master_ready(&client_view).await;
@@ -236,7 +239,8 @@ async fn test4_delete_under_lease_then_get_fails() {
unsafe {
std::env::set_var("FLUXON_LOG", "debug");
}
- let (master_fw, client_fw) = start_master_and_client("lease_master_t4", "lease_client_t4").await;
+ let (master_fw, client_fw) =
+ start_master_and_client("lease_master_t4", "lease_client_t4").await;
let client_view = client_fw.client_kv_api_view();
wait_master_ready(&client_view).await;
diff --git a/fluxon_rs/fluxon_kv/src/memholder/lifetime.rs b/fluxon_rs/fluxon_kv/src/memholder/lifetime.rs
index ad23b4d..1301a98 100755
--- a/fluxon_rs/fluxon_kv/src/memholder/lifetime.rs
+++ b/fluxon_rs/fluxon_kv/src/memholder/lifetime.rs
@@ -448,8 +448,8 @@ impl MemholderManagerTrait for MasterOwnerMemMgr {
const DELETE_SUBMIT_QUEUE_CAPACITY: usize = 1000;
const DELETE_TARGET_QUEUE_CAPACITY: usize = 1000;
- const DELETE_MERGE_WINDOW_MILLIS: u64 = 1000;
- const DELETE_RETRY_INTERVAL_MILLIS: u64 = 1000;
+ const DELETE_MERGE_WINDOW_MILLIS: u64 = 10;
+ const DELETE_RETRY_INTERVAL_MILLIS: u64 = 200;
#[inline]
fn inner_map(&self) -> &DashMap {
@@ -737,8 +737,8 @@ impl MemholderManagerTrait for OwnerExternalMemMgr {
const DELETE_SUBMIT_QUEUE_CAPACITY: usize = 1000;
const DELETE_TARGET_QUEUE_CAPACITY: usize = 1000;
- const DELETE_MERGE_WINDOW_MILLIS: u64 = 1000;
- const DELETE_RETRY_INTERVAL_MILLIS: u64 = 1000;
+ const DELETE_MERGE_WINDOW_MILLIS: u64 = 10;
+ const DELETE_RETRY_INTERVAL_MILLIS: u64 = 200;
#[inline]
fn inner_map(&self) -> &DashMap {
diff --git a/fluxon_rs/fluxon_kv/src/profile.rs b/fluxon_rs/fluxon_kv/src/profile.rs
index c2f40d7..2d04374 100755
--- a/fluxon_rs/fluxon_kv/src/profile.rs
+++ b/fluxon_rs/fluxon_kv/src/profile.rs
@@ -7,6 +7,7 @@ use tracing::{info, warn};
#[derive(Debug, Clone, Copy)]
pub(crate) enum PprofRole {
Master,
+ Broker,
Client,
}
@@ -14,6 +15,7 @@ impl PprofRole {
fn as_str(self) -> &'static str {
match self {
PprofRole::Master => "master",
+ PprofRole::Broker => "broker",
PprofRole::Client => "client",
}
}
diff --git a/fluxon_rs/fluxon_mq/Cargo.toml b/fluxon_rs/fluxon_mq/Cargo.toml
index 4f10f44..15f6329 100644
--- a/fluxon_rs/fluxon_mq/Cargo.toml
+++ b/fluxon_rs/fluxon_mq/Cargo.toml
@@ -17,6 +17,7 @@ parking_lot = { workspace = true }
paste = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
+bitcode = { workspace = true }
etcd-client = { workspace = true }
fluxon_util = { path = "../fluxon_util" }
fluxon_observability = { path = "../fluxon_observability" }
diff --git a/fluxon_rs/fluxon_mq/src/broker.rs b/fluxon_rs/fluxon_mq/src/broker.rs
new file mode 100644
index 0000000..69827eb
--- /dev/null
+++ b/fluxon_rs/fluxon_mq/src/broker.rs
@@ -0,0 +1,2878 @@
+use std::collections::{HashMap, HashSet, VecDeque};
+use std::env;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::{Arc, OnceLock};
+use std::time::{Duration, SystemTime, UNIX_EPOCH};
+
+use bitcode::{Decode, Encode};
+use fluxon_commu::cluster_manager::ClusterManagerView;
+use fluxon_commu::p2p::rpc::{MsgPack, MsgPackSerializePart, RPCCaller, RPCHandler, RPCReq};
+use fluxon_commu::p2p::P2pModuleView;
+use serde::{Deserialize, Serialize};
+use thiserror::Error;
+use tokio::sync::{mpsc, oneshot, Mutex};
+
+use crate::keys::{self, MqCategory};
+use crate::manager::PRODUCE_OFFSET_BEGIN;
+
+const BROKER_RPC_REQ_MSG_ID: u32 = 8101;
+const BROKER_RPC_RESP_MSG_ID: u32 = 8102;
+pub const FLUXON_MQ_COMPONENT_METADATA_KEY: &str = "fluxon_mq_component";
+pub const FLUXON_MQ_COMPONENT_BROKER_METADATA_VALUE: &str = "broker";
+const BROKER_PAYLOAD_BYTES_CAP_ENV: &str = "FLUXON_MQ_BROKER_PAYLOAD_BYTES_CAP";
+const BROKER_PAYLOAD_BYTES_CAP_PERCENT_ENV: &str = "FLUXON_MQ_BROKER_PAYLOAD_BYTES_CAP_PERCENT";
+const BROKER_CLEANUP_RELEASE_DELAY_MS_ENV: &str = "FLUXON_MQ_BROKER_CLEANUP_RELEASE_DELAY_MS";
+const OWNER_POOL_DRAM_BYTES_ENV: &str = "FLUXON_OWNER_POOL_DRAM_BYTES";
+const DEFAULT_BROKER_PAYLOAD_BYTES_CAP: u64 = 64 * 1024 * 1024 * 1024;
+const DEFAULT_BROKER_PAYLOAD_BYTES_CAP_PERCENT: u64 = 60;
+const DEFAULT_BROKER_CLEANUP_RELEASE_DELAY_MS: u64 = 0;
+const BROKER_DISCOVERY_TIMEOUT: Duration = Duration::from_secs(15);
+const BROKER_RPC_RESPONSE_CACHE_LIMIT: usize = 65536;
+
+static BROKER_RPC_REQUEST_SEQ: AtomicU64 = AtomicU64::new(1);
+static BROKER_RPC_REQUEST_PREFIX: OnceLock = OnceLock::new();
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerChannelConfig {
+ pub channel_id: i64,
+ pub capacity: i64,
+}
+
+#[derive(Debug, Clone, Default, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerReserveRequest {
+ pub channel_id: i64,
+ pub producer_id: String,
+ pub category: MqCategory,
+ pub payload_bytes: u64,
+ pub now_ms: i64,
+}
+
+#[derive(Debug, Clone, Default, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerFetchRequest {
+ pub channel_id: i64,
+ pub consumer_id: String,
+ pub now_ms: i64,
+}
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerEnvelope {
+ pub channel_id: i64,
+ pub producer_id: String,
+ pub msg_id: i64,
+ pub reservation_id: u64,
+ pub payload_key: String,
+ pub payload_bytes: u64,
+ pub reserved_at_ms: i64,
+ pub published_at_ms: Option,
+}
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerReservation {
+ pub envelope: BrokerEnvelope,
+}
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerFetchedMessage {
+ pub envelope: BrokerEnvelope,
+}
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerFetchBatch {
+ pub messages: Vec,
+}
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerCommitOutcome {
+ pub first_commit: bool,
+ pub cleanup: Option,
+}
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize, Encode, Decode)]
+pub struct BrokerCommitBatchOutcome {
+ pub first_commit_count: usize,
+ pub cleanup: Vec,
+}
+
+#[derive(Debug, Error, PartialEq, Eq, Clone, Serialize, Deserialize, Encode, Decode)]
+pub enum BrokerError {
+ #[error("broker channel not found: channel_id={0}")]
+ ChannelNotFound(i64),
+
+ #[error(
+ "broker channel capacity must be positive: channel_id={channel_id} capacity={capacity}"
+ )]
+ InvalidCapacity { channel_id: i64, capacity: i64 },
+
+ #[error(
+ "broker channel is full: channel_id={channel_id} capacity={capacity} used_slots={used_slots}"
+ )]
+ ChannelFull {
+ channel_id: i64,
+ capacity: i64,
+ used_slots: i64,
+ },
+
+ #[error(
+ "broker payload byte budget is full: requested_bytes={requested_bytes} capacity_bytes={capacity_bytes} used_bytes={used_bytes}"
+ )]
+ PayloadBytesFull {
+ requested_bytes: u64,
+ capacity_bytes: u64,
+ used_bytes: u64,
+ },
+
+ #[error(
+ "broker payload is larger than byte budget: requested_bytes={requested_bytes} capacity_bytes={capacity_bytes}"
+ )]
+ PayloadTooLarge {
+ requested_bytes: u64,
+ capacity_bytes: u64,
+ },
+
+ #[error(
+ "broker reservation not found: channel_id={channel_id} reservation_id={reservation_id}"
+ )]
+ ReservationNotFound {
+ channel_id: i64,
+ reservation_id: u64,
+ },
+
+ #[error(
+ "broker delivery not in-flight: channel_id={channel_id} reservation_id={reservation_id}"
+ )]
+ DeliveryNotFound {
+ channel_id: i64,
+ reservation_id: u64,
+ },
+
+ #[error("invalid broker state transition: {0}")]
+ InvalidRecord(String),
+
+ #[error("broker master unavailable: {0}")]
+ BrokerUnavailable(String),
+
+ #[error("broker rpc error: {0}")]
+ Rpc(String),
+
+ #[error("broker actor closed")]
+ ActorClosed,
+}
+
+#[derive(Debug, Default)]
+pub struct LocalBroker {
+ state: BrokerState,
+}
+
+#[derive(Debug)]
+struct BrokerState {
+ channels: HashMap<i64, ChannelState>,
+ payload_byte_capacity: u64,
+ used_payload_bytes: u64,
+}
+
+impl Default for BrokerState {
+ fn default() -> Self {
+ Self {
+ channels: HashMap::new(),
+ payload_byte_capacity: default_payload_byte_capacity(),
+ used_payload_bytes: 0,
+ }
+ }
+}
+
+#[derive(Debug)]
+struct ChannelState {
+ config: BrokerChannelConfig,
+ next_reservation_id: u64,
+ next_msg_by_producer: HashMap,
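+ pending: HashMap<u64, BrokerEnvelope>,
+ visible: VecDeque<BrokerEnvelope>,
+ inflight: HashMap<u64, BrokerEnvelope>,
+ inflight_order: VecDeque<u64>,
+ cleanup: VecDeque<BrokerEnvelope>,
+ cleanup_inflight: HashMap<u64, BrokerEnvelope>,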
+ used_slots: i64,
+ reserve_waiters: VecDeque<ReserveWaiter>,
+ fetch_waiters: VecDeque<FetchWaiter>,
+}
+
+impl ChannelState {
+ fn new(config: BrokerChannelConfig) -> Self {
+ Self {
+ config,
+ next_reservation_id: 1,
+ next_msg_by_producer: HashMap::new(),
+ pending: HashMap::new(),
+ visible: VecDeque::new(),
+ inflight: HashMap::new(),
+ inflight_order: VecDeque::new(),
+ cleanup: VecDeque::new(),
+ cleanup_inflight: HashMap::new(),
+ used_slots: 0,
+ reserve_waiters: VecDeque::new(),
+ fetch_waiters: VecDeque::new(),
+ }
+ }
+}
+
+#[derive(Debug)]
+struct ReserveWaiter {
+ req: BrokerReserveRequest,
+ reply: oneshot::Sender<Result<BrokerReservation, BrokerError>>,
+}
+
+#[derive(Debug)]
+struct FetchWaiter {
+ req: BrokerFetchRequest,
+ reply: oneshot::Sender<Result<Option<BrokerFetchedMessage>, BrokerError>>,
+}
+
+impl LocalBroker {
+ pub fn new() -> Self {
+ Self::default()
+ }
+
+ #[cfg(test)]
+ fn with_payload_byte_capacity(payload_byte_capacity: u64) -> Self {
+ Self {
+ state: BrokerState {
+ channels: HashMap::new(),
+ payload_byte_capacity: payload_byte_capacity.max(1),
+ used_payload_bytes: 0,
+ },
+ }
+ }
+
+ pub fn upsert_channel(&mut self, config: BrokerChannelConfig) -> Result<(), BrokerError> {
+ validate_capacity(&config)?;
+ match self.state.channels.get_mut(&config.channel_id) {
+ Some(channel) => {
+ if config.capacity < channel.used_slots {
+ return Err(BrokerError::InvalidRecord(format!(
+ "channel_id={} capacity={} below used_slots={}",
+ config.channel_id, config.capacity, channel.used_slots
+ )));
+ }
+ channel.config = config;
+ }
+ None => {
+ self.state
+ .channels
+ .insert(config.channel_id, ChannelState::new(config));
+ }
+ }
+ Ok(())
+ }
+
+ pub fn delete_channel(&mut self, channel_id: i64) -> Result, BrokerError> {
+ let payload_keys = self.delete_channel_state(channel_id);
+ Ok(payload_keys)
+ }
+
+ pub fn reserve(&mut self, req: BrokerReserveRequest) -> Result<BrokerReservation, BrokerError> {
+ let channel = self.channel(req.channel_id)?;
+ if broker_category_enforces_capacity(req.category)
+ && channel.used_slots >= channel.config.capacity
+ {
+ return Err(BrokerError::ChannelFull {
+ channel_id: req.channel_id,
+ capacity: channel.config.capacity,
+ used_slots: channel.used_slots,
+ });
+ }
+
+ let msg_id = channel
+ .next_msg_by_producer
+ .get(&req.producer_id)
+ .copied()
+ .unwrap_or(PRODUCE_OFFSET_BEGIN + 1);
+ let reservation_id = channel.next_reservation_id;
+ let payload_key = keys::backend_message_key_with_category(
+ req.channel_id,
+ &req.producer_id,
+ msg_id,
+ &req.category,
+ );
+ let payload_bytes = req.payload_bytes.max(1);
+ if payload_bytes > self.state.payload_byte_capacity {
+ return Err(BrokerError::PayloadTooLarge {
+ requested_bytes: payload_bytes,
+ capacity_bytes: self.state.payload_byte_capacity,
+ });
+ }
+ if self.state.used_payload_bytes.saturating_add(payload_bytes)
+ > self.state.payload_byte_capacity
+ {
+ return Err(BrokerError::PayloadBytesFull {
+ requested_bytes: payload_bytes,
+ capacity_bytes: self.state.payload_byte_capacity,
+ used_bytes: self.state.used_payload_bytes,
+ });
+ }
+
+ let envelope = BrokerEnvelope {
+ channel_id: req.channel_id,
+ producer_id: req.producer_id,
+ msg_id,
+ reservation_id,
+ payload_key,
+ payload_bytes,
+ reserved_at_ms: req.now_ms,
+ published_at_ms: None,
+ };
+ let channel = self.channel_mut(req.channel_id)?;
+ channel.next_reservation_id = reservation_id + 1;
+ let next_msg = channel
+ .next_msg_by_producer
+ .entry(envelope.producer_id.clone())
+ .or_insert(PRODUCE_OFFSET_BEGIN + 1);
+ *next_msg = (*next_msg).max(msg_id + 1);
+ channel.pending.insert(reservation_id, envelope.clone());
+ channel.used_slots += 1;
+ self.state.used_payload_bytes += payload_bytes;
+ Ok(BrokerReservation { envelope })
+ }
+
+ pub fn publish(
+ &mut self,
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ ) -> Result<BrokerEnvelope, BrokerError> {
+ let channel = self.channel_mut(channel_id)?;
+ let mut envelope =
+ channel
+ .pending
+ .remove(&reservation_id)
+ .ok_or(BrokerError::ReservationNotFound {
+ channel_id,
+ reservation_id,
+ })?;
+ envelope.published_at_ms = Some(now_ms);
+ channel.visible.push_back(envelope.clone());
+ Ok(envelope)
+ }
+
+ pub fn abort(&mut self, channel_id: i64, reservation_id: u64) -> Result<(), BrokerError> {
+ let channel = self.channel_mut(channel_id)?;
+ let envelope =
+ channel
+ .pending
+ .remove(&reservation_id)
+ .ok_or(BrokerError::ReservationNotFound {
+ channel_id,
+ reservation_id,
+ })?;
+ channel.used_slots -= 1;
+ self.release_payload_bytes(envelope.payload_bytes);
+ Ok(())
+ }
+
+ pub fn fetch_next(
+ &mut self,
+ req: BrokerFetchRequest,
+ ) -> Result<Option<BrokerFetchedMessage>, BrokerError> {
+ let channel = self.channel_mut(req.channel_id)?;
+ let Some(envelope) = channel.visible.pop_front() else {
+ return Ok(None);
+ };
+ channel
+ .inflight
+ .insert(envelope.reservation_id, envelope.clone());
+ channel.inflight_order.push_back(envelope.reservation_id);
+ Ok(Some(BrokerFetchedMessage { envelope }))
+ }
+
+ pub fn fetch_batch_available(
+ &mut self,
+ req: BrokerFetchRequest,
+ max_items: usize,
+ ) -> Result<BrokerFetchBatch, BrokerError> {
+ let mut messages = Vec::new();
+ for _ in 0..max_items {
+ let Some(message) = self.fetch_next(req.clone())? else {
+ break;
+ };
+ messages.push(message);
+ }
+ Ok(BrokerFetchBatch { messages })
+ }
+
+ pub fn commit(
+ &mut self,
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ ) -> Result<BrokerCommitOutcome, BrokerError> {
+ let _ = now_ms;
+ let channel = self.channel_mut(channel_id)?;
+ if cleanup_contains(channel, reservation_id) {
+ return Ok(BrokerCommitOutcome {
+ first_commit: false,
+ cleanup: None,
+ });
+ }
+ let envelope =
+ channel
+ .inflight
+ .remove(&reservation_id)
+ .ok_or(BrokerError::DeliveryNotFound {
+ channel_id,
+ reservation_id,
+ })?;
+ remove_from_deque(&mut channel.inflight_order, reservation_id);
+ channel.cleanup.push_back(envelope.clone());
+ channel.used_slots -= 1;
+ Ok(BrokerCommitOutcome {
+ first_commit: true,
+ cleanup: Some(envelope),
+ })
+ }
+
+ pub fn commit_batch(
+ &mut self,
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ now_ms: i64,
+ ) -> Result<BrokerCommitBatchOutcome, BrokerError> {
+ let mut cleanup = Vec::new();
+ let mut first_commit_count = 0usize;
+ for reservation_id in reservation_ids {
+ let outcome = self.commit(channel_id, reservation_id, now_ms)?;
+ if outcome.first_commit {
+ first_commit_count += 1;
+ if let Some(envelope) = outcome.cleanup {
+ cleanup.push(envelope);
+ }
+ }
+ }
+ Ok(BrokerCommitBatchOutcome {
+ first_commit_count,
+ cleanup,
+ })
+ }
+
+ pub fn requeue_inflight(
+ &mut self,
+ channel_id: i64,
+ reservation_id: u64,
+ ) -> Result<(), BrokerError> {
+ let channel = self.channel_mut(channel_id)?;
+ let envelope =
+ channel
+ .inflight
+ .remove(&reservation_id)
+ .ok_or(BrokerError::DeliveryNotFound {
+ channel_id,
+ reservation_id,
+ })?;
+ remove_from_deque(&mut channel.inflight_order, reservation_id);
+ channel.visible.push_front(envelope);
+ Ok(())
+ }
+
+ pub fn requeue_inflight_batch(
+ &mut self,
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ ) -> Result<(), BrokerError> {
+ let channel = self.channel(channel_id)?;
+ let mut seen = HashSet::new();
+ for reservation_id in &reservation_ids {
+ if !seen.insert(*reservation_id) {
+ return Err(BrokerError::InvalidRecord(format!(
+ "duplicate requeue reservation_id={} for channel_id={}",
+ reservation_id, channel_id
+ )));
+ }
+ if !channel.inflight.contains_key(reservation_id) {
+ return Err(BrokerError::DeliveryNotFound {
+ channel_id,
+ reservation_id: *reservation_id,
+ });
+ }
+ }
+
+ for reservation_id in reservation_ids.into_iter().rev() {
+ self.requeue_inflight(channel_id, reservation_id)?;
+ }
+ Ok(())
+ }
+
+ pub fn requeue_all_inflight(&mut self, channel_id: i64) -> Result<(), BrokerError> {
+ let reservation_ids: Vec<u64> = self
+ .channel(channel_id)?
+ .inflight_order
+ .iter()
+ .rev()
+ .copied()
+ .collect();
+ for reservation_id in reservation_ids {
+ self.requeue_inflight(channel_id, reservation_id)?;
+ }
+ Ok(())
+ }
+
+ pub fn take_cleanup_batch(
+ &mut self,
+ channel_id: i64,
+ max_items: usize,
+ ) -> Result<Vec<BrokerEnvelope>, BrokerError> {
+ let channel = self.channel_mut(channel_id)?;
+ let mut batch = Vec::new();
+ for _ in 0..max_items {
+ let Some(envelope) = channel.cleanup.pop_front() else {
+ break;
+ };
+ channel
+ .cleanup_inflight
+ .insert(envelope.reservation_id, envelope.clone());
+ batch.push(envelope);
+ }
+ Ok(batch)
+ }
+
+ pub fn cleanup_ack(&mut self, channel_id: i64, reservation_id: u64) -> Result<(), BrokerError> {
+ let _ = self.apply_cleanup_ack(channel_id, reservation_id, true)?;
+ Ok(())
+ }
+
+ pub fn cleanup_ack_for_delayed_release(
+ &mut self,
+ channel_id: i64,
+ reservation_id: u64,
+ ) -> Result<u64, BrokerError> {
+ self.apply_cleanup_ack(channel_id, reservation_id, false)
+ }
+
+ pub fn cleanup_nack(
+ &mut self,
+ channel_id: i64,
+ reservation_id: u64,
+ ) -> Result<(), BrokerError> {
+ let channel = self.channel_mut(channel_id)?;
+ if let Some(envelope) = channel.cleanup_inflight.remove(&reservation_id) {
+ channel.cleanup.push_front(envelope);
+ }
+ Ok(())
+ }
+
+ fn release_payload_bytes(&mut self, payload_bytes: u64) {
+ self.state.used_payload_bytes = self.state.used_payload_bytes.saturating_sub(payload_bytes);
+ }
+
+ fn delete_channel_state(&mut self, channel_id: i64) -> Vec {
+ let Some(mut channel) = self.state.channels.remove(&channel_id) else {
+ return Vec::new();
+ };
+
+ let mut payload_bytes = 0u64;
+ let mut payload_keys = Vec::new();
+ collect_deleted_payloads(
+ channel.pending.drain().map(|(_, envelope)| envelope),
+ &mut payload_keys,
+ &mut payload_bytes,
+ );
+ collect_deleted_payloads(
+ channel.visible.drain(..),
+ &mut payload_keys,
+ &mut payload_bytes,
+ );
+ collect_deleted_payloads(
+ channel.inflight.drain().map(|(_, envelope)| envelope),
+ &mut payload_keys,
+ &mut payload_bytes,
+ );
+ collect_deleted_payloads(
+ channel.cleanup.drain(..),
+ &mut payload_keys,
+ &mut payload_bytes,
+ );
+ collect_deleted_payloads(
+ channel
+ .cleanup_inflight
+ .drain()
+ .map(|(_, envelope)| envelope),
+ &mut payload_keys,
+ &mut payload_bytes,
+ );
+
+ while let Some(waiter) = channel.reserve_waiters.pop_front() {
+ let _ = waiter
+ .reply
+ .send(Err(BrokerError::ChannelNotFound(channel_id)));
+ }
+ while let Some(waiter) = channel.fetch_waiters.pop_front() {
+ let _ = waiter
+ .reply
+ .send(Err(BrokerError::ChannelNotFound(channel_id)));
+ }
+
+ self.release_payload_bytes(payload_bytes);
+ payload_keys
+ }
+
+ fn apply_cleanup_ack(
+ &mut self,
+ channel_id: i64,
+ reservation_id: u64,
+ release_payload_now: bool,
+ ) -> Result<u64, BrokerError> {
+ let channel = self.channel_mut(channel_id)?;
+ let envelope = if let Some(envelope) = channel.cleanup_inflight.remove(&reservation_id) {
+ envelope
+ } else if let Some(pos) = channel
+ .cleanup
+ .iter()
+ .position(|env| env.reservation_id == reservation_id)
+ {
+ channel
+ .cleanup
+ .remove(pos)
+ .expect("cleanup envelope position checked above")
+ } else {
+ return Err(BrokerError::ReservationNotFound {
+ channel_id,
+ reservation_id,
+ });
+ };
+ let payload_bytes = envelope.payload_bytes;
+ if release_payload_now {
+ self.release_payload_bytes(payload_bytes);
+ }
+ Ok(payload_bytes)
+ }
+
+ fn channel(&self, channel_id: i64) -> Result<&ChannelState, BrokerError> {
+ self.state
+ .channels
+ .get(&channel_id)
+ .ok_or(BrokerError::ChannelNotFound(channel_id))
+ }
+
+ fn channel_mut(&mut self, channel_id: i64) -> Result<&mut ChannelState, BrokerError> {
+ self.state
+ .channels
+ .get_mut(&channel_id)
+ .ok_or(BrokerError::ChannelNotFound(channel_id))
+ }
+}
+
+fn drain_reserve_waiters(broker: &mut LocalBroker) {
+ loop {
+ let channel_ids: Vec<i64> = broker.state.channels.keys().copied().collect();
+ let mut progressed = false;
+ for channel_id in channel_ids {
+ progressed |= drain_reserve_waiters_for_channel(broker, channel_id);
+ }
+ if !progressed {
+ return;
+ }
+ }
+}
+
+fn drain_reserve_waiters_for_channel(broker: &mut LocalBroker, channel_id: i64) -> bool {
+ let mut progressed = false;
+ loop {
+ let waiter = match broker.channel_mut(channel_id) {
+ Ok(channel) => channel.reserve_waiters.pop_front(),
+ Err(_) => return progressed,
+ };
+ let Some(waiter) = waiter else {
+ return progressed;
+ };
+
+ match broker.reserve(waiter.req.clone()) {
+ Ok(reservation) => {
+ if let Err(Ok(reservation)) = waiter.reply.send(Ok(reservation)) {
+ let _ = broker.abort(channel_id, reservation.envelope.reservation_id);
+ }
+ progressed = true;
+ }
+ Err(BrokerError::ChannelFull { .. }) | Err(BrokerError::PayloadBytesFull { .. }) => {
+ if let Ok(channel) = broker.channel_mut(channel_id) {
+ channel.reserve_waiters.push_front(waiter);
+ }
+ return progressed;
+ }
+ Err(err) => {
+ let _ = waiter.reply.send(Err(err));
+ progressed = true;
+ }
+ }
+ }
+}
+
+fn drain_fetch_waiters_for_channel(broker: &mut LocalBroker, channel_id: i64) {
+ loop {
+ let waiter = match broker.channel_mut(channel_id) {
+ Ok(channel) => channel.fetch_waiters.pop_front(),
+ Err(_) => return,
+ };
+ let Some(waiter) = waiter else {
+ return;
+ };
+
+ match broker.fetch_next(waiter.req.clone()) {
+ Ok(Some(fetched)) => {
+ if let Err(Ok(Some(fetched))) = waiter.reply.send(Ok(Some(fetched))) {
+ let _ = broker.requeue_inflight(
+ fetched.envelope.channel_id,
+ fetched.envelope.reservation_id,
+ );
+ }
+ }
+ Ok(None) => {
+ if let Ok(channel) = broker.channel_mut(channel_id) {
+ channel.fetch_waiters.push_front(waiter);
+ }
+ return;
+ }
+ Err(err) => {
+ let _ = waiter.reply.send(Err(err));
+ }
+ }
+ }
+}
+
+fn fail_all_waiters_with_actor_closed(broker: &mut LocalBroker) {
+ for channel in broker.state.channels.values_mut() {
+ while let Some(waiter) = channel.reserve_waiters.pop_front() {
+ let _ = waiter.reply.send(Err(BrokerError::ActorClosed));
+ }
+ while let Some(waiter) = channel.fetch_waiters.pop_front() {
+ let _ = waiter.reply.send(Err(BrokerError::ActorClosed));
+ }
+ }
+}
+
+fn collect_deleted_payloads(
+ envelopes: impl Iterator<Item = BrokerEnvelope>,
+ payload_keys: &mut Vec,
+ payload_bytes: &mut u64,
+) {
+ for envelope in envelopes {
+ *payload_bytes = payload_bytes.saturating_add(envelope.payload_bytes);
+ payload_keys.push(envelope.payload_key);
+ }
+}
+
+fn cleanup_contains(channel: &ChannelState, reservation_id: u64) -> bool {
+ channel.cleanup_inflight.contains_key(&reservation_id)
+ || channel
+ .cleanup
+ .iter()
+ .any(|env| env.reservation_id == reservation_id)
+}
+
+enum BrokerCommand {
+ UpsertChannel {
+ config: BrokerChannelConfig,
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+ DeleteChannel {
+ channel_id: i64,
+ reply: oneshot::Sender, BrokerError>>,
+ },
+ Reserve {
+ req: BrokerReserveRequest,
+ reply: oneshot::Sender<Result<BrokerReservation, BrokerError>>,
+ },
+ Publish {
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ reply: oneshot::Sender<Result<BrokerEnvelope, BrokerError>>,
+ },
+ Abort {
+ channel_id: i64,
+ reservation_id: u64,
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+ FetchNext {
+ req: BrokerFetchRequest,
+ reply: oneshot::Sender<Result<Option<BrokerFetchedMessage>, BrokerError>>,
+ },
+ FetchBatchAvailable {
+ req: BrokerFetchRequest,
+ max_items: usize,
+ reply: oneshot::Sender<Result<BrokerFetchBatch, BrokerError>>,
+ },
+ Commit {
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ reply: oneshot::Sender<Result<BrokerCommitOutcome, BrokerError>>,
+ },
+ CommitBatch {
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ now_ms: i64,
+ reply: oneshot::Sender<Result<BrokerCommitBatchOutcome, BrokerError>>,
+ },
+ RequeueInflight {
+ channel_id: i64,
+ reservation_id: u64,
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+ RequeueInflightBatch {
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+ RequeueAllInflight {
+ channel_id: i64,
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+ TakeCleanupBatch {
+ channel_id: i64,
+ max_items: usize,
+ reply: oneshot::Sender<Result<Vec<BrokerEnvelope>, BrokerError>>,
+ },
+ CleanupAck {
+ channel_id: i64,
+ reservation_id: u64,
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+ CleanupNack {
+ channel_id: i64,
+ reservation_id: u64,
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+ ReleasePayloadBytes {
+ payload_bytes: u64,
+ },
+ Shutdown {
+ reply: oneshot::Sender<Result<(), BrokerError>>,
+ },
+}
+
+#[derive(Clone, Debug)]
+struct LocalBrokerHandle {
+ tx: mpsc::Sender<BrokerCommand>,
+}
+
+impl LocalBrokerHandle {
+ fn spawn_actor(broker: LocalBroker, queue_capacity: usize) -> Self {
+ Self::spawn_actor_with_cleanup_release_delay(
+ broker,
+ queue_capacity,
+ default_cleanup_release_delay(),
+ )
+ }
+
+ fn spawn_actor_with_cleanup_release_delay(
+ broker: LocalBroker,
+ queue_capacity: usize,
+ cleanup_release_delay: Duration,
+ ) -> Self {
+ let (tx, mut rx) = mpsc::channel(queue_capacity.max(1));
+ let tx_for_actor = tx.clone();
+ tokio::spawn(async move {
+ let mut broker = broker;
+ while let Some(cmd) = rx.recv().await {
+ match cmd {
+ BrokerCommand::UpsertChannel { config, reply } => {
+ let channel_id = config.channel_id;
+ let result = broker.upsert_channel(config);
+ if result.is_ok() {
+ let _ = channel_id;
+ drain_reserve_waiters(&mut broker);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::DeleteChannel { channel_id, reply } => {
+ let result = broker.delete_channel(channel_id);
+ if result.is_ok() {
+ drain_reserve_waiters(&mut broker);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::Reserve { req, reply } => {
+ let req_clone = req.clone();
+ match broker.reserve(req_clone) {
+ Ok(reservation) => {
+ let _ = reply.send(Ok(reservation));
+ }
+ Err(err) => {
+ let _ = reply.send(Err(err));
+ }
+ }
+ }
+ BrokerCommand::Publish {
+ channel_id,
+ reservation_id,
+ now_ms,
+ reply,
+ } => {
+ let result = broker.publish(channel_id, reservation_id, now_ms);
+ if result.is_ok() {
+ drain_fetch_waiters_for_channel(&mut broker, channel_id);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::Abort {
+ channel_id,
+ reservation_id,
+ reply,
+ } => {
+ let result = broker.abort(channel_id, reservation_id);
+ if result.is_ok() {
+ drain_reserve_waiters(&mut broker);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::FetchNext { req, reply } => {
+ let req_clone = req.clone();
+ match broker.fetch_next(req_clone) {
+ Ok(Some(message)) => {
+ let _ = reply.send(Ok(Some(message)));
+ }
+ Ok(None) => match broker.channel_mut(req.channel_id) {
+ Ok(channel) => {
+ channel.fetch_waiters.push_back(FetchWaiter { req, reply })
+ }
+ Err(err) => {
+ let _ = reply.send(Err(err));
+ }
+ },
+ Err(err) => {
+ let _ = reply.send(Err(err));
+ }
+ }
+ }
+ BrokerCommand::FetchBatchAvailable {
+ req,
+ max_items,
+ reply,
+ } => {
+ let _ = reply.send(broker.fetch_batch_available(req, max_items));
+ }
+ BrokerCommand::Commit {
+ channel_id,
+ reservation_id,
+ now_ms,
+ reply,
+ } => {
+ let result = broker.commit(channel_id, reservation_id, now_ms);
+ if result.is_ok() {
+ drain_reserve_waiters(&mut broker);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::CommitBatch {
+ channel_id,
+ reservation_ids,
+ now_ms,
+ reply,
+ } => {
+ let result = broker.commit_batch(channel_id, reservation_ids, now_ms);
+ if result.is_ok() {
+ drain_reserve_waiters(&mut broker);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::RequeueInflight {
+ channel_id,
+ reservation_id,
+ reply,
+ } => {
+ let result = broker.requeue_inflight(channel_id, reservation_id);
+ if result.is_ok() {
+ drain_fetch_waiters_for_channel(&mut broker, channel_id);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::RequeueInflightBatch {
+ channel_id,
+ reservation_ids,
+ reply,
+ } => {
+ let result = broker.requeue_inflight_batch(channel_id, reservation_ids);
+ if result.is_ok() {
+ drain_fetch_waiters_for_channel(&mut broker, channel_id);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::RequeueAllInflight { channel_id, reply } => {
+ let result = broker.requeue_all_inflight(channel_id);
+ if result.is_ok() {
+ drain_fetch_waiters_for_channel(&mut broker, channel_id);
+ }
+ let _ = reply.send(result);
+ }
+ BrokerCommand::TakeCleanupBatch {
+ channel_id,
+ max_items,
+ reply,
+ } => {
+ let _ = reply.send(broker.take_cleanup_batch(channel_id, max_items));
+ }
+ BrokerCommand::CleanupAck {
+ channel_id,
+ reservation_id,
+ reply,
+ } => {
+ let result =
+ broker.cleanup_ack_for_delayed_release(channel_id, reservation_id);
+ match result {
+ Ok(payload_bytes) if cleanup_release_delay.is_zero() => {
+ broker.release_payload_bytes(payload_bytes);
+ drain_reserve_waiters(&mut broker);
+ let _ = reply.send(Ok(()));
+ }
+ Ok(payload_bytes) => {
+ let tx_release = tx_for_actor.clone();
+ tokio::spawn(async move {
+ tokio::time::sleep(cleanup_release_delay).await;
+ let _ = tx_release
+ .send(BrokerCommand::ReleasePayloadBytes { payload_bytes })
+ .await;
+ });
+ let _ = reply.send(Ok(()));
+ }
+ Err(err) => {
+ let _ = reply.send(Err(err));
+ }
+ }
+ }
+ BrokerCommand::ReleasePayloadBytes { payload_bytes } => {
+ broker.release_payload_bytes(payload_bytes);
+ if payload_bytes > 0 {
+ drain_reserve_waiters(&mut broker);
+ }
+ }
+ BrokerCommand::CleanupNack {
+ channel_id,
+ reservation_id,
+ reply,
+ } => {
+ let _ = reply.send(broker.cleanup_nack(channel_id, reservation_id));
+ }
+ BrokerCommand::Shutdown { reply } => {
+ fail_all_waiters_with_actor_closed(&mut broker);
+ let _ = reply.send(Ok(()));
+ break;
+ }
+ }
+ }
+ });
+ Self { tx }
+ }
+
+ async fn upsert_channel(&self, config: BrokerChannelConfig) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::UpsertChannel { config, reply })
+ .await
+ }
+
+ async fn delete_channel(&self, channel_id: i64) -> Result, BrokerError> {
+ self.request(|reply| BrokerCommand::DeleteChannel { channel_id, reply })
+ .await
+ }
+
+ async fn reserve(&self, req: BrokerReserveRequest) -> Result<BrokerReservation, BrokerError> {
+ self.request(|reply| BrokerCommand::Reserve { req, reply })
+ .await
+ }
+
+ async fn publish(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ ) -> Result<BrokerEnvelope, BrokerError> {
+ self.request(|reply| BrokerCommand::Publish {
+ channel_id,
+ reservation_id,
+ now_ms,
+ reply,
+ })
+ .await
+ }
+
+ async fn abort(&self, channel_id: i64, reservation_id: u64) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::Abort {
+ channel_id,
+ reservation_id,
+ reply,
+ })
+ .await
+ }
+
+ async fn fetch_next(
+ &self,
+ req: BrokerFetchRequest,
+ ) -> Result<Option<BrokerFetchedMessage>, BrokerError> {
+ self.request(|reply| BrokerCommand::FetchNext { req, reply })
+ .await
+ }
+
+ async fn fetch_batch_available(
+ &self,
+ req: BrokerFetchRequest,
+ max_items: usize,
+ ) -> Result<BrokerFetchBatch, BrokerError> {
+ self.request(|reply| BrokerCommand::FetchBatchAvailable {
+ req,
+ max_items,
+ reply,
+ })
+ .await
+ }
+
+ async fn commit(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ ) -> Result<BrokerCommitOutcome, BrokerError> {
+ self.request(|reply| BrokerCommand::Commit {
+ channel_id,
+ reservation_id,
+ now_ms,
+ reply,
+ })
+ .await
+ }
+
+ async fn commit_batch(
+ &self,
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ now_ms: i64,
+ ) -> Result<BrokerCommitBatchOutcome, BrokerError> {
+ self.request(|reply| BrokerCommand::CommitBatch {
+ channel_id,
+ reservation_ids,
+ now_ms,
+ reply,
+ })
+ .await
+ }
+
+ async fn requeue_inflight(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ ) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::RequeueInflight {
+ channel_id,
+ reservation_id,
+ reply,
+ })
+ .await
+ }
+
+ async fn requeue_inflight_batch(
+ &self,
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ ) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::RequeueInflightBatch {
+ channel_id,
+ reservation_ids,
+ reply,
+ })
+ .await
+ }
+
+ async fn requeue_all_inflight(&self, channel_id: i64) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::RequeueAllInflight { channel_id, reply })
+ .await
+ }
+
+ async fn take_cleanup_batch(
+ &self,
+ channel_id: i64,
+ max_items: usize,
+ ) -> Result<Vec<BrokerEnvelope>, BrokerError> {
+ self.request(|reply| BrokerCommand::TakeCleanupBatch {
+ channel_id,
+ max_items,
+ reply,
+ })
+ .await
+ }
+
+ async fn cleanup_ack(&self, channel_id: i64, reservation_id: u64) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::CleanupAck {
+ channel_id,
+ reservation_id,
+ reply,
+ })
+ .await
+ }
+
+ async fn cleanup_nack(&self, channel_id: i64, reservation_id: u64) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::CleanupNack {
+ channel_id,
+ reservation_id,
+ reply,
+ })
+ .await
+ }
+
+ async fn shutdown(&self) -> Result<(), BrokerError> {
+ self.request(|reply| BrokerCommand::Shutdown { reply })
+ .await
+ }
+
+ async fn request<T>(
+ &self,
+ make_cmd: impl FnOnce(oneshot::Sender<Result<T, BrokerError>>) -> BrokerCommand,
+ ) -> Result<T, BrokerError> {
+ let (reply_tx, reply_rx) = oneshot::channel();
+ self.tx
+ .send(make_cmd(reply_tx))
+ .await
+ .map_err(|_| BrokerError::ActorClosed)?;
+ reply_rx.await.map_err(|_| BrokerError::ActorClosed)?
+ }
+}
+
+#[derive(Debug, Clone, Default, Encode, Decode)]
+enum BrokerRpcOperation {
+ #[default]
+ Noop,
+ UpsertChannel {
+ config: BrokerChannelConfig,
+ },
+ DeleteChannel {
+ channel_id: i64,
+ },
+ Reserve {
+ req: BrokerReserveRequest,
+ },
+ Publish {
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ },
+ Abort {
+ channel_id: i64,
+ reservation_id: u64,
+ },
+ FetchNext {
+ req: BrokerFetchRequest,
+ },
+ FetchBatchAvailable {
+ req: BrokerFetchRequest,
+ max_items: usize,
+ },
+ Commit {
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ },
+ CommitBatch {
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ now_ms: i64,
+ },
+ RequeueInflight {
+ channel_id: i64,
+ reservation_id: u64,
+ },
+ RequeueInflightBatch {
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ },
+ RequeueAllInflight {
+ channel_id: i64,
+ },
+ TakeCleanupBatch {
+ channel_id: i64,
+ max_items: usize,
+ },
+ CleanupAck {
+ channel_id: i64,
+ reservation_id: u64,
+ },
+ CleanupNack {
+ channel_id: i64,
+ reservation_id: u64,
+ },
+}
+
+#[derive(Debug, Clone, Default, Encode, Decode)]
+struct BrokerRpcRequest {
+ request_id: String,
+ op: BrokerRpcOperation,
+}
+
+impl BrokerRpcRequest {
+ fn new(op: BrokerRpcOperation) -> Self {
+ Self {
+ request_id: String::new(),
+ op,
+ }
+ }
+}
+
+impl MsgPackSerializePart for BrokerRpcRequest {
+ fn msg_id(&self) -> u32 {
+ BROKER_RPC_REQ_MSG_ID
+ }
+}
+
+impl RPCReq for BrokerRpcRequest {
+ type Resp = BrokerRpcResponse;
+}
+
+#[derive(Debug, Clone, Encode, Decode)]
+enum BrokerRpcReply {
+ Unit(Result<(), BrokerError>),
+ PayloadKeys(Result, BrokerError>),
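+ Reservation(Result<BrokerReservation, BrokerError>),
+ Envelope(Result<BrokerEnvelope, BrokerError>),
+ Fetch(Result<Option<BrokerFetchedMessage>, BrokerError>),
+ FetchBatch(Result<BrokerFetchBatch, BrokerError>),
+ Commit(Result<BrokerCommitOutcome, BrokerError>),
+ CommitBatch(Result<BrokerCommitBatchOutcome, BrokerError>),
+ CleanupBatch(Result<Vec<BrokerEnvelope>, BrokerError>),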
+}
+
+impl Default for BrokerRpcReply {
+ fn default() -> Self {
+ Self::Unit(Ok(()))
+ }
+}
+
+#[derive(Debug, Clone, Default, Encode, Decode)]
+struct BrokerRpcResponse {
+ reply: BrokerRpcReply,
+}
+
+#[derive(Default)]
+struct BrokerRpcResponseCache {
+ completed: HashMap<String, BrokerRpcResponse>,
+ completed_order: VecDeque<String>,
+ in_flight: HashMap<String, Vec<oneshot::Sender<BrokerRpcResponse>>>,
+}
+
+impl MsgPackSerializePart for BrokerRpcResponse {
+ fn msg_id(&self) -> u32 {
+ BROKER_RPC_RESP_MSG_ID
+ }
+}
+
+async fn execute_rpc_request(
+ broker: &LocalBrokerHandle,
+ request: BrokerRpcRequest,
+ allow_wait: bool,
+) -> BrokerRpcResponse {
+ let reply = match request.op {
+ BrokerRpcOperation::Noop => BrokerRpcReply::Unit(Err(BrokerError::Rpc(
+ "broker noop request is invalid".to_string(),
+ ))),
+ BrokerRpcOperation::UpsertChannel { config } => {
+ BrokerRpcReply::Unit(broker.upsert_channel(config).await)
+ }
+ BrokerRpcOperation::DeleteChannel { channel_id } => {
+ BrokerRpcReply::PayloadKeys(broker.delete_channel(channel_id).await)
+ }
+ BrokerRpcOperation::Reserve { req } => {
+ BrokerRpcReply::Reservation(broker.reserve(req).await)
+ }
+ BrokerRpcOperation::Publish {
+ channel_id,
+ reservation_id,
+ now_ms,
+ } => BrokerRpcReply::Envelope(broker.publish(channel_id, reservation_id, now_ms).await),
+ BrokerRpcOperation::Abort {
+ channel_id,
+ reservation_id,
+ } => BrokerRpcReply::Unit(broker.abort(channel_id, reservation_id).await),
+ BrokerRpcOperation::FetchNext { req } if allow_wait => {
+ BrokerRpcReply::Fetch(broker.fetch_next(req).await)
+ }
+ BrokerRpcOperation::FetchNext { req } => BrokerRpcReply::Fetch(
+ broker
+ .fetch_batch_available(req, 1)
+ .await
+ .map(|batch| batch.messages.into_iter().next()),
+ ),
+ BrokerRpcOperation::FetchBatchAvailable { req, max_items } => {
+ BrokerRpcReply::FetchBatch(broker.fetch_batch_available(req, max_items).await)
+ }
+ BrokerRpcOperation::Commit {
+ channel_id,
+ reservation_id,
+ now_ms,
+ } => BrokerRpcReply::Commit(broker.commit(channel_id, reservation_id, now_ms).await),
+ BrokerRpcOperation::CommitBatch {
+ channel_id,
+ reservation_ids,
+ now_ms,
+ } => BrokerRpcReply::CommitBatch(
+ broker
+ .commit_batch(channel_id, reservation_ids, now_ms)
+ .await,
+ ),
+ BrokerRpcOperation::RequeueInflight {
+ channel_id,
+ reservation_id,
+ } => BrokerRpcReply::Unit(broker.requeue_inflight(channel_id, reservation_id).await),
+ BrokerRpcOperation::RequeueInflightBatch {
+ channel_id,
+ reservation_ids,
+ } => BrokerRpcReply::Unit(
+ broker
+ .requeue_inflight_batch(channel_id, reservation_ids)
+ .await,
+ ),
+ BrokerRpcOperation::RequeueAllInflight { channel_id } => {
+ BrokerRpcReply::Unit(broker.requeue_all_inflight(channel_id).await)
+ }
+ BrokerRpcOperation::TakeCleanupBatch {
+ channel_id,
+ max_items,
+ } => BrokerRpcReply::CleanupBatch(broker.take_cleanup_batch(channel_id, max_items).await),
+ BrokerRpcOperation::CleanupAck {
+ channel_id,
+ reservation_id,
+ } => BrokerRpcReply::Unit(broker.cleanup_ack(channel_id, reservation_id).await),
+ BrokerRpcOperation::CleanupNack {
+ channel_id,
+ reservation_id,
+ } => BrokerRpcReply::Unit(broker.cleanup_nack(channel_id, reservation_id).await),
+ };
+ BrokerRpcResponse { reply }
+}
+
+async fn execute_rpc_request_with_cache(
+ broker: &LocalBrokerHandle,
+ response_cache: &Arc<Mutex<BrokerRpcResponseCache>>,
+ request: BrokerRpcRequest,
+ allow_wait: bool,
+) -> BrokerRpcResponse {
+ let request_id = request.request_id.clone();
+ if request_id.is_empty() {
+ return execute_rpc_request(broker, request, allow_wait).await;
+ }
+
+ let wait_for_existing = {
+ let mut cache = response_cache.lock().await;
+ if let Some(response) = cache.completed.get(&request_id) {
+ return response.clone();
+ }
+ if let Some(waiters) = cache.in_flight.get_mut(&request_id) {
+ let (tx, rx) = oneshot::channel();
+ waiters.push(tx);
+ Some(rx)
+ } else {
+ cache.in_flight.insert(request_id.clone(), Vec::new());
+ None
+ }
+ };
+
+ if let Some(rx) = wait_for_existing {
+ return rx.await.unwrap_or(BrokerRpcResponse {
+ reply: BrokerRpcReply::Unit(Err(BrokerError::ActorClosed)),
+ });
+ }
+
+ let response = execute_rpc_request(broker, request, allow_wait).await;
+ let waiters = {
+ let mut cache = response_cache.lock().await;
+ let waiters = cache.in_flight.remove(&request_id).unwrap_or_default();
+ cache.completed.insert(request_id.clone(), response.clone());
+ cache.completed_order.push_back(request_id);
+ while cache.completed_order.len() > BROKER_RPC_RESPONSE_CACHE_LIMIT {
+ if let Some(old_request_id) = cache.completed_order.pop_front() {
+ cache.completed.remove(&old_request_id);
+ }
+ }
+ waiters
+ };
+
+ for waiter in waiters {
+ let _ = waiter.send(response.clone());
+ }
+ response
+}
+
+pub fn register_broker_service(p2p_view: P2pModuleView, queue_capacity: usize) {
+ let broker = LocalBrokerHandle::spawn_actor(LocalBroker::new(), queue_capacity);
+ let response_cache = Arc::new(Mutex::new(BrokerRpcResponseCache::default()));
+ let handler_view = p2p_view.clone();
+ RPCHandler::<BrokerRpcRequest>::new().regist(p2p_view.p2p_module(), move |resp, msg| {
+ let broker = broker.clone();
+ let response_cache = response_cache.clone();
+ let handler_view = handler_view.clone();
+ let _ = handler_view.spawn("fluxon_mq.broker.rpc", async move {
+ let response =
+ execute_rpc_request_with_cache(&broker, &response_cache, msg.serialize_part, false)
+ .await;
+ let _ = resp
+ .send_resp(MsgPack {
+ serialize_part: response,
+ raw_bytes: Vec::new(),
+ })
+ .await;
+ });
+ Ok(())
+ });
+}
+
+#[derive(Clone)]
+struct RemoteBrokerHandle {
+ cluster_manager_view: ClusterManagerView,
+ p2p_view: P2pModuleView,
+}
+
+#[derive(Clone)]
+enum BrokerHandleInner {
+ Local(LocalBrokerHandle),
+ Remote(RemoteBrokerHandle),
+}
+
+pub struct BrokerHandle {
+ inner: BrokerHandleInner,
+}
+
+impl Clone for BrokerHandle {
+ fn clone(&self) -> Self {
+ Self {
+ inner: self.inner.clone(),
+ }
+ }
+}
+
+impl std::fmt::Debug for BrokerHandle {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match &self.inner {
+ BrokerHandleInner::Local(_) => f
+ .debug_struct("BrokerHandle")
+ .field("kind", &"local")
+ .finish(),
+ BrokerHandleInner::Remote(_) => f
+ .debug_struct("BrokerHandle")
+ .field("kind", &"remote")
+ .finish(),
+ }
+ }
+}
+
+impl BrokerHandle {
+ pub fn new_distributed(
+ cluster_manager_view: ClusterManagerView,
+ p2p_view: P2pModuleView,
+ ) -> Self {
+ Self {
+ inner: BrokerHandleInner::Remote(RemoteBrokerHandle {
+ cluster_manager_view,
+ p2p_view,
+ }),
+ }
+ }
+
+ #[cfg(test)]
+ pub fn new_local_for_test(queue_capacity: usize) -> Self {
+ Self {
+ inner: BrokerHandleInner::Local(
+ LocalBrokerHandle::spawn_actor_with_cleanup_release_delay(
+ LocalBroker::new(),
+ queue_capacity,
+ Duration::ZERO,
+ ),
+ ),
+ }
+ }
+
+ #[cfg(test)]
+ pub fn new_local_with_payload_byte_capacity_for_test(
+ payload_byte_capacity: u64,
+ queue_capacity: usize,
+ ) -> Self {
+ Self {
+ inner: BrokerHandleInner::Local(
+ LocalBrokerHandle::spawn_actor_with_cleanup_release_delay(
+ LocalBroker::with_payload_byte_capacity(payload_byte_capacity),
+ queue_capacity,
+ Duration::ZERO,
+ ),
+ ),
+ }
+ }
+
+ pub async fn upsert_channel(&self, config: BrokerChannelConfig) -> Result<(), BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::UpsertChannel {
+ config,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Unit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for upsert_channel: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn delete_channel(&self, channel_id: i64) -> Result<Vec<String>, BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::DeleteChannel {
+ channel_id,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::PayloadKeys(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for delete_channel: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn reserve(
+ &self,
+ req: BrokerReserveRequest,
+ ) -> Result {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::Reserve { req }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Reservation(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for reserve: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn publish(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ ) -> Result {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::Publish {
+ channel_id,
+ reservation_id,
+ now_ms,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Envelope(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for publish: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn abort(&self, channel_id: i64, reservation_id: u64) -> Result<(), BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::Abort {
+ channel_id,
+ reservation_id,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Unit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for abort: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn fetch_next(
+ &self,
+ req: BrokerFetchRequest,
+ ) -> Result<Option<BrokerFetchedMessage>, BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::FetchNext { req }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Fetch(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for fetch_next: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn fetch_batch_available(
+ &self,
+ req: BrokerFetchRequest,
+ max_items: usize,
+ ) -> Result {
+ match self
+ .request(BrokerRpcRequest::new(
+ BrokerRpcOperation::FetchBatchAvailable { req, max_items },
+ ))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::FetchBatch(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for fetch_batch_available: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn commit(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ now_ms: i64,
+ ) -> Result {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::Commit {
+ channel_id,
+ reservation_id,
+ now_ms,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Commit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for commit: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn commit_batch(
+ &self,
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ now_ms: i64,
+ ) -> Result {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::CommitBatch {
+ channel_id,
+ reservation_ids,
+ now_ms,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::CommitBatch(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for commit_batch: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn requeue_inflight(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ ) -> Result<(), BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::RequeueInflight {
+ channel_id,
+ reservation_id,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Unit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for requeue_inflight: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn requeue_inflight_batch(
+ &self,
+ channel_id: i64,
+ reservation_ids: Vec<u64>,
+ ) -> Result<(), BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(
+ BrokerRpcOperation::RequeueInflightBatch {
+ channel_id,
+ reservation_ids,
+ },
+ ))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Unit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for requeue_inflight_batch: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn requeue_all_inflight(&self, channel_id: i64) -> Result<(), BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(
+ BrokerRpcOperation::RequeueAllInflight { channel_id },
+ ))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Unit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for requeue_all_inflight: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn take_cleanup_batch(
+ &self,
+ channel_id: i64,
+ max_items: usize,
+ ) -> Result, BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(
+ BrokerRpcOperation::TakeCleanupBatch {
+ channel_id,
+ max_items,
+ },
+ ))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::CleanupBatch(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for take_cleanup_batch: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn cleanup_ack(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ ) -> Result<(), BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::CleanupAck {
+ channel_id,
+ reservation_id,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Unit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for cleanup_ack: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn cleanup_nack(
+ &self,
+ channel_id: i64,
+ reservation_id: u64,
+ ) -> Result<(), BrokerError> {
+ match self
+ .request(BrokerRpcRequest::new(BrokerRpcOperation::CleanupNack {
+ channel_id,
+ reservation_id,
+ }))
+ .await?
+ .reply
+ {
+ BrokerRpcReply::Unit(result) => result,
+ other => Err(BrokerError::Rpc(format!(
+ "unexpected response for cleanup_nack: {:?}",
+ other
+ ))),
+ }
+ }
+
+ pub async fn shutdown(&self) -> Result<(), BrokerError> {
+ match &self.inner {
+ BrokerHandleInner::Local(local) => local.shutdown().await,
+ BrokerHandleInner::Remote(_) => Err(BrokerError::Rpc(
+ "shutdown is unsupported for distributed broker handles".to_string(),
+ )),
+ }
+ }
+
+ async fn request(&self, request: BrokerRpcRequest) -> Result<BrokerRpcResponse, BrokerError> {
+ match &self.inner {
+ BrokerHandleInner::Local(local) => Ok(execute_rpc_request(local, request, true).await),
+ BrokerHandleInner::Remote(remote) => remote.request(request).await,
+ }
+ }
+}
+
+impl RemoteBrokerHandle {
+ async fn request(
+ &self,
+ mut request: BrokerRpcRequest,
+ ) -> Result<BrokerRpcResponse, BrokerError> {
+ if request.request_id.is_empty() {
+ request.request_id = next_broker_rpc_request_id();
+ }
+ let broker_node =
+ find_or_wait_broker_node(self.cluster_manager_view.cluster_manager()).await?;
+ let response = RPCCaller::::new()
+ .call(
+ self.p2p_view.p2p_module(),
+ broker_node.into(),
+ MsgPack {
+ serialize_part: request,
+ raw_bytes: Vec::new(),
+ },
+ None,
+ 6,
+ )
+ .await
+ .map_err(|e| BrokerError::Rpc(format!("broker rpc call failed: {}", e)))?;
+ Ok(response.serialize_part)
+ }
+}
+
+async fn find_or_wait_broker_node(
+ cluster_manager: &fluxon_commu::ClusterManager,
+) -> Result<String, BrokerError> {
+ let mut rx = cluster_manager.listen();
+ let members = cluster_manager.get_members();
+ let broker_nodes: Vec<_> = members
+ .iter()
+ .filter(|member| is_broker_member(member))
+ .collect();
+ if broker_nodes.len() == 1 {
+ return Ok(broker_nodes[0].id.to_string());
+ }
+ if broker_nodes.len() > 1 {
+ return Err(BrokerError::BrokerUnavailable(format!(
+ "multiple brokers found: {:?}",
+ broker_nodes
+ .into_iter()
+ .map(|member| member.id.to_string())
+ .collect::<Vec<_>>()
+ )));
+ }
+
+ tokio::time::timeout(BROKER_DISCOVERY_TIMEOUT, async move {
+ while let Ok(event) = rx.recv().await {
+ match event {
+ fluxon_commu::ClusterEvent::MemberJoined(member)
+ | fluxon_commu::ClusterEvent::MemberUpdated(member)
+ if is_broker_member(&member) =>
+ {
+ return Ok(member.id.to_string());
+ }
+ _ => {}
+ }
+ }
+ Err(BrokerError::BrokerUnavailable(
+ "broker node not found from cluster manager".to_string(),
+ ))
+ })
+ .await
+ .unwrap_or_else(|_| {
+ Err(BrokerError::BrokerUnavailable(format!(
+ "timed out waiting {}s for broker node registration; start fluxon_py.runtime.start_broker first",
+ BROKER_DISCOVERY_TIMEOUT.as_secs()
+ )))
+ })
+}
+
+fn next_broker_rpc_request_id() -> String {
+ let prefix = BROKER_RPC_REQUEST_PREFIX.get_or_init(|| {
+ let started_ns = SystemTime::now()
+ .duration_since(UNIX_EPOCH)
+ .expect("system clock is before UNIX_EPOCH")
+ .as_nanos();
+ format!("{}-{}", std::process::id(), started_ns)
+ });
+ let seq = BROKER_RPC_REQUEST_SEQ.fetch_add(1, Ordering::Relaxed);
+ format!("{}-{}", prefix, seq)
+}
+
+fn is_broker_member(member: &fluxon_commu::ClusterMember) -> bool {
+ member
+ .metadata
+ .get(FLUXON_MQ_COMPONENT_METADATA_KEY)
+ .is_some_and(|value| value == FLUXON_MQ_COMPONENT_BROKER_METADATA_VALUE)
+}
+
+fn broker_category_enforces_capacity(category: MqCategory) -> bool {
+ matches!(category, MqCategory::MpmcSub { .. })
+}
+
+pub fn now_unix_ms() -> i64 {
+ SystemTime::now()
+ .duration_since(UNIX_EPOCH)
+ .expect("system clock is before UNIX_EPOCH")
+ .as_millis() as i64
+}
+
+fn validate_capacity(config: &BrokerChannelConfig) -> Result<(), BrokerError> {
+ if config.capacity <= 0 {
+ return Err(BrokerError::InvalidCapacity {
+ channel_id: config.channel_id,
+ capacity: config.capacity,
+ });
+ }
+ Ok(())
+}
+
+fn default_payload_byte_capacity() -> u64 {
+ if let Ok(raw) = env::var(BROKER_PAYLOAD_BYTES_CAP_ENV) {
+ if let Ok(value) = raw.trim().parse::<u64>() {
+ if value > 0 {
+ return value;
+ }
+ }
+ }
+
+ if let Ok(raw) = env::var(OWNER_POOL_DRAM_BYTES_ENV) {
+ if let Ok(value) = raw.trim().parse::<u64>() {
+ if value > 0 {
+ let percent = payload_byte_capacity_percent();
+ return ((value as u128) * (percent as u128) / 100).max(1) as u64;
+ }
+ }
+ }
+
+ DEFAULT_BROKER_PAYLOAD_BYTES_CAP
+}
+
+fn payload_byte_capacity_percent() -> u64 {
+ env::var(BROKER_PAYLOAD_BYTES_CAP_PERCENT_ENV)
+ .ok()
+ .and_then(|raw| raw.trim().parse::<u64>().ok())
+ .filter(|value| (1..=100).contains(value))
+ .unwrap_or(DEFAULT_BROKER_PAYLOAD_BYTES_CAP_PERCENT)
+}
+
+fn default_cleanup_release_delay() -> Duration {
+ Duration::from_millis(
+ env::var(BROKER_CLEANUP_RELEASE_DELAY_MS_ENV)
+ .ok()
+ .and_then(|raw| raw.trim().parse::<u64>().ok())
+ .unwrap_or(DEFAULT_BROKER_CLEANUP_RELEASE_DELAY_MS),
+ )
+}
+
+fn remove_from_deque(queue: &mut VecDeque<u64>, reservation_id: u64) {
+ if let Some(pos) = queue.iter().position(|id| *id == reservation_id) {
+ queue.remove(pos);
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ fn reserve_req(channel_id: i64, producer_id: &str, now_ms: i64) -> BrokerReserveRequest {
+ reserve_req_with_category(channel_id, producer_id, MqCategory::Mpsc, 1, now_ms)
+ }
+
+ fn reserve_req_with_category(
+ channel_id: i64,
+ producer_id: &str,
+ category: MqCategory,
+ payload_bytes: u64,
+ now_ms: i64,
+ ) -> BrokerReserveRequest {
+ BrokerReserveRequest {
+ channel_id,
+ producer_id: producer_id.to_string(),
+ category,
+ payload_bytes,
+ now_ms,
+ }
+ }
+
+ fn reserve_req_bytes(
+ channel_id: i64,
+ producer_id: &str,
+ payload_bytes: u64,
+ now_ms: i64,
+ ) -> BrokerReserveRequest {
+ BrokerReserveRequest {
+ channel_id,
+ producer_id: producer_id.to_string(),
+ category: MqCategory::Mpsc,
+ payload_bytes,
+ now_ms,
+ }
+ }
+
+ fn fetch_req(channel_id: i64, consumer_id: &str, now_ms: i64) -> BrokerFetchRequest {
+ BrokerFetchRequest {
+ channel_id,
+ consumer_id: consumer_id.to_string(),
+ now_ms,
+ }
+ }
+
+ #[tokio::test]
+ async fn rpc_request_cache_deduplicates_retried_reserve() {
+ let broker = LocalBrokerHandle::spawn_actor_with_cleanup_release_delay(
+ LocalBroker::new(),
+ 8,
+ Duration::ZERO,
+ );
+ let cache = Arc::new(Mutex::new(BrokerRpcResponseCache::default()));
+ let upsert = BrokerRpcRequest::new(BrokerRpcOperation::UpsertChannel {
+ config: BrokerChannelConfig {
+ channel_id: 41,
+ capacity: 2,
+ },
+ });
+ let _ = execute_rpc_request_with_cache(&broker, &cache, upsert, false).await;
+
+ let reserve = BrokerRpcRequest {
+ request_id: "reserve-retry-1".to_string(),
+ op: BrokerRpcOperation::Reserve {
+ req: reserve_req(41, "p0", 10),
+ },
+ };
+ let first = execute_rpc_request_with_cache(&broker, &cache, reserve.clone(), false).await;
+ let second = execute_rpc_request_with_cache(&broker, &cache, reserve, false).await;
+ let first_reservation = match first.reply {
+ BrokerRpcReply::Reservation(Ok(reservation)) => reservation,
+ other => panic!("unexpected first reserve response: {:?}", other),
+ };
+ let second_reservation = match second.reply {
+ BrokerRpcReply::Reservation(Ok(reservation)) => reservation,
+ other => panic!("unexpected second reserve response: {:?}", other),
+ };
+ assert_eq!(
+ first_reservation.envelope.reservation_id,
+ second_reservation.envelope.reservation_id
+ );
+
+ let next = broker.reserve(reserve_req(41, "p0", 11)).await.unwrap();
+ assert_eq!(next.envelope.reservation_id, 2);
+ broker.shutdown().await.unwrap();
+ }
+
+ #[tokio::test]
+ async fn rpc_fetch_next_without_wait_returns_none() {
+ let broker = LocalBrokerHandle::spawn_actor_with_cleanup_release_delay(
+ LocalBroker::new(),
+ 8,
+ Duration::ZERO,
+ );
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 42,
+ capacity: 2,
+ })
+ .await
+ .unwrap();
+ let cache = Arc::new(Mutex::new(BrokerRpcResponseCache::default()));
+ let response = tokio::time::timeout(
+ Duration::from_millis(50),
+ execute_rpc_request_with_cache(
+ &broker,
+ &cache,
+ BrokerRpcRequest {
+ request_id: "fetch-empty-1".to_string(),
+ op: BrokerRpcOperation::FetchNext {
+ req: fetch_req(42, "c0", 10),
+ },
+ },
+ false,
+ ),
+ )
+ .await
+ .expect("remote-style fetch must not wait");
+ match response.reply {
+ BrokerRpcReply::Fetch(Ok(None)) => {}
+ other => panic!("unexpected fetch response: {:?}", other),
+ }
+ broker.shutdown().await.unwrap();
+ }
+
+ #[test]
+ fn reserve_publish_fetch_commit_frees_capacity_for_mpmc_sub() {
+ let mut broker = LocalBroker::new();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 7,
+ capacity: 2,
+ })
+ .unwrap();
+
+ let first = broker
+ .reserve(reserve_req_with_category(
+ 7,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 70 },
+ 1,
+ 10,
+ ))
+ .unwrap();
+ let second = broker
+ .reserve(reserve_req_with_category(
+ 7,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 70 },
+ 1,
+ 11,
+ ))
+ .unwrap();
+ assert_eq!(first.envelope.msg_id, 0);
+ assert_eq!(second.envelope.msg_id, 1);
+ assert_eq!(
+ broker
+ .reserve(reserve_req_with_category(
+ 7,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 70 },
+ 1,
+ 12,
+ ))
+ .unwrap_err(),
+ BrokerError::ChannelFull {
+ channel_id: 7,
+ capacity: 2,
+ used_slots: 2,
+ }
+ );
+
+ broker
+ .publish(7, first.envelope.reservation_id, 20)
+ .unwrap();
+ let fetched = broker.fetch_next(fetch_req(7, "c0", 30)).unwrap().unwrap();
+ assert_eq!(
+ fetched.envelope.reservation_id,
+ first.envelope.reservation_id
+ );
+
+ let committed = broker
+ .commit(7, fetched.envelope.reservation_id, 40)
+ .unwrap();
+ assert!(committed.first_commit);
+ assert_eq!(
+ committed
+ .cleanup
+ .as_ref()
+ .map(|env| env.payload_key.as_str()),
+ Some(
+ keys::backend_message_key_with_category(
+ 7,
+ "p0",
+ 0,
+ &MqCategory::MpmcSub { parent_mpmc_id: 70 },
+ )
+ .as_str()
+ )
+ );
+
+ let third = broker
+ .reserve(reserve_req_with_category(
+ 7,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 70 },
+ 1,
+ 50,
+ ))
+ .unwrap();
+ assert_eq!(third.envelope.msg_id, 2);
+ }
+
+ #[test]
+ fn abort_releases_pending_slot_for_mpmc_sub() {
+ let mut broker = LocalBroker::new();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 8,
+ capacity: 1,
+ })
+ .unwrap();
+
+ let reservation = broker
+ .reserve(reserve_req_with_category(
+ 8,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 80 },
+ 1,
+ 10,
+ ))
+ .unwrap();
+ assert!(matches!(
+ broker.reserve(reserve_req_with_category(
+ 8,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 80 },
+ 1,
+ 11,
+ )),
+ Err(BrokerError::ChannelFull { .. })
+ ));
+
+ broker
+ .abort(8, reservation.envelope.reservation_id)
+ .unwrap();
+ let next = broker
+ .reserve(reserve_req_with_category(
+ 8,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 80 },
+ 1,
+ 12,
+ ))
+ .unwrap();
+ assert_eq!(next.envelope.msg_id, 1);
+ }
+
+ #[test]
+ fn requeue_all_inflight_preserves_fetch_order() {
+ let mut broker = LocalBroker::new();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 10,
+ capacity: 4,
+ })
+ .unwrap();
+ let first = broker.reserve(reserve_req(10, "p0", 10)).unwrap();
+ let second = broker.reserve(reserve_req(10, "p0", 11)).unwrap();
+ broker
+ .publish(10, first.envelope.reservation_id, 20)
+ .unwrap();
+ broker
+ .publish(10, second.envelope.reservation_id, 21)
+ .unwrap();
+
+ let _ = broker.fetch_next(fetch_req(10, "c0", 30)).unwrap().unwrap();
+ let _ = broker.fetch_next(fetch_req(10, "c0", 31)).unwrap().unwrap();
+ broker.requeue_all_inflight(10).unwrap();
+
+ let redelivered_first = broker.fetch_next(fetch_req(10, "c0", 40)).unwrap().unwrap();
+ let redelivered_second = broker.fetch_next(fetch_req(10, "c0", 41)).unwrap().unwrap();
+ assert_eq!(
+ redelivered_first.envelope.reservation_id,
+ first.envelope.reservation_id
+ );
+ assert_eq!(
+ redelivered_second.envelope.reservation_id,
+ second.envelope.reservation_id
+ );
+ }
+
+ #[test]
+ fn batch_fetch_and_commit_preserves_order_and_frees_capacity() {
+ let mut broker = LocalBroker::new();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 11,
+ capacity: 3,
+ })
+ .unwrap();
+
+ let first = broker.reserve(reserve_req(11, "p0", 10)).unwrap();
+ let second = broker.reserve(reserve_req(11, "p0", 11)).unwrap();
+ let third = broker.reserve(reserve_req(11, "p1", 12)).unwrap();
+ for reservation in [&first, &second, &third] {
+ broker
+ .publish(11, reservation.envelope.reservation_id, 20)
+ .unwrap();
+ }
+
+ let batch = broker
+ .fetch_batch_available(fetch_req(11, "c0", 30), 2)
+ .unwrap();
+ assert_eq!(batch.messages.len(), 2);
+ assert_eq!(batch.messages[0].envelope.msg_id, 0);
+ assert_eq!(batch.messages[1].envelope.msg_id, 1);
+
+ let outcome = broker
+ .commit_batch(
+ 11,
+ batch
+ .messages
+ .iter()
+ .map(|message| message.envelope.reservation_id)
+ .collect(),
+ 40,
+ )
+ .unwrap();
+ assert_eq!(outcome.first_commit_count, 2);
+ assert_eq!(outcome.cleanup.len(), 2);
+
+ let next = broker.reserve(reserve_req(11, "p0", 50)).unwrap();
+ assert_eq!(next.envelope.msg_id, 2);
+ }
+
+ #[test]
+ fn duplicate_commit_is_idempotent_until_cleanup_ack() {
+ let mut broker = LocalBroker::with_payload_byte_capacity(10);
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 19,
+ capacity: 4,
+ })
+ .unwrap();
+
+ let reserved = broker.reserve(reserve_req_bytes(19, "p0", 6, 10)).unwrap();
+ broker
+ .publish(19, reserved.envelope.reservation_id, 20)
+ .unwrap();
+ let fetched = broker.fetch_next(fetch_req(19, "c0", 30)).unwrap().unwrap();
+ let reservation_id = fetched.envelope.reservation_id;
+
+ let first = broker.commit(19, reservation_id, 40).unwrap();
+ assert!(first.first_commit);
+ assert!(first.cleanup.is_some());
+ let duplicate = broker.commit(19, reservation_id, 41).unwrap();
+ assert!(!duplicate.first_commit);
+ assert!(duplicate.cleanup.is_none());
+
+ broker.cleanup_ack(19, reservation_id).unwrap();
+ assert_eq!(
+ broker.commit(19, reservation_id, 42).unwrap_err(),
+ BrokerError::DeliveryNotFound {
+ channel_id: 19,
+ reservation_id,
+ }
+ );
+ }
+
+ #[test]
+ fn payload_byte_budget_is_global_and_released_on_cleanup_ack_or_abort() {
+ let mut broker = LocalBroker::with_payload_byte_capacity(10);
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 21,
+ capacity: 8,
+ })
+ .unwrap();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 22,
+ capacity: 8,
+ })
+ .unwrap();
+
+ let first = broker.reserve(reserve_req_bytes(21, "p0", 6, 10)).unwrap();
+ assert_eq!(first.envelope.payload_bytes, 6);
+ assert!(matches!(
+ broker.reserve(reserve_req_bytes(22, "p1", 5, 11)),
+ Err(BrokerError::PayloadBytesFull { .. })
+ ));
+
+ broker
+ .publish(21, first.envelope.reservation_id, 20)
+ .unwrap();
+ let fetched = broker.fetch_next(fetch_req(21, "c0", 30)).unwrap().unwrap();
+ broker
+ .commit(21, fetched.envelope.reservation_id, 40)
+ .unwrap();
+ assert!(matches!(
+ broker.reserve(reserve_req_bytes(22, "p1", 5, 41)),
+ Err(BrokerError::PayloadBytesFull { .. })
+ ));
+ broker
+ .cleanup_ack(21, fetched.envelope.reservation_id)
+ .unwrap();
+ let second = broker.reserve(reserve_req_bytes(22, "p1", 5, 50)).unwrap();
+ broker.abort(22, second.envelope.reservation_id).unwrap();
+ let third = broker.reserve(reserve_req_bytes(22, "p1", 10, 60)).unwrap();
+ assert_eq!(third.envelope.payload_bytes, 10);
+ }
+
+ #[test]
+ fn mpsc_reserve_does_not_gate_on_channel_capacity() {
+ let mut broker = LocalBroker::new();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 201,
+ capacity: 1,
+ })
+ .unwrap();
+
+ let first = broker.reserve(reserve_req(201, "p0", 10)).unwrap();
+ let second = broker.reserve(reserve_req(201, "p0", 11)).unwrap();
+
+ assert_eq!(first.envelope.msg_id, 0);
+ assert_eq!(second.envelope.msg_id, 1);
+ }
+
+ #[test]
+ fn mpmc_sub_reserve_still_gates_on_channel_capacity() {
+ let mut broker = LocalBroker::new();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 202,
+ capacity: 1,
+ })
+ .unwrap();
+
+ let _ = broker
+ .reserve(reserve_req_with_category(
+ 202,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 9 },
+ 1,
+ 10,
+ ))
+ .unwrap();
+
+ assert!(matches!(
+ broker.reserve(reserve_req_with_category(
+ 202,
+ "p0",
+ MqCategory::MpmcSub { parent_mpmc_id: 9 },
+ 1,
+ 11,
+ )),
+ Err(BrokerError::ChannelFull { .. })
+ ));
+ }
+
+ #[test]
+ fn cleanup_ack_releases_payload_after_cleanup_batch_take() {
+ let mut broker = LocalBroker::with_payload_byte_capacity(10);
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 23,
+ capacity: 8,
+ })
+ .unwrap();
+
+ let first = broker.reserve(reserve_req_bytes(23, "p0", 6, 10)).unwrap();
+ broker
+ .publish(23, first.envelope.reservation_id, 20)
+ .unwrap();
+ let fetched = broker.fetch_next(fetch_req(23, "c0", 30)).unwrap().unwrap();
+ broker
+ .commit(23, fetched.envelope.reservation_id, 40)
+ .unwrap();
+ assert_eq!(broker.take_cleanup_batch(23, 8).unwrap().len(), 1);
+ assert!(matches!(
+ broker.reserve(reserve_req_bytes(23, "p1", 5, 41)),
+ Err(BrokerError::PayloadBytesFull { .. })
+ ));
+
+ broker
+ .cleanup_ack(23, fetched.envelope.reservation_id)
+ .unwrap();
+ let second = broker.reserve(reserve_req_bytes(23, "p1", 5, 50)).unwrap();
+ assert_eq!(second.envelope.payload_bytes, 5);
+ }
+
+ #[test]
+ fn delete_channel_releases_payload_budget_for_all_queues() {
+ let mut broker = LocalBroker::with_payload_byte_capacity(100);
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 31,
+ capacity: 16,
+ })
+ .unwrap();
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 32,
+ capacity: 16,
+ })
+ .unwrap();
+
+ let pending = broker.reserve(reserve_req_bytes(31, "p0", 10, 10)).unwrap();
+
+ let inflight = broker.reserve(reserve_req_bytes(31, "p0", 12, 12)).unwrap();
+ broker
+ .publish(31, inflight.envelope.reservation_id, 21)
+ .unwrap();
+ let _ = broker.fetch_next(fetch_req(31, "c0", 30)).unwrap().unwrap();
+
+ let cleanup_inflight = broker.reserve(reserve_req_bytes(31, "p0", 13, 13)).unwrap();
+ broker
+ .publish(31, cleanup_inflight.envelope.reservation_id, 22)
+ .unwrap();
+ let fetched = broker.fetch_next(fetch_req(31, "c0", 31)).unwrap().unwrap();
+ broker
+ .commit(31, fetched.envelope.reservation_id, 40)
+ .unwrap();
+ assert_eq!(broker.take_cleanup_batch(31, 1).unwrap().len(), 1);
+
+ let cleanup = broker.reserve(reserve_req_bytes(31, "p0", 14, 14)).unwrap();
+ broker
+ .publish(31, cleanup.envelope.reservation_id, 23)
+ .unwrap();
+ let fetched = broker.fetch_next(fetch_req(31, "c0", 32)).unwrap().unwrap();
+ broker
+ .commit(31, fetched.envelope.reservation_id, 41)
+ .unwrap();
+
+ let visible = broker.reserve(reserve_req_bytes(31, "p0", 11, 15)).unwrap();
+ broker
+ .publish(31, visible.envelope.reservation_id, 24)
+ .unwrap();
+
+ assert_eq!(broker.state.used_payload_bytes, 60);
+ assert!(matches!(
+ broker.reserve(reserve_req_bytes(32, "p1", 41, 50)),
+ Err(BrokerError::PayloadBytesFull { .. })
+ ));
+
+ let mut payload_keys = broker.delete_channel(31).unwrap();
+ payload_keys.sort();
+ let mut expected_payload_keys = vec![
+ pending.envelope.payload_key,
+ inflight.envelope.payload_key,
+ cleanup_inflight.envelope.payload_key,
+ cleanup.envelope.payload_key,
+ visible.envelope.payload_key,
+ ];
+ expected_payload_keys.sort();
+ assert_eq!(payload_keys, expected_payload_keys);
+ assert_eq!(broker.state.used_payload_bytes, 0);
+ assert_eq!(broker.delete_channel(31), Ok(Vec::new()));
+ assert_eq!(
+ broker.fetch_next(fetch_req(31, "c0", 60)).unwrap_err(),
+ BrokerError::ChannelNotFound(31)
+ );
+
+ let next = broker
+ .reserve(reserve_req_bytes(32, "p1", 100, 70))
+ .unwrap();
+ assert_eq!(next.envelope.payload_bytes, 100);
+ }
+
+ #[tokio::test]
+ async fn broker_handle_roundtrip_uses_local_actor() {
+ let handle = BrokerHandle::new_local_for_test(32);
+ handle
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 12,
+ capacity: 2,
+ })
+ .await
+ .unwrap();
+ let reserved = handle.reserve(reserve_req(12, "p0", 10)).await.unwrap();
+ handle
+ .publish(12, reserved.envelope.reservation_id, 20)
+ .await
+ .unwrap();
+ let fetched = handle
+ .fetch_next(fetch_req(12, "c0", 30))
+ .await
+ .unwrap()
+ .unwrap();
+ assert_eq!(fetched.envelope.msg_id, 0);
+ handle
+ .commit(12, fetched.envelope.reservation_id, 40)
+ .await
+ .unwrap();
+ assert_eq!(handle.take_cleanup_batch(12, 8).await.unwrap().len(), 1);
+ handle
+ .cleanup_ack(12, fetched.envelope.reservation_id)
+ .await
+ .unwrap();
+ handle.shutdown().await.unwrap();
+ }
+
+ #[tokio::test]
+ async fn broker_handle_delete_channel_releases_payload_budget() {
+ let handle = BrokerHandle::new_local_with_payload_byte_capacity_for_test(10, 8);
+ handle
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 24,
+ capacity: 4,
+ })
+ .await
+ .unwrap();
+
+ let first = handle
+ .reserve(reserve_req_bytes(24, "p0", 6, 10))
+ .await
+ .unwrap();
+ assert!(matches!(
+ handle.reserve(reserve_req_bytes(24, "p1", 5, 11)).await,
+ Err(BrokerError::PayloadBytesFull { .. })
+ ));
+
+ assert_eq!(
+ handle.delete_channel(24).await.unwrap(),
+ vec![first.envelope.payload_key]
+ );
+ assert_eq!(
+ handle.delete_channel(24).await.unwrap(),
+ Vec::<String>::new()
+ );
+ handle
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 25,
+ capacity: 4,
+ })
+ .await
+ .unwrap();
+ let next = handle
+ .reserve(reserve_req_bytes(25, "p1", 10, 20))
+ .await
+ .unwrap();
+ assert_eq!(next.envelope.payload_bytes, 10);
+
+ handle.shutdown().await.unwrap();
+ }
+
+ #[tokio::test]
+ async fn broker_handle_returns_actor_closed_after_shutdown() {
+ let handle = BrokerHandle::new_local_for_test(8);
+ handle
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 13,
+ capacity: 1,
+ })
+ .await
+ .unwrap();
+ handle.shutdown().await.unwrap();
+ assert_eq!(
+ handle.reserve(reserve_req(13, "p0", 10)).await.unwrap_err(),
+ BrokerError::ActorClosed
+ );
+ }
+
+ #[tokio::test]
+ async fn broker_handle_returns_channel_full_without_waiting_for_mpmc_sub() {
+ let handle = BrokerHandle::new_local_for_test(8);
+ handle
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 14,
+ capacity: 1,
+ })
+ .await
+ .unwrap();
+
+ let first = handle
+ .reserve(reserve_req_with_category(
+ 14,
+ "p0",
+ MqCategory::MpmcSub {
+ parent_mpmc_id: 140,
+ },
+ 1,
+ 10,
+ ))
+ .await
+ .unwrap();
+ assert!(matches!(
+ handle
+ .reserve(reserve_req_with_category(
+ 14,
+ "p0",
+ MqCategory::MpmcSub {
+ parent_mpmc_id: 140
+ },
+ 1,
+ 11,
+ ))
+ .await,
+ Err(BrokerError::ChannelFull { .. })
+ ));
+
+ handle
+ .abort(14, first.envelope.reservation_id)
+ .await
+ .unwrap();
+ let second = handle
+ .reserve(reserve_req_with_category(
+ 14,
+ "p0",
+ MqCategory::MpmcSub {
+ parent_mpmc_id: 140,
+ },
+ 1,
+ 12,
+ ))
+ .await
+ .unwrap();
+ assert_eq!(second.envelope.msg_id, 1);
+
+ handle.shutdown().await.unwrap();
+ }
+
+ #[tokio::test]
+ async fn broker_handle_returns_payload_bytes_full_without_waiting() {
+ let handle = BrokerHandle::new_local_with_payload_byte_capacity_for_test(10, 8);
+ handle
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 16,
+ capacity: 8,
+ })
+ .await
+ .unwrap();
+
+ let first = handle
+ .reserve(reserve_req_bytes(16, "p0", 6, 10))
+ .await
+ .unwrap();
+ assert!(matches!(
+ handle.reserve(reserve_req_bytes(16, "p1", 5, 11)).await,
+ Err(BrokerError::PayloadBytesFull { .. })
+ ));
+
+ handle
+ .publish(16, first.envelope.reservation_id, 20)
+ .await
+ .unwrap();
+ let fetched = handle
+ .fetch_next(fetch_req(16, "c0", 30))
+ .await
+ .unwrap()
+ .unwrap();
+ handle
+ .commit(16, fetched.envelope.reservation_id, 40)
+ .await
+ .unwrap();
+
+ handle
+ .cleanup_ack(16, fetched.envelope.reservation_id)
+ .await
+ .unwrap();
+
+ let second = handle
+ .reserve(reserve_req_bytes(16, "p1", 5, 50))
+ .await
+ .unwrap();
+ assert_eq!(second.envelope.producer_id, "p1");
+ assert_eq!(second.envelope.payload_bytes, 5);
+
+ handle.shutdown().await.unwrap();
+ }
+
+ #[tokio::test]
+ async fn broker_handle_waits_for_message_then_resumes() {
+ use std::time::Duration;
+ use tokio::time::sleep;
+
+ let handle = BrokerHandle::new_local_for_test(8);
+ handle
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 15,
+ capacity: 2,
+ })
+ .await
+ .unwrap();
+
+ let waiter_handle = handle.clone();
+ let pending =
+ tokio::spawn(async move { waiter_handle.fetch_next(fetch_req(15, "c0", 10)).await });
+
+ sleep(Duration::from_millis(50)).await;
+ assert!(!pending.is_finished());
+
+ let reservation = handle.reserve(reserve_req(15, "p0", 11)).await.unwrap();
+ handle
+ .publish(15, reservation.envelope.reservation_id, 12)
+ .await
+ .unwrap();
+
+ let fetched = pending.await.unwrap().unwrap().unwrap();
+ assert_eq!(fetched.envelope.msg_id, 0);
+
+ handle.shutdown().await.unwrap();
+ }
+}
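The `broker_handle_returns_payload_bytes_full_without_waiting` test above exercises a payload byte budget: a second reservation that would exceed the budget fails immediately with `PayloadBytesFull` instead of waiting, and succeeds once the first message has been committed and its cleanup acknowledged. A minimal sketch of that accounting follows. `ByteBudget`, `BudgetError`, and `demo` are hypothetical names introduced only for illustration, and the 10-byte capacity is an assumption; none of this is taken from the broker implementation.

```rust
/// Hypothetical error type for this sketch (not the crate's `BrokerError`).
#[derive(Debug, PartialEq, Eq)]
pub enum BudgetError {
    PayloadBytesFull { requested: u64, available: u64 },
}

/// Hypothetical byte budget: tracks how many payload bytes are reserved.
#[derive(Debug)]
pub struct ByteBudget {
    capacity: u64,
    in_use: u64,
}

impl ByteBudget {
    pub fn new(capacity: u64) -> Self {
        Self { capacity, in_use: 0 }
    }

    pub fn available(&self) -> u64 {
        self.capacity - self.in_use
    }

    /// Fails fast when the request does not fit; it never blocks.
    pub fn try_reserve(&mut self, bytes: u64) -> Result<(), BudgetError> {
        let available = self.available();
        if bytes > available {
            return Err(BudgetError::PayloadBytesFull {
                requested: bytes,
                available,
            });
        }
        self.in_use += bytes;
        Ok(())
    }

    /// Returns bytes to the budget, e.g. after a cleanup acknowledgement.
    pub fn release(&mut self, bytes: u64) {
        self.in_use = self.in_use.saturating_sub(bytes);
    }
}

/// Mirrors the sequence in the test: reserve 6, reject 5, release 6, reserve 5.
pub fn demo() -> (
    Result<(), BudgetError>,
    Result<(), BudgetError>,
    Result<(), BudgetError>,
) {
    let mut budget = ByteBudget::new(10); // assumed capacity
    let first = budget.try_reserve(6);
    let rejected = budget.try_reserve(5);
    budget.release(6);
    let second = budget.try_reserve(5);
    (first, rejected, second)
}
```

Failing fast keeps the producer in control of its own retry policy, which appears to be the behavior the test name describes.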
diff --git a/fluxon_rs/fluxon_mq/src/consumer.rs b/fluxon_rs/fluxon_mq/src/consumer.rs
index c5e5fa4..6207a30 100644
--- a/fluxon_rs/fluxon_mq/src/consumer.rs
+++ b/fluxon_rs/fluxon_mq/src/consumer.rs
@@ -47,6 +47,7 @@ use crate::nonblocking_monitor::{
};
use crate::shutdown::ShutdownCtl;
use crate::LifecycleView;
+use crate::{BrokerEnvelope, BrokerFetchRequest, BrokerFetchedMessage, BrokerHandle};
use tracing::{debug, info, warn};
const NO_MESSAGE_WARN_INTERVAL: Duration = Duration::from_secs(30);
@@ -54,6 +55,10 @@ const PREFETCH_LATENCY_LOG_INTERVAL: Duration = NO_MESSAGE_WARN_INTERVAL;
const PREFETCH_LATENCY_WINDOW_SIZE: usize = 16;
const NONBLOCKING_QUEUE_WAIT_THRESHOLD: Duration = Duration::from_millis(500);
const DELETE_CALLBACK_WARN_INTERVAL: Duration = Duration::from_secs(1);
+const BROKER_CLEANUP_DELETE_RETRY_INITIAL_SLEEP: Duration = Duration::from_millis(50);
+const BROKER_CLEANUP_DELETE_RETRY_MAX_SLEEP: Duration = Duration::from_secs(5);
+const BROKER_CLEANUP_ACK_RETRY_INITIAL_SLEEP: Duration = Duration::from_millis(50);
+const BROKER_CLEANUP_ACK_RETRY_MAX_SLEEP: Duration = Duration::from_secs(5);
const COMMIT_WAIT_WARN_INTERVAL: Duration = Duration::from_secs(10);
const COMMIT_WAIT_BREAKDOWN_SUMMARY_THRESHOLD: Duration = Duration::from_millis(50);
const COMMIT_OFFSET_PUT_TIMEOUT: Duration = Duration::from_secs(10);
@@ -64,6 +69,9 @@ const PREFETCH_HANDLE_AWAIT_WARN_INTERVAL: Duration = Duration::from_secs(2);
const COMMIT_PROGRESS_RETENTION: usize = 1024;
const STALE_PRODUCER_PROBE_TOMB_TTL: Duration = Duration::from_secs(10);
const READY_TRACE_HISTORY_PER_PRODUCER: usize = 64;
+const PREFETCH_REFILL_BURST_MAX: usize = 128;
+const PREFETCH_NO_MESSAGE_RETRY_EMPTY_SLEEP: Duration = Duration::from_millis(1);
+const PREFETCH_NO_MESSAGE_RETRY_PARTIAL_SLEEP: Duration = Duration::from_millis(5);
static NEXT_CONSUMER_INSTANCE_ID: AtomicUsize = AtomicUsize::new(1);
fn map_prefix_scan_error(err: EtcdPrefixScanError) -> MpscError {
@@ -96,6 +104,21 @@ fn merge_offset_cache_monotonic(current: &mut HashMap, fetched: Has
}
}
+fn prefetch_refill_launch_budget(target: usize, current: usize) -> usize {
+ target
+ .saturating_sub(current)
+ .min(PREFETCH_REFILL_BURST_MAX)
+ .max(1)
+}
+
+fn prefetch_no_message_retry_sleep(current: usize) -> Duration {
+ if current == 0 {
+ PREFETCH_NO_MESSAGE_RETRY_EMPTY_SLEEP
+ } else {
+ PREFETCH_NO_MESSAGE_RETRY_PARTIAL_SLEEP
+ }
+}
+
fn prefetch_job_stage_name(stage: u8) -> &'static str {
match stage {
0 => "init",
@@ -296,9 +319,7 @@ impl CommitSequencer {
let mut current_blocker_begin_at = wait_begin;
loop {
if shutdown.is_closed() {
- return Err(MpscError::Internal(
- "consumer closed during consume-offset commit wait".to_string(),
- ));
+ return Err(MpscError::Closed);
}
let observed_next_seq = self.next_seq.load(Ordering::SeqCst);
if observed_next_seq == seq {
@@ -366,9 +387,7 @@ impl CommitSequencer {
);
}
_ = shutdown.wait_closed() => {
- return Err(MpscError::Internal(
- "consumer closed during consume-offset commit wait".to_string(),
- ));
+ return Err(MpscError::Closed);
}
}
}
@@ -759,9 +778,16 @@ struct ReadyPathLatencySample {
}
/// Application-level payload (type-erased) to avoid coupling with upper layers.
-pub trait MqPayload: Downcast + Send {}
+pub trait MqPayload: Downcast + Send {
+ fn attach_cleanup(&mut self, cleanup: PayloadCleanup) -> Result<(), PayloadCleanup> {
+ Err(cleanup)
+ }
+}
impl_downcast!(MqPayload);
+pub type PayloadCleanupFuture = Pin + Send + 'static>>;
+pub type PayloadCleanup = Box PayloadCleanupFuture + Send + 'static>;
+
/// Callback result: deliver a payload or indicate retry/non-retry.
pub enum PayloadResult {
Ok(Box<dyn MqPayload>),
@@ -813,10 +839,12 @@ pub struct MpscConsumer {
///
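/// Each queue element is the JoinHandle of one complete get operation; the
/// consumer only needs to pop it and await completion, which preserves submission order.
- inflight_rx: mpsc::Receiver<InflightItem>,
+ inflight_queue: Arc<Mutex<VecDeque<InflightItem>>>,
inflight_consume_notify: Arc<Notify>,
/// Control channel, used only to deliver control commands such as callback setup.
cmd_tx: mpsc::Sender<ConsumerCmd>,
+ /// Local mirror of payload callback for non-prefetch direct paths.
+ payload_cb: Option<PayloadCallback>,
/// delete callback invoked after successful consume-offset commit.
delete_cb: Option<DeleteCallback>,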
/// Shared shutdown controller used by higher layers to signal
@@ -1242,10 +1270,13 @@ impl MpscConsumer {
}
async fn recv_next_inflight_handle_with_idle_warn(&mut self) -> Option<InflightItem> {
- match self.inflight_rx.try_recv() {
- Ok(handle) => return Some(handle),
- Err(tokio::sync::mpsc::error::TryRecvError::Disconnected) => return None,
- Err(tokio::sync::mpsc::error::TryRecvError::Empty) => {}
+ if let Some(handle) = self
+ .inflight_queue
+ .lock()
+ .expect("inflight queue mutex poisoned")
+ .pop_front()
+ {
+ return Some(handle);
}
let idle_warn_sleep = tokio::time::sleep(NO_MESSAGE_WARN_INTERVAL);
@@ -1255,10 +1286,19 @@ impl MpscConsumer {
if self.shutdown.is_closed() {
return None;
}
+ let queue_notify = self.inflight_consume_notify.notified();
+ tokio::pin!(queue_notify);
tokio::select! {
biased;
- handle_opt = self.inflight_rx.recv() => {
- return handle_opt;
+ _ = &mut queue_notify => {
+ if let Some(handle) = self
+ .inflight_queue
+ .lock()
+ .expect("inflight queue mutex poisoned")
+ .pop_front()
+ {
+ return Some(handle);
+ }
}
_ = &mut idle_warn_sleep => {
let parent_mpmc_id = match self.category {
@@ -1399,7 +1439,7 @@ impl MpscConsumer {
let global_lease_id = chan_mgr.global_lease.id() as i64;
let (
cmd_tx,
- inflight_rx,
+ inflight_queue,
target_inflight,
inflight_queue_size,
inflight_consume_notify,
@@ -1438,9 +1478,10 @@ impl MpscConsumer {
chan_mgr,
target_inflight,
inflight_queue_size,
- inflight_rx,
+ inflight_queue,
cmd_tx,
inflight_consume_notify,
+ payload_cb: None,
delete_cb: None,
shutdown,
category,
@@ -1480,6 +1521,10 @@ impl MpscConsumer {
&self.consumer_idx
}
+ pub fn channel_capacity(&self) -> i64 {
+ self.chan_mgr.capacity()
+ }
+
pub fn lease_manager(&self) -> &LeaseManager {
&self.lease_manager
}
@@ -1570,6 +1615,7 @@ impl MpscConsumer {
/// This method is synchronous and only pushes a control command to the
/// internal actor via `try_send`.
pub fn set_payload_callback(&mut self, cb: PayloadCallback) {
+ self.payload_cb = Some(cb.clone());
let _ = self.cmd_tx.try_send(ConsumerCmd::SetCallback(cb));
}
@@ -1619,8 +1665,7 @@ impl MpscConsumer {
} else {
self.recv_next_inflight_handle_with_idle_warn().await
};
- let inflight_item =
- handle_opt.ok_or_else(|| MpscError::Internal("prefetch actor closed".to_string()))?;
+ let inflight_item = handle_opt.ok_or(MpscError::Closed)?;
debug!(
"[MpscConsumer get_with_payload] instance_id={} chan_id={} seq={} producer_id={} consume_offset={} inflight_queue_size_after_pop={}",
self.instance_id,
@@ -1893,6 +1938,48 @@ impl MpscConsumer {
.await
}
+ pub async fn get_with_payload_via_broker(
+ &mut self,
+ broker: &BrokerHandle,
+ ) -> Result<ConsumedPayload, MpscError> {
+ let cb = self
+ .payload_cb
+ .as_ref()
+ .ok_or_else(|| MpscError::Internal("payload callback not set".to_string()))?
+ .clone();
+ get_payload_via_broker(
+ broker,
+ self.chan_id,
+ self.consumer_idx.clone(),
+ cb,
+ self.delete_cb.clone(),
+ self.shutdown.clone(),
+ )
+ .await
+ }
+
+ pub async fn get_batch_with_payload_via_broker(
+ &mut self,
+ broker: &BrokerHandle,
+ batch_size: usize,
+ ) -> Result<Vec<ConsumedPayload>, MpscError> {
+ let cb = self
+ .payload_cb
+ .as_ref()
+ .ok_or_else(|| MpscError::Internal("payload callback not set".to_string()))?
+ .clone();
+ get_payload_batch_via_broker(
+ broker,
+ self.chan_id,
+ self.consumer_idx.clone(),
+ batch_size,
+ cb,
+ self.delete_cb.clone(),
+ self.shutdown.clone(),
+ )
+ .await
+ }
+
/// Runs the KV payload fetch stage with retry semantics.
/// Consume-offset commit is handled by the prefetch job.
async fn run_single_get(
@@ -1909,9 +1996,7 @@ impl MpscConsumer {
let mut payload_obj: Option<Box<dyn MqPayload>> = None;
loop {
if shutdown.is_closed() {
- return Err(MpscError::Internal(
- "consumer closed during get_with_payload".to_string(),
- ));
+ return Err(MpscError::Closed);
}
let msg_key = keys::backend_message_key_with_category(
chan_id,
@@ -1978,10 +2063,7 @@ impl MpscConsumer {
loop {
if shutdown.is_closed() {
- return Err(MpscError::Internal(format!(
- "consumer closed during consume-offset commit: seq={} producer_id={} consume_offset={}",
- seq, producer_id, consume_offset
- )));
+ return Err(MpscError::Closed);
}
attempts += 1;
@@ -1996,10 +2078,7 @@ impl MpscConsumer {
let put_res = tokio::select! {
biased;
_ = shutdown.wait_closed() => {
- return Err(MpscError::Internal(format!(
- "consumer closed during consume-offset commit: seq={} producer_id={} consume_offset={}",
- seq, producer_id, consume_offset
- )));
+ return Err(MpscError::Closed);
}
res = tokio::time::timeout(
COMMIT_OFFSET_PUT_TIMEOUT,
@@ -2073,10 +2152,7 @@ impl MpscConsumer {
tokio::select! {
biased;
_ = shutdown.wait_closed() => {
- return Err(MpscError::Internal(format!(
- "consumer closed during consume-offset retry sleep: seq={} producer_id={} consume_offset={}",
- seq, producer_id, consume_offset
- )));
+ return Err(MpscError::Closed);
}
_ = sleep(COMMIT_OFFSET_RETRY_SLEEP) => {}
}
@@ -2176,6 +2252,544 @@ impl MpscConsumer {
}
}
+async fn get_payload_via_broker(
+ broker: &BrokerHandle,
+ chan_id: i64,
+ consumer_id: String,
+ cb: PayloadCallback,
+ delete_cb: Option<DeleteCallback>,
+ shutdown: ShutdownCtl,
+) -> Result<ConsumedPayload, MpscError> {
+ let fetched = broker
+ .fetch_next(BrokerFetchRequest {
+ channel_id: chan_id,
+ consumer_id: consumer_id.clone(),
+ now_ms: now_ms(),
+ })
+ .await
+ .map_err(|e| {
+ MpscError::Internal(format!(
+ "broker fetch failed: chan_id={} consumer_id={} err={}",
+ chan_id, consumer_id, e
+ ))
+ })?
+ .ok_or(MpscError::NoMessage)?;
+ let envelope = fetched.envelope;
+ let reservation_id = envelope.reservation_id;
+ let producer_id = envelope.producer_id.clone();
+ let payload_key = envelope.payload_key.clone();
+ let mut requeue_guard =
+ BrokerInflightRequeueGuard::new(broker.clone(), chan_id, vec![reservation_id]);
+ let payload = match run_payload_callback(
+ chan_id,
+ cb,
+ producer_id.clone(),
+ payload_key,
+ shutdown.clone(),
+ )
+ .await
+ {
+ Ok((payload, _kv_get_latency_ns)) => payload,
+ Err(err) => {
+ requeue_guard.requeue_now().await;
+ return Err(err);
+ }
+ };
+
+ let commit_outcome = match broker.commit(chan_id, reservation_id, now_ms()).await {
+ Ok(outcome) => outcome,
+ Err(err) => {
+ requeue_guard.requeue_now().await;
+ return Err(MpscError::Internal(format!(
+ "broker commit failed: chan_id={} consumer_id={} reservation_id={} err={}",
+ chan_id, consumer_id, reservation_id, err
+ )));
+ }
+ };
+ requeue_guard.mark_completed(reservation_id);
+ if !commit_outcome.first_commit {
+ return Err(MpscError::Internal(format!(
+ "broker commit returned duplicate first_commit=false: chan_id={} consumer_id={} reservation_id={}",
+ chan_id, consumer_id, reservation_id
+ )));
+ }
+
+ if let Some(envelope) = commit_outcome.cleanup {
+ spawn_broker_cleanup(broker.clone(), chan_id, delete_cb.clone(), envelope);
+ }
+
+ Ok(ConsumedPayload {
+ producer_id,
+ payload,
+ nonblocking_hit: true,
+ })
+}
+
+struct BrokerBatchPayload {
+ producer_id: String,
+ payload: Box<dyn MqPayload>,
+}
+
+struct BrokerInflightRequeueGuard {
+ broker: BrokerHandle,
+ chan_id: i64,
+ reservation_ids: Vec<u64>,
+}
+
+impl BrokerInflightRequeueGuard {
+ fn new(broker: BrokerHandle, chan_id: i64, reservation_ids: Vec<u64>) -> Self {
+ Self {
+ broker,
+ chan_id,
+ reservation_ids,
+ }
+ }
+
+ fn extend<I>(&mut self, reservation_ids: I)
+ where
+ I: IntoIterator<Item = u64>,
+ {
+ self.reservation_ids.extend(reservation_ids);
+ }
+
+ fn mark_completed(&mut self, reservation_id: u64) {
+ if let Some(pos) = self
+ .reservation_ids
+ .iter()
+ .position(|current| *current == reservation_id)
+ {
+ self.reservation_ids.remove(pos);
+ }
+ }
+
+ async fn requeue_now(&mut self) {
+ let reservation_ids = std::mem::take(&mut self.reservation_ids);
+ requeue_pending_broker_inflight(&self.broker, self.chan_id, reservation_ids).await;
+ }
+}
+
+impl Drop for BrokerInflightRequeueGuard {
+ fn drop(&mut self) {
+ let reservation_ids = std::mem::take(&mut self.reservation_ids);
+ if reservation_ids.is_empty() {
+ return;
+ }
+ let broker = self.broker.clone();
+ let chan_id = self.chan_id;
+ tokio::spawn(async move {
+ requeue_pending_broker_inflight(&broker, chan_id, reservation_ids).await;
+ });
+ }
+}
+
+async fn get_payload_batch_via_broker(
+ broker: &BrokerHandle,
+ chan_id: i64,
+ consumer_id: String,
+ batch_size: usize,
+ cb: PayloadCallback,
+ delete_cb: Option<DeleteCallback>,
+ shutdown: ShutdownCtl,
+) -> Result<Vec<ConsumedPayload>, MpscError> {
+ if batch_size == 0 {
+ return Ok(Vec::new());
+ }
+
+ let first = broker
+ .fetch_next(BrokerFetchRequest {
+ channel_id: chan_id,
+ consumer_id: consumer_id.clone(),
+ now_ms: now_ms(),
+ })
+ .await
+ .map_err(|e| {
+ MpscError::Internal(format!(
+ "broker fetch failed: chan_id={} consumer_id={} err={}",
+ chan_id, consumer_id, e
+ ))
+ })?
+ .ok_or(MpscError::NoMessage)?;
+
+ let mut fetched = Vec::with_capacity(batch_size);
+ let mut requeue_guard = BrokerInflightRequeueGuard::new(
+ broker.clone(),
+ chan_id,
+ vec![first.envelope.reservation_id],
+ );
+ fetched.push(first);
+
+ let remaining = batch_size.saturating_sub(1);
+ if remaining > 0 {
+ let mut more = match broker
+ .fetch_batch_available(
+ BrokerFetchRequest {
+ channel_id: chan_id,
+ consumer_id: consumer_id.clone(),
+ now_ms: now_ms(),
+ },
+ remaining,
+ )
+ .await
+ {
+ Ok(batch) => {
+ requeue_guard.extend(
+ batch
+ .messages
+ .iter()
+ .map(|message| message.envelope.reservation_id),
+ );
+ batch.messages
+ }
+ Err(err) => {
+ requeue_guard.requeue_now().await;
+ return Err(MpscError::Internal(format!(
+ "broker batch fetch failed: chan_id={} consumer_id={} err={}",
+ chan_id, consumer_id, err
+ )));
+ }
+ };
+ fetched.append(&mut more);
+ }
+
+ match load_broker_payloads_commit_on_ready(
+ broker,
+ chan_id,
+ &consumer_id,
+ fetched,
+ cb,
+ delete_cb,
+ shutdown.clone(),
+ requeue_guard,
+ )
+ .await
+ {
+ Ok(payloads) => Ok(payloads
+ .into_iter()
+ .map(|item| ConsumedPayload {
+ producer_id: item.producer_id,
+ payload: item.payload,
+ nonblocking_hit: true,
+ })
+ .collect()),
+ Err(err) => Err(err),
+ }
+}
+
+async fn load_broker_payloads_commit_on_ready(
+ broker: &BrokerHandle,
+ chan_id: i64,
+ consumer_id: &str,
+ fetched: Vec<BrokerFetchedMessage>,
+ cb: PayloadCallback,
+ delete_cb: Option<DeleteCallback>,
+ shutdown: ShutdownCtl,
+ mut requeue_guard: BrokerInflightRequeueGuard,
+) -> Result<Vec<BrokerBatchPayload>, MpscError> {
+ let reservation_ids: Vec<u64> = fetched
+ .iter()
+ .map(|message| message.envelope.reservation_id)
+ .collect();
+ let mut join_set = JoinSet::new();
+
+ for message in fetched {
+ let envelope = message.envelope;
+ let reservation_id = envelope.reservation_id;
+ let producer_id = envelope.producer_id.clone();
+ let payload_key = envelope.payload_key.clone();
+ let cb = cb.clone();
+ let shutdown = shutdown.clone();
+ join_set.spawn(async move {
+ let result =
+ run_payload_callback(chan_id, cb, producer_id.clone(), payload_key, shutdown)
+ .await
+ .map(|(payload, _kv_get_latency_ns)| BrokerBatchPayload {
+ producer_id,
+ payload,
+ });
+ (reservation_id, result)
+ });
+ }
+
+ let mut payload_results: HashMap<u64, Result<BrokerBatchPayload, MpscError>> =
+ HashMap::with_capacity(reservation_ids.len());
+ let mut batch_load_failure: Option<MpscError> = None;
+ while let Some(join_res) = join_set.join_next().await {
+ match join_res {
+ Ok((reservation_id, Ok(payload))) => {
+ payload_results.insert(reservation_id, Ok(payload));
+ }
+ Ok((reservation_id, Err(err))) => {
+ payload_results.insert(reservation_id, Err(err));
+ join_set.abort_all();
+ break;
+ }
+ Err(err) => {
+ join_set.abort_all();
+ batch_load_failure = Some(MpscError::JoinError(err));
+ break;
+ }
+ }
+ }
+
+ let mut committed_payloads = Vec::with_capacity(reservation_ids.len());
+ let mut remaining_reservation_ids = Vec::new();
+ let mut stop_error = batch_load_failure;
+ let mut stop_after_current = stop_error.is_some();
+
+ for reservation_id in reservation_ids {
+ if stop_after_current {
+ remaining_reservation_ids.push(reservation_id);
+ continue;
+ }
+
+ let Some(payload_result) = payload_results.remove(&reservation_id) else {
+ stop_error = Some(MpscError::Internal(format!(
+ "broker batch payload load canceled before ordered commit: chan_id={} consumer_id={} reservation_id={}",
+ chan_id, consumer_id, reservation_id
+ )));
+ stop_after_current = true;
+ remaining_reservation_ids.push(reservation_id);
+ continue;
+ };
+
+ let payload = match payload_result {
+ Ok(payload) => payload,
+ Err(err) => {
+ stop_error = Some(err);
+ stop_after_current = true;
+ remaining_reservation_ids.push(reservation_id);
+ continue;
+ }
+ };
+
+ let commit_outcome = match broker.commit(chan_id, reservation_id, now_ms()).await {
+ Ok(outcome) => outcome,
+ Err(err) => {
+ stop_error = Some(MpscError::Internal(format!(
+ "broker commit failed during batch consume: chan_id={} consumer_id={} reservation_id={} err={}",
+ chan_id, consumer_id, reservation_id, err
+ )));
+ stop_after_current = true;
+ remaining_reservation_ids.push(reservation_id);
+ continue;
+ }
+ };
+ requeue_guard.mark_completed(reservation_id);
+ if !commit_outcome.first_commit {
+ stop_error = Some(MpscError::Internal(format!(
+ "broker commit returned duplicate during batch consume: chan_id={} consumer_id={} reservation_id={}",
+ chan_id, consumer_id, reservation_id
+ )));
+ stop_after_current = true;
+ remaining_reservation_ids.push(reservation_id);
+ continue;
+ }
+ if let Some(envelope) = commit_outcome.cleanup {
+ spawn_broker_cleanup(broker.clone(), chan_id, delete_cb.clone(), envelope);
+ }
+
+ committed_payloads.push(payload);
+ }
+
+ if !remaining_reservation_ids.is_empty() {
+ requeue_guard.requeue_now().await;
+ }
+
+ if !committed_payloads.is_empty() {
+ return Ok(committed_payloads);
+ }
+
+ Err(stop_error.unwrap_or_else(|| {
+ MpscError::Internal(format!(
+ "broker batch consume stopped without committed payloads: chan_id={} consumer_id={}",
+ chan_id, consumer_id
+ ))
+ }))
+}
+
+async fn run_payload_callback(
+ chan_id: i64,
+ cb: PayloadCallback,
+ producer_id: String,
+ payload_key: String,
+ shutdown: ShutdownCtl,
+) -> Result<(Box<dyn MqPayload>, u128), MpscError> {
+ use tokio::time::sleep;
+
+ let kv_get_begin = Instant::now();
+ loop {
+ if shutdown.is_closed() {
+ return Err(MpscError::Closed);
+ }
+ let f = cb.clone();
+ let producer_for_closure = producer_id.clone();
+ let key_for_closure = payload_key.clone();
+ let res = (f)(producer_for_closure, key_for_closure).await;
+
+ match res {
+ PayloadResult::Ok(payload) => {
+ return Ok((payload, kv_get_begin.elapsed().as_nanos()));
+ }
+ PayloadResult::Retryable(msg) => {
+ warn!(
+ "[MpscConsumer chan_id={}] get payload retryable: {}",
+ chan_id, msg
+ );
+ sleep(Duration::from_millis(50)).await;
+ }
+ PayloadResult::NonRetryable(msg) => {
+ return Err(MpscError::GetPayloadNonRetryable { message: msg });
+ }
+ }
+ }
+}
+
+async fn run_delete_callback_until_success(
+ chan_id: i64,
+ delete_cb: &DeleteCallback,
+ payload_key: String,
+) {
+ use tokio::time::sleep;
+
+ let mut retry_sleep = BROKER_CLEANUP_DELETE_RETRY_INITIAL_SLEEP;
+ loop {
+ let f = delete_cb.clone();
+ let key_clone = payload_key.clone();
+ let delete_begin = Instant::now();
+ let delete_fut = (f)(key_clone.clone());
+ tokio::pin!(delete_fut);
+ let res = loop {
+ tokio::select! {
+ res = &mut delete_fut => {
+ break res;
+ }
+ _ = sleep(DELETE_CALLBACK_WARN_INTERVAL) => {
+ warn!(
+ "[MpscConsumer chan_id={}] async broker delete callback still pending: key={} waited_ms={}",
+ chan_id,
+ key_clone,
+ delete_begin.elapsed().as_millis(),
+ );
+ }
+ }
+ };
+ match res {
+ DeleteResult::Ok => return,
+ DeleteResult::Retryable(msg) => {
+ warn!(
+ "[MpscConsumer chan_id={}] async broker delete payload retryable; retry_after_ms={}: {}",
+ chan_id,
+ retry_sleep.as_millis(),
+ msg
+ );
+ }
+ DeleteResult::NonRetryable(msg) => {
+ warn!(
+ "[MpscConsumer chan_id={}] async broker delete payload non-retryable; keep retrying to preserve broker byte budget; retry_after_ms={}: {}",
+ chan_id,
+ retry_sleep.as_millis(),
+ msg
+ );
+ }
+ }
+ sleep(retry_sleep).await;
+ retry_sleep = retry_sleep
+ .saturating_mul(2)
+ .min(BROKER_CLEANUP_DELETE_RETRY_MAX_SLEEP);
+ }
+}
+
+async fn run_broker_cleanup_ack_until_success(
+ broker: BrokerHandle,
+ chan_id: i64,
+ reservation_id: u64,
+) {
+ use tokio::time::sleep;
+
+ let mut retry_sleep = BROKER_CLEANUP_ACK_RETRY_INITIAL_SLEEP;
+ loop {
+ match broker.cleanup_ack(chan_id, reservation_id).await {
+ Ok(()) => return,
+ Err(err) => {
+ if broker_cleanup_ack_error_is_terminal(&err) {
+ warn!(
+ "async broker cleanup ack stopped after terminal broker error: chan_id={} reservation_id={} err={}",
+ chan_id,
+ reservation_id,
+ err
+ );
+ return;
+ }
+ warn!(
+ "async broker cleanup ack failed; retry_after_ms={}: chan_id={} reservation_id={} err={}",
+ retry_sleep.as_millis(),
+ chan_id,
+ reservation_id,
+ err
+ );
+ }
+ }
+ sleep(retry_sleep).await;
+ retry_sleep = retry_sleep
+ .saturating_mul(2)
+ .min(BROKER_CLEANUP_ACK_RETRY_MAX_SLEEP);
+ }
+}
+
+fn broker_cleanup_ack_error_is_terminal(err: &crate::BrokerError) -> bool {
+ match err {
+ crate::BrokerError::ActorClosed | crate::BrokerError::ChannelNotFound(_) => true,
+ crate::BrokerError::Rpc(message) => {
+ message.contains("System shutdown")
+ || message.contains("actor closed")
+ || message.contains("channel not found")
+ }
+ _ => false,
+ }
+}
+
+fn spawn_broker_cleanup(
+ broker: BrokerHandle,
+ chan_id: i64,
+ delete_cb: Option<DeleteCallback>,
+ envelope: BrokerEnvelope,
+) {
+ tokio::spawn(async move {
+ let reservation_id = envelope.reservation_id;
+ if let Some(delete_cb) = delete_cb.as_ref() {
+ run_delete_callback_until_success(chan_id, delete_cb, envelope.payload_key.clone())
+ .await;
+ }
+ run_broker_cleanup_ack_until_success(broker, chan_id, reservation_id).await;
+ });
+}
+
+async fn requeue_pending_broker_inflight(
+ broker: &BrokerHandle,
+ chan_id: i64,
+ reservation_ids: Vec<u64>,
+) {
+ if reservation_ids.is_empty() {
+ return;
+ }
+ if let Err(err) = broker
+ .requeue_inflight_batch(chan_id, reservation_ids)
+ .await
+ {
+ warn!(
+ "best-effort broker batch requeue failed: chan_id={} err={}",
+ chan_id, err
+ );
+ }
+}
+
+fn now_ms() -> i64 {
+ SystemTime::now()
+ .duration_since(UNIX_EPOCH)
+ .expect("system clock is before UNIX_EPOCH")
+ .as_millis() as i64
+}
+
/// MPSC consumer actor that holds the complete state: selector, offsets, lease, and so on.
/// Visible only inside the mpsc module and transparent to upper-layer crates.
pub struct ConsumedPayload {
@@ -2454,9 +3068,10 @@ struct ConsumerActor {
producer_selector: ProducerSelectorForConsumer,
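/// Payload callback, set by the upper layer via ConsumerCmd::SetCallback.
payload_cb: Option<PayloadCallback>,
- /// Per-producer hint for the "next offset" that has been prefetched but not yet
- /// durably consumed; it avoids prefetching the same message again while the
- /// etcd consume offset has not been updated.
+ /// Per-producer local reservation cursor (the next offset to prefetch).
+ ///
+ /// This cursor may run ahead of the etcd consume offset, because the actor
+ /// issues several prefetches in a row before the consume offset is persisted.
prefetch_offset_map: HashMap<String, i64>,
/// Locally cached produce offset (from etcd), refreshed only when there is no
/// message or during initialization; otherwise select_next_message only reads this cache.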
@@ -2479,7 +3094,7 @@ struct ConsumerActor {
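/// Prefetch queue sender exposed to the consumer.
///
/// Each queue element is the JoinHandle of one complete get operation.
- inflight_tx: mpsc::Sender<InflightItem>,
+ inflight_queue: Arc<Mutex<VecDeque<InflightItem>>>,
/// inflight consume notify
inflight_consume_notify: Arc<Notify>,
/// Shared prefetch-window target.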
@@ -2597,10 +3212,12 @@ impl ConsumerActor {
}
fn cached_next_hint(&self, producer_id: &str) -> i64 {
+ let committed_next = self.cached_consume_offset(producer_id);
self.prefetch_offset_map
.get(producer_id)
.copied()
- .unwrap_or_else(|| self.cached_consume_offset(producer_id))
+ .map(|hint| hint.max(committed_next))
+ .unwrap_or(committed_next)
}
fn cached_produce_offset(&self, producer_id: &str) -> i64 {
@@ -2616,6 +3233,12 @@ impl ConsumerActor {
|| self.prefetch_offset_map.contains_key(producer_id)
}
+ fn producer_has_prefetch_room(&self, producer_id: &str) -> bool {
+ let visible_tail = self.cached_produce_offset(producer_id);
+ let next_hint = self.cached_next_hint(producer_id);
+ next_hint <= visible_tail
+ }
+
fn refresh_ready_state_from_local(&mut self, producer_id: &str) -> bool {
let ready_before = self.ready_producers.contains(producer_id);
let stale_before = self.stale_no_room_producers.contains(producer_id);
@@ -2626,8 +3249,7 @@ impl ConsumerActor {
return ready_before || stale_before;
}
- let has_room =
- self.cached_produce_offset(producer_id) >= self.cached_next_hint(producer_id);
+ let has_room = self.producer_has_prefetch_room(producer_id);
if has_room {
self.ready_producers.insert(producer_id.to_string());
self.stale_no_room_producers.remove(producer_id);
@@ -2878,7 +3500,7 @@ impl ConsumerActor {
global_lease_id: i64,
) -> (
mpsc::Sender<ConsumerCmd>,
- mpsc::Receiver<InflightItem>,
+ Arc<Mutex<VecDeque<InflightItem>>>,
Arc<AtomicUsize>,
Arc<AtomicUsize>,
Arc<Notify>,
@@ -2889,7 +3511,7 @@ impl ConsumerActor {
let (cmd_tx, cmd_rx) = mpsc::channel(8);
let (meta_tx, meta_rx) = mpsc::channel(8);
let (produce_offset_tx, produce_offset_rx) = mpsc::channel(128);
- let (inflight_tx, inflight_rx) = mpsc::channel(32);
+ let inflight_queue = Arc::new(Mutex::new(VecDeque::new()));
let target_inflight = Arc::new(AtomicUsize::new(0));
let inflight_queue_size = Arc::new(AtomicUsize::new(0));
let inflight_consume_notify = Arc::new(Notify::new());
@@ -2911,7 +3533,7 @@ impl ConsumerActor {
ready_producers: HashSet::new(),
ready_trace_history: HashMap::new(),
stale_no_room_producers: HashSet::new(),
- inflight_tx,
+ inflight_queue: inflight_queue.clone(),
inflight_consume_notify: inflight_consume_notify.clone(),
target_inflight: target_inflight.clone(),
inflight_queue_size: inflight_queue_size.clone(),
@@ -2960,7 +3582,7 @@ impl ConsumerActor {
(
cmd_tx,
- inflight_rx,
+ inflight_queue,
target_inflight,
inflight_queue_size,
inflight_consume_notify,
@@ -3118,7 +3740,7 @@ impl ConsumerActor {
}
// Do not poll `prefetch_tick()` as a `tokio::select!` branch. If the
- // branch is canceled while `inflight_tx.send(...)` is pending, the
+ // branch is canceled while queueing a new inflight item is pending, the
// oneshot receiver inside `InflightItem` is dropped after the
// prefetch job has already started, which strands commit ordering.
self.drain_pending_actor_inputs(&mut rx, &mut meta_rx, &mut produce_offset_rx);
@@ -3163,22 +3785,35 @@ impl ConsumerActor {
return;
}
- for _ in 0..1 {
+ let initial_queue_size = self.inflight_queue_size.load(Ordering::SeqCst);
+ let burst_limit = prefetch_refill_launch_budget(target, initial_queue_size);
+ let mut launched = 0usize;
+ loop {
let current = self.inflight_queue_size.load(Ordering::SeqCst);
if current >= target {
- self.wait_actor_inputs_or_inflight_consume(rx, meta_rx, produce_offset_rx)
- .await;
+ if launched == 0 {
+ self.wait_actor_inputs_or_inflight_consume(rx, meta_rx, produce_offset_rx)
+ .await;
+ }
+ return;
+ }
+ if launched >= burst_limit {
return;
}
match self.try_prefetch_one().await {
Ok(()) => {
+ launched += 1;
self.prefetch_no_message_next_warn_at =
tokio::time::Instant::now() + NO_MESSAGE_WARN_INTERVAL;
self.maybe_log_select_next_message_stats(false);
}
Err(MpscError::NoMessage) => {
self.select_next_message_stats.record_no_message_backoff();
+ if launched > 0 {
+ self.maybe_log_select_next_message_stats(false);
+ return;
+ }
let now = tokio::time::Instant::now();
if now >= self.prefetch_no_message_next_warn_at {
let parent_mpmc_id = match self.category {
@@ -3195,7 +3830,13 @@ impl ConsumerActor {
self.prefetch_no_message_next_warn_at = now + NO_MESSAGE_WARN_INTERVAL;
}
self.maybe_log_select_next_message_stats(false);
- self.wait_actor_inputs(rx, meta_rx, produce_offset_rx).await;
+ self.wait_actor_inputs_or_timeout(
+ rx,
+ meta_rx,
+ produce_offset_rx,
+ prefetch_no_message_retry_sleep(current),
+ )
+ .await;
return;
}
Err(other) => {
@@ -3213,6 +3854,7 @@ impl ConsumerActor {
Duration::from_millis(100),
)
.await;
+ return;
}
}
}
@@ -3223,7 +3865,7 @@ impl ConsumerActor {
/// Returns `MpscError::NoMessage`.
async fn try_prefetch_one(&mut self) -> Result<(), MpscError> {
if self.shutdown.is_closed() {
- return Err(MpscError::Internal("consumer closed".to_string()));
+ return Err(MpscError::Closed);
}
let cb = self
.payload_cb
@@ -3305,16 +3947,17 @@ impl ConsumerActor {
queue_size_after_inc,
self.target_inflight.load(Ordering::SeqCst),
);
- self.inflight_tx
- .send(InflightItem {
+ self.inflight_queue
+ .lock()
+ .expect("inflight queue mutex poisoned")
+ .push_back(InflightItem {
seq,
producer_id: producer_id_for_queue,
consume_offset,
ready_path_trace,
rx,
- })
- .await
- .map_err(|_| MpscError::Internal("prefetch queue closed".to_string()))?;
+ });
+ self.inflight_consume_notify.notify_one();
debug!(
"[MpscConsumer enqueue] instance_id={} chan_id={} seq={} queue_send_completed queue_size_now={}",
self.instance_id,
@@ -3445,31 +4088,66 @@ impl ConsumerActor {
return Err(MpscError::NoMessage);
}
- self.producer_selector.moveon_round_robin();
- let producer_id = self
- .producer_selector
- .current_producer_idx()
- .ok_or(MpscError::NoMessage)?
- .to_string();
+ let ready_count = self.ready_producers.len();
+ for _ in 0..ready_count {
+ self.producer_selector.moveon_round_robin();
+ let producer_id = self
+ .producer_selector
+ .current_producer_idx()
+ .ok_or(MpscError::NoMessage)?
+ .to_string();
- let prod_off = self.cached_produce_offset(&producer_id);
- let next_hint = self.cached_next_hint(&producer_id);
+ let next_hint = self.cached_next_hint(&producer_id);
- if prod_off < next_hint {
+ if !self.producer_has_prefetch_room(&producer_id) {
+ if self.refresh_ready_state_from_local(&producer_id) {
+ self.rebuild_ready_selector();
+ }
+ continue;
+ }
+
+ let actual_offset = next_hint;
+ self.prefetch_offset_map
+ .insert(producer_id.clone(), actual_offset + 1);
if self.refresh_ready_state_from_local(&producer_id) {
self.rebuild_ready_selector();
}
- return Err(MpscError::NoMessage);
+
+ return Ok((producer_id, actual_offset));
}
- let actual_offset = next_hint;
- self.prefetch_offset_map
- .insert(producer_id.clone(), actual_offset + 1);
- if self.refresh_ready_state_from_local(&producer_id) {
- self.rebuild_ready_selector();
+ if !self.stale_no_room_producers.is_empty() {
+ self.probe_stale_no_room_producers_timed(trace).await?;
+ if !self.ready_producers.is_empty() {
+ let retry_ready_count = self.ready_producers.len();
+ for _ in 0..retry_ready_count {
+ self.producer_selector.moveon_round_robin();
+ let producer_id = self
+ .producer_selector
+ .current_producer_idx()
+ .ok_or(MpscError::NoMessage)?
+ .to_string();
+
+ let next_hint = self.cached_next_hint(&producer_id);
+ if !self.producer_has_prefetch_room(&producer_id) {
+ if self.refresh_ready_state_from_local(&producer_id) {
+ self.rebuild_ready_selector();
+ }
+ continue;
+ }
+
+ let actual_offset = next_hint;
+ self.prefetch_offset_map
+ .insert(producer_id.clone(), actual_offset + 1);
+ if self.refresh_ready_state_from_local(&producer_id) {
+ self.rebuild_ready_selector();
+ }
+ return Ok((producer_id, actual_offset));
+ }
+ }
}
- Ok((producer_id, actual_offset))
+ Err(MpscError::NoMessage)
}
async fn refresh_offsets_from_etcd_timed(
@@ -3623,8 +4301,22 @@ impl ConsumerActor {
#[cfg(test)]
mod tests {
- use super::{merge_monotonic_offset, merge_offset_cache_monotonic};
+ use super::{
+ get_payload_batch_via_broker, get_payload_via_broker, merge_monotonic_offset,
+ merge_offset_cache_monotonic, MqPayload, PayloadCallback, PayloadResult,
+ };
+ use crate::{
+ keys::MqCategory, BrokerChannelConfig, BrokerFetchRequest, BrokerHandle,
+ BrokerReserveRequest,
+ };
use std::collections::HashMap;
+ use std::sync::Arc;
+ use std::time::Duration;
+ use tokio::sync::Notify;
+
+ struct TestPayload;
+
+ impl MqPayload for TestPayload {}
#[test]
fn merge_monotonic_offset_keeps_cached_when_probe_missing() {
@@ -3654,6 +4346,276 @@ mod tests {
assert_eq!(current.get("producer_b"), Some(&41));
assert_eq!(current.get("producer_c"), Some(&7));
}
+
+ #[test]
+ fn visible_tail_does_not_allow_prefetch_past_last_published_offset() {
+ let visible_tail = 0;
+ let next_visible = 0;
+ let next_not_yet_published = 1;
+
+ assert!(next_visible <= visible_tail);
+ assert!(next_not_yet_published > visible_tail);
+ }
+
+ async fn fetch_next_for_test(
+ broker: &BrokerHandle,
+ channel_id: i64,
+ consumer_id: &str,
+ now_ms: i64,
+ ) -> crate::BrokerFetchedMessage {
+ tokio::time::timeout(
+ Duration::from_secs(1),
+ broker.fetch_next(BrokerFetchRequest {
+ channel_id,
+ consumer_id: consumer_id.to_string(),
+ now_ms,
+ }),
+ )
+ .await
+ .expect("timed out waiting for broker redelivery")
+ .unwrap()
+ .unwrap()
+ }
+
+ #[tokio::test]
+ async fn broker_single_consume_timeout_requeues_reserved_message() {
+ let broker = BrokerHandle::new_local_for_test(32);
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 72,
+ capacity: 2,
+ })
+ .await
+ .unwrap();
+
+ let reserved = broker
+ .reserve(BrokerReserveRequest {
+ channel_id: 72,
+ producer_id: "p0".to_string(),
+ category: MqCategory::Mpsc,
+ payload_bytes: 1,
+ now_ms: 10,
+ })
+ .await
+ .unwrap();
+ broker
+ .publish(72, reserved.envelope.reservation_id, 20)
+ .await
+ .unwrap();
+
+ let callback_started = Arc::new(Notify::new());
+ let cb_started_for_callback = callback_started.clone();
+ let cb: PayloadCallback = Arc::new(move |_producer_id: String, _key: String| {
+ let cb_started_for_callback = cb_started_for_callback.clone();
+ Box::pin(async move {
+ cb_started_for_callback.notify_one();
+ tokio::time::sleep(Duration::from_millis(50)).await;
+ PayloadResult::Ok(Box::new(TestPayload))
+ })
+ });
+
+ let mut consume = Box::pin(get_payload_via_broker(
+ &broker,
+ 72,
+ "c0".to_string(),
+ cb,
+ None,
+ crate::ShutdownCtl::new(),
+ ));
+ tokio::select! {
+ _ = callback_started.notified() => {}
+ result = &mut consume => panic!("consume completed before timeout setup: {:?}", result.err()),
+ }
+ assert!(tokio::time::timeout(Duration::from_millis(5), &mut consume)
+ .await
+ .is_err());
+ drop(consume);
+
+ let redelivered = fetch_next_for_test(&broker, 72, "c1", 30).await;
+ assert_eq!(
+ redelivered.envelope.reservation_id,
+ reserved.envelope.reservation_id
+ );
+ }
+
+ #[tokio::test]
+ async fn broker_batch_consume_timeout_requeues_reserved_messages_in_order() {
+ let broker = BrokerHandle::new_local_for_test(32);
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 73,
+ capacity: 2,
+ })
+ .await
+ .unwrap();
+
+ let first = broker
+ .reserve(BrokerReserveRequest {
+ channel_id: 73,
+ producer_id: "p0".to_string(),
+ category: MqCategory::Mpsc,
+ payload_bytes: 1,
+ now_ms: 10,
+ })
+ .await
+ .unwrap();
+ let second = broker
+ .reserve(BrokerReserveRequest {
+ channel_id: 73,
+ producer_id: "p0".to_string(),
+ category: MqCategory::Mpsc,
+ payload_bytes: 1,
+ now_ms: 11,
+ })
+ .await
+ .unwrap();
+ broker
+ .publish(73, first.envelope.reservation_id, 20)
+ .await
+ .unwrap();
+ broker
+ .publish(73, second.envelope.reservation_id, 21)
+ .await
+ .unwrap();
+
+ let callback_started = Arc::new(Notify::new());
+ let cb_started_for_callback = callback_started.clone();
+ let cb: PayloadCallback = Arc::new(move |_producer_id: String, _key: String| {
+ let cb_started_for_callback = cb_started_for_callback.clone();
+ Box::pin(async move {
+ cb_started_for_callback.notify_one();
+ tokio::time::sleep(Duration::from_millis(50)).await;
+ PayloadResult::Ok(Box::new(TestPayload))
+ })
+ });
+
+ let mut consume = Box::pin(get_payload_batch_via_broker(
+ &broker,
+ 73,
+ "c0".to_string(),
+ 2,
+ cb,
+ None,
+ crate::ShutdownCtl::new(),
+ ));
+ tokio::select! {
+ _ = callback_started.notified() => {}
+ result = &mut consume => panic!("batch consume completed before timeout setup: {:?}", result.err()),
+ }
+ assert!(tokio::time::timeout(Duration::from_millis(5), &mut consume)
+ .await
+ .is_err());
+ drop(consume);
+
+ let redelivered_first = fetch_next_for_test(&broker, 73, "c1", 30).await;
+ let redelivered_second = fetch_next_for_test(&broker, 73, "c1", 31).await;
+ assert_eq!(
+ redelivered_first.envelope.reservation_id,
+ first.envelope.reservation_id
+ );
+ assert_eq!(
+ redelivered_second.envelope.reservation_id,
+ second.envelope.reservation_id
+ );
+ }
+
+ #[tokio::test]
+ async fn broker_batch_consume_requeues_without_out_of_order_commit() {
+ let broker = BrokerHandle::new_local_for_test(32);
+ broker
+ .upsert_channel(BrokerChannelConfig {
+ channel_id: 71,
+ capacity: 2,
+ })
+ .await
+ .unwrap();
+
+ let first = broker
+ .reserve(BrokerReserveRequest {
+ channel_id: 71,
+ producer_id: "p0".to_string(),
+ category: MqCategory::Mpsc,
+ payload_bytes: 1,
+ now_ms: 10,
+ })
+ .await
+ .unwrap();
+ let second = broker
+ .reserve(BrokerReserveRequest {
+ channel_id: 71,
+ producer_id: "p0".to_string(),
+ category: MqCategory::Mpsc,
+ payload_bytes: 1,
+ now_ms: 11,
+ })
+ .await
+ .unwrap();
+ broker
+ .publish(71, first.envelope.reservation_id, 20)
+ .await
+ .unwrap();
+ broker
+ .publish(71, second.envelope.reservation_id, 21)
+ .await
+ .unwrap();
+
+ let first_key = first.envelope.payload_key.clone();
+ let cb: PayloadCallback = Arc::new(move |_producer_id: String, key: String| {
+ let first_key = first_key.clone();
+ Box::pin(async move {
+ if key == first_key {
+ tokio::time::sleep(Duration::from_millis(50)).await;
+ PayloadResult::NonRetryable("first payload failed".to_string())
+ } else {
+ PayloadResult::Ok(Box::new(TestPayload))
+ }
+ })
+ });
+
+ let err = get_payload_batch_via_broker(
+ &broker,
+ 71,
+ "c0".to_string(),
+ 2,
+ cb,
+ None,
+ crate::ShutdownCtl::new(),
+ )
+ .await
+ .err()
+ .expect("batch consume should fail when the first payload callback fails");
+ assert!(matches!(
+ err,
+ crate::MpscError::GetPayloadNonRetryable { .. }
+ ));
+
+ let redelivered_first = broker
+ .fetch_next(crate::BrokerFetchRequest {
+ channel_id: 71,
+ consumer_id: "c1".to_string(),
+ now_ms: 30,
+ })
+ .await
+ .unwrap()
+ .unwrap();
+ let redelivered_second = broker
+ .fetch_next(crate::BrokerFetchRequest {
+ channel_id: 71,
+ consumer_id: "c1".to_string(),
+ now_ms: 31,
+ })
+ .await
+ .unwrap()
+ .unwrap();
+ assert_eq!(
+ redelivered_first.envelope.reservation_id,
+ first.envelope.reservation_id
+ );
+ assert_eq!(
+ redelivered_second.envelope.reservation_id,
+ second.envelope.reservation_id
+ );
+ }
}
/// Producer selector for consumer-side weighted round robin.
diff --git a/fluxon_rs/fluxon_mq/src/create.rs b/fluxon_rs/fluxon_mq/src/create.rs
index 4fbb753..79da7ce 100644
--- a/fluxon_rs/fluxon_mq/src/create.rs
+++ b/fluxon_rs/fluxon_mq/src/create.rs
@@ -311,6 +311,7 @@ pub async fn create_mpsc_channel(
global_lease: global_lease_handle,
global_long_lease: global_long_lease_handle,
payload_lease: payload_lease_handle,
+ capacity: cfg.capacity,
etcd_client,
})
}
@@ -534,6 +535,7 @@ impl ChanManager {
global_lease,
global_long_lease,
payload_lease,
+ capacity: meta.capacity,
etcd_client: client,
})
}
diff --git a/fluxon_rs/fluxon_mq/src/error.rs b/fluxon_rs/fluxon_mq/src/error.rs
index b4f1171..9d25d39 100755
--- a/fluxon_rs/fluxon_mq/src/error.rs
+++ b/fluxon_rs/fluxon_mq/src/error.rs
@@ -12,12 +12,24 @@ pub enum MpscError {
#[error("no new message available")]
NoMessage,
+ #[error("consumer is closed")]
+ Closed,
+
#[error("etcd error: {0}")]
Etcd(#[from] etcd_client::Error),
#[error("spawn blocking task failed: {0}")]
JoinError(#[from] tokio::task::JoinError),
+ #[error(
+ "message buffer full: channel_id={channel_id} capacity={capacity} used_slots={used_slots}"
+ )]
+ MessageBufferFull {
+ channel_id: i64,
+ capacity: i64,
+ used_slots: i64,
+ },
+
#[error("put payload returned non-retryable error (code=2)")]
PutPayloadNonRetryable,
@@ -61,10 +73,12 @@ impl MpscError {
match self {
 // Retryable errors
MpscError::NoMessage => 1000,
+ MpscError::Closed => 1001,
 // etcd / system
MpscError::Etcd(_) => 2000,
MpscError::JoinError(_) => 2001,
+ MpscError::MessageBufferFull { .. } => 2002,
// put payload
MpscError::PutPayloadNonRetryable => 3000,
diff --git a/fluxon_rs/fluxon_mq/src/keys.rs b/fluxon_rs/fluxon_mq/src/keys.rs
index 1d55754..e2c8a4e 100644
--- a/fluxon_rs/fluxon_mq/src/keys.rs
+++ b/fluxon_rs/fluxon_mq/src/keys.rs
@@ -1,9 +1,13 @@
use std::fmt::Write as _;
+use bitcode::{Decode, Encode};
+use serde::{Deserialize, Serialize};
+
/// MQ category for key generation.
-#[derive(Clone, Copy, Debug)]
+#[derive(Clone, Copy, Debug, Default, Serialize, Deserialize, Encode, Decode)]
pub enum MqCategory {
/// Standalone MPSC usage
+ #[default]
Mpsc,
/// MPSC acts as a submodule under an MPMC producer; carries parent mpmc id only.
/// The producer member id is the same as `producer_idx` passed alongside and
diff --git a/fluxon_rs/fluxon_mq/src/lib.rs b/fluxon_rs/fluxon_mq/src/lib.rs
index 3dded48..70b2024 100644
--- a/fluxon_rs/fluxon_mq/src/lib.rs
+++ b/fluxon_rs/fluxon_mq/src/lib.rs
@@ -1,3 +1,4 @@
+pub mod broker;
pub mod consumer;
pub mod create;
pub mod error;
@@ -10,6 +11,7 @@ pub mod nonblocking_monitor;
pub mod producer;
pub mod shutdown;
+pub use crate::broker::*;
pub use crate::consumer::DeleteResult;
pub use crate::consumer::MpscConsumer;
pub use crate::create::{create_mpsc_channel, ChanCreateConfig};
diff --git a/fluxon_rs/fluxon_mq/src/manager.rs b/fluxon_rs/fluxon_mq/src/manager.rs
index b6d581a..fb5ffdb 100644
--- a/fluxon_rs/fluxon_mq/src/manager.rs
+++ b/fluxon_rs/fluxon_mq/src/manager.rs
@@ -206,6 +206,7 @@ pub struct ChanManager {
 /// decides the payload lease id and registers the corresponding
 /// kvclient keepalive through LeaseManager; a valid handle is always held here.
pub payload_lease: GeneralLease,
+ pub(crate) capacity: i64,
pub(crate) etcd_client: etcd::Client,
}
@@ -227,4 +228,8 @@ impl ChanManager {
pub fn member_lease_id(&self) -> i64 {
self.member_lease.id() as i64
}
+
+ pub fn capacity(&self) -> i64 {
+ self.capacity
+ }
}
diff --git a/fluxon_rs/fluxon_mq/src/producer.rs b/fluxon_rs/fluxon_mq/src/producer.rs
index fb4e4ea..72082c2 100644
--- a/fluxon_rs/fluxon_mq/src/producer.rs
+++ b/fluxon_rs/fluxon_mq/src/producer.rs
@@ -28,10 +28,15 @@ use crate::nonblocking_monitor::{
};
use crate::shutdown::ShutdownCtl;
use crate::LifecycleView;
+use crate::{BrokerError, BrokerHandle, BrokerReserveRequest};
use tokio::sync::watch;
use tracing::warn;
const PRODUCE_OFFSET_ETCD_SLOW_WARN_THRESHOLD: Duration = Duration::from_secs(1);
+const BROKER_BACKPRESSURE_INITIAL_SLEEP_MS: u64 = 2;
+const BROKER_BACKPRESSURE_MAX_SLEEP_MS: u64 = 50;
+const BROKER_BACKPRESSURE_JITTER_MS: u64 = 7;
+const BROKER_BACKPRESSURE_WARN_INTERVAL: Duration = Duration::from_secs(5);
#[derive(Debug, Clone, Serialize, Deserialize)]
struct ProducerMemberMeta {
@@ -266,6 +271,10 @@ impl MpscProducer {
self.chan_mgr.payload_lease.id() as i64
}
+ pub fn channel_capacity(&self) -> i64 {
+ self.chan_mgr.capacity()
+ }
+
/// Shared shutdown controller for this producer instance.
pub fn shutdown_ctl(&self) -> ShutdownCtl {
self.shutdown.clone()
@@ -420,9 +429,7 @@ impl MpscProducer {
let put_payload = Arc::new(put_payload);
loop {
if self.shutdown.is_closed() {
- return Err(MpscError::Internal(
- "producer closed during put_with_payload".to_string(),
- ));
+ return Err(MpscError::Closed);
}
let key_clone = msg_key.clone();
let f = put_payload.clone();
@@ -479,6 +486,243 @@ impl MpscProducer {
}
Ok(())
}
+
+ /// Broker-backed put path.
+ ///
+ /// This keeps the existing payload callback contract but moves
+ /// message id allocation and publish visibility into the broker.
+ /// The current etcd-backed `put_with_payload` remains untouched
+ /// until call sites are switched to this path.
+ pub async fn put_with_payload_via_broker<F>(
+ &mut self,
+ broker: &BrokerHandle,
+ payload_bytes: u64,
+ put_payload: F,
+ ) -> Result<(), MpscError>
+ where
+ F: Fn(String, i64, Option) -> i32 + Send + Sync + 'static,
+ {
+ let preferred_sub_cluster_for_call = self.preferred_sub_cluster_for_put()?;
+ let published_msg_id = put_payload_via_broker(
+ broker,
+ self.chan_id,
+ &self.producer_idx,
+ self.category,
+ payload_bytes,
+ self.shutdown.clone(),
+ preferred_sub_cluster_for_call,
+ put_payload,
+ )
+ .await?;
+ self.next_msg_id = self.next_msg_id.max(published_msg_id + 1);
+ Ok(())
+ }
+}
+
+async fn put_payload_via_broker<F>(
+ broker: &BrokerHandle,
+ chan_id: i64,
+ producer_idx: &str,
+ category: MqCategory,
+ payload_bytes: u64,
+ shutdown: ShutdownCtl,
+ preferred_sub_cluster_for_call: Option,
+ put_payload: F,
+) -> Result<i64, MpscError>
+where
+ F: Fn(String, i64, Option) -> i32 + Send + Sync + 'static,
+{
+ use limit_thirdparty::tokio::task;
+ use tokio::time::sleep;
+
+ let put_payload = Arc::new(put_payload);
+ let reserve_wait_begin = Instant::now();
+ let mut reserve_retry_attempt: u32 = 0;
+ let mut payload_retry_attempt: u32 = 0;
+ let mut next_reserve_warn_at = Instant::now() + BROKER_BACKPRESSURE_WARN_INTERVAL;
+ let mut next_payload_warn_at = Instant::now() + BROKER_BACKPRESSURE_WARN_INTERVAL;
+
+ loop {
+ if shutdown.is_closed() {
+ return Err(MpscError::Closed);
+ }
+
+ let reservation = match broker
+ .reserve(BrokerReserveRequest {
+ channel_id: chan_id,
+ producer_id: producer_idx.to_string(),
+ category,
+ payload_bytes,
+ now_ms: broker_now_ms(),
+ })
+ .await
+ {
+ Ok(reservation) => {
+ reserve_retry_attempt = 0;
+ reservation
+ }
+ Err(BrokerError::ChannelFull {
+ channel_id,
+ capacity,
+ used_slots,
+ }) => {
+ let now = Instant::now();
+ if now >= next_reserve_warn_at {
+ warn!(
+ "broker reserve backpressured: chan_id={} producer_idx={} capacity={} used_slots={} waited_ms={}",
+ channel_id,
+ producer_idx,
+ capacity,
+ used_slots,
+ reserve_wait_begin.elapsed().as_millis(),
+ );
+ next_reserve_warn_at = now + BROKER_BACKPRESSURE_WARN_INTERVAL;
+ }
+ let sleep_for =
+ broker_backpressure_sleep_duration(producer_idx, reserve_retry_attempt);
+ reserve_retry_attempt = reserve_retry_attempt.saturating_add(1);
+ sleep(sleep_for).await;
+ continue;
+ }
+ Err(BrokerError::PayloadBytesFull {
+ capacity_bytes,
+ used_bytes,
+ requested_bytes,
+ }) => {
+ let now = Instant::now();
+ if now >= next_reserve_warn_at {
+ warn!(
+ "broker payload budget backpressured: chan_id={} producer_idx={} requested_bytes={} capacity_bytes={} used_bytes={} waited_ms={}",
+ chan_id,
+ producer_idx,
+ requested_bytes,
+ capacity_bytes,
+ used_bytes,
+ reserve_wait_begin.elapsed().as_millis(),
+ );
+ next_reserve_warn_at = now + BROKER_BACKPRESSURE_WARN_INTERVAL;
+ }
+ let sleep_for =
+ broker_backpressure_sleep_duration(producer_idx, reserve_retry_attempt);
+ reserve_retry_attempt = reserve_retry_attempt.saturating_add(1);
+ sleep(sleep_for).await;
+ continue;
+ }
+ Err(BrokerError::PayloadTooLarge {
+ requested_bytes,
+ capacity_bytes,
+ }) => {
+ return Err(MpscError::Internal(format!(
+ "broker payload too large: chan_id={} producer_idx={} requested_bytes={} capacity_bytes={}",
+ chan_id, producer_idx, requested_bytes, capacity_bytes
+ )));
+ }
+ Err(other) => {
+ return Err(MpscError::Internal(format!(
+ "broker reserve failed: chan_id={} producer_idx={} err={}",
+ chan_id, producer_idx, other
+ )));
+ }
+ };
+ let reservation_id = reservation.envelope.reservation_id;
+ let msg_id = reservation.envelope.msg_id;
+ let msg_key = reservation.envelope.payload_key.clone();
+
+ let key_clone = msg_key.clone();
+ let f = put_payload.clone();
+ let hint = preferred_sub_cluster_for_call.clone();
+ let code = task::spawn_blocking(move || (f)(key_clone, msg_id, hint))
+ .await
+ .map_err(|e| {
+ abort_on_payload_failure_async(broker.clone(), chan_id, reservation_id);
+ MpscError::JoinError(e)
+ })?;
+
+ match code {
+ 0 => {
+ broker
+ .publish(chan_id, reservation_id, broker_now_ms())
+ .await
+ .map_err(|e| {
+ MpscError::Internal(format!(
+ "broker publish failed after payload write: chan_id={} producer_idx={} reservation_id={} msg_id={} err={}",
+ chan_id, producer_idx, reservation_id, msg_id, e
+ ))
+ })?;
+ return Ok(msg_id);
+ }
+ 1 => {
+ abort_broker_reservation_best_effort(broker, chan_id, reservation_id).await;
+ let now = Instant::now();
+ if now >= next_payload_warn_at {
+ warn!(
+ "broker payload write backpressured by owner pool: chan_id={} producer_idx={} waited_ms={}",
+ chan_id,
+ producer_idx,
+ reserve_wait_begin.elapsed().as_millis(),
+ );
+ next_payload_warn_at = now + BROKER_BACKPRESSURE_WARN_INTERVAL;
+ }
+ let sleep_for =
+ broker_backpressure_sleep_duration(producer_idx, payload_retry_attempt);
+ payload_retry_attempt = payload_retry_attempt.saturating_add(1);
+ sleep(sleep_for).await;
+ continue;
+ }
+ 2 => {
+ abort_broker_reservation_best_effort(broker, chan_id, reservation_id).await;
+ return Err(MpscError::PutPayloadNonRetryable);
+ }
+ other => {
+ abort_broker_reservation_best_effort(broker, chan_id, reservation_id).await;
+ return Err(MpscError::PutPayloadUnknownCode { code: other });
+ }
+ }
+ }
+}
+
+fn broker_backpressure_sleep_duration(producer_idx: &str, retry_attempt: u32) -> Duration {
+ let shift = retry_attempt.min(6);
+ let base_ms = BROKER_BACKPRESSURE_INITIAL_SLEEP_MS
+ .saturating_mul(1_u64 << shift)
+ .min(BROKER_BACKPRESSURE_MAX_SLEEP_MS);
+ let jitter_ms = if BROKER_BACKPRESSURE_JITTER_MS == 0 {
+ 0
+ } else {
+ producer_idx
+ .bytes()
+ .fold(retry_attempt as u64, |acc, byte| {
+ acc.wrapping_mul(31).wrapping_add(byte as u64)
+ })
+ % (BROKER_BACKPRESSURE_JITTER_MS + 1)
+ };
+ Duration::from_millis((base_ms + jitter_ms).min(BROKER_BACKPRESSURE_MAX_SLEEP_MS))
+}
+
+async fn abort_broker_reservation_best_effort(
+ broker: &BrokerHandle,
+ chan_id: i64,
+ reservation_id: u64,
+) {
+ if let Err(err) = broker.abort(chan_id, reservation_id).await {
+ warn!(
+ "best-effort broker abort failed: chan_id={} reservation_id={} err={}",
+ chan_id, reservation_id, err
+ );
+ }
+}
+
+fn abort_on_payload_failure_async(broker: BrokerHandle, chan_id: i64, reservation_id: u64) {
+ tokio::spawn(async move {
+ abort_broker_reservation_best_effort(&broker, chan_id, reservation_id).await;
+ });
+}
+
+fn broker_now_ms() -> i64 {
+ SystemTime::now()
+ .duration_since(UNIX_EPOCH)
+ .expect("system clock is before UNIX_EPOCH")
+ .as_millis() as i64
}
fn spawn_consumer_meta_watch(
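Note on the backoff schedule used by `put_payload_via_broker`: the sketch below restates `broker_backpressure_sleep_duration` as a standalone function so the schedule can be checked in isolation. The constant values are copied from this diff. The shortened names (`INITIAL_SLEEP_MS`, `MAX_SLEEP_MS`, `JITTER_MS`, `backpressure_sleep`) are illustrative only and do not exist in the crate, and the always-true `JITTER_MS == 0` guard from the original is dropped because the constant is 7 here.

```rust
use std::time::Duration;

// Values copied from the BROKER_BACKPRESSURE_* constants in producer.rs.
// The shortened names are hypothetical, for this sketch only.
const INITIAL_SLEEP_MS: u64 = 2;
const MAX_SLEEP_MS: u64 = 50;
const JITTER_MS: u64 = 7;

/// Exponential backoff with a deterministic, per-producer jitter.
/// The base delay doubles per attempt (shift capped at 6) and is clamped
/// to MAX_SLEEP_MS; the jitter is a small hash of the producer id and the
/// attempt number, reduced modulo (JITTER_MS + 1).
fn backpressure_sleep(producer_idx: &str, retry_attempt: u32) -> Duration {
    let shift = retry_attempt.min(6);
    let base_ms = INITIAL_SLEEP_MS
        .saturating_mul(1_u64 << shift)
        .min(MAX_SLEEP_MS);
    let jitter_ms = producer_idx
        .bytes()
        .fold(retry_attempt as u64, |acc, byte| {
            acc.wrapping_mul(31).wrapping_add(byte as u64)
        })
        % (JITTER_MS + 1);
    // The final clamp means jitter never pushes the delay past the cap.
    Duration::from_millis((base_ms + jitter_ms).min(MAX_SLEEP_MS))
}
```

Under these constants, producer `"p0"` waits 2 ms, 5 ms, 10 ms, 19 ms and 36 ms on attempts 0 through 4, then stays at the 50 ms cap. Because the jitter is a pure function of the producer id and the attempt number, retries are reproducible for a given producer, while different producer ids tend to land on slightly different delays.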
diff --git a/fluxon_rs/fluxon_observability/src/types.rs b/fluxon_rs/fluxon_observability/src/types.rs
index 446c43d..42db8aa 100644
--- a/fluxon_rs/fluxon_observability/src/types.rs
+++ b/fluxon_rs/fluxon_observability/src/types.rs
@@ -20,6 +20,7 @@ impl FluxonMemberKind {
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FluxonMemberRole {
Master,
+ Broker,
OwnerClient,
ExternalClient,
SideTransferWorker,
@@ -30,6 +31,7 @@ impl FluxonMemberRole {
pub fn as_str(self) -> &'static str {
match self {
FluxonMemberRole::Master => "master",
+ FluxonMemberRole::Broker => "broker",
FluxonMemberRole::OwnerClient => "owner_client",
FluxonMemberRole::ExternalClient => "external_client",
FluxonMemberRole::SideTransferWorker => "side_transfer_worker",
diff --git a/fluxon_rs/fluxon_ops/build.rs b/fluxon_rs/fluxon_ops/build.rs
index 585fbfc..51e95c4 100644
--- a/fluxon_rs/fluxon_ops/build.rs
+++ b/fluxon_rs/fluxon_ops/build.rs
@@ -59,9 +59,17 @@ print(
}
fn render_log_shard_helper(repo_root: &Path) -> String {
- let helper_path = repo_root.join("deployment").join("utils").join("log_shard.py");
- fs::read_to_string(&helper_path)
- .unwrap_or_else(|e| panic!("read log shard helper failed: {} ({})", helper_path.display(), e))
+ let helper_path = repo_root
+ .join("deployment")
+ .join("utils")
+ .join("log_shard.py");
+ fs::read_to_string(&helper_path).unwrap_or_else(|e| {
+ panic!(
+ "read log shard helper failed: {} ({})",
+ helper_path.display(),
+ e
+ )
+ })
}
fn main() {
@@ -87,6 +95,10 @@ fn main() {
);
println!(
"cargo:rerun-if-changed={}",
- repo_root.join("deployment").join("utils").join("log_shard.py").display()
+ repo_root
+ .join("deployment")
+ .join("utils")
+ .join("log_shard.py")
+ .display()
);
}
diff --git a/fluxon_rs/fluxon_ops/src/lib.rs b/fluxon_rs/fluxon_ops/src/lib.rs
index 29d9434..3adb053 100644
--- a/fluxon_rs/fluxon_ops/src/lib.rs
+++ b/fluxon_rs/fluxon_ops/src/lib.rs
@@ -80,7 +80,8 @@ const DELETE_APPLY_NO_WAIT_DELAY_SECONDS: u64 = 30;
const EMBEDDED_SELECTION_SUPERVISOR_SOURCE: &str =
include_str!(concat!(env!("OUT_DIR"), "/selection_supervisor.py"));
-const EMBEDDED_LOG_SHARD_HELPER_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/log_shard.py"));
+const EMBEDDED_LOG_SHARD_HELPER_SOURCE: &str =
+ include_str!(concat!(env!("OUT_DIR"), "/log_shard.py"));
// Ops controller uses Fluxon user-RPC to talk to ops agents.
// Keep the timeout as a fixed constant to avoid config surface area.
@@ -351,7 +352,10 @@ fn workload_log_latest_shard_identity(logical_path: &Path) -> anyhow::Result anyhow::Result
Ok(resolved)
}
-fn ensure_embedded_selection_supervisor_runtime(workdir: &Path) -> anyhow::Result<(PathBuf, PathBuf)> {
+fn ensure_embedded_selection_supervisor_runtime(
+ workdir: &Path,
+) -> anyhow::Result<(PathBuf, PathBuf)> {
let runtime_dir = workdir.join(OPS_SELECTION_SUPERVISOR_DIR_NAME);
std::fs::create_dir_all(&runtime_dir).with_context(|| {
format!(
@@ -1657,10 +1663,11 @@ fn selection_owner_supervisor(
scope_key: Option<&str>,
exclude_pid: Option,
) -> anyhow::Result> {
- let owners: Vec = live_selection_supervisors(snapshot, Some(label), scope_key)?
- .into_iter()
- .filter(|supervisor| exclude_pid != Some(supervisor.pid()))
- .collect();
+ let owners: Vec =
+ live_selection_supervisors(snapshot, Some(label), scope_key)?
+ .into_iter()
+ .filter(|supervisor| exclude_pid != Some(supervisor.pid()))
+ .collect();
if owners.is_empty() {
return Ok(None);
}
@@ -2068,7 +2075,16 @@ fn wait_for_selection_attached(
argv: &[String],
cwd: Option<&str>,
) -> anyhow::Result {
- wait_for_selection_attached_for_scope(kind, name, authority, None, apply_id, owner_ts_ms, argv, cwd)
+ wait_for_selection_attached_for_scope(
+ kind,
+ name,
+ authority,
+ None,
+ apply_id,
+ owner_ts_ms,
+ argv,
+ cwd,
+ )
}
fn wait_for_selection_attached_without_present_for_scope(
@@ -2803,10 +2819,9 @@ impl SupervisorBackedWorkloads {
fn list_workloads(&self) -> anyhow::Result> {
let mut out: Vec = Vec::new();
let snapshot = selection_supervisor_proc_snapshot()?;
- for status in observe_all_selection_statuses_for_snapshot(
- &snapshot,
- Some(self.scope_key.as_str()),
- )? {
+ for status in
+ observe_all_selection_statuses_for_snapshot(&snapshot, Some(self.scope_key.as_str()))?
+ {
let kind = status.kind.with_context(|| {
format!(
"selection supervisor list item missing kind: label={}",
@@ -3159,7 +3174,10 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
None => {
let Some(path) = resolve_readable_log_path(&logical_path) else {
let resp = make_err_resp(
- format!("log file is not available yet: logical_path={}", logical_path.display()),
+ format!(
+ "log file is not available yet: logical_path={}",
+ logical_path.display()
+ ),
None,
);
return Ok(serde_json::to_vec(&resp).unwrap());
@@ -3186,138 +3204,11 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
};
let file_size = meta.len();
- let (start, end, start_cursor, end_cursor, effective_path, effective_file_size) =
- match req.direction {
- LogReadDirection::Forward => {
- if let Some(cursor) = req.cursor.as_ref() {
- if cursor.offset > file_size {
- let resp = make_err_resp(
- format!(
- "cursor out of range: shard={} cursor={} file_size={}",
- cursor.shard, cursor.offset, file_size
- ),
- Some(file_size),
- );
- return Ok(serde_json::to_vec(&resp).unwrap());
- }
- let mut effective_path = path.clone();
- let mut effective_shard = shard.clone();
- let mut effective_file_size = file_size;
- let mut start = cursor.offset;
- if cursor.offset == file_size {
- if let Ok(Some(next_shard)) =
- workload_log_next_shard(&logical_path, &cursor.shard)
- {
- let next_path = match workload_log_path_for_shard(&logical_path, &next_shard) {
- Ok(v) => v,
- Err(e) => {
- let resp = make_err_resp(format!("{}", e), Some(file_size));
- return Ok(serde_json::to_vec(&resp).unwrap());
- }
- };
- match std::fs::metadata(&next_path) {
- Ok(next_meta) => {
- effective_file_size = next_meta.len();
- effective_path = next_path;
- effective_shard = next_shard;
- start = 0;
- }
- Err(e) => {
- let resp = make_err_resp(
- format!(
- "stat next log shard failed: path={} err={}",
- next_path.display(),
- e
- ),
- Some(file_size),
- );
- return Ok(serde_json::to_vec(&resp).unwrap());
- }
- }
- } else if let Ok(Some(latest_shard)) =
- workload_log_latest_shard_identity(&logical_path)
- {
- if latest_shard != cursor.shard {
- let latest_path =
- match workload_log_path_for_shard(&logical_path, &latest_shard) {
- Ok(v) => v,
- Err(e) => {
- let resp = make_err_resp(format!("{}", e), Some(file_size));
- return Ok(serde_json::to_vec(&resp).unwrap());
- }
- };
- match std::fs::metadata(&latest_path) {
- Ok(latest_meta) => {
- effective_file_size = latest_meta.len();
- effective_path = latest_path;
- effective_shard = latest_shard;
- start = 0;
- }
- Err(e) => {
- let resp = make_err_resp(
- format!(
- "stat latest log shard failed: path={} err={}",
- latest_path.display(),
- e
- ),
- Some(file_size),
- );
- return Ok(serde_json::to_vec(&resp).unwrap());
- }
- }
- }
- }
- }
- let end = match max_bytes {
- Some(max_bytes) => {
- std::cmp::min(effective_file_size, start.saturating_add(max_bytes))
- }
- None => effective_file_size,
- };
- (
- start,
- end,
- Some(WorkloadLogCursor {
- shard: effective_shard.clone(),
- offset: start,
- }),
- Some(WorkloadLogCursor {
- shard: effective_shard.clone(),
- offset: end,
- }),
- effective_path,
- effective_file_size,
- )
- } else {
- let end = file_size;
- let start = match max_bytes {
- Some(max_bytes) => end.saturating_sub(max_bytes),
- None => 0,
- };
- (
- start,
- end,
- Some(WorkloadLogCursor {
- shard: shard.clone(),
- offset: start,
- }),
- Some(WorkloadLogCursor {
- shard: shard.clone(),
- offset: end,
- }),
- path.clone(),
- file_size,
- )
- }
- }
- LogReadDirection::Backward => {
- let Some(cursor) = req.cursor.as_ref() else {
- let resp = make_err_resp(
- "cursor is required for Backward reads".to_string(),
- Some(file_size),
- );
- return Ok(serde_json::to_vec(&resp).unwrap());
- };
+ let (start, end, start_cursor, end_cursor, effective_path, effective_file_size) = match req
+ .direction
+ {
+ LogReadDirection::Forward => {
+ if let Some(cursor) = req.cursor.as_ref() {
if cursor.offset > file_size {
let resp = make_err_resp(
format!(
@@ -3331,30 +3222,31 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
let mut effective_path = path.clone();
let mut effective_shard = shard.clone();
let mut effective_file_size = file_size;
- let mut end = cursor.offset;
- if cursor.offset == 0 {
- if let Ok(Some(prev_shard)) =
- workload_log_previous_shard(&logical_path, &cursor.shard)
+ let mut start = cursor.offset;
+ if cursor.offset == file_size {
+ if let Ok(Some(next_shard)) =
+ workload_log_next_shard(&logical_path, &cursor.shard)
{
- let prev_path = match workload_log_path_for_shard(&logical_path, &prev_shard) {
- Ok(v) => v,
- Err(e) => {
- let resp = make_err_resp(format!("{}", e), Some(file_size));
- return Ok(serde_json::to_vec(&resp).unwrap());
- }
- };
- match std::fs::metadata(&prev_path) {
- Ok(prev_meta) => {
- effective_file_size = prev_meta.len();
- effective_path = prev_path;
- effective_shard = prev_shard;
- end = effective_file_size;
+ let next_path =
+ match workload_log_path_for_shard(&logical_path, &next_shard) {
+ Ok(v) => v,
+ Err(e) => {
+ let resp = make_err_resp(format!("{}", e), Some(file_size));
+ return Ok(serde_json::to_vec(&resp).unwrap());
+ }
+ };
+ match std::fs::metadata(&next_path) {
+ Ok(next_meta) => {
+ effective_file_size = next_meta.len();
+ effective_path = next_path;
+ effective_shard = next_shard;
+ start = 0;
}
Err(e) => {
let resp = make_err_resp(
format!(
- "stat previous log shard failed: path={} err={}",
- prev_path.display(),
+ "stat next log shard failed: path={} err={}",
+ next_path.display(),
e
),
Some(file_size),
@@ -3362,11 +3254,47 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
return Ok(serde_json::to_vec(&resp).unwrap());
}
}
+ } else if let Ok(Some(latest_shard)) =
+ workload_log_latest_shard_identity(&logical_path)
+ {
+ if latest_shard != cursor.shard {
+ let latest_path =
+ match workload_log_path_for_shard(&logical_path, &latest_shard)
+ {
+ Ok(v) => v,
+ Err(e) => {
+ let resp =
+ make_err_resp(format!("{}", e), Some(file_size));
+ return Ok(serde_json::to_vec(&resp).unwrap());
+ }
+ };
+ match std::fs::metadata(&latest_path) {
+ Ok(latest_meta) => {
+ effective_file_size = latest_meta.len();
+ effective_path = latest_path;
+ effective_shard = latest_shard;
+ start = 0;
+ }
+ Err(e) => {
+ let resp = make_err_resp(
+ format!(
+ "stat latest log shard failed: path={} err={}",
+ latest_path.display(),
+ e
+ ),
+ Some(file_size),
+ );
+ return Ok(serde_json::to_vec(&resp).unwrap());
+ }
+ }
+ }
}
}
- let start = match max_bytes {
- Some(max_bytes) => end.saturating_sub(max_bytes),
- None => 0,
+ let end = match max_bytes {
+ Some(max_bytes) => {
+ std::cmp::min(effective_file_size, start.saturating_add(max_bytes))
+ }
+ None => effective_file_size,
};
(
start,
@@ -3382,8 +3310,103 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
effective_path,
effective_file_size,
)
+ } else {
+ let end = file_size;
+ let start = match max_bytes {
+ Some(max_bytes) => end.saturating_sub(max_bytes),
+ None => 0,
+ };
+ (
+ start,
+ end,
+ Some(WorkloadLogCursor {
+ shard: shard.clone(),
+ offset: start,
+ }),
+ Some(WorkloadLogCursor {
+ shard: shard.clone(),
+ offset: end,
+ }),
+ path.clone(),
+ file_size,
+ )
}
- };
+ }
+ LogReadDirection::Backward => {
+ let Some(cursor) = req.cursor.as_ref() else {
+ let resp = make_err_resp(
+ "cursor is required for Backward reads".to_string(),
+ Some(file_size),
+ );
+ return Ok(serde_json::to_vec(&resp).unwrap());
+ };
+ if cursor.offset > file_size {
+ let resp = make_err_resp(
+ format!(
+ "cursor out of range: shard={} cursor={} file_size={}",
+ cursor.shard, cursor.offset, file_size
+ ),
+ Some(file_size),
+ );
+ return Ok(serde_json::to_vec(&resp).unwrap());
+ }
+ let mut effective_path = path.clone();
+ let mut effective_shard = shard.clone();
+ let mut effective_file_size = file_size;
+ let mut end = cursor.offset;
+ if cursor.offset == 0 {
+ if let Ok(Some(prev_shard)) =
+ workload_log_previous_shard(&logical_path, &cursor.shard)
+ {
+ let prev_path =
+ match workload_log_path_for_shard(&logical_path, &prev_shard) {
+ Ok(v) => v,
+ Err(e) => {
+ let resp = make_err_resp(format!("{}", e), Some(file_size));
+ return Ok(serde_json::to_vec(&resp).unwrap());
+ }
+ };
+ match std::fs::metadata(&prev_path) {
+ Ok(prev_meta) => {
+ effective_file_size = prev_meta.len();
+ effective_path = prev_path;
+ effective_shard = prev_shard;
+ end = effective_file_size;
+ }
+ Err(e) => {
+ let resp = make_err_resp(
+ format!(
+ "stat previous log shard failed: path={} err={}",
+ prev_path.display(),
+ e
+ ),
+ Some(file_size),
+ );
+ return Ok(serde_json::to_vec(&resp).unwrap());
+ }
+ }
+ }
+ }
+ let start = match max_bytes {
+ Some(max_bytes) => end.saturating_sub(max_bytes),
+ None => 0,
+ };
+ (
+ start,
+ end,
+ Some(WorkloadLogCursor {
+ shard: effective_shard.clone(),
+ offset: start,
+ }),
+ Some(WorkloadLogCursor {
+ shard: effective_shard.clone(),
+ offset: end,
+ }),
+ effective_path,
+ effective_file_size,
+ )
+ }
+ };
if end < start {
let resp = make_err_resp(
@@ -3416,7 +3439,11 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
Ok(v) => v,
Err(e) => {
let resp = make_err_resp(
- format!("open log failed: path={} err={}", effective_path.display(), e),
+ format!(
+ "open log failed: path={} err={}",
+ effective_path.display(),
+ e
+ ),
Some(effective_file_size),
);
return Ok(serde_json::to_vec(&resp).unwrap());
@@ -3425,7 +3452,11 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
if let Err(e) = std::io::Seek::seek(&mut f, std::io::SeekFrom::Start(start)) {
let resp = make_err_resp(
- format!("seek log failed: path={} err={}", effective_path.display(), e),
+ format!(
+ "seek log failed: path={} err={}",
+ effective_path.display(),
+ e
+ ),
Some(effective_file_size),
);
return Ok(serde_json::to_vec(&resp).unwrap());
@@ -3434,7 +3465,11 @@ impl UserRpcHandler for ReadWorkloadLogChunkHandler {
 let mut buf: Vec<u8> = vec![0; len];
if let Err(e) = std::io::Read::read_exact(&mut f, &mut buf) {
let resp = make_err_resp(
- format!("read log failed: path={} err={}", effective_path.display(), e),
+ format!(
+ "read log failed: path={} err={}",
+ effective_path.display(),
+ e
+ ),
Some(effective_file_size),
);
return Ok(serde_json::to_vec(&resp).unwrap());
@@ -4067,8 +4102,7 @@ fn desired_workload_matches_running(
&desired.name,
&desired.authority,
Some(workloads.scope_key.as_str()),
- )
- else {
+ ) else {
return false;
};
desired_workload_status_matches_goal(&status, desired)
@@ -14337,14 +14371,14 @@ mod tests {
assert_eq!(scoped_b.len(), 1);
assert_eq!(scoped_b[0].pid(), 22);
- let listed_a = observe_all_selection_statuses_for_snapshot(&snapshot, Some("/tmp/scope-a"))
- .unwrap();
+ let listed_a =
+ observe_all_selection_statuses_for_snapshot(&snapshot, Some("/tmp/scope-a")).unwrap();
assert_eq!(listed_a.len(), 1);
assert_eq!(listed_a[0].label, "DaemonSet/target");
assert_eq!(listed_a[0].pid, Some(11));
- let listed_b = observe_all_selection_statuses_for_snapshot(&snapshot, Some("/tmp/scope-b"))
- .unwrap();
+ let listed_b =
+ observe_all_selection_statuses_for_snapshot(&snapshot, Some("/tmp/scope-b")).unwrap();
assert_eq!(listed_b.len(), 1);
assert_eq!(listed_b[0].label, "DaemonSet/target");
assert_eq!(listed_b[0].pid, Some(22));
@@ -14548,8 +14582,8 @@ mod tests {
zombie_infos: Vec::new(),
};
- let listed = observe_apply_runtime_statuses_for_snapshot("apply-1", &snapshot, None)
- .unwrap();
+ let listed =
+ observe_apply_runtime_statuses_for_snapshot("apply-1", &snapshot, None).unwrap();
assert_eq!(listed.len(), 1);
assert_eq!(listed[0].name.as_deref(), Some("target-present"));
assert!(listed[0].present);
@@ -14774,12 +14808,8 @@ mod tests {
None
));
- let delete_old = workloads.delete_generation(
- WorkloadKind::Deployment,
- &name,
- &name,
- Some("apply-1"),
- );
+ let delete_old =
+ workloads.delete_generation(WorkloadKind::Deployment, &name, &name, Some("apply-1"));
if !delete_old.ok {
let err = delete_old.err.as_deref().unwrap_or_default();
assert!(
@@ -14813,8 +14843,7 @@ mod tests {
delete_current.ok,
"unguarded delete should bind and retire the current visible generation: {delete_current:?}"
);
- wait_for_selection_absent(WorkloadKind::Deployment, &name, &name, Some("apply-2"))
- .unwrap();
+ wait_for_selection_absent(WorkloadKind::Deployment, &name, &name, Some("apply-2")).unwrap();
}
#[test]
@@ -14826,9 +14855,12 @@ mod tests {
python_exe.display()
);
let workdir = tempfile::tempdir().unwrap();
- let runtime =
- SelectionSupervisorRuntime::materialize(workdir.path(), workdir.path(), python_exe.as_path())
- .unwrap();
+ let runtime = SelectionSupervisorRuntime::materialize(
+ workdir.path(),
+ workdir.path(),
+ python_exe.as_path(),
+ )
+ .unwrap();
assert!(runtime.script_path.exists());
assert!(
runtime
@@ -14849,9 +14881,12 @@ mod tests {
python_exe.display()
);
let workdir = tempfile::tempdir().unwrap();
- let runtime =
- SelectionSupervisorRuntime::materialize(workdir.path(), workdir.path(), python_exe.as_path())
- .unwrap();
+ let runtime = SelectionSupervisorRuntime::materialize(
+ workdir.path(),
+ workdir.path(),
+ python_exe.as_path(),
+ )
+ .unwrap();
let log_path = workdir.path().join("startup.log");
let command = vec![
python_exe.display().to_string(),
@@ -14876,7 +14911,9 @@ mod tests {
"--".to_string(),
"/bin/true".to_string(),
];
- let pid = runtime.spawn_detached_command(&log_path, command.as_slice()).unwrap();
+ let pid = runtime
+ .spawn_detached_command(&log_path, command.as_slice())
+ .unwrap();
let deadline = Instant::now() + Duration::from_secs(10);
let expected = "owner-ts-ms must be positive";
let mut saw_expected = false;
@@ -15164,7 +15201,9 @@ mod tests {
}),
max_bytes: Some(65536),
};
- let raw = handler.handle("n1".into(), &serde_json::to_vec(&req).unwrap()).unwrap();
+ let raw = handler
+ .handle("n1".into(), &serde_json::to_vec(&req).unwrap())
+ .unwrap();
let resp: ReadWorkloadLogResp = serde_json::from_slice(&raw).unwrap();
assert!(resp.ok, "{resp:?}");
assert_eq!(resp.text.as_deref(), Some("new\n"));
@@ -15209,7 +15248,9 @@ mod tests {
}),
max_bytes: Some(65536),
};
- let raw = handler.handle("n1".into(), &serde_json::to_vec(&req).unwrap()).unwrap();
+ let raw = handler
+ .handle("n1".into(), &serde_json::to_vec(&req).unwrap())
+ .unwrap();
let resp: ReadWorkloadLogResp = serde_json::from_slice(&raw).unwrap();
assert!(resp.ok, "{resp:?}");
assert_eq!(resp.text.as_deref(), Some("old\n"));
diff --git a/fluxon_rs/fluxon_pyo3/src/error.rs b/fluxon_rs/fluxon_pyo3/src/error.rs
index 97ab680..e153ebc 100644
--- a/fluxon_rs/fluxon_pyo3/src/error.rs
+++ b/fluxon_rs/fluxon_pyo3/src/error.rs
@@ -51,6 +51,26 @@ pub(crate) fn pyerr_message_consumption_no_new_message(
})
}
+pub(crate) fn pyerr_channel_closed(py: Python<'_>, message: &str, channel_id: i64) -> PyErr {
+ build_ext_error(py, "ChannelClosedError", message, |kw| {
+ kw.set_item("channel_id", channel_id).unwrap();
+ })
+}
+
+pub(crate) fn pyerr_producer_closed(
+ py: Python<'_>,
+ message: &str,
+ channel_id: i64,
+ producer_idx: Option<&str>,
+) -> PyErr {
+ build_ext_error(py, "ProducerClosedError", message, |kw| {
+ kw.set_item("channel_id", channel_id).unwrap();
+ if let Some(p) = producer_idx {
+ kw.set_item("producer_idx", p).unwrap();
+ }
+ })
+}
+
pub(crate) fn pyerr_message_consumption(
py: Python<'_>,
message: &str,
@@ -87,6 +107,18 @@ pub(crate) fn pyerr_chan_message_produce(
})
}
+pub(crate) fn pyerr_message_buffer_full(
+ py: Python<'_>,
+ message: &str,
+ channel_id: i64,
+ buffer_size: i64,
+) -> PyErr {
+ build_ext_error(py, "MessageBufferFullError", message, |kw| {
+ kw.set_item("channel_id", channel_id).unwrap();
+ kw.set_item("buffer_size", buffer_size).unwrap();
+ })
+}
+
// System/bridge category constructors (distinct helpers for clarity)
pub(crate) fn pyerr_etcd(py: Python<'_>, message: &str, component: &str) -> PyErr {
build_ext_error(py, "EtcdError", message, |kw| {
@@ -264,10 +296,13 @@ pub(crate) fn new_store_closed_error(py: Python<'_>, message: &str) -> PyObject
pub(crate) fn new_result_success(py: Python<'_>, value: PyObject) -> PyObject {
let api_error_module = py.import_bound("fluxon_py.api_error").unwrap();
let result_class = api_error_module.getattr("Result").unwrap();
- result_class
- .call_method1("new_ok", (value,))
- .unwrap()
- .into()
+ match result_class.call_method1("new_ok", (value,)) {
+ Ok(obj) => obj.into(),
+ Err(err) => {
+ let message = format!("Failed to build Result.new_ok: {}", err);
+ new_result_error(py, new_general_error(py, &message))
+ }
+ }
}
pub(crate) fn new_result_error(py: Python<'_>, error: PyObject) -> PyObject {
diff --git a/fluxon_rs/fluxon_pyo3/src/flatdict_zerocopy.rs b/fluxon_rs/fluxon_pyo3/src/flatdict_zerocopy.rs
index 335f36e..c80e775 100644
--- a/fluxon_rs/fluxon_pyo3/src/flatdict_zerocopy.rs
+++ b/fluxon_rs/fluxon_pyo3/src/flatdict_zerocopy.rs
@@ -1,6 +1,7 @@
-use std::collections::{BTreeMap, BTreeSet};
+use std::collections::{BTreeMap, BTreeSet, HashSet};
use std::os::raw::c_void;
-use std::sync::Arc;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::{Arc, Mutex};
use fluxon_kv::memholder::kvclient_encode::{
BorrowedFlatKvValueRange, FLAT_KV_TYPE_BOOL, FLAT_KV_TYPE_BYTES, FLAT_KV_TYPE_FLOAT64,
@@ -30,25 +31,90 @@ const DLPACK_USED_CAPSULE_NAME_CSTR: &[u8] = b"used_dltensor\0";
#[derive(Clone)]
pub(crate) enum FlatDictDataOwner {
- OwnedBytes(Arc<[u8]>),
- UserMemHolder(Arc),
- ExternalMemHolder(Arc),
+ OwnedBytes(Arc<FlatDictOwnedBytes>),
+ UserMemHolder(Arc<FlatDictUserMemHolder>),
+ ExternalMemHolder(Arc<FlatDictExternalMemHolder>),
}
impl FlatDictDataOwner {
pub(crate) fn from_owned_bytes(bytes: Vec<u8>) -> Self {
- Self::OwnedBytes(Arc::<[u8]>::from(bytes))
+ Self::OwnedBytes(Arc::new(FlatDictOwnedBytes {
+ bytes: Arc::<[u8]>::from(bytes),
+ }))
+ }
+
+ pub(crate) fn from_user_memholder(holder: Arc) -> Self {
+ Self::UserMemHolder(Arc::new(FlatDictUserMemHolder { holder }))
+ }
+
+ pub(crate) fn from_external_memholder(holder: Arc) -> Self {
+ Self::ExternalMemHolder(Arc::new(FlatDictExternalMemHolder { holder }))
}
fn bytes(&self) -> &[u8] {
match self {
- Self::OwnedBytes(bytes) => bytes.as_ref(),
- Self::UserMemHolder(holder) => holder.bytes(),
- Self::ExternalMemHolder(holder) => holder.bytes(),
+ Self::OwnedBytes(owner) => owner.bytes.as_ref(),
+ Self::UserMemHolder(owner) => owner.holder.bytes(),
+ Self::ExternalMemHolder(owner) => owner.holder.bytes(),
}
}
}
+pub(crate) type FlatDictCleanup = Box;
+
+#[derive(Clone)]
+pub(crate) struct FlatDictSharedCleanup {
+ state: Arc