
Feature/offloadrpc#28

Open
yyyuanhao426-hash wants to merge 4 commits into zchuango:supercache_dev from yyyuanhao426-hash:feature/offloadrpc


yyyuanhao426-hash commented Jun 24, 2026


Description

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • Mooncake EP (mooncake-ep)
  • Mooncake PG (mooncake-pg)
  • Integration (mooncake-integration)
  • P2P Store (mooncake-p2p-store)
  • Python Wheel (mooncake-wheel)
  • Common (mooncake-common)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Breaking change
  • Documentation update
  • Performance improvement
  • Other

How Has This Been Tested?

Test commands:

# Example: bash scripts/run_ci_test.sh

Test results:

  • Unit tests pass
  • Integration tests pass (if applicable)
  • Manual testing done (describe below)

Checklist

  • I have performed a self-review of my own code
  • I have formatted my code using ./scripts/code_format.sh
  • I have run pre-commit run --all-files and all hooks pass
  • I have updated the documentation (if applicable)
  • I have added tests to prove my changes are effective
  • For changes >500 LOC: I have filed an RFC issue

AI Assistance Disclosure

  • No AI tools were used
  • AI tools were used (specify below)

Saturday added 4 commits June 23, 2026 11:12
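Introduce push-mode offload: the data owner writes data directly into the requester's memory with a one-sided WRITE, replacing the pull-mode three-step sequence (get pointer, READ, release buffer). Add the RPC method `batch_get_offload_object_push` together with its data structures and transfer task, enabled through the environment variable `MC_OFFLOAD_PUSH`. This reduces the number of RPC round trips and improves offload performance.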
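
To make the mode switch concrete, here is a minimal sketch of how a requester might choose between the two paths. It is not taken from the PR: `OffloadPushEnabled` and `RequesterSteps` are hypothetical names, and the rule that any value other than empty or `0` enables push mode is an assumption, since the commit message only says the feature is controlled by `MC_OFFLOAD_PUSH`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: treats MC_OFFLOAD_PUSH as enabled when it is set to
// anything other than "" or "0". The accepted values are an assumption.
inline bool OffloadPushEnabled() {
    const char* value = std::getenv("MC_OFFLOAD_PUSH");
    if (value == nullptr) return false;
    const std::string s(value);
    return !s.empty() && s != "0";
}

// Steps the requester drives for one offloaded read, by mode.
inline std::vector<std::string> RequesterSteps(bool push_mode) {
    if (push_mode) {
        // One RPC; the owner performs the one-sided WRITE itself.
        return {"batch_get_offload_object_push"};
    }
    // Pull mode: get the pointers, READ the data, then release the owner's buffer.
    return {"batch_get_offload_object", "READ", "release_offload_buffer"};
}
```

With this sketch, `RequesterSteps(OffloadPushEnabled())` yields one step when the variable is set and three when it is not, which mirrors the round-trip reduction the commit describes.
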
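Define three new performance monitoring points in mooncake_perf_points.def:
- GET_SSD_OWNER_READ: owner-side SSD read
- GET_SSD_OWNER_PUSH_WRITE: owner-side push write
- GET_SSD_OWNER_RELEASE: owner-side buffer release

Add the corresponding instrumentation in real_client.cpp, covering:
1. Batch get on the pull path (batch_get_offload_object)
2. Batch get and write on the push path (batch_get_offload_object_push)
3. Requester-triggered buffer release (release_offload_buffer)

These points help analyze owner-side performance during SSD offload operations.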
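
For readers unfamiliar with `.def` files, the sketch below shows one common way such a list is consumed: an X-macro expanded once into an enum and once into a name table. The actual layout of mooncake_perf_points.def is not shown in this PR, so the `OWNER_PERF_POINTS` macro, the `PerfPoint` enum, and `PerfPointName` are all hypothetical; only the three point names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the contents of a .def file: one X(...) per point.
#define OWNER_PERF_POINTS(X)      \
    X(GET_SSD_OWNER_READ)         \
    X(GET_SSD_OWNER_PUSH_WRITE)   \
    X(GET_SSD_OWNER_RELEASE)

// Expand the list once into an enum...
enum class PerfPoint : std::size_t {
#define X(name) name,
    OWNER_PERF_POINTS(X)
#undef X
    kCount
};

// ...and once into a parallel table of printable names.
inline const char* PerfPointName(PerfPoint p) {
    static const char* const kNames[] = {
#define X(name) #name,
        OWNER_PERF_POINTS(X)
#undef X
    };
    const std::size_t index = static_cast<std::size_t>(p);
    if (index >= static_cast<std::size_t>(PerfPoint::kCount)) return "UNKNOWN";
    return kNames[index];
}
```

The appeal of this pattern is that adding a point is a one-line change to the list, and the enum and name table cannot drift apart.
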
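Move the measurement of the GET_SSD_OWNER_READ and GET_SSD_OWNER_RELEASE perf points from real_client.cpp into the implementation in file_storage.cpp. Also add GET_SSD_OWNER_ALLOC and GET_SSD_OWNER_LOAD perf points in FileStorage::BatchGet to split the SSD read path into its buffer-allocation and disk-load stages.

This makes the measurements reflect the actual time-consuming operations more precisely, avoids the error that measuring inside the coroutine scheduler could introduce, and keeps push-path and pull-path measurements consistent.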
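
The idea of timing each stage inside the function that does the work, rather than around the caller's await point, can be sketched with a small RAII timer. This is an illustration only: `PerfStat`, `ScopedPerfTimer`, and `LoadWithStages` are hypothetical names and do not reflect Mooncake's actual perf-point API, and the two stages are trivial stand-ins for buffer allocation and disk load.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical accumulator for one perf point.
struct PerfStat {
    std::uint64_t samples = 0;
    std::chrono::nanoseconds total{0};
};

// RAII timer: records elapsed time into `stat` when the scope exits.
class ScopedPerfTimer {
public:
    explicit ScopedPerfTimer(PerfStat& stat)
        : stat_(stat), start_(std::chrono::steady_clock::now()) {}
    ScopedPerfTimer(const ScopedPerfTimer&) = delete;
    ScopedPerfTimer& operator=(const ScopedPerfTimer&) = delete;
    ~ScopedPerfTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        stat_.total += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        ++stat_.samples;
    }

private:
    PerfStat& stat_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for a BatchGet-style function: each stage is timed inside the
// function itself, so scheduling delay in the caller is not counted.
inline int LoadWithStages(PerfStat& alloc_stat, PerfStat& load_stat) {
    int buffer = 0;
    {
        ScopedPerfTimer timer(alloc_stat);  // buffer-allocation stage
        buffer = 42;
    }
    {
        ScopedPerfTimer timer(load_stat);   // disk-load stage
        buffer += 1;
    }
    return buffer;
}
```

Because the timers live in the shared implementation, any caller (push or pull) records the same stages the same way.
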
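- Add three perf points in mooncake_perf_points.def to track the time spent in each stage of owner-side disk loading
- Implement the instrumentation in storage_backend.cpp, timing read-plan construction, io_uring reads, and POSIX reads separately
- Add a detailed design document, offload-push-mode.md, describing the push-mode architecture, implementation details, and performance benefits
- Push mode is controlled by the environment variable MC_OFFLOAD_PUSH and reduces an offload read from 3 RPCs to 1
- The remote owner actively performs a URMA write to push data into the requester's memory, cutting network round-trip latency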
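
The push data path described in the last two bullets can be modeled in-process as follows. This is a toy sketch under stated assumptions, not the PR's implementation: `PushRequest` and `Owner` are hypothetical types, an in-memory map stands in for SSD storage, and `std::memcpy` stands in for the one-sided URMA write into the requester's memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Hypothetical request: the requester tells the owner where to write.
struct PushRequest {
    std::string key;
    char* dest;            // requester-side buffer (registered memory in a real system)
    std::size_t capacity;  // bytes available at dest
};

// Hypothetical owner of offloaded objects; a map stands in for SSD storage.
class Owner {
public:
    void Put(const std::string& key, const std::string& value) { store_[key] = value; }

    // Handles the single push request: load the object, then write it into the
    // requester's buffer. std::memcpy stands in for the one-sided write.
    // Returns bytes written, or -1 if the key is missing or the buffer is too small.
    long HandlePush(const PushRequest& req) const {
        const auto it = store_.find(req.key);
        if (it == store_.end() || it->second.size() > req.capacity) return -1;
        std::memcpy(req.dest, it->second.data(), it->second.size());
        return static_cast<long>(it->second.size());
    }

private:
    std::unordered_map<std::string, std::string> store_;
};
```

In this model the requester issues one call and finds the data already in its buffer when the call returns, with no separate read or release step.
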
github-actions bot added the documentation (Improvements or additions to documentation), run-ci, store, and Integration labels Jun 24, 2026
