Ub transport store dev #4

Open
zchuango wants to merge 20 commits into main from ub_transport_store_dev
@zchuango (Owner) commented May 23, 2026

Description

This PR enables the UB (Ultra Band) transport for Mooncake Store, along with several bug fixes and improvements for the UB transport layer in the transfer engine.

Key changes:

  1. UB Transport support in Mooncake Store: Added "ub" protocol handling in the store's client service, real client, and utility functions. This includes:

    • UB protocol initialization in Client::InitTransferEngine with device name discovery
    • UbSegmentDeleter for proper cleanup of UB-allocated segments
    • Integration with ub_allocate_memory / ub_free_memory in the store's memory allocation path
  2. UB Allocator: Introduced a new ub_allocator module (ub_allocator.h / ub_allocator.cpp) that provides:

    • ub_allocate_memory() — allocates NUMA-local memory via numa_alloc_local and tracks the allocation range
    • ub_free_memory() — frees NUMA-allocated memory and removes the tracked range
    • ub_is_store_memory() — checks if a given address range overlaps with any tracked UB store memory region
  3. UrmaEndpoint bug fixes and improvements:

    • Fixed polling logic: moved the slices[i] = slice assignment ahead of the success check, so that failed slices are also returned to the caller
    • Added support for URMA_PORT_ACTIVE_DEFER port state in device open
    • Added error logging when retrieveRemoteSeg fails and when a jetty is not imported
    • Fixed endpoint deletion order: the endpoint is now deleted only after the retry count exceeds the maximum (previously it was deleted prematurely on the first failure)
    • Set rjetty.tp_type and rjetty.flag fields during connection setup
  4. Build & packaging updates:

    • Updated urma dependency to v25.12.0.B081
    • Added ub_allocator.cpp to the UB sources in CMake
    • Added ub_allocator.h to the install headers
    • Added -lurma linkage in p2p-store build script
    • Excluded liburma.so* from auditwheel repair in the wheel build script
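
To make the polling fix in item 3 concrete, here is a minimal sketch of the ordering change. It is not the actual UrmaEndpoint code: `Slice`, `Completion`, and `collect_completions` are hypothetical stand-ins invented for illustration. The only point it shows is that the output slot is written before the status is inspected, so a failed slice still reaches the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the Mooncake or URMA types.
struct Slice {
    int id;
    bool failed = false;
};

struct Completion {
    Slice* slice;  // slice this completion belongs to
    bool success;  // whether the transfer succeeded
};

// Writes every completed slice into `slices`, successful or not, and returns
// the number of entries written. `slices` must have room for all completions.
std::size_t collect_completions(const std::vector<Completion>& completions,
                                std::vector<Slice*>& slices) {
    assert(slices.size() >= completions.size());
    for (std::size_t i = 0; i < completions.size(); ++i) {
        Slice* slice = completions[i].slice;
        slices[i] = slice;  // record the slice first ...
        if (!completions[i].success) {
            slice->failed = true;  // ... then mark the failure
            continue;
        }
        // Success-only bookkeeping would go here.
    }
    return completions.size();
}
```

If the assignment sat after the `continue`, the slot for a failed slice would stay unset and the caller could not mark that slice as failed.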

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • P2P Store (mooncake-p2p-store)
  • Mooncake EP (mooncake-ep)
  • Integration (mooncake-integration)
  • Python Wheel (mooncake-wheel)
  • PyTorch Backend (mooncake-pg)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • New feature
  • Bug fix
  • Refactor
  • Breaking change
  • Documentation update
  • Other

How Has This Been Tested?

  • Verified UB protocol initialization and segment registration in Mooncake Store with UB transport
  • Tested ub_allocate_memory / ub_free_memory allocation and deallocation paths
  • Validated ub_is_store_memory range overlap detection logic
  • Confirmed UrmaEndpoint polling correctly returns both successful and failed slices
  • Verified build succeeds with UB transport enabled (including CMake and wheel packaging)
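
For reviewers who want to reason about the range-overlap check, the sketch below shows one way such tracking could work. It is an assumption-laden illustration, not the code in ub_allocator.cpp: the `sketch` namespace and its function names are hypothetical, `std::malloc` stands in for `numa_alloc_local` so the example builds without libnuma, and the `std::map` keyed by start address is a design choice of this sketch rather than something the PR states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <mutex>

namespace sketch {  // hypothetical names, for illustration only

std::mutex g_mutex;
std::map<std::uintptr_t, std::size_t> g_ranges;  // start address -> length

// Allocates memory and records its range. std::malloc is a stand-in for
// numa_alloc_local in this sketch.
void* allocate(std::size_t size) {
    void* ptr = std::malloc(size);
    if (ptr == nullptr) return nullptr;
    std::lock_guard<std::mutex> lock(g_mutex);
    g_ranges[reinterpret_cast<std::uintptr_t>(ptr)] = size;
    return ptr;
}

// Removes the tracked range, then frees the memory.
void release(void* ptr) {
    if (ptr == nullptr) return;
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_ranges.erase(reinterpret_cast<std::uintptr_t>(ptr));
    }
    std::free(ptr);
}

// Returns true if [addr, addr + length) overlaps any tracked range.
// Tracked ranges are disjoint, so only the last range that starts before
// the end of the query needs to be checked.
bool overlaps_tracked(const void* addr, std::size_t length) {
    if (length == 0) return false;
    const auto begin = reinterpret_cast<std::uintptr_t>(addr);
    const auto end = begin + length;
    std::lock_guard<std::mutex> lock(g_mutex);
    auto it = g_ranges.lower_bound(end);  // first range starting at or after end
    if (it == g_ranges.begin()) return false;
    --it;  // last range starting before end
    return it->first + it->second > begin;
}

}  // namespace sketch
```

The boundary cases worth exercising are a query that straddles the end of a tracked region (overlap) and a query that starts exactly at the end of one (no overlap).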

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.
