Source side remote view transport #163

Draft
zhou-yuhan wants to merge 12 commits into main from yuhan/source-side-remote-view-transport-merged

@zhou-yuhan
Collaborator

Motivation

Remote tensor-parallel (TP) sliced loads and updates were still hitting an unintended slow
path: when a requested view_id was not already routable, the destination daemon fell back to
canonical transport and reconstructed the TP slice locally. That preserved correctness but
caused destination-side read amplification, strided repacking, and repeated reconstruction
across daemons.

We also found lifecycle gaps in the first source-side upgrade path:

  • derived view exports were not managed as bounded ephemeral cache entries;
  • repeated multi-version workloads could accumulate exports on the source daemon;
  • retirement/drain semantics were ad hoc;
  • source-side upgrade waiting inherited an internal timeout instead of the caller deadline.
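
To make the last gap concrete, here is a minimal sketch of the difference between waiting on a fixed internal timeout and waiting on the caller's remaining budget. `Deadline`, `INTERNAL_TIMEOUT_S`, and both `wait_budget_*` helpers are hypothetical names chosen for illustration; they are not taken from this PR's code.

```python
import time


class Deadline:
    """Absolute deadline derived once from the caller's timeout (hypothetical)."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        # Clamp at zero so the value can be passed straight to a wait() call.
        return max(0.0, self._expires_at - self._clock())


INTERNAL_TIMEOUT_S = 30.0  # hypothetical fixed internal timeout


def wait_budget_old(deadline: Deadline) -> float:
    # The gap: the wait ignores the caller and uses a fixed internal value.
    return INTERNAL_TIMEOUT_S


def wait_budget_new(deadline: Deadline) -> float:
    # Follow the caller: wait only as long as the request still has budget.
    return deadline.remaining()
```

Deriving the deadline once and passing it down means every nested wait shrinks as time elapses, rather than each layer restarting its own timer.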

What this PR does

Suppose daemon B wants to fetch tensor views stored on daemon A.

  • Implement source-side remote view transport so daemon B can fetch dense view bytes prepared
    by daemon A, instead of reconstructing from canonical bytes locally.
  • Add daemon-owned lifecycle management for derived view exports:
    • cache keying by (artifact_id, view_id, device)
    • pending/ready/draining state
    • TTL-scoped reuse
    • pressure-aware eviction
    • ordered retirement
  • Add BeginReplicaFetch / EndReplicaFetch so the source daemon tracks real data-plane use:
    • TTL refresh on actual use only
    • active_fetches protection for in-flight transfers
    • drain waits for real relay fetches to finish
  • Keep canonical fallback as the compatibility / resource-exhaustion backstop.
  • Rework request-budget semantics so source-side upgrade waiting follows the caller deadline,
    not the pinned-allocation timeout.
  • Add route-kind / fallback-reason observability for create, reuse, refresh, drain, eviction,
    and fallback paths.
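
The fetch-tracking rules above (TTL refresh on real use, active_fetches protection, drain waiting for in-flight transfers) can be sketched roughly as follows. This is an illustrative model only: `ExportEntry`, its fields, and the three Python functions are hypothetical stand-ins for BeginReplicaFetch / EndReplicaFetch and the drain step, and the choice to reject fetches on non-ready entries is an assumption.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class ExportEntry:
    # Hypothetical record for one derived view export on the source daemon.
    state: str = "ready"  # "pending" | "ready" | "draining"
    active_fetches: int = 0
    expires_at: float = 0.0
    cond: threading.Condition = field(default_factory=threading.Condition)


def begin_replica_fetch(entry: ExportEntry, ttl_s: float, now=time.monotonic) -> bool:
    with entry.cond:
        if entry.state != "ready":
            # Assumption: a rejected caller would use the canonical fallback.
            return False
        entry.active_fetches += 1          # protects the entry while data moves
        entry.expires_at = now() + ttl_s   # TTL refreshed on real use only
        return True


def end_replica_fetch(entry: ExportEntry) -> None:
    with entry.cond:
        entry.active_fetches -= 1
        if entry.active_fetches == 0:
            entry.cond.notify_all()        # wake a drain waiting on transfers


def drain(entry: ExportEntry, timeout_s: float) -> bool:
    with entry.cond:
        entry.state = "draining"           # refuse new fetches from here on
        # True once every in-flight fetch has ended, False on timeout.
        return entry.cond.wait_for(
            lambda: entry.active_fetches == 0, timeout=timeout_s
        )
```

In this model a control-plane lookup alone never touches `expires_at`; only a successful `begin_replica_fetch` does, which is what keeps unused exports from living forever.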

Test results

Unit tests

Added focused tests for the new lifecycle manager:

  • cache keying + reuse for identical (artifact_id, view_id, device)
  • TTL refresh on data-plane use only
  • eviction ordering: expired idle first, then oldest idle non-expired, never active/pending
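
The eviction ordering exercised by these tests can be illustrated with a small sketch. `Entry`, `eviction_order`, and the field names are hypothetical; using creation time to define "oldest" and treating only ready entries with no active fetches as idle are assumptions, not details confirmed by the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (artifact_id, view_id, device)


@dataclass
class Entry:
    state: str            # "pending" | "ready" | "draining"
    active_fetches: int
    created_at: float     # assumption: "oldest" means earliest creation time
    expires_at: float


def eviction_order(cache: Dict[Key, Entry], now: float) -> List[Key]:
    """Return evictable keys: expired idle first, then oldest idle non-expired."""
    idle = [
        (key, e)
        for key, e in cache.items()
        # Active and pending entries are never candidates.
        if e.state == "ready" and e.active_fetches == 0
    ]
    # False sorts before True, so expired entries come first; ties by age.
    idle.sort(key=lambda kv: (kv[1].expires_at > now, kv[1].created_at))
    return [key for key, _ in idle]
```

A pressure-aware evictor could walk this list from the front and stop as soon as enough memory has been reclaimed.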

SGLang integration

Validated with the remote relay benchmark harness on two workers.

Remote load (load_weight_remote, tensorcast relay):

  • qwen3-14b / qwen3-32b
  • tp=1,2,4
  • trial=3
  • all cases passed

Examples:

  • qwen3-14b tp=4: 26s -> 5s -> 5s
  • qwen3-32b tp=4: 23s -> 11s -> 12s

Remote update (update_weight_remote, tensorcast relay):

  • qwen3-14b / qwen3-32b
  • tp=1,2,4
  • trial=3
  • all cases passed

Examples:

  • qwen3-32b tp=2: 40s / 34s / 38s
  • qwen3-32b tp=4: 44s / 43s / 43s

Summary

This PR makes source-side remote view transport production-safe:

  • source daemon prepares and serves dense views;
  • destination daemon stops doing canonical reconstruction on the routed path;
  • derived views are reusable but bounded ephemeral cache entries;
  • lifecycle, timeout, and fallback semantics are explicit and observable.

@zhou-yuhan zhou-yuhan requested a review from wolegechu March 19, 2026 12:31