[TRTLLM-11878][feat] Gen-only sync KV transfer for dis-agg#12882

Open
Shixiaowei02 wants to merge 1 commit into NVIDIA:main from Shixiaowei02:user/xiaoweis/ctx-only

Conversation


@Shixiaowei02 Shixiaowei02 commented Apr 9, 2026

Summary by CodeRabbit

  • New Features

    • Implemented synchronous KV cache transfer mechanism for disaggregated serving, enabling blocking KV slice reception with optional auxiliary transfers.
  • Tests

    • Added integration tests for disaggregated serving with synchronous transfers, cache manager v2, and asymmetric parallel topologies.
    • Extended unit test coverage for synchronous cache transfer workflows.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42493 [ run ] triggered by Bot. Commit: d1ef1f0 Link to invocation

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch 3 times, most recently from 933883a to a2b012d Compare April 9, 2026 07:20
@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42497 [ run ] triggered by Bot. Commit: a2b012d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42493 [ run ] completed with state ABORTED. Commit: d1ef1f0 Link to invocation

@Shixiaowei02 Shixiaowei02 marked this pull request as ready for review April 9, 2026 07:52
@Shixiaowei02 Shixiaowei02 requested review from a team as code owners April 9, 2026 07:52
@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough


The changes implement synchronous KV cache transfer in KvCacheTransceiverV2.request_and_receive_sync(), previously a stub. Three new integration tests cover sync transfer and KV cache manager v2 configurations. Unit tests are extended with a synchronous workflow variant. Test lists are updated to run the new tests.
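The workflow described above can be sketched as a toy, single-process model. Only `request_and_receive_sync`, `WaitResult`, `_apply_aux`, and the `_recv_sessions`/`_recv_reqs` maps come from the PR summary; every other class, signature, and type here is an illustrative stand-in, not the actual TensorRT-LLM API:

```python
import enum
import queue
import threading

class WaitResult(enum.Enum):
    COMPLETED = 0
    ERROR = 1

class State(enum.Enum):
    IN_PROGRESS = 0
    COMPLETED = 1
    TRANS_ERROR = 2

class RxSession:
    """Toy RX session: one KV slice arrives over a queue."""
    def __init__(self, channel):
        self._channel = channel
        self.data = None

    def receive(self):
        pass  # submitting the KV slice for reception is a no-op in this toy model

    def wait_complete(self, blocking=True):
        try:
            self.data = self._channel.get(timeout=5 if blocking else 0)
            return WaitResult.COMPLETED
        except queue.Empty:
            return WaitResult.ERROR

    def close(self):
        self._channel = None

class ToyTransceiver:
    def __init__(self):
        self._recv_sessions = {}
        self._recv_reqs = {}

    def _apply_aux(self, req):
        pass  # placeholder for the auxiliary transfers applied on success

    def request_and_receive_sync(self, req, channel):
        rid = req["request_id"]                        # 1. compute request id
        assert rid not in self._recv_sessions          # 2. reject duplicate RX session
        req["state"] = State.IN_PROGRESS               # 3. state transition
        session = RxSession(channel)                   # 4. create RX session
        self._recv_sessions[rid] = session             # 5. store mappings
        self._recv_reqs[rid] = req
        session.receive()                              # 6. submit KV slice reception
        result = session.wait_complete(blocking=True)  # 7. block until done
        if result is WaitResult.COMPLETED:
            self._apply_aux(req)                       # 8. aux transfers on success
            req["state"] = State.COMPLETED
        else:
            req["state"] = State.TRANS_ERROR
        session.close()                                # 9. cleanup
        del self._recv_sessions[rid]
        del self._recv_reqs[rid]
        return req["state"]

# Simulate the CTX side pushing a KV slice from another thread.
chan = queue.Queue()
threading.Thread(target=lambda: chan.put(b"kv-slice")).start()
t = ToyTransceiver()
state = t.request_and_receive_sync({"request_id": 1}, chan)
print(state)  # State.COMPLETED
```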

Changes

  • Core Implementation — tensorrt_llm/_torch/disaggregation/transceiver.py
    Implemented request_and_receive_sync() with a synchronous KV cache reception workflow: generates the request ID, validates that no duplicate RX session exists, manages request state transitions (IN_PROGRESS → COMPLETED or ERROR), creates and stores the RX session and request mapping, waits for completion with blocking semantics, applies auxiliary transfers on success, and cleans up the session/request entries.
  • Integration Tests — tests/integration/defs/accuracy/test_disaggregated_serving.py
    Added three new test methods: test_gen_only_sync (validates sync KV transfer with TRTLLM_DISABLE_KV_CACHE_TRANSFER_OVERLAP), test_kv_cache_manager_v2 (tests KV cache manager v2 on both servers), and test_kv_cache_manager_v2_ctx_tp2pp2_gen_tp4 (asymmetric topology with manager v2). All use Python transceivers and the GSM8K dataset.
  • Unit Tests — tests/unittest/disaggregated/test_py_cache_transceiver_mp.py
    Extended the workflow dispatch to handle the new "ctx_first_sync" variant. Added a _run_ctx_first_sync_transfer() function that executes context-first transfers with the GEN side using the synchronous request_and_receive_sync() while the CTX side submits asynchronously. Updated test parameterization to include the sync workflow.
  • Test Lists — tests/integration/test_lists/test-db/l0_dgx_h100.yml, tests/integration/test_lists/test-db/l0_dgx_h200.yml
    Added test selections for the new disaggregated serving test cases: test_gen_only_sync and test_kv_cache_manager_v2 on H100; test_kv_cache_manager_v2_ctx_tp2pp2_gen_tp4 on H200.
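The unit-test extension above could be dispatched roughly as follows. The names "ctx_first_sync" and `_run_ctx_first_sync_transfer` come from the summary; the "ctx_first" baseline, the dispatch table, and the queue-based transfer stand-in are assumptions for illustration (the real harness is multi-process):

```python
import queue
import threading

def _run_ctx_first_transfer(chan):
    """Pre-existing async variant (shape assumed for illustration)."""
    threading.Thread(target=lambda: chan.put("kv-slice")).start()  # CTX submits
    return chan.get(timeout=5)                                     # GEN polls/awaits

def _run_ctx_first_sync_transfer(chan):
    """CTX side submits asynchronously; GEN side blocks on a sync receive."""
    threading.Thread(target=lambda: chan.put("kv-slice")).start()  # CTX, async
    return chan.get(timeout=5)  # GEN, stand-in for request_and_receive_sync()

# Workflow dispatch extended with the new sync variant.
WORKFLOWS = {
    "ctx_first": _run_ctx_first_transfer,
    "ctx_first_sync": _run_ctx_first_sync_transfer,
}

# Parameterization now includes the sync workflow.
for name in ("ctx_first", "ctx_first_sync"):
    got = WORKFLOWS[name](queue.Queue())
    print(name, got)
```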

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GEN as GEN Side
    participant Transceiver as KvCacheTransceiverV2
    participant RXSession as RX Session
    participant CTX as CTX Side

    GEN->>Transceiver: request_and_receive_sync(req)
    activate Transceiver
    Transceiver->>Transceiver: compute request_id
    Transceiver->>Transceiver: check _recv_sessions<br/>(prevent duplicates)
    Transceiver->>Transceiver: set state to IN_PROGRESS
    Transceiver->>RXSession: create RX session
    activate RXSession
    Transceiver->>Transceiver: store mapping:<br/>request_id → RXSession
    Transceiver->>RXSession: submit KV slice for reception
    RXSession->>CTX: receive KV data
    activate CTX
    CTX-->>RXSession: KV data transfer
    deactivate CTX
    Transceiver->>RXSession: wait_complete(blocking=True)
    RXSession-->>Transceiver: WaitResult.COMPLETED/<br/>WaitResult.ERROR
    deactivate RXSession

    alt WaitResult.COMPLETED
        Transceiver->>Transceiver: _apply_aux() if needed
        Transceiver->>Transceiver: set state to COMPLETED
    else WaitResult.ERROR
        Transceiver->>Transceiver: set state to TRANS_ERROR
    end

    Transceiver->>Transceiver: close RX session
    Transceiver->>Transceiver: remove from _recv_sessions
    Transceiver->>Transceiver: remove from _recv_reqs
    Transceiver-->>GEN: return
    deactivate Transceiver
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check — ⚠️ Warning: The PR description is empty; only the template comments and a checked PR checklist item are present, with no actual content filling the required sections. Resolution: add a Description section explaining the feature, a Test Coverage section listing the added tests, and ensure the PR checklist is properly reviewed and marked.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 58.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: The title clearly summarizes the main change: implementing synchronous KV cache transfer for the generation-only disaggregated serving mode.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/disaggregation/transceiver.py`:
- Around line 322-337: The RX flow around create_rx_session / session.receive /
session.wait_complete can throw and currently leaks entries in _recv_sessions
and _recv_reqs and may leave req.state inconsistent; wrap the session lifecycle
in a try/finally (or try/except/finally) so that session.close() is always
called and the rid entries are always deleted from _recv_sessions and
_recv_reqs, and ensure req.state is set to DISAGG_TRANS_ERROR on exceptions;
specifically modify the code around create_rx_session, session.receive,
session.wait_complete, _apply_aux and the existing state assignments to
guarantee cleanup and error-state assignment even when an exception is raised.
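The suggested hardening could look like the following toy sketch. The session lifecycle calls and the DISAGG_TRANS_ERROR state mirror the comment above; the function signature, `FlakySession`, and the dict-based maps are stand-ins for the real transceiver internals:

```python
class FlakySession:
    """Stand-in RX session whose receive() raises, to exercise cleanup."""
    def __init__(self):
        self.closed = False

    def receive(self):
        raise RuntimeError("simulated transport failure")

    def wait_complete(self, blocking=True):
        return "COMPLETED"

    def close(self):
        self.closed = True

def request_and_receive_sync(req, recv_sessions, recv_reqs, make_session):
    rid = req["request_id"]
    session = make_session()
    recv_sessions[rid] = session
    recv_reqs[rid] = req
    try:
        session.receive()
        result = session.wait_complete(blocking=True)
        req["state"] = "COMPLETED" if result == "COMPLETED" else "DISAGG_TRANS_ERROR"
    except Exception:
        req["state"] = "DISAGG_TRANS_ERROR"  # error state is set even on exceptions
        raise
    finally:
        session.close()                # session is always closed
        recv_sessions.pop(rid, None)   # both map entries are always dropped
        recv_reqs.pop(rid, None)

sessions, reqs = {}, {}
req = {"request_id": 7}
try:
    request_and_receive_sync(req, sessions, reqs, FlakySession)
except RuntimeError:
    pass
print(req["state"], sessions, reqs)  # DISAGG_TRANS_ERROR {} {}
```

Whether to re-raise or swallow the exception after marking the error state is a design choice; the review comment only asks that the error state, session close, and map cleanup are guaranteed on every exit path.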


📥 Commits

Reviewing files that changed from the base of the PR and between ce71620 and a2b012d.

📒 Files selected for processing (5)
  • tensorrt_llm/_torch/disaggregation/transceiver.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml
  • tests/integration/test_lists/test-db/l0_dgx_h200.yml
  • tests/unittest/disaggregated/test_py_cache_transceiver_mp.py

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch 2 times, most recently from bf7c600 to 58a431e Compare April 9, 2026 12:06
@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42536 [ run ] triggered by Bot. Commit: 58a431e Link to invocation

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch from 58a431e to 9d37e3e Compare April 10, 2026 02:51
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch from 9d37e3e to 281a292 Compare April 10, 2026 02:51
@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42629 [ run ] triggered by Bot. Commit: 281a292 Link to invocation
