[TRTLLM-11878][feat] Gen-only sync KV transfer for dis-agg#12882

Open
Shixiaowei02 wants to merge 1 commit into NVIDIA:main from Shixiaowei02:user/xiaoweis/ctx-only

Conversation


@Shixiaowei02 Shixiaowei02 commented Apr 9, 2026

Summary by CodeRabbit

  • New Features

    • Implemented synchronous KV cache transfer mechanism for disaggregated serving, enabling blocking KV slice reception with optional auxiliary transfers.
  • Tests

    • Added integration tests for disaggregated serving with synchronous transfers, cache manager v2, and asymmetric parallel topologies.
    • Extended unit test coverage for synchronous cache transfer workflows.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42493 [ run ] triggered by Bot. Commit: d1ef1f0 Link to invocation

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch 3 times, most recently from 933883a to a2b012d Compare April 9, 2026 07:20
@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42497 [ run ] triggered by Bot. Commit: a2b012d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42493 [ run ] completed with state ABORTED. Commit: d1ef1f0 Link to invocation

@Shixiaowei02 Shixiaowei02 marked this pull request as ready for review April 9, 2026 07:52
@Shixiaowei02 Shixiaowei02 requested review from a team as code owners April 9, 2026 07:52
@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough


The changes implement synchronous KV cache transfer in KvCacheTransceiverV2.request_and_receive_sync(), previously a stub. Three new integration tests cover sync transfer and KV cache manager v2 configurations. Unit tests are extended with a synchronous workflow variant. Test lists are updated to run the new tests.
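The workflow described above can be sketched as a toy, single-process model. Only `request_and_receive_sync`, `WaitResult`, `_apply_aux`, and the `_recv_sessions`/`_recv_reqs` maps come from the PR summary; every other class, signature, and type here is an illustrative stand-in, not the actual TensorRT-LLM API:

```python
import enum
import queue
import threading

class WaitResult(enum.Enum):
    COMPLETED = 0
    ERROR = 1

class State(enum.Enum):
    IN_PROGRESS = 0
    COMPLETED = 1
    TRANS_ERROR = 2

class RxSession:
    """Toy RX session: one KV slice arrives over a queue."""
    def __init__(self, channel):
        self._channel = channel
        self.data = None

    def receive(self):
        pass  # submitting the KV slice for reception is a no-op in this toy model

    def wait_complete(self, blocking=True):
        try:
            self.data = self._channel.get(timeout=5 if blocking else 0)
            return WaitResult.COMPLETED
        except queue.Empty:
            return WaitResult.ERROR

    def close(self):
        self._channel = None

class ToyTransceiver:
    def __init__(self):
        self._recv_sessions = {}
        self._recv_reqs = {}

    def _apply_aux(self, req):
        pass  # placeholder for the auxiliary transfers applied on success

    def request_and_receive_sync(self, req, channel):
        rid = req["request_id"]                        # 1. compute request id
        assert rid not in self._recv_sessions          # 2. reject duplicate RX session
        req["state"] = State.IN_PROGRESS               # 3. state transition
        session = RxSession(channel)                   # 4. create RX session
        self._recv_sessions[rid] = session             # 5. store mappings
        self._recv_reqs[rid] = req
        session.receive()                              # 6. submit KV slice reception
        result = session.wait_complete(blocking=True)  # 7. block until done
        if result is WaitResult.COMPLETED:
            self._apply_aux(req)                       # 8. aux transfers on success
            req["state"] = State.COMPLETED
        else:
            req["state"] = State.TRANS_ERROR
        session.close()                                # 9. cleanup
        del self._recv_sessions[rid]
        del self._recv_reqs[rid]
        return req["state"]

# Simulate the CTX side pushing a KV slice from another thread.
chan = queue.Queue()
threading.Thread(target=lambda: chan.put(b"kv-slice")).start()
t = ToyTransceiver()
state = t.request_and_receive_sync({"request_id": 1}, chan)
print(state)  # State.COMPLETED
```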

Changes

  • Core Implementation — tensorrt_llm/_torch/disaggregation/transceiver.py
    Implemented request_and_receive_sync() with a synchronous KV cache reception workflow: generates the request ID, validates that no duplicate RX session exists, manages request state transitions (IN_PROGRESS → COMPLETED or ERROR), creates and stores the RX session and request mapping, waits for completion with blocking semantics, applies auxiliary transfers on success, and cleans up the session/request entries.
  • Integration Tests — tests/integration/defs/accuracy/test_disaggregated_serving.py
    Added three new test methods: test_gen_only_sync (validates sync KV transfer with TRTLLM_DISABLE_KV_CACHE_TRANSFER_OVERLAP), test_kv_cache_manager_v2 (tests KV cache manager v2 on both servers), and test_kv_cache_manager_v2_ctx_tp2pp2_gen_tp4 (asymmetric topology with manager v2). All use Python transceivers and the GSM8K dataset.
  • Unit Tests — tests/unittest/disaggregated/test_py_cache_transceiver_mp.py
    Extended the workflow dispatch to handle the new "ctx_first_sync" variant. Added a _run_ctx_first_sync_transfer() function that executes context-first transfers with the GEN side using the synchronous request_and_receive_sync() while the CTX side submits asynchronously. Updated test parameterization to include the sync workflow.
  • Test Lists — tests/integration/test_lists/test-db/l0_dgx_h100.yml, tests/integration/test_lists/test-db/l0_dgx_h200.yml
    Added test selections for the new disaggregated serving test cases: test_gen_only_sync and test_kv_cache_manager_v2 on H100; test_kv_cache_manager_v2_ctx_tp2pp2_gen_tp4 on H200.
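The unit-test extension above could be dispatched roughly as follows. The names "ctx_first_sync" and `_run_ctx_first_sync_transfer` come from the summary; the "ctx_first" baseline, the dispatch table, and the queue-based transfer stand-in are assumptions for illustration (the real harness is multi-process):

```python
import queue
import threading

def _run_ctx_first_transfer(chan):
    """Pre-existing async variant (shape assumed for illustration)."""
    threading.Thread(target=lambda: chan.put("kv-slice")).start()  # CTX submits
    return chan.get(timeout=5)                                     # GEN polls/awaits

def _run_ctx_first_sync_transfer(chan):
    """CTX side submits asynchronously; GEN side blocks on a sync receive."""
    threading.Thread(target=lambda: chan.put("kv-slice")).start()  # CTX, async
    return chan.get(timeout=5)  # GEN, stand-in for request_and_receive_sync()

# Workflow dispatch extended with the new sync variant.
WORKFLOWS = {
    "ctx_first": _run_ctx_first_transfer,
    "ctx_first_sync": _run_ctx_first_sync_transfer,
}

# Parameterization now includes the sync workflow.
for name in ("ctx_first", "ctx_first_sync"):
    got = WORKFLOWS[name](queue.Queue())
    print(name, got)
```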

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GEN as GEN Side
    participant Transceiver as KvCacheTransceiverV2
    participant RXSession as RX Session
    participant CTX as CTX Side

    GEN->>Transceiver: request_and_receive_sync(req)
    activate Transceiver
    Transceiver->>Transceiver: compute request_id
    Transceiver->>Transceiver: check _recv_sessions<br/>(prevent duplicates)
    Transceiver->>Transceiver: set state to IN_PROGRESS
    Transceiver->>RXSession: create RX session
    activate RXSession
    Transceiver->>Transceiver: store mapping:<br/>request_id → RXSession
    Transceiver->>RXSession: submit KV slice for reception
    RXSession->>CTX: receive KV data
    activate CTX
    CTX-->>RXSession: KV data transfer
    deactivate CTX
    Transceiver->>RXSession: wait_complete(blocking=True)
    RXSession-->>Transceiver: WaitResult.COMPLETED/<br/>WaitResult.ERROR
    deactivate RXSession

    alt WaitResult.COMPLETED
        Transceiver->>Transceiver: _apply_aux() if needed
        Transceiver->>Transceiver: set state to COMPLETED
    else WaitResult.ERROR
        Transceiver->>Transceiver: set state to TRANS_ERROR
    end

    Transceiver->>Transceiver: close RX session
    Transceiver->>Transceiver: remove from _recv_sessions
    Transceiver->>Transceiver: remove from _recv_reqs
    Transceiver-->>GEN: return
    deactivate Transceiver
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check — ⚠️ Warning: The PR description is empty; only the template comments and a checked PR checklist item are present, with no actual content filling the required sections. Resolution: add a Description section explaining the feature, a Test Coverage section listing the added tests, and ensure the PR checklist is properly reviewed and marked.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 58.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: The title clearly summarizes the main change: implementing synchronous KV cache transfer for the generation-only disaggregated serving mode.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/disaggregation/transceiver.py`:
- Around line 322-337: The RX flow around create_rx_session / session.receive /
session.wait_complete can throw and currently leaks entries in _recv_sessions
and _recv_reqs and may leave req.state inconsistent; wrap the session lifecycle
in a try/finally (or try/except/finally) so that session.close() is always
called and the rid entries are always deleted from _recv_sessions and
_recv_reqs, and ensure req.state is set to DISAGG_TRANS_ERROR on exceptions;
specifically modify the code around create_rx_session, session.receive,
session.wait_complete, _apply_aux and the existing state assignments to
guarantee cleanup and error-state assignment even when an exception is raised.
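The suggested hardening could look like the following toy sketch. The session lifecycle calls and the DISAGG_TRANS_ERROR state mirror the comment above; the function signature, `FlakySession`, and the dict-based maps are stand-ins for the real transceiver internals:

```python
class FlakySession:
    """Stand-in RX session whose receive() raises, to exercise cleanup."""
    def __init__(self):
        self.closed = False

    def receive(self):
        raise RuntimeError("simulated transport failure")

    def wait_complete(self, blocking=True):
        return "COMPLETED"

    def close(self):
        self.closed = True

def request_and_receive_sync(req, recv_sessions, recv_reqs, make_session):
    rid = req["request_id"]
    session = make_session()
    recv_sessions[rid] = session
    recv_reqs[rid] = req
    try:
        session.receive()
        result = session.wait_complete(blocking=True)
        req["state"] = "COMPLETED" if result == "COMPLETED" else "DISAGG_TRANS_ERROR"
    except Exception:
        req["state"] = "DISAGG_TRANS_ERROR"  # error state is set even on exceptions
        raise
    finally:
        session.close()                # session is always closed
        recv_sessions.pop(rid, None)   # both map entries are always dropped
        recv_reqs.pop(rid, None)

sessions, reqs = {}, {}
req = {"request_id": 7}
try:
    request_and_receive_sync(req, sessions, reqs, FlakySession)
except RuntimeError:
    pass
print(req["state"], sessions, reqs)  # DISAGG_TRANS_ERROR {} {}
```

Whether to re-raise or swallow the exception after marking the error state is a design choice; the review comment only asks that the error state, session close, and map cleanup are guaranteed on every exit path.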


📥 Commits

Reviewing files that changed from the base of the PR and between ce71620 and a2b012d.

📒 Files selected for processing (5)
  • tensorrt_llm/_torch/disaggregation/transceiver.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml
  • tests/integration/test_lists/test-db/l0_dgx_h200.yml
  • tests/unittest/disaggregated/test_py_cache_transceiver_mp.py

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch 2 times, most recently from bf7c600 to 58a431e Compare April 9, 2026 12:06
@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42536 [ run ] triggered by Bot. Commit: 58a431e Link to invocation

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch from 58a431e to 9d37e3e Compare April 10, 2026 02:51
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/ctx-only branch from 9d37e3e to 281a292 Compare April 10, 2026 02:51
@Shixiaowei02
Collaborator Author

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42629 [ run ] triggered by Bot. Commit: 281a292 Link to invocation
