[TRTLLM-11878][feat] Gen-only sync KV transfer for dis-agg #12882
Shixiaowei02 wants to merge 1 commit into NVIDIA:main
f103a3c to d1ef1f0
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #42493 [ run ] triggered by Bot. Commit:
933883a to a2b012d
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #42497 [ run ] triggered by Bot. Commit:
PR_Github #42493 [ run ] completed with state
📝 Walkthrough
The changes implement synchronous KV cache transfer in
Changes
Sequence Diagram(s)
sequenceDiagram
participant GEN as GEN Side
participant Transceiver as KvCacheTransceiverV2
participant RXSession as RX Session
participant CTX as CTX Side
GEN->>Transceiver: request_and_receive_sync(req)
activate Transceiver
Transceiver->>Transceiver: compute request_id
Transceiver->>Transceiver: check _recv_sessions<br/>(prevent duplicates)
Transceiver->>Transceiver: set state to IN_PROGRESS
Transceiver->>RXSession: create RX session
activate RXSession
Transceiver->>Transceiver: store mapping:<br/>request_id → RXSession
Transceiver->>RXSession: submit KV slice for reception
RXSession->>CTX: receive KV data
activate CTX
CTX-->>RXSession: KV data transfer
deactivate CTX
Transceiver->>RXSession: wait_complete(blocking=True)
RXSession-->>Transceiver: WaitResult.COMPLETED/<br/>WaitResult.ERROR
deactivate RXSession
alt WaitResult.COMPLETED
Transceiver->>Transceiver: _apply_aux() if needed
Transceiver->>Transceiver: set state to COMPLETED
else WaitResult.ERROR
Transceiver->>Transceiver: set state to TRANS_ERROR
end
Transceiver->>Transceiver: close RX session
Transceiver->>Transceiver: remove from _recv_sessions
Transceiver->>Transceiver: remove from _recv_reqs
Transceiver-->>GEN: return
deactivate Transceiver
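The flow in the diagram can be approximated in Python as below. This is a minimal sketch and not the PR's code: `FakeRxSession`, `SketchTransceiver`, `Request`, and the `State` enum are hypothetical stand-ins. Only the names `request_and_receive_sync`, `_recv_sessions`, `_recv_reqs`, `_apply_aux`, `receive`, `wait_complete`, and `WaitResult` appear in the diagram or the review comment, and their real signatures may differ. Registering the request in `_recv_reqs` is an assumption inferred from its later removal.

```python
from dataclasses import dataclass
from enum import Enum, auto


class WaitResult(Enum):
    COMPLETED = auto()
    ERROR = auto()


class State(Enum):  # hypothetical stand-in for the request state enum
    INIT = auto()
    IN_PROGRESS = auto()
    COMPLETED = auto()
    TRANS_ERROR = auto()


@dataclass
class Request:  # hypothetical request object
    request_id: int
    state: State = State.INIT


class FakeRxSession:
    """Stand-in RX session that returns a preset wait result."""

    def __init__(self, result):
        self.result = result
        self.closed = False

    def receive(self, kv_slice):
        self.kv_slice = kv_slice

    def wait_complete(self, blocking=True):
        return self.result

    def close(self):
        self.closed = True


class SketchTransceiver:
    def __init__(self, result=WaitResult.COMPLETED):
        self._recv_sessions = {}
        self._recv_reqs = {}
        self._result = result  # what the fake session will report

    def _apply_aux(self, req):
        pass  # placeholder; the diagram does not describe this step

    def request_and_receive_sync(self, req, kv_slice=None):
        rid = req.request_id
        if rid in self._recv_sessions:  # prevent duplicates
            raise RuntimeError(f"request {rid} is already being received")
        req.state = State.IN_PROGRESS
        session = FakeRxSession(self._result)
        self._recv_sessions[rid] = session
        self._recv_reqs[rid] = req  # assumed; the diagram only shows removal
        session.receive(kv_slice)
        result = session.wait_complete(blocking=True)
        if result is WaitResult.COMPLETED:
            self._apply_aux(req)
            req.state = State.COMPLETED
        else:
            req.state = State.TRANS_ERROR
        session.close()
        del self._recv_sessions[rid]
        del self._recv_reqs[rid]
```

Because the call blocks until `wait_complete` returns, the caller can inspect `req.state` immediately afterwards instead of polling for completion.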
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/disaggregation/transceiver.py`:
- Around line 322-337: The RX flow around create_rx_session / session.receive /
session.wait_complete can throw and currently leaks entries in _recv_sessions
and _recv_reqs and may leave req.state inconsistent; wrap the session lifecycle
in a try/finally (or try/except/finally) so that session.close() is always
called and the rid entries are always deleted from _recv_sessions and
_recv_reqs, and ensure req.state is set to DISAGG_TRANS_ERROR on exceptions;
specifically modify the code around create_rx_session, session.receive,
session.wait_complete, _apply_aux and the existing state assignments to
guarantee cleanup and error-state assignment even when an exception is raised.
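The cleanup guarantee requested above could look roughly like the following. This is a sketch under stated assumptions, not the fix that landed in the PR: `receive_with_cleanup`, `FailingSession`, the string state values, and the `create_session` callable are all hypothetical, and whether the exception should be re-raised or swallowed is a design choice the review comment does not settle.

```python
from types import SimpleNamespace


class FailingSession:
    """Hypothetical RX session whose wait_complete() raises."""

    def __init__(self):
        self.closed = False

    def receive(self, kv_slice):
        pass

    def wait_complete(self, blocking=True):
        raise ConnectionError("peer went away")

    def close(self):
        self.closed = True


def receive_with_cleanup(create_session, sessions, reqs, req, kv_slice=None):
    rid = req.request_id
    session = None
    try:
        session = create_session()  # may raise before anything is registered
        sessions[rid] = session
        reqs[rid] = req
        session.receive(kv_slice)
        ok = session.wait_complete(blocking=True)
        req.state = "COMPLETED" if ok else "DISAGG_TRANS_ERROR"
    except Exception:
        req.state = "DISAGG_TRANS_ERROR"  # never leave the request IN_PROGRESS
        raise  # assumption: callers want to see the original error
    finally:
        if session is not None:
            session.close()
        # pop() tolerates the case where registration never happened
        sessions.pop(rid, None)
        reqs.pop(rid, None)


sessions, reqs = {}, {}
req = SimpleNamespace(request_id=7, state="IN_PROGRESS")
failing = FailingSession()
try:
    receive_with_cleanup(lambda: failing, sessions, reqs, req)
except ConnectionError:
    pass  # the request state and both maps are already consistent here
```

Using `pop(rid, None)` instead of `del` keeps the `finally` block itself from raising when the failure happened before the entries were stored.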
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4d6023a3-741a-4f96-a0b0-96ffa3597667
📒 Files selected for processing (5)
- tensorrt_llm/_torch/disaggregation/transceiver.py
- tests/integration/defs/accuracy/test_disaggregated_serving.py
- tests/integration/test_lists/test-db/l0_dgx_h100.yml
- tests/integration/test_lists/test-db/l0_dgx_h200.yml
- tests/unittest/disaggregated/test_py_cache_transceiver_mp.py
bf7c600 to 58a431e
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #42536 [ run ] triggered by Bot. Commit:
58a431e to 9d37e3e
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
9d37e3e to 281a292
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #42629 [ run ] triggered by Bot. Commit:
Summary by CodeRabbit
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.