## Motivation
PR #253 introduces Prefill/Decode disaggregation — a critical production feature that splits inference across separate GPU instances via MORI-IO RDMA. Current test coverage is minimal:
- 1 test file (`test_kv_aggregator.py`, 96 lines) covering only `KVOutputAggregator`
- 0 tests for the core transfer engine (1,624 lines), proxy (372 lines), scheduler integration (344 lines), and async worker plumbing (212 lines)
This issue tracks the plan to add layered test coverage following the same strategy as the plugin-mode CI (#255).
## Approach: Layered Testing by Module
### L1: CPU Unit Tests (P0 — gate for merge)
Pure logic tests with mocked GPU/RDMA/ZMQ dependencies. Run on `ubuntu-latest` in < 5 seconds.
```
tests/disaggregation/
├── test_kv_aggregator.py            # Enhance existing
├── test_connector_metadata.py       # New
├── test_kv_connector_scheduler.py   # New
├── test_proxy.py                    # New
├── test_transfer_utils.py           # New
└── test_scheduler_kv_integration.py # New
```
**Mock strategy:** Mock `aiter`, `mori.io`, `torch.distributed`, and `zmq` at the `sys.modules` level.
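A minimal sketch of this mock strategy, e.g. in a shared `conftest.py` (the file placement is an assumption; the module list comes from the strategy above):

```python
# conftest.py (sketch): stub out GPU/RDMA/ZMQ dependencies before any test
# imports the code under test. MagicMock tolerates arbitrary attribute access,
# which is enough for pure-logic tests; swap in real fakes where behavior matters.
import sys
from unittest.mock import MagicMock

for name in ("aiter", "mori", "mori.io", "torch.distributed", "zmq"):
    # setdefault avoids clobbering a real module that is already imported.
    sys.modules.setdefault(name, MagicMock())
```

With both `mori` and `mori.io` registered, `import mori.io` inside the code under test resolves to the stub without any RDMA stack installed.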
#### test_kv_aggregator.py (enhance)
- `reset()` clears pending state
- `world_size <= 0` raises `ValueError`

#### test_connector_metadata.py
- `add_new_req_to_recv` builds correct `ReqMeta` from `kv_transfer_params`
- `add_new_req_to_save` builds correct `ReqMeta`
- `request_id_to_transfer_id` mapping passthrough

#### test_kv_connector_scheduler.py
- `get_num_new_matched_tokens` returns `(prompt_len, True)` for `do_remote_prefill`
- `get_num_new_matched_tokens` idempotent once `kv_async_tagged` → `(0, False)`
- `update_state_after_alloc` consumer: queues req, sets transfer_id mapping
- `update_state_after_alloc` producer: does NOT queue
- `do_remote_prefill` flag cleared after processing
- `build_connector_meta` drains pending queue into metadata
- `build_connector_meta` on empty queue → no crash
- `request_finished` producer output contains `block_table`, `engine_id`, `host`, `port`
- `request_finished` consumer cleans up transfer_id mapping

#### test_proxy.py
- `_append_whole_dict_unique` deduplicates on the `index` field
- `_extract_ip_port` on valid URL
- `_extract_ip_port` on invalid URL raises `ValueError`
- `max_tokens=1` and `stream=False` handling

#### test_transfer_utils.py
- `convert_virtual_to_physical_pages` default 16→1 expansion
- `merge_contiguous_blocks` — all contiguous → 1 merged
- `_compute_block_transfer_offsets` MHA (5D) vs MLA (3D)
- `make_zmq_path` IPv4, IPv6, no-port
- `RoleManager` singleton + thread safety
- `set_role`/`get_role` round-trip
- `get_port_offset` formula: `dp_rank * tp_size + tp_rank`

#### test_scheduler_kv_integration.py
- `None` `kv_connector_output` → no crash
- `connector_meta_output` attached to `ScheduledBatch`
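As an illustration of how thin these L1 unit tests can be, here is a sketch of the port-offset check. The `dp_rank * tp_size + tp_rank` formula is from the plan; the function signature is an assumption, and the real helper lives in the transfer utils under test:

```python
# Sketch of a get_port_offset unit test. The local definition below is a
# stand-in for the real helper; the formula itself is the contract under test.
def get_port_offset(dp_rank: int, tp_rank: int, tp_size: int) -> int:
    # Each DP rank owns a contiguous band of tp_size port offsets.
    return dp_rank * tp_size + tp_rank

def test_get_port_offset_formula():
    assert get_port_offset(0, 0, 8) == 0
    # dp_rank 1, tp_rank 3 with tp_size 8 lands in the second band: 1*8 + 3.
    assert get_port_offset(1, 3, 8) == 11
    # Offsets must be unique across the whole (dp_rank, tp_rank) grid,
    # otherwise two workers would bind the same port.
    offsets = {get_port_offset(d, t, 4) for d in range(2) for t in range(4)}
    assert len(offsets) == 8
```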
### L2: CPU Integration Tests (P0)
| Test | Description |
|------|-------------|
| ZMQ handshake roundtrip | Listener + client threads in-process, verify metadata exchange |
| Service discovery registration | Simulate proxy ZMQ ROUTER, verify msgpack format and dedup |
| `AsyncIOProcManager` KV aggregation | Mock multiple worker KV outputs, verify `call_func_with_aggregation` |
| `_pop_done_transfers` all-status check | Bug: current code only checks `status_list[-1]`. Test with `[FAIL, SUCCESS]` → should NOT mark done |
| OpenAI server kv_params roundtrip | Request with `kv_transfer_params` → response contains output |
| Proxy prefill→decode read-mode flow | Simulate: prefill response → extract block metadata → decode request |
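The `_pop_done_transfers` row can be written as a table-driven regression test. Everything below is a stand-in (the status enum and the function body are illustrative, not the PR's real types); it encodes the intended all-status behavior:

```python
# Regression sketch: a transfer whose statuses are [FAIL, SUCCESS] must NOT be
# popped as done, even though the last status is SUCCESS (the current bug).
from enum import Enum, auto

class Status(Enum):
    SUCCESS = auto()
    FAIL = auto()

def pop_done_transfers(pending: dict) -> list:
    """Correct behavior: a request is done only if ALL of its statuses
    succeeded, not just status_list[-1] as in the buggy version."""
    done = [rid for rid, statuses in pending.items()
            if statuses and all(s is Status.SUCCESS for s in statuses)]
    for rid in done:
        del pending[rid]
    return done

pending = {
    "req-ok":   [Status.SUCCESS, Status.SUCCESS],
    "req-part": [Status.FAIL, Status.SUCCESS],  # last OK, earlier layer failed
}
done = pop_done_transfers(pending)
assert done == ["req-ok"]
assert "req-part" in pending  # a last-only check would wrongly pop this
```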
### L3: GPU Tests (P1 — design only)
| Test | Env | Description |
|------|-----|-------------|
| `register_kv_caches` RDMA metadata | 1 GPU | Real KV tensors → verify RDMA metadata non-null |
| MoRIIO wrapper tensor registration | 1 GPU | CUDA tensor → packed metadata valid |
| Single-node loopback transfer | 2+ GPU | Producer → consumer RDMA read, verify data match |
| E2E proxy+prefill+decode | 8 GPU | Full 3-process inference |
| Multi-request concurrent | 8 GPU | Concurrent P/D pipeline |
## Known Bugs to Cover
- `_pop_done_transfers` only checks `status_list[-1]` — should check ALL statuses
- `start_load_kv` busy-wait — `while need_handshake: continue` burns CPU
- Proxy ~60,000-hour timeout — `aiohttp.ClientTimeout(total=6*6000*6000)` should be configurable
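For the `start_load_kv` busy-wait, one conventional remedy is blocking on an event instead of spinning; a generic sketch under that assumption (`HandshakeGate` is a hypothetical stand-in, not the connector's actual class):

```python
# Sketch: replace `while need_handshake: continue` with a kernel-level wait.
# The waiter sleeps instead of burning a CPU core until the handshake lands.
import threading

class HandshakeGate:
    def __init__(self):
        self._done = threading.Event()

    def complete(self):
        # Called by the handshake listener thread once metadata is exchanged.
        self._done.set()

    def wait(self, timeout=None):
        # Returns True if the handshake completed within `timeout` seconds.
        return self._done.wait(timeout)

gate = HandshakeGate()
threading.Timer(0.01, gate.complete).start()  # simulate a late handshake
assert gate.wait(timeout=1.0)  # unblocks as soon as complete() fires
```

A timeout on the wait also gives the caller a natural place to surface a handshake failure instead of hanging forever.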
## Estimated Effort
| Layer | Files | Test Cases | Lines (est.) |
|-------|-------|------------|--------------|
| L1 | 6 | ~55 | ~800 |
| L2 | 1 | ~6 | ~300 |
| L3 | design only | ~5 | ~200 |
| Total | 7 | ~66 | ~1,300 |
## CI Integration
Add to the existing workflow or a new `atom-pd-test.yaml`:
```yaml
pd-unit-tests:
  name: PD Disaggregation Unit Tests (CPU)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: "3.12"
    - run: pip install pytest msgpack msgspec numpy aiohttp quart
    - run: pip install torch --index-url https://download.pytorch.org/whl/cpu
    - run: pytest tests/disaggregation/ -v --tb=short
```
## Reference
- Design doc: `docs/plans/2026-03-04-pd-disaggregation-test-coverage-design.md`
- Related: PR #253, Issue #255