docs(vllm): DfkvStoreConnector — README + deploy guide + config reference #47
Merged
…ide + config reference

The merged vLLM connector (PR dingodb#46) had only a terse integration/vllm/README and no deploy doc. Add complete docs reflecting the shipped code:

- README.md: add the vLLM connector to Engine integrations + Layout, and bump the test count (53 -> 88 ctest entries, add the RDMA datapath CI job).
- integration/vllm/README.md: complete the env-var table (incl. the critical PYTHONHASHSEED=0 for cross-process/restart key determinism), the full kv_connector_extra_config keys with defaults (load_async, enable_cross_layers_blocks, lookup_rpc_port), a geometry guard for shared pools, and the SG + JIT notes.
- docs/vllm/DEPLOY.md (new): end-to-end deploy (build -> dfkv cluster -> connector -> vLLM -> verify) with a full config reference and per-scenario recommended settings (single/multi-DP/shared-pool/long-context), geometry guard, measured results, and a troubleshooting table. Mirrors docs/lmcache/DEPLOY.md.
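Why PYTHONHASHSEED=0 matters can be seen with a small experiment. The sketch below assumes only that some part of the key derivation relies on Python's built-in hash(), which is salted per process unless the seed is pinned; the helper name str_hash_in_child and the sample string are made up for illustration and are not part of the connector.

```python
import os
import subprocess
import sys


def str_hash_in_child(seed: str, text: str = "block-0") -> int:
    """Start a fresh interpreter with PYTHONHASHSEED=seed and return hash(text)."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


# Pinned seed: two independent processes compute the same hash.
pinned_a = str_hash_in_child("0")
pinned_b = str_hash_in_child("0")

# Two different seeds stand in for two processes with randomized hashing.
seed1 = str_hash_in_child("1")
seed2 = str_hash_in_child("2")

print("pinned seed agrees:", pinned_a == pinned_b)
print("different seeds agree:", seed1 == seed2)
```

If a writer and a reader disagree on the hash of the same input, they derive different keys, which matches the 'writes succeed but reads never hit' symptom described in this PR.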
…iCache, LMCache, and vLLM

The 'distributed KV cache for SGLang HiCache' title undersold the repo now that it backs three engines. Lead with LLM inference + list the three adapters. Also fix the now-contradictory 'without ... MDS ... dependency' line (dfkv ships its own dfkv_mds).
…pth claim

- §5: correct the DFKV_RDMA_DEPTH note: depth is throughput-flat (2026-06 benchmark), not a write-bandwidth booster; the lever is batch_concurrency + fewer/larger keys.
- §9 (new): which post-dingodb#46 features do NOT apply to the HiCache/MLA path and why (SG = nothing to coalesce for one-object-per-page; io_uring = flat on single disk; depth = flat).
- Plus: vLLM and HiCache instances of the SAME model can share the dfkv cluster/ring but do NOT reuse each other's KV (different key schemes + KV layouts). Share nodes/capacity, isolate keyspace via distinct model_hash/name.
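The keyspace-isolation advice can be illustrated with a toy key builder. This is a sketch only: namespaced_key, its key layout, and the sample model names are hypothetical and do not reproduce the key scheme of dfkv, the vLLM connector, or HiCache. It shows just the general idea that a distinct model_hash/name prefix keeps two engines from colliding on one shared ring.

```python
import hashlib


def namespaced_key(model_name: str, model_hash: str, block_hash: str) -> str:
    """Build a key whose prefix is unique per (model_name, model_hash) pair.

    Hypothetical layout, for illustration only.
    """
    namespace = f"{model_name}:{model_hash}"
    digest = hashlib.sha256(f"{namespace}/{block_hash}".encode()).hexdigest()
    return f"{namespace}/{digest[:16]}"


# Same logical block, two engines with distinct names/hashes on one cluster.
vllm_key = namespaced_key("llama-vllm", "aaaa", "blk0")
hicache_key = namespaced_key("llama-hicache", "bbbb", "blk0")

print(vllm_key)
print(hicache_key)
print("keys collide:", vllm_key == hicache_key)
```

Because the two engines store different KV layouts, a collision would be harmful rather than a cache hit, so separating the prefixes is the safe default.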
ketor added a commit that referenced this pull request on Jun 19, 2026
New direct vLLM integration + scatter-gather datapath since v1.5.2:

- vLLM DfkvStoreConnector (KVConnectorBase_V1, GPUDirect RDMA, bypasses LMCache) (#46)
- Scatter-gather batch API (batch_put_sg/batch_get_auto_sg, QP max_sge 2->30): one multi-SGE RDMA per chunk, ~20x fewer keys/disk-reads (#46)
- io_uring async GET serve loop (opt-in DFKV_SERVER_URING, default off) (#46)
- 7 fresh-eyes review fixes (per-item SG failure, recv-thread hardening, empty-key skip, io_uring EINTR/short-read, true out_lens) + 2 regression tests (#46)
- Docs: vLLM deploy guide + config reference, README multi-engine, HiCache boundary (#47)

No wire change (kProtoVersion still 1); v1.5.x compatible. CI green incl. TSan + RDMA datapath.
Follow-up to #46 (the vLLM connector merge), which shipped with only a terse integration/vllm/README and no deploy doc.

What

- integration/vllm/README.md: complete env-var table, including PYTHONHASHSEED=0 (cross-process / cross-restart key determinism; the number-one 'writes succeed but reads never hit' misconfiguration), plus the full kv_connector_extra_config keys with defaults (load_async, enable_cross_layers_blocks, lookup_rpc_port), a geometry guard for shared pools, and the SG + first-request-JIT notes.
- docs/vllm/DEPLOY.md (new): mirrors docs/lmcache/DEPLOY.md.

Accuracy

All params verified against the merged source: env vars read by libdfkv.so + the connector, extra_config defaults read from connector.py / scheduler.py / worker.py, server flags from dfkv_server_main.cc, and the perf/JIT/depth-flat findings from the on-hardware validation. Docs-only; no code change.
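For orientation, the three extra_config keys named above could be passed to vLLM roughly as follows. This is a hedged sketch: the values shown are placeholders, not the documented defaults (those live in integration/vllm/README.md), the kv_role value is an assumption, and an out-of-tree connector may need additional fields that are not shown here.

```python
import json
import shlex

# Placeholder values only; consult the README's config table for real defaults.
kv_transfer_config = {
    "kv_connector": "DfkvStoreConnector",
    "kv_role": "kv_both",  # assumption: connector both saves and loads KV
    "kv_connector_extra_config": {
        "load_async": False,                  # placeholder, not the default
        "enable_cross_layers_blocks": False,  # placeholder, not the default
        "lookup_rpc_port": 0,                 # placeholder port number
    },
}

# vLLM accepts the config as a JSON string on the command line.
flag = "--kv-transfer-config " + shlex.quote(json.dumps(kv_transfer_config))
print(flag)
```

Building the JSON programmatically and quoting it with shlex avoids the shell-escaping mistakes that hand-written JSON flags tend to attract.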