docs(vllm): DfkvStoreConnector — README + deploy guide + config reference#47

Merged: ketor merged 3 commits into dingodb:main from ketor:docs/vllm-connector on Jun 19, 2026


ketor commented Jun 19, 2026

Contributor

Follow-up to #46 (the vLLM connector merge), which shipped with only a terse integration/vllm/README and no deploy doc.

What

  • README.md: add the vLLM connector to Engine integrations + Layout; bump the test count (53 → 88 ctest entries) and add the RDMA-datapath CI job.
  • integration/vllm/README.md: complete the env-var table, including the critical PYTHONHASHSEED=0 (cross-process / cross-restart key determinism; the #1 'writes succeed but reads never hit' misconfig), plus the full kv_connector_extra_config keys with defaults (load_async, enable_cross_layers_blocks, lookup_rpc_port), a geometry guard for shared pools, and the SG + first-request-JIT notes.
  • docs/vllm/DEPLOY.md (new): end-to-end deploy (build → dfkv cluster → connector → vLLM → verify) with a full config reference and per-scenario recommended settings (single / multi-DP / shared-pool / long-context), the geometry guard, measured results, and a troubleshooting table. Mirrors docs/lmcache/DEPLOY.md.
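
For readers unfamiliar with why PYTHONHASHSEED matters here: CPython randomizes `hash()` for strings per process unless the seed is pinned, so any key that depends on it changes between a writer and a later reader. The sketch below only demonstrates that general Python behavior; how the connector actually derives its keys is not shown in this PR, and the `hash_in_child` helper and the `'block-0'` string are illustrative assumptions.

```python
import os
import subprocess
import sys

# Code run in each fresh interpreter: print the hash of a fixed string.
SNIPPET = "print(hash('block-0'))"


def hash_in_child(seed):
    """Return hash('block-0') from a new Python process.

    seed=None leaves hash randomization on; a string such as "0" pins it.
    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean setting
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# With the seed pinned, two separate processes agree on the hash.
pinned_a = hash_in_child("0")
pinned_b = hash_in_child("0")
print("pinned seed, hashes match:", pinned_a == pinned_b)

# Without it, each process draws its own random seed, so the values
# will almost always differ (not asserted, since it is probabilistic).
print("random seed, hashes match:", hash_in_child(None) == hash_in_child(None))
```

This matches the symptom named above: a writer and a reader with different seeds would compute different keys, so writes succeed while reads miss.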

Accuracy

All params verified against the merged source: env vars read by libdfkv.so + the connector, extra_config defaults read from connector.py/scheduler.py/worker.py, server flags from dfkv_server_main.cc, and the perf/JIT/depth-flat findings from the on-hardware validation. Docs-only; no code change.

ketor added 3 commits June 20, 2026 01:44
…ide + config reference

The merged vLLM connector (PR dingodb#46) had only a terse integration/vllm/README and no
deploy doc. Add complete docs reflecting the shipped code:

- README.md: add the vLLM connector to Engine integrations + Layout + bump the test
  count (53 -> 88 ctest entries, add the RDMA datapath CI job).
- integration/vllm/README.md: complete the env-var table (incl. the critical
  PYTHONHASHSEED=0 for cross-process/restart key determinism), the full
  kv_connector_extra_config keys with defaults (load_async, enable_cross_layers_blocks,
  lookup_rpc_port), a geometry guard for shared pools, and the SG + JIT notes.
- docs/vllm/DEPLOY.md (new): end-to-end deploy (build -> dfkv cluster -> connector ->
  vLLM -> verify) with a full config reference and per-scenario recommended settings
  (single/multi-DP/shared-pool/long-context), geometry guard, measured results, and a
  troubleshooting table. Mirrors docs/lmcache/DEPLOY.md.
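
As a rough illustration of where the extra_config keys listed above sit, here is a minimal sketch that assembles a vLLM-style KV transfer config as JSON. The connector name and the three key names come from this PR; the `kv_role` value, the value types, and every value in `placeholder_extra` are placeholders chosen for the example, not the documented defaults (those are in integration/vllm/README.md). `build_kv_transfer_config` is a hypothetical helper.

```python
import json


def build_kv_transfer_config(extra_config: dict) -> str:
    """Serialize a KV transfer config with the given connector extras.

    Hypothetical helper; the field layout follows vLLM's KV transfer
    config (kv_connector / kv_role / kv_connector_extra_config).
    """
    config = {
        "kv_connector": "DfkvStoreConnector",
        "kv_role": "kv_both",  # assumption: connector both saves and loads KV
        "kv_connector_extra_config": extra_config,
    }
    return json.dumps(config)


# Placeholder values only: NOT the defaults shipped with the connector.
placeholder_extra = {
    "load_async": False,
    "enable_cross_layers_blocks": False,
    "lookup_rpc_port": 0,
}

print(build_kv_transfer_config(placeholder_extra))
```

The resulting string is the kind of value typically passed to vLLM's `--kv-transfer-config` option; consult the deploy guide for the real settings per scenario.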
…iCache, LMCache, and vLLM

The 'distributed KV cache for SGLang HiCache' title undersold the repo now that it
backs three engines. Lead with LLM inference + list the three adapters. Also fix the
now-contradictory 'without ... MDS ... dependency' line (dfkv ships its own dfkv_mds).
…pth claim

- §5: correct the DFKV_RDMA_DEPTH note — depth is throughput-flat (2026-06 benchmark),
  not a write-bandwidth booster; the lever is batch_concurrency + fewer/larger keys.
- §9 (new): which post-dingodb#46 features do NOT apply to the HiCache/MLA path and why
  (SG = nothing to coalesce for one-object-per-page; io_uring = flat on single disk;
  depth = flat). Plus: vLLM and HiCache instances of the SAME model can share the
  dfkv cluster/ring but do NOT reuse each other's KV (different key schemes + KV
  layouts) — share nodes/capacity, isolate keyspace via distinct model_hash/name.
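
The keyspace-isolation point can be sketched with a toy example: two engines hashing the same tokens land on different keys once each key carries a distinct namespace, so they share the store without ever reading each other's entries. `make_key`, the namespace strings, and the SHA-256 scheme are all hypothetical and are not dfkv's actual key format.

```python
import hashlib


def make_key(namespace: str, token_ids: list[int]) -> str:
    """Build a store key as '<namespace>/<digest of the token ids>'.

    Hypothetical scheme for illustration; dfkv's real keys differ.
    """
    digest = hashlib.sha256(repr(token_ids).encode()).hexdigest()[:16]
    return f"{namespace}/{digest}"


tokens = [101, 7592, 2088]

# Same model, same tokens, but distinct namespaces per engine.
vllm_key = make_key("vllm:model-a", tokens)
hicache_key = make_key("hicache:model-a", tokens)

# One shared store (standing in for the shared cluster/ring).
shared_store = {vllm_key: b"kv-bytes-in-vllm-layout"}

print(vllm_key in shared_store)     # True: the writer finds its own entry
print(hicache_key in shared_store)  # False: the other engine never hits it
```

Capacity is shared because both engines write into the same store, while the namespace prefix keeps one engine from loading KV laid out for the other.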
@ketor ketor merged commit 4312625 into dingodb:main Jun 19, 2026
6 checks passed
ketor added a commit that referenced this pull request Jun 19, 2026
New direct vLLM integration + scatter-gather datapath since v1.5.2:
- vLLM DfkvStoreConnector (KVConnectorBase_V1, GPUDirect RDMA, bypass LMCache) — #46
- Scatter-gather batch API (batch_put_sg/batch_get_auto_sg, QP max_sge 2->30): one
  multi-SGE RDMA per chunk, ~20x fewer keys/disk-reads — #46
- io_uring async GET serve loop (opt-in DFKV_SERVER_URING, default off) — #46
- 7 fresh-eyes review fixes (per-item SG failure, recv-thread hardening, empty-key
  skip, io_uring EINTR/short-read, true out_lens) + 2 regression tests — #46
- Docs: vLLM deploy guide + config reference, README multi-engine, HiCache boundary — #47

No wire change (kProtoVersion still 1); v1.5.x compatible. CI green incl. TSan + RDMA datapath.