
research(nightly): CoDEQ streaming quantizer — kd-tree median-split with O(1) updates (ADR-193)#456

Draft
ruvnet wants to merge 1 commit into main from research/nightly/2026-05-11-codeq

@ruvnet (Owner) commented May 11, 2026

Summary

  • Implements CoDEQ (arXiv:2512.18335, Dec 2025) as crates/ruvector-codeq: a kd-tree median-split quantizer whose leaf centroids are updated in O(1) per insert/delete via Welford's online mean (one O(D) update on a single leaf), so no k-means rebuild is required under streaming drift.
  • Adds crates/ruvector-streaming-hnsw as a concurrency-safe HNSW baseline that guards each neighbor list with its own parking_lot::RwLock.
  • Documents findings in docs/research/nightly/2026-05-11-codeq/README.md and docs/adr/ADR-193-codeq.md.
  • Gist overview: https://gist.github.com/ruvnet/d10fe656bd0fa68b4eb873ad299c6d4e
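A minimal sketch of the per-leaf running-mean update described above, assuming each leaf stores only a mean vector and a count. `LeafCentroid` and its methods are hypothetical names for illustration, not the ruvector-codeq API. Insert applies Welford's incremental mean, μ ← μ + (x − μ)/n; delete applies the exact inverse, μ ← μ − (x − μ)/(n − 1), assuming the removed vector was previously inserted into the same leaf.

```rust
/// Running mean of the vectors currently assigned to one kd-tree leaf.
/// Hypothetical type for illustration; not the ruvector-codeq API.
#[derive(Debug, Clone)]
pub struct LeafCentroid {
    mean: Vec<f32>,
    count: u64,
}

impl LeafCentroid {
    pub fn new(dim: usize) -> Self {
        Self { mean: vec![0.0; dim], count: 0 }
    }

    /// Welford incremental mean: mu <- mu + (x - mu) / n. O(D), no rescan.
    pub fn insert(&mut self, x: &[f32]) {
        assert_eq!(x.len(), self.mean.len(), "dimension mismatch");
        self.count += 1;
        let n = self.count as f32;
        for (m, &xi) in self.mean.iter_mut().zip(x) {
            *m += (xi - *m) / n;
        }
    }

    /// Exact inverse of `insert`: mu <- mu - (x - mu) / (n - 1).
    /// Assumes `x` was previously inserted into this same leaf.
    pub fn remove(&mut self, x: &[f32]) {
        assert_eq!(x.len(), self.mean.len(), "dimension mismatch");
        assert!(self.count > 0, "remove from an empty leaf");
        self.count -= 1;
        if self.count == 0 {
            // Last vector removed: reset to the empty state.
            self.mean.iter_mut().for_each(|m| *m = 0.0);
            return;
        }
        let n = self.count as f32; // count after removal
        for (m, &xi) in self.mean.iter_mut().zip(x) {
            *m -= (xi - *m) / n;
        }
    }

    pub fn mean(&self) -> &[f32] {
        &self.mean
    }

    pub fn count(&self) -> u64 {
        self.count
    }
}
```

Each call touches one leaf's mean only, so the cost is O(D) per operation regardless of index size. Long insert/delete histories may accumulate some floating-point error in the inverse update.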

Benchmark Numbers (x86_64, n=5,000, D=128, k=10, release build)

| Variant | Recall@10 | QPS | Build time | Streaming updates |
|---|---|---|---|---|
| FlatL2 (exact) | 100.0% | 1,129 | 1.4 ms | trivial |
| StaticPQ (k-means, frozen) | 28.1% | 2,636 | 404 ms | requires full rebuild |
| CoDEQ (kd-tree, 8-bit) | 7.2% | 4,812 | 54 ms | 330,942 ops/sec |
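The PR does not spell out how Recall@10 is computed. The sketch below assumes the usual definition with the exact FlatL2 result as ground truth: for each query, the fraction of the true 10 nearest neighbours that appear in the approximate top 10, averaged over all queries. The function names are hypothetical.

```rust
use std::collections::HashSet;

/// Exact k nearest neighbours by brute-force squared L2 distance,
/// i.e. the ground truth a FlatL2 index returns. Ties break by id.
pub fn exact_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| {
            let d: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            (d, i)
        })
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(k).map(|(_, i)| i).collect()
}

/// Recall@k for one query: share of the true top-k ids found in the
/// approximate top-k. Assumes ids are unique within each list.
pub fn recall_at_k(exact: &[usize], approx: &[usize], k: usize) -> f64 {
    let truth: HashSet<usize> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let hits = approx.iter().take(k).filter(|&&id| truth.contains(&id)).count();
    hits as f64 / truth.len() as f64
}

/// Mean Recall@k over all queries, as a percentage (e.g. 7.2 for 7.2%).
pub fn mean_recall_pct(exact: &[Vec<usize>], approx: &[Vec<usize>], k: usize) -> f64 {
    assert_eq!(exact.len(), approx.len(), "one result list per query");
    if exact.is_empty() {
        return 0.0;
    }
    let total: f64 = exact
        .iter()
        .zip(approx)
        .map(|(e, a)| recall_at_k(e, a, k))
        .sum();
    100.0 * total / exact.len() as f64
}
```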

After 10% data drift:

  • StaticPQ recall drops 2.9pp (its frozen codebook is not rebuilt)
  • CoDEQ recall unchanged (leaf centroids are updated in place)

Test Plan

  • cargo test -p ruvector-codeq --lib — 14/14 pass
  • cargo test -p ruvector-streaming-hnsw --lib — 9/9 pass
  • cargo run --release -p ruvector-codeq — benchmark numbers verified
  • cargo run --release -p ruvector-streaming-hnsw — benchmark numbers verified
  • No unsafe code
  • All files under 500 lines
  • No secrets committed

Architecture

CoDEQ separates quantization structure (a frozen kd-tree of split dimensions and thresholds, computed once from the training data's variance) from quantization state (leaf centroids, updated in O(1) per insert/delete). This is the key insight from arXiv:2512.18335: the tree only needs to reflect the coarse shape of the distribution, which is stable, while the centroids track exact per-leaf means, which change only locally.
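The "structure" half of this split can be illustrated with a small frozen kd-tree. This is a hypothetical sketch, not the ruvector-codeq implementation, and it makes assumptions the PR does not state: each internal node splits on the dimension with the highest variance among its training points, the threshold is the upper median, and the tree is a complete binary tree of fixed depth, so a depth-d tree yields 2^d leaf codes (for example, depth 8 gives 256 leaves, which fits a one-byte code).

```rust
/// Frozen kd-tree "structure": one split dimension and threshold per internal
/// node of a complete binary tree (children of node i are 2i+1 and 2i+2).
/// Hypothetical sketch; not the ruvector-codeq API.
pub struct FrozenKdTree {
    depth: usize,
    split_dim: Vec<usize>,
    threshold: Vec<f32>,
}

impl FrozenKdTree {
    /// Built once from training data; never modified afterwards.
    /// Assumes `train` is non-empty and all vectors share one dimension D > 0.
    pub fn build(train: &[Vec<f32>], depth: usize) -> Self {
        let internal = (1usize << depth) - 1;
        let mut tree = Self {
            depth,
            split_dim: vec![0; internal],
            threshold: vec![0.0; internal],
        };
        tree.build_node(train, (0..train.len()).collect(), 0);
        tree
    }

    /// Splits `idx` on its highest-variance dimension at the (upper) median.
    /// Nodes that receive no training points keep the default split (dim 0, 0.0).
    fn build_node(&mut self, train: &[Vec<f32>], idx: Vec<usize>, node: usize) {
        if node >= self.split_dim.len() || idx.is_empty() {
            return;
        }
        let n = idx.len() as f32;
        let mut best = (0usize, f32::NEG_INFINITY);
        for d in 0..train[idx[0]].len() {
            let mean = idx.iter().map(|&i| train[i][d]).sum::<f32>() / n;
            let var = idx.iter().map(|&i| (train[i][d] - mean).powi(2)).sum::<f32>() / n;
            if var > best.1 {
                best = (d, var);
            }
        }
        let d = best.0;
        let mut vals: Vec<f32> = idx.iter().map(|&i| train[i][d]).collect();
        vals.sort_by(|a, b| a.total_cmp(b));
        let t = vals[vals.len() / 2];
        self.split_dim[node] = d;
        self.threshold[node] = t;
        let (left, right): (Vec<usize>, Vec<usize>) =
            idx.into_iter().partition(|&i| train[i][d] < t);
        self.build_node(train, left, 2 * node + 1);
        self.build_node(train, right, 2 * node + 2);
    }

    /// Routes a vector to its leaf in `depth` comparisons; the leaf id is its code.
    pub fn leaf_of(&self, x: &[f32]) -> usize {
        let mut node = 0;
        for _ in 0..self.depth {
            node = if x[self.split_dim[node]] < self.threshold[node] {
                2 * node + 1
            } else {
                2 * node + 2
            };
        }
        node - self.split_dim.len() // leaves are numbered 0..2^depth
    }

    pub fn num_leaves(&self) -> usize {
        1 << self.depth
    }
}
```

Under these assumptions the tree is never touched after `build`: inserting or deleting a vector is a `leaf_of` lookup (O(depth) comparisons) followed by a running-mean update on that one leaf's centroid, which is the "state" half of the design.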

See ADR-193 for full decision rationale, alternatives considered, and known limitations.

https://claude.ai/code/session_01Y4ZUGMaHShXjtTczS2nKPq

Adds ruvector-codeq crate: kd-tree median-split quantizer with O(1)
streaming insert/delete via Welford online centroid updates. No k-means
rebuild required under distribution drift.

Measured on x86_64, n=5,000, D=128:
- Build: 54 ms (vs. 404 ms for StaticPQ, 7.5× faster)
- QPS: 4,812 (vs. 1,129 for FlatL2, 4.3× higher, at 7.2% vs. 100% Recall@10)
- Streaming updates: 330,942 ops/sec (1,000 ops in 3 ms)
- Recall stable under 10% drift (StaticPQ drops 2.9pp without a rebuild)
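The ratios and the 3 ms figure in this list follow directly from the measured numbers:

```math
\frac{404\,\text{ms}}{54\,\text{ms}} \approx 7.48 \approx 7.5\times
\qquad
\frac{4{,}812\ \text{QPS}}{1{,}129\ \text{QPS}} \approx 4.26 \approx 4.3\times
\qquad
\frac{1{,}000\ \text{ops}}{330{,}942\ \text{ops/s}} \approx 3.02\,\text{ms}
```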

14 unit tests passing.
