research(nightly): CoDEQ streaming quantizer — kd-tree median-split with O(1) updates (ADR-193)#456
Draft
Adds the ruvector-codeq crate: a kd-tree median-split quantizer with O(1) streaming insert/delete via Welford online centroid updates. No k-means rebuild is required under distribution drift.

Measured on x86_64, n=5,000, D=128:
- Build: 54 ms (vs 404 ms for StaticPQ, 7.5× faster)
- QPS: 4,812 (vs 1,129 for FlatL2, 4.3× faster)
- Streaming update: 330,942 ops/sec (1,000 ops in 3 ms)
- Recall stable under 10% drift (StaticPQ drops 2.9pp without rebuild)

14 unit tests passing.

https://claude.ai/code/session_01Y4ZUGMaHShXjtTczS2nKPq
Summary
- crates/ruvector-codeq: a kd-tree median-split quantizer where leaf centroids update in O(1) via Welford's online mean; no k-means rebuild is required under streaming drift.
- crates/ruvector-streaming-hnsw as a concurrency-safe HNSW baseline using parking_lot::RwLock-per-neighbor-list.
- docs/research/nightly/2026-05-11-codeq/README.md and docs/adr/ADR-193-codeq.md.

Benchmark Numbers (x86_64, n=5,000, D=128, k=10, release build)
After 10% data drift:
Test Plan
- cargo test -p ruvector-codeq --lib: 14/14 pass
- cargo test -p ruvector-streaming-hnsw --lib: 9/9 pass
- cargo run --release -p ruvector-codeq: benchmark numbers verified
- cargo run --release -p ruvector-streaming-hnsw: benchmark numbers verified

Architecture
CoDEQ separates quantization structure (frozen kd-tree: split dims + thresholds, computed once from training variance) from quantization state (leaf centroids, updated O(1) per insert/delete). This is the key insight from arXiv:2512.18335: the tree only needs to reflect the coarse distribution shape, which is stable; the centroids track exact per-leaf means, which are local.
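The structure/state split described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the names FrozenTree, Leaf, Quantizer, and leaf_of are hypothetical, the tree is assumed to be a complete binary tree stored in heap order, and the per-leaf update is the running-mean part of Welford's method. The update cost is independent of the number of stored points (it is still linear in the dimension D).

```rust
/// Frozen quantization structure (hypothetical layout): one (dimension, threshold)
/// pair per internal node, stored in heap order so node i has children 2i+1 and 2i+2.
struct FrozenTree {
    splits: Vec<(usize, f32)>, // length = number of leaves - 1
}

impl FrozenTree {
    /// Route a vector to its leaf index in 0..=splits.len().
    fn leaf_of(&self, x: &[f32]) -> usize {
        let mut node = 0;
        while node < self.splits.len() {
            let (dim, thr) = self.splits[node];
            node = 2 * node + if x[dim] <= thr { 1 } else { 2 };
        }
        node - self.splits.len()
    }
}

/// Mutable quantization state: a running mean and a point count per leaf.
#[derive(Clone)]
struct Leaf {
    count: usize,
    mean: Vec<f32>,
}

impl Leaf {
    fn new(dim: usize) -> Self {
        Leaf { count: 0, mean: vec![0.0; dim] }
    }

    /// Online mean update: mean += (x - mean) / n, with n the new count.
    fn insert(&mut self, x: &[f32]) {
        self.count += 1;
        let n = self.count as f32;
        for (m, &v) in self.mean.iter_mut().zip(x) {
            *m += (v - *m) / n;
        }
    }

    /// Inverse update: mean -= (x - mean) / n, with n the count after removal.
    fn remove(&mut self, x: &[f32]) {
        assert!(self.count > 0, "cannot remove from an empty leaf");
        self.count -= 1;
        if self.count == 0 {
            self.mean.iter_mut().for_each(|m| *m = 0.0);
            return;
        }
        let n = self.count as f32;
        for (m, &v) in self.mean.iter_mut().zip(x) {
            *m -= (v - *m) / n;
        }
    }
}

/// The tree is never modified after construction; only the leaves change.
struct Quantizer {
    tree: FrozenTree,
    leaves: Vec<Leaf>,
}

impl Quantizer {
    fn new(tree: FrozenTree, dim: usize) -> Self {
        let n_leaves = tree.splits.len() + 1;
        Quantizer { tree, leaves: vec![Leaf::new(dim); n_leaves] }
    }

    /// Insert a vector and return its code (the leaf index).
    fn insert(&mut self, x: &[f32]) -> usize {
        let code = self.tree.leaf_of(x);
        self.leaves[code].insert(x);
        code
    }

    /// Remove a previously inserted vector from its leaf.
    fn remove(&mut self, x: &[f32]) {
        let code = self.tree.leaf_of(x);
        self.leaves[code].remove(x);
    }
}
```

In this sketch, routing touches only the frozen splits and an update touches only one leaf, so drift in the data moves centroids without changing any split dimension or threshold.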
See ADR-193 for full decision rationale, alternatives considered, and known limitations.
https://claude.ai/code/session_01Y4ZUGMaHShXjtTczS2nKPq