research(nightly): muvera — MUVERA multi-vector retrieval via Fixed Dimensional Encodings (NeurIPS 2024)#442
Draft
… FDE (NeurIPS 2024) Implements Fixed Dimensional Encoding (FDE) for ColBERT-style late-interaction retrieval, reducing multi-vector MaxSim search to standard single-vector MIPS. Three variants: BruteForceMaxSim (exact), FlatFdeIndex (FDE+flat scan), HnswFdeIndex (FDE+greedy HNSW). 11 tests pass; 42x QPS speedup over BruteForce at n=10K. Crate: crates/ruvector-muvera/ Paper: arXiv:2405.19504 (NeurIPS 2024) https://claude.ai/code/session_01YLmQSPdeQLt1jdLKFKETMN
Decision record for the new ruvector-muvera crate implementing MUVERA Fixed Dimensional Encodings (NeurIPS 2024). Documents the problem (no multi-vector primitive in ruvector), decision, alternatives considered (PLAID, per-token HNSW, MRL-HNSW, binary FDE), and consequence matrix. https://claude.ai/code/session_01YLmQSPdeQLt1jdLKFKETMN
…l survey Deep research document covering SOTA in late-interaction dense retrieval (ColBERT, PLAID, MUVERA, EMVB), implementation notes, benchmark results with real cargo-run numbers, production failure modes, and improvement roadmap. Benchmark highlights:
- HnswFDE: 42.4x QPS vs BruteForce MaxSim at n=10K (131 vs 3 QPS)
- FlatFDE: 9.5x speedup at n=500 with 50% memory reduction
- 11 tests pass, cargo build --release clean
https://claude.ai/code/session_01YLmQSPdeQLt1jdLKFKETMN
Nightly Research: MUVERA — Multi-Vector Retrieval via Fixed Dimensional Encodings
Date: 2026-05-08
ADR: ADR-193
Research doc: docs/research/nightly/2026-05-08-muvera/README.md
Paper: Dhulipala et al., "MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings", NeurIPS 2024, arXiv:2405.19504
Gist: https://gist.github.com/ruvnet/627c64b9986a8cf1b53385708c093481
What
Adds crates/ruvector-muvera, the first multi-vector / late-interaction retrieval primitive in ruvector. MUVERA converts each ColBERT-style multi-vector set (one float vector per token, for queries and documents alike) into a single Fixed Dimensional Encoding (FDE); the inner product of a query FDE and a document FDE provably approximates their MaxSim score. After encoding, any standard single-vector MIPS index (flat scan, HNSW) works directly.
Why
ruvector had zero support for late-interaction retrieval. ColBERT and its derivatives consistently outperform bi-encoders by 3–7% nDCG@10 on BEIR benchmarks, but their multi-vector representation is incompatible with ruvector's existing AnnIndex trait. MUVERA provides the missing bridge without requiring bespoke infrastructure.
Deliverables
- crates/ruvector-muvera/: cargo build --release ✓, cargo test 11/11 ✓
- docs/adr/ADR-193-muvera.md
- docs/research/nightly/2026-05-08-muvera/README.md
Benchmark results (Intel Xeon @ 2.10 GHz, release build, synthetic Gaussian data)
HnswFDE: 42.4× QPS speedup over exact BruteForce MaxSim at n=10K.
FlatFDE at n=500: 9.5× speedup with 50% memory reduction.
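The baseline in these figures is exact brute-force MaxSim. As a minimal sketch of what that baseline computes (function names are hypothetical, not the ruvector-muvera API): MaxSim sums, over query tokens, each token's best dot product with any document token, and brute-force search applies it to every document. A query therefore costs O(n · |Q| · |D| · d) for n documents, |Q| query tokens, |D| document tokens and dimension d.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim (Chamfer) score: for each query token take its best-matching
/// document token, then sum those maxima over all query tokens.
/// With an empty document every per-token maximum is negative infinity.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| doc.iter().map(|d| dot(q, d)).fold(f32::NEG_INFINITY, f32::max))
        .sum()
}

/// Exact brute-force search: score every document with MaxSim and return
/// the index of the highest-scoring one (None when there are no documents).
fn brute_force_top1(query: &[Vec<f32>], docs: &[Vec<Vec<f32>>]) -> Option<usize> {
    docs.iter()
        .enumerate()
        .map(|(i, d)| (i, max_sim(query, d)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Every document costs |Q| · |D| dot products here, which is the work a single-vector index over FDEs avoids.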
Architecture
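A minimal sketch of the encode-then-search flow, assuming the SimHash partitioning described in the MUVERA paper: each of reps repetitions draws k_sim Gaussian hyperplanes that split token space into 2^k_sim buckets; a query's block for a bucket is the sum of its token vectors there, a document's block is their average, and all blocks are concatenated. Every name here (SplitMix64, FdeParams, encode, flat_top1) is hypothetical rather than the ruvector-muvera API, and the sketch omits the paper's empty-bucket filling and random projections.

```rust
/// Tiny deterministic PRNG (SplitMix64) so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in the open interval (0, 1).
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn gaussian(&mut self) -> f32 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        ((-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()) as f32
    }
}

/// Hypothetical FDE parameters (illustrative names, not the crate's API).
struct FdeParams {
    dim: usize,   // token embedding dimension d
    k_sim: usize, // SimHash bits per repetition, giving 2^k_sim buckets
    reps: usize,  // independent repetitions R
    seed: u64,    // seed shared by the query and document encoders
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Draw reps * k_sim Gaussian hyperplanes; plane j of repetition r sits at r * k_sim + j.
fn hyperplanes(p: &FdeParams) -> Vec<Vec<f32>> {
    let mut rng = SplitMix64(p.seed);
    (0..p.reps * p.k_sim)
        .map(|_| (0..p.dim).map(|_| rng.gaussian()).collect::<Vec<f32>>())
        .collect()
}

/// SimHash bucket of x in repetition r: bit j is set when x lies on the
/// positive side of hyperplane j.
fn bucket(planes: &[Vec<f32>], r: usize, k_sim: usize, x: &[f32]) -> usize {
    let mut b = 0usize;
    for j in 0..k_sim {
        if dot(&planes[r * k_sim + j], x) > 0.0 {
            b |= 1 << j;
        }
    }
    b
}

/// Encode a token-vector set as one FDE of length reps * 2^k_sim * dim.
/// Queries sum the vectors that land in each bucket; documents average them.
/// Simplification: empty document buckets stay zero (the paper fills them).
fn encode(p: &FdeParams, planes: &[Vec<f32>], vecs: &[Vec<f32>], is_query: bool) -> Vec<f32> {
    let buckets = 1usize << p.k_sim;
    let mut out = vec![0.0f32; p.reps * buckets * p.dim];
    for r in 0..p.reps {
        let mut counts = vec![0usize; buckets];
        for v in vecs {
            let b = bucket(planes, r, p.k_sim, v);
            counts[b] += 1;
            let base = (r * buckets + b) * p.dim;
            for (o, x) in out[base..base + p.dim].iter_mut().zip(v) {
                *o += x;
            }
        }
        if !is_query {
            // Document side: turn per-bucket sums into per-bucket means.
            for (b, &c) in counts.iter().enumerate() {
                if c == 0 {
                    continue;
                }
                let base = (r * buckets + b) * p.dim;
                out[base..base + p.dim].iter_mut().for_each(|o| *o /= c as f32);
            }
        }
    }
    out
}

/// Flat MIPS scan over precomputed document FDEs (the FlatFdeIndex idea).
fn flat_top1(query_fde: &[f32], doc_fdes: &[Vec<f32>]) -> Option<usize> {
    doc_fdes
        .iter()
        .enumerate()
        .map(|(i, d)| (i, dot(query_fde, d)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Because queries and documents share the same hyperplanes, the FDE dot product only pairs query tokens with document tokens in the same bucket, which roughly approximates each query token's best match. flat_top1 is then an ordinary single-vector MIPS scan; an HNSW index over the same FDEs could replace it.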
Known limitations (documented in research doc)
Next steps