research(nightly): muvera — MUVERA multi-vector retrieval via Fixed Dimensional Encodings (NeurIPS 2024) by ruvnet · Pull Request #442 · ruvnet/RuVector

ruvnet · 2026-05-08T16:09:20Z

Nightly Research: MUVERA — Multi-Vector Retrieval via Fixed Dimensional Encodings

Date: 2026-05-08
ADR: ADR-193
Research doc: docs/research/nightly/2026-05-08-muvera/README.md
Paper: Dhulipala et al., NeurIPS 2024, arXiv:2405.19504
Gist: https://gist.github.com/ruvnet/627c64b9986a8cf1b53385708c093481

What

Adds crates/ruvector-muvera — the first multi-vector / late-interaction retrieval primitive in ruvector. MUVERA converts ColBERT-style multi-vector documents (one float vector per token) into a single Fixed Dimensional Encoding (FDE) whose inner product provably approximates MaxSim. After encoding, any standard single-vector MIPS index (flat scan, HNSW) works directly.
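For reference, the MaxSim score that the FDE inner product approximates can be written in a few lines of Rust. This is a minimal sketch and is not taken from the crate: the function names `dot` and `max_sim` and the `Vec<Vec<f32>>` token layout are assumptions made for illustration.

```rust
/// Inner product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim (late-interaction) score: for each query token, take the best
/// inner product against any document token, then sum over query tokens.
/// Note: an empty document yields negative infinity in this sketch.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    // Two query tokens, three document tokens, dimension 2 (toy values).
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![0.5, 0.5], vec![1.0, 0.0], vec![0.0, 0.25]];
    // Best matches are 1.0 and 0.5, so this prints "MaxSim = 1.5".
    println!("MaxSim = {}", max_sim(&query, &doc));
}
```

The nested loop over query and document tokens is what makes exact scoring expensive per document; an FDE replaces it with a single inner product.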

Why

ruvector had zero support for late-interaction retrieval. ColBERT and its derivatives consistently outperform bi-encoders by 3–7% nDCG@10 on BEIR benchmarks, but their multi-vector representation is incompatible with ruvector's existing AnnIndex trait. MUVERA provides the missing bridge without requiring bespoke infrastructure.

Deliverables

Working Rust crate crates/ruvector-muvera/ — cargo build --release ✓, cargo test 11/11 ✓
ADR-193 docs/adr/ADR-193-muvera.md
Research doc docs/research/nightly/2026-05-08-muvera/README.md

Benchmark results (Intel Xeon @ 2.10 GHz, release build, synthetic Gaussian data)

Variant	n_docs	QPS	Build (ms)	Mem (KB)
BruteForceMaxSim	10,000	3	74	160,000
FlatFDE	10,000	14	2,441	320,000
HnswFDE	10,000	131	75,306	320,625

HnswFDE: 42.4× QPS speedup over exact BruteForce MaxSim at n=10K.

FlatFDE at n=500: 9.5× speedup with 50% memory reduction.

Architecture

FdeEncoder (R×D random projection matrix)
    └── MultiVecIndex trait
            ├── BruteForceMaxSim  — exact O(n·|Q|·|D|·d)
            ├── FlatFdeIndex      — FDE + O(n·R·D) flat scan
            └── HnswFdeIndex      — FDE + greedy HNSW, M=16
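
To make the encoding step concrete, here is a deliberately simplified FDE sketch in Rust. It follows the general construction described in the MUVERA paper (SimHash partitioning, per-bucket sums for queries, per-bucket centroids for documents) but uses a single repetition and omits the paper's empty-bucket filling and final projection. It is not the crate's `FdeEncoder`; `SimpleFde`, `XorShift`, and all parameters below are hypothetical names chosen for illustration.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Uniform sample in (0, 1].
    fn uniform(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
    /// Standard normal sample via the Box-Muller transform.
    fn gaussian(&mut self) -> f32 {
        let (u1, u2) = (self.uniform(), self.uniform());
        ((-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()) as f32
    }
}

/// Hypothetical simplified encoder: k SimHash hyperplanes give 2^k buckets.
struct SimpleFde {
    planes: Vec<Vec<f32>>,
    dim: usize,
}

impl SimpleFde {
    fn new(k: usize, dim: usize, seed: u64) -> Self {
        let mut rng = XorShift(seed.max(1)); // xorshift state must be nonzero
        let planes: Vec<Vec<f32>> = (0..k)
            .map(|_| (0..dim).map(|_| rng.gaussian()).collect())
            .collect();
        Self { planes, dim }
    }

    /// Bucket id: one bit per hyperplane, set when the token is on its positive side.
    fn bucket(&self, v: &[f32]) -> usize {
        self.planes.iter().enumerate().fold(0usize, |id, (i, p)| {
            let side: f32 = p.iter().zip(v).map(|(a, b)| a * b).sum();
            if side > 0.0 { id | (1usize << i) } else { id }
        })
    }

    /// Queries sum the tokens in each bucket; documents average them.
    /// Output length is 2^k * dim; empty buckets stay zero.
    fn encode(&self, tokens: &[Vec<f32>], is_query: bool) -> Vec<f32> {
        let buckets = 1usize << self.planes.len();
        let mut out = vec![0.0f32; buckets * self.dim];
        let mut counts = vec![0usize; buckets];
        for t in tokens {
            let b = self.bucket(t);
            counts[b] += 1;
            for (o, x) in out[b * self.dim..(b + 1) * self.dim].iter_mut().zip(t) {
                *o += x;
            }
        }
        if !is_query {
            for (b, &c) in counts.iter().enumerate() {
                if c > 1 {
                    for o in &mut out[b * self.dim..(b + 1) * self.dim] {
                        *o /= c as f32;
                    }
                }
            }
        }
        out
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let enc = SimpleFde::new(3, 4, 42);
    let doc = vec![vec![1.0, 0.0, 0.0, 0.0], vec![0.0, 1.0, 0.0, 0.0]];
    let query = vec![vec![0.9, 0.1, 0.0, 0.0]];
    let (fq, fd) = (enc.encode(&query, true), enc.encode(&doc, false));
    // A flat scan would compute this dot product against every stored document FDE.
    println!("FDE dim = {}, approx score = {}", fd.len(), dot(&fq, &fd));
}
```

Once documents are encoded this way, both the flat scan and the HNSW variant only ever see fixed-length vectors, which is why a standard MIPS index can be reused unchanged.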

Known limitations (documented in research doc)

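The HNSW build is an O(n²) proof of concept; production use needs a hierarchical HNSW build (tracked as a future ADR)
Recall on random Gaussian data is low by design: FDE relies on the structure of real semantic embeddings
FDE gives no memory saving over the raw multi-vector representation when R ≥ T, where T is the number of tokens per document (use R < tokens_per_doc in production)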
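
The memory limitation is easy to check with back-of-the-envelope arithmetic. The sketch below assumes f32 storage, an FDE of R·D floats, and D equal to the token dimension. The concrete values (32 tokens per document, dimension 128) are illustrative assumptions, chosen because they happen to reproduce the figures in the benchmark table; the PR does not state the actual configuration.

```rust
const F32_BYTES: usize = 4;

/// KiB needed to store every token vector: n_docs * T * d floats.
fn multi_vec_kib(n_docs: usize, tokens_per_doc: usize, dim: usize) -> usize {
    n_docs * tokens_per_doc * dim * F32_BYTES / 1024
}

/// KiB needed to store one FDE per document: n_docs * R * D floats.
fn fde_kib(n_docs: usize, r: usize, d: usize) -> usize {
    n_docs * r * d * F32_BYTES / 1024
}

fn main() {
    // Assumed configuration, not taken from the PR.
    let (n_docs, t, dim) = (10_000, 32, 128);
    for r in [16, 32, 64] {
        println!(
            "R = {:>2}: multi-vector {} KiB, FDE {} KiB",
            r,
            multi_vec_kib(n_docs, t, dim),
            fde_kib(n_docs, r, dim)
        );
    }
}
```

Under these assumptions the multi-vector store is 160,000 KiB, while the FDE store is 80,000 KiB at R = 16, 160,000 KiB at R = 32, and 320,000 KiB at R = 64, so the FDE only saves memory when R < T.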

Next steps

Binary FDE (1-bit sign encoding, 32× memory reduction)
IDF-weighted FDE accumulation
Hierarchical HNSW build (O(n·log n))
Integration with ruvector-acorn for predicate-filtered multi-vector search
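
As a rough illustration of the first item, 1-bit sign encoding keeps only the sign of each f32 coordinate, which is where the 32× figure comes from (32 bits down to 1). The sketch below is an assumption about how such packing could look, not the planned implementation; `sign_pack` and `sign_agreement` are hypothetical names.

```rust
/// Pack the sign of each coordinate into one bit (1 when the value is >= 0).
fn sign_pack(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Number of coordinates whose signs agree between two packed vectors of `dim` bits.
/// Padding bits are zero in both inputs, so they never count as differing.
fn sign_agreement(a: &[u64], b: &[u64], dim: usize) -> u32 {
    let differing: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    dim as u32 - differing
}

fn main() {
    let dim = 8192; // illustrative FDE length
    let fde: Vec<f32> = (0..dim).map(|i| if i % 3 == 0 { -1.0 } else { 1.0 }).collect();
    let packed = sign_pack(&fde);
    let float_bytes = dim * std::mem::size_of::<f32>();
    let packed_bytes = packed.len() * std::mem::size_of::<u64>();
    println!(
        "{} bytes -> {} bytes ({}x smaller)",
        float_bytes,
        packed_bytes,
        float_bytes / packed_bytes
    );
    println!("self-agreement = {}", sign_agreement(&packed, &packed, dim));
}
```

Sign agreement computed with XOR and popcount is one common stand-in for the inner product on binarized vectors; whether it preserves enough of the MaxSim approximation would need to be measured.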

… FDE (NeurIPS 2024)
Implements Fixed Dimensional Encoding (FDE) for ColBERT-style late-interaction retrieval, reducing multi-vector MaxSim search to standard single-vector MIPS.
Three variants: BruteForceMaxSim (exact), FlatFdeIndex (FDE + flat scan), HnswFdeIndex (FDE + greedy HNSW).
11 tests pass; 42x QPS speedup over BruteForce at n=10K.
Crate: crates/ruvector-muvera/
Paper: arXiv:2405.19504 (NeurIPS 2024)
https://claude.ai/code/session_01YLmQSPdeQLt1jdLKFKETMN

Decision record for the new ruvector-muvera crate implementing MUVERA Fixed Dimensional Encodings (NeurIPS 2024). Documents the problem (no multi-vector primitive in ruvector), the decision, the alternatives considered (PLAID, per-token HNSW, MRL-HNSW, binary FDE), and a consequence matrix.
https://claude.ai/code/session_01YLmQSPdeQLt1jdLKFKETMN

…l survey
Deep research document covering SOTA in late-interaction dense retrieval (ColBERT, PLAID, MUVERA, EMVB), implementation notes, benchmark results with real cargo-run numbers, production failure modes, and improvement roadmap.
Benchmark highlights:
- HnswFDE: 42.4x QPS vs BruteForce MaxSim at n=10K (131 vs 3 QPS)
- FlatFDE: 9.5x speedup at n=500 with 50% memory reduction
- 11 tests pass, cargo build --release clean
https://claude.ai/code/session_01YLmQSPdeQLt1jdLKFKETMN

claude added 3 commits May 8, 2026 16:08

ruvnet wants to merge 3 commits into main from research/nightly/2026-05-08-muvera
