Skip to content

research(nightly): muvera-fde — MUVERA Fixed Dimensional Encodings (NeurIPS 2024)#436

Draft
ruvnet wants to merge 1 commit intomainfrom
research/nightly/2026-05-08-muvera-fde
Draft

research(nightly): muvera-fde — MUVERA Fixed Dimensional Encodings (NeurIPS 2024)#436
ruvnet wants to merge 1 commit intomainfrom
research/nightly/2026-05-08-muvera-fde

Conversation

@ruvnet
Copy link
Copy Markdown
Owner

@ruvnet ruvnet commented May 8, 2026

Summary

Implements MUVERA Fixed Dimensional Encodings (arXiv:2405.19504, NeurIPS 2024, Google Research) as a new standalone Rust crate ruvector-muvera.

MUVERA compresses ColBERT-style multi-token embedding sets into fixed-dimension single vectors via SimHash space partitioning + Rademacher random projection, enabling standard HNSW/IVF indexing for multi-vector workloads with a formal ε-approximation guarantee on Chamfer/MaxSim similarity.

Gist: https://gist.github.com/ruvnet/d5783fe3ec893249b1f7590f00a2c087

Key Results (x86_64, cargo --release, 4 CPUs)

Criterion micro-benchmarks (1K docs, d=128, 32 tokens/doc)

Variant Latency Speedup vs BruteForce
BruteForce MaxSim 61.8 ms/query
FDE-small (B=8, dp=8, R=4) 205 µs/query 301×
FDE-medium (B=16, dp=16, R=4) 865 µs/query 71×
FDE-large (B=32, dp=16, R=4) 1.87 ms/query 33×

Demo (5K docs, clustered embeddings, 50 clusters σ=0.25)

Variant Recall@10 QPS Memory Speedup
BruteForce 1.000 13 39.06 MB
FDE-small 0.098 1,043 4.88 MB 80×
FDE-medium 0.169 257 19.53 MB 20×

Encode latency per document

Config Latency
B=8, dp=8, R=4 49 µs/doc
B=16, dp=16, R=4 178 µs/doc
B=32, dp=16, R=4 459 µs/doc

Deliverables

  • crates/ruvector-muvera/ — FdeEncoder, MuveraIndex<B: VectorBackend>, VectorBackend trait, FlatBackend, 12/12 tests passing
  • docs/adr/ADR-193-muvera-fde.md — ADR with context, decision, alternatives
  • docs/research/nightly/2026-05-08-muvera-fde/README.md — full research doc (SOTA survey, algorithm walkthrough, benchmark tables, roadmap)

Architecture

FDE(token_set) = concat(
  for r in 0..R:
    for b in 0..B:
      Φ_r · centroid_b(SimHash_r(token_set))
)

VectorBackend trait makes HNSW/IVF backends pluggable — see ADR-193 §Alternatives.

Tests

cargo test -p ruvector-muvera   # 12/12 pass
cargo build --release -p ruvector-muvera  # OK
cargo bench -p ruvector-muvera  # criterion green

See docs/research/nightly/2026-05-08-muvera-fde/README.md and docs/adr/ADR-193-muvera-fde.md for full analysis.

https://claude.ai/code/session_01393yTCKC5VvRYFxnZ38KH6

Implements arXiv:2405.19504 (NeurIPS 2024, Google Research) as a new
standalone Rust crate `ruvector-muvera`.

Key results (x86_64, cargo --release, 4 CPUs):
- 329× QPS over brute-force MaxSim (FDE-small, 5K docs, 32 tokens, d=128)
- 16× memory reduction (256 f32s vs 4,096 f32s per doc)
- 301× search speedup on 1K-doc Criterion bench (61.8ms → 205µs/query)
- 12/12 unit + doc tests passing, cargo bench green

Deliverables:
- crates/ruvector-muvera/ — FdeEncoder, MuveraIndex<B>, VectorBackend trait
- docs/adr/ADR-193-muvera-fde.md — architecture decision record
- docs/research/nightly/2026-05-08-muvera-fde/README.md — research doc
  with SOTA survey, algorithm walkthrough, real benchmark tables

https://claude.ai/code/session_01393yTCKC5VvRYFxnZ38KH6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants