research(nightly): soar-ivf — SOAR orthogonality-amplified IVF spilling (NeurIPS 2023)#440
Draft
…lified residual spilling
Implements SOAR-IVF (Sun et al., NeurIPS 2023, arXiv:2404.00774) as a new
standalone Rust crate. First IVF-based index in the ruvector workspace and
first open-source Rust implementation of SOAR.
Three index variants under SoarIndex / IndexKind:
- Flat: exact brute-force baseline
- IvfPq: IVF + Product Quantization (ADC)
- SoarIvfPq: IVF + PQ + orthogonality-amplified secondary spilling
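As a rough illustration of the ADC (asymmetric distance computation) step named for the IvfPq variant, the sketch below builds a per-query lookup table and sums table entries for a PQ code. The `Codebook` type, its field layout, and both function names are hypothetical and are not taken from the crate. For brevity it encodes raw vectors; an IVF-PQ index typically encodes residuals against the coarse centroid.

```rust
/// Hypothetical PQ codebook: one list of centroids per subspace.
/// `centroids[s][c]` is centroid `c` of subspace `s`, with `dsub` components.
struct Codebook {
    dsub: usize,
    centroids: Vec<Vec<Vec<f32>>>,
}

impl Codebook {
    /// Per-query table: `table[s][c]` is the squared L2 distance between
    /// sub-vector `s` of the query and centroid `c` of subspace `s`.
    fn distance_table(&self, query: &[f32]) -> Vec<Vec<f32>> {
        self.centroids
            .iter()
            .enumerate()
            .map(|(s, cents)| {
                let q = &query[s * self.dsub..(s + 1) * self.dsub];
                cents
                    .iter()
                    .map(|c| q.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum::<f32>())
                    .collect()
            })
            .collect()
    }
}

/// ADC: the approximate squared distance to an encoded vector is the sum of
/// one table lookup per subspace, indexed by that subspace's code byte.
fn adc_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    table
        .iter()
        .zip(code)
        .map(|(row, &c)| row[c as usize])
        .sum::<f32>()
}
```

The table is computed once per query (or once per probed list when residuals are encoded), so scoring each stored vector costs one lookup and one add per subspace instead of a full D-dimensional distance.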
Benchmark results (Intel Xeon @ 2.10GHz, --release):
- SOAR nprobe=1: +10.4pp recall@10 vs IVF-PQ (59.9% vs 49.5%), n=2K D=64
- SOAR nprobe=2: +1.8pp recall@10 vs IVF-PQ (42.9% vs 41.1%), n=10K D=128
- Memory overhead: +17% for secondary lists (266 KB vs 227 KB)
- Build time overhead: <2% vs plain IVF-PQ
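For readers unfamiliar with the metric, recall@10 and the "pp" (percentage point) gains above can be computed roughly as follows. This is a minimal sketch, not the crate's benchmark code; the function names are hypothetical and ids are assumed to be unique within each result list.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbours that also appear in the
/// approximate top-k. Both slices hold vector ids, best match first.
fn recall_at_k(truth: &[u32], approx: &[u32], k: usize) -> f64 {
    let k = k.min(truth.len());
    if k == 0 {
        return 0.0;
    }
    let expected: HashSet<u32> = truth[..k].iter().copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| expected.contains(*id))
        .count();
    hits as f64 / k as f64
}

/// Difference between two recall values, expressed in percentage points.
fn gain_pp(candidate: f64, baseline: f64) -> f64 {
    (candidate - baseline) * 100.0
}
```

With the figures reported above, `gain_pp(0.599, 0.495)` evaluates to approximately 10.4, matching the nprobe=1 line.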
Files:
crates/ruvector-soar/Cargo.toml
crates/ruvector-soar/src/{lib,error,kmeans,pq,index,main}.rs
crates/ruvector-soar/benches/soar_bench.rs
cargo build --release -p ruvector-soar ✓
cargo test -p ruvector-soar — 5/5 tests pass ✓
https://claude.ai/code/session_018ZoaZ5LadzrnnQYeKNUe2c
Research document: docs/research/nightly/2026-05-08-soar-ivf/README.md
- SOTA survey (NeurIPS 2023, competitor analysis, related 2024 work)
- Full algorithm walkthrough and blog-readable explanation
- Measured benchmark results from cargo run --release
- Practical failure modes and production improvement roadmap
ADR-193: docs/adr/ADR-193-soar-ivf.md
- Context: no IVF-based index existed in ruvector workspace
- Decision: SoarIndex with Flat / IvfPq / SoarIvfPq variants
- Consequences: +17% memory, +10pp recall at nprobe=1, 5 alternatives considered
https://claude.ai/code/session_018ZoaZ5LadzrnnQYeKNUe2c
Summary
- crates/ruvector-soar: first IVF-based index in ruvector and first open-source Rust implementation of SOAR (NeurIPS 2023, arXiv:2404.00774)
- docs/research/nightly/2026-05-08-soar-ivf/README.md: SOTA survey, algorithm walkthrough, benchmark results, failure modes, roadmap
- docs/adr/ADR-193-soar-ivf.md: decision record with full alternatives analysis
Gist: https://gist.github.com/ruvnet/5e14de7710aed52b8d28c9ba739849d1
What is SOAR?
SOAR (Spilling with Orthogonality-Amplified Residuals; Sun et al., Google Research, NeurIPS 2023) addresses the IVF boundary problem: a vector that lies near a Voronoi cell boundary is missed when the query lands in the adjacent cell and only that cell is probed. SOAR assigns each vector a secondary cluster chosen by minimising:
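In the notation of the SOAR paper (stated here as an assumption, since the symbols are not defined in this description), with $r = x - c$ the primary residual, $r' = x - c'$ the residual against a candidate secondary centroid $c'$, and $\lambda \ge 0$ the spilling weight, the loss is:

```latex
\mathcal{L}(r', r, \lambda)
  = \lVert r' \rVert^2 + \lambda \, \lVert \operatorname{proj}_r r' \rVert^2
  = \lVert r' \rVert^2 + \lambda \left( \frac{\langle r, r' \rangle}{\lVert r \rVert} \right)^2
```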
This penalises secondary residuals parallel to the primary residual, so the secondary cluster tends to be strong in exactly the query directions where the primary is weak. SOAR is deployed in Google Cloud Vertex AI Vector Search.
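A minimal sketch of that secondary assignment is shown below. It assumes the loss from the SOAR paper (squared norm of the candidate residual plus a weight `lambda` times the squared component parallel to the primary residual) and uses hypothetical helper names; it is not the code in crates/ruvector-soar.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

fn sub(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x - y).collect()
}

/// Assumed SOAR loss: ||r'||^2 + lambda * (<r, r'> / ||r||)^2.
/// `r` is the primary residual, `r_prime` the candidate secondary residual.
fn soar_loss(r: &[f32], r_prime: &[f32], lambda: f32) -> f32 {
    let r_norm_sq = dot(r, r);
    // If the vector sits exactly on its primary centroid there is no
    // direction to penalise, so the parallel term is taken as zero.
    let parallel_sq = if r_norm_sq > 0.0 {
        let d = dot(r, r_prime);
        d * d / r_norm_sq
    } else {
        0.0
    };
    dot(r_prime, r_prime) + lambda * parallel_sq
}

/// Picks the non-primary centroid with the smallest SOAR loss.
/// Returns `None` when there is no other centroid to spill to.
fn secondary_cluster(
    x: &[f32],
    centroids: &[Vec<f32>],
    primary: usize,
    lambda: f32,
) -> Option<usize> {
    let r = sub(x, &centroids[primary]);
    centroids
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != primary)
        .map(|(i, c)| (i, soar_loss(&r, &sub(x, c), lambda)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

For example, take x = (1, 0) with primary centroid (0, 0) and candidates (2, 0) and (1, 1.2). With `lambda = 0.0` the nearer candidate (2, 0) is chosen, which is plain nearest-second-centroid spilling. With `lambda = 1.0` the choice moves to (1, 1.2), whose residual is orthogonal to the primary residual.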
Benchmark results (Intel Xeon @ 2.10GHz, cargo run --release)
cargo build --release -p ruvector-soar ✅
cargo test -p ruvector-soar: 5/5 tests pass ✅
Test plan
- cargo build --release -p ruvector-soar passes
- cargo test -p ruvector-soar: 5 tests green
- cargo run --release -p ruvector-soar -- --fast produces reasonable recall numbers
https://claude.ai/code/session_018ZoaZ5LadzrnnQYeKNUe2c