Add RFC Process and RFC 0001 — Multi-Vector Distance Functions #731
# RFC Title

<!-- Replace "RFC Title" with a short, descriptive title for your proposal. -->

<!--
This template is a recommended starting point. Feel free to add, remove, or
reorganize sections to best convey your proposal.
-->

| | |
|---|---|
| **Status** | Draft <!-- Draft, InReview, Accepted, Rejected --> |
| **Authors** | Your Name |
| **Contributors** | <!-- Others who contributed to this RFC --> |
| **Created** | YYYY-MM-DD |
| **Updated** | YYYY-MM-DD |

## Summary

<!-- One paragraph describing the proposal at a high level. What are you building and why? -->

## Motivation

### Background

<!-- Context a reader needs to understand the problem. What exists today? -->

### Problem Statement

<!-- What specific problem does this RFC solve? Include quantitative data if available. -->

### Goals

<!-- Numbered list of concrete, measurable goals. -->

1. Goal one
2. Goal two

## Proposal

<!-- The core technical proposal. Include:
- Type definitions (structs, traits, type aliases)
- API signatures
- Module structure
- Algorithms / pseudocode where helpful

Use Rust code blocks for type/API definitions. -->

## Trade-offs

<!-- Describe the key design trade-offs and alternative approaches considered.
For each alternative, explain what it is, its pros/cons, and why it was
not chosen (or under what conditions it might be preferred). -->

## Benchmark Results

<!-- If applicable, include performance measurements.
State the configuration (hardware, dataset, parameters) and present results in tables. -->

## Future Work

<!-- Items explicitly deferred from this RFC. Use a checkbox list. -->

- [ ] Future item one
- [ ] Future item two

## References

<!-- Links to papers, prior art, related issues, or PRs. -->

1. [Reference title](URL)

# Multi-Vector Distance Functions

| | |
|---|---|
| **Status** | InReview |
| **Authors** | Suryansh Gupta |
| **Contributors** | Suryansh Gupta, Mark Hildebrand |
| **Created** | 2026-01-06 |
| **Updated** | 2026-02-06 |

## Summary

This RFC proposes a high-performance Chamfer distance implementation for multi-vector (ColBERT-style late interaction) representations in DiskANN. The design uses a **query-transposed tiling** approach that transposes queries into a block layout while keeping documents in row-major format, achieving up to a **2.67x speedup** over the SIMD baseline. The implementation builds on existing types from `diskann-quantization` (`Mat`, `MaxSim`, `Chamfer`) and implements `DistanceFunctionMut` from `diskann-vector` for ecosystem compatibility.

## Motivation

### Background

Traditional vector search represents each document as a single embedding. Multi-vector representations (used in models like ColBERT) encode each document and query as a **bag of embeddings**, typically one per token. This enables:

- **Fine-grained matching**: Token-level similarity captures nuanced semantic relationships
- **Late interaction**: Document embeddings are pre-computed; only a lightweight aggregation runs at query time
- **Better recall**: Each query token is scored against its best-matching document token, so a document scores high when its tokens match the query's tokens well, even if no single embedding summarizes it

### Problem Statement

Chamfer distance for multi-vector search requires O(Q × D × Dim) operations per query-document pair, where:

- Q = number of query tokens
- D = number of document tokens
- Dim = embedding dimensionality

For typical configurations (Q=32, D=128, Dim=384), this is ~1.57M multiply-accumulate operations (~3.1M floating-point operations) per pair. Naive implementations become a bottleneck for large-scale search.

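
Spelled out for the example configuration, counting one multiply-accumulate (MAC) per (query token, document token, dimension) triple and two floating-point operations per MAC:

```math
\begin{aligned}
\text{MACs per pair} &= Q \cdot D \cdot \mathrm{Dim} = 32 \cdot 128 \cdot 384 = 1{,}572{,}864 \\
\text{FLOPs per pair} &= 2 \cdot Q \cdot D \cdot \mathrm{Dim} = 3{,}145{,}728 \approx 3.1 \times 10^{6}
\end{aligned}
```

This counts only the inner products; the per-query maximum and the final sum add O(Q × D) and O(Q) work, which is negligible by comparison.
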
### Goals

1. Implement high-performance Chamfer distance starting with `f32` embeddings, with future support for `f16` and `u8` types
2. Achieve a 2x+ speedup over the baseline SIMD implementation through memory layout optimization
3. Maintain compatibility with DiskANN's `DistanceFunctionMut` trait
4. Provide a clean API that enables standalone distance function usage without full index integration
5. Achieve performance within 10–20% of `faer` SGEMM-based Chamfer computation when both our implementation and `faer` are restricted to AVX2 (no AVX-512 on either side)

## Proposal

### Approach: Query-Transposed Tiling

We propose the **query-transposed tiling** approach as the primary Chamfer distance implementation for DiskANN integration. This approach transposes the query into a block-transposed layout, keeps documents in row-major format, and processes pairs of document vectors together to amortize query memory loads. A pre-allocated scratch buffer tracks per-query max similarities and is reused across distance calls.

This is the recommended default because it preserves the existing document storage format (no index migration) while still achieving significant speedups through SIMD tiling.

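
A minimal portable sketch of the query-transposed tiling idea, written as plain scalar Rust rather than SIMD intrinsics. It assumes a block width of 8 (`LANES`, chosen to match eight `f32` lanes in an AVX2 register); `TransposedQuery` and `TransposedChamfer` are hypothetical names, not the `diskann-quantization` API. Each query block stores, for every dimension, 8 consecutive values taken from 8 different query vectors, so one load of a query slice feeds two document vectors at once, and the distance object owns a scratch buffer of per-query maxima that is reused across calls.

```rust
/// Block width: one query block holds this many query vectors.
/// Assumption: 8 matches the number of f32 lanes in an AVX2 register.
const LANES: usize = 8;

/// Query stored block-transposed. Within block `b`, dimension `d` occupies
/// `LANES` consecutive floats, one from each query vector in the block.
struct TransposedQuery {
    data: Vec<f32>,
    nrows: usize,
    nblocks: usize,
    dim: usize,
}

impl TransposedQuery {
    /// Converts a row-major `nrows x dim` query; padding lanes are zero-filled.
    fn from_row_major(query: &[f32], nrows: usize, dim: usize) -> Self {
        assert_eq!(query.len(), nrows * dim);
        let nblocks = (nrows + LANES - 1) / LANES;
        let mut data = vec![0.0f32; nblocks * dim * LANES];
        for row in 0..nrows {
            let (block, lane) = (row / LANES, row % LANES);
            for d in 0..dim {
                data[(block * dim + d) * LANES + lane] = query[row * dim + d];
            }
        }
        Self { data, nrows, nblocks, dim }
    }

    /// All `dim * LANES` floats belonging to query block `b`.
    fn block(&self, b: usize) -> &[f32] {
        let len = self.dim * LANES;
        &self.data[b * len..(b + 1) * len]
    }
}

/// Chamfer distance object owning a reusable scratch buffer of per-query maxima.
struct TransposedChamfer {
    scratch: Vec<f32>,
}

impl TransposedChamfer {
    fn new() -> Self {
        Self { scratch: Vec::new() }
    }

    /// `doc` is row-major `ndoc x dim`. Returns `-(sum_i max_j IP(q_i, d_j))`;
    /// an empty document leaves every maximum at -inf, giving +inf.
    fn evaluate(&mut self, query: &TransposedQuery, doc: &[f32], ndoc: usize) -> f32 {
        let dim = query.dim;
        assert_eq!(doc.len(), ndoc * dim);
        // Keep the allocation across calls; only reset the contents.
        self.scratch.clear();
        self.scratch.resize(query.nblocks * LANES, f32::NEG_INFINITY);

        for b in 0..query.nblocks {
            let qb = query.block(b);
            let best = &mut self.scratch[b * LANES..(b + 1) * LANES];
            let mut j = 0;
            // Main loop: two document vectors per pass, so each query slice
            // loaded from `qb` feeds two accumulators.
            while j + 1 < ndoc {
                let d0 = &doc[j * dim..(j + 1) * dim];
                let d1 = &doc[(j + 1) * dim..(j + 2) * dim];
                let (mut a0, mut a1) = ([0.0f32; LANES], [0.0f32; LANES]);
                for d in 0..dim {
                    let q = &qb[d * LANES..(d + 1) * LANES];
                    for l in 0..LANES {
                        a0[l] += q[l] * d0[d];
                        a1[l] += q[l] * d1[d];
                    }
                }
                for l in 0..LANES {
                    best[l] = best[l].max(a0[l]).max(a1[l]);
                }
                j += 2;
            }
            // Tail: one leftover document vector when `ndoc` is odd.
            if j < ndoc {
                let d0 = &doc[j * dim..(j + 1) * dim];
                let mut a0 = [0.0f32; LANES];
                for d in 0..dim {
                    let q = &qb[d * LANES..(d + 1) * LANES];
                    for l in 0..LANES {
                        a0[l] += q[l] * d0[d];
                    }
                }
                for l in 0..LANES {
                    best[l] = best[l].max(a0[l]);
                }
            }
        }
        // Sum maxima of real query rows only (padding lanes are skipped),
        // then negate so that lower means more similar.
        -self.scratch[..query.nrows].iter().sum::<f32>()
    }
}
```

In an optimized kernel the innermost `for l in 0..LANES` loops would map onto SIMD registers (explicitly or via auto-vectorization); the sketch is meant to show the data layout and loop structure, not to be fast.
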
### Chamfer Distance Definition

For query multi-vector Q and document multi-vector D:

```
Chamfer(Q, D) = Σᵢ minⱼ -IP(qᵢ, dⱼ)
```

Equivalently, `Chamfer(Q, D) = -Σᵢ maxⱼ IP(qᵢ, dⱼ)`. Since `InnerProduct::evaluate` in `diskann-vector` returns the negated inner product (`-IP`), the kernel finds the minimum negated IP per query vector (equivalent to finding the maximum similarity), then sums across all query vectors. The result is a distance (lower is better) compatible with DiskANN's min-heap.

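
For cross-checking an optimized kernel, a direct scalar transcription of the definition is useful. This is a minimal sketch over plain row-major `&[f32]` slices; `neg_inner_product` and `chamfer_reference` are hypothetical names that stand in for `InnerProduct::evaluate` and the real `Mat`/`MatRef` types rather than calling them.

```rust
/// Hypothetical helper mirroring the `-IP` convention of
/// `InnerProduct::evaluate` (not a call into `diskann-vector`).
fn neg_inner_product(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// Unoptimized reference Chamfer distance for row-major multi-vectors with
/// `dim` columns: for each query row, take the minimum `-IP` over all
/// document rows, then sum over query rows. An empty document yields +inf.
fn chamfer_reference(query: &[f32], doc: &[f32], dim: usize) -> f32 {
    assert!(dim > 0, "dimension must be non-zero");
    assert!(query.len() % dim == 0 && doc.len() % dim == 0, "ragged input");
    query
        .chunks_exact(dim)
        .map(|q| {
            doc.chunks_exact(dim)
                .map(|d| neg_inner_product(q, d))
                .fold(f32::INFINITY, f32::min)
        })
        .sum()
}
```

For example, with query rows `(1, 0)`, `(0, 1)`, `(1, 1)` and document rows `(1, 0)`, `(0, 2)`, `(-1, 1)`, the per-query minima of `-IP` are -1, -2 and -2, so the distance is -5.
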
### Types

The design builds on the multi-vector matrix types already defined in `diskann_quantization::multi_vector`:

#### Query and Document (from `diskann-quantization`)

```rust
use diskann_quantization::multi_vector::{Mat, MatRef, Standard, QueryMatRef};

/// Owning row-major matrix: rows = tokens, cols = dimensions
type MultiVector = Mat<Standard<f32>>;

/// Borrowed view into a multi-vector
type MultiVectorRef<'a> = MatRef<'a, Standard<f32>>;
```

| `Standard<f32>` provides contiguous row-major storage with `as_slice()` for BLAS compatibility and zero-copy views via `MatRef`. `QueryMatRef` (a newtype over `MatRef`) distinguishes query from document matrices for asymmetric distance functions. | ||||||
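
The benefit of a query newtype is that the compiler, not the caller, enforces argument order for an asymmetric distance such as Chamfer. The sketch below shows the pattern with hypothetical stand-in types (`MatView`, `QueryView`, `chamfer`); they only mimic the roles of `MatRef` and `QueryMatRef` and make no claim about those types' actual methods.

```rust
/// Hypothetical stand-in for a borrowed row-major matrix view; this is not
/// the `diskann_quantization` `MatRef` type, only an illustration of the pattern.
#[derive(Clone, Copy, Debug)]
struct MatView<'a> {
    data: &'a [f32],
    ncols: usize,
}

impl<'a> MatView<'a> {
    /// Returns `None` unless `data` splits evenly into rows of `ncols`.
    fn new(data: &'a [f32], ncols: usize) -> Option<Self> {
        (ncols > 0 && data.len() % ncols == 0).then_some(Self { data, ncols })
    }

    /// Iterates over rows (tokens) as slices of length `ncols`.
    fn rows(&self) -> std::slice::ChunksExact<'a, f32> {
        let data: &'a [f32] = self.data;
        data.chunks_exact(self.ncols)
    }
}

/// Newtype marking a view as the query side, analogous in spirit to `QueryMatRef`.
#[derive(Clone, Copy, Debug)]
struct QueryView<'a>(MatView<'a>);

/// Asymmetric signature: passing `(doc, query)` is a type error, not a silent bug.
fn chamfer(query: QueryView<'_>, doc: MatView<'_>) -> f32 {
    assert_eq!(query.0.ncols, doc.ncols, "dimension mismatch");
    query
        .0
        .rows()
        .map(|q| {
            doc.rows()
                .map(|d| -q.iter().zip(d).map(|(x, y)| x * y).sum::<f32>())
                .fold(f32::INFINITY, f32::min)
        })
        .sum()
}
```

Calling `chamfer(doc, query)` would not compile, because `MatView` and `QueryView` are distinct types even though they wrap the same kind of data.
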

Review comments:

- **Copilot (AI), Feb 6, 2026:** The example calls `query.as_ref()`, but `Mat<...>` in `diskann_quantization::multi_vector` uses `as_view()` to produce a `MatRef` (there's no `as_ref()` method). As written, the snippet won't compile; please update it to the actual view API.
- **Copilot (AI), Feb 6, 2026:** `max_sim.evaluate(&transposed_query, doc.as_ref());` has two issues in the example: (1) `doc.as_ref()` isn't an API on the multi-vector `Mat` types (use the real view method, e.g. `as_view()`, or pass an existing `MatRef`), and (2) the proposed `evaluate` returns a `Result`, but it's ignored here. Please adjust the snippet to be compilable and show the intended error handling.
- One thing not covered here is how this behaves under multi-threading. Single-threaded tests may look fine, but if effort is not applied to keep working-set sizes within L1/L2 as much as possible, multiple threads could stomp on one another's use of L3 when running multiple computations in parallel.
- Small nit: can this table be adjusted so it's readable without rendering the markdown?
- Can we also see situations where there are, say, 900+ documents?
- Why not use AVX-512 if available? We have support, and it could likely really benefit dense operations like this.