Skip to content

Conversation

@suri-kumkaran
Copy link
Contributor

Summary

Introduces the rfcs/ directory with the RFC process and the first RFC proposing high-performance Chamfer distance for multi-vector representations in DiskANN.

Changes

New Files

File Description
rfcs/README.md RFC process guide: when to write one, lifecycle, submission steps, index
rfcs/0000-template.md RFC template with recommended structure
rfcs/0001-multi-vector-distance-functions.md RFC proposing Chamfer distance for ColBERT-style multi-vector search

RFC 0001 — Multi-Vector Distance Functions

Proposes a query-transposed tiling approach for Chamfer distance computation:

  • Transposes queries into a block layout, keeps documents in row-major format (no index migration)
  • Processes pairs of document vectors together, amortizing query memory loads (~50% bandwidth reduction)
  • Reuses a MaxSim scratch buffer across distance calls (zero hot-path allocation)
  • Achieves up to 2.67x speedup over SIMD baseline (avg 1.93x across 10 configurations)

Builds on existing types from diskann-quantization (Mat, MatRef, MaxSim, Chamfer) and proposes MaxSim implementing DistanceFunctionMut from diskann-vector for ecosystem integration.

Experimental benchmarks available in PR #730.

Review Focus

This PR is the RFC itself — please review the proposal and leave comments directly on the RFC file:

  • Is query-transposed tiling the right default over document-transposed approaches?
  • API shape: MaxSim implementing DistanceFunctionMut directly
  • TransposedMultiVector type design and Repr trait integration
  • Trade-offs between proposed and alternative approaches

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an RFC framework to the DiskANN repository and introduces the first RFC describing a proposed high-performance Chamfer distance implementation for ColBERT-style multi-vector representations.

Changes:

  • Introduces rfcs/ with an RFC process guide and an index.
  • Adds an RFC template (0000-template.md) to standardize future proposals.
  • Adds RFC 0001 proposing “query-transposed tiling” for faster multi-vector Chamfer distance.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
rfcs/README.md Documents the RFC process, lifecycle, and index of RFCs.
rfcs/0000-template.md Provides a reusable RFC template and metadata format.
rfcs/0001-multi-vector-distance-functions.md Proposes the multi-vector Chamfer distance approach, API shape, and benchmarks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

type MultiVectorRef<'a> = MatRef<'a, Standard<f32>>;
```

`Standard<f32>` provides contiguous row-major storage with `as_slice()` for BLAS compatibility and zero-copy views via `MatRef`. `QueryMatRef` (a newtype over `MatRef`) distinguishes query from document matrices for asymmetric distance functions.
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard<f32>/Mat in diskann_quantization::multi_vector don't expose an as_slice() API for the full matrix; the current API provides row slices via rows()/get_row() and views via as_view(). This statement is likely inaccurate—please update the RFC to describe the actual access patterns/APIs (or point to the correct method if one exists).

Suggested change
`Standard<f32>` provides contiguous row-major storage with `as_slice()` for BLAS compatibility and zero-copy views via `MatRef`. `QueryMatRef` (a newtype over `MatRef`) distinguishes query from document matrices for asymmetric distance functions.
`Standard<f32>` provides contiguous row-major storage; the current API exposes row slices via `rows()`/`get_row()` and matrix views via `as_view()`/`MatRef`, which can be used to integrate with BLAS or other kernels. `QueryMatRef` (a newtype over `MatRef`) distinguishes query from document matrices for asymmetric distance functions.

Copilot uses AI. Check for mistakes.
Comment on lines +207 to +209
// 2. Transpose query into SIMD-friendly block layout (once per query)
let transposed_query = TransposedMultiVector::from_view(query.as_ref());

Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example calls query.as_ref(), but Mat<...> in diskann_quantization::multi_vector uses as_view() to produce a MatRef (there’s no as_ref() method). As written, the snippet won’t compile—please update to the actual view API.

Copilot uses AI. Check for mistakes.
Comment on lines +216 to +218
// Populate per-query-vector max similarities
max_sim.evaluate(&transposed_query, doc.as_ref());

Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_sim.evaluate(&transposed_query, doc.as_ref()); has two issues in the example: (1) doc.as_ref() isn’t an API on the multi-vector Mat types (use the real view method, e.g. as_view(), or pass an existing MatRef), and (2) the proposed evaluate returns a Result but it’s ignored here. Please adjust the snippet to be compilable and show intended error handling.

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.01%. Comparing base (f3fcaae) to head (4b43589).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #731   +/-   ##
=======================================
  Coverage   89.01%   89.01%           
=======================================
  Files         428      428           
  Lines       78294    78294           
=======================================
  Hits        69692    69692           
  Misses       8602     8602           

see 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@suri-kumkaran
Copy link
Contributor Author

@microsoft-github-policy-service agree company="Microsoft"


## When to Write an RFC

An RFC is required for changes that affect the public API or architecture of the project, or whenever an important design decision needs review from maintainers and stakeholders. Examples include:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we are not at the stage where all changes to the public API need an RFC. Obviously, big changes need more thorough consensus, but this project is still developing so rapidly that, for example, adding a variant to a public enum in an internally supporting crate that should not have external users technically changes the public API but, in my opinion, should not require an RFC. Requiring such will bog us down in bureaucracy.


## How to Submit an RFC

1. Copy [0000-template.md](0000-template.md) to `NNNN-short-title.md` (use the next available number).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can use PR numbers instead of total ordering? I'm worried that ensuring all RFCs are sequentially ordered otherwise would be a giant pain.

2. Fill in all sections. Remove instructional comments.
3. Open a pull request with the RFC file. The PR description should summarize the proposal.
4. Discuss in the PR. Update the RFC based on feedback.
5. Once accepted, the RFC is merged and the status is updated to **Accepted**.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging an RFC should be considered acceptance. This way, there's no need for additional book-keeping.

| # | Title | Status | Author | Resolved Date |
|---|-------|--------|--------|---------------|
| [0000](0000-template.md) | RFC Template | - | - | - |
| [0001](0001-multi-vector-distance-functions.md) | Multi-Vector Distance Functions | InReview | Suryansh Gupta | - |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to not have an index like this. Without tooling, I think it's highly unlikely that it will be updated.

2. Achieve 2x+ speedup over baseline SIMD through memory layout optimization
3. Maintain compatibility with DiskANN's `DistanceFunctionMut` trait
4. Provide a clean API that enables standalone distance function usage without full index integration
5. Achieve performance within 10–20% of `faer` SGEMM-based Chamfer computation, when both our implementation and `faer` are restricted to AVX2 (no AVX-512 on either side)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use AVX-512 if available? We have support, and it could likely really benefit dense operations like this.


### Alternative Approaches

The experimental crate explored six approaches total. The query-transposed tiling approach was selected as the proposal, but the alternatives remain available and may be better for specific workloads.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing not covered here is how this behaves under multi-threading. it's possible that single threaded tests may look fine, but if effort is not applied to restrict working set sizes to the size of the L1/L2 as much as possible, then multiple-threads could stomp on one-another's usage of L3 when running multiple computations in parallel.

| dim=384, doc=32, query=8 | 8,412 | 1.41x | **1.65x** | 1.30x | 1.24x |
| dim=384, doc=64, query=16 | 38,162 | 1.30x | 1.47x | **1.70x** | 1.66x |
| dim=384, doc=128, query=32 | 171,431 | 1.53x | 1.94x | 2.04x | **2.16x** |
| **Average** | — | **1.38x** | **1.79x** | **1.93x** | **1.59x** |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also see situations where there are, say, 900+ documents?

### Speedup vs SIMD Baseline (Median, Lower Latency = Better)

| Configuration | SIMD (µs) | transposed_simd | transposed_tiling | query_transposed_tiling | sgemm |
|--------------|-----------|-----------------|-------------------|------------------------|-------|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit - can this table be adjusted to it's readable without rendering the markdown?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants