Experimental Multi-Vector Chamfer Distance with SIMD & BLAS Optimizations #730
suri-kumkaran wants to merge 4 commits into main
    ///
    /// The data is stored in row-major order: `[row0_col0, row0_col1, ..., row0_colN, row1_col0, ...]`.
    #[inline]
    pub fn as_slice(&self) -> &[T] {
Something to think about: I don't think we should lean into the creep of Standard-specific methods, especially if we want this to replace diskann_utils::views::Matrix and friends.
I've found myself needing something like this for some other multi-vector-related work, and I think it makes sense to have something like

    trait Dense: Repr {
        type Element;
        unsafe fn as_slice(&self, ptr: NonNull<u8>) -> &[Self::Element];
    }

    trait DenseMut: Dense + ReprMut {
        unsafe fn as_slice_mut(&mut self, ptr: NonNull<u8>) -> &mut [Self::Element];
    }

This way, MinMax, transposed, blocked, etc. can all opt in to this as well. That said, I find the inability to add inherent methods a little unfortunate.
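
To make the proposal above concrete, here is a minimal sketch of how one layout might opt in to `Dense`. The crate's real `Repr` trait is not shown in this thread, so the `Repr` below is an empty stand-in, and `RowMajor` is a hypothetical shape descriptor invented for illustration; only the `Dense` signature is taken from the comment.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Stand-in for the crate's `Repr` trait (assumption: the real one has more to it).
trait Repr {}

/// The trait as proposed in the review comment.
trait Dense: Repr {
    type Element;

    /// # Safety
    /// `ptr` must point to a live, properly aligned buffer holding the elements
    /// described by `self`, and must stay valid for the returned borrow.
    unsafe fn as_slice(&self, ptr: NonNull<u8>) -> &[Self::Element];
}

/// Hypothetical row-major layout descriptor: it records the shape, not the data.
struct RowMajor<T> {
    rows: usize,
    cols: usize,
    _marker: PhantomData<T>,
}

impl<T> Repr for RowMajor<T> {}

impl<T> Dense for RowMajor<T> {
    type Element = T;

    unsafe fn as_slice(&self, ptr: NonNull<u8>) -> &[T] {
        // SAFETY: the caller guarantees `ptr` covers `rows * cols` valid `T`s.
        unsafe { std::slice::from_raw_parts(ptr.cast::<T>().as_ptr(), self.rows * self.cols) }
    }
}

fn main() {
    let mut data = vec![1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let repr = RowMajor::<f32> { rows: 2, cols: 3, _marker: PhantomData };
    let ptr = NonNull::new(data.as_mut_ptr()).unwrap().cast::<u8>();

    // The layout interprets the raw pointer; the caller owns the safety contract.
    let flat = unsafe { repr.as_slice(ptr) };
    assert_eq!(flat.len(), 6);
    assert_eq!(flat[3], 4.0); // row 1, column 0 in row-major order
}
```

A `DenseMut` counterpart would presumably follow the same shape with `from_raw_parts_mut`; layouts that are not contiguous in this sense would simply not implement the trait.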
Summary
Experimental multi-vector support with fast Chamfer distance for `f32` embeddings and benchmarking infrastructure.

Changes
Core Types
- `MultiVector` - Row-major token embeddings
- `TransposedMultiVector` - Block-transposed layout (16 vectors/block, SIMD-optimized)
- `Chamfer<Approach>` - Generic distance using Inner Product (implements `DistanceFunction`)

Implementations
- `NaiveApproach` - Scalar baseline
- `SimdApproach` - SIMD via `diskann_vector::InnerProduct`
- `TransposedApproach` - Block-transposed SIMD
- `TransposedWithTilingApproach` - Query tiling (transposes docs, processes query pairs)
- `QueryTransposedWithTilingApproach` - Doc tiling (transposes query, processes doc pairs)
- `SgemmApproach` - BLAS SGEMM + SIMD row-max (via faer library)

Benchmark Results (100 points, 10 iterations)
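
As a reference point for the approaches listed above, the following is a minimal sketch of a scalar Chamfer computation over inner products. It is not the PR's `NaiveApproach`: the `MultiVec` type and function names are hypothetical, and the sign convention (negating the summed similarities so that smaller means closer) is an assumption, since the description does not state it.

```rust
/// Hypothetical row-major multi-vector: token embeddings of `dim` floats each,
/// stored back to back in `data`.
struct MultiVec {
    data: Vec<f32>,
    dim: usize,
}

impl MultiVec {
    fn new(data: Vec<f32>, dim: usize) -> Self {
        assert!(dim > 0 && data.len() % dim == 0, "data length must be a multiple of dim");
        Self { data, dim }
    }

    /// Iterates over the token embeddings, one `dim`-length slice per row.
    fn rows(&self) -> std::slice::ChunksExact<'_, f32> {
        self.data.chunks_exact(self.dim)
    }
}

/// Plain scalar inner product.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// For each query token, take the maximum inner product against all document
/// tokens (the "row-max"), sum those maxima, and negate the sum.
/// Assumption: the negation is one possible convention for turning the
/// similarity into a distance. The document must contain at least one token.
fn chamfer_naive(query: &MultiVec, doc: &MultiVec) -> f32 {
    assert_eq!(query.dim, doc.dim, "query and document dimensions must match");
    let total: f32 = query
        .rows()
        .map(|q| {
            doc.rows()
                .map(|d| inner_product(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum();
    -total
}

fn main() {
    let query = MultiVec::new(vec![1.0, 0.0, 0.0, 1.0], 2);
    let doc = MultiVec::new(vec![2.0, 0.0, 0.0, 3.0, 1.0, 1.0], 2);
    // Row maxima are 2 and 3, so the summed similarity is 5.
    assert_eq!(chamfer_naive(&query, &doc), -5.0);
}
```

Under this reading, an SGEMM-based variant would compute the full query-by-document similarity matrix in one matrix multiply and then reduce each row to its maximum, while the transposed and tiled variants reorganize the same inner loops for SIMD.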
Machine: Intel Core i7-1365U, AVX2 supported, AVX-512 not supported, 32 GB RAM
Note: Times are median over 50 measurements, each measuring 10 consecutive distance computations across 100 points.
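
The measurement scheme in the note can be sketched as below. This is an illustration of median-of-batches timing, not the `multivec-bench` implementation: `bench`, `median`, and the summation workload standing in for 100 distance computations are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Median of the samples (upper median when the count is even).
fn median(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Takes `measurements` timings; each timing covers `iterations`
/// consecutive calls to `work`. Returns the median timing.
fn bench<F: FnMut()>(measurements: usize, iterations: usize, mut work: F) -> Duration {
    let samples: Vec<Duration> = (0..measurements)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iterations {
                work();
            }
            start.elapsed()
        })
        .collect();
    median(samples)
}

fn main() {
    // Stand-in workload: one pass over 100 values instead of 100 distance calls.
    let points: Vec<f32> = (0..100).map(|i| i as f32).collect();
    let mut checksum = 0.0f32;

    // 50 measurements of 10 consecutive passes, mirroring the numbers in the note.
    let t = bench(50, 10, || {
        // black_box keeps the optimizer from removing the work being timed.
        checksum += std::hint::black_box(&points).iter().sum::<f32>();
    });
    println!("median batch time: {:?} (checksum {})", t, checksum);
}
```

Reporting the median rather than the mean keeps occasional scheduler or frequency-scaling outliers from skewing the result, which matters on a laptop-class CPU.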
Speedup vs SIMD Baseline (Median, Lower Latency = Better)
Dimension 256
Dimension 384
Future Work
`f16`, `u8` quantized, etc.)

Testing

    cargo build --release -p multi-vector
    cargo run --release -p multi-vector --bin multivec-bench -- run \
        --input-file multi-vector/examples/bench.json --output-file results.json

Contributing
This work is experimental and will be submitted as separate PRs.