jinsolp left a comment
Thanks @viclafargue! Sharing my first batch of comments for build and extend.
Aside from this, I also think ivf_sq.hpp overall needs some documentation!
auto orig_centroids_view = raft::make_device_matrix_view<const float, int64_t>(
  index->centers().data_handle(), n_lists, dim);

constexpr size_t kReasonableMaxBatchSize = 65536;
This is an arbitrary value used in other index types. I think build performance was historically not the main concern in cuVS. This was probably a safe value that ensured we never ran out of memory on smaller systems. But thanks for pointing this out, as it might be interesting to see if we could improve the way this value is determined (based on dataset dimensions and available VRAM). cc @achirkin @tfeher
}
}

if (params.add_data_on_build) { detail::extend<T, IdxT>(handle, &idx, dataset, nullptr, n_rows); }
Maybe check that, if we're adding data on build, we don't have adaptive_centers=true? I think this will take us through unnecessary code paths in the extend function.
I actually decided to drop the adaptive_centers config parameter and feature altogether, because moving the centroids without updating the residual-encoded lists is not a good idea, and updating the lists would be prohibitive in terms of performance. Addressed in 206cb2e.
@viclafargue can you please share the build time speedups over Faiss IVFSQ on GPU and CPU? Those are going to be crucial, as the major value proposition of cuVS over Faiss (both CPU and GPU) is index build speedup.
Closes #1291.
Overview
IVF-SQ combines an inverted file (IVF) partitioning scheme with 8-bit scalar quantization (SQ8) of residuals. Each float32 dimension is compressed to a single uint8 code, giving a 4x memory reduction over IVF-Flat while retaining high recall. The index supports several metrics (L2, inner product, and cosine distance), two data types (float, half), and filtering.
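To make the residual quantization concrete, here is a minimal host-side C++ sketch of SQ8 encoding and decoding. It is an illustration under stated assumptions, not the cuVS implementation: the names `sq_params`, `train_sq`, `encode`, and `decode` are hypothetical, no margin is added around the observed range, and the round-then-clamp convention is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-dimension SQ8 parameters: lower bound and step size.
struct sq_params {
  std::vector<float> vmin;
  std::vector<float> delta;
};

// Derive vmin/delta from training residuals stored row-major (n_rows x dim).
// Assumption: the range is the observed [min, max] with no extra margin.
sq_params train_sq(const std::vector<float>& residuals, std::size_t n_rows, std::size_t dim)
{
  sq_params p{std::vector<float>(dim), std::vector<float>(dim)};
  for (std::size_t d = 0; d < dim; ++d) {
    float lo = residuals[d];
    float hi = residuals[d];
    for (std::size_t i = 1; i < n_rows; ++i) {
      lo = std::min(lo, residuals[i * dim + d]);
      hi = std::max(hi, residuals[i * dim + d]);
    }
    p.vmin[d] = lo;
    // 255 steps span the range; guard against a degenerate (zero) range.
    p.delta[d] = std::max((hi - lo) / 255.0f, 1e-12f);
  }
  return p;
}

// Quantize the residual (vec - centroid) to one uint8 code per dimension.
std::vector<std::uint8_t> encode(const std::vector<float>& vec,
                                 const std::vector<float>& centroid,
                                 const sq_params& p)
{
  std::vector<std::uint8_t> codes(vec.size());
  for (std::size_t d = 0; d < vec.size(); ++d) {
    float residual = vec[d] - centroid[d];
    float q        = std::round((residual - p.vmin[d]) / p.delta[d]);
    // Out-of-range residuals saturate at the ends of the code range.
    q        = std::min(255.0f, std::max(0.0f, q));
    codes[d] = static_cast<std::uint8_t>(q);
  }
  return codes;
}

// Reconstruct an approximate vector from its codes and centroid.
std::vector<float> decode(const std::vector<std::uint8_t>& codes,
                          const std::vector<float>& centroid,
                          const sq_params& p)
{
  std::vector<float> out(codes.size());
  for (std::size_t d = 0; d < codes.size(); ++d) {
    out[d] = centroid[d] + p.vmin[d] + static_cast<float>(codes[d]) * p.delta[d];
  }
  return out;
}
```

In this sketch, a residual inside the trained range is reconstructed to within half a quantization step per dimension, while values outside the range saturate at code 0 or 255. Each code occupies one byte instead of four, which is where the 4x reduction comes from.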
Build
[...] kmeans_trainset_fraction) is sampled from the dataset. Balanced K-Means is run on it to produce n_lists centroids that partition the vector space.
[...] sq_delta[d] = (range + 2*margin) / 255. These two per-dimension parameters (sq_vmin, the lower end of the range; sq_delta, the scale or quantization step) are stored in the index and are all that is needed to encode/decode any vector.
[...] add_data_on_build is true (the default), the full dataset is inserted via the extend path described below.
Extend
Extend adds new vectors to an existing index in batches, without retraining centroids or SQ parameters:
[...] (adaptive centers: when enabled, centroids are incrementally updated as new data arrives, and center norms are recomputed).
[...] uint8 and write the code into the interleaved list layout.
Search
Search proceeds in three stages:
[...] queries x centers^T), with metric-specific pre/post-processing. The top n_probes nearest clusters per query are selected via select_k.
[...] n_queries, n_probes) with one block per (query, probe) pair:
[...] query[d], centroid[d], vmin[d] and delta[d]) are pre-loaded into shared memory once per block in a metric-specific fashion (metric templated). This avoids redundant global memory reads in the hot loop.
[...] select_k call, followed by index postprocessing to map list-local positions back to original dataset IDs.
Benchmarks on B200