feat: GPU support for building DISKANN based indexes. #1460
sre-ci-robot merged 5 commits into zilliztech:main
Conversation
Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
@liorf95 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how: for any PR outside the kind/improvement category, link to the associated issue using the format "issue: #". Thanks for your efforts and contribution to the community!
    constexpr uint32_t kK = 10;
    constexpr uint32_t kK = 64;
    #ifdef KNOWHERE_WITH_CUVS
    constexpr uint32_t defaultMaxDegree = 64;
Why does the value of defaultMaxDegree differ between the CUVS and non-CUVS versions of the tests? Any particular reason?
The default value of max_degree for DiskANN based indexes is 56, so we did not want to modify the UT for those basic scenarios.
However, cuVS-related limitations require max_degree to be a power of 2 (e.g., 32, 64, 128, etc.).
    constexpr uint32_t kLargeDim = 256;
    constexpr uint32_t kK = 10;
    constexpr uint32_t kK = 64;
    #ifdef KNOWHERE_WITH_CUVS
What are the GPU memory requirements for the cuVS-based version of this test? It would be nice to have them as a comment at the beginning of the file, including a minimally sufficient GPU, if appropriate. Something like: This test is expected to run on a GPU with at least 2 GB of RAM, such as an NVIDIA A10 or NVIDIA T4.
    diskann::get_bin_metadata(base_file, base_num, base_dim);
    #ifdef KNOWHERE_WITH_CUVS
    raft::device_resources dev_resources;
    if(compareMetric == diskann::L2 && is_gpu_available()) {
And what happens if (compareMetric != diskann::L2)?
Currently, the cuVS Vamana build only supports the L2Expanded metric, so the index will be built on the CPU as usual.
    " Gib free memory out of " << gpu_total_mem/(1024*1024*1024L) << " Gib total";
    ram_budget = std::min<double>(ram_budget, (double)0.9*(gpu_free_mem/(1024*1024*1024)));
    shard_r = std::max<uint32_t>((uint32_t)32, (uint32_t)R/2);
    }
    } else {
        // add to log why GPU is not going to be used
    }

    bool built_with_gpu=false;
    #ifdef KNOWHERE_WITH_CUVS
    //currently cuvs vamana build only supports L2Expanded metric
    if (compareMetric == diskann::L2 && is_gpu_available() &&
This and the following calls imply that if the metric is right, the library is built with KNOWHERE_WITH_CUVS defined, and is_gpu_available() is true, then the code will ALWAYS be built using a GPU.
Is it possible to introduce a configurable boolean flag that indicates whether a GPU is expected to be used by a user? Say, the default value is true. At least, this would be helpful for running unit tests, so that it would be possible to validate both GPU and non-GPU branches of the code on the same machine without tricks.
You can already control that with the environment variable NVIDIA_VISIBLE_DEVICES: set it to none and the index will be built on the CPU.
    1);
    }
    built_with_gpu=true;
    }
    } else {
        // add to log why GPU is not going to be used
    }

Same comment applies to other possible else branches, if appropriate.
    #ifdef KNOWHERE_WITH_CUVS
    if(is_gpu_available()) {
    if (R != 32 && R != 64 && R != 128) {
    LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";
    LOG_KNOWHERE_ERROR_ << "Invalid R value (" << R << ") for cuvs - should be only 32 or 64 or 128";

The same comment applies to the following lines as well.
    int iters);
    bool is_gpu_available();
    void kmeans_gpu(
Personally, I would prefer FAISS-based names here: train (for kmeans_gpu) and assign (for predict_gpu).
Alternatively, please add comments on what these functions do, because it looks confusing for those who are not aware.
We will add comments on what these functions do.
    raft::resources res;
    LOG_KNOWHERE_INFO_ << "Running k-means with " << k << " clusters...using GPU!";
    kmeans_gpu(res, train_data_float.get(), num_train, train_dim,
               k, 10, centroids.data());
Would it be possible to extract 10 (the number of iterations?) as a constexpr value nearby, please? Same for the CPU branch.
    void gpu_get_mem_info(raft::resources &dev_resources, size_t &gpu_free_mem, size_t &gpu_total_mem);
    #endif
What about selecting the right GPU to deal with? It is expected to be done by the caller using a raft::resources object, correct?
The current implementation doesn't handle multi-GPU logic. It is assumed that each process uses one GPU. However, on a multi-GPU platform in Kubernetes with Milvus, you can assign each datanode a different GPU exclusively and accelerate the build further.
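The Kubernetes assignment mentioned above could look roughly like the following; this is an illustrative fragment, not from the PR, and the field names assume the standard NVIDIA device plugin:

```yaml
# Illustrative only: request one exclusive GPU for a Milvus datanode pod,
# so each build process sees exactly one device.
resources:
  limits:
    nvidia.com/gpu: 1
```

With one GPU per datanode, the single-device assumption in the build code holds without any in-process device selection.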
/kind improvement

issue: #1422
    LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";
    return -1;
    }
    if (L != 32 && L != 64 && L != 128 && L != 256) {
What is the specific reason for choosing these seven magic numbers? Could it be more flexible?
With cuVS 25.10, which is the version used, cuVS validates the visited-list size, which must be a power of 2.
This list is reasonable given that constraint, as we see that L higher than 256 does not provide much benefit.
Maybe restricting it to 2^k with a larger boundary, like L in [32, 2048], instead of some magic numbers would be better. Some larger or harder datasets, or higher recall requirements, may need higher parameters :)
/lgtm

issue: #1422

/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alexanderguzhva, liorf95. The full list of commands accepted by this bot can be found here. The pull request process is described here.
    raft::copy(dataset.data_handle(), pinned_data, total,
               raft::resource::get_cuda_stream(dev_resources));
    cudaFreeHost(pinned_data);
raft::copy is async on the stream. cudaFreeHost here frees pinned memory while DMA is still reading from it: a use-after-free. kmeans_gpu and brute_force_gpu have the same issue (async copy to host, then immediate return without sync). predict_gpu in this same file does it correctly. Add raft::resource::sync_stream(dev_resources) before freeing/returning in all three functions.
    knowhere::WaitAllSuccess(futures);
    #ifdef KNOWHERE_WITH_CUVS
    // GPU path
    raft::resources res;
raft::resources is created once in the outer scope and captured by reference into multiple thread pool tasks. It holds a CUDA stream, cuBLAS handle, and workspace allocator, none of which are thread-safe. Concurrent predict_gpu calls will interleave operations on the same stream and corrupt allocator state. Either serialize the GPU path (as generate_pq_pivots already does for its GPU loop), or create a separate raft::resources per task.
    const int* deg_size = std::find(std::begin(DEGREE_SIZES), std::end(DEGREE_SIZES), R);
    if (deg_size == std::end(DEGREE_SIZES)) {
    LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be power of 2 and maximum 256";
    return -1;
R=56 (a common DiskANN default) will return -1 here if a GPU is present. GPU acceleration is an optimization; it shouldn't break existing configs. Consider falling back to a CPU build with a warning instead of failing, e.g. set a use_gpu = false flag when the params don't meet cuVS constraints.
    constexpr uint32_t kDim = 128;
    constexpr uint32_t kLargeDim = 256;
    constexpr uint32_t kK = 10;
    constexpr uint32_t kK = 64;
kK changed from 10 to 64 and search_list_size from 36 to 64 globally, affecting CPU-only test paths too. Was this to accommodate GPU-built graphs needing a wider search, or just to match the new kK? If the former, consider keeping separate params for GPU and CPU tests to avoid masking potential recall differences.
Since it was all already merged into main, I will handle all the above new comments in a new PR from a new forked branch.

All the above new review comments were addressed in the new PR #1481.
See issue #1422.