feat: GPU support for building DISKANN based indexes.#1460

Merged
sre-ci-robot merged 5 commits into zilliztech:main from liorf95:add_gpu
Mar 3, 2026

Conversation

@liorf95
Contributor

@liorf95 liorf95 commented Feb 8, 2026

See issue #1422.

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
@mergify

mergify Bot commented Feb 8, 2026

@liorf95 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

Comment thread tests/ut/test_diskann.cc
constexpr uint32_t kK = 10;
constexpr uint32_t kK = 64;
#ifdef KNOWHERE_WITH_CUVS
constexpr uint32_t defaultMaxDegree = 64;
Collaborator

Why is there a difference in the value of defaultMaxDegree between the CUVS and non-CUVS versions of the tests? Any particular reason?

Contributor Author

The default value of max_degree for DiskANN-based indexes is 56, so we did not want to modify the UTs for those basic scenarios.
However, cuVS-related limitations require max_degree to be a power of 2 (e.g., 32, 64, 128).

Comment thread tests/ut/test_diskann.cc
constexpr uint32_t kLargeDim = 256;
constexpr uint32_t kK = 10;
constexpr uint32_t kK = 64;
#ifdef KNOWHERE_WITH_CUVS
Collaborator

What are the GPU memory requirements for a cuVS-based version of this test? It would be nice to have that as a comment at the beginning of the file, including a minimum sufficient GPU, if appropriate. Something like: "This test is expected to run on a GPU with at least 2 GB of RAM, such as an NVIDIA A10 or NVIDIA T4."

diskann::get_bin_metadata(base_file, base_num, base_dim);
#ifdef KNOWHERE_WITH_CUVS
raft::device_resources dev_resources;
if(compareMetric == diskann::L2 && is_gpu_available()) {
Collaborator

and what is going to happen if (compareMetric != diskann::L2)?

Contributor Author

Currently, the cuVS Vamana build only supports the L2Expanded metric, so in that case the index will be built with the CPU as usual.

" Gib free memory out of " << gpu_total_mem/(1024*1024*1024L) << " Gib total";
ram_budget = std::min<double>(ram_budget,(double)0.9*(gpu_free_mem/(1024*1024*1024)));
shard_r = std::max<uint32_t>((uint32_t)32,(uint32_t)R/2);
}
Collaborator

} else {
    // add to log why GPU is not going to be used
}

bool built_with_gpu=false;
#ifdef KNOWHERE_WITH_CUVS
//currently cuvs vamana build only supports L2Expanded metric
if (compareMetric == diskann::L2 && is_gpu_available() &&
Collaborator

this and the following calls imply that if the metric is right, the library is built with #define KNOWHERE_WITH_CUVS and is_gpu_available(), then the code will ALWAYS be built using a GPU.
Is it possible to introduce a configurable boolean flag that indicates whether a GPU is expected to be used by a user? Say, the default value is true. At least, this would be helpful for running unit tests, so that it would be possible to validate both GPU and non-GPU branches of the code on the same machine without tricks.

Contributor

this and the following calls imply that if the metric is right, the library is built with #define KNOWHERE_WITH_CUVS and is_gpu_available(), then the code will ALWAYS be built using a GPU. Is it possible to introduce a configurable boolean flag that indicates whether a GPU is expected to be used by a user? Say, the default value is true. At least, this would be helpful for running unit tests, so that it would be possible to validate both GPU and non-GPU branches of the code on the same machine without tricks.

You can already control that with the environment variable NVIDIA_VISIBLE_DEVICES: set it to none and the index will be built with the CPU.

1);
}
built_with_gpu=true;
}
Collaborator

} else {
     // add to log why GPU is not going to be used
}

Same comment to other possible else branches, if appropriate.

Comment thread thirdparty/DiskANN/src/aux_utils.cpp Outdated
#ifdef KNOWHERE_WITH_CUVS
if(is_gpu_available()) {
if (R != 32 && R != 64 && R != 128) {
LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";
Collaborator

LOG_KNOWHERE_ERROR_ << "Invalid R value (" << R << ") for cuvs - should be only 32 or 64 or 128";

Same comment applies for the following lines as well

int iters);
bool is_gpu_available();

void kmeans_gpu(
Collaborator

Personally, I would prefer FAISS-based names here: train (for kmeans_gpu) and assign (for predict_gpu).
Alternatively, please add comments on what these functions do, because it looks confusing to those who are not aware.

Contributor Author

We will add comments on what these functions do.

raft::resources res;
LOG_KNOWHERE_INFO_ << "Running k-means with " << k << " clusters...using GPU!";
kmeans_gpu(res,train_data_float.get(), num_train, train_dim,
k, 10, centroids.data());
Collaborator

Would it be possible to extract 10 (the number of iterations?) as a constexpr value nearby, please? Same for the CPU branch.


void gpu_get_mem_info(raft::resources &dev_resources, size_t &gpu_free_mem,size_t &gpu_total_mem);

#endif
Collaborator

what about selecting the right GPU to deal with? It is expected to be done by a caller using raft::resources object, correct?

Contributor

what about selecting the right GPU to deal with? It is expected to be done by a caller using raft::resources object, correct?

The current implementation doesn't handle multi-GPU logic. It is assumed that each process uses one GPU. However, on a multi-GPU platform in k8s with Milvus, you can assign each datanode a different GPU exclusively and accelerate the build further.

@alexanderguzhva
Collaborator

/kind improvement

@alexanderguzhva
Collaborator

issue: #1422

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
@mergify mergify Bot removed the ci-passed label Feb 10, 2026
Comment thread thirdparty/DiskANN/src/aux_utils.cpp Outdated
LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";
return -1;
}
if (L != 32 && L != 64 && L != 128 && L != 256) {
Contributor

What is the specific reason for choosing these seven magic numbers? Can it be more flexible?

Contributor

With cuVS 25.10, which is the version used, cuVS validates the visited size, which must be a power of 2.
This list is reasonable given that constraint, as we see that L higher than 256 doesn't benefit too much.

Contributor

Maybe restricting it to 2^k with a larger bound, like L ∈ [32, 2048], instead of some magic numbers would be better. Larger or harder datasets, or higher recall requirements, may need a higher parameter :)
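The bounded power-of-2 check suggested above could be sketched as a small helper. This is only an illustration of the reviewer's proposal, not the validation the PR finally adopted; `is_valid_search_list_size` is a hypothetical name and the [32, 2048] bounds are the reviewer's example values:

```cpp
#include <cstdint>

// Hypothetical helper sketching the reviewer's suggestion: accept any
// power of two in [32, 2048] for the search list size L, instead of a
// fixed whitelist {32, 64, 128, 256}. Bounds are illustrative.
inline bool is_valid_search_list_size(uint32_t L) {
    // (L & (L - 1)) == 0 holds exactly when L is a power of two (L > 0).
    const bool power_of_two = L != 0 && (L & (L - 1)) == 0;
    return power_of_two && L >= 32 && L <= 2048;
}
```

With this shape, 56 (a non-power-of-2) and 4096 (out of range) are rejected, while 512 and 1024 become usable without extending a whitelist.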

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
@mergify mergify Bot added the ci-passed label Feb 18, 2026
@mergify mergify Bot added ci-passed and removed ci-passed labels Feb 22, 2026
@alexanderguzhva
Collaborator

/lgtm

@alexanderguzhva
Collaborator

issue: #1422
/kind improvement

@alexanderguzhva
Collaborator

/approve

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexanderguzhva, liorf95

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 20b6dce into zilliztech:main Mar 3, 2026
11 checks passed
raft::copy(dataset.data_handle(), pinned_data, total,
raft::resource::get_cuda_stream(dev_resources));

cudaFreeHost(pinned_data);
Collaborator

raft::copy is async on the stream. cudaFreeHost here frees pinned memory while DMA is still reading from it — use-after-free. kmeans_gpu and brute_force_gpu have the same issue (async copy to host, then immediate return without sync). predict_gpu in this same file does it correctly. Add raft::resource::sync_stream(dev_resources) before freeing/returning in all three functions.

knowhere::WaitAllSuccess(futures);
#ifdef KNOWHERE_WITH_CUVS
// GPU path
raft::resources res;
Collaborator

raft::resources is created once in the outer scope and captured by reference into multiple thread pool tasks. It holds a CUDA stream, cuBLAS handle, and workspace allocator — none are thread-safe. Concurrent predict_gpu calls will interleave operations on the same stream and corrupt allocator state. Either serialize the GPU path (like generate_pq_pivots already does for its GPU loop), or create a separate raft::resources per task.
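The per-task-resource pattern recommended above can be illustrated in plain C++. `TaskResources` is a hypothetical stand-in for `raft::resources`; this is a sketch of the ownership pattern only, not the actual Knowhere code:

```cpp
#include <future>
#include <numeric>
#include <vector>

// Stand-in for a non-thread-safe, per-stream resource bundle
// (raft::resources in the real code): stream handle, cuBLAS handle,
// workspace allocator, etc. Each task owns its own instance, so no
// handle or workspace is ever shared across threads.
struct TaskResources {
    std::vector<double> workspace;  // per-task scratch, never shared
};

// Each task constructs its resources inside the task body instead of
// capturing a shared instance by reference from the outer scope.
double process_chunk(std::vector<double> chunk) {
    TaskResources res;  // created here, destroyed here
    res.workspace = std::move(chunk);
    return std::accumulate(res.workspace.begin(), res.workspace.end(), 0.0);
}

// Fan out over a thread pool (std::async here for brevity) and reduce.
double sum_chunks_concurrently(std::vector<std::vector<double>> chunks) {
    std::vector<std::future<double>> futures;
    for (auto& c : chunks)
        futures.push_back(std::async(std::launch::async, process_chunk, std::move(c)));
    double total = 0.0;
    for (auto& f : futures) total += f.get();
    return total;
}
```

The alternative the comment mentions, serializing the GPU path on one shared resource object, trades concurrency for a single stream but avoids the per-task construction cost.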

const int* deg_size = std::find(std::begin(DEGREE_SIZES), std::end(DEGREE_SIZES), R);
if (deg_size == std::end(DEGREE_SIZES)) {
LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be power of 2 and maximum 256";
return -1;
Collaborator

R=56 (a common DiskANN default) will return -1 here if a GPU is present. GPU acceleration is an optimization — it shouldn't break existing configs. Consider falling back to CPU build with a warning instead of failing, e.g. set a use_gpu = false flag when params don't meet cuVS constraints.
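The fallback could look something like the helper below. This is a sketch of the suggestion, not the merged code; `should_use_gpu` is a hypothetical name, and the accepted degrees follow the whitelist in the PR's error message:

```cpp
#include <cstdint>

// Hypothetical decision helper: rather than returning -1 when R does
// not meet cuVS constraints, report that the GPU path is unusable and
// let the caller fall back to the CPU build (ideally with a warning
// logged explaining why the GPU was skipped).
inline bool should_use_gpu(uint32_t R, bool gpu_available) {
    if (!gpu_available) return false;
    // Graph degrees accepted by the cuVS Vamana build per this PR.
    return R == 32 || R == 64 || R == 128;
}
```

Under this scheme R=56 builds on the CPU exactly as it did before the PR, instead of turning a previously valid config into a hard error.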

Comment thread tests/ut/test_diskann.cc
constexpr uint32_t kDim = 128;
constexpr uint32_t kLargeDim = 256;
constexpr uint32_t kK = 10;
constexpr uint32_t kK = 64;
Collaborator

kK changed from 10→64 and search_list_size from 36→64 globally, affecting CPU-only test paths too. Was this to accommodate GPU-built graphs needing wider search, or just to match the new kK? If the former, consider keeping separate params for GPU/CPU tests to avoid masking potential recall differences.

@liorf95
Contributor Author

liorf95 commented Mar 3, 2026

Since this was already merged into main, I will handle all the new comments above in a new PR from a new forked branch.

@liorf95
Contributor Author

liorf95 commented Mar 4, 2026

All the above new review comments were addressed in the new PR #1481.

6 participants