feat: GPU support for building DISKANN based indexes. #1460
sre-ci-robot merged 5 commits into zilliztech:main
Conversation
Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
@liorf95 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how: for any PR outside the kind/improvement category, link to the associated issue using the format "issue: #". Thanks for your efforts and contribution to the community!
    constexpr uint32_t kK = 10;
    constexpr uint32_t kK = 64;
    #ifdef KNOWHERE_WITH_CUVS
    constexpr uint32_t defaultMaxDegree = 64;
Why does the value of defaultMaxDegree differ between the CUVS and non-CUVS versions of the tests? Any particular reason?
The default value of max_degree for DiskANN based indexes is 56, so we did not want to modify the UT for those basic scenarios.
However, cuVS-related limitations require max_degree to be a power of 2 (e.g., 32, 64, 128, etc.).
    constexpr uint32_t kLargeDim = 256;
    constexpr uint32_t kK = 10;
    constexpr uint32_t kK = 64;
    #ifdef KNOWHERE_WITH_CUVS
What are the GPU memory requirements for the cuVS-based version of this test? It would be nice to have them as a comment at the beginning of the file, including a minimally sufficient GPU, if appropriate. Something like: This test is expected to run on a GPU with at least 2 GB of RAM, such as an NVIDIA A10 or NVIDIA T4.
    diskann::get_bin_metadata(base_file, base_num, base_dim);
    #ifdef KNOWHERE_WITH_CUVS
    raft::device_resources dev_resources;
    if(compareMetric == diskann::L2 && is_gpu_available()) {
And what happens if (compareMetric != diskann::L2)?
Currently, the cuVS Vamana build only supports the L2Expanded metric, so the index will be built on the CPU as usual.
    " Gib free memory out of " << gpu_total_mem/(1024*1024*1024L) << " Gib total";
    ram_budget = std::min<double>(ram_budget, (double)0.9*(gpu_free_mem/(1024*1024*1024)));
    shard_r = std::max<uint32_t>((uint32_t)32, (uint32_t)R/2);
    }
    } else {
        // add to log why GPU is not going to be used
    }

    bool built_with_gpu=false;
    #ifdef KNOWHERE_WITH_CUVS
    //currently cuvs vamana build only supports L2Expanded metric
    if (compareMetric == diskann::L2 && is_gpu_available() &&
This and the following calls imply that if the metric is right, the library is built with KNOWHERE_WITH_CUVS defined, and is_gpu_available() is true, then the code will ALWAYS be built using a GPU.
Is it possible to introduce a configurable boolean flag that indicates whether a GPU is expected to be used by a user? Say, the default value is true. At least, this would be helpful for running unit tests, so that it would be possible to validate both GPU and non-GPU branches of the code on the same machine without tricks.
You can already control that with the environment variable NVIDIA_VISIBLE_DEVICES: set it to none and the index will be built on the CPU.
    1);
    }
    built_with_gpu=true;
    }
    } else {
        // add to log why GPU is not going to be used
    }

Same comment applies to other possible else branches, if appropriate.
    #ifdef KNOWHERE_WITH_CUVS
    if(is_gpu_available()) {
    if (R != 32 && R != 64 && R != 128) {
    LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";
    LOG_KNOWHERE_ERROR_ << "Invalid R value (" << R << ") for cuvs - should be only 32 or 64 or 128";

The same comment applies to the following lines as well.
    int iters);
    bool is_gpu_available();
    void kmeans_gpu(
Personally, I would prefer FAISS-based names here: train (for kmeans_gpu) and assign (for predict_gpu).
Alternatively, please add comments on what these functions do, because it looks confusing for those who are not aware.
We will add comments on what these functions do.
    raft::resources res;
    LOG_KNOWHERE_INFO_ << "Running k-means with " << k << " clusters...using GPU!";
    kmeans_gpu(res, train_data_float.get(), num_train, train_dim,
               k, 10, centroids.data());
Would it be possible to extract 10 (the number of iterations?) as a constexpr value nearby, please? Same for the CPU branch.
    void gpu_get_mem_info(raft::resources &dev_resources, size_t &gpu_free_mem, size_t &gpu_total_mem);
    #endif
What about selecting the right GPU to deal with? It is expected to be done by the caller using a raft::resources object, correct?
The current implementation doesn't handle multi-GPU logic. It is assumed that each process uses one GPU. However, on a multi-GPU platform in Kubernetes with Milvus, you can assign each datanode a different GPU exclusively and accelerate the build further.
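The Kubernetes assignment mentioned above could look roughly like the following; this is an illustrative fragment, not from the PR, and the field names assume the standard NVIDIA device plugin:

```yaml
# Illustrative only: request one exclusive GPU for a Milvus datanode pod,
# so each build process sees exactly one device.
resources:
  limits:
    nvidia.com/gpu: 1
```

With one GPU per datanode, the single-device assumption in the build code holds without any in-process device selection.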
/kind improvement

issue: #1422
    LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";
    return -1;
    }
    if (L != 32 && L != 64 && L != 128 && L != 256) {
What is the specific reason for choosing these seven magic numbers? Could it be more flexible?
With cuVS 25.10, which is the version used, cuVS validates the visited-list size, which must be a power of 2.
This list is reasonable given that constraint, as we see that L higher than 256 does not provide much benefit.
Maybe restricting it to 2^k with a larger boundary, like L in [32, 2048], instead of some magic numbers would be better. Some larger or harder datasets, or higher recall requirements, may need higher parameters :)
/lgtm

issue: #1422

/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alexanderguzhva, liorf95. The full list of commands accepted by this bot can be found here. The pull request process is described here.
    raft::copy(dataset.data_handle(), pinned_data, total,
               raft::resource::get_cuda_stream(dev_resources));
    cudaFreeHost(pinned_data);
raft::copy is async on the stream. cudaFreeHost here frees pinned memory while DMA is still reading from it: a use-after-free. kmeans_gpu and brute_force_gpu have the same issue (async copy to host, then immediate return without sync). predict_gpu in this same file does it correctly. Add raft::resource::sync_stream(dev_resources) before freeing/returning in all three functions.
    knowhere::WaitAllSuccess(futures);
    #ifdef KNOWHERE_WITH_CUVS
    // GPU path
    raft::resources res;
raft::resources is created once in the outer scope and captured by reference into multiple thread pool tasks. It holds a CUDA stream, cuBLAS handle, and workspace allocator, none of which are thread-safe. Concurrent predict_gpu calls will interleave operations on the same stream and corrupt allocator state. Either serialize the GPU path (as generate_pq_pivots already does for its GPU loop), or create a separate raft::resources per task.
    const int* deg_size = std::find(std::begin(DEGREE_SIZES), std::end(DEGREE_SIZES), R);
    if (deg_size == std::end(DEGREE_SIZES)) {
    LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be power of 2 and maximum 256";
    return -1;
R=56 (a common DiskANN default) will return -1 here if a GPU is present. GPU acceleration is an optimization; it shouldn't break existing configs. Consider falling back to a CPU build with a warning instead of failing, e.g. set a use_gpu = false flag when the params don't meet cuVS constraints.
    constexpr uint32_t kDim = 128;
    constexpr uint32_t kLargeDim = 256;
    constexpr uint32_t kK = 10;
    constexpr uint32_t kK = 64;
kK changed from 10 to 64 and search_list_size from 36 to 64 globally, affecting CPU-only test paths too. Was this to accommodate GPU-built graphs needing a wider search, or just to match the new kK? If the former, consider keeping separate params for GPU and CPU tests to avoid masking potential recall differences.
Since it was all already merged into main, I will handle all the above new comments in a new PR from a new forked branch.

All the above new review comments were addressed in the new PR #1481.
See issue #1422.