SNMG Batched KMeans Python API #2154
Open: viclafargue wants to merge 3 commits into rapidsai:main from viclafargue:snmg-ooc-kmeans-python-api
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,92 @@ | ||
| /* | ||
| * SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
|
|
||
| #pragma once | ||
|
|
||
| #include <cuvs/cluster/kmeans.h> | ||
| #include <cuvs/core/c_api.h> | ||
| #include <dlpack/dlpack.h> | ||
| #include <stdint.h> | ||
|
|
||
| #include <cuvs/core/export.h> | ||
|
|
||
| #ifdef __cplusplus | ||
| extern "C" { | ||
| #endif | ||
|
|
||
| /** | ||
| * @defgroup mg_kmeans_c Multi-GPU k-means clustering APIs | ||
| * @{ | ||
| */ | ||
|
|
||
| /** | ||
| * @brief Find clusters with single-node multi-GPU k-means using host data. | ||
| * | ||
| * X, sample_weight, and centroids must be host-accessible, row-major, | ||
| * C-contiguous DLPack tensors. X and centroids must have dtype float32 or | ||
| * float64, and sample_weight must match X when provided. | ||
| * | ||
| * @note In cuVS 26.08 (next ABI major version) this signature will be | ||
| * replaced by cuvsMultiGpuKMeansFit_v2. | ||
| * | ||
| * @param[in] res cuvsMultiGpuResources_t opaque C handle | ||
| * created by cuvsMultiGpuResourcesCreate or | ||
| * cuvsMultiGpuResourcesCreateWithDeviceIds. | ||
| * @param[in] params Parameters for KMeans model. | ||
| * @param[in] X Host training instances to cluster. | ||
| * [dim = n_samples x n_features] | ||
| * @param[in] sample_weight Optional host weights for each observation in X. | ||
| * [len = n_samples] | ||
| * @param[inout] centroids Host centroids. When init is Array, used as the | ||
| * initial cluster centers. The final generated | ||
| * centroids are copied back to this tensor. | ||
| * [dim = n_clusters x n_features] | ||
| * @param[out] inertia Sum of squared distances of samples to their | ||
| * closest cluster center. | ||
| * @param[out] n_iter Number of iterations run. | ||
| */ | ||
| CUVS_EXPORT cuvsError_t cuvsMultiGpuKMeansFit(cuvsResources_t res, | ||
| cuvsKMeansParams_t params, | ||
| DLManagedTensor* X, | ||
| DLManagedTensor* sample_weight, | ||
| DLManagedTensor* centroids, | ||
| double* inertia, | ||
| int* n_iter); | ||
|
|
||
| /** | ||
| * @brief Find clusters with single-node multi-GPU k-means (v2 params layout). | ||
| * | ||
| * Mirrors cuvsMultiGpuKMeansFit but takes cuvsKMeansParams_v2_t. Will become | ||
| * the unsuffixed cuvsMultiGpuKMeansFit in cuVS 26.08. | ||
| * | ||
| * @param[in] res cuvsMultiGpuResources_t opaque C handle. | ||
| * @param[in] params Parameters for KMeans model (v2 layout). | ||
| * @param[in] X Host training instances to cluster. | ||
| * [dim = n_samples x n_features] | ||
| * @param[in] sample_weight Optional host weights for each observation in X. | ||
| * [len = n_samples] | ||
| * @param[inout] centroids Host centroids. When init is Array, used as the | ||
| * initial cluster centers. The final generated | ||
| * centroids are copied back to this tensor. | ||
| * [dim = n_clusters x n_features] | ||
| * @param[out] inertia Sum of squared distances of samples to their | ||
| * closest cluster center. | ||
| * @param[out] n_iter Number of iterations run. | ||
| */ | ||
| CUVS_EXPORT cuvsError_t cuvsMultiGpuKMeansFit_v2(cuvsResources_t res, | ||
|
Contributor (review comment): We don't need this suffix. We can add breaking changes in this release. |
||
| cuvsKMeansParams_v2_t params, | ||
| DLManagedTensor* X, | ||
| DLManagedTensor* sample_weight, | ||
| DLManagedTensor* centroids, | ||
| double* inertia, | ||
| int* n_iter); | ||
|
|
||
| /** | ||
| * @} | ||
| */ | ||
|
|
||
| #ifdef __cplusplus | ||
| } | ||
| #endif | ||
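
The contract stated in the Doxygen comments above (2D `X`, `centroids` shaped `n_clusters x n_features`, optional `sample_weight` of length `n_samples`, matching float32/float64 dtypes) can be sketched as a standalone check. This is only an illustration under assumptions: `HostTensorDesc`, `check_fit_contract`, and `usage_example` are hypothetical names that are not part of cuVS, and the real API inspects `DLManagedTensor*` arguments rather than this simplified struct.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for the DLPack tensor fields that the
// documented contract constrains. The real API takes DLManagedTensor*.
struct HostTensorDesc {
  int ndim;        // number of dimensions
  int64_t dim0;    // rows, or length for a 1D tensor
  int64_t dim1;    // columns; unused for a 1D tensor
  int float_bits;  // 32 for float32, 64 for float64
};

// Returns std::nullopt when the arguments satisfy the contract, otherwise a
// message naming the first violated rule.
std::optional<std::string> check_fit_contract(int n_clusters,
                                              const HostTensorDesc& X,
                                              const HostTensorDesc* sample_weight,  // may be nullptr
                                              const HostTensorDesc& centroids)
{
  if (n_clusters <= 0) return "n_clusters must be positive";
  if (X.ndim != 2 || centroids.ndim != 2) return "X and centroids must be 2D";
  if (X.dim0 <= 0 || X.dim1 <= 0) return "X must be non-empty";
  if (X.float_bits != 32 && X.float_bits != 64) return "X must be float32 or float64";
  if (centroids.dim0 != n_clusters || centroids.dim1 != X.dim1)
    return "centroids must be n_clusters x n_features";
  if (centroids.float_bits != X.float_bits) return "centroids dtype must match X";
  if (sample_weight != nullptr) {
    if (sample_weight->ndim != 1 || sample_weight->dim0 != X.dim0)
      return "sample_weight must be 1D with n_samples entries";
    if (sample_weight->float_bits != X.float_bits) return "sample_weight dtype must match X";
  }
  return std::nullopt;
}

// Usage sketch: 1000 float32 samples with 16 features, clustered into 8 groups.
inline void usage_example()
{
  HostTensorDesc X{2, 1000, 16, 32};
  HostTensorDesc centroids{2, 8, 16, 32};
  HostTensorDesc weights{1, 1000, 0, 32};
  assert(!check_fit_contract(8, X, &weights, centroids).has_value());
  assert(!check_fit_contract(8, X, nullptr, centroids).has_value());  // weights are optional

  HostTensorDesc f64_centroids{2, 8, 16, 64};  // dtype differs from X
  assert(check_fit_contract(8, X, nullptr, f64_centroids).has_value());
}
```

Returning a message instead of throwing keeps the sketch easy to test. The implementation in this PR reports the same kinds of violations through `RAFT_EXPECTS`.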
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,223 @@ | ||
| /* | ||
| * SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
|
|
||
| #include <cstdint> | ||
| #include <optional> | ||
|
|
||
| #include <dlpack/dlpack.h> | ||
|
|
||
| #include <cuvs/cluster/kmeans.hpp> | ||
| #include <cuvs/cluster/mg_kmeans.h> | ||
| #include <cuvs/core/c_api.h> | ||
|
|
||
| #include <raft/core/device_mdarray.hpp> | ||
| #include <raft/core/resource/cuda_stream.hpp> | ||
| #include <raft/core/resource/multi_gpu.hpp> | ||
| #include <raft/core/resources.hpp> | ||
| #include <raft/util/cudart_utils.hpp> | ||
|
|
||
| #include "../core/exceptions.hpp" | ||
| #include "../core/interop.hpp" | ||
|
|
||
| namespace { | ||
|
|
||
| template <typename ParamsT> | ||
| cuvs::cluster::kmeans::params convert_params(const ParamsT& params) | ||
| { | ||
| auto kmeans_params = cuvs::cluster::kmeans::params(); | ||
| kmeans_params.metric = static_cast<cuvs::distance::DistanceType>(params.metric); | ||
| kmeans_params.init = static_cast<cuvs::cluster::kmeans::params::InitMethod>(params.init); | ||
| kmeans_params.n_clusters = params.n_clusters; | ||
| kmeans_params.max_iter = params.max_iter; | ||
| kmeans_params.tol = params.tol; | ||
| kmeans_params.n_init = params.n_init; | ||
| kmeans_params.oversampling_factor = params.oversampling_factor; | ||
| kmeans_params.batch_samples = params.batch_samples; | ||
| kmeans_params.batch_centroids = params.batch_centroids; | ||
| kmeans_params.init_size = params.init_size; | ||
| kmeans_params.streaming_batch_size = params.streaming_batch_size; | ||
| return kmeans_params; | ||
| } | ||
|
|
||
| void validate_host_tensor(DLManagedTensor* tensor, const char* name) | ||
| { | ||
| RAFT_EXPECTS(tensor != nullptr, "%s must not be NULL", name); | ||
| auto dl_tensor = tensor->dl_tensor; | ||
| RAFT_EXPECTS(dl_tensor.data != nullptr, "%s data must not be NULL", name); | ||
| RAFT_EXPECTS(dl_tensor.shape != nullptr, "%s shape must not be NULL", name); | ||
| RAFT_EXPECTS( | ||
| cuvs::core::is_dlpack_host_compatible(dl_tensor), "%s must be host accessible", name); | ||
| RAFT_EXPECTS(dl_tensor.device.device_type != kDLCUDA, "%s must reside in host memory", name); | ||
| RAFT_EXPECTS(cuvs::core::is_c_contiguous(tensor), "%s must be C-contiguous", name); | ||
| } | ||
|
|
||
| bool dtype_equal(const DLTensor& lhs, const DLTensor& rhs) | ||
| { | ||
| return lhs.dtype.code == rhs.dtype.code && lhs.dtype.bits == rhs.dtype.bits && | ||
| lhs.dtype.lanes == rhs.dtype.lanes; | ||
| } | ||
|
|
||
| void validate_float_dtype(const DLTensor& tensor, const char* name) | ||
| { | ||
| RAFT_EXPECTS( | ||
| tensor.dtype.code == kDLFloat && (tensor.dtype.bits == 32 || tensor.dtype.bits == 64), | ||
| "%s must have dtype float32 or float64", | ||
| name); | ||
| RAFT_EXPECTS(tensor.dtype.lanes == 1, "%s must have one DLPack lane", name); | ||
| } | ||
|
|
||
| template <typename ParamsT> | ||
| void validate_inputs(const ParamsT& params, | ||
| DLManagedTensor* X_tensor, | ||
| DLManagedTensor* sample_weight_tensor, | ||
| DLManagedTensor* centroids_tensor) | ||
| { | ||
| RAFT_EXPECTS(params.n_clusters > 0, "n_clusters must be positive"); | ||
| RAFT_EXPECTS(!params.hierarchical, "hierarchical kmeans is not supported by SNMG kmeans"); | ||
|
|
||
| validate_host_tensor(X_tensor, "X"); | ||
| validate_host_tensor(centroids_tensor, "centroids"); | ||
|
|
||
| auto X = X_tensor->dl_tensor; | ||
| auto centroids = centroids_tensor->dl_tensor; | ||
|
|
||
| RAFT_EXPECTS(X.ndim == 2, "X must be a 2D tensor"); | ||
| RAFT_EXPECTS(centroids.ndim == 2, "centroids must be a 2D tensor"); | ||
| RAFT_EXPECTS(X.shape[0] > 0, "X must have at least one row"); | ||
| RAFT_EXPECTS(X.shape[1] > 0, "X must have at least one column"); | ||
| RAFT_EXPECTS(centroids.shape[0] == params.n_clusters, | ||
| "centroids row count must equal n_clusters"); | ||
| RAFT_EXPECTS(centroids.shape[1] == X.shape[1], | ||
| "centroids column count must equal X column count"); | ||
|
|
||
| validate_float_dtype(X, "X"); | ||
| RAFT_EXPECTS(dtype_equal(X, centroids), "centroids dtype must match X dtype"); | ||
|
|
||
| if (sample_weight_tensor != nullptr) { | ||
| validate_host_tensor(sample_weight_tensor, "sample_weight"); | ||
| auto sample_weight = sample_weight_tensor->dl_tensor; | ||
| RAFT_EXPECTS(sample_weight.ndim == 1, "sample_weight must be a 1D tensor"); | ||
| RAFT_EXPECTS(sample_weight.shape[0] == X.shape[0], | ||
| "sample_weight length must equal X row count"); | ||
| RAFT_EXPECTS(dtype_equal(X, sample_weight), "sample_weight dtype must match X dtype"); | ||
| } | ||
| } | ||
|
|
||
| template <typename T, typename ParamsT, typename IdxT = int64_t> | ||
| void fit_snmg(cuvsResources_t res, | ||
| const ParamsT& params, | ||
| DLManagedTensor* X_tensor, | ||
| DLManagedTensor* sample_weight_tensor, | ||
| DLManagedTensor* centroids_tensor, | ||
| double* inertia, | ||
| int* n_iter) | ||
| { | ||
| auto res_ptr = reinterpret_cast<raft::resources*>(res); | ||
| RAFT_EXPECTS(res_ptr != nullptr, "res must not be NULL"); | ||
| RAFT_EXPECTS(raft::resource::is_multi_gpu(*res_ptr), | ||
| "cuvsMultiGpuKMeansFit requires a MultiGpuResources handle"); | ||
|
|
||
| auto X = X_tensor->dl_tensor; | ||
| auto centroids = centroids_tensor->dl_tensor; | ||
|
|
||
| auto n_samples = static_cast<IdxT>(X.shape[0]); | ||
| auto n_features = static_cast<IdxT>(X.shape[1]); | ||
| auto n_clusters = static_cast<IdxT>(params.n_clusters); | ||
|
|
||
| auto X_view = raft::make_host_matrix_view<T const, IdxT>( | ||
| reinterpret_cast<T const*>(X.data), n_samples, n_features); | ||
|
|
||
| std::optional<raft::host_vector_view<T const, IdxT>> sample_weight; | ||
| if (sample_weight_tensor != nullptr) { | ||
| auto sw = sample_weight_tensor->dl_tensor; | ||
| sample_weight = | ||
| raft::make_host_vector_view<T const, IdxT>(reinterpret_cast<T const*>(sw.data), n_samples); | ||
| } | ||
|
|
||
| auto const& rank0_res = raft::resource::set_current_device_to_rank(*res_ptr, 0); | ||
| auto stream = raft::resource::get_cuda_stream(rank0_res); | ||
| auto d_centroids = raft::make_device_matrix<T, IdxT>(rank0_res, n_clusters, n_features); | ||
| auto n_centroid_values = n_clusters * n_features; | ||
|
|
||
| if (params.init == Array) { | ||
| raft::update_device(d_centroids.data_handle(), | ||
| reinterpret_cast<T const*>(centroids.data), | ||
| n_centroid_values, | ||
| stream); | ||
| raft::resource::sync_stream(rank0_res, stream); | ||
| } | ||
|
|
||
| T inertia_temp = T{0}; | ||
| IdxT n_iter_temp = IdxT{0}; | ||
| auto kmeans_params = convert_params(params); | ||
| cuvs::cluster::kmeans::fit(*res_ptr, | ||
| kmeans_params, | ||
| X_view, | ||
| sample_weight, | ||
| d_centroids.view(), | ||
| raft::make_host_scalar_view<T>(&inertia_temp), | ||
| raft::make_host_scalar_view<IdxT>(&n_iter_temp)); | ||
|
|
||
| raft::update_host( | ||
| reinterpret_cast<T*>(centroids.data), d_centroids.data_handle(), n_centroid_values, stream); | ||
| raft::resource::sync_stream(rank0_res, stream); | ||
|
|
||
| *inertia = static_cast<double>(inertia_temp); | ||
| *n_iter = static_cast<int>(n_iter_temp); | ||
| } | ||
|
|
||
| template <typename ParamsT> | ||
| void dispatch_fit(cuvsResources_t res, | ||
| ParamsT params, | ||
| DLManagedTensor* X, | ||
| DLManagedTensor* sample_weight, | ||
| DLManagedTensor* centroids, | ||
| double* inertia, | ||
| int* n_iter) | ||
| { | ||
| RAFT_EXPECTS(res != 0, "res must not be NULL"); | ||
| RAFT_EXPECTS(params != nullptr, "params must not be NULL"); | ||
| RAFT_EXPECTS(inertia != nullptr, "inertia must not be NULL"); | ||
| RAFT_EXPECTS(n_iter != nullptr, "n_iter must not be NULL"); | ||
|
|
||
| validate_inputs(*params, X, sample_weight, centroids); | ||
|
|
||
| auto dataset = X->dl_tensor; | ||
| if (dataset.dtype.code == kDLFloat && dataset.dtype.bits == 32) { | ||
| fit_snmg<float>(res, *params, X, sample_weight, centroids, inertia, n_iter); | ||
| } else if (dataset.dtype.code == kDLFloat && dataset.dtype.bits == 64) { | ||
| fit_snmg<double>(res, *params, X, sample_weight, centroids, inertia, n_iter); | ||
| } else { | ||
| RAFT_FAIL("Unsupported dataset DLtensor dtype: %d and bits: %d", | ||
| dataset.dtype.code, | ||
| dataset.dtype.bits); | ||
| } | ||
| } | ||
|
|
||
| } // namespace | ||
|
|
||
| extern "C" cuvsError_t cuvsMultiGpuKMeansFit(cuvsResources_t res, | ||
| cuvsKMeansParams_t params, | ||
| DLManagedTensor* X, | ||
| DLManagedTensor* sample_weight, | ||
| DLManagedTensor* centroids, | ||
| double* inertia, | ||
| int* n_iter) | ||
| { | ||
| return cuvs::core::translate_exceptions( | ||
| [=] { dispatch_fit(res, params, X, sample_weight, centroids, inertia, n_iter); }); | ||
| } | ||
|
|
||
| extern "C" cuvsError_t cuvsMultiGpuKMeansFit_v2(cuvsResources_t res, | ||
| cuvsKMeansParams_v2_t params, | ||
| DLManagedTensor* X, | ||
| DLManagedTensor* sample_weight, | ||
| DLManagedTensor* centroids, | ||
| double* inertia, | ||
| int* n_iter) | ||
| { | ||
| return cuvs::core::translate_exceptions( | ||
| [=] { dispatch_fit(res, params, X, sample_weight, centroids, inertia, n_iter); }); | ||
| } |
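
The dtype dispatch in `dispatch_fit` above follows a common pattern: read the element type at runtime, then call a template instantiated for that type. A minimal self-contained sketch of the pattern is below. It rests on assumptions: `DTypeSketch`, `kFloatCode`, `run_typed`, and `dispatch_by_dtype` are hypothetical stand-ins rather than cuVS or DLPack names, and the sketch folds the lane check into the dispatcher, whereas the PR performs it in `validate_float_dtype`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for DLPack's DLDataType (the real one is in <dlpack/dlpack.h>).
struct DTypeSketch {
  uint8_t code;    // type category; 2 matches kDLFloat in DLPack
  uint8_t bits;    // width of one element in bits
  uint16_t lanes;  // 1 for scalar elements
};

constexpr uint8_t kFloatCode = 2;  // mirrors the value of kDLFloat

// Typed worker standing in for fit_snmg<T>; it only reports the element size.
template <typename T>
std::size_t run_typed()
{
  return sizeof(T);
}

// Maps a runtime dtype to a compile-time instantiation and rejects
// everything other than scalar float32 and float64.
std::size_t dispatch_by_dtype(const DTypeSketch& dtype)
{
  if (dtype.code == kFloatCode && dtype.lanes == 1) {
    if (dtype.bits == 32) { return run_typed<float>(); }
    if (dtype.bits == 64) { return run_typed<double>(); }
  }
  throw std::invalid_argument("unsupported dtype: code=" + std::to_string(dtype.code) +
                              " bits=" + std::to_string(dtype.bits));
}

// Usage sketch: float32 and float64 each reach their own instantiation.
inline void usage_example()
{
  DTypeSketch f32{kFloatCode, 32, 1};
  DTypeSketch f64{kFloatCode, 64, 1};
  assert(dispatch_by_dtype(f32) == sizeof(float));
  assert(dispatch_by_dtype(f64) == sizeof(double));
}
```

Keeping the branch in one small function means the typed worker never sees a `void*` of unknown element type, and unsupported dtypes fail before any data is touched.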
Documentation refers to `cuvsMultiGpuResources_t` but parameter is `cuvsResources_t`.

The Doxygen comment at line 34 mentions `cuvsMultiGpuResources_t` as the expected handle type, but the actual parameter type in the function signature (line 50) is `cuvsResources_t`. While the implementation validates that the handle is a multi-GPU resource, the documentation could cause confusion for API consumers.