1 change: 1 addition & 0 deletions c/CMakeLists.txt
@@ -86,6 +86,7 @@ add_library(
cuvs_c SHARED
src/core/c_api.cpp
src/cluster/kmeans.cpp
$<$<BOOL:${BUILD_MG_ALGOS}>:src/cluster/mg_kmeans.cpp>
src/neighbors/brute_force.cpp
src/neighbors/ivf_flat.cpp
src/neighbors/ivf_pq.cpp
92 changes: 92 additions & 0 deletions c/include/cuvs/cluster/mg_kmeans.h
@@ -0,0 +1,92 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/

#pragma once

#include <cuvs/cluster/kmeans.h>
#include <cuvs/core/c_api.h>
#include <dlpack/dlpack.h>
#include <stdint.h>

#include <cuvs/core/export.h>

#ifdef __cplusplus
extern "C" {
#endif

/**
* @defgroup mg_kmeans_c Multi-GPU k-means clustering APIs
* @{
*/

/**
* @brief Find clusters with single-node multi-GPU k-means using host data.
*
* X, sample_weight, and centroids must be host-accessible, row-major,
* C-contiguous DLPack tensors. X and centroids must have dtype float32 or
* float64, and sample_weight must match X when provided.
*
* @note In cuVS 26.08 (next ABI major version) this signature will be
* replaced by cuvsMultiGpuKMeansFit_v2.
*
* @param[in] res cuvsMultiGpuResources_t opaque C handle
* created by cuvsMultiGpuResourcesCreate or
* cuvsMultiGpuResourcesCreateWithDeviceIds.
Comment on lines +34 to +36

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Documentation refers to cuvsMultiGpuResources_t but parameter is cuvsResources_t.

The Doxygen comment at line 34 mentions cuvsMultiGpuResources_t as the expected handle type, but the actual parameter type in the function signature (line 50) is cuvsResources_t. While the implementation validates that the handle is a multi-GPU resource, the documentation could cause confusion for API consumers.

📝 Suggested documentation fix
- * @param[in] res cuvsMultiGpuResources_t opaque C handle
+ * @param[in] res cuvsResources_t opaque C handle (must be a multi-GPU resource)
  * created by cuvsMultiGpuResourcesCreate or
  * cuvsMultiGpuResourcesCreateWithDeviceIds.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/include/cuvs/cluster/mg_kmeans.h` around lines 34 - 36, Update the Doxygen
for the function in mg_kmeans.h so the documented parameter type matches the
actual signature: replace references to cuvsMultiGpuResources_t with
cuvsResources_t and add a short note that the cuvsResources_t must represent a
multi-GPU resource created by cuvsMultiGpuResourcesCreate or
cuvsMultiGpuResourcesCreateWithDeviceIds (the implementation already validates
this). Ensure the comment references the exact symbol cuvsResources_t and
mentions the creation functions cuvsMultiGpuResourcesCreate /
cuvsMultiGpuResourcesCreateWithDeviceIds to avoid confusion.

* @param[in] params Parameters for KMeans model.
* @param[in] X Host training instances to cluster.
* [dim = n_samples x n_features]
* @param[in] sample_weight Optional host weights for each observation in X.
* [len = n_samples]
* @param[inout] centroids Host centroids. When init is Array, used as the
* initial cluster centers. The final generated
* centroids are copied back to this tensor.
* [dim = n_clusters x n_features]
* @param[out] inertia Sum of squared distances of samples to their
* closest cluster center.
* @param[out] n_iter Number of iterations run.
*/
CUVS_EXPORT cuvsError_t cuvsMultiGpuKMeansFit(cuvsResources_t res,
cuvsKMeansParams_t params,
DLManagedTensor* X,
DLManagedTensor* sample_weight,
DLManagedTensor* centroids,
double* inertia,
int* n_iter);

/**
* @brief Find clusters with single-node multi-GPU k-means (v2 params layout).
*
* Mirrors cuvsMultiGpuKMeansFit but takes cuvsKMeansParams_v2_t. Will become
* the unsuffixed cuvsMultiGpuKMeansFit in cuVS 26.08.
*
* @param[in] res cuvsMultiGpuResources_t opaque C handle.
* @param[in] params Parameters for KMeans model (v2 layout).
* @param[in] X Host training instances to cluster.
* [dim = n_samples x n_features]
* @param[in] sample_weight Optional host weights for each observation in X.
* [len = n_samples]
* @param[inout] centroids Host centroids. When init is Array, used as the
* initial cluster centers. The final generated
* centroids are copied back to this tensor.
* [dim = n_clusters x n_features]
* @param[out] inertia Sum of squared distances of samples to their
* closest cluster center.
* @param[out] n_iter Number of iterations run.
*/
CUVS_EXPORT cuvsError_t cuvsMultiGpuKMeansFit_v2(cuvsResources_t res,
Review comment (Contributor):

We don't need this suffix. Breaking changes can be added in this release.

cuvsKMeansParams_v2_t params,
DLManagedTensor* X,
DLManagedTensor* sample_weight,
DLManagedTensor* centroids,
double* inertia,
int* n_iter);

/**
* @}
*/

#ifdef __cplusplus
}
#endif
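
For callers of this header, here is a minimal sketch of describing a host buffer as a row-major, C-contiguous float32 DLPack tensor, which is the layout the Doxygen above requires for X and centroids. The names `HostMatrixF32` and `describe_host_matrix_f32` are hypothetical and not part of cuVS. The struct definitions in the fallback branch are local stand-ins that mirror `dlpack/dlpack.h` and are only compiled when that header is not on the include path. The call to cuvsMultiGpuKMeansFit itself is left out because it needs a multi-GPU resources handle.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__has_include)
#if __has_include(<dlpack/dlpack.h>)
#include <dlpack/dlpack.h>
#define SKETCH_HAVE_DLPACK 1
#endif
#endif

#ifndef SKETCH_HAVE_DLPACK
// Stand-in definitions mirroring dlpack/dlpack.h so the sketch compiles alone.
typedef enum { kDLCPU = 1, kDLCUDA = 2 } DLDeviceType;
typedef enum { kDLInt = 0, kDLUInt = 1, kDLFloat = 2 } DLDataTypeCode;
typedef struct { DLDeviceType device_type; int32_t device_id; } DLDevice;
typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } DLDataType;
typedef struct {
  void* data;
  DLDevice device;
  int32_t ndim;
  DLDataType dtype;
  int64_t* shape;
  int64_t* strides;
  uint64_t byte_offset;
} DLTensor;
typedef struct DLManagedTensor {
  DLTensor dl_tensor;
  void* manager_ctx;
  void (*deleter)(struct DLManagedTensor* self);
} DLManagedTensor;
#endif

// Hypothetical holder (not part of cuVS): keeps shape and strides alive next
// to the tensor that points at them. Do not copy it after it has been filled,
// because the tensor stores pointers into this object.
struct HostMatrixF32 {
  int64_t shape[2];
  int64_t strides[2];
  DLManagedTensor tensor;
};

// Hypothetical helper: describe `data` as a host, row-major float32 matrix of
// rows x cols elements. The caller keeps ownership of `data`.
inline void describe_host_matrix_f32(HostMatrixF32& m, float* data, int64_t rows, int64_t cols)
{
  assert(data != nullptr && rows > 0 && cols > 0);
  m.shape[0]   = rows;
  m.shape[1]   = cols;
  m.strides[0] = cols;  // strides are counted in elements, not bytes
  m.strides[1] = 1;

  m.tensor    = DLManagedTensor{};  // manager_ctx and deleter stay null
  DLTensor& t = m.tensor.dl_tensor;
  t.data               = data;
  t.device.device_type = kDLCPU;  // host memory
  t.device.device_id   = 0;
  t.ndim               = 2;
  t.dtype.code         = static_cast<uint8_t>(kDLFloat);
  t.dtype.bits         = 32;
  t.dtype.lanes        = 1;
  t.shape              = m.shape;
  t.strides            = m.strides;
  t.byte_offset        = 0;
}
```

Once filled, `&m.tensor` is the kind of pointer that would be passed as X (or, with n_clusters rows, as centroids) to cuvsMultiGpuKMeansFit. A float64 variant would only change the element type and `dtype.bits`.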
1 change: 1 addition & 0 deletions c/include/cuvs/core/all.h
@@ -34,6 +34,7 @@
#endif

#ifdef CUVS_BUILD_MG_ALGOS
#include <cuvs/cluster/mg_kmeans.h>
#include <cuvs/neighbors/mg_cagra.h>
#include <cuvs/neighbors/mg_common.h>
#include <cuvs/neighbors/mg_ivf_flat.h>
223 changes: 223 additions & 0 deletions c/src/cluster/mg_kmeans.cpp
@@ -0,0 +1,223 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/

#include <cstdint>
#include <optional>

#include <dlpack/dlpack.h>

#include <cuvs/cluster/kmeans.hpp>
#include <cuvs/cluster/mg_kmeans.h>
#include <cuvs/core/c_api.h>

#include <raft/core/device_mdarray.hpp>
#include <raft/core/resource/cuda_stream.hpp>
#include <raft/core/resource/multi_gpu.hpp>
#include <raft/core/resources.hpp>
#include <raft/util/cudart_utils.hpp>

#include "../core/exceptions.hpp"
#include "../core/interop.hpp"

namespace {

template <typename ParamsT>
cuvs::cluster::kmeans::params convert_params(const ParamsT& params)
{
auto kmeans_params = cuvs::cluster::kmeans::params();
kmeans_params.metric = static_cast<cuvs::distance::DistanceType>(params.metric);
kmeans_params.init = static_cast<cuvs::cluster::kmeans::params::InitMethod>(params.init);
kmeans_params.n_clusters = params.n_clusters;
kmeans_params.max_iter = params.max_iter;
kmeans_params.tol = params.tol;
kmeans_params.n_init = params.n_init;
kmeans_params.oversampling_factor = params.oversampling_factor;
kmeans_params.batch_samples = params.batch_samples;
kmeans_params.batch_centroids = params.batch_centroids;
kmeans_params.init_size = params.init_size;
kmeans_params.streaming_batch_size = params.streaming_batch_size;
return kmeans_params;
}

void validate_host_tensor(DLManagedTensor* tensor, const char* name)
{
RAFT_EXPECTS(tensor != nullptr, "%s must not be NULL", name);
auto dl_tensor = tensor->dl_tensor;
RAFT_EXPECTS(dl_tensor.data != nullptr, "%s data must not be NULL", name);
RAFT_EXPECTS(dl_tensor.shape != nullptr, "%s shape must not be NULL", name);
RAFT_EXPECTS(
cuvs::core::is_dlpack_host_compatible(dl_tensor), "%s must be host accessible", name);
RAFT_EXPECTS(dl_tensor.device.device_type != kDLCUDA, "%s must reside in host memory", name);
RAFT_EXPECTS(cuvs::core::is_c_contiguous(tensor), "%s must be C-contiguous", name);
}

bool dtype_equal(const DLTensor& lhs, const DLTensor& rhs)
{
return lhs.dtype.code == rhs.dtype.code && lhs.dtype.bits == rhs.dtype.bits &&
lhs.dtype.lanes == rhs.dtype.lanes;
}

void validate_float_dtype(const DLTensor& tensor, const char* name)
{
RAFT_EXPECTS(
tensor.dtype.code == kDLFloat && (tensor.dtype.bits == 32 || tensor.dtype.bits == 64),
"%s must have dtype float32 or float64",
name);
RAFT_EXPECTS(tensor.dtype.lanes == 1, "%s must have one DLPack lane", name);
}

template <typename ParamsT>
void validate_inputs(const ParamsT& params,
DLManagedTensor* X_tensor,
DLManagedTensor* sample_weight_tensor,
DLManagedTensor* centroids_tensor)
{
RAFT_EXPECTS(params.n_clusters > 0, "n_clusters must be positive");
RAFT_EXPECTS(!params.hierarchical, "hierarchical kmeans is not supported by SNMG kmeans");

validate_host_tensor(X_tensor, "X");
validate_host_tensor(centroids_tensor, "centroids");

auto X = X_tensor->dl_tensor;
auto centroids = centroids_tensor->dl_tensor;

RAFT_EXPECTS(X.ndim == 2, "X must be a 2D tensor");
RAFT_EXPECTS(centroids.ndim == 2, "centroids must be a 2D tensor");
RAFT_EXPECTS(X.shape[0] > 0, "X must have at least one row");
RAFT_EXPECTS(X.shape[1] > 0, "X must have at least one column");
RAFT_EXPECTS(centroids.shape[0] == params.n_clusters,
"centroids row count must equal n_clusters");
RAFT_EXPECTS(centroids.shape[1] == X.shape[1],
"centroids column count must equal X column count");

validate_float_dtype(X, "X");
RAFT_EXPECTS(dtype_equal(X, centroids), "centroids dtype must match X dtype");

if (sample_weight_tensor != nullptr) {
validate_host_tensor(sample_weight_tensor, "sample_weight");
auto sample_weight = sample_weight_tensor->dl_tensor;
RAFT_EXPECTS(sample_weight.ndim == 1, "sample_weight must be a 1D tensor");
RAFT_EXPECTS(sample_weight.shape[0] == X.shape[0],
"sample_weight length must equal X row count");
RAFT_EXPECTS(dtype_equal(X, sample_weight), "sample_weight dtype must match X dtype");
}
}

template <typename T, typename ParamsT, typename IdxT = int64_t>
void fit_snmg(cuvsResources_t res,
const ParamsT& params,
DLManagedTensor* X_tensor,
DLManagedTensor* sample_weight_tensor,
DLManagedTensor* centroids_tensor,
double* inertia,
int* n_iter)
{
auto res_ptr = reinterpret_cast<raft::resources*>(res);
RAFT_EXPECTS(res_ptr != nullptr, "res must not be NULL");
RAFT_EXPECTS(raft::resource::is_multi_gpu(*res_ptr),
"cuvsMultiGpuKMeansFit requires a MultiGpuResources handle");

auto X = X_tensor->dl_tensor;
auto centroids = centroids_tensor->dl_tensor;

auto n_samples = static_cast<IdxT>(X.shape[0]);
auto n_features = static_cast<IdxT>(X.shape[1]);
auto n_clusters = static_cast<IdxT>(params.n_clusters);

auto X_view = raft::make_host_matrix_view<T const, IdxT>(
reinterpret_cast<T const*>(X.data), n_samples, n_features);

std::optional<raft::host_vector_view<T const, IdxT>> sample_weight;
if (sample_weight_tensor != nullptr) {
auto sw = sample_weight_tensor->dl_tensor;
sample_weight =
raft::make_host_vector_view<T const, IdxT>(reinterpret_cast<T const*>(sw.data), n_samples);
}

auto const& rank0_res = raft::resource::set_current_device_to_rank(*res_ptr, 0);
auto stream = raft::resource::get_cuda_stream(rank0_res);
auto d_centroids = raft::make_device_matrix<T, IdxT>(rank0_res, n_clusters, n_features);
auto n_centroid_values = n_clusters * n_features;

if (params.init == Array) {
raft::update_device(d_centroids.data_handle(),
reinterpret_cast<T const*>(centroids.data),
n_centroid_values,
stream);
raft::resource::sync_stream(rank0_res, stream);
}

T inertia_temp = T{0};
IdxT n_iter_temp = IdxT{0};
auto kmeans_params = convert_params(params);
cuvs::cluster::kmeans::fit(*res_ptr,
kmeans_params,
X_view,
sample_weight,
d_centroids.view(),
raft::make_host_scalar_view<T>(&inertia_temp),
raft::make_host_scalar_view<IdxT>(&n_iter_temp));

raft::update_host(
reinterpret_cast<T*>(centroids.data), d_centroids.data_handle(), n_centroid_values, stream);
raft::resource::sync_stream(rank0_res, stream);

*inertia = static_cast<double>(inertia_temp);
*n_iter = static_cast<int>(n_iter_temp);
}

template <typename ParamsT>
void dispatch_fit(cuvsResources_t res,
ParamsT params,
DLManagedTensor* X,
DLManagedTensor* sample_weight,
DLManagedTensor* centroids,
double* inertia,
int* n_iter)
{
RAFT_EXPECTS(res != 0, "res must not be NULL");
RAFT_EXPECTS(params != nullptr, "params must not be NULL");
RAFT_EXPECTS(inertia != nullptr, "inertia must not be NULL");
RAFT_EXPECTS(n_iter != nullptr, "n_iter must not be NULL");

validate_inputs(*params, X, sample_weight, centroids);

auto dataset = X->dl_tensor;
if (dataset.dtype.code == kDLFloat && dataset.dtype.bits == 32) {
fit_snmg<float>(res, *params, X, sample_weight, centroids, inertia, n_iter);
} else if (dataset.dtype.code == kDLFloat && dataset.dtype.bits == 64) {
fit_snmg<double>(res, *params, X, sample_weight, centroids, inertia, n_iter);
} else {
RAFT_FAIL("Unsupported dataset DLtensor dtype: %d and bits: %d",
dataset.dtype.code,
dataset.dtype.bits);
}
}

} // namespace

extern "C" cuvsError_t cuvsMultiGpuKMeansFit(cuvsResources_t res,
cuvsKMeansParams_t params,
DLManagedTensor* X,
DLManagedTensor* sample_weight,
DLManagedTensor* centroids,
double* inertia,
int* n_iter)
{
return cuvs::core::translate_exceptions(
[=] { dispatch_fit(res, params, X, sample_weight, centroids, inertia, n_iter); });
}

extern "C" cuvsError_t cuvsMultiGpuKMeansFit_v2(cuvsResources_t res,
cuvsKMeansParams_v2_t params,
DLManagedTensor* X,
DLManagedTensor* sample_weight,
DLManagedTensor* centroids,
double* inertia,
int* n_iter)
{
return cuvs::core::translate_exceptions(
[=] { dispatch_fit(res, params, X, sample_weight, centroids, inertia, n_iter); });
}
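
The two entry points above share one shape: check the arguments, dispatch on the runtime dtype to a templated implementation, and convert any C++ exception into an error code at the C boundary. The following is a minimal, self-contained sketch of that pattern. Every name in it (`sketch_error_t`, `translate_exceptions_sketch`, `sum_as`, `sketch_sum`) is a hypothetical stand-in, and the real `cuvs::core::translate_exceptions` is not reproduced here; the summation is only a placeholder for the actual fit.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical status codes, standing in for a C API's error type.
enum sketch_error_t { SKETCH_SUCCESS = 0, SKETCH_ERROR = 1 };

// Run `fn` and map any escaping C++ exception to an error code, so that
// nothing propagates across the extern "C" boundary.
template <typename Fn>
sketch_error_t translate_exceptions_sketch(Fn&& fn)
{
  try {
    fn();
    return SKETCH_SUCCESS;
  } catch (...) {
    return SKETCH_ERROR;
  }
}

// Templated worker: the element type is fixed at compile time.
template <typename T>
double sum_as(const void* data, std::size_t n)
{
  assert(data != nullptr || n == 0);
  const T* typed = static_cast<const T*>(data);
  double total   = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += static_cast<double>(typed[i]);
  }
  return total;
}

// C-style entry point: pick the instantiation from a runtime bit width
// (32 -> float, 64 -> double) and reject everything else.
extern "C" sketch_error_t sketch_sum(const void* data,
                                     std::size_t n,
                                     int float_bits,
                                     double* out)
{
  return translate_exceptions_sketch([=] {
    if (data == nullptr || out == nullptr) { throw std::invalid_argument("NULL argument"); }
    if (float_bits == 32) {
      *out = sum_as<float>(data, n);
    } else if (float_bits == 64) {
      *out = sum_as<double>(data, n);
    } else {
      throw std::invalid_argument("unsupported dtype");
    }
  });
}
```

Keeping the validation and the throw inside the lambda means the entry point has a single exit path that always returns a status code, which mirrors how both `cuvsMultiGpuKMeansFit` variants forward to `dispatch_fit`.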
3 changes: 3 additions & 0 deletions c/tests/CMakeLists.txt
@@ -79,6 +79,9 @@ ConfigureTest(
NAME DISTANCE_C_TEST PATH distance/run_pairwise_distance_c.c distance/pairwise_distance_c.cu
)
ConfigureTest(NAME KMEANS_C_TEST PATH cluster/kmeans_c.cu)
if(BUILD_MG_ALGOS)
ConfigureTest(NAME KMEANS_MG_C_TEST PATH cluster/kmeans_mg_c.cu)
endif()
ConfigureTest(NAME BRUTEFORCE_C_TEST PATH neighbors/run_brute_force_c.c neighbors/brute_force_c.cu)
ConfigureTest(NAME IVF_FLAT_C_TEST PATH neighbors/run_ivf_flat_c.c neighbors/ann_ivf_flat_c.cu)
ConfigureTest(NAME IVF_PQ_C_TEST PATH neighbors/run_ivf_pq_c.c neighbors/ann_ivf_pq_c.cu)