SNMG Batched KMeans Python API #2154

Open
viclafargue wants to merge 3 commits into rapidsai:main from viclafargue:snmg-ooc-kmeans-python-api

Contributor

@viclafargue viclafargue commented Jun 2, 2026

Closes #2149 and #2155


coderabbitai Bot commented Jun 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 90e59835-26cd-4d28-9aaf-fe30098e2774

📥 Commits

Reviewing files that changed from the base of the PR and between 89f2ca0 and b2eb0d7.

📒 Files selected for processing (2)
  • fern/docs.yml
  • fern/pages/python_api/index.md
✅ Files skipped from review due to trivial changes (2)
  • fern/pages/python_api/index.md
  • fern/docs.yml

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added multi-GPU K-Means clustering support in C API with dual parameter specifications (v1 and v2 ABI).
    • Added Python API bindings for multi-GPU K-Means with support for multiple initialization methods and sample weighting.
  • Documentation

    • Added C API and Python API documentation pages for multi-GPU K-Means.
    • Updated API navigation to include multi-GPU K-Means references.
  • Tests

    • Added C test suite validating multi-GPU K-Means clustering.
    • Added Python test suite covering multiple data types, initialization methods, and error handling.

Walkthrough

This PR adds single-node multi-GPU k-means fitting via new C and Python APIs. The C layer declares and implements fitting with DLPack tensor validation and dtype dispatch; the Python layer wraps the C API with NumPy host-array validation and tests across dtypes, initialization methods, and error cases.
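
The host-array checks named in the walkthrough (host accessibility, dimensionality, C-contiguity, dtype) can be illustrated with a minimal sketch. It is not the PR's `_as_host_array` helper: the function name `as_host_buffer`, its parameters, and the use of the standard-library buffer protocol in place of NumPy are all assumptions made so the example runs without third-party packages. NumPy arrays with native byte order expose the same `memoryview` attributes.

```python
import array


def as_host_buffer(obj, name, expected_ndim, expected_format=None):
    """Validate a host buffer (hypothetical stand-in, not the PR's helper)."""
    try:
        # Objects that do not implement the buffer protocol raise TypeError here.
        view = memoryview(obj)
    except TypeError as exc:
        raise TypeError(f"{name} must expose host memory via the buffer protocol") from exc
    if view.ndim != expected_ndim:
        raise ValueError(f"{name} must have {expected_ndim} dimensions, got {view.ndim}")
    if not view.c_contiguous:
        raise ValueError(f"{name} must be C-contiguous (row-major)")
    if view.format not in ("f", "d"):  # struct codes for float32 / float64
        raise TypeError(f"{name} must be float32 or float64, got format {view.format!r}")
    if expected_format is not None and view.format != expected_format:
        raise TypeError(f"{name} must have format {expected_format!r}, got {view.format!r}")
    return view


# Illustrative data (an assumption): a 4x2 float32 matrix built from a flat array.
flat = array.array("f", [1.0, 1.0, 2.0, 2.0, 10.0, 10.0, 11.0, 11.0])
X = memoryview(flat).cast("B").cast("f", shape=[4, 2])
checked = as_host_buffer(X, "X", expected_ndim=2)

# Weights must match the dtype of X; a float64 buffer is rejected.
weights = array.array("d", [1.0, 1.0, 1.0, 1.0])
try:
    as_host_buffer(weights, "sample_weights", expected_ndim=1, expected_format=X.format)
    dtype_mismatch_rejected = False
except TypeError:
    dtype_mismatch_rejected = True
```

Passing `expected_format=X.format` for the weights mirrors the "dtype match" requirement in the walkthrough; how the real helper expresses that check is not shown in this conversation.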

Changes

Multi-GPU K-Means APIs

| Layer / File(s) | Summary |
| --- | --- |
| C API surface and library wiring: `c/include/cuvs/cluster/mg_kmeans.h`, `c/include/cuvs/core/all.h`, `c/CMakeLists.txt` | New public C header declares cuvsMultiGpuKMeansFit and cuvsMultiGpuKMeansFit_v2 for fitting with DLPack tensors. Header is conditionally included in the aggregate header and source is conditionally compiled into cuvs_c when BUILD_MG_ALGOS is enabled. |
| C implementation and test coverage: `c/src/cluster/mg_kmeans.cpp`, `c/tests/CMakeLists.txt`, `c/tests/cluster/kmeans_mg_c.cu` | Parameter conversion template maps input fields to native params. DLPack validation checks host accessibility, C-contiguity, dtype (float32/float64), and shape compatibility. Templated fit_snmg handles device setup, allocation, optional initialization, fitting, and output copying. dispatch_fit performs dtype selection and error translation. C test file covers both v1 and v2 parameter layouts with static data validation. |
| Python package structure and shared types: `python/cuvs/cuvs/cluster/CMakeLists.txt`, `python/cuvs/cuvs/cluster/__init__.py`, `python/cuvs/cuvs/cluster/kmeans/kmeans.pxd`, `python/cuvs/cuvs/cluster/kmeans/kmeans.pyx`, `python/cuvs/cuvs/cluster/mg/*` | cuvs.cluster now exports mg namespace. KMeansParams native params pointer field moved from .pyx to .pxd for sharing. New cuvs/cluster/mg/ module tree with CMakeLists entries, package initializers, and subdirectories. |
| Python multi-GPU k-means wrapper: `python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxd`, `python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx`, `python/cuvs/cuvs/cluster/mg/kmeans/__init__.py` | Cython .pxd declares extern cuvsMultiGpuKMeansFit binding. .pyx defines _as_host_array validation helper (enforces host NumPy, required dimensionality, C-contiguity, dtype match), FitOutput namedtuple, and fit function that converts inputs to DLPack, extracts resources, calls C API, and returns results. Package exports FitOutput, KMeansParams, and fit. |
| Python fit tests and validation: `python/cuvs/cuvs/tests/test_mg_kmeans.py` | GPU-gated test suite with helpers for synthetic data generation, host label/distance prediction, and output validation. test_mg_kmeans_fit_options parametrizes across float32/float64, init methods (Array, KMeansPlusPlus, Random), and weighting; validates centroid shape/dtype, iteration count, inertia against host reference, and label partitions. test_mg_kmeans_input_validation checks error handling for missing centroids, device arrays, non-C-contiguous inputs, dtype mismatches, and incompatible shapes. |
| C and Python API documentation: `fern/pages/c_api/c-api-cluster-mg-kmeans.md`, `fern/pages/python_api/python-api-cluster-mg-kmeans.md`, `fern/docs.yml`, `fern/pages/c_api/index.md`, `fern/pages/python_api/index.md` | New Fern reference pages document C function signatures, tensor requirements (host-accessible, row-major, dtype/shape), parameter tables, and v2 ABI note. Python page documents fit decorator, parameters, and return fields. Navigation indexes in C and Python sections updated with links to new reference pages. |
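
The "inertia against host reference" check mentioned for the Python tests could look roughly like the sketch below. It is a plain-Python illustration, not the code in `test_mg_kmeans.py`: the bodies of `predict_labels_host` and `inertia_host`, the optional weighting, and the four sample points are assumptions chosen to be consistent with the centroids `{1.5, 1.5, 10.5, 10.5}` quoted in the review.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_labels_host(X, centroids):
    """Assign each row of X to the index of its nearest centroid."""
    labels = []
    for row in X:
        dists = [squared_distance(row, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def inertia_host(X, centroids, sample_weights=None):
    """Weighted sum of squared distances to the assigned centroid."""
    if sample_weights is None:
        sample_weights = [1.0] * len(X)
    labels = predict_labels_host(X, centroids)
    return sum(
        w * squared_distance(row, centroids[label])
        for row, label, w in zip(X, labels, sample_weights)
    )


# Hypothetical data: two tight groups around (1.5, 1.5) and (10.5, 10.5).
X = [[1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [11.0, 11.0]]
centroids = [[1.5, 1.5], [10.5, 10.5]]
labels = predict_labels_host(X, centroids)
inertia = inertia_host(X, centroids)
```

A test would then compare the inertia returned by the multi-GPU fit with `inertia` under a dtype-appropriate tolerance; the tolerance the PR uses is not stated here.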

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

improvement, non-breaking, cpp, benchmarking, doc

Suggested reviewers

  • tarang-jain
  • cjnolet
  • dantegd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.34% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'SNMG Batched KMeans Python API' clearly reflects the main change: adding a multi-GPU K-Means Python API wrapper. |
| Description check | ✅ Passed | The description 'Closes #2149 and #2155' is related to the changeset and indicates the issues being addressed. |
| Linked Issues check | ✅ Passed | The PR implements a Python API for SNMG Batched KMeans [#2149] with C API foundation, two exported functions, validation, tests, and documentation. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: C API (header, implementation, tests), Python API (module structure, wrappers, tests), CMake integration, and documentation for the SNMG KMeans feature. |
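
For readers who want to reproduce a docstring-coverage figure locally, the following is a rough sketch using the standard-library `ast` module. How CodeRabbit counts definitions is not stated in this conversation, so the counting rule (functions, async functions, and classes; modules excluded) and the sample source are assumptions.

```python
import ast


def docstring_coverage(source):
    """Return the percentage of functions and classes that have a docstring."""
    tree = ast.parse(source)
    definitions = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not definitions:
        return 100.0  # nothing to document
    documented = sum(1 for node in definitions if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(definitions)


# Hypothetical module: only one of four functions is documented.
SAMPLE = '''
def make_inputs(dtype):
    """Build synthetic inputs."""
    return dtype

def make_sample_weights(n_rows):
    return [1.0] * n_rows

def predict_labels_host(X, centroids):
    return []

def fit(X):
    return X
'''

coverage = docstring_coverage(SAMPLE)
meets_threshold = coverage >= 80.0
```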



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
c/tests/cluster/kmeans_mg_c.cu (1)

112-115: 💤 Low value

Centroid comparison may be sensitive to cluster ordering.

The test compares centroids positionally, expecting {1.5, 1.5, 10.5, 10.5}. While the well-separated initial centroids {0,0} and {12,12} should converge deterministically to the two clusters, k-means implementations may still reorder clusters internally. If this test ever becomes flaky, consider sorting centroids before comparison or using an order-invariant comparison.
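
The order-invariant comparison suggested here can be sketched as follows. The test under review is C++ (GoogleTest); this Python version only illustrates the idea, and the helper names, the tolerance, and the swapped input are assumptions.

```python
import math


def group_centroids(flat, n_features):
    """Split a flat row-major buffer into one tuple per centroid."""
    return [tuple(flat[i:i + n_features]) for i in range(0, len(flat), n_features)]


def centroids_match(actual_flat, expected_flat, n_features, tol=1e-4):
    """Compare centroid sets after sorting both lexicographically."""
    actual = sorted(group_centroids(actual_flat, n_features))
    expected = sorted(group_centroids(expected_flat, n_features))
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, abs_tol=tol)
        for row_a, row_e in zip(actual, expected)
        for a, e in zip(row_a, row_e)
    )


expected = [1.5, 1.5, 10.5, 10.5]   # values quoted in the review comment
swapped = [10.5, 10.5, 1.5, 1.5]    # same clusters, opposite order
same_after_sorting = centroids_match(swapped, expected, n_features=2)
```

Lexicographic sorting is only a safe key when centroids are well separated in the first feature, as they are in this test; for near-ties a nearest-neighbor matching would be more robust.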

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/tests/cluster/kmeans_mg_c.cu` around lines 112 - 115, The positional
comparison of centroids (centroids_data vs kExpectedCentroids) is sensitive to
cluster ordering; change the test to perform an order-invariant comparison by
grouping centroids into kNClusters vectors of length kNFeatures (using
centroids_data and kNFeatures to slice), sort those centroid vectors using a
deterministic key (e.g., lexicographic compare on feature values or by the first
feature), do the same sorting for the expected centroids (kExpectedCentroids),
and then run EXPECT_NEAR pairwise on the sorted lists to ensure the test is
robust to cluster reordering.
c/src/cluster/mg_kmeans.cpp (1)

144-144: 💤 Low value

Unqualified Array in mg_kmeans.cpp isn't an issue, but it can be made clearer

  • Array is an enumerator of the unscoped C enum cuvsKMeansInitMethod (defined in c/include/cuvs/cluster/kmeans.h and pulled in via mg_kmeans.h), so if (params.init == Array) compiles as written with an unqualified name. C has no cuvsKMeansInitMethod::Array form; C++11 accepts that qualified spelling for unscoped enumerators, but it is not required.
  • Optional: move convert_params(params) before the check and compare kmeans_params.init to cuvs::cluster::kmeans::params::InitMethod::Array for consistency with the C++ enum.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/src/cluster/mg_kmeans.cpp` at line 144, The condition uses the unscoped C
enum value Array (params.init == Array); to make it clearer/consistent, call
convert_params(params) first to build the C++ struct and then compare the C++
enum: replace the direct check of params.init with a check against
kmeans_params.init == cuvs::cluster::kmeans::params::InitMethod::Array (use
convert_params to produce kmeans_params), so the code references the C++ scoped
enum rather than the unqualified C enumerator.
python/cuvs/cuvs/cluster/mg/kmeans/__init__.py (1)

1-9: ⚡ Quick win

Consider adding a module docstring.

This module serves as the public API entry point for single-node multi-GPU k-means. A brief docstring would help users understand the package's purpose and available exports.

📝 Suggested module docstring
 # SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION.
 # SPDX-License-Identifier: Apache-2.0
 
+"""Single-node multi-GPU (SNMG) k-means clustering.
+
+This module provides k-means fitting across multiple GPUs on a single node,
+with host-memory input arrays distributed across available devices.
+"""
+
 from cuvs.cluster.kmeans import KMeansParams
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/cluster/mg/kmeans/__init__.py` around lines 1 - 9, Add a
concise module docstring at the top of the module to describe that this package
is the public API entry point for single-node multi-GPU k-means and list the
primary exports; update the module containing KMeansParams, FitOutput, and fit
to include a short triple-quoted string explaining purpose, intended use, and
the exported symbols (FitOutput, KMeansParams, fit) so users see what this
subpackage provides.
python/cuvs/cuvs/tests/test_mg_kmeans.py (1)

31-47: ⚡ Quick win

Consider adding docstrings to test helper functions.

The helper functions make_inputs, make_sample_weights, and predict_labels_host lack documentation explaining their purpose, parameters, and return values, which would improve test maintainability.

Also applies to: 50-57

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/tests/test_mg_kmeans.py` around lines 31 - 47, Add concise
docstrings to the test helper functions make_inputs, make_sample_weights, and
predict_labels_host describing their purpose, parameters (dtype, n_rows, n_cols,
n_clusters where applicable), return values (e.g., X and centroids for
make_inputs; sample weights array for make_sample_weights; predicted labels for
predict_labels_host), and any important behavior (e.g., deterministic RNG seeds
and array contiguity). Place the docstring immediately under each function
signature using a short triple-quoted string.
python/cuvs/cuvs/tests/test_kmeans.py (1)

18-28: ⚡ Quick win

Consider adding a docstring to the helper function.

The make_well_separated_kmeans_input helper generates synthetic test data but lacks documentation explaining its purpose, parameters, and return values.

📝 Suggested docstring
 def make_well_separated_kmeans_input(rng, n_rows, n_cols, n_clusters, dtype):
+    """Generate well-separated synthetic k-means input with deterministic structure.
+    
+    Creates cluster centers with large separation (scale=10.0) and adds small
+    Gaussian noise (scale=0.01) to ensure clusters remain distinct.
+    
+    Args:
+        rng: NumPy random generator
+        n_rows: Number of data points
+        n_cols: Number of features
+        n_clusters: Number of clusters
+        dtype: NumPy dtype for the output arrays
+        
+    Returns:
+        Tuple of (X, initial_centroids) as contiguous NumPy arrays
+    """
     labels = np.arange(n_rows) % n_clusters
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/tests/test_kmeans.py` around lines 18 - 28, Add a clear
docstring to the helper function make_well_separated_kmeans_input describing its
purpose (generate well-separated KMeans test data), parameters (rng: random
generator, n_rows: int, n_cols: int, n_clusters: int, dtype: numpy dtype), and
return values (X: contiguous ndarray of shape (n_rows, n_cols) with clustered
samples, initial_centroids: ndarray of shape (n_clusters, n_cols) containing the
initial centroids copied from X); place the docstring immediately below the def
line and mention that X is returned as a contiguous array and that
initial_centroids is a copy used for initialization.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7a754ca0-8545-4e19-a93e-76804f4836ed

📥 Commits

Reviewing files that changed from the base of the PR and between 0c3d007 and 89f2ca0.

📒 Files selected for processing (23)
  • c/CMakeLists.txt
  • c/include/cuvs/cluster/mg_kmeans.h
  • c/include/cuvs/core/all.h
  • c/src/cluster/mg_kmeans.cpp
  • c/tests/CMakeLists.txt
  • c/tests/cluster/kmeans_mg_c.cu
  • fern/docs.yml
  • fern/pages/c_api/c-api-cluster-mg-kmeans.md
  • fern/pages/c_api/index.md
  • fern/pages/python_api/index.md
  • fern/pages/python_api/python-api-cluster-mg-kmeans.md
  • python/cuvs/cuvs/cluster/CMakeLists.txt
  • python/cuvs/cuvs/cluster/__init__.py
  • python/cuvs/cuvs/cluster/kmeans/kmeans.pxd
  • python/cuvs/cuvs/cluster/kmeans/kmeans.pyx
  • python/cuvs/cuvs/cluster/mg/CMakeLists.txt
  • python/cuvs/cuvs/cluster/mg/__init__.py
  • python/cuvs/cuvs/cluster/mg/kmeans/CMakeLists.txt
  • python/cuvs/cuvs/cluster/mg/kmeans/__init__.py
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pxd
  • python/cuvs/cuvs/cluster/mg/kmeans/kmeans.pyx
  • python/cuvs/cuvs/tests/test_kmeans.py
  • python/cuvs/cuvs/tests/test_mg_kmeans.py
💤 Files with no reviewable changes (1)
  • python/cuvs/cuvs/cluster/kmeans/kmeans.pyx

Comment on lines +34 to +36
* @param[in] res cuvsMultiGpuResources_t opaque C handle
* created by cuvsMultiGpuResourcesCreate or
* cuvsMultiGpuResourcesCreateWithDeviceIds.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Documentation refers to cuvsMultiGpuResources_t but parameter is cuvsResources_t.

The Doxygen comment at line 34 mentions cuvsMultiGpuResources_t as the expected handle type, but the actual parameter type in the function signature (line 50) is cuvsResources_t. While the implementation validates that the handle is a multi-GPU resource, the documentation could cause confusion for API consumers.

📝 Suggested documentation fix
-* @param[in]     res           cuvsMultiGpuResources_t opaque C handle
+* @param[in]     res           cuvsResources_t opaque C handle (must be a multi-GPU resource)
                               created by cuvsMultiGpuResourcesCreate or
                               cuvsMultiGpuResourcesCreateWithDeviceIds.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@c/include/cuvs/cluster/mg_kmeans.h` around lines 34 - 36, Update the Doxygen
for the function in mg_kmeans.h so the documented parameter type matches the
actual signature: replace references to cuvsMultiGpuResources_t with
cuvsResources_t and add a short note that the cuvsResources_t must represent a
multi-GPU resource created by cuvsMultiGpuResourcesCreate or
cuvsMultiGpuResourcesCreateWithDeviceIds (the implementation already validates
this). Ensure the comment references the exact symbol cuvsResources_t and
mentions the creation functions cuvsMultiGpuResourcesCreate /
cuvsMultiGpuResourcesCreateWithDeviceIds to avoid confusion.

## Cluster

- [Kmeans](/api-reference/python-api-cluster-kmeans)
- [Kmeans](/api-reference/python-api-cluster-mg-kmeans)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a distinct label for the multi-GPU page.

This adds a second identical "Kmeans" label, making the two links ambiguous in the sidebar.

🧭 Proposed fix
-- [Kmeans](/api-reference/python-api-cluster-mg-kmeans)
+- [Multi-GPU Kmeans](/api-reference/python-api-cluster-mg-kmeans)
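
A small lint along these lines could catch ambiguous sidebar labels before review. This is a sketch only: the regular expression, the function name `duplicate_labels`, and the decision to ignore section headings are assumptions, and it is not part of the Fern tooling.

```python
import re
from collections import Counter

# Matches markdown list items of the form "- [label](url)".
LINK_RE = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)]+)\)", re.MULTILINE)


def duplicate_labels(markdown):
    """Return link labels that appear more than once, sorted alphabetically."""
    counts = Counter(label for label, _url in LINK_RE.findall(markdown))
    return sorted(label for label, count in counts.items() if count > 1)


# The index excerpt quoted in this review thread.
INDEX = """## Cluster

- [Kmeans](/api-reference/python-api-cluster-kmeans)
- [Kmeans](/api-reference/python-api-cluster-mg-kmeans)
"""

# The same excerpt with the proposed label applied.
FIXED = INDEX.replace(
    "[Kmeans](/api-reference/python-api-cluster-mg-kmeans)",
    "[Multi-GPU Kmeans](/api-reference/python-api-cluster-mg-kmeans)",
)

before = duplicate_labels(INDEX)
after = duplicate_labels(FIXED)
```

Because headings are ignored, the same label under two different sections would also be reported; scoping the check per section is left out to keep the sketch short.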
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/python_api/index.md` at line 8, The sidebar shows two identical
"[Kmeans]" link labels; update the link label for the multi-GPU page to a
distinct name (e.g., "Kmeans (multi-GPU)" or "Kmeans — multi‑GPU") so the entry
referencing "/api-reference/python-api-cluster-mg-kmeans" is unambiguous; edit
the link text in fern/pages/python_api/index.md where the label "[Kmeans]"
appears and leave the URL unchanged.

| `X` | `host array-like` | Training instances, shape (m, k). Must be C-contiguous float32 or float64 host data. |
| `centroids` | `host array-like, optional` | Initial centroids when ``params.init_method == "Array"`` and output centroids for all init methods. If omitted, a host NumPy output array is allocated unless ``init_method == "Array"``. |
| `sample_weights` | `host array-like, optional` | Optional weights per observation. Must be C-contiguous and have the same dtype as X. |
| `resources` | `cuvs.common.Resources, optional` | |

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the resources parameter behavior.

resources is currently undocumented (empty description), which makes the API contract incomplete for users.

📝 Proposed doc fix
-| `resources` | `cuvs.common.Resources, optional` |  |
+| `resources` | `cuvs.common.Resources, optional` | Multi-GPU resources handle. If omitted, default resources are used by the wrapper. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/python_api/python-api-cluster-mg-kmeans.md` at line 27, The
`resources` parameter of the multi-GPU k-means `fit` function is undocumented;
update the `resources` row in python-api-cluster-mg-kmeans.md to describe its
behavior: state that `resources` is an optional `cuvs.common.Resources`
multi-GPU resources handle and document what the wrapper uses when it is
omitted (default resources, per the proposed fix above). Reference the
`resources` symbol and its type `cuvs.common.Resources` in the description.

* closest cluster center.
* @param[out] n_iter Number of iterations run.
*/
CUVS_EXPORT cuvsError_t cuvsMultiGpuKMeansFit_v2(cuvsResources_t res,
Contributor

We don't need this suffix; breaking changes can be added in this release.

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

SNMG Batched KMeans Python API and benchmarking

3 participants