Check input dimensions #76

Open

hcho3 wants to merge 3 commits into rapidsai:main from hcho3:check_input_dims

Conversation

Contributor

@hcho3 hcho3 commented Mar 5, 2026

Closes #72

@hcho3 hcho3 requested a review from a team as a code owner March 5, 2026 23:29
@hcho3 hcho3 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels and removed the Cython / Python label Mar 5, 2026

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added ability to query the number of features expected by a model.
  • Improvements

    • Added input validation to reject predictions with mismatched feature dimensions and provide clear error messages.
  • Tests

    • Added test coverage verifying that shape mismatches raise descriptive errors.

Walkthrough

Adds a num_features property to the ForestInference interface and implementations, and enforces input dimension validation (2D and matching feature count) in predict/predict_per_tree/apply; includes tests that assert a ValueError is raised for mismatched feature counts.

Changes

  • Interface (python/nvforest/nvforest/_base.py): adds the abstract property num_features(self) -> int to ForestInference.
  • Implementations (python/nvforest/nvforest/_forest_inference.py): adds a public num_features property to the CPU/GPU classifier and regressor inference classes, delegating to self.forest.num_features.
  • Cython validation (python/nvforest/nvforest/detail/forest_inference.pyx): adds _validate_input_dims(X) to ForestInference_impl and ForestInferenceImpl, and calls it at the start of predict, predict_per_tree, and apply to enforce 2D input and a matching feature count, raising ValueError on mismatch.
  • Tests (python/nvforest/tests/test_nvforest.py): adds a parametrized test_incorrect_data_shape() test asserting that a ValueError with the expected-feature message is raised when the input feature count differs from the model's.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 5.88%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'Check input dimensions' accurately describes the main change: adding validation to check input data dimensions against model training features.
  • Description check: ✅ Passed. The description 'Closes #72' references the issue that outlines the input dimension validation requirements.
  • Linked Issues check: ✅ Passed. The PR implements all requirements from issue #72: adds the num_features property to ForestInference classes, implements input dimension validation in predict/apply methods, and includes test coverage for dimension mismatch errors.
  • Out of Scope Changes check: ✅ Passed. All changes are directly related to issue #72's requirements: adding num_features, validating input dimensions, and testing the validation logic.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
python/nvforest/nvforest/_base.py (1)

93-97: Consider adding a docstring for the new num_features property.

While other abstract properties in this file also lack docstrings, the coding guidelines recommend NumPy-style docstrings for public functions/properties. Consider adding a brief docstring for completeness, such as:

```python
@property
@abstractmethod
def num_features(self) -> int:
    """Return the number of features expected by the model."""
    pass
```

As per coding guidelines: "All public functions should have NumPy-style docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/_base.py` around lines 93 - 97, add a NumPy-style
docstring to the abstract property num_features on ForestInference (the `@property`
`@abstractmethod` def num_features(self) -> int) describing that it returns the
number of features expected by the model; keep it brief (one-line summary and
optional short description/returns section) consistent with other public APIs.
python/nvforest/tests/test_nvforest.py (1)

861-873: Good test coverage for the new feature validation.

The test correctly verifies:

  1. The num_features property returns the expected value.
  2. A ValueError is raised with a helpful message when input dimensions mismatch.

Consider extending coverage for completeness:

  • Test with both CPU and GPU devices (parameterize with @pytest.mark.parametrize("device", ("cpu", "gpu"))).
  • Test other inference methods (predict_proba, predict_per_tree, apply) that also validate input dimensions.
  • Test with too many features (e.g., 6 columns instead of 5), not just too few.
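
A self-contained sketch of what that broader coverage could look like, using a stub in place of a real fitted model. The method names follow the ones mentioned in this review; the stub class and its validation body are illustrative, not the project's actual implementation:

```python
import numpy as np
import pytest

class _StubModel:
    """Stand-in for a fitted ForestInference model expecting 5 features."""
    num_features = 5

    def predict(self, X):
        # Hypothetical check mirroring the validation this PR adds.
        if X.shape[1] != self.num_features:
            raise ValueError(
                f"Expected {self.num_features} features, got {X.shape[1]}"
            )
        return np.zeros(X.shape[0])

    # The real methods differ, but for shape validation they behave alike.
    predict_proba = predict_per_tree = apply = predict

fm = _StubModel()
# Exercise both too few and too many columns for every validated method.
for method in ("predict", "predict_proba", "predict_per_tree", "apply"):
    for n_cols in (4, 6):
        with pytest.raises(ValueError, match=f"Expected {fm.num_features} features"):
            getattr(fm, method)(np.zeros((1, n_cols)))
```

In the real test, fm would come from nvforest.load_from_sklearn(clf, device=device) with device parametrized over ("cpu", "gpu").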
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/tests/test_nvforest.py` around lines 861 - 873, Extend
test_incorrect_data_shape to parametrize over device and to exercise all
inference methods that validate input shape: add
`@pytest.mark.parametrize`("device", ("cpu","gpu")) and call
nvforest.load_from_sklearn(clf, device=device) to get fm; keep the existing
assert on fm.num_features, then for each method name in
("predict","predict_proba","predict_per_tree","apply") use
pytest.raises(ValueError, match=f"Expected {n_features} features") to call
getattr(fm, method)(np.zeros((1, 4))) and also test too-many-features by calling
each method with np.zeros((1, n_features+1)); ensure you reference the existing
test_incorrect_data_shape and the fm object when adding these checks.


📥 Commits

Reviewing files that changed from the base of the PR and between d8aa3ee and 47025ff.

📒 Files selected for processing (4)
  • python/nvforest/nvforest/_base.py
  • python/nvforest/nvforest/_forest_inference.py
  • python/nvforest/nvforest/detail/forest_inference.pyx
  • python/nvforest/tests/test_nvforest.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2



📥 Commits

Reviewing files that changed from the base of the PR and between 47025ff and 1c077cf.

📒 Files selected for processing (1)
  • python/nvforest/tests/test_nvforest.py

Comment on lines +877 to +880

```python
clf = RandomForestClassifier(max_features="sqrt", n_estimators=10)
clf.fit(X, y)

fm = nvforest.load_from_sklearn(clf, device="cpu")
```

⚠️ Potential issue | 🟠 Major

Cover the actual issue-72 loading path here.

This only exercises load_from_sklearn(..., device="cpu"), but issue #72 is about feature-count validation after loading a serialized model via ForestInference.load / nvforest.load_model. A loader-specific num_features regression would still slip through, so please add at least one saved-model case that goes through load_model and reproduces the user-facing path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/tests/test_nvforest.py` around lines 877 - 880, The test
currently only exercises nvforest.load_from_sklearn; add a saved-model test that
goes through the serialized-load path (use nvforest.save_model or
ForestInference.save if available to write the model to a temporary file, then
call nvforest.load_model or ForestInference.load with device="cpu") to reproduce
issue-72's validation path; ensure the loaded model's feature count is validated
by asserting expected behavior (e.g., raises or matches num_features) after
loading from disk so any regression in the load_model / ForestInference.load
path is caught.

Comment on lines +882 to +884

```python
with pytest.raises(ValueError, match=f"Expected {n_features} features"):
    X_test = np.zeros((1, input_size))
    _ = predict_func(fm, X_test)
```

⚠️ Potential issue | 🟡 Minor

Assert the received feature count in the error message too.

Right now any message that starts with Expected 5 features passes. Matching the actual width (input_size) here would enforce the expected-vs-actual detail that users need when debugging shape errors.

Suggested assertion tightening:

```diff
-    with pytest.raises(ValueError, match=f"Expected {n_features} features"):
+    with pytest.raises(
+        ValueError,
+        match=rf"Expected {n_features} features.*got {input_size}",
+    ):
```

As per coding guidelines "Error messages should be helpful and include expected vs actual values".
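
Since pytest's match= argument is checked with re.search against the string form of the exception, the tightened pattern enforces the expected-vs-actual detail. The counts 5 and 4 below are illustrative stand-ins for n_features and input_size:

```python
import re

# What the suggested rf-string would expand to for n_features=5, input_size=4.
pattern = r"Expected 5 features.*got 4"
assert re.search(pattern, "Expected 5 features, got 4")
# re.search scans anywhere in the message, so prefixes and infixes are fine.
assert re.search(pattern, "Bad input: Expected 5 features but got 4")
# A message lacking the actual count no longer passes.
assert not re.search(pattern, "Expected 5 features")
```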

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/tests/test_nvforest.py` around lines 882 - 884, The test
currently only asserts the error starts with "Expected {n_features} features";
update the pytest.raises match to require the actual received feature count
(input_size) too so the exception message includes both expected and actual
counts. Locate the block using predict_func, fm, input_size and n_features and
change the match to assert something like "Expected {n_features} features, got
{input_size}" (or the project's chosen wording) so the raised ValueError
contains expected vs actual values.


Labels

Cython / Python, improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Check the dimensions of the input data passed to nvForest

1 participant