📝 Walkthrough — Summary by CodeRabbit

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 4 passed

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🧹 Nitpick comments (2)
python/nvforest/nvforest/_base.py (1)
93-97: Consider adding a docstring for the new `num_features` property.

While other abstract properties in this file also lack docstrings, the coding guidelines recommend NumPy-style docstrings for public functions/properties. Consider adding a brief docstring for completeness, such as:

```python
@property
@abstractmethod
def num_features(self) -> int:
    """Return the number of features expected by the model."""
    pass
```

As per coding guidelines: "All public functions should have NumPy-style docstrings."
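For context, here is a minimal sketch of how such an abstract property sits inside a base class. The class names are illustrative only, not the actual names in `_base.py`:

```python
from abc import ABC, abstractmethod

# Illustrative base class; the real class in _base.py may be named differently.
class ForestModelBase(ABC):
    @property
    @abstractmethod
    def num_features(self) -> int:
        """Return the number of features expected by the model."""

# Concrete subclasses satisfy the contract by overriding the property.
class FixedWidthModel(ForestModelBase):
    @property
    def num_features(self) -> int:
        return 5
```

Because the docstring is the property's body, no `pass` is strictly required; the base class itself cannot be instantiated until the property is overridden.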
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/_base.py` around lines 93 - 97, Add a NumPy-style docstring to the abstract property num_features in class _base (the `@property` `@abstractmethod` def num_features(self) -> int) describing that it returns the number of features expected by the model; keep it brief (one-line summary and optional short description/returns section) consistent with other public APIs.

python/nvforest/tests/test_nvforest.py (1)
861-873: Good test coverage for the new feature validation.

The test correctly verifies:

- The `num_features` property returns the expected value.
- A `ValueError` is raised with a helpful message when input dimensions mismatch.

Consider extending coverage for completeness:

- Test with both CPU and GPU devices (parameterize with `@pytest.mark.parametrize("device", ("cpu", "gpu"))`).
- Test other inference methods (`predict_proba`, `predict_per_tree`, `apply`) that also validate input dimensions.
- Test with too many features (e.g., 6 columns instead of 5), not just too few.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/tests/test_nvforest.py` around lines 861 - 873, Extend test_incorrect_data_shape to parametrize over device and to exercise all inference methods that validate input shape: add `@pytest.mark.parametrize`("device", ("cpu","gpu")) and call nvforest.load_from_sklearn(clf, device=device) to get fm; keep the existing assert on fm.num_features, then for each method name in ("predict","predict_proba","predict_per_tree","apply") use pytest.raises(ValueError, match=f"Expected {n_features} features") to call getattr(fm, method)(np.zeros((1, 4))) and also test too-many-features by calling each method with np.zeros((1, n_features+1)); ensure you reference the existing test_incorrect_data_shape and the fm object when adding these checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 33ad5435-6a1f-490a-9f35-905774c01955
📒 Files selected for processing (4)
- python/nvforest/nvforest/_base.py
- python/nvforest/nvforest/_forest_inference.py
- python/nvforest/nvforest/detail/forest_inference.pyx
- python/nvforest/tests/test_nvforest.py
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 963bc909-d573-46fb-ae7c-8a886be0d654
📒 Files selected for processing (1)
python/nvforest/tests/test_nvforest.py
```python
clf = RandomForestClassifier(max_features="sqrt", n_estimators=10)
clf.fit(X, y)

fm = nvforest.load_from_sklearn(clf, device="cpu")
```
Cover the actual issue-72 loading path here.
This only exercises load_from_sklearn(..., device="cpu"), but issue #72 is about feature-count validation after loading a serialized model via ForestInference.load / nvforest.load_model. A loader-specific num_features regression would still slip through, so please add at least one saved-model case that goes through load_model and reproduces the user-facing path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/nvforest/tests/test_nvforest.py` around lines 877 - 880, The test
currently only exercises nvforest.load_from_sklearn; add a saved-model test that
goes through the serialized-load path (use nvforest.save_model or
ForestInference.save if available to write the model to a temporary file, then
call nvforest.load_model or ForestInference.load with device="cpu") to reproduce
issue-72's validation path; ensure the loaded model's feature count is validated
by asserting expected behavior (e.g., raises or matches num_features) after
loading from disk so any regression in the load_model / ForestInference.load
path is caught.
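The shape of such a saved-model round-trip test can be sketched with a stub and `pickle` as stand-ins. A real test would use nvforest's own save/load API, whose exact names the review leaves open:

```python
import os
import pickle
import tempfile

import numpy as np

class StubModel:
    """Stand-in for a forest model; real code would use nvforest's
    serialization API (hypothetical here)."""

    def __init__(self, num_features: int):
        self.num_features = num_features

    def predict(self, X):
        if X.shape[1] != self.num_features:
            raise ValueError(
                f"Expected {self.num_features} features, got {X.shape[1]}"
            )
        return np.zeros(X.shape[0])

# Round-trip through disk, mirroring the serialized-load path the review
# wants covered (load_model / ForestInference.load).
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(StubModel(num_features=5), f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

assert loaded.num_features == 5  # feature count survives serialization
try:
    loaded.predict(np.zeros((1, 4)))
    raise AssertionError("expected ValueError")
except ValueError as exc:
    assert "Expected 5 features, got 4" in str(exc)
```

The key point is that the assertions run against the *loaded* object, so a regression confined to the deserialization path is still caught.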
```python
with pytest.raises(ValueError, match=f"Expected {n_features} features"):
    X_test = np.zeros((1, input_size))
    _ = predict_func(fm, X_test)
```
Assert the received feature count in the error message too.
Right now any message that starts with Expected 5 features passes. Matching the actual width (input_size) here would enforce the expected-vs-actual detail that users need when debugging shape errors.
Suggested assertion tightening:

```diff
-    with pytest.raises(ValueError, match=f"Expected {n_features} features"):
+    with pytest.raises(
+        ValueError,
+        match=rf"Expected {n_features} features.*got {input_size}",
+    ):
```

As per coding guidelines: "Error messages should be helpful and include expected vs actual values."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
with pytest.raises(
    ValueError,
    match=rf"Expected {n_features} features.*got {input_size}",
):
    X_test = np.zeros((1, input_size))
    _ = predict_func(fm, X_test)
```
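The reason the looser pattern passes too broadly: `pytest.raises(match=...)` applies `re.search`, so the pattern may match anywhere in the message and nothing anchors the end. A small demonstration with `re` directly (messages are illustrative):

```python
import re

messages = [
    "Expected 5 features, got 4",
    "Expected 5 features",  # missing actual count — a regression to catch
]

loose = r"Expected 5 features"
tight = r"Expected 5 features.*got 4"

# The loose pattern accepts both messages; the tight one rejects the second.
assert all(re.search(loose, m) for m in messages)
assert re.search(tight, messages[0])
assert not re.search(tight, messages[1])
```

Requiring both the expected and the received count in the match keeps the test honest about the error message's debugging value.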
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/nvforest/tests/test_nvforest.py` around lines 882 - 884, The test
currently only asserts the error starts with "Expected {n_features} features";
update the pytest.raises match to require the actual received feature count
(input_size) too so the exception message includes both expected and actual
counts. Locate the block using predict_func, fm, input_size and n_features and
change the match to assert something like "Expected {n_features} features, got
{input_size}" (or the project's chosen wording) so the raised ValueError
contains expected vs actual values.
Closes #72