
Add comprehensive benchmark suite comparing cuforest against sklearn, XGBoost, and LightGBM native inference #32

Open

dantegd wants to merge 16 commits into rapidsai:main from dantegd:fea-bench

Conversation

@dantegd
Member

@dantegd dantegd commented Jan 16, 2026

This PR adds a comprehensive benchmark suite for comparing cuforest inference performance against each framework's native inference.

Features

  • Multi-framework support: sklearn, XGBoost, and LightGBM
  • Model types: Both regressors and classifiers
  • Devices: CPU and GPU benchmarking
  • CLI options:
    • --framework / -f: Select specific frameworks
    • --dry-run / -n: Preview configuration without running
    • --quick-test / -q: Minimal parameters for quick verification
    • --device / -d: Select cpu, gpu, or both
    • --model-type / -m: Select regressor, classifier, or both
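The PR wires these options through Click. Purely as an illustration of the option surface described above — using stdlib argparse so the sketch is dependency-free; the flag names come from the list, everything else (subcommand wiring, defaults) is assumed:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the CLI options listed in the PR description."""
    parser = argparse.ArgumentParser(prog="cuforest-benchmark")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run the benchmark suite")
    run.add_argument("-f", "--framework", action="append",
                     choices=["sklearn", "xgboost", "lightgbm"],
                     help="select specific frameworks (repeatable)")
    run.add_argument("-n", "--dry-run", action="store_true",
                     help="preview configuration without running")
    run.add_argument("-q", "--quick-test", action="store_true",
                     help="minimal parameters for quick verification")
    run.add_argument("-d", "--device", choices=["cpu", "gpu", "both"],
                     default="both")
    run.add_argument("-m", "--model-type",
                     choices=["regressor", "classifier", "both"],
                     default="both")
    return parser

args = build_parser().parse_args(["run", "--dry-run", "-f", "xgboost"])
print(args.dry_run, args.framework, args.device)
```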

Usage

python -m cuforest.benchmark.benchmark run --dry-run
python -m cuforest.benchmark.benchmark run --quick-test
python -m cuforest.benchmark.analyze data/final_results.csv

Files

  • cuforest/benchmark/benchmark.py - Main benchmark with CLI
  • cuforest/benchmark/analyze.py - Results analysis and visualization
  • cuforest/benchmark/README.md - Documentation

todo:

  • Add tests
  • Do a local e2e run to check correctness (In progress)

Note: Builds on top of #31

@copy-pr-bot

copy-pr-bot bot commented Jan 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hcho3 hcho3 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Feb 28, 2026
@hcho3 hcho3 marked this pull request as ready for review March 4, 2026 04:15
@hcho3 hcho3 requested a review from a team as a code owner March 4, 2026 04:15
@coderabbitai

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Benchmark suite for comparing nvforest performance against sklearn, XGBoost, and LightGBM across CPU and GPU devices.
    • Analysis tools for generating speedup visualizations, computing statistical summaries, and filtering results by framework and device.
  • Documentation

    • Comprehensive benchmark guide with setup instructions, usage examples, parameter reference, and output format details.
  • Tests

    • Full test coverage for benchmark execution, analysis functionality, and CLI interactions.

Walkthrough

Adds a complete benchmarking suite for nvforest comparing regressor and classifier models against sklearn, XGBoost, and LightGBM on CPU and GPU. Includes documentation, framework integration, benchmark execution, result analysis with visualization, and comprehensive test coverage.

Changes

Cohort / File(s) Summary
Documentation and Package Setup
python/nvforest/nvforest/benchmark/README.md, python/nvforest/nvforest/benchmark/__init__.py
README documenting quick-start commands, parameter grids, device handling per framework, output CSV structure, and dependencies. Module initializer with license headers and docstring.
Core Benchmark Implementation
python/nvforest/nvforest/benchmark/benchmark.py
Comprehensive benchmarking module with FrameworkConfig dataclass for sklearn, XGBoost, LightGBM integration; helper functions for model save/load/predict with GPU awareness; run_inference_benchmark for timed inference; train_model, generate_data, write_checkpoint utilities; run_benchmark_suite orchestrating end-to-end execution across feature counts, depths, trees, frameworks, devices, and batch sizes; Click CLI with run command supporting dry-run and quick-test modes.
Analysis and Visualization
python/nvforest/nvforest/benchmark/analyze.py
Analysis module providing plot_speedup_heatmaps for grid visualization across model configurations, generate_summary_stats for grouped statistics, print_summary for human-readable reporting, and Click-based analyze CLI for CSV result parsing with optional framework/device filtering and output control.
Test Coverage
python/nvforest/tests/test_benchmark.py
Comprehensive test suite validating framework configs, parameter space structure, data generation reproducibility, model training, inference benchmarking, checkpoint writing, CLI behavior, device handling across frameworks, and end-to-end model I/O workflows.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check ⚠️ Warning — The PR title refers to 'cuforest' but the actual changes are to 'nvforest'. The codebase and files changed use 'nvforest', while the title uses the old/incorrect name, making it misleading about which project is being modified. Resolution: update the title to replace 'cuforest' with 'nvforest'.
  • Description check ⚠️ Warning — The description mentions 'cuforest' in code examples and file paths, but the actual PR implements 'nvforest'. While the description is otherwise related to the changeset, the repeated use of the incorrect project name is misleading. Resolution: replace all instances of 'cuforest' with 'nvforest' in the description, code examples, and file paths.

✅ Passed checks (1 passed)

  • Docstring Coverage ✅ Passed — Docstring coverage is 96.77%, which is sufficient; the required threshold is 80.00%.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (7)
python/nvforest/nvforest/benchmark/analyze.py (2)

275-281: Chain the exception for consistent error handling.

Similar to the earlier issue, chain the exception for better debugging.

♻️ Proposed fix
         try:
             plot_speedup_heatmaps(df, title=title, output_filename=output)
         except ImportError as e:
             if not plot_only:
                 print(f"\nNote: {e}")
             else:
-                raise click.ClickException(str(e))
+                raise click.ClickException(str(e)) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 275 - 281, The
except block for ImportError when calling plot_speedup_heatmaps should chain the
original exception when re-raising so tracebacks include the root cause; replace
the current raise click.ClickException(str(e)) with a chained raise (i.e., raise
click.ClickException(str(e)) from e) in the analyze.py except block that handles
ImportError around plot_speedup_heatmaps.

43-50: Chain the exception for better debugging context.

When re-raising an exception, use from err to preserve the exception chain, making debugging easier.

♻️ Proposed fix
     try:
         import matplotlib.pyplot as plt
         import seaborn as sns
-    except ImportError:
+    except ImportError as err:
         raise ImportError(
             "matplotlib and seaborn are required for plotting. "
             "Install them with: pip install matplotlib seaborn"
-        )
+        ) from err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 43 - 50, The
ImportError being raised in the top-level import try/except in analyze.py should
chain the original exception for better traceback context; in the except block
that catches the import failure, capture the exception (e.g., "except
ImportError as err:") and re-raise the new ImportError using "from err" so the
original import error is preserved when raising the message about
matplotlib/seaborn.
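The effect of `from err` is easy to see in a standalone snippet (illustration only, not project code): the original ImportError survives on `__cause__`, so tracebacks show the root cause.

```python
def require(module_name: str):
    """Import a module, re-raising a friendlier ImportError that chains the original."""
    try:
        return __import__(module_name)
    except ImportError as err:
        # `from err` attaches the original exception as __cause__.
        raise ImportError(
            f"{module_name} is required for plotting; install it with pip"
        ) from err

try:
    require("definitely_not_installed_xyz")
except ImportError as exc:
    # The chained root cause is preserved for debugging.
    print(type(exc.__cause__).__name__)  # → ModuleNotFoundError
```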
python/nvforest/nvforest/benchmark/benchmark.py (4)

357-430: Consider adding Returns section to docstring.

The function has a good NumPy-style docstring but is missing the Returns section to document that it returns the trained model.

📝 Proposed docstring addition
     device : str
         Device to use for training ('cpu' or 'gpu').
         Only affects XGBoost currently.
+
+    Returns
+    -------
+    Any
+        The trained model instance.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 357 - 430, The
docstring for train_model is missing a Returns section; add a NumPy-style
"Returns" block to the function docstring (above the implementation in
train_model) that documents the returned trained model (type Any or instance of
framework.regressor_class/framework.classifier_class) and briefly states that
the fitted model is returned after calling model.fit(X_train, y_train). Include
the return type and a short description so callers and autodocs know what to
expect.

85-97: Logger adds duplicate handlers if get_logger() is called multiple times.

While currently called once at module level, if get_logger() were called again, it would add duplicate handlers. Consider guarding against this.

♻️ Proposed fix
 def get_logger():
     """Configure and return the benchmark logger."""
     logging.basicConfig(level=logging.INFO)
     logger = logging.getLogger(__name__)
+    if logger.handlers:
+        return logger  # Already configured
     logger.setLevel(logging.INFO)
     handler = logging.StreamHandler(sys.stderr)
     handler.setLevel(logging.INFO)
     logger.addHandler(handler)
     logger.propagate = False
     return logger
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 85 - 97,
get_logger currently always adds a StreamHandler which causes duplicate handlers
if called multiple times; modify get_logger so it first obtains the logger via
logging.getLogger(__name__) and only adds a new StreamHandler when the logger
has no handlers (e.g., if not logger.handlers or not any(isinstance(h,
logging.StreamHandler) for h in logger.handlers)); keep setting levels
(logger.setLevel, handler.setLevel) and logger.propagate = False; ensure LOGGER
= get_logger() behavior is unchanged.

604-616: Consider explicit GPU memory cleanup for X_device.

X_device (cupy array) is created inside the batch_size loop but not explicitly deleted. While Python's GC should handle this, explicit cleanup with del X_device after each iteration and a cp.get_default_memory_pool().free_all_blocks() call could help prevent GPU OOM during benchmarks with many configurations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 604 - 616, The
benchmark loop creates a GPU array X_device when device == "gpu" (in
benchmark.py) but never frees it; after each iteration where X_device was
created, ensure you explicitly delete the reference (del X_device) and call
cp.get_default_memory_pool().free_all_blocks() to release GPU memory, guarding
the free call with a try/except if cupy may be unavailable and only invoking it
when device == "gpu" and cp was successfully imported; place this cleanup at the
end of the batch_size/config iteration to avoid GPU OOMs during long runs.
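The cleanup pattern suggested above can be sketched as follows. This is illustrative only: it assumes cupy's pooled allocator (a real cupy API) and degrades to a no-op when cupy is unavailable so the sketch stays runnable anywhere.

```python
try:
    import cupy as cp  # optional; absent on CPU-only machines
except ImportError:
    cp = None

def release_gpu_pool() -> None:
    """Return pooled GPU blocks to the driver (no-op when cupy is absent)."""
    if cp is not None:
        cp.get_default_memory_pool().free_all_blocks()

# Inside the benchmark loop the pattern would look roughly like:
#     X_device = cp.asarray(X)    # per-batch-size allocation
#     ... run timed predictions ...
#     del X_device                # drop the only reference first
#     release_gpu_pool()          # then return pooled blocks to the driver
release_gpu_pool()
print("pool released (or no-op)")
```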

219-235: GPU fallback doesn't use cached availability check.

The code sets _LIGHTGBM_GPU_AVAILABLE = False on failure but doesn't check it before attempting GPU prediction. This means every GPU prediction call will attempt GPU first, fail, then fall back to CPU. Consider checking the cached flag to skip the GPU attempt:

♻️ Proposed improvement
     if device == "gpu":
+        if _LIGHTGBM_GPU_CHECKED and not _LIGHTGBM_GPU_AVAILABLE:
+            return model.predict(X)
         try:
             # LightGBM GPU prediction - works if model was trained with GPU
             # and LightGBM was built with GPU support
             return model.predict(X, num_threads=1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 219 - 235, The
GPU fallback repeatedly attempts GPU because the cached flag isn't checked;
update the device=="gpu" branch to short-circuit using the module flags: if
_LIGHTGBM_GPU_CHECKED and _LIGHTGBM_GPU_AVAILABLE is False, call
model.predict(X) immediately; otherwise try the GPU path (model.predict(X,
num_threads=1)) inside the try/except and on success set _LIGHTGBM_GPU_CHECKED =
True and _LIGHTGBM_GPU_AVAILABLE = True, and on lgb.basic.LightGBMError set
_LIGHTGBM_GPU_CHECKED = True and _LIGHTGBM_GPU_AVAILABLE = False, log via
LOGGER.warning, then fall back to model.predict(X). Ensure you reference the
existing symbols _LIGHTGBM_GPU_CHECKED, _LIGHTGBM_GPU_AVAILABLE, LOGGER, and
model.predict when making the change.
python/nvforest/nvforest/benchmark/README.md (1)

35-35: Clarify default output directory path.

The default is documented as benchmark/data/ but the actual code uses a path relative to the benchmark module (nvforest/benchmark/data/). Consider clarifying this is relative to the package installation, or documenting the absolute path pattern users should expect.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/README.md` at line 35, Update the README
entry for `--output-dir` to clarify that the documented default
`benchmark/data/` is actually relative to the installed package and points to
the package resource directory `nvforest/benchmark/data/`; change the
description for the `--output-dir`/`-o` row to state it is relative to the
nvforest package installation (e.g., under site-packages as
`<...>/nvforest/benchmark/data/`) or show the absolute pattern users should
expect, so the `--output-dir` default documented matches the code's actual
package-relative path.
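For context, package-relative defaults of this sort are commonly derived from the module's own location. A hypothetical sketch of the mechanism (the PR's actual code is not shown on this page):

```python
from pathlib import Path

# Hypothetical: resolve a "data" directory next to this module, which is why
# the default lands under the installed package
# (e.g. site-packages/nvforest/benchmark/data/ rather than ./benchmark/data/).
DEFAULT_OUTPUT_DIR = Path(__file__).resolve().parent / "data"
print(DEFAULT_OUTPUT_DIR.name)  # → data
```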

ℹ️ Review info
Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dd64f002-16f8-4baa-884a-9b5d8fac4046

📥 Commits

Reviewing files that changed from the base of the PR and between afb6731 and 5f8c30a.

📒 Files selected for processing (6)
  • python/nvforest/nvforest/benchmark/README.md
  • python/nvforest/nvforest/benchmark/__init__.py
  • python/nvforest/nvforest/benchmark/analyze.py
  • python/nvforest/nvforest/benchmark/benchmark.py
  • python/nvforest/nvforest/benchmark/data/.gitkeep
  • python/nvforest/tests/test_benchmark.py

Comment on lines +64 to +68
# Handle single row/column case
if len(max_depth_values) == 1:
    axes = axes.reshape(1, -1)
if len(num_trees_values) == 1:
    axes = axes.reshape(-1, 1)

⚠️ Potential issue | 🟡 Minor

Potential error when both dimensions equal 1.

When len(max_depth_values) == 1 and len(num_trees_values) == 1, plt.subplots(1, 1) returns a single Axes object (not an array), and calling .reshape() on it will raise AttributeError.

🐛 Proposed fix
     # Handle single row/column case
+    if len(max_depth_values) == 1 and len(num_trees_values) == 1:
+        axes = np.array([[axes]])
-    if len(max_depth_values) == 1:
+    elif len(max_depth_values) == 1:
         axes = axes.reshape(1, -1)
-    if len(num_trees_values) == 1:
+    elif len(num_trees_values) == 1:
         axes = axes.reshape(-1, 1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 64 - 68, The axes
reshaping logic in analyze.py can fail when both len(max_depth_values)==1 and
len(num_trees_values)==1 because plt.subplots(1, 1) returns a single Axes object
(not an array); update the code that currently checks max_depth_values and
num_trees_values and reshapes axes to first ensure axes is an array (e.g., wrap
single Axes into a 2D array or use numpy.atleast_2d/np.array) before calling
.reshape, or handle the single-Axes case explicitly so axes is always an
indexable 2D structure; adjust the branches that reference axes,
max_depth_values, num_trees_values and plt.subplots accordingly.

Comment on lines +184 to +199
def _check_lightgbm_gpu_available() -> bool:
    """Check if LightGBM was built with GPU support."""
    try:
        # Try to create a dummy dataset and check if GPU device works
        # This is a lightweight check that doesn't require actual training
        params = {"device": "gpu", "gpu_platform_id": 0, "gpu_device_id": 0}
        _ = lgb.Dataset([], params=params)
        # LightGBM will raise an error if GPU is not available when we try to use it
        return True  # Assume available, will fail gracefully during training/predict
    except Exception:
        return False


# Cache for LightGBM GPU availability
_LIGHTGBM_GPU_CHECKED = False
_LIGHTGBM_GPU_AVAILABLE = False

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check if _check_lightgbm_gpu_available is called anywhere
rg "_check_lightgbm_gpu_available" python/nvforest/ --type py

🏁 Script executed:

# Check if the cache variables are used
rg "_LIGHTGBM_GPU_CHECKED|_LIGHTGBM_GPU_AVAILABLE" python/nvforest/ --type py -B 2 -A 2

🏁 Script executed:

# Also check the _predict_lightgbm function to understand cache usage
rg -A 30 "def _predict_lightgbm" python/nvforest/nvforest/benchmark/benchmark.py

🏁 Script executed:

# Check where _LIGHTGBM_GPU_AVAILABLE is read/used
rg "_LIGHTGBM_GPU_AVAILABLE" python/nvforest/ --type py -B 3 -A 3


Remove unused _check_lightgbm_gpu_available() function and redundant _LIGHTGBM_GPU_AVAILABLE cache variable.

_check_lightgbm_gpu_available() is defined but never called and should be removed. Additionally, while _LIGHTGBM_GPU_CHECKED is correctly used to deduplicate warning logs, _LIGHTGBM_GPU_AVAILABLE is set in the exception handler but never actually read anywhere in the code, making it redundant.

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 192-192: Consider moving this statement to an else block

(TRY300)


[warning] 193-193: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 184 - 199,
Remove the unused helper and redundant cache: delete the entire
_check_lightgbm_gpu_available() function and remove the _LIGHTGBM_GPU_AVAILABLE
variable (and any assignments to it) from the module; keep and continue using
_LIGHTGBM_GPU_CHECKED for deduping warnings, and ensure any code that previously
set _LIGHTGBM_GPU_AVAILABLE instead updates/uses only _LIGHTGBM_GPU_CHECKED or
checks LightGBM availability directly where needed (refer to symbols
_check_lightgbm_gpu_available, _LIGHTGBM_GPU_AVAILABLE, and
_LIGHTGBM_GPU_CHECKED).

Comment on lines +628 to +637
try:
    native_time = run_inference_benchmark(
        lambda batch,
        m=native_model,
        d=device: framework.predict_native(
            m, batch, d
        ),
        native_data,
        batch_size,
    )

⚠️ Potential issue | 🟡 Minor

Lambda captures loop variable framework by reference.

The lambda correctly binds native_model and device as default arguments but captures framework by reference. While safe in the current code structure (lambda is used immediately within the same iteration), this is a latent bug waiting to happen if the code is refactored.

🐛 Proposed fix
                                 try:
                                     native_time = run_inference_benchmark(
                                         lambda batch,
                                         m=native_model,
-                                        d=device: framework.predict_native(
+                                        d=device,
+                                        f=framework: f.predict_native(
                                             m, batch, d
                                         ),
                                         native_data,
                                         batch_size,
                                     )
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 632-632: Function definition does not bind loop variable framework

(B023)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 628 - 637, The
lambda passed into run_inference_benchmark captures framework by reference which
can lead to late-binding bugs; explicitly bind framework into the lambda's
defaults (e.g., add f=framework to the lambda signature) so that the call to
framework.predict_native uses the current iteration's framework instance; update
the lambda used in run_inference_benchmark (where native_model, device,
native_data are used) to include that extra default argument so predict_native
is called on the correct framework.
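The late-binding behavior the review warns about is easy to reproduce in isolation: closures capture loop variables by reference, while default arguments are bound at definition time.

```python
frameworks = ["sklearn", "xgboost", "lightgbm"]

# Pitfall: every lambda closes over the same `name`, which ends up at its final value.
late = [lambda: name for name in frameworks]
print([f() for f in late])       # → ['lightgbm', 'lightgbm', 'lightgbm']

# Fix: a default argument snapshots the current value at definition time.
early = [lambda n=name: n for name in frameworks]
print([f() for f in early])      # → ['sklearn', 'xgboost', 'lightgbm']
```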

Comment on lines +115 to +117
# Option 2: pip with GPU flag
pip install lightgbm --install-option=--gpu

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

LightGBM GPU installation pip 2025 modern method

💡 Result:

Modern (2025+) pip install methods for LightGBM GPU

A) Easiest: GPU via OpenCL using official PyPI wheels (Windows / Linux)

  1. Install LightGBM:
    pip install -U lightgbm
  2. Ensure an OpenCL runtime/driver is installed.
    • The LightGBM PyPI wheels include both CPU+GPU (OpenCL) support; you typically just need GPU drivers / OpenCL runtime present. (pypi.org)
  3. Use it:
    • In parameters, use device_type="gpu" (OpenCL). (GPU build details are in the official install guide.) (lightgbm.readthedocs.io)


B) NVIDIA CUDA (Linux): device_type="cuda" usually means build from source

If you specifically want the CUDA backend (device_type="cuda"), LightGBM treats it as a separate implementation and documents building with CMake using -DUSE_CUDA=ON (Linux; Windows CUDA backend not supported). (lightgbm.readthedocs.io)


C) AMD ROCm (Linux): install AMD’s ROCm wheel via pip

If you’re on AMD Instinct + ROCm, AMD provides a ROCm-enabled wheel:

pip install amd_lightgbm --extra-index-url=https://pypi.amd.com/rocm-7.0.2/simple

(or the matching ROCm version index). (rocm.docs.amd.com)


Update pip installation command for LightGBM GPU support.

The --install-option flag is deprecated in pip 21.3+ and removed in pip 23.1. However, the modern approach is simpler than using --config-settings: official LightGBM PyPI wheels include GPU (OpenCL) support built-in.

Update to:

# Option 2: pip with GPU (OpenCL) support
pip install lightgbm
# Ensure OpenCL drivers/runtime are installed for GPU acceleration

Note: If CUDA backend is needed instead of OpenCL, building from source with CMake is required (-DUSE_CUDA=ON). Add a note clarifying which GPU backend the benchmark uses and any driver prerequisites.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/README.md` around lines 115 - 117, Update
the pip/GPU installation instructions in the README: replace the deprecated use
of pip's --install-option=--gpu with a simple "pip install lightgbm" and add a
short note that the PyPI wheel includes OpenCL GPU support and that users must
install appropriate OpenCL drivers/runtimes; also add a clarifying sentence that
CUDA backend requires building from source with CMake and -DUSE_CUDA=ON so
readers know which GPU backend the benchmark uses and what driver prerequisites
are required.
