
Add comprehensive benchmark suite comparing cuforest against sklearn, XGBoost, and LightGBM native inference #32

Open

dantegd wants to merge 16 commits into rapidsai:main from dantegd:fea-bench

Conversation

@dantegd
Member

@dantegd dantegd commented Jan 16, 2026

This PR adds a comprehensive benchmark suite for comparing cuforest inference performance against each framework's native inference.

Features

  • Multi-framework support: sklearn, XGBoost, and LightGBM
  • Model types: Both regressors and classifiers
  • Devices: CPU and GPU benchmarking
  • CLI options:
    • --framework / -f: Select specific frameworks
    • --dry-run / -n: Preview configuration without running
    • --quick-test / -q: Minimal parameters for quick verification
    • --device / -d: Select cpu, gpu, or both
    • --model-type / -m: Select regressor, classifier, or both
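The PR wires these options through Click. Purely as an illustration of the option surface described above — using stdlib argparse so the sketch is dependency-free; the flag names come from the list, everything else (subcommand wiring, defaults) is assumed:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the CLI options listed in the PR description."""
    parser = argparse.ArgumentParser(prog="cuforest-benchmark")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run the benchmark suite")
    run.add_argument("-f", "--framework", action="append",
                     choices=["sklearn", "xgboost", "lightgbm"],
                     help="select specific frameworks (repeatable)")
    run.add_argument("-n", "--dry-run", action="store_true",
                     help="preview configuration without running")
    run.add_argument("-q", "--quick-test", action="store_true",
                     help="minimal parameters for quick verification")
    run.add_argument("-d", "--device", choices=["cpu", "gpu", "both"],
                     default="both")
    run.add_argument("-m", "--model-type",
                     choices=["regressor", "classifier", "both"],
                     default="both")
    return parser

args = build_parser().parse_args(["run", "--dry-run", "-f", "xgboost"])
print(args.dry_run, args.framework, args.device)
```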

Usage

python -m cuforest.benchmark.benchmark run --dry-run
python -m cuforest.benchmark.benchmark run --quick-test
python -m cuforest.benchmark.analyze data/final_results.csv

Files

  • cuforest/benchmark/benchmark.py - Main benchmark with CLI
  • cuforest/benchmark/analyze.py - Results analysis and visualization
  • cuforest/benchmark/README.md - Documentation

todo:

  • Add tests
  • Do a local e2e run to check correctness (In progress)

Note: Builds on top of #31

@copy-pr-bot

copy-pr-bot bot commented Jan 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hcho3 hcho3 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Feb 28, 2026
@hcho3 hcho3 marked this pull request as ready for review March 4, 2026 04:15
@hcho3 hcho3 requested a review from a team as a code owner March 4, 2026 04:15
@coderabbitai

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Benchmark suite for comparing nvforest performance against sklearn, XGBoost, and LightGBM across CPU and GPU devices.
    • Analysis tools for generating speedup visualizations, computing statistical summaries, and filtering results by framework and device.
  • Documentation

    • Comprehensive benchmark guide with setup instructions, usage examples, parameter reference, and output format details.
  • Tests

    • Full test coverage for benchmark execution, analysis functionality, and CLI interactions.

Walkthrough

Adds a complete benchmarking suite for nvforest comparing regressor and classifier models against sklearn, XGBoost, and LightGBM on CPU and GPU. Includes documentation, framework integration, benchmark execution, result analysis with visualization, and comprehensive test coverage.

Changes

Cohort / File(s) Summary
Documentation and Package Setup
python/nvforest/nvforest/benchmark/README.md, python/nvforest/nvforest/benchmark/__init__.py
README documenting quick-start commands, parameter grids, device handling per framework, output CSV structure, and dependencies. Module initializer with license headers and docstring.
Core Benchmark Implementation
python/nvforest/nvforest/benchmark/benchmark.py
Comprehensive benchmarking module with FrameworkConfig dataclass for sklearn, XGBoost, LightGBM integration; helper functions for model save/load/predict with GPU awareness; run_inference_benchmark for timed inference; train_model, generate_data, write_checkpoint utilities; run_benchmark_suite orchestrating end-to-end execution across feature counts, depths, trees, frameworks, devices, and batch sizes; Click CLI with run command supporting dry-run and quick-test modes.
Analysis and Visualization
python/nvforest/nvforest/benchmark/analyze.py
Analysis module providing plot_speedup_heatmaps for grid visualization across model configurations, generate_summary_stats for grouped statistics, print_summary for human-readable reporting, and Click-based analyze CLI for CSV result parsing with optional framework/device filtering and output control.
Test Coverage
python/nvforest/tests/test_benchmark.py
Comprehensive test suite validating framework configs, parameter space structure, data generation reproducibility, model training, inference benchmarking, checkpoint writing, CLI behavior, device handling across frameworks, and end-to-end model I/O workflows.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check ⚠️ Warning — The PR title refers to 'cuforest' but the actual changes are to 'nvforest'. The codebase and files changed use 'nvforest', while the title uses the old/incorrect name, making it misleading about which project is being modified. Resolution: update the title to replace 'cuforest' with 'nvforest'.
  • Description check ⚠️ Warning — The description mentions 'cuforest' in code examples and file paths, but the actual PR implements 'nvforest'. While the description is otherwise related to the changeset, the repeated use of the incorrect project name is misleading. Resolution: replace all instances of 'cuforest' with 'nvforest' in the description, code examples, and file paths.

✅ Passed checks (1 passed)

  • Docstring Coverage ✅ Passed — Docstring coverage is 96.77%, which is sufficient; the required threshold is 80.00%.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (7)
python/nvforest/nvforest/benchmark/analyze.py (2)

275-281: Chain the exception for consistent error handling.

Similar to the earlier issue, chain the exception for better debugging.

♻️ Proposed fix
         try:
             plot_speedup_heatmaps(df, title=title, output_filename=output)
         except ImportError as e:
             if not plot_only:
                 print(f"\nNote: {e}")
             else:
-                raise click.ClickException(str(e))
+                raise click.ClickException(str(e)) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 275 - 281, The
except block for ImportError when calling plot_speedup_heatmaps should chain the
original exception when re-raising so tracebacks include the root cause; replace
the current raise click.ClickException(str(e)) with a chained raise (i.e., raise
click.ClickException(str(e)) from e) in the analyze.py except block that handles
ImportError around plot_speedup_heatmaps.

43-50: Chain the exception for better debugging context.

When re-raising an exception, use from err to preserve the exception chain, making debugging easier.

♻️ Proposed fix
     try:
         import matplotlib.pyplot as plt
         import seaborn as sns
-    except ImportError:
+    except ImportError as err:
         raise ImportError(
             "matplotlib and seaborn are required for plotting. "
             "Install them with: pip install matplotlib seaborn"
-        )
+        ) from err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 43 - 50, The
ImportError being raised in the top-level import try/except in analyze.py should
chain the original exception for better traceback context; in the except block
that catches the import failure, capture the exception (e.g., "except
ImportError as err:") and re-raise the new ImportError using "from err" so the
original import error is preserved when raising the message about
matplotlib/seaborn.
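The effect of `from err` is easy to see in a standalone snippet (illustration only, not project code): the original ImportError survives on `__cause__`, so tracebacks show the root cause.

```python
def require(module_name: str):
    """Import a module, re-raising a friendlier ImportError that chains the original."""
    try:
        return __import__(module_name)
    except ImportError as err:
        # `from err` attaches the original exception as __cause__.
        raise ImportError(
            f"{module_name} is required for plotting; install it with pip"
        ) from err

try:
    require("definitely_not_installed_xyz")
except ImportError as exc:
    # The chained root cause is preserved for debugging.
    print(type(exc.__cause__).__name__)  # → ModuleNotFoundError
```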
python/nvforest/nvforest/benchmark/benchmark.py (4)

357-430: Consider adding Returns section to docstring.

The function has a good NumPy-style docstring but is missing the Returns section to document that it returns the trained model.

📝 Proposed docstring addition
     device : str
         Device to use for training ('cpu' or 'gpu').
         Only affects XGBoost currently.
+
+    Returns
+    -------
+    Any
+        The trained model instance.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 357 - 430, The
docstring for train_model is missing a Returns section; add a NumPy-style
"Returns" block to the function docstring (above the implementation in
train_model) that documents the returned trained model (type Any or instance of
framework.regressor_class/framework.classifier_class) and briefly states that
the fitted model is returned after calling model.fit(X_train, y_train). Include
the return type and a short description so callers and autodocs know what to
expect.

85-97: Logger adds duplicate handlers if get_logger() is called multiple times.

While currently called once at module level, if get_logger() were called again, it would add duplicate handlers. Consider guarding against this.

♻️ Proposed fix
 def get_logger():
     """Configure and return the benchmark logger."""
     logging.basicConfig(level=logging.INFO)
     logger = logging.getLogger(__name__)
+    if logger.handlers:
+        return logger  # Already configured
     logger.setLevel(logging.INFO)
     handler = logging.StreamHandler(sys.stderr)
     handler.setLevel(logging.INFO)
     logger.addHandler(handler)
     logger.propagate = False
     return logger
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 85 - 97,
get_logger currently always adds a StreamHandler which causes duplicate handlers
if called multiple times; modify get_logger so it first obtains the logger via
logging.getLogger(__name__) and only adds a new StreamHandler when the logger
has no handlers (e.g., if not logger.handlers or not any(isinstance(h,
logging.StreamHandler) for h in logger.handlers)); keep setting levels
(logger.setLevel, handler.setLevel) and logger.propagate = False; ensure LOGGER
= get_logger() behavior is unchanged.

604-616: Consider explicit GPU memory cleanup for X_device.

X_device (cupy array) is created inside the batch_size loop but not explicitly deleted. While Python's GC should handle this, explicit cleanup with del X_device after each iteration and a cp.get_default_memory_pool().free_all_blocks() call could help prevent GPU OOM during benchmarks with many configurations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 604 - 616, The
benchmark loop creates a GPU array X_device when device == "gpu" (in
benchmark.py) but never frees it; after each iteration where X_device was
created, ensure you explicitly delete the reference (del X_device) and call
cp.get_default_memory_pool().free_all_blocks() to release GPU memory, guarding
the free call with a try/except if cupy may be unavailable and only invoking it
when device == "gpu" and cp was successfully imported; place this cleanup at the
end of the batch_size/config iteration to avoid GPU OOMs during long runs.
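The cleanup pattern suggested above can be sketched as follows. This is illustrative only: it assumes cupy's pooled allocator (a real cupy API) and degrades to a no-op when cupy is unavailable so the sketch stays runnable anywhere.

```python
try:
    import cupy as cp  # optional; absent on CPU-only machines
except ImportError:
    cp = None

def release_gpu_pool() -> None:
    """Return pooled GPU blocks to the driver (no-op when cupy is absent)."""
    if cp is not None:
        cp.get_default_memory_pool().free_all_blocks()

# Inside the benchmark loop the pattern would look roughly like:
#     X_device = cp.asarray(X)    # per-batch-size allocation
#     ... run timed predictions ...
#     del X_device                # drop the only reference first
#     release_gpu_pool()          # then return pooled blocks to the driver
release_gpu_pool()
print("pool released (or no-op)")
```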

219-235: GPU fallback doesn't use cached availability check.

The code sets _LIGHTGBM_GPU_AVAILABLE = False on failure but doesn't check it before attempting GPU prediction. This means every GPU prediction call will attempt GPU first, fail, then fall back to CPU. Consider checking the cached flag to skip the GPU attempt:

♻️ Proposed improvement
     if device == "gpu":
+        if _LIGHTGBM_GPU_CHECKED and not _LIGHTGBM_GPU_AVAILABLE:
+            return model.predict(X)
         try:
             # LightGBM GPU prediction - works if model was trained with GPU
             # and LightGBM was built with GPU support
             return model.predict(X, num_threads=1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 219 - 235, The
GPU fallback repeatedly attempts GPU because the cached flag isn't checked;
update the device=="gpu" branch to short-circuit using the module flags: if
_LIGHTGBM_GPU_CHECKED and _LIGHTGBM_GPU_AVAILABLE is False, call
model.predict(X) immediately; otherwise try the GPU path (model.predict(X,
num_threads=1)) inside the try/except and on success set _LIGHTGBM_GPU_CHECKED =
True and _LIGHTGBM_GPU_AVAILABLE = True, and on lgb.basic.LightGBMError set
_LIGHTGBM_GPU_CHECKED = True and _LIGHTGBM_GPU_AVAILABLE = False, log via
LOGGER.warning, then fall back to model.predict(X). Ensure you reference the
existing symbols _LIGHTGBM_GPU_CHECKED, _LIGHTGBM_GPU_AVAILABLE, LOGGER, and
model.predict when making the change.
python/nvforest/nvforest/benchmark/README.md (1)

35-35: Clarify default output directory path.

The default is documented as benchmark/data/ but the actual code uses a path relative to the benchmark module (nvforest/benchmark/data/). Consider clarifying this is relative to the package installation, or documenting the absolute path pattern users should expect.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/README.md` at line 35, Update the README
entry for `--output-dir` to clarify that the documented default
`benchmark/data/` is actually relative to the installed package and points to
the package resource directory `nvforest/benchmark/data/`; change the
description for the `--output-dir`/`-o` row to state it is relative to the
nvforest package installation (e.g., under site-packages as
`<...>/nvforest/benchmark/data/`) or show the absolute pattern users should
expect, so the `--output-dir` default documented matches the code's actual
package-relative path.
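For context, package-relative defaults of this sort are commonly derived from the module's own location. A hypothetical sketch of the mechanism (the PR's actual code is not shown on this page):

```python
from pathlib import Path

# Hypothetical: resolve a "data" directory next to this module, which is why
# the default lands under the installed package
# (e.g. site-packages/nvforest/benchmark/data/ rather than ./benchmark/data/).
DEFAULT_OUTPUT_DIR = Path(__file__).resolve().parent / "data"
print(DEFAULT_OUTPUT_DIR.name)  # → data
```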

ℹ️ Review info
Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dd64f002-16f8-4baa-884a-9b5d8fac4046

📥 Commits

Reviewing files that changed from the base of the PR and between afb6731 and 5f8c30a.

📒 Files selected for processing (6)
  • python/nvforest/nvforest/benchmark/README.md
  • python/nvforest/nvforest/benchmark/__init__.py
  • python/nvforest/nvforest/benchmark/analyze.py
  • python/nvforest/nvforest/benchmark/benchmark.py
  • python/nvforest/nvforest/benchmark/data/.gitkeep
  • python/nvforest/tests/test_benchmark.py

Comment on lines +64 to +68
# Handle single row/column case
if len(max_depth_values) == 1:
    axes = axes.reshape(1, -1)
if len(num_trees_values) == 1:
    axes = axes.reshape(-1, 1)

⚠️ Potential issue | 🟡 Minor

Potential error when both dimensions equal 1.

When len(max_depth_values) == 1 and len(num_trees_values) == 1, plt.subplots(1, 1) returns a single Axes object (not an array), and calling .reshape() on it will raise AttributeError.

🐛 Proposed fix
     # Handle single row/column case
+    if len(max_depth_values) == 1 and len(num_trees_values) == 1:
+        axes = np.array([[axes]])
-    if len(max_depth_values) == 1:
+    elif len(max_depth_values) == 1:
         axes = axes.reshape(1, -1)
-    if len(num_trees_values) == 1:
+    elif len(num_trees_values) == 1:
         axes = axes.reshape(-1, 1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 64 - 68, The axes
reshaping logic in analyze.py can fail when both len(max_depth_values)==1 and
len(num_trees_values)==1 because plt.subplots(1, 1) returns a single Axes object
(not an array); update the code that currently checks max_depth_values and
num_trees_values and reshapes axes to first ensure axes is an array (e.g., wrap
single Axes into a 2D array or use numpy.atleast_2d/np.array) before calling
.reshape, or handle the single-Axes case explicitly so axes is always an
indexable 2D structure; adjust the branches that reference axes,
max_depth_values, num_trees_values and plt.subplots accordingly.

Comment on lines +184 to +199
def _check_lightgbm_gpu_available() -> bool:
    """Check if LightGBM was built with GPU support."""
    try:
        # Try to create a dummy dataset and check if GPU device works
        # This is a lightweight check that doesn't require actual training
        params = {"device": "gpu", "gpu_platform_id": 0, "gpu_device_id": 0}
        _ = lgb.Dataset([], params=params)
        # LightGBM will raise an error if GPU is not available when we try to use it
        return True  # Assume available, will fail gracefully during training/predict
    except Exception:
        return False


# Cache for LightGBM GPU availability
_LIGHTGBM_GPU_CHECKED = False
_LIGHTGBM_GPU_AVAILABLE = False

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check if _check_lightgbm_gpu_available is called anywhere
rg "_check_lightgbm_gpu_available" python/nvforest/ --type py

🏁 Script executed:

# Check if the cache variables are used
rg "_LIGHTGBM_GPU_CHECKED|_LIGHTGBM_GPU_AVAILABLE" python/nvforest/ --type py -B 2 -A 2

🏁 Script executed:

# Also check the _predict_lightgbm function to understand cache usage
rg -A 30 "def _predict_lightgbm" python/nvforest/nvforest/benchmark/benchmark.py

🏁 Script executed:

# Check where _LIGHTGBM_GPU_AVAILABLE is read/used
rg "_LIGHTGBM_GPU_AVAILABLE" python/nvforest/ --type py -B 3 -A 3


Remove unused _check_lightgbm_gpu_available() function and redundant _LIGHTGBM_GPU_AVAILABLE cache variable.

_check_lightgbm_gpu_available() is defined but never called and should be removed. Additionally, while _LIGHTGBM_GPU_CHECKED is correctly used to deduplicate warning logs, _LIGHTGBM_GPU_AVAILABLE is set in the exception handler but never actually read anywhere in the code, making it redundant.

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 192-192: Consider moving this statement to an else block

(TRY300)


[warning] 193-193: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 184 - 199,
Remove the unused helper and redundant cache: delete the entire
_check_lightgbm_gpu_available() function and remove the _LIGHTGBM_GPU_AVAILABLE
variable (and any assignments to it) from the module; keep and continue using
_LIGHTGBM_GPU_CHECKED for deduping warnings, and ensure any code that previously
set _LIGHTGBM_GPU_AVAILABLE instead updates/uses only _LIGHTGBM_GPU_CHECKED or
checks LightGBM availability directly where needed (refer to symbols
_check_lightgbm_gpu_available, _LIGHTGBM_GPU_AVAILABLE, and
_LIGHTGBM_GPU_CHECKED).

Comment on lines +628 to +637
try:
    native_time = run_inference_benchmark(
        lambda batch,
        m=native_model,
        d=device: framework.predict_native(
            m, batch, d
        ),
        native_data,
        batch_size,
    )

⚠️ Potential issue | 🟡 Minor

Lambda captures loop variable framework by reference.

The lambda correctly binds native_model and device as default arguments but captures framework by reference. While safe in the current code structure (lambda is used immediately within the same iteration), this is a latent bug waiting to happen if the code is refactored.

🐛 Proposed fix
                                 try:
                                     native_time = run_inference_benchmark(
                                         lambda batch,
                                         m=native_model,
-                                        d=device: framework.predict_native(
+                                        d=device,
+                                        f=framework: f.predict_native(
                                             m, batch, d
                                         ),
                                         native_data,
                                         batch_size,
                                     )
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 632-632: Function definition does not bind loop variable framework

(B023)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 628 - 637, The
lambda passed into run_inference_benchmark captures framework by reference which
can lead to late-binding bugs; explicitly bind framework into the lambda's
defaults (e.g., add f=framework to the lambda signature) so that the call to
framework.predict_native uses the current iteration's framework instance; update
the lambda used in run_inference_benchmark (where native_model, device,
native_data are used) to include that extra default argument so predict_native
is called on the correct framework.
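The late-binding behavior the review warns about is easy to reproduce in isolation: closures capture loop variables by reference, while default arguments are bound at definition time.

```python
frameworks = ["sklearn", "xgboost", "lightgbm"]

# Pitfall: every lambda closes over the same `name`, which ends up at its final value.
late = [lambda: name for name in frameworks]
print([f() for f in late])       # → ['lightgbm', 'lightgbm', 'lightgbm']

# Fix: a default argument snapshots the current value at definition time.
early = [lambda n=name: n for name in frameworks]
print([f() for f in early])      # → ['sklearn', 'xgboost', 'lightgbm']
```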

Comment on lines +115 to +117
# Option 2: pip with GPU flag
pip install lightgbm --install-option=--gpu

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

LightGBM GPU installation pip 2025 modern method

💡 Result:

Modern (2025+) pip install methods for LightGBM GPU

A) Easiest: GPU via OpenCL using official PyPI wheels (Windows / Linux)

  1. Install LightGBM:
    pip install -U lightgbm
  2. Ensure an OpenCL runtime/driver is installed.
    • The LightGBM PyPI wheels include both CPU+GPU (OpenCL) support; you typically just need GPU drivers / OpenCL runtime present. (pypi.org)
  3. Use it:
    • In parameters, use device_type="gpu" (OpenCL). (GPU build details are in the official install guide.) (lightgbm.readthedocs.io)


B) NVIDIA CUDA (Linux): device_type="cuda" usually means build from source

If you specifically want the CUDA backend (device_type="cuda"), LightGBM treats it as a separate implementation and documents building with CMake using -DUSE_CUDA=ON (Linux; Windows CUDA backend not supported). (lightgbm.readthedocs.io)


C) AMD ROCm (Linux): install AMD’s ROCm wheel via pip

If you’re on AMD Instinct + ROCm, AMD provides a ROCm-enabled wheel:

pip install amd_lightgbm --extra-index-url=https://pypi.amd.com/rocm-7.0.2/simple

(or the matching ROCm version index). (rocm.docs.amd.com)


Update pip installation command for LightGBM GPU support.

The --install-option flag is deprecated in pip 21.3+ and removed in pip 23.1. However, the modern approach is simpler than using --config-settings: official LightGBM PyPI wheels include GPU (OpenCL) support built-in.

Update to:

# Option 2: pip with GPU (OpenCL) support
pip install lightgbm
# Ensure OpenCL drivers/runtime are installed for GPU acceleration

Note: If CUDA backend is needed instead of OpenCL, building from source with CMake is required (-DUSE_CUDA=ON). Add a note clarifying which GPU backend the benchmark uses and any driver prerequisites.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/nvforest/nvforest/benchmark/README.md` around lines 115 - 117, Update
the pip/GPU installation instructions in the README: replace the deprecated use
of pip's --install-option=--gpu with a simple "pip install lightgbm" and add a
short note that the PyPI wheel includes OpenCL GPU support and that users must
install appropriate OpenCL drivers/runtimes; also add a clarifying sentence that
CUDA backend requires building from source with CMake and -DUSE_CUDA=ON so
readers know which GPU backend the benchmark uses and what driver prerequisites
are required.
