Add comprehensive benchmark suite comparing cuforest against sklearn, XGBoost, and LightGBM native inference #32

dantegd wants to merge 16 commits into rapidsai:main
Conversation
- …st into fea-optimize-new-instance
- Co-authored-by: Simon Adorf <carl.simon.adorf@gmail.com>
- …arn, XGBoost, and LightGBM native inference
📝 Walkthrough / Summary by CodeRabbit

Adds a complete benchmarking suite for nvforest comparing regressor and classifier models against sklearn, XGBoost, and LightGBM on CPU and GPU. Includes documentation, framework integration, benchmark execution, result analysis with visualization, and comprehensive test coverage.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 4
🧹 Nitpick comments (7)
python/nvforest/nvforest/benchmark/analyze.py (2)
275-281: Chain the exception for consistent error handling. Similar to the earlier issue, chain the exception for better debugging.
♻️ Proposed fix
```diff
 try:
     plot_speedup_heatmaps(df, title=title, output_filename=output)
 except ImportError as e:
     if not plot_only:
         print(f"\nNote: {e}")
     else:
-        raise click.ClickException(str(e))
+        raise click.ClickException(str(e)) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 275 - 281, The except block for ImportError when calling plot_speedup_heatmaps should chain the original exception when re-raising so tracebacks include the root cause; replace the current raise click.ClickException(str(e)) with a chained raise (i.e., raise click.ClickException(str(e)) from e) in the analyze.py except block that handles ImportError around plot_speedup_heatmaps.
43-50: Chain the exception for better debugging context. When re-raising an exception, use `from err` to preserve the exception chain, making debugging easier.

♻️ Proposed fix
```diff
 try:
     import matplotlib.pyplot as plt
     import seaborn as sns
-except ImportError:
+except ImportError as err:
     raise ImportError(
         "matplotlib and seaborn are required for plotting. "
         "Install them with: pip install matplotlib seaborn"
-    )
+    ) from err
```

🤖 Prompt for AI Agents
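For illustration, the chaining behavior in a self-contained sketch (the missing module name here is hypothetical, standing in for matplotlib/seaborn):

```python
# Sketch of "raise ... from err": the original ImportError survives on
# __cause__, so tracebacks show the root cause of the failed import.

def load_plotting_backend():
    try:
        import not_a_real_plotting_lib  # noqa: F401 -- deliberately absent
    except ImportError as err:
        raise ImportError(
            "plotting libraries are required; install them first"
        ) from err

try:
    load_plotting_backend()
except ImportError as exc:
    assert isinstance(exc.__cause__, ModuleNotFoundError)  # chain preserved
```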
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 43 - 50, The ImportError being raised in the top-level import try/except in analyze.py should chain the original exception for better traceback context; in the except block that catches the import failure, capture the exception (e.g., "except ImportError as err:") and re-raise the new ImportError using "from err" so the original import error is preserved when raising the message about matplotlib/seaborn.

python/nvforest/nvforest/benchmark/benchmark.py (4)
357-430: Consider adding a Returns section to the docstring. The function has a good NumPy-style docstring but is missing the Returns section to document that it returns the trained model.
📝 Proposed docstring addition
```diff
 device : str
     Device to use for training ('cpu' or 'gpu'). Only affects XGBoost currently.
+
+Returns
+-------
+Any
+    The trained model instance.
 """
```

🤖 Prompt for AI Agents
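As a sketch, a NumPy-style docstring with the requested Returns block might look like this; the signature and the `regressor_class` attribute are assumptions taken from the review prompt, not the actual code:

```python
def train_model(framework, X_train, y_train):
    """Train a model for the given framework (illustrative sketch).

    Parameters
    ----------
    framework : object
        Wrapper assumed to expose a ``regressor_class`` attribute.
    X_train, y_train : array-like
        Training features and targets.

    Returns
    -------
    Any
        The trained model instance, returned after
        ``model.fit(X_train, y_train)``.
    """
    model = framework.regressor_class()
    model.fit(X_train, y_train)
    return model
```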
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 357 - 430, The docstring for train_model is missing a Returns section; add a NumPy-style "Returns" block to the function docstring (above the implementation in train_model) that documents the returned trained model (type Any or instance of framework.regressor_class/framework.classifier_class) and briefly states that the fitted model is returned after calling model.fit(X_train, y_train). Include the return type and a short description so callers and autodocs know what to expect.
85-97: Logger adds duplicate handlers if get_logger() is called multiple times. While currently called once at module level, if get_logger() were called again, it would add duplicate handlers. Consider guarding against this.

♻️ Proposed fix
```diff
 def get_logger():
     """Configure and return the benchmark logger."""
     logging.basicConfig(level=logging.INFO)
     logger = logging.getLogger(__name__)
+    if logger.handlers:
+        return logger  # Already configured
     logger.setLevel(logging.INFO)
     handler = logging.StreamHandler(sys.stderr)
     handler.setLevel(logging.INFO)
     logger.addHandler(handler)
     logger.propagate = False
     return logger
```

🤖 Prompt for AI Agents
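A runnable sketch of the guarded version (the logger name below is illustrative, not the module's real `__name__`):

```python
import logging
import sys

def get_logger(name="nvforest.benchmark.demo"):
    """Return a configured logger; add the stderr handler only once."""
    logger = logging.getLogger(name)
    if logger.handlers:
        return logger  # already configured: do not stack another handler
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(sys.stderr)
    handler.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger

first = get_logger()
second = get_logger()
assert first is second
assert len(first.handlers) == 1  # the second call added no duplicate handler
```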
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 85 - 97, get_logger currently always adds a StreamHandler which causes duplicate handlers if called multiple times; modify get_logger so it first obtains the logger via logging.getLogger(__name__) and only adds a new StreamHandler when the logger has no handlers (e.g., if not logger.handlers or not any(isinstance(h, logging.StreamHandler) for h in logger.handlers)); keep setting levels (logger.setLevel, handler.setLevel) and logger.propagate = False; ensure LOGGER = get_logger() behavior is unchanged.
604-616: Consider explicit GPU memory cleanup for X_device. X_device (a cupy array) is created inside the batch_size loop but not explicitly deleted. While Python's GC should handle this, explicit cleanup with del X_device after each iteration and a cp.get_default_memory_pool().free_all_blocks() call could help prevent GPU OOM during benchmarks with many configurations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 604 - 616, The benchmark loop creates a GPU array X_device when device == "gpu" (in benchmark.py) but never frees it; after each iteration where X_device was created, ensure you explicitly delete the reference (del X_device) and call cp.get_default_memory_pool().free_all_blocks() to release GPU memory, guarding the free call with a try/except if cupy may be unavailable and only invoking it when device == "gpu" and cp was successfully imported; place this cleanup at the end of the batch_size/config iteration to avoid GPU OOMs during long runs.
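A hedged sketch of the suggested cleanup; it degrades to a no-op when cupy is unavailable, and the array shape and loop body are placeholders rather than the benchmark's real code:

```python
# Per-iteration GPU cleanup sketch. cupy is optional here so the snippet
# also runs on CPU-only machines; the timed inference step is elided.
try:
    import cupy as cp
except ImportError:
    cp = None

def run_batches(batch_sizes, device="gpu"):
    for batch_size in batch_sizes:
        X_device = None
        if device == "gpu" and cp is not None:
            X_device = cp.zeros((batch_size, 8), dtype=cp.float32)
        # ... timed inference on X_device (or a host array) would go here ...
        del X_device  # drop the reference before the next configuration
        if device == "gpu" and cp is not None:
            try:
                cp.get_default_memory_pool().free_all_blocks()
            except Exception:
                pass  # best-effort: pool may be unavailable

run_batches([1, 32, 1024])
```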
219-235: GPU fallback doesn't use the cached availability check. The code sets _LIGHTGBM_GPU_AVAILABLE = False on failure but doesn't check it before attempting GPU prediction. This means every GPU prediction call will attempt GPU first, fail, then fall back to CPU. Consider checking the cached flag to skip the GPU attempt.

♻️ Proposed improvement
```diff
 if device == "gpu":
+    if _LIGHTGBM_GPU_CHECKED and not _LIGHTGBM_GPU_AVAILABLE:
+        return model.predict(X)
     try:
         # LightGBM GPU prediction - works if model was trained with GPU
         # and LightGBM was built with GPU support
         return model.predict(X, num_threads=1)
```

🤖 Prompt for AI Agents
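The short-circuit pattern the review describes, reduced to a self-contained sketch (the fake predictors below stand in for LightGBM's GPU and CPU paths):

```python
# Cache-then-fallback sketch: after the first failed GPU attempt, later
# calls skip straight to the CPU path instead of re-trying the GPU.
_GPU_CHECKED = False
_GPU_AVAILABLE = False
CALLS = {"gpu": 0, "cpu": 0}

def _predict_gpu(x):
    CALLS["gpu"] += 1
    raise RuntimeError("GPU unavailable")  # stands in for LightGBMError

def _predict_cpu(x):
    CALLS["cpu"] += 1
    return x * 2

def predict(x, device="gpu"):
    global _GPU_CHECKED, _GPU_AVAILABLE
    if device == "gpu":
        if _GPU_CHECKED and not _GPU_AVAILABLE:
            return _predict_cpu(x)  # known-bad GPU: short-circuit
        try:
            out = _predict_gpu(x)
            _GPU_CHECKED = _GPU_AVAILABLE = True
            return out
        except RuntimeError:
            _GPU_CHECKED, _GPU_AVAILABLE = True, False
    return _predict_cpu(x)

for v in (1, 2, 3):
    predict(v)
assert CALLS == {"gpu": 1, "cpu": 3}  # GPU was attempted exactly once
```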
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 219 - 235, The GPU fallback repeatedly attempts GPU because the cached flag isn't checked; update the device=="gpu" branch to short-circuit using the module flags: if _LIGHTGBM_GPU_CHECKED and _LIGHTGBM_GPU_AVAILABLE is False, call model.predict(X) immediately; otherwise try the GPU path (model.predict(X, num_threads=1)) inside the try/except and on success set _LIGHTGBM_GPU_CHECKED = True and _LIGHTGBM_GPU_AVAILABLE = True, and on lgb.basic.LightGBMError set _LIGHTGBM_GPU_CHECKED = True and _LIGHTGBM_GPU_AVAILABLE = False, log via LOGGER.warning, then fall back to model.predict(X). Ensure you reference the existing symbols _LIGHTGBM_GPU_CHECKED, _LIGHTGBM_GPU_AVAILABLE, LOGGER, and model.predict when making the change.

python/nvforest/nvforest/benchmark/README.md (1)
35-35: Clarify the default output directory path. The default is documented as benchmark/data/ but the actual code uses a path relative to the benchmark module (nvforest/benchmark/data/). Consider clarifying that this is relative to the package installation, or documenting the absolute path pattern users should expect.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/nvforest/nvforest/benchmark/README.md` at line 35, Update the README entry for `--output-dir` to clarify that the documented default `benchmark/data/` is actually relative to the installed package and points to the package resource directory `nvforest/benchmark/data/`; change the description for the `--output-dir`/`-o` row to state it is relative to the nvforest package installation (e.g., under site-packages as `<...>/nvforest/benchmark/data/`) or show the absolute pattern users should expect, so the `--output-dir` default documented matches the code's actual package-relative path.
ℹ️ Review info

Run configuration
- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: dd64f002-16f8-4baa-884a-9b5d8fac4046

📒 Files selected for processing (6)
- python/nvforest/nvforest/benchmark/README.md
- python/nvforest/nvforest/benchmark/__init__.py
- python/nvforest/nvforest/benchmark/analyze.py
- python/nvforest/nvforest/benchmark/benchmark.py
- python/nvforest/nvforest/benchmark/data/.gitkeep
- python/nvforest/tests/test_benchmark.py
```python
# Handle single row/column case
if len(max_depth_values) == 1:
    axes = axes.reshape(1, -1)
if len(num_trees_values) == 1:
    axes = axes.reshape(-1, 1)
```
Potential error when both dimensions equal 1.
When len(max_depth_values) == 1 and len(num_trees_values) == 1, plt.subplots(1, 1) returns a single Axes object (not an array), and calling .reshape() on it will raise AttributeError.
🐛 Proposed fix

```diff
 # Handle single row/column case
-if len(max_depth_values) == 1:
+if len(max_depth_values) == 1 and len(num_trees_values) == 1:
+    axes = np.array([[axes]])
+elif len(max_depth_values) == 1:
     axes = axes.reshape(1, -1)
-if len(num_trees_values) == 1:
+elif len(num_trees_values) == 1:
     axes = axes.reshape(-1, 1)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/nvforest/nvforest/benchmark/analyze.py` around lines 64 - 68, The axes
reshaping logic in analyze.py can fail when both len(max_depth_values)==1 and
len(num_trees_values)==1 because plt.subplots(1, 1) returns a single Axes object
(not an array); update the code that currently checks max_depth_values and
num_trees_values and reshapes axes to first ensure axes is an array (e.g., wrap
single Axes into a 2D array or use numpy.atleast_2d/np.array) before calling
.reshape, or handle the single-Axes case explicitly so axes is always an
indexable 2D structure; adjust the branches that reference axes,
max_depth_values, num_trees_values and plt.subplots accordingly.
```python
def _check_lightgbm_gpu_available() -> bool:
    """Check if LightGBM was built with GPU support."""
    try:
        # Try to create a dummy dataset and check if GPU device works
        # This is a lightweight check that doesn't require actual training
        params = {"device": "gpu", "gpu_platform_id": 0, "gpu_device_id": 0}
        _ = lgb.Dataset([], params=params)
        # LightGBM will raise an error if GPU is not available when we try to use it
        return True  # Assume available, will fail gracefully during training/predict
    except Exception:
        return False


# Cache for LightGBM GPU availability
_LIGHTGBM_GPU_CHECKED = False
_LIGHTGBM_GPU_AVAILABLE = False
```
🧩 Analysis chain

🏁 Scripts executed (repository: rapidsai/nvforest):

```shell
# Check if _check_lightgbm_gpu_available is called anywhere
rg "_check_lightgbm_gpu_available" python/nvforest/ --type py

# Check if the cache variables are used
rg "_LIGHTGBM_GPU_CHECKED|_LIGHTGBM_GPU_AVAILABLE" python/nvforest/ --type py -B 2 -A 2

# Also check the _predict_lightgbm function to understand cache usage
rg -A 30 "def _predict_lightgbm" python/nvforest/nvforest/benchmark/benchmark.py

# Check where _LIGHTGBM_GPU_AVAILABLE is read/used
rg "_LIGHTGBM_GPU_AVAILABLE" python/nvforest/ --type py -B 3 -A 3
```
Remove unused _check_lightgbm_gpu_available() function and redundant _LIGHTGBM_GPU_AVAILABLE cache variable.
_check_lightgbm_gpu_available() is defined but never called and should be removed. Additionally, while _LIGHTGBM_GPU_CHECKED is correctly used to deduplicate warning logs, _LIGHTGBM_GPU_AVAILABLE is set in the exception handler but never actually read anywhere in the code, making it redundant.
🧰 Tools
🪛 Ruff (0.15.2)
- [warning] 192: Consider moving this statement to an else block (TRY300)
- [warning] 193: Do not catch blind exception: Exception (BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 184 - 199,
Remove the unused helper and redundant cache: delete the entire
_check_lightgbm_gpu_available() function and remove the _LIGHTGBM_GPU_AVAILABLE
variable (and any assignments to it) from the module; keep and continue using
_LIGHTGBM_GPU_CHECKED for deduping warnings, and ensure any code that previously
set _LIGHTGBM_GPU_AVAILABLE instead updates/uses only _LIGHTGBM_GPU_CHECKED or
checks LightGBM availability directly where needed (refer to symbols
_check_lightgbm_gpu_available, _LIGHTGBM_GPU_AVAILABLE, and
_LIGHTGBM_GPU_CHECKED).
```python
try:
    native_time = run_inference_benchmark(
        lambda batch,
        m=native_model,
        d=device: framework.predict_native(
            m, batch, d
        ),
        native_data,
        batch_size,
    )
```
Lambda captures loop variable framework by reference.
The lambda correctly binds native_model and device as default arguments but captures framework by reference. While safe in the current code structure (lambda is used immediately within the same iteration), this is a latent bug waiting to happen if the code is refactored.
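The late-binding hazard is easy to demonstrate in isolation:

```python
# Closures capture variables, not values: every lambda in `late` reads the
# loop variable after the loop has finished, while `bound` freezes each
# iteration's value via a default argument -- the same idiom as the fix.
late, bound = [], []
for fw in ["sklearn", "xgboost", "lightgbm"]:
    late.append(lambda: fw)
    bound.append(lambda f=fw: f)

assert [f() for f in late] == ["lightgbm"] * 3   # all see the last value
assert [f() for f in bound] == ["sklearn", "xgboost", "lightgbm"]
```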
🐛 Proposed fix

```diff
 try:
     native_time = run_inference_benchmark(
         lambda batch,
         m=native_model,
-        d=device: framework.predict_native(
+        d=device,
+        f=framework: f.predict_native(
             m, batch, d
         ),
         native_data,
         batch_size,
     )
```

🧰 Tools
🪛 Ruff (0.15.2)
- [warning] 632: Function definition does not bind loop variable framework (B023)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/nvforest/nvforest/benchmark/benchmark.py` around lines 628 - 637, The
lambda passed into run_inference_benchmark captures framework by reference which
can lead to late-binding bugs; explicitly bind framework into the lambda's
defaults (e.g., add f=framework to the lambda signature) so that the call to
framework.predict_native uses the current iteration's framework instance; update
the lambda used in run_inference_benchmark (where native_model, device,
native_data are used) to include that extra default argument so predict_native
is called on the correct framework.
```shell
# Option 2: pip with GPU flag
pip install lightgbm --install-option=--gpu
```
🧩 Analysis chain

🌐 Web query: LightGBM GPU installation pip 2025 modern method
💡 Result: Modern (2025+) pip install methods for LightGBM GPU

A) Easiest: GPU via OpenCL using official PyPI wheels (Windows / Linux)
- Install LightGBM: pip install -U lightgbm
- Ensure an OpenCL runtime/driver is installed.
- The LightGBM PyPI wheels include both CPU+GPU (OpenCL) support; you typically just need GPU drivers / an OpenCL runtime present. (pypi.org)
- In parameters, use device_type="gpu" (OpenCL). GPU build details are in the official install guide. (lightgbm.readthedocs.io)

Notes:
- This OpenCL GPU path is Windows + Linux only; macOS GPU is not supported. (lightgbm.readthedocs.io)

B) NVIDIA CUDA (Linux): device_type="cuda" usually means build from source. If you specifically want the CUDA backend (device_type="cuda"), LightGBM treats it as a separate implementation and documents building with CMake using -DUSE_CUDA=ON (Linux; the Windows CUDA backend is not supported). (lightgbm.readthedocs.io)

C) AMD ROCm (Linux): install AMD's ROCm wheel via pip. If you're on AMD Instinct + ROCm, AMD provides a ROCm-enabled wheel: pip install amd_lightgbm --extra-index-url=https://pypi.amd.com/rocm-7.0.2/simple (or the matching ROCm version index). (rocm.docs.amd.com)

Citations:
- https://pypi.org/project/lightgbm/
- https://lightgbm.readthedocs.io/en/stable/Installation-Guide.html
- https://rocm.docs.amd.com/projects/lightgbm/en/docs-25.11/install/install.html
Update the pip installation command for LightGBM GPU support.

The --install-option flag is deprecated in pip 21.3+ and removed in pip 23.1. However, the modern approach is simpler than using --config-settings: official LightGBM PyPI wheels include GPU (OpenCL) support built-in.

Update to:

```shell
# Option 2: pip with GPU (OpenCL) support
pip install lightgbm
# Ensure OpenCL drivers/runtime are installed for GPU acceleration
```

Note: If the CUDA backend is needed instead of OpenCL, building from source with CMake is required (-DUSE_CUDA=ON). Add a note clarifying which GPU backend the benchmark uses and any driver prerequisites.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/nvforest/nvforest/benchmark/README.md` around lines 115 - 117, Update
the pip/GPU installation instructions in the README: replace the deprecated use
of pip's --install-option=--gpu with a simple "pip install lightgbm" and add a
short note that the PyPI wheel includes OpenCL GPU support and that users must
install appropriate OpenCL drivers/runtimes; also add a clarifying sentence that
CUDA backend requires building from source with CMake and -DUSE_CUDA=ON so
readers know which GPU backend the benchmark uses and what driver prerequisites
are required.
PR adds a comprehensive benchmark suite for comparing cuforest inference performance against native ML framework inference.
Features
- --framework/-f: Select specific frameworks
- --dry-run/-n: Preview configuration without running
- --quick-test/-q: Minimal parameters for quick verification
- --device/-d: Select cpu, gpu, or both
- --model-type/-m: Select regressor, classifier, or both

Usage
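A hedged argparse sketch of the flag surface listed above; the real benchmark CLI uses click per the review, and only the option names here come from the PR description:

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="benchmark CLI sketch")
    p.add_argument("--framework", "-f", action="append",
                   help="select specific frameworks (repeatable)")
    p.add_argument("--dry-run", "-n", action="store_true",
                   help="preview configuration without running")
    p.add_argument("--quick-test", "-q", action="store_true",
                   help="minimal parameters for quick verification")
    p.add_argument("--device", "-d", choices=["cpu", "gpu", "both"],
                   default="both")
    p.add_argument("--model-type", "-m",
                   choices=["regressor", "classifier", "both"],
                   default="both")
    return p

args = build_parser().parse_args(["-f", "xgboost", "--device", "gpu", "-n"])
assert args.framework == ["xgboost"] and args.device == "gpu" and args.dry_run
```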
Files
- cuforest/benchmark/benchmark.py - Main benchmark with CLI
- cuforest/benchmark/analyze.py - Results analysis and visualization
- cuforest/benchmark/README.md - Documentation

todo:
Note: Builds on top of #31