Fix three critical catboost engine bugs by viv-analytics · Pull Request #125 · tidymodels/bonsai

viv-analytics · 2026-04-18T19:25:18Z

Summary

objective alias crash: process_loss_function() was checking only for "loss_function" before auto-injecting a loss. If the user passed objective via set_engine(), bonsai still injected loss_function, triggering CatBoost's hard C++ error: "Only one of the parameters [loss_function, objective] should be initialized." Fixed by extending the guard to c("loss_function", "objective").
Synonym alias collision: CatBoost hard-errors when synonym aliases (n_estimators, eta, max_depth, etc.) are passed alongside the canonical parameter name that bonsai injects by default. Added check_catboost_aliases() + catboost_aliases table (modelled on the existing check_lightgbm_aliases()) that errors early with a helpful message directing users to the parsnip argument instead.
stop_iter non-functional: train_catboost() never passed a test_pool to catboost.train(), so the overfitting detector (od_type = "Iter") had no evaluation data and silently did nothing. Added a validation argument (mirrors LightGBM's pattern): when early_stopping_rounds is set, a held-out pool is built and passed as test_pool. When validation is omitted the training data is reused as the evaluation set (same safe fallback as LightGBM).

Test plan

catboost does not inject loss_function when objective is already set — regression and multiclass with objective = pass without error
catboost errors on synonym aliases passed to set_engine() — n_estimators and eta both error with "alias for a main model argument"
stop_iter is functional when validation is supplied — direct API with validation = 0.2, fallback with no validation, and parsnip interface all pass
All pre-existing catboost tests continue to pass (60 pass, 2 pre-existing snapshot skips unchanged)

🤖 Generated with Claude Code

- `process_loss_function()` now checks for both `"loss_function"` and `"objective"` before auto-injecting, preventing CatBoost's hard-error when the user supplies `objective` via `set_engine()`. - Add `check_catboost_aliases()` + `catboost_aliases` table (modelled on `check_lightgbm_aliases()`): errors early with a helpful message when a CatBoost synonym alias (e.g. `n_estimators`, `eta`, `max_depth`) is passed to `set_engine()`, which would otherwise cause a C++-level duplicate-parameter hard-error. - Add `validation` argument to `train_catboost()` (mirrors LightGBM's pattern): when `early_stopping_rounds` is set, a `test_pool` is now built and passed to `catboost.train()`. Without this, the overfitting detector was silently non-functional. When `validation` is omitted the training data is reused as the evaluation set (same fallback as LightGBM). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix three critical catboost engine bugs#125

Fix three critical catboost engine bugs#125
viv-analytics wants to merge 1 commit into
tidymodels:mainfrom
viv-analytics:fix/catboost-critical-issues

viv-analytics commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

viv-analytics commented Apr 18, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant