Skip to content

Fix three critical catboost engine bugs#125

Open
viv-analytics wants to merge 1 commit into
tidymodels:mainfrom
viv-analytics:fix/catboost-critical-issues
Open

Fix three critical catboost engine bugs#125
viv-analytics wants to merge 1 commit into
tidymodels:mainfrom
viv-analytics:fix/catboost-critical-issues

Conversation

@viv-analytics
Copy link
Copy Markdown

Summary

  • objective alias crash: process_loss_function() was checking only for "loss_function" before auto-injecting a loss. If the user passed objective via set_engine(), bonsai still injected loss_function, triggering CatBoost's hard C++ error: "Only one of the parameters [loss_function, objective] should be initialized." Fixed by extending the guard to c("loss_function", "objective").

  • Synonym alias collision: CatBoost hard-errors when synonym aliases (n_estimators, eta, max_depth, etc.) are passed alongside the canonical parameter name that bonsai injects by default. Added check_catboost_aliases() + catboost_aliases table (modelled on the existing check_lightgbm_aliases()) that errors early with a helpful message directing users to the parsnip argument instead.

  • stop_iter non-functional: train_catboost() never passed a test_pool to catboost.train(), so the overfitting detector (od_type = "Iter") had no evaluation data and silently did nothing. Added a validation argument (mirrors LightGBM's pattern): when early_stopping_rounds is set, a held-out pool is built and passed as test_pool. When validation is omitted the training data is reused as the evaluation set (same safe fallback as LightGBM).

Test plan

  • catboost does not inject loss_function when objective is already set — regression and multiclass with objective = pass without error
  • catboost errors on synonym aliases passed to set_engine()n_estimators and eta both error with "alias for a main model argument"
  • stop_iter is functional when validation is supplied — direct API with validation = 0.2, fallback with no validation, and parsnip interface all pass
  • All pre-existing catboost tests continue to pass (60 pass, 2 pre-existing snapshot skips unchanged)

🤖 Generated with Claude Code

- `process_loss_function()` now checks for both `"loss_function"` and
  `"objective"` before auto-injecting, preventing CatBoost's hard-error
  when the user supplies `objective` via `set_engine()`.

- Add `check_catboost_aliases()` + `catboost_aliases` table (modelled on
  `check_lightgbm_aliases()`): errors early with a helpful message when a
  CatBoost synonym alias (e.g. `n_estimators`, `eta`, `max_depth`) is
  passed to `set_engine()`, which would otherwise cause a C++-level
  duplicate-parameter hard-error.

- Add `validation` argument to `train_catboost()` (mirrors LightGBM's
  pattern): when `early_stopping_rounds` is set, a `test_pool` is now
  built and passed to `catboost.train()`. Without this, the overfitting
  detector was silently non-functional. When `validation` is omitted the
  training data is reused as the evaluation set (same fallback as LightGBM).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant