Fix three critical catboost engine bugs #125
Open
viv-analytics wants to merge 1 commit into
- `process_loss_function()` now checks for both `"loss_function"` and `"objective"` before auto-injecting, preventing CatBoost's hard error when the user supplies `objective` via `set_engine()`.
- Add `check_catboost_aliases()` + `catboost_aliases` table (modelled on `check_lightgbm_aliases()`): errors early with a helpful message when a CatBoost synonym alias (e.g. `n_estimators`, `eta`, `max_depth`) is passed to `set_engine()`, which would otherwise cause a C++-level duplicate-parameter hard error.
- Add `validation` argument to `train_catboost()` (mirrors LightGBM's pattern): when `early_stopping_rounds` is set, a `test_pool` is now built and passed to `catboost.train()`. Without this, the overfitting detector was silently non-functional. When `validation` is omitted, the training data is reused as the evaluation set (same fallback as LightGBM).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
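For reviewers, the first two changes can be illustrated with a minimal base-R sketch. This is not the code in the diff: `inject_loss()` and `check_aliases()` are hypothetical stand-ins for `process_loss_function()` and `check_catboost_aliases()`, the `"RMSE"` default is an assumption, and the alias table only lists the three aliases named above, with the parsnip argument each one is assumed to map to.

```r
# Hypothetical alias table: CatBoost synonym -> parsnip main argument.
# Only the three aliases named in this PR are listed; the mapping on the
# right-hand side is an assumption for illustration.
catboost_aliases <- c(
  n_estimators = "trees",
  eta          = "learn_rate",
  max_depth    = "tree_depth"
)

# Error early if any engine argument is a known synonym alias,
# instead of letting CatBoost fail later with a duplicate-parameter error.
check_aliases <- function(engine_args) {
  hits <- intersect(names(engine_args), names(catboost_aliases))
  if (length(hits) > 0) {
    stop(
      "`", hits[1], "` is an alias for a main model argument; ",
      "use `", catboost_aliases[[hits[1]]], "` instead.",
      call. = FALSE
    )
  }
  invisible(engine_args)
}

# Inject a default loss only when neither spelling is already present.
# The "RMSE" default is an assumption for this sketch.
inject_loss <- function(params, default_loss = "RMSE") {
  if (!any(c("loss_function", "objective") %in% names(params))) {
    params$loss_function <- default_loss
  }
  params
}
```

With this guard, `inject_loss(list(objective = "MAE"))` leaves the list unchanged, while `check_aliases(list(eta = 0.1))` stops before anything reaches CatBoost.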
Summary
- `objective` alias crash: `process_loss_function()` was checking only for `"loss_function"` before auto-injecting a loss. If the user passed `objective` via `set_engine()`, bonsai still injected `loss_function`, triggering CatBoost's hard C++ error: "Only one of the parameters [loss_function, objective] should be initialized." Fixed by extending the guard to `c("loss_function", "objective")`.
- Synonym alias collision: CatBoost hard-errors when synonym aliases (`n_estimators`, `eta`, `max_depth`, etc.) are passed alongside the canonical parameter name that bonsai injects by default. Added `check_catboost_aliases()` + `catboost_aliases` table (modelled on the existing `check_lightgbm_aliases()`) that errors early with a helpful message directing users to the parsnip argument instead.
- `stop_iter` non-functional: `train_catboost()` never passed a `test_pool` to `catboost.train()`, so the overfitting detector (`od_type = "Iter"`) had no evaluation data and silently did nothing. Added a `validation` argument (mirrors LightGBM's pattern): when `early_stopping_rounds` is set, a held-out pool is built and passed as `test_pool`. When `validation` is omitted, the training data is reused as the evaluation set (same safe fallback as LightGBM).

Test plan
- `catboost does not inject loss_function when objective is already set`: regression and multiclass with `objective =` pass without error
- `catboost errors on synonym aliases passed to set_engine()`: `n_estimators` and `eta` both error with "alias for a main model argument"
- `stop_iter is functional when validation is supplied`: direct API with `validation = 0.2`, fallback with no validation, and parsnip interface all pass

🤖 Generated with Claude Code
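
The `validation` fallback can be sketched in base R as follows. This is an illustration under assumptions, not the implementation in the diff: `split_for_early_stopping()` is a hypothetical helper, `validation` is treated as a held-out proportion (as `validation = 0.2` in the test plan suggests), and building the actual CatBoost pools from the returned row indices is left out.

```r
# Split row indices into training and evaluation sets.
# `validation` is the proportion of rows to hold out; 0 means
# "reuse the training rows as the evaluation set" (the fallback).
split_for_early_stopping <- function(n, validation = 0) {
  stopifnot(validation >= 0, validation < 1)
  all_rows <- seq_len(n)
  if (validation > 0) {
    # Hold out at least one row so the evaluation set is never empty.
    n_eval <- max(1L, floor(n * validation))
    eval_rows <- sample(all_rows, n_eval)
    list(train = setdiff(all_rows, eval_rows), eval = eval_rows)
  } else {
    # Fallback: evaluate on the same rows used for training.
    list(train = all_rows, eval = all_rows)
  }
}
```

Either way the evaluation set is non-empty, so the overfitting detector always has data to monitor. Note that under the fallback the metric is computed on the training rows, so early stopping may trigger late or not at all.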