
Conversation

@s-marton

Add GRANDE method for TabArena benchmarking

Description

This PR introduces GRANDE, a novel method for learning axis-aligned decision tree ensembles with gradient descent, into the TabArena benchmark.

Changes included

  • Added GRANDE implementation in:
    • tabarena/tabarena/benchmark/models/ag/grande/
    • tabarena/tabarena/models/grande/
  • Updated the method registration in model_registry.py and models/utils.py to include GRANDE
  • Added test tst/benchmark/models/test_grande.py to verify correct integration and basic functionality

Evaluation

  • GRANDE was benchmarked on TabArena Lite, as well as on an extended set of datasets using folds 0, 1, and 2 (the reported results are based on this extended evaluation).
  • Results show that GRANDE achieves strong performance, particularly on binary classification and regression tasks. Performance on multiclass tasks is currently lower, which drags down the overall benchmark results. Ongoing work is focused on improving this.


[Figure 1: GRANDE results on TabArena folds 0, 1, and 2.]
[Figure 2a (tuning-impact-elo-horizontal_bin): GRANDE binary classification results on TabArena folds 0, 1, and 2.]
[Figure 2b (tuning-impact-elo-horizontal_reg): GRANDE regression results on TabArena folds 0, 1, and 2.]
[Figure 2c (tuning-impact-elo-horizontal_multi): GRANDE multi-class classification results on TabArena folds 0, 1, and 2.]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@LennartPurucker
Collaborator

Heyho @s-marton, great work on the PR! This looks great, and the results are awesome!

I will try to get back to you ASAP and start a run on my end so that I can run on all folds/splits afterwards. I kindly ask for some patience: the winter break begins on Wednesday, so it might take me a bit longer to take a closer look.

@LennartPurucker
Collaborator

One initial thought: I see a fixed n_estimators in the search space. I assume this is the upper limit(?). Is this limit large enough? For other boosting methods, we often see upper limits on the scale of 10k.


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@s-marton
Author

> Heyho @s-marton, great work on the PR! This looks great, and the results are awesome!
>
> I will try to get back to you ASAP and start a run on my end so that I can run on all folds/splits afterwards. I kindly ask for some patience: the winter break begins on Wednesday, so it might take me a bit longer to take a closer look.

Hey, no problem at all! Take your time, and enjoy the winter break. Looking forward to hearing from you whenever you get a chance.

> One initial thought: I see a fixed n_estimators in the search space. I assume this is the upper limit(?). Is this limit large enough? For other boosting methods, we often see upper limits on the scale of 10k.

Regarding your question on n_estimators: this is fixed intentionally. Unlike in GBDTs, all estimators in GRANDE are trained in parallel rather than sequentially, so this number always corresponds to the exact number of estimators in the ensemble. Increasing it further usually does not yield a notable performance gain and mainly adds computational and memory overhead.
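To make the contrast concrete, here is a minimal, purely illustrative sketch of the two kinds of search-space entries; the parameter names, ranges, and values below are hypothetical, not the actual TabArena definitions:

```python
# Purely illustrative -- names and values are hypothetical, not the
# actual TabArena search-space definitions.

# Sequential boosting (e.g., a GBDT): n_estimators is a budget that the
# tuner explores, so the upper limit is typically very large (~10k).
gbdt_search_space = {
    "n_estimators": (100, 10_000),  # tuned within this range
    "learning_rate": (1e-3, 0.3),
}

# GRANDE: all trees are optimized jointly with gradient descent, so
# n_estimators is the exact ensemble size and is fixed up front.
grande_search_space = {
    "n_estimators": 1024,  # fixed; the exact value here is illustrative
    "learning_rate_weights": (1e-4, 0.25),  # hypothetical tuned parameter
}
```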

@LennartPurucker
Collaborator

> Unlike in GBDTs, all estimators in GRANDE are trained in parallel

Ah, gotcha! I should read the paper as well! :)

@dholzmueller
Contributor

It seems that the multi-class plot above is in fact the regression plot and vice versa. So GRANDE's weakness is regression tasks, not multi-class tasks.

@s-marton
Author


> It seems that the multi-class plot above is in fact the regression plot and vice versa. So GRANDE's weakness is regression tasks, not multi-class tasks.

I just double-checked the subsets, and you're absolutely right: the issue is with regression. That also makes more sense conceptually and gives a clearer direction for improving the results. I'll look into it and hopefully push an update soon.

@LennartPurucker
Collaborator

Heyho, here are some results from my initial evaluation on TabArena-Lite with 50 configs (for now).

The results seem a bit worse, but that is reasonable given the reduced HPO and the fact that we now force early stopping after 1 hour. Yet there seems to be a problem with the default performance for regression. I am not quite sure what triggered this change and will have to investigate.

In general, from my work on the refactor, it seems like it would be great to restructure the PR so that we pip install GRANDE from the official repository and it can function as a standalone package. @s-marton, is this something you are working towards?

[Figure: results on all tasks]
[Figure (tuning-impact-elo-horizontal_bin): binary classification]
[Figure (tuning-impact-elo-horizontal_reg): regression]
[Figure (tuning-impact-elo-horizontal_multi): multi-class classification]
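On the standalone-package point, a minimal sketch of what pip-installed usage could look like. The package name, import path, constructor arguments, and method signatures below are assumptions for illustration, not the confirmed API of the GRANDE package:

```python
# Hypothetical usage sketch, assuming GRANDE is published as a standalone
# package with a scikit-learn-style interface; the import path and the
# constructor/method signatures are NOT confirmed.
#
#   pip install GRANDE   # assumed package name
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from GRANDE import GRANDE  # assumed import path

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameter names and values are illustrative placeholders.
model = GRANDE(params={"n_estimators": 1024}, args={"epochs": 100})
model.fit(X_train, y_train)
preds = model.predict(X_test)
```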

@s-marton
Author

Hi @LennartPurucker,
thanks for the update! In general, I think it is reasonable that the performance with 50 trials is a bit worse, considering that GRANDE benefits more from tuning and ensembling than some baselines, and that the 1-hour early stopping could also trigger for some larger datasets depending on the model configuration.

The default performance for regression, however, is surprising. I will take a look at this as well. The GRANDE repo is currently a bit behind, but I am planning to update it soon, so it should be possible to refactor the PR to use a pip install. I will take care of this shortly.

@LennartPurucker
Collaborator

Great to hear @s-marton! Let me know once I should take another look. After ICML, I should also have enough compute to run more configs!
