
Conversation

@BKHMSI (Contributor) commented Feb 10, 2026

No description provided.

@KartikP (Contributor) commented Feb 10, 2026

Hi @BKHMSI, did you intend to remove Pereira2018.243sentences-linear?

@BKHMSI (Contributor, Author) commented Feb 10, 2026

> Hi @BKHMSI, did you intend to remove Pereira2018.243sentences-linear?

Yes, I did intend to change all benchmarks to use ridge regression instead of linear regression.
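
For context, the switch only concerns the readout that maps model representations onto the recorded data. A rough sketch of the two readouts using scikit-learn on synthetic data (the actual Brain-Score metrics wrap their own cross-validated regression, so the estimators and alphas below are illustrative rather than the benchmarks' exact settings):

        import numpy as np
        from sklearn.linear_model import LinearRegression, RidgeCV

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 300))                      # model activations: stimuli x features
        y = X @ rng.normal(size=300) + rng.normal(size=200)  # synthetic neural recordings

        # Ordinary least squares: no regularization, tends to overfit when the
        # number of features approaches or exceeds the number of stimuli.
        linear = LinearRegression().fit(X, y)

        # Ridge with built-in cross-validation over the regularization strengths.
        ridge = RidgeCV(alphas=np.logspace(-3, 3, 7)).fit(X, y)
        print(linear.score(X, y), ridge.score(X, y), ridge.alpha_)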

@mschrimpf (Member)

Let's keep the original ones for reference (at least in code; they don't have to be displayed on the website). I agree we should use ridge going forward.

@BKHMSI (Contributor, Author) commented Feb 10, 2026

Re-added the linear metrics for all benchmarks and fixed the ceiling for ridge.

Note that for Pereira2018, we need to cache ceilings for the new metrics.
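
Keeping both variants available could look roughly like the usual brainscore_language plugin registration, with the linear identifiers retained alongside new ridge ones; the factory functions below are placeholders and the "-ridge" suffix is an assumed naming choice for illustration:

        from brainscore_language import benchmark_registry

        def _pereira2018_243sentences_linear():   # placeholder factory for the original linear benchmark
            ...

        def _pereira2018_243sentences_ridge():    # placeholder factory for the ridge variant
            ...

        # keep the original linear readout registered for reference/reproducibility ...
        benchmark_registry['Pereira2018.243sentences-linear'] = _pereira2018_243sentences_linear
        # ... while the ridge variant is the one surfaced on the leaderboard going forward
        benchmark_registry['Pereira2018.243sentences-ridge'] = _pereira2018_243sentences_ridge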

@mike-ferguson (Member) commented Feb 10, 2026

Hi all,

Please note a few things about the state of the language repo:

  1. Automerging is currently disabled for benchmark-only PRs. Merging will require manual approval by a Brain-Score admin (most likely Kartik, me, or Martin).
  2. Scoring is disabled as well until the language scoring infrastructure comes online (~end of next week).

KartikP added a commit that referenced this pull request Feb 11, 2026
@KartikP (Contributor) commented Feb 12, 2026

Hi @BKHMSI, thanks for the PR. I had a chance to take a look and noticed a few things that require attention:

  1. Please provide a description. A simple explanation of what you've done and why would suffice.

  2. In the benchmark factory, you pass the groupkfold setting to the linear benchmarks via CV_kwargs, which breaks backwards compatibility with the linear variants of the benchmarks. Given that the intention is to hide linear on the leaderboard and use RidgeCV moving forward, could you just not pass any kwargs? Otherwise, the tests should also be updated to reflect the change.

  3. Missing numpy import in blank2014/benchmark.py, fedorenko2016/benchmark.py, and tuckute2024/benchmark.py.

  4. Benchmarks return a dict instead of a Score object. This breaks the way the Score object is parsed to populate the DB -> leaderboard. My recommendation (a fuller sketch follows this list):

        score = Score(np.mean(list(layer_scores.values())))
        score.attrs['layer_scores'] = layer_scores

  5. You've added a substantial amount of code (RidgeGCV, Ridge benchmark variants, etc.) yet no tests. To ensure that your additions continue to operate as expected, please consider adding tests for them.
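
Spelled out slightly more, assuming Score from brainscore_core.metrics and a layer_scores dict mapping layer names to per-layer scores (the helper name below is illustrative, not existing code):

        import numpy as np
        from brainscore_core.metrics import Score

        def aggregate_layer_scores(layer_scores: dict) -> Score:
            # illustrative helper: collapse per-layer scores into one benchmark Score
            score = Score(np.mean(list(layer_scores.values())))
            # keep the per-layer breakdown without breaking the Score parsing
            # that populates the DB -> leaderboard
            score.attrs['layer_scores'] = layer_scores
            return score

        score = aggregate_layer_scores({'layer1': 0.31, 'layer2': 0.44})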

I've attempted to address all of these issues in #361. The most significant differences are:

  1. Return a Score object with layer_scores, raw, and ceiling as attributes. This was necessary because the dict was breaking the downstream benchmark API.
  2. Default.kfold was set to False instead of "group" to ensure backwards compatibility of the cross-validation (illustrated after this list). This was the main data-integrity risk.
  3. Added the missing imports (numpy and scipy.linalg).
  4. Added the missing coords (Blank2014 never added a story coord and Fedorenko2016 never added a sentence_id coord).
  5. Removed CV_kwargs from the linear benchmarks.
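
For anyone unfamiliar with the distinction behind point 2: plain k-fold splits stimuli freely across folds, whereas group k-fold keeps all stimuli sharing a group label (e.g. the same story or sentence) in a single fold, which is also why the missing story / sentence_id coords mattered. A minimal scikit-learn illustration with made-up data (the benchmarks' actual cross-validation lives in the metric code, so this is only meant to show the difference):

        import numpy as np
        from sklearn.model_selection import KFold, GroupKFold

        X = np.arange(12).reshape(12, 1)      # 12 stimuli, 1 feature
        groups = np.tile([0, 1, 2, 3], 3)     # e.g. 4 stories, sentences interleaved

        # Plain k-fold: sentences from the same story land in both train and test.
        train, test = next(iter(KFold(n_splits=4).split(X)))
        assert set(groups[train]) & set(groups[test])    # story overlap across the split

        # Group k-fold: each story is held out as a whole, never split across folds.
        for train, test in GroupKFold(n_splits=4).split(X, groups=groups):
            assert len(set(groups[test])) == 1           # each test fold is a single story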

If #361 looks good to you, please let me know; otherwise, I hope it can still be of benefit to you.
