
Add Model Compare Tab #509

Merged
KartikP merged 6 commits into master from kp/model-compare on Mar 5, 2026

Conversation

@KartikP (Contributor) commented Feb 11, 2026

compare.models.mov (video demo attachment)

@mike-ferguson mike-ferguson self-requested a review on February 11, 2026 21:20
@KartikP KartikP requested review from mike-ferguson and removed request for mike-ferguson on February 17, 2026 18:52
@mschrimpf (Member)

This is great!

Some nitpicks:

Per-Benchmark Score Correlation

  1. Can we use the same visualization as for compare-benchmarks? For instance, Pearson R etc. are shown differently in the two plots. In fact, they're even shown differently within the compare-models plot: there is a box at the top detailing the stats, and then another one with the same information in the top left. I don't care much which one we use, as long as we show the information once and uniformly. It would also be nice to show the Brain-Score logo somewhere on the plot so that it's visible when people re-post the plot.
  2. The search is great! Can we just make it a bit more forgiving? For instance, searching for "resnet 50 sin" does not yield resnet50-SIN. (Going forward, it would be great to include our tags here too, so that you can e.g. search for "transformer".)
  3. Could we include more information about each model that is selected? For instance the rank, who contributed it, a link to the model page (like a mini model card). I would put this as two boxes below the search instead of the current stats.
  4. At least according to the legend, scores are repeated: average vision includes everything, neural includes {V1, V2, V4, IT}. I would say we either let the user choose the level, or we keep it at {V1, V2, V4, IT, Behavior} -- i.e. gray out engineering by default.
  5. For the individual benchmark dots, is it possible to make them clickable? I.e. link to the benchmark page? (Tooltip could even be some of the benchmark stimuli, but just the link would already be great.)
  6. (When clicking on e.g. V1, I would have expected that only those scores remain and everything else goes away. But this is only a minor inconvenience and I guess otherwise it wouldn't be possible to filter individually.)
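The forgiving-search request in point 2 boils down to normalizing both the query and the model name before matching, so that spacing, case, and separators don't matter. A minimal sketch (the function names are illustrative, not the site's actual search code):

```python
import re


def normalize(text: str) -> str:
    # Lowercase and drop all non-alphanumeric characters so that
    # "resnet 50 sin" and "resnet50-SIN" reduce to the same string.
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def matches(query: str, model_name: str) -> bool:
    # A query matches when its normalized form is a substring of the
    # normalized model name.
    return normalize(query) in normalize(model_name)
```

With this, `matches("resnet 50 sin", "resnet50-SIN")` holds, while unrelated queries like `"transformer"` still miss; tag-based search would then be a separate lookup layered on top.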

Top Benchmark Differences

Move the title text a tiny bit higher; it overlaps with the bars.
(screenshot: title text overlapping the bars)

Mean Score by Domain

Let's show the individual benchmark dots? Or do more of a boxplot/violin plot?
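If the boxplot route is taken, each domain's box is just a five-number summary over that domain's benchmark scores. A minimal sketch in plain Python of the data prep (a hypothetical helper, not the site's actual plotting code):

```python
from statistics import median


def five_number_summary(scores):
    # Returns (min, Q1, median, Q3, max) for one domain's benchmark
    # scores, using the exclusive-median quartile convention.
    s = sorted(scores)
    n = len(s)
    lower = s[: n // 2]        # values below the median
    upper = s[(n + 1) // 2:]   # values above the median
    return (s[0], median(lower), median(s), median(upper), s[-1])
```

The individual benchmark dots could then be overlaid on each box, which would address both suggestions at once.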

@mschrimpf (Member)

I want to be clear that these are all nitpicks that we can address in an update in the near future. Having this live as-is would already be an improvement over not having it :)

@KartikP KartikP merged commit 12cbc98 into master Mar 5, 2026
1 check passed