
Add method to facilitate tabulation of multiple performance evaluations.#1045

Open
ablaom wants to merge 3 commits into dev from describe

Conversation

@ablaom (Member) commented Apr 5, 2026

This PR is a follow-up to the enhancements in #1034.

This PR extends DataAPI.describe (also extended by DataFrames) to summarise an MLJ performance evaluation, as a named tuple. Multiple evaluations can then be combined in a table:

```julia
using MLJ
X, y = @load_iris # a vector and a table

# instantiate two models:
knn = (@load KNNClassifier pkg=NearestNeighborModels)()
tree = (@load DecisionTreeClassifier pkg=DecisionTree)()

named_models = [
    "Dummy" => ConstantClassifier(),  # a built-in model
    "K-nearest neighbors" => knn,
    "Decision Tree" => tree,
]
performance_evaluations = evaluate(named_models, X, y; measures=[brier_score, accuracy])
```

```julia
julia> describe(performance_evaluations[1])
(tag = "Dummy", BrierScore = -0.573 ± 0.1, Accuracy = 0.33 ± 0.23)

julia> table = describe.(performance_evaluations);

julia> pretty(table)
┌─────────────────────┬──────────────────────┬──────────────────────┐
│ tag                 │ BrierScore           │ Accuracy             │
│ String              │ Measurement{Float64} │ Measurement{Float64} │
│ Textual             │ Continuous           │ Continuous           │
├─────────────────────┼──────────────────────┼──────────────────────┤
│ Dummy               │ -0.573±0.1           │ 0.33±0.23            │
│ K-nearest neighbors │ -0.21±0.21           │ 0.92±0.18            │
│ Decision Tree       │ -0.00118977±0.0      │ 1.0±0.0              │
└─────────────────────┴──────────────────────┴──────────────────────┘
```
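Because each `describe` result is a named tuple, broadcasting produces a vector of named tuples, which already satisfies the Tables.jl row-table interface. As a sketch (using mock `Float64` values in place of real MLJ `Measurement` output, so this snippet is illustrative only), such a vector converts directly to a `DataFrame`:

```julia
using DataFrames  # assumes DataFrames.jl is available

# Mock stand-ins for `describe.(performance_evaluations)`; the real
# entries would carry Measurement{Float64} values rather than Float64.
rows = [
    (tag = "Dummy", Accuracy = 0.33),
    (tag = "K-nearest neighbors", Accuracy = 0.92),
    (tag = "Decision Tree", Accuracy = 1.0),
]

# A vector of named tuples is a valid Tables.jl row table:
df = DataFrame(rows)
```

This means the tabulated evaluations interoperate with any Tables.jl-aware sink, not just `pretty`.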

I'm not wedded to the name describe. Maybe summarize is better, but I don't know of other uses of summarize in the ML ecosystem, and I was reluctant to add yet another method to the namespace. Thoughts or suggestions welcome.

cc @OkonSamuel @LucasMatSP

@codecov codecov bot commented Apr 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.08%. Comparing base (be6d92f) to head (24d7d65).

Additional details and impacted files

```
@@            Coverage Diff             @@
##              dev    #1045      +/-   ##
==========================================
- Coverage   90.26%   89.08%   -1.19%     
==========================================
  Files          33       34       +1     
  Lines        2600     2574      -26     
==========================================
- Hits         2347     2293      -54     
- Misses        253      281      +28     
```

☔ View full report in Codecov by Sentry.

@tylerjthomas9

The describe name is more intuitive and consistent with other similar naming patterns I've seen.

One small thing I noticed is that individualize iterates over the input twice. In practice, I don't think this will cause issues here, because nobody would feed in an IO stream or another non-re-iterable collection, but the docstring could be tweaked to reflect this.

Overall, it looks great to me.

@LucasMatSP (Collaborator) commented Apr 6, 2026

Personally, I prefer summarize, but I don't have a strong opinion about this. The feature is nice.

@ablaom (Member, Author) commented Apr 6, 2026

Thanks for the input! And thank you kindly @tylerjthomas9 for the review.

> One small thing I noticed is that individualize iterates over the input twice.

Good catch. I have tweaked the docstring. But I also realised we already have a similar helper method in the pipeline code, called individuate. Therefore I will refactor this and move the method to "src/utilities.jl".
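For the record, the usual way to make such a helper safe for single-shot iterators is to traverse the input only once. A minimal sketch of the pattern (the name `split_pairs` and its signature are hypothetical, not the actual individuate/individualize API):

```julia
# Hypothetical sketch: split an iterable of `name => value` pairs into
# two vectors in a single traversal, so stateful iterators are safe.
function split_pairs(named)
    names, values = String[], Any[]
    for (k, v) in named
        push!(names, k)
        push!(values, v)
    end
    return names, values
end

split_pairs(["Dummy" => 1, "Tree" => 2])  # (["Dummy", "Tree"], Any[1, 2])
```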

Regarding the name, I'm inclining towards describe, as I couldn't find summarize and Base.summary doesn't apply. But two other alternatives I thought of are:

  • overload NamedTuple(::AbstractPerformanceEvaluation)
  • report, which we already have for machines

Does anyone like these better than describe?
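For comparison, the NamedTuple option would amount to overloading the constructor rather than adding a new verb, along these lines (a sketch with a made-up `MyEval` type; MLJ's actual evaluation type and fields differ):

```julia
# Hypothetical stand-in for MLJ's evaluation object:
struct MyEval
    tag::String
    accuracy::Float64
end

# Overloading the constructor instead of exporting a new function:
Base.NamedTuple(e::MyEval) = (tag = e.tag, Accuracy = e.accuracy)

NamedTuple(MyEval("Dummy", 0.33))  # (tag = "Dummy", Accuracy = 0.33)
```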

@tylerjthomas9

> Regarding the name, I'm inclining towards describe, as I couldn't find summarize and Base.summary doesn't apply. But two other alternatives I thought of are:
>
> • overload NamedTuple(::AbstractPerformanceEvaluation)
> • report, which we already have for machines
>
> Does anyone like these better than describe?

I am a fan of describe, but I may be alone on this one. It's intuitive to me and fits with other libraries like DataFrames.jl's describe.
