Add Permutation Importance by mayer79 · Pull Request #202 · lorentzenchr/model-diagnostics

mayer79 · 2025-05-10T10:04:15Z

Implements #201

mayer79 · 2025-05-16T11:49:02Z

This is the current basic call:

import numpy as np
import polars as pl
from sklearn.linear_model import LinearRegression

from model_diagnostics.xai import plot_permutation_importance

rng = np.random.default_rng(1)
n = 1000

X = pl.DataFrame(
    {
        "area": rng.uniform(30, 120, n),
        "rooms": rng.choice([2.5, 3.5, 4.5], n),
        "age": rng.uniform(0, 100, n),
    }
)

y = X["area"] + 20 * X["rooms"] + rng.normal(0, 1, n)

model = LinearRegression()
model.fit(X, y)

_ = plot_permutation_importance(
    predict_function=model.predict,
    X=X,
    y=y,
)

The extended feature API allows permuting groups of features like this:

_ = plot_permutation_importance(
    predict_function=model.predict,
    features={"size": ["area", "rooms"], "age": "age"},
    X=X,
    y=y,
)
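
A minimal sketch of what permuting a feature group could look like, assuming the columns of a group are shuffled with one shared row permutation so that their joint distribution is preserved. The helper names `permute_group` and `group_importance`, the MSE scorer, and the hand-written `predict` are hypothetical and only illustrate the idea; they are not the API added in this PR.

```python
import numpy as np


def permute_group(X, columns, rng):
    """Return a copy of X in which the given columns share one row permutation."""
    X_perm = X.copy()
    idx = rng.permutation(X.shape[0])
    # Using the same index for every column keeps the rows of the group together.
    X_perm[:, columns] = X[idx][:, columns]
    return X_perm


def group_importance(predict, X, y, groups, n_repeats=5, seed=0):
    """Mean increase in MSE after permuting each feature group."""
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    result = {}
    for name, columns in groups.items():
        scores = [
            np.mean((y - predict(permute_group(X, columns, rng))) ** 2)
            for _ in range(n_repeats)
        ]
        result[name] = float(np.mean(scores) - base)
    return result


rng = np.random.default_rng(1)
n = 1000
# Columns: 0 = area, 1 = rooms, 2 = age (toy data similar to the call above).
X = np.column_stack(
    [rng.uniform(30, 120, n), rng.choice([2.5, 3.5, 4.5], n), rng.uniform(0, 100, n)]
)
y = X[:, 0] + 20 * X[:, 1] + rng.normal(0, 1, n)


def predict(X):
    # Hypothetical stand-in for model.predict: the true signal, which ignores age.
    return X[:, 0] + 20 * X[:, 1]


importance = group_importance(predict, X, y, {"size": [0, 1], "age": [2]})
```

In this sketch the "size" group gets a large importance, while "age" stays at zero because the stand-in model never reads that column.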

mayer79 · 2026-02-01T18:23:34Z

2 important points:

Merge with main and resolve merge conflicts. No updates of package versions should be needed.

-> Done

Is it possible to construct a simple test where we know the answer (=importance values) and check against that answer?

-> Ideally, we can compare with Scikit-Learn's implementation, but I am still working on this.
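
One possible known-answer test, sketched in plain NumPy under stated assumptions: for a noise-free linear model scored with MSE, shuffling column j raises the loss by `beta_j**2 * mean((x_j - x_j[perm])**2)`, which averages to `2 * beta_j**2 * Var(x_j)` over all permutations (with the population variance). A column with a zero coefficient must therefore have importance zero. The function `permutation_importance_mse` is a hypothetical stand-in, not the function from this PR, and no comparison with Scikit-Learn is attempted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
beta = np.array([2.0, 0.0])  # the second feature is irrelevant by construction
y = X @ beta  # noise-free response, so the base MSE is exactly 0


def predict(X):
    return X @ beta


def mse(y, pred):
    return float(np.mean((y - pred) ** 2))


def permutation_importance_mse(predict, X, y, column, n_repeats, rng):
    """Average increase in MSE after shuffling one column (plain-NumPy sketch)."""
    base = mse(y, predict(X))
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        X_perm[:, column] = rng.permutation(X[:, column])
        increases.append(mse(y, predict(X_perm)) - base)
    return float(np.mean(increases))


imp_0 = permutation_importance_mse(predict, X, y, 0, n_repeats=200, rng=rng)
imp_1 = permutation_importance_mse(predict, X, y, 1, n_repeats=200, rng=rng)

# Averaged over all permutations, the increase is 2 * beta_j**2 * Var(x_j).
expected_0 = 2 * beta[0] ** 2 * np.var(X[:, 0])
```

With enough repeats, `imp_0` should land close to `expected_0`, and `imp_1` should be zero up to floating-point error. A test of this kind does not depend on matching another library's random number stream.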

lorentzenchr · 2026-02-08T13:18:58Z

I think only the np.split point is open. If this is merged, an example or addition to an existing one would be nice.

mayer79 · 2026-02-08T14:10:55Z

> I think only the np.split point is open. If this is merged, an example or addition to an existing one would be nice.

The condition now is like this:

        # np.split() does not work on pyarrow arrays and should not be used on Pandas
        if (
            is_pyarrow_array(predictions)
            or is_pandas_df(predictions)
            or is_pandas_series(predictions)
        ):
            predictions = predictions.to_numpy()
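
For context, a small sketch of why `np.split()` comes up at all, assuming the implementation stacks all permuted copies of `X`, calls the predict function once, and then cuts the predictions back into one chunk per repetition. This design is inferred from the commit messages, not confirmed by the thread; the `predict` function and variable names are hypothetical, and `np.asarray` stands in for the `.to_numpy()` conversion shown above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_repeats = 6, 3
X = rng.normal(size=(n, 2))
y = X[:, 0]


def predict(X):
    # Hypothetical model that only uses the first column.
    return X[:, 0]


# Build n_repeats copies of X, each with column 0 shuffled independently.
copies = []
for _ in range(n_repeats):
    X_perm = X.copy()
    X_perm[:, 0] = rng.permutation(X[:, 0])
    copies.append(X_perm)
stacked = np.vstack(copies)  # shape (n_repeats * n, 2)

# One predict call for all repeats, then cut the result into equal chunks.
predictions = np.asarray(predict(stacked))
chunks = np.split(predictions, n_repeats)

# One score per repetition, each computed against the original y.
scores = [float(np.mean((y - chunk) ** 2)) for chunk in chunks]
```

Because `np.split()` with an integer argument requires equal-sized sections, this only works when every repetition contributes exactly `n` rows to the stacked frame.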

Open issues:

- comparison with another implementation. I was unable to exactly reproduce with Scikit-Learn, even with a single feature / single shuffle situation.
- Test failures in calling the scorer pre shuffle

lorentzenchr · 2026-02-10T16:46:45Z

> comparison with another implementation. I was unable to exactly reproduce with Scikit-Learn, even with a single feature / single shuffle situation.

I guess we can live without it. Do you have any clue as to where this difference might stem from?

> Test failures in calling the scorer pre shuffle

CI/CD must be green before merge.

mayer79 · 2026-02-14T14:07:47Z

> > comparison with another implementation. I was unable to exactly reproduce with Scikit-Learn, even with a single feature / single shuffle situation.
>
> I guess we can live without it. Do you have any clue as to where this difference might stem from?

I was unable to bring the random number generator to the same state.

> > Test failures in calling the scorer pre shuffle
>
> CI/CD must be green before merge.

test.py3.12 (locally) gives different failures than I see on GitHub. I will look into it as soon as I have enough time.
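
A short illustration of how the random number generator state can block an exact comparison, assuming one implementation shuffles with NumPy's `Generator` API and the other with the legacy `RandomState` (an assumption, not something the thread confirms). The same seed then leads to different shuffles, so the importance values can only agree on average, not bit for bit.

```python
import numpy as np

n = 1000
seed = 1

# Legacy generator (Mersenne Twister) versus the newer Generator API (PCG64 by default).
legacy_perm = np.random.RandomState(seed).permutation(n)
modern_perm = np.random.default_rng(seed).permutation(n)

# Both are valid permutations of 0..n-1, but the orderings do not match.
same_order = np.array_equal(legacy_perm, modern_perm)

# Reusing the same generator type and seed reproduces the shuffle exactly.
repeat_perm = np.random.default_rng(seed).permutation(n)
reproducible = np.array_equal(modern_perm, repeat_perm)
```

Even with the same generator type, the two implementations would also have to consume random numbers in the same order (for example, the same number of draws per feature and repetition) to end up with identical shuffles.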

lorentzenchr · 2026-02-28T20:05:05Z

@mayer79 ready to merge?

lorentzenchr · 2026-02-28T21:59:15Z

🎉

Add compute_permutation_importance()

1c594f8

mayer79 self-assigned this May 10, 2025

mayer79 added the enhancement New feature or request label May 10, 2025

mayer79 marked this pull request as draft May 10, 2025 10:04

Replace ipynb by py

775a150

mayer79 changed the title ~~Add compute_permutation_importance()~~ Add Permutation Importance May 10, 2025

mayer79 added 4 commits May 10, 2025 12:24

Catch None values of n_repeats

0bf083e

doctest failure

f8485d0

add plot_permutation_importance()

0c7e6f6

Improve docstring

98e611a

mayer79 added 19 commits May 16, 2025 13:55

Linter

3f44810

remove base_score and n_repeats from output

7d3a4c7

docstring on features argument

7cdc7b7

calculate base score before stacking

63f7825

use scipy special to calculate t quantile

293ca1d

remove reset_index()

c312f1f

Fix doctest

0132dc5

Allow max_display=None

2877208

Add unit tests for plot

42705b9

Add error message for max_display

262540b

Remove wrong Optional typing

e5dcc6f

Replace boolean function argument

5215e53

Linter

ff21a64

Expand docstring of plot()

e5c65dd

simpler safe_select_column()

7f696eb

Replace safe_get_column() by get_second_dimension()

617308e

drop safe_index_rows_1d()

32b13d2

Clarify that np.split() works on all relevant prediction containers except parrow

8b5d36b

First unit tests on calculate_permutation_importance()

62535c0

mayer79 added 2 commits February 1, 2026 18:30

formattings from review

0b17220

change smaller to smaller_is_better, same for greater

97c56d9

missed two 'greater'

e5660db

lorentzenchr reviewed Feb 3, 2026

mayer79 and others added 12 commits February 6, 2026 12:43

remove handling of old pandas

e72cdef

Update src/model_diagnostics/_utils/array.py

4b88d64

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>

rename rearrange_rows_of_some_columns()

69c306f

Merge branch 'enh-permutation-importance' of https://github.com/mayer79/model-diagnostics into enh-permutation-importance

74768f2

Update src/model_diagnostics/xai/permutation_importance.py

616e394

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>

remove unnecessary list() and avoid deep copy for tuples

75cd119

improve formatting of the example

efb60b7

remove unnecessary imports in array

5242dc7

reduce code repetition in tests

3b4b756

change branching order in safe_copy()

c9ddf0a

fix unit tests

6bcf471

fix more unit tests

0b5c3e8

mayer79 added 2 commits February 8, 2026 14:58

catch special case of pa.array predictions and add unit test

cd2238b

np.split warns on pandas structures

8b989f6

fix failing test

6872574

lorentzenchr approved these changes Feb 28, 2026

mayer79 merged commit 92e1d9f into lorentzenchr:main Feb 28, 2026
5 checks passed

mayer79 deleted the enh-permutation-importance branch February 28, 2026 20:35

mayer79 mentioned this pull request Feb 28, 2026

Add permutation importance #201

Closed
