Add Permutation Importance #202
|
This is the current basic call:

    import numpy as np
    import polars as pl
    from sklearn.linear_model import LinearRegression

    from model_diagnostics.xai import plot_permutation_importance

    # Simulated housing data: the target depends on area and rooms, not on age.
    rng = np.random.default_rng(1)
    n = 1000
    X = pl.DataFrame(
        {
            "area": rng.uniform(30, 120, n),
            "rooms": rng.choice([2.5, 3.5, 4.5], n),
            "age": rng.uniform(0, 100, n),
        }
    )
    y = X["area"] + 20 * X["rooms"] + rng.normal(0, 1, n)

    model = LinearRegression()
    model.fit(X, y)

    _ = plot_permutation_importance(
        predict_function=model.predict,
        X=X,
        y=y,
    )

The extended feature API allows permuting groups of features together, like this:

    # "area" and "rooms" are permuted jointly under the label "size".
    _ = plot_permutation_importance(
        predict_function=model.predict,
        features={"size": ["area", "rooms"], "age": "age"},
        X=X,
        y=y,
    )
|
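As a rough illustration of what permuting a feature group means, here is a minimal NumPy-only sketch. It is not the implementation in this PR: the function `grouped_permutation_importance`, the dict-of-arrays data layout, the MSE loss, and the hand-written `predict` function are all assumptions made for the example.

```python
import numpy as np


def grouped_permutation_importance(predict, X, y, groups, n_repeats=5, seed=0):
    """Mean increase in MSE when the columns of each group are shuffled jointly.

    X is a dict mapping column names to 1-D NumPy arrays (a stand-in for a
    data frame); groups maps a display name to a list of column names.
    """
    rng = np.random.default_rng(seed)

    def mse(data):
        return float(np.mean((y - predict(data)) ** 2))

    baseline = mse(X)
    importance = {}
    for name, columns in groups.items():
        increases = []
        for _ in range(n_repeats):
            # One permutation per repeat, shared by all columns of the group,
            # so the relationship between the grouped columns is preserved.
            perm = rng.permutation(len(y))
            shuffled = dict(X)
            for col in columns:
                shuffled[col] = X[col][perm]
            increases.append(mse(shuffled) - baseline)
        importance[name] = float(np.mean(increases))
    return importance


rng = np.random.default_rng(1)
n = 1000
X = {
    "area": rng.uniform(30, 120, n),
    "rooms": rng.choice([2.5, 3.5, 4.5], n),
    "age": rng.uniform(0, 100, n),
}
y = X["area"] + 20 * X["rooms"] + rng.normal(0, 1, n)


def predict(data):
    # Hypothetical "perfect" model that ignores age entirely.
    return data["area"] + 20 * data["rooms"]


importance = grouped_permutation_importance(
    predict, X, y, groups={"size": ["area", "rooms"], "age": ["age"]}
)
print(importance)
```

Because the stand-in model never reads `age`, shuffling it leaves the loss unchanged, while shuffling the `size` group breaks the link between the predictions and `y`.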
-> Done
-> Ideally, we would compare against scikit-learn's implementation, but I am still working on this. |
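For reference, a comparison run on the scikit-learn side might look roughly like the sketch below. It uses `sklearn.inspection.permutation_importance` on a plain NumPy design matrix. The scoring choice, `n_repeats`, and `random_state` are assumptions for the example and are not taken from this PR. As far as I know, scikit-learn permutes one column at a time, so only ungrouped features can be compared this way.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack(
    [
        rng.uniform(30, 120, n),         # area
        rng.choice([2.5, 3.5, 4.5], n),  # rooms
        rng.uniform(0, 100, n),          # age
    ]
)
y = X[:, 0] + 20 * X[:, 1] + rng.normal(0, 1, n)

model = LinearRegression().fit(X, y)

# Importance is the drop in score after permuting a column; with negated MSE
# as the score, this equals the increase in MSE.
result = permutation_importance(
    model,
    X,
    y,
    scoring="neg_mean_squared_error",
    n_repeats=5,
    random_state=0,
)
for name, mean, std in zip(
    ["area", "rooms", "age"], result.importances_mean, result.importances_std
):
    print(f"{name}: {mean:.2f} +/- {std:.2f}")
```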
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
…79/model-diagnostics into enh-permutation-importance
|
I think only the np.split point is open. If this is merged, an example or addition to an existing one would be nice. |
The condition now looks like this:

    # np.split() does not work on pyarrow arrays and should not be used on Pandas
    if (
        is_pyarrow_array(predictions)
        or is_pandas_df(predictions)
        or is_pandas_series(predictions)
    ):
        predictions = predictions.to_numpy()

Open issues:
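To show why the conversion comes before the split, here is a small self-contained sketch. It assumes, purely for illustration, that predictions for several stacked copies of the data have to be cut back into equal chunks. The `ArrayLike` class and the `split_predictions` helper are hypothetical, and the duck-typed `to_numpy` check is a simplification of the explicit type checks quoted above.

```python
import numpy as np


class ArrayLike:
    """Hypothetical stand-in for a pandas or pyarrow container."""

    def __init__(self, values):
        self._values = list(values)

    def to_numpy(self):
        return np.asarray(self._values)


def split_predictions(predictions, n_chunks):
    # Convert non-NumPy containers first, then split into equal parts.
    if not isinstance(predictions, np.ndarray) and hasattr(predictions, "to_numpy"):
        predictions = predictions.to_numpy()
    return np.split(np.asarray(predictions), n_chunks)


chunks = split_predictions(ArrayLike(range(12)), n_chunks=3)
print([c.tolist() for c in chunks])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```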
|
I guess we can live without it. Do you have any clue as to where this difference might stem from?
CI/CD must be green before merge. |
I was unable to bring the random number generator to the same state.
test.py3.12 (locally) gives different failures than I see on GitHub. I will look into it as soon as I have enough time. |
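One general reason why two permutation-based results can be hard to match exactly is sketched below. Whether it explains the difference discussed here is not established in this thread. The sketch shows that equal seeds only give equal permutations when the generator type and the number of draws consumed beforehand are both the same.

```python
import numpy as np

# Same generator type, same seed, same draw history: identical permutations.
p1 = np.random.default_rng(1).permutation(100)
p2 = np.random.default_rng(1).permutation(100)
identical = bool(np.array_equal(p1, p2))

# Same seed, but one extra draw before permuting: the stream is shifted.
g3 = np.random.default_rng(1)
g3.uniform(size=3)
shifted_matches = bool(np.array_equal(p1, g3.permutation(100)))

# Same seed, but the legacy RandomState generator: a different algorithm.
legacy = np.random.RandomState(1).permutation(100)
legacy_matches = bool(np.array_equal(p1, legacy))

print(identical, shifted_matches, legacy_matches)
```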
|
@mayer79 ready to merge? |
|
🎉 |


Implements #201