
Add explain class and shap #74

Open
npanczyk wants to merge 42 commits into develop from explain_stash

Conversation

@npanczyk (Collaborator) commented Nov 4, 2024

PR Description

This PR adds an explain class and the SHAP package to pyMAISE.

Closes: #67

What changes were made?

Will ultimately include:

  • new explain class
  • updated CHF benchmark
  • unit tests (not done yet)

Reviewers: @myerspat @mradaideh

Important

Please do not review the SHAP package files (i.e., explain/shap)!

@npanczyk npanczyk added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels Nov 4, 2024
@npanczyk npanczyk self-assigned this Nov 4, 2024
@myerspat myerspat closed this Nov 5, 2024
@myerspat myerspat reopened this Nov 5, 2024
@npanczyk npanczyk marked this pull request as ready for review November 5, 2024 23:11
@npanczyk npanczyk requested review from mradaideh and myerspat and removed request for mradaideh November 5, 2024 23:11
@mradaideh (Collaborator)

Hi @npanczyk, thanks for the great work. I tested your branch in a clean venv and it worked like a charm for both short (small search) and long (full search) runs. Here are some things to consider before merging:

1- After installing pyMAISE in a fresh venv, I had to install three more dependencies for shap. As we discussed, these should be added to pyMAISE's default dependencies, pinned to these specific versions:

slicer-0.0.8
numba-0.60.0
cloudpickle-3.0.0
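For reference, pinning these in pip requirements syntax would look like the fragment below; whether they belong in setup.py, pyproject.toml, or a requirements file depends on how pyMAISE declares its dependencies, so treat this as a sketch:

```
# Hypothetical pins for the extra SHAP dependencies
slicer==0.0.8
numba==0.60.0
cloudpickle==3.0.0
```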

2- Find a way to pass verbose=0 to all TensorFlow prediction calls associated with the SHAP methods. These come from model.predict calls. There might be a way to silence them before or while passing them to DeepLIFT, KernelSHAP, and IG.
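One way to silence those progress bars is to wrap the model in a small prediction function that hardcodes verbose=0 and hand that wrapper to the explainer instead of the raw model. A minimal sketch, with the wrapper name and usage being illustrative rather than pyMAISE's actual implementation:

```python
def silent_predict(model, X):
    """Forward to model.predict with verbose=0 so Keras suppresses the
    per-batch progress bar during the many evaluations SHAP performs."""
    return model.predict(X, verbose=0)

# Hypothetical usage: pass a bound wrapper to an explainer instead of
# the raw model, e.g.
#   explainer = shap.KernelExplainer(lambda X: silent_predict(model, X),
#                                    background)
```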

3- Let @myerspat know to add a clear acknowledgment of the SHAP package, on both the GitHub page and the docs page, stating that we are using their implementation.

4- For the example CHF notebook, since we do not yet have detailed documentation for this capability, it would be helpful to discuss the classes. For example: why you use nsamples=None (what it means and what would happen if you used nsamples=20 instead), what the background samples are for KernelSHAP, why the background needs to be small from a computing-cost perspective, and anything else you feel is important to describe for users of these methods.
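To make the background-sample point concrete: KernelSHAP evaluates the model roughly once per background row per explained sample, so the background set must stay small. A NumPy-only sketch of drawing such a set (the function name and default k are illustrative, not pyMAISE API):

```python
import numpy as np

def choose_background(X, k=50, seed=0):
    """Subsample k rows of X to use as a KernelSHAP background set.

    KernelSHAP's cost scales with (background rows) x (explained
    samples) x (coalitions), so a small, representative background
    keeps the computing cost tractable."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(k, len(X)), replace=False)
    return X[idx]
```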


@myerspat myerspat left a comment


@npanczyk, all the code looks great. I didn't review any of the SHAP-package-specific files, but I'll leave those to you if you have any issues with them. Now we just need to update the documentation. There are four places that need to be updated with the new capabilities:

  • In the landing page given by docs/source/index.rst please include a blurb somewhere that briefly explains the new explainability features. Copy this over to README.md.
  • In the installation guide at docs/source/installation.rst please include the new dependencies.
  • In the user guide under docs/source/user_guide.rst include a section that fits within the order of things that outlines the new capabilities with any examples.
  • In the pyMAISE API reference at docs/source/pymaise_api.rst include a section for explainability that links the methods/classes you want the user to be aware of. This will link to the methods/classes docstring so make sure those are up to snuff too.

Feel free to go through the docs and add blurbs on explainability wherever you see fit. Also, please address the other comments I've left.

Comment on lines +73 to +79
"""
This function fits a DeepLIFT explainer to evaluate SHAP coefficients (only for
neural networks).

:param nsamples: (int less than total samples in test set or None, default=None)
Number of samples used to estimate the DeepLIFT importances if it is
different than using all samples in X
Collaborator

All functions from SHAP that the user will call directly should have their docstrings changed to the format pyMAISE uses.

Collaborator Author

The user shouldn't call any functions directly from SHAP. Let me know if I'm missing something here.


@myerspat myerspat left a comment


Thanks @npanczyk, some last changes here.

Comment on lines +195 to +202
.. rubric:: Classes

.. autosummary::
:toctree: stubs
:nosignatures:
:template: class.rst

pyMAISE.ShapExplainers
Collaborator

At least for me, pyMAISE.ShapExplainers does not work when generating documentation. You can test this locally by moving to docs/ and running make html, assuming you installed pyMAISE using pip install -e ".[dev]". This builds the HTML files that will be generated on readthedocs; you can open them from docs/build/html/. Also, if ShapExplainers is the only class/function the user will need from the explain module, consider just importing it in pyMAISE/__init__.py:

from pyMAISE.explain import ShapExplainers

The user should then be able to import ShapExplainers directly via from pyMAISE import ShapExplainers (test this to make sure it works), and the above autosummary should work.

Collaborator Author

Pretty sure I got this, but I'm having issues getting Sphinx to generate an autosummary for me. It's no longer throwing errors about ShapExplainers, though; I added it to __init__.py as you suggested. Let me know if this still breaks for you.

Collaborator

I would add something at the top of this file referring to the explainability features.

Collaborator Author

I thought explainability kind of fell under the post-processing category here. If you think I should make it separate, I can add a "Step 6," but that changes more of the document structure.

Comment on lines +400 to +406
- :meth:`pyMAISE.ShapExplainers.DeepLIFT`: fits a DeepLIFT explainer to evaluate SHAP coefficients,
- :meth:`pyMAISE.ShapExplainers.IntGradients`: fits an Integrated Gradient explainer to evaluate SHAP coefficients,
- :meth:`pyMAISE.ShapExplainers.KernelSHAP`: fits a KernelSHAP explainer to evaluate SHAP coefficients,
- :meth:`pyMAISE.ShapExplainers.Exact_SHAP`: fits an Exact SHAP explainer to evaluate SHAP coefficients,
- :meth:`pyMAISE.ShapExplainers.postprocess_results`: generates SHAP mean values for plotting functions,
- :meth:`pyMAISE.ShapExplainers.plot`: makes a beeswarm plot and a bar plot for each SHAP method or for a particular method, and
- :meth:`pyMAISE.ShapExplainers.plot_bar_only`: makes a bar plot for each or a particular SHAP method.
Collaborator

Check that these links work in the generated documentation.

Collaborator

The tuning and postprocessing results do not look correct here. I think you ran one iteration/epoch for all models, which is not what we want to show in the final results. You should do a complete run of this benchmark with your explainability features. Feel free to use the parallelization capabilities to make the models tune faster (30 minutes tuning and 75 total, before explain, on AIMS01). Also, nowhere in the landing pages of pyMAISE do you point users to this benchmark as an example of how to use the new explainability features; I would add something to docs/source/index.rst and README.md. Finally, in your documentation under Explainability Metrics, any specific references to function arguments, functions, and methods should be surrounded by `` so they render as code rather than standard text. Refer to the MIT reactor benchmark for examples.
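For example, a sentence in the benchmark prose would mark argument and method names as inline literals like this (the sentence itself is illustrative):

```rst
Setting ``nsamples=None`` tells ``ShapExplainers.KernelSHAP`` to use every
test sample, while ``nsamples=20`` estimates the SHAP coefficients from
only 20 samples.
```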

return ax


class ShapExplainers:
Collaborator

Given that you're pointing users to this class, you should add a docstring discussing it and its parameters (refer to Tuner or PostProcessor for examples). You may also consider writing usage examples for it (refer to Tuner for an example). Also, if you look at the second-to-last code block in your CHF benchmark, you'll see there are warnings. I know you're having trouble getting these to go away, but for Jupyter notebooks we can use the pyMAISE.utils._try_clear() function to clear unwanted output, which is what I'm using pretty much everywhere in the tuner. This should clean things up, as we shouldn't see any warnings with verbosity=0.



Development

Successfully merging this pull request may close these issues.

Explainability Analysis
