PyPremise

Premise is a data mining method to identify significant differences between two groups of texts. It can be used, e.g., to identify from the outputs of an LLM what systematic differences occur if you change your prompt. Or it can identify patterns or explanations of where an NLP classifier performs well and where it fails.

PyPremise is a Python library to makes it easy to run Premise with just a couple of lines of code.

Example Applications

Systematic differences between two prompts or two LLMs

Premise identified, e.g., that changing the last name of the protagonist from Dr. Li to Dr. Smith in a story generation prompt caused - among other differences - a drastic change in the gender distribution (more details and examples here)

Patterns Group 0 (Dr. Li)	Patterns Group 1 (Dr. Smith)
(She, she)	(He, he)
(car, accident)	(general, practioner)
	(small, town)

Evaluating where a classifier fails

Premise identified, e.g., for a Visual Question Answering model that it struggled with counting, color identification and visual orientation (more details and examples here):

pattern	example from the dataset
(how, many)	how many elephants are there
(what, ⓧ(color, colors, colour) )	what color is the bench
(on, top, of)	what is on the top of the cake
(left, to)	what can be seen to the left
(on, wall, hanging)	what is hanging on the wall

Example Usage

You can run PyPremise with just a few lines of code.

from pypremise import Premise, data_loaders
premise_instances, _, voc_index_to_token = data_loaders.get_dummy_data()

premise = Premise(voc_index_to_token=voc_index_to_token)
patterns = premise.find_patterns(premise_instances)

for p in patterns:
    print(p)
# prints two patterns
# (How) and (many) towards group 0
# (When) and (was) and (taken) towards group

PyPremise provides you with helper methods to load data from different sources like numpy arrays or tokenized text files. See below for more examples.

Installation

Install a recent version of Python, then just run

pip install pypremise

Currently, PyPremise directly supports Linux (AMD64/X86-64), Mac (Apple Silicon, the "M" processors) and Windows. See the section "Native Code" below for other platforms.

Tutorials

We provide several Jupyter Notebooks as tutorials. Ideally, you start with the "General introductory example" and then you can pick tutorials for more specific use cases.

General introductory example: general_example.ipynb
How to evaluate LLM outputs (Spotlight usecase): LLM_outputs_examples.ipynb
How to explain misclassifications of NLP models (original Premise usecase): missclassification_examples.ipynb
How to use word embeddings for more complex results: word_embedding_examples.ipynb
How to control and interpret log levels: logging_tutorial.ipynb

If you need additional help, do not hesitate to reach out to us!

Additional data loaders

The file data_loaders.py contains helper functions for loading data from various sources such as token lists, NumPy-like arrays, or CSV files. If you want to load your own data, refer to this file for examples and supported formats.

Native Code

The core code of the Premise algorithm is implemented in C++ for efficiency and runtime reasons. You can find it in the original Premise repository. You can use the native version of Premise (without the PyPremise Python wrapper) following the instructions there.

If you have your own compiled version of Premise (e.g. for a currently not supported platform), you can use it in PyPremise via the premise_engine argument of the pypremise.core.Premise constructor. Just use

Premise(premise_engine="local_path:/path/to/your/premise/binary.exe")

Authors, Issues, License & Citation

PyPremise is developed by the Human-Centric NLP & AI group at LMU Munich. If you run into any issues using PyPremise, do not hesitate to contact us via mail or create an issue on Github.

If you use this tool in your work, we would be happy if you tell us about it.

If you use PyPremise to evaluate the output of LLMs, please cite

@inproceedings{hedderich-etal-2025-whats,
    title = "What{'}s the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns",
    author = "Hedderich, Michael A.  and
      Wang, Anyi  and
      Zhao, Raoyuan  and
      Eichin, Florian  and
      Fischer, Jonas  and
      Plank, Barbara",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2025",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.985/"
}

For any other uses (like e.g. explaining misclassifications) and for the general Premise algorithm, please cite:

@inproceedings{premise,
  author    = {Michael A. Hedderich and
               Jonas Fischer and
               Dietrich Klakow and
               Jilles Vreeken},
  title     = {Label-Descriptive Patterns and Their Application to Characterizing
               Classification Errors},
  booktitle = {International Conference on Machine Learning, {ICML}},
  series    = {Proceedings of Machine Learning Research},
  year      = {2022},
  url       = {https://proceedings.mlr.press/v162/hedderich22a.html}
}

PyPremise (not Premise itself) is published under MIT license.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
documentation		documentation
src/pypremise		src/pypremise
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
example.png		example.png
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PyPremise

Example Applications

Systematic differences between two prompts or two LLMs

Evaluating where a classifier fails

Example Usage

Installation

Tutorials

Additional data loaders

Native Code

Authors, Issues, License & Citation

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

m-hedderich/PyPremise

Folders and files

Latest commit

History

Repository files navigation

PyPremise

Example Applications

Systematic differences between two prompts or two LLMs

Evaluating where a classifier fails

Example Usage

Installation

Tutorials

Additional data loaders

Native Code

Authors, Issues, License & Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages