EASE 2026 Replication Package

Replication package for the paper:

@inproceedings{Maton2026,
  author    = "Maton, Megan and Kapfhammer, Gregory M. and McMinn, Phil",
  title     = "Empirically Comparing Hazard-Guided LLM Mutation Techniques with Existing LLM- and
               Rule-Based Approaches",
  booktitle = "International Conference on Evaluation and Assessment in Software Engineering (EASE)",
  year      = "2026"
}

Requirements

  • Java 11: set the environment variable JDK_11 to your Java 11 installation.
  • Defects4J: follow the Defects4J setup instructions and confirm defects4j test ... runs.
  • UV dependency management: install uv from https://github.com/astral-sh/uv.
  • yq for parsing YAML config files in bash scripts.
  • Ollama running gpt-oss:20b locally.
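
The requirements above can be checked before starting a run. The following is a minimal sketch, not part of the replication package: the function name is hypothetical, and it assumes each tool is installed under the executable name shown (defects4j, uv, yq, ollama). It does not check that the gpt-oss:20b model has been pulled or that the Ollama server is running.

```python
import os
import shutil

# Executables named in the requirements list; the binary names are assumptions.
REQUIRED_COMMANDS = ["defects4j", "uv", "yq", "ollama"]


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    # JDK_11 must be set and should point at an existing directory.
    jdk = env.get("JDK_11")
    if not jdk:
        problems.append("JDK_11 is not set")
    elif not os.path.isdir(jdk):
        problems.append(f"JDK_11 points to a missing directory: {jdk}")

    # Each command-line tool must be discoverable on PATH.
    for command in REQUIRED_COMMANDS:
        if which(command) is None:
            problems.append(f"{command} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("MISSING:", problem)
```

Running the file directly prints one line per missing prerequisite and prints nothing when all checks pass.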

Setup

Clone this repo, including submodules:

git clone https://github.com/LLM-Mutation/EASE-2026-replication-package/
cd EASE-2026-replication-package
git submodule update --init --recursive

Defects4J Config

This study uses projects from the Defects4J collection with the mgnmtn/defects4j configuration. Follow the setup instructions. Projects are listed in scripts/args_full.csv.

multiplex

The multiplex submodule lives in ./multiplex/ and provides the implementations of:

  • HAZOP
  • STPA
  • Unguided
  • LLMorpheus
  • Mutahunter

Prompts

The core prompts are in config/config_prompts.yml; the individual user prompts are configured within multiplex.

Because of licensing differences, you must add the Mutahunter prompts yourself. Add the system prompt to the end of config/config_prompts.yml, under mutahunter_generate_mutants: |. Then generate the experiment config files:

cd scripts
uv sync
python3.13 generate_config_files.py
cd ..

This will generate all the configuration files used in the study.
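
Adding the Mutahunter system prompt by hand is easy to get wrong, because the text under a YAML `|` block scalar must be indented. The sketch below is one way to do it, under stated assumptions: the file currently ends with the top-level line `mutahunter_generate_mutants: |`, and a two-space indent is acceptable. The function name is hypothetical and not part of the replication package; the prompt text itself must come from Mutahunter. Run it before generating the config files.

```python
import textwrap
from pathlib import Path

# Assumption: this key is the last line of the file and sits at the top level.
KEY_LINE = "mutahunter_generate_mutants: |"


def append_block_scalar(config_path, prompt_text, indent="  "):
    """Append prompt_text as the indented body of the trailing block scalar."""
    path = Path(config_path)
    content = path.read_text(encoding="utf-8").rstrip("\n")
    if not content.endswith(KEY_LINE):
        raise ValueError(f"expected the file to end with {KEY_LINE!r}")

    # Indent every non-blank line so YAML reads it as part of the block scalar.
    body = textwrap.indent(prompt_text.strip("\n"), indent)
    path.write_text(content + "\n" + body + "\n", encoding="utf-8")
```

For example, `append_block_scalar("config/config_prompts.yml", Path("prompt.txt").read_text())` would append the contents of a hypothetical prompt.txt. Calling it a second time raises ValueError, since the file no longer ends with the key line.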

Running multiplex for experiments

Add the arguments for each Defects4J bug you want to study to the relevant args-*.csv files, and ensure args.csv contains all bugs to clone.

⚠️ Executing all tools can be memory intensive, especially with a local LLM and no GPU.

Run all tools from the repo root. Run time grows with the number of projects and depends on the chosen LLM. To change the LLM for automated runs, edit scripts/generate_config_files.py and update the relevant model and API key settings.

bash run-all.sh

Each project is cloned into project/ and removed after use, which keeps a clean copy between tools and conserves storage. Outputs are grouped by project under output/. Each tool run creates a directory stamped with the date and time. For example, an STPA output directory has the following structure:

outputs/project/cli_1_fixed/20260518_1108-stpa
├── control_diagram.txt
├── original_method.java
├── stpa-mutants
│   ├── mutant_0.java
│   ├── mutant_1.java
│   ├── mutant_2.java
│   ├── mutant_3.java
│   ├── mutant_4.java
│   └── mutant_summary.csv
├── stpa-test
│   ├── mutant_0_test.txt
│   ├── mutant_1_test.txt
│   ├── mutant_2_test.txt
│   ├── mutant_3_test.txt
│   └── mutant_4_test.txt
└── ucas.csv
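
The layout above can be traversed programmatically. The following is a minimal sketch, assuming that run directories are named `<YYYYMMDD_HHMM>-<tool>` and that every tool uses `<tool>-mutants/` and `<tool>-test/` subdirectories as the STPA example does; only the STPA layout is shown in this README. Both function names are hypothetical and are not part of the replication package.

```python
from datetime import datetime
from pathlib import Path


def parse_run_dir_name(run_dir):
    """Split a run directory name such as '20260518_1108-stpa' into (timestamp, tool)."""
    stamp, _, tool = Path(run_dir).name.partition("-")
    return datetime.strptime(stamp, "%Y%m%d_%H%M"), tool


def pair_mutants_with_logs(run_dir):
    """Map each mutant file name to its test log path, or None if the log is missing."""
    run_dir = Path(run_dir)
    _, tool = parse_run_dir_name(run_dir)
    mutant_dir = run_dir / f"{tool}-mutants"
    test_dir = run_dir / f"{tool}-test"

    pairs = {}
    # The glob matches mutant_N.java only, so mutant_summary.csv is skipped.
    for mutant in sorted(mutant_dir.glob("mutant_*.java")):
        log = test_dir / f"{mutant.stem}_test.txt"
        pairs[mutant.name] = log if log.is_file() else None
    return pairs
```

A mutant mapped to None has no test log, which can be a quick way to spot a run that stopped partway through.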

Executing the Major Mutation Tool

We ran the Major mutation tool through its Defects4J framework integration, using the Major configuration provided as a submodule of this repository. That configuration is imported when the submodules are initialized.

To run Major, ensure the Major commands in run-all.sh are not commented out. If they were active during your original run, Major may already have been executed. Otherwise, comment out the other tool commands and re-run the script so that only Major runs.

Analysis

The ./analysis/ directory contains all of the scripts used to analyse the outputs of multiplex and Major and to answer the research questions in the paper. The data used in the paper is in the analysis/data directory.

For each RQ, there is one main Jupyter notebook, with accompanying scripts in ./analysis/scripts/ for the deeper analysis that produces the plots and tables in the paper.

Analysis directory structure

  • analysis/src/analysis/: core Python package containing analysis modules for mutant validity, subsumption, project summaries, and utility helpers.
  • analysis/data/: sample datasets and experiment artifacts (original methods, mutants, test logs, and summaries) used by the notebooks and scripts.
  • analysis/rq1.ipynb, analysis/rq2.ipynb: primary notebooks used to reproduce the figures and tables for each research question.

To set up the analysis environment and run a script:

cd ./analysis/
uv sync
source .venv/bin/activate
uv pip install -e .
python3.13 ./path/to/script.py

The outputs appear in the data-summaries/ directory.
