Replication package for the paper:
@inproceedings{Maton2026,
author = "Maton, Megan and Kapfhammer, Gregory M. and McMinn, Phil",
title = "Empirically Comparing Hazard-Guided LLM Mutation Techniques with Existing LLM- and
Rule-Based Approaches",
booktitle = "International Conference on Evaluation and Assessment in Software Engineering (EASE)",
year = "2026"
}
- Java 11: set the environment variable JDK_11 to your Java 11 installation.
- Defects4J: follow the Defects4J setup instructions and confirm defects4j test ... runs.
- UV dependency management: install uv from https://github.com/astral-sh/uv.
- yq for parsing YAML config files in bash scripts.
- Ollama running gpt-oss:20b locally.
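The prerequisites above can be checked before starting a long run. The following is a minimal sketch, not part of the replication package: it assumes each tool is installed under the command name used in the list (defects4j, uv, yq, ollama) and only checks that the commands are on PATH and that JDK_11 points to a directory. It does not verify that Ollama is running or that the gpt-oss:20b model is available.

```python
import os
import shutil

# Command names are assumptions inferred from the prerequisite list.
REQUIRED_COMMANDS = ["defects4j", "uv", "yq", "ollama"]


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    # JDK_11 must be set and must point to an existing directory.
    java_home = env.get("JDK_11")
    if not java_home:
        problems.append("JDK_11 is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JDK_11 does not point to a directory: {java_home}")

    # Each command-line tool must be discoverable on PATH.
    for command in REQUIRED_COMMANDS:
        if which(command) is None:
            problems.append(f"{command} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("MISSING:", problem)
```

The env and which parameters exist only so the check can be exercised without touching the real environment.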
Clone this repo, including submodules:
git clone https://github.com/LLM-Mutation/EASE-2026-replication-package/
cd EASE-2026-replication-package
git submodule update --init --recursive
This study uses projects from the Defects4J collection with the mgnmtn/defects4j configuration.
Follow the setup instructions.
Projects are listed in scripts/args_full.csv.
The multiplex submodule lives in ./multiplex/ and provides the implementations of:
- HAZOP
- STPA
- Unguided
- LLMorpheus
- Mutahunter
The core prompts are in config/config_prompts.yml; the individual user prompts are configured within multiplex.
The Mutahunter prompts must be added by the user due to licensing differences.
The system prompt should be added to the end of the config/config_prompts.yml file under mutahunter_generate_mutants: |.
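As an illustration of the step above, the sketch below appends a prompt to the end of the config file as a YAML literal block scalar. It is an assumption-laden example rather than the package's own tooling: the helper names yaml_block_entry and append_prompt are hypothetical, mutahunter_generate_mutants is assumed to be a top-level key, and the prompt text must be supplied by you from the Mutahunter project.

```python
import textwrap


def yaml_block_entry(key, text, indent=2):
    """Format `text` as a YAML literal block scalar ("key: |") under `key`."""
    # textwrap.indent leaves blank lines unprefixed, which YAML accepts inside a block scalar.
    body = textwrap.indent(text.strip("\n"), " " * indent)
    return f"{key}: |\n{body}\n"


def append_prompt(config_path, prompt_text):
    """Append the Mutahunter system prompt to the end of the config file."""
    entry = yaml_block_entry("mutahunter_generate_mutants", prompt_text)
    with open(config_path, "a", encoding="utf-8") as handle:
        # The leading newline guards against a file that lacks a trailing newline.
        handle.write("\n" + entry)
```

A call would look like append_prompt("config/config_prompts.yml", prompt_text), run once from the repo root; running it twice would add a duplicate key.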
Then generate the experiment config files:
cd scripts
uv sync
python3.13 generate_config_files.py
cd ..
This generates all the configuration files used in the study.
Add the arguments for each Defects4J bug you want to study to the relevant args-*.csv files, and ensure args.csv contains all bugs to clone.
Run all tools from the repo root. Running multiple projects can be time-consuming, and the run time depends on the chosen LLM.
To update the LLM for automated runs, edit scripts/generate_config_files.py and update the relevant model and API key settings.
bash run-all.sh
Each project is cloned into project/ and removed after use to keep a clean copy between tools and conserve storage.
Outputs are grouped by project under output/.
Each tool run creates a date-and-time-stamped directory.
For example, an STPA output directory has the following structure:
outputs/project/cli_1_fixed/20260518_1108-stpa
├── control_diagram.txt
├── original_method.java
├── stpa-mutants
│ ├── mutant_0.java
│ ├── mutant_1.java
│ ├── mutant_2.java
│ ├── mutant_3.java
│ ├── mutant_4.java
│ └── mutant_summary.csv
├── stpa-test
│ ├── mutant_0_test.txt
│ ├── mutant_1_test.txt
│ ├── mutant_2_test.txt
│ ├── mutant_3_test.txt
│ └── mutant_4_test.txt
└── ucas.csv
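The layout above lends itself to simple post-processing. The sketch below parses a run directory name into its timestamp and tool, and pairs each mutant with its test log. It assumes the naming seen in the STPA example (a YYYYMMDD_HHMM stamp, a hyphen, then the tool name, with <tool>-mutants and <tool>-test subdirectories); whether the other tools follow the same pattern is an assumption, and both function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def parse_run_dir(name):
    """Split a run directory name such as '20260518_1108-stpa' into (timestamp, tool)."""
    stamp, _, tool = name.partition("-")
    return datetime.strptime(stamp, "%Y%m%d_%H%M"), tool


def pair_mutants_with_logs(run_dir):
    """Map each mutant file name to its test log path, or None if the log is missing."""
    run_dir = Path(run_dir)
    _, tool = parse_run_dir(run_dir.name)
    pairs = {}
    # mutant_summary.csv is skipped because the pattern only matches .java files.
    for mutant in sorted((run_dir / f"{tool}-mutants").glob("mutant_*.java")):
        log = run_dir / f"{tool}-test" / f"{mutant.stem}_test.txt"
        pairs[mutant.name] = log if log.exists() else None
    return pairs
```

A mutant mapped to None has no recorded test log, which may be worth checking before running the analysis.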
We executed the Major mutation tool through its Defects4J framework integration, using the Major mutation configuration available as a submodule in this repository. This configuration is imported when the submodules are initialized.
To run the Major tool:
Ensure the Major commands are not commented out in the run-all.sh script.
If they were already active, Major was executed as part of your original run.
Otherwise, uncomment them, comment out the other tool commands, and re-run using only Major.
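One way to confirm which commands are active is to scan the script for lines that are not comments. The sketch below is illustrative only: it assumes the Major commands in run-all.sh contain the word "major", which may not match the actual script, and both function names are hypothetical.

```python
def active_lines(script_text):
    """Return the non-blank lines of a shell script that do not start with '#'."""
    lines = []
    for raw in script_text.splitlines():
        stripped = raw.strip()
        if stripped and not stripped.startswith("#"):
            lines.append(stripped)
    return lines


def mentions_tool(script_text, keyword="major"):
    """True if any active (uncommented) line contains `keyword`, ignoring case."""
    return any(keyword in line.lower() for line in active_lines(script_text))


if __name__ == "__main__":
    with open("run-all.sh", encoding="utf-8") as handle:
        print("Major active:", mentions_tool(handle.read()))
```

The check is purely textual, so a trailing comment on an otherwise active line would still count as a match.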
The ./analysis/ directory contains all of the scripts used to analyse the outputs of multiplex and Major and answer the research questions in the paper.
The data used in the paper is in the analysis/data directory.
Each RQ has one main Jupyter notebook, plus accompanying scripts in ./analysis/scripts/ that perform deeper analysis and produce the plots and tables in the paper.
- analysis/src/analysis/: core Python package containing analysis modules for mutant validity, subsumption, project summaries, and utility helpers.
- analysis/data/: sample datasets and experiment artifacts (original methods, mutants, test logs, and summaries) used by the notebooks and scripts.
- analysis/rq1.ipynb, analysis/rq2.ipynb: primary notebooks used to reproduce the figures and tables for each research question.
cd ./analysis/
uv sync
source .venv/bin/activate
uv pip install -e .
python3.13 ./path/to/script.py
The outputs appear in the data-summaries/ directory.