Paper under review.
- This repo has code and scripts to fine-tune large language models (LLMs) for multi-task PPM.
- We use uv to manage our local environment.
- Tested only on Ubuntu 24.04 using Python 3.12.
Install all dependencies with:

```shell
uv venv .venv --python 3.12
source .venv/bin/activate
uv pip install -r requirements.txt
```
```
├── data/                          # Event logs (automatically downloaded)
├── scripts/                       # Experiment scripts and configs
│   ├── *.sh
│   ├── *.txt
│   └── *.slurm
├── notebooks/                     # Analysis notebooks
├── ppm/                           # Source code
├── luijken_transfer_learning.py   # Competitor training script
├── rebmann_et_al.py               # Narrative-style competitor training script
├── next_event_prediction.py       # Main training script
├── requirements.txt               # Python dependencies
└── README.md                      # This file
```
We use five public event logs. They are downloaded automatically via SkPM and stored under data/<LOG>/.
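The training script below learns one shared backbone with a classification head for the next activity and a regression head for the remaining time. As a rough illustration of that multi-task setup (not the repo's actual code; all names and sizes here are made up), a minimal numpy forward pass over a case prefix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (the real script takes these as CLI flags).
n_activities, embedding_size, hidden_size = 10, 32, 128

# Embedding table plus a single recurrent cell, standing in for the backbone.
emb = rng.normal(size=(n_activities, embedding_size))
W_in = rng.normal(size=(embedding_size, hidden_size))
W_h = rng.normal(size=(hidden_size, hidden_size))

# Two task heads sharing the hidden state: next activity and remaining time.
W_act = rng.normal(size=(hidden_size, n_activities))
W_time = rng.normal(size=(hidden_size, 1))

def forward(prefix):
    """Run a case prefix (list of activity ids) through the shared backbone."""
    h = np.zeros(hidden_size)
    for a in prefix:
        h = np.tanh(emb[a] @ W_in + h @ W_h)
    logits = h @ W_act                   # categorical target: next activity
    remaining_time = (h @ W_time).item() # continuous target: remaining time
    return logits, remaining_time

logits, rt = forward([0, 3, 7])
print(logits.shape, isinstance(rt, float))  # (10,) True
```

Both heads are trained jointly against the `--categorical_targets` and `--continuous_targets` flags shown below.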
RNN baseline

```shell
python next_event_prediction.py \
    --dataset BPI20PrepaidTravelCosts \
    --backbone rnn \
    --embedding_size 32 \
    --hidden_size 128 \
    --lr 0.0005 \
    --batch_size 64 \
    --epochs 25 \
    --categorical_features activity \
    --continuous_features all \
    --categorical_targets activity \
    --continuous_targets remaining_time
```

LLM fine-tuning
In order to use LLMs, you need a HuggingFace token. A few options on how to use it:
- Create a `.env` file in the root of this repository and write your token like `HF_TOKEN=<YOUR_TOKEN>`
- Export an environment variable: `export HF_TOKEN="<YOUR_TOKEN>"`
- Hard-code it here
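Whichever option you pick, the token ends up available to the scripts via the environment. A sketch of how such a lookup typically resolves (hypothetical helper, not the repo's code — the real scripts may read it differently):

```python
import os

def resolve_hf_token(env_file=".env"):
    """Return the HuggingFace token: environment variable first, then .env."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    # Fall back to a simple KEY=VALUE .env file in the repo root.
    if os.path.exists(env_file):
        with open(env_file) as f:
            for line in f:
                key, _, value = line.strip().partition("=")
                if key == "HF_TOKEN" and value:
                    return value
    raise RuntimeError("HF_TOKEN not found; see the options above.")

os.environ["HF_TOKEN"] = "hf_example"  # simulate `export HF_TOKEN=...`
print(resolve_hf_token())              # hf_example
```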
For local debugging purposes, try the tiny setup below with a small `r` value for BPI20PrepaidTravelCosts and qwen25-05b. If it doesn't fit in your GPU memory, keep decreasing `batch_size` (`--batch_size 4` uses less than 2 GB).
```shell
python next_event_prediction.py \
    --dataset BPI20PrepaidTravelCosts \
    --backbone qwen25-05b \
    --embedding_size 896 \
    --hidden_size 896 \
    --lr 0.00005 \
    --batch_size 64 \
    --epochs 1 \
    --categorical_features activity \
    --continuous_features all \
    --categorical_targets activity \
    --continuous_targets remaining_time \
    --fine_tuning lora \
    --r 2 \
    --lora_alpha 4
```

Optionally, add the argument `--wandb` to enable Weights & Biases logging.
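In practice the LoRA adapters are handled by the fine-tuning library, but the `--r` and `--lora_alpha` flags map directly onto the low-rank update W' = W + (alpha/r)·B·A. A self-contained numpy illustration (not the repo's code) using the r=2, lora_alpha=4 values from the command above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lora_alpha = 896, 2, 4        # hidden size and LoRA flags from above

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection (zero init)

# LoRA replaces W @ x with (W + (lora_alpha / r) * B @ A) @ x.
delta = (lora_alpha / r) * B @ A

x = rng.normal(size=d)
y = (W + delta) @ x

# With B initialised to zero, the adapter starts as an exact no-op...
assert np.allclose(y, W @ x)
# ...and only 2 * r * d parameters per adapted matrix are trained.
print(A.size + B.size)  # 3584
```

This is why a small `r` keeps memory usage low: the frozen `W` needs no gradients, and only `A` and `B` are updated.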
We used Slurm on our HPC clusters. Check scripts/*.sh, scripts/*.txt, and scripts/*.slurm to see how to reproduce our jobs or run other configurations locally.
All metrics and analysis notebooks are in the notebooks/ folder. Check this notebook for plots that did not fit in the paper.
For questions or feedback, reach me at rafael.oyamada@kuleuven.be or open an issue here.