This repository contains the code used for the experiments and analyses on the Tricky² dataset. All components are organized to support reproducibility, including dataset preprocessing, model training/evaluation scripts, and analysis utilities.
Tricky² is a benchmark designed to evaluate the robustness of automated software-engineering systems, particularly large language models (LLMs), on realistic, multi-origin software defects. It extends prior bug-fixing datasets by introducing a controlled mixture of human-written and LLM-generated bugs, enabling the study of how these defect types differ and interact. The dataset contains three primary splits:
- Human-only: Programs containing naturally occurring bugs from real student or developer submissions.
- LLM-only: Programs where the only defects were injected by large language models using structured prompts.
- Human+LLM (mixed-origin): Programs that contain original human bugs along with additional LLM-injected bugs.
Each program includes:
- The buggy code
- A corresponding reference solution
- A taxonomy label describing the fault type
- Problem metadata (language, difficulty, problem category)
- Test suites for evaluating correctness or attempted repairs
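For orientation, a single entry might look roughly like the following Python dictionary. The field names and values are illustrative assumptions, not the dataset's actual schema; consult the Zenodo release for the exact format.

```python
# Hypothetical shape of one Tricky² entry; field names are illustrative
# assumptions, not the dataset's actual schema.
example_entry = {
    "problem_id": "p0042",
    "language": "python",
    "difficulty": "medium",
    "category": "string-manipulation",
    "origin": "human+llm",       # "human", "llm", or "human+llm"
    "fault_type": "off-by-one",  # taxonomy label for the defect
    "buggy_code": "def count_vowels(s):\n    return sum(1 for c in s[1:] if c in 'aeiou')\n",
    "reference_solution": "def count_vowels(s):\n    return sum(1 for c in s if c in 'aeiou')\n",
    "tests": [
        {"input": "banana", "expected": 3},
        {"input": "", "expected": 0},
    ],
}
```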
The benchmark supports multiple evaluation tasks, including:
- Origin classification – determining whether a bug is human-authored, LLM-generated, or mixed.
- Error identification – localizing the lines or regions responsible for the defect.
- Program repair – producing fixes that pass the provided tests (see the evaluation sketch below).

Tricky² is intended to help researchers study failure modes, interaction effects among multiple bug sources, and the limits of current automated program-analysis and repair models.
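As a rough illustration of the repair task, the sketch below checks a candidate fix against input/expected-output test cases shaped like the hypothetical entry above. The record layout and the `count_vowels` entry point are assumptions; the actual evaluation harness in this repository may differ.

```python
# Minimal sketch of scoring a candidate repair against a test suite.
# Assumes the hypothetical record layout shown above; the real harness
# in this repository may work differently.

def passes_tests(candidate_code: str, tests: list[dict], entry_point: str) -> bool:
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # load the candidate fix
        func = namespace[entry_point]    # e.g. "count_vowels"
        return all(func(t["input"]) == t["expected"] for t in tests)
    except Exception:
        return False                     # crashes or missing functions count as failures


if __name__ == "__main__":
    fixed = "def count_vowels(s):\n    return sum(1 for c in s if c in 'aeiou')\n"
    tests = [{"input": "banana", "expected": 3}, {"input": "", "expected": 0}]
    print(passes_tests(fixed, tests, "count_vowels"))  # expected output: True
```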
All requirements can be installed via `pip install -r requirements.txt`.
The dataset itself is not included in this repository; the full dataset is available on Zenodo.
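If you prefer to fetch the data programmatically, something like the sketch below works once you know the record's file URL. The URL, archive name, and target directory are placeholders, not the actual Zenodo record.

```python
# Placeholder sketch for fetching the dataset archive from Zenodo.
# RECORD_URL and the archive/target names are hypothetical; take the real
# file URL from the Zenodo record page.
import urllib.request
import zipfile

RECORD_URL = "https://zenodo.org/records/<record-id>/files/tricky2.zip"  # placeholder
ARCHIVE = "tricky2.zip"

urllib.request.urlretrieve(RECORD_URL, ARCHIVE)
with zipfile.ZipFile(ARCHIVE) as zf:
    zf.extractall("data/tricky2")  # extract alongside the repository code
```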