Repository for Bachelor Thesis "Static Analysis for Automated Error Clustering of Haskell Programming Exercises"
The code in this repository runs on Linux with Docker and Python installed. Running it on other operating systems will require adjustments.
This repository includes the codebase of the prototype, sample data, and documentation for the regular expression-based clustering presented in the Bachelor Thesis of Esat Akif Avci: "Static Analysis for Automated Error Clustering of Haskell Programming Exercises".
- `artificial_submissions`: An artificial dataset of submissions to evaluate the clustering
- `compilation`: The Python script to compile the artificial submissions and save the results as a CSV
- `evaluation`: The experiments to evaluate the clustering with the artificial submission set
- `ghc_docker`: A Docker image with GHC to compile the code
- `implementation`: The code as implemented in GATE
- `list_of_clusters`: A list of all clusters with German and English explanations and an example for each cluster
- `prototype`: The prototype of the clustering system with a test script to test every cluster
This repository uses git submodules. You can clone the repo with the standard cloning command, but if you want to include the submodule used for the AST-clustering, you need to recursively clone with the following command:
```
git clone --recursive [cloning link from above]
```

Or you can initialize and update the submodule later with:

```
git submodule update --init --recursive
```

To use the LLM APIs, you must create a `.env` file in the repository root with credentials for each LLM's API, like the following:
```
GOOGLE_API_KEY=[API-Key]
CLAUDE_API_KEY=[API-Key]
OPENAI_API_KEY=[API-Key]
```
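The keys can be loaded in Python either with a package such as `python-dotenv` or with a small hand-rolled parser. The sketch below is a minimal, hypothetical parser for this flat `KEY=value` format; the function names (`load_env`, `require_key`) are assumptions for illustration and are not part of the repository:

```python
def load_env(path: str = ".env") -> dict[str, str]:
    """Minimal parser for a flat KEY=value .env file (no quoting rules)."""
    keys: dict[str, str] = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            keys[key.strip()] = value.strip()
    return keys


def require_key(env: dict[str, str], name: str) -> str:
    """Fail early with a clear message if a credential is missing."""
    if not env.get(name):
        raise KeyError(f"missing {name} in .env")
    return env[name]
```

A script would then call, for example, `require_key(load_env(), "OPENAI_API_KEY")` before touching the API, so a missing credential fails immediately instead of mid-run.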
I used the current stable Python version and the LTS release of Java. The code will likely work with other versions as well.
- Python Version: 3.13.5
- Java Version: OpenJDK 21
For every script requiring Python packages, a requirements.txt file is provided.
- First, compiling the artificial submissions is recommended. You can do this by running the `compilation/compile.py` script.
- Second, classifying the submissions with the `prototype/classify_artificial_submissions.py` file is recommended.
- After this, every script should be runnable independently.
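The two recommended steps above can be chained with a small driver script. This is a hypothetical sketch, not part of the repository: the script paths come from the list above, and the helper skips any script that is not present in the working directory:

```python
import subprocess
import sys
from pathlib import Path

# Recommended order from the README; both paths are relative to the repo root.
STEPS = [
    "compilation/compile.py",
    "prototype/classify_artificial_submissions.py",
]


def run_step(script: str) -> int:
    """Run one pipeline script with the current interpreter; return its exit code."""
    result = subprocess.run([sys.executable, script])
    return result.returncode


if __name__ == "__main__":
    for step in STEPS:
        if not Path(step).exists():
            print(f"skipping missing script: {step}")
            continue
        code = run_step(step)
        if code != 0:
            # Stop the pipeline as soon as one step fails.
            sys.exit(code)
```

Using `sys.executable` keeps the child scripts on the same interpreter (and virtual environment) as the driver.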