Overview

  • This repository contains the code used to evaluate which approach best categorizes student implementations.
  • To assess categorization quality, an ideal categorization of the "shirt-size" task was crafted. It is stored in the "sample-data/shirt-size/optimal-categories" folder.
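
Comparing a produced categorization against the ideal one can be sketched as follows. This is only an illustration: the README does not say which metric the repository uses, so the Rand index (the share of submission pairs on which both categorizations agree) is swapped in here as one common choice. The function name, the submission ids, and the labels are all hypothetical.

```python
from itertools import combinations


def rand_index(predicted, ideal):
    """Return the fraction of submission pairs on which both categorizations agree.

    Both arguments map a submission id to a category label. A pair counts as
    an agreement when both categorizations put the two submissions together,
    or both keep them apart.
    """
    if set(predicted) != set(ideal):
        raise ValueError("both categorizations must cover the same submissions")
    pairs = list(combinations(sorted(predicted), 2))
    if not pairs:
        return 1.0  # zero or one submission: nothing to disagree on
    agreements = sum(
        (predicted[a] == predicted[b]) == (ideal[a] == ideal[b])
        for a, b in pairs
    )
    return agreements / len(pairs)


# Hypothetical data: four submissions, ideal labels vs. labels from some approach.
ideal = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
predicted = {"s1": 0, "s2": 0, "s3": 1, "s4": 0}
score = rand_index(predicted, ideal)
print(score)  # 0.5
```

Label names do not need to match between the two categorizations, because only the grouping of pairs is compared.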

Code Categorization Approaches

  • In the "approaches" folder, all the tested approaches are listed. They are separated by approach type, currently "jaccard", "llm", and "tsed".
  • For each approach, there is one file that performs the offline clustering experiments and one that performs the online clustering experiments.
    • Within those files, the relevant environment variables are set, e.g. which task's data is used. Because only the "shirt-size" task has a ground-truth categorization, evaluation is only possible for that task, so it is currently selected everywhere.
    • Running these files performs the clustering and outputs the result (as well as the evaluation if the "shirt-size" task data was used).
      • The results are persisted in a file and stored in the "results" folder of the approach.
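
To give an idea of what an online clustering run with a Jaccard-based approach might look like, here is a minimal sketch. It is not the repository's implementation: the tokenizer, the 0.6 threshold, the use of the first member of a cluster as its representative, and all function names are assumptions made for illustration. "Online" here means that submissions are assigned one at a time as they arrive, whereas an offline run would see all submissions before grouping them.

```python
import re


def tokens(code):
    """Split source code into a set of identifiers, numbers, and symbols."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_online(submissions, threshold=0.6):
    """Assign each submission, in arrival order, to the most similar cluster.

    A new cluster is opened when no existing representative reaches the
    threshold. Returns one integer cluster label per submission.
    """
    representatives = []  # token set of the first member of each cluster
    labels = []
    for code in submissions:
        current = tokens(code)
        best_label, best_sim = None, threshold
        for label, representative in enumerate(representatives):
            sim = jaccard(current, representative)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is None:
            representatives.append(current)
            best_label = len(representatives) - 1
        labels.append(best_label)
    return labels


# Hypothetical submissions: two near-identical solutions and one unrelated one.
submissions = [
    "return 'S' if x < 38 else 'M'",
    "return 'S' if y < 38 else 'M'",
    "for i in range(10): print(i)",
]
print(cluster_online(submissions))  # [0, 0, 1]
```

The first two submissions differ only in a variable name, so their token sets overlap in 8 of 10 distinct tokens (similarity 0.8) and they land in the same cluster.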

Sample Data

  • This repository contains anonymized data from three old ACCESS tasks ("arithmetic-expression", "invert-dictionary", and "shirt-size"). The data is located in the "sample-data" folder and can be used in the experiments.
  • For each task, there's one .json file containing all the submissions that students made, and one .json file that contains only the first submissions students made.
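
The relation between the two files can be sketched as a filter that keeps the earliest submission per student. The schema of the .json files is not described here, so the field names "student_id", "timestamp", and "code" are hypothetical, and the records are parsed from an inline string rather than from a file in the repository.

```python
import json

# Hypothetical records; the real files may use different field names.
raw = """
[
  {"student_id": "a", "timestamp": 2, "code": "a2"},
  {"student_id": "b", "timestamp": 1, "code": "b1"},
  {"student_id": "a", "timestamp": 1, "code": "a1"}
]
"""


def first_submissions(records):
    """Keep only the earliest submission of each student."""
    first = {}
    for record in records:
        student = record["student_id"]
        if student not in first or record["timestamp"] < first[student]["timestamp"]:
            first[student] = record
    return list(first.values())


all_submissions = json.loads(raw)
firsts = first_submissions(all_submissions)
print([record["code"] for record in firsts])  # ['a1', 'b1']
```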