Chiang, Cheng-Han, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, and Hung-yi Lee. "Large language model as an assignment evaluator: Insights, feedback, and challenges in a 1000+ student course." arXiv preprint arXiv:2407.05216 (2024).
Poličar, Pavlin G., Martin Špendl, Tomaž Curk, and Blaž Zupan. "Automated Assignment Grading with Large Language Models: Insights From a Bioinformatics Course." arXiv preprint arXiv:2501.14499 (2025).
Shankar, Shreya, J. D. Zamfirescu-Pereira, Björn Hartmann, Aditya Parameswaran, and Ian Arawjo. "Who validates the validators? Aligning LLM-assisted evaluation of LLM outputs with human preferences." In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, pp. 1-14. 2024.