[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Updated Nov 15, 2025 (Python)
Benchmark methodology, task sets, and evaluation results for RA²R
MSc Thesis Project – Framework for Causality-Aware Structured Multi-Step Reasoning in Legal Argument Generation – AI Systems Engineering, supervised by Prof. R. Pietrantuono and PhD Cristian Mascia (2026)
The Marked Bench: a versioned contradiction-detection benchmark for AI reasoning evaluation.