An open multi-rater benchmark for characterizing architectural fingerprints in identity-scaffolded LLMs.
benchmark open-science large-language-models llm multi-rater ai-benchmark ai-identity inter-rater-reliability simulated-emergence
Updated May 23, 2026 - Python