Open evaluation harness for mental health LLM responses. 5 clinically grounded rubrics, LLM-as-judge scoring with bias controls, and crisis-detection routing to 988 protocols.
psychology cbt ai-safety conversational-ai clinical-ai cohen-kappa ollama llm-evaluation llm-as-judge mental-health-ai ai-eval inter-rater-reliability eval-harness lifeline-988 open-source-eval
Updated May 24, 2026 - Python