Partner-grade LLM agent evaluation framework. Verbatim BCG rubric, adversarial Skeptic Agent for sycophancy/ambiguity detection, 10-signal Novelty Stack. Anthropic skill, Claude-native — no API keys.
consulting quality-assurance evaluation-framework mbb claude mckinsey bcg rubric novelty-detection ai-agent anthropic llm-evaluation hallucination-detection llm-as-judge claude-code agent-evaluation strategy-ai claude-skill sycophancy-detection anthropic-skill
Updated May 17, 2026 - Python