Prompt-evaluation toolkit: run golden-case prompts, route models, track cost, and rank results on a leaderboard.
mcp ai-agents claude pydantic prompt-engineering anthropic evals llm-evaluation llm-as-a-judge fastmcp evaluation-harness github-actions
Updated May 27, 2026 - Python