Problem
The LLM-as-a-judge evaluator has a hardcoded list of models in its settings template (multiple_choice type with static options). Meanwhile, prompts in the playground use the dynamic model list from the model registry / vault.
Users expect to use the same models for evaluation that they use for their prompts.
Blocked By
This is blocked by the evaluator playground migration (AGE-3656). The fix requires rethinking how evaluator schemas work: specifically, the model field in LLM-as-a-judge should reference the same dynamic model list that playground prompts use, instead of being a hardcoded multiple_choice in the evaluator catalogue.
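To make the intended change concrete, here is a minimal sketch of the difference between a static field and a field that defers to a shared model list. Every name in it is an assumption for illustration (the `model_ref` type, the `source` key, the placeholder model names, and the `resolve_options` helper); none of it is taken from the actual evaluator catalogue or the new schema design.

```python
from typing import Any, Callable

# Hypothetical shape of the current setting: options are baked into the template.
STATIC_MODEL_FIELD: dict[str, Any] = {
    "type": "multiple_choice",
    "options": ["model-a", "model-b"],  # placeholder names, not the real list
}

# Hypothetical replacement: the field only points at a shared source of models.
DYNAMIC_MODEL_FIELD: dict[str, Any] = {
    "type": "model_ref",         # assumed type name
    "source": "model_registry",  # assumed identifier for the shared list
}


def resolve_options(
    field: dict[str, Any],
    list_models: Callable[[], list[str]],
) -> list[str]:
    """Return the selectable models for a settings field.

    `list_models` stands in for whatever the playground uses to fetch
    models from the registry / vault; it is injected so this sketch does
    not assume a particular API.
    """
    if field.get("type") == "model_ref":
        # Dynamic field: ask the shared source at render time.
        return list(list_models())
    # Static field: fall back to the options stored in the template.
    return list(field.get("options", []))
```

With a field like this, the evaluator and the playground would read from one source, so a model added to the registry would appear in both without a catalogue change.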
Notes from Sprint Planning (Mar 11)
- Related to the new schema design for evaluators
- Should be addressed as part of the evaluator playground migration work