Scope
Third checkpoint: annotation queues with automatic evaluation integration.
- Blocked on the backend loops work (evaluation engine cleanup, owned by JP)
- Once loops are ready, human evaluations run in the backend instead of the frontend
- Eventually the distinction between "human evaluation" and "automatic evaluation" goes away
What's included
- Human and auto evaluators combined in the same evaluation run
- Queue dispatch for mixed evaluation
- Comparison view: human vs auto eval results
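A minimal sketch of how the three items above could fit together, assuming a single run object holds both evaluator kinds. Every name here (`Evaluator`, `EvaluationRun`, `dispatch`, `submit_annotation`, `compare`) is hypothetical and not taken from the existing codebase; the real shape depends on the backend loops work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# All names below are hypothetical and for illustration only.


@dataclass
class Evaluator:
    name: str
    kind: str  # "auto" or "human"
    fn: Optional[Callable[[str], float]] = None  # only set for auto evaluators


@dataclass
class EvaluationRun:
    items: List[str]
    evaluators: List[Evaluator]
    # results[item][evaluator_name] -> score
    results: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # pending human tasks as (item, evaluator_name) pairs
    queue: List[Tuple[str, str]] = field(default_factory=list)


def dispatch(run: EvaluationRun) -> None:
    """Run auto evaluators immediately; enqueue human ones for annotation."""
    for item in run.items:
        run.results.setdefault(item, {})
        for ev in run.evaluators:
            if ev.kind == "auto":
                run.results[item][ev.name] = ev.fn(item)
            else:
                run.queue.append((item, ev.name))


def submit_annotation(run: EvaluationRun, item: str, evaluator_name: str, score: float) -> None:
    """Record a human score and remove the matching task from the queue."""
    run.queue.remove((item, evaluator_name))
    run.results[item][evaluator_name] = score


def compare(run: EvaluationRun, human: str, auto: str) -> List[Tuple[str, float, float, float]]:
    """Return (item, human score, auto score, difference) for items scored by both."""
    rows = []
    for item in run.items:
        scores = run.results.get(item, {})
        if human in scores and auto in scores:
            rows.append((item, scores[human], scores[auto], scores[human] - scores[auto]))
    return rows
```

In this sketch the comparison only covers items where both scores exist, so partially annotated runs can still be inspected while the queue drains.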
Dependencies
- Checkpoints 1 and 2 must ship first
- Backend loops cleanup (JP) must be complete
- Design must cover the mixed human + auto evaluation flow