@@ -34,10 +34,6 @@ Each evaluator comes with a predefined input and output schema. When using an ev
   <Card title="Word Count Ratio" icon="hashtag">
     Measure the ratio of output words to input words to compare input/output verbosity and expansion patterns (sketched below).
   </Card>
-
-  <Card title="Tone Detection" icon="smile">
-    Classify the emotional tone of responses (joy, anger, sadness, etc.).
-  </Card>
 </CardGroup>
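The Word Count Ratio evaluator above reduces to a single arithmetic check. A minimal sketch in Python; the `word_count_ratio` helper is illustrative only, not part of any SDK:

```python
def word_count_ratio(input_text: str, output_text: str) -> float:
    """Ratio of output words to input words; > 1.0 means the response expands on the input."""
    input_words = len(input_text.split())
    output_words = len(output_text.split())
    return output_words / input_words if input_words else 0.0

# Example: a terse answer to a long question yields a ratio well below 1.0.
print(word_count_ratio("Summarize the plot of Hamlet in one sentence.",
                       "A prince avenges his father."))
```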
 
 ### Quality & Correctness
@@ -55,8 +51,8 @@ Each evaluator comes with a predefined input and output schema. When using an ev
     Evaluate factual accuracy by comparing answers against ground truth.
   </Card>
 
-  <Card title="Answer Completeness" icon="check-circle">
-    Measure how completely responses use relevant context.
+  <Card title="Answer Completeness" icon="circle-check">
+    Measure how completely responses use the relevant context, ensuring every pertinent detail is addressed (see the sketch below).
   </Card>
 
   <Card title="Topic Adherence" icon="hashtag">
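One naive way to approximate the Answer Completeness check above is the fraction of context facts whose content actually appears in the response. A rough sketch under that assumption; simple lexical overlap stands in for whatever matching the real evaluator performs:

```python
def completeness(context_facts: list[str], response: str) -> float:
    """Fraction of context facts whose words all appear in the response (naive lexical proxy)."""
    response_words = set(response.lower().split())
    covered = sum(
        1 for fact in context_facts
        if set(fact.lower().split()) <= response_words
    )
    return covered / len(context_facts) if context_facts else 1.0

facts = ["paris is the capital", "population 2.1 million"]
print(completeness(facts, "Paris is the capital of France."))  # 0.5: one fact covered
```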
@@ -67,6 +63,10 @@ Each evaluator comes with a predefined input and output schema. When using an ev
     Validate semantic similarity between expected and actual responses to measure content alignment.
   </Card>
 
+  <Card title="Instruction Adherence" icon="clipboard-check">
+    Measure how well the LLM response follows the given instructions to ensure compliance with specified requirements.
+  </Card>
+
   <Card title="Prompt Perplexity" icon="brain">
     Measure how predictable/familiar a prompt is to a language model (sketched below).
   </Card>
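Prompt Perplexity is conventionally the exponentiated mean negative log-likelihood of the prompt's tokens. A sketch, assuming per-token logprobs are already available from a scoring API; the sample values are invented:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative logprob; lower means the prompt is more familiar to the model."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical logprobs for a short prompt's tokens.
print(perplexity([-0.5, -1.2, -0.3, -2.0]))  # ~2.72
```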
@@ -76,7 +76,11 @@ Each evaluator comes with a predefined input and output schema. When using an ev
   </Card>
 
   <Card title="Uncertainty Detector" icon="gauge">
-    Generate responses and measure model uncertainty from logprobs.
+    Generate responses and measure model uncertainty from logprobs to identify when the model is less confident in its outputs (sketched below).
+  </Card>
+
+  <Card title="Conversation Quality" icon="comments">
+    Evaluate conversation quality based on tone, clarity, flow, responsiveness, and transparency.
   </Card>
 </CardGroup>
 
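For the Uncertainty Detector, one common signal is the average entropy of the top-k token distribution at each generation step. A sketch, assuming `top_logprobs_per_token` comes from a provider's logprobs option; the shape and values here are assumptions:

```python
import math

def mean_token_entropy(top_logprobs_per_token: list[list[float]]) -> float:
    """Average Shannon entropy (nats) of the top-k distribution at each generated token.
    Higher entropy means a flatter distribution, i.e., a less confident model."""
    entropies = []
    for logprobs in top_logprobs_per_token:
        probs = [math.exp(lp) for lp in logprobs]
        total = sum(probs)                      # renormalize the truncated top-k mass
        probs = [p / total for p in probs]
        entropies.append(-sum(p * math.log(p) for p in probs))
    return sum(entropies) / len(entropies)

# Two tokens: one near-certain, one split evenly between alternatives.
print(mean_token_entropy([[-0.01, -5.0], [-0.7, -0.7]]))
```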
@@ -134,4 +138,24 @@ Each evaluator comes with a predefined input and output schema. When using an ev
   <Card title="Agent Goal Accuracy" icon="bullseye">
     Validate agent goal accuracy to ensure AI systems achieve their intended objectives effectively.
   </Card>
+
+  <Card title="Agent Tool Error Detector" icon="wrench">
+    Detect errors or failures during tool execution to monitor agent tool performance.
+  </Card>
+
+  <Card title="Agent Flow Quality" icon="route">
+    Validate agent trajectories against user-defined natural-language tests to assess agent decision-making paths.
+  </Card>
+
+  <Card title="Agent Efficiency" icon="zap">
+    Evaluate agent efficiency by flagging redundant calls and suboptimal paths (see the sketch after this group).
+  </Card>
+
+  <Card title="Agent Goal Completeness" icon="circle-check">
+    Measure whether the agent successfully accomplished all of the user's goals to verify comprehensive goal achievement.
+  </Card>
+
+  <Card title="Intent Change" icon="repeat">
+    Detect whether the user's primary intent or workflow changed significantly during a conversation.
+  </Card>
 </CardGroup>
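A first-pass check behind an Agent Efficiency evaluator is spotting exact-duplicate tool calls in a trajectory. A minimal sketch; the trace schema below is a guess for illustration, not this product's format:

```python
from collections import Counter

def redundant_tool_calls(trace: list[dict]) -> list[tuple]:
    """Return (tool, args) pairs invoked more than once with identical arguments."""
    calls = Counter(
        (step["tool"], tuple(sorted(step["args"].items())))
        for step in trace if step.get("tool")
    )
    return [call for call, n in calls.items() if n > 1]

trace = [
    {"tool": "search", "args": {"q": "weather paris"}},
    {"tool": "search", "args": {"q": "weather paris"}},  # duplicate work
    {"tool": "calculator", "args": {"expr": "2+2"}},
]
print(redundant_tool_calls(trace))  # flags the repeated search call
```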