
Commit b75c102

chore: Add new mbt evaluators (#125)
1 parent dd7ec47 commit b75c102

1 file changed, +75 -25 lines changed

evaluators/made-by-traceloop.mdx

Lines changed: 75 additions & 25 deletions
@@ -16,72 +16,122 @@ Each evaluator comes with a predefined input and output schema. When using an ev
 
 ## Evaluator Types
 
+### Style
+
 <CardGroup cols={3}>
 <Card title="Character Count" icon="text">
 Analyze response length and verbosity to ensure outputs meet specific length requirements.
 </Card>
-
+
 <Card title="Character Count Ratio" icon="hashtag">
 Measure the ratio of characters to the input to assess response proportionality and expansion.
 </Card>
-
+
 <Card title="Word Count" icon="align-left">
 Ensure appropriate response detail level by tracking the total number of words in outputs.
 </Card>
-
+
 <Card title="Word Count Ratio" icon="hashtag">
 Measure the ratio of words to the input to compare input/output verbosity and expansion patterns.
 </Card>
-
+
+<Card title="Tone Detection" icon="smile">
+Classify emotional tone of responses (joy, anger, sadness, etc.).
+</Card>
+</CardGroup>
+
+### Quality & Correctness
+
+<CardGroup cols={3}>
 <Card title="Answer Relevancy" icon="bullseye">
 Verify responses address the query to ensure AI outputs stay on topic and remain relevant.
 </Card>
-
+
 <Card title="Faithfulness" icon="circle-check">
 Detect hallucinations and verify facts to maintain accuracy and truthfulness in AI responses.
 </Card>
-
+
+<Card title="Answer Correctness" icon="circle-check">
+Evaluate factual accuracy by comparing answers against ground truth.
+</Card>
+
+<Card title="Answer Completeness" icon="check-circle">
+Measure how completely responses use relevant context.
+</Card>
+
+<Card title="Topic Adherence" icon="hashtag">
+Validate topic adherence to ensure responses stay focused on the specified subject matter.
+</Card>
+
+<Card title="Semantic Similarity" icon="hashtag">
+Validate semantic similarity between expected and actual responses to measure content alignment.
+</Card>
+
+<Card title="Prompt Perplexity" icon="brain">
+Measure how predictable/familiar a prompt is to a language model.
+</Card>
+
+<Card title="Measure Perplexity" icon="hashtag">
+Measure text perplexity from logprobs to assess the predictability and coherence of generated text.
+</Card>
+
+<Card title="Uncertainty Detector" icon="gauge">
+Generate responses and measure model uncertainty from logprobs.
+</Card>
+</CardGroup>
+
+### Security & Compliance
+
+<CardGroup cols={3}>
 <Card title="PII Detection" icon="shield">
 Identify personal information exposure to protect user privacy and ensure data security compliance.
 </Card>
-
+
 <Card title="Profanity Detection" icon="triangle-exclamation">
 Flag inappropriate language use to maintain content quality standards and professional communication.
 </Card>
-
+
+<Card title="Sexism Detection" icon="triangle-exclamation">
+Detect sexist and discriminatory content.
+</Card>
+
+<Card title="Prompt Injection" icon="shield-exclamation">
+Detect prompt injection attacks in user inputs.
+</Card>
+
+<Card title="Toxicity Detector" icon="skull">
+Detect toxic content including personal attacks, mockery, hate, and threats.
+</Card>
+
 <Card title="Secrets Detection" icon="lock">
 Monitor for credential and key leaks to prevent accidental exposure of sensitive information.
 </Card>
-
+</CardGroup>
+
+### Formatting
+
+<CardGroup cols={3}>
 <Card title="SQL Validation" icon="database">
 Validate SQL queries to ensure proper syntax and structure in database-related AI outputs.
 </Card>
-
+
 <Card title="JSON Validation" icon="code">
 Validate JSON responses to ensure proper formatting and structure in API-related outputs.
 </Card>
-
+
 <Card title="Regex Validation" icon="asterisk">
 Validate regex patterns to ensure correct regular expression syntax and functionality.
 </Card>
-
+
 <Card title="Placeholder Regex" icon="asterisk">
 Validate placeholder regex patterns to ensure proper template and variable replacement structures.
 </Card>
-
-<Card title="Semantic Similarity" icon="hashtag">
-Validate semantic similarity between expected and actual responses to measure content alignment.
-</Card>
-
+</CardGroup>
+
+### Agents
+
+<CardGroup cols={3}>
 <Card title="Agent Goal Accuracy" icon="bullseye">
 Validate agent goal accuracy to ensure AI systems achieve their intended objectives effectively.
 </Card>
-
-<Card title="Topic Adherence" icon="hashtag">
-Validate topic adherence to ensure responses stay focused on the specified subject matter.
-</Card>
-
-<Card title="Measure Perplexity" icon="hashtag">
-Measure text perplexity from logprobs to assess the predictability and coherence of generated text.
-</Card>
 </CardGroup>

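The Character Count Ratio and Word Count Ratio cards in this commit describe comparing the length of a response against the length of its input. A minimal Python sketch of that comparison follows. It assumes the ratio is output length divided by input length and that words are whitespace-separated tokens; the evaluators' exact definitions are not stated in the diff. The names `character_count_ratio` and `word_count_ratio` are hypothetical and are not Traceloop SDK functions.

```python
def character_count_ratio(input_text: str, output_text: str) -> float:
    """Output characters divided by input characters (assumed definition)."""
    if not input_text:
        # Guard chosen for this sketch: a ratio against empty input is undefined.
        raise ValueError("input_text must be non-empty to compute a ratio")
    return len(output_text) / len(input_text)


def word_count_ratio(input_text: str, output_text: str) -> float:
    """Output words divided by input words, splitting on whitespace (assumption)."""
    input_words = input_text.split()
    if not input_words:
        raise ValueError("input_text must contain at least one word")
    return len(output_text.split()) / len(input_words)


if __name__ == "__main__":
    prompt = "Summarize the report"
    answer = "The report shows revenue grew in Q3."
    print(character_count_ratio(prompt, answer))  # 36 / 20 characters
    print(word_count_ratio(prompt, answer))       # 7 / 3 words
```

A value above 1 means the response expanded relative to the input; raising on empty input is a design choice of this sketch rather than documented evaluator behavior.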

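The Measure Perplexity card describes computing text perplexity from logprobs. The conventional definition is the exponential of the negative mean token log-probability. A minimal sketch, assuming natural-log token logprobs of the kind many LLM APIs return; `perplexity_from_logprobs` is a hypothetical helper, and the Traceloop evaluator may aggregate or normalize differently.

```python
import math
from typing import Sequence


def perplexity_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean token log-probability.

    Assumes natural-log probabilities (one value per generated token).
    """
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token logprob")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)


if __name__ == "__main__":
    # Every token predicted with probability 0.5, so perplexity is about 2.
    logprobs = [math.log(0.5)] * 4
    print(round(perplexity_from_logprobs(logprobs), 6))  # 2.0
```

Lower perplexity means the model found the text more predictable; a model that assigns probability 1 to every token reaches the minimum value of 1.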