Commit 7536c0b

roshan-vapi and claude committed
feat: add QA evaluation structured outputs for Starlight (Brent Council)
Add 5 structured output YAML files for automated post-call QA evaluation of Brent Council Housing Benefits calls:

- starlight-qa-engagement.yml: 7 questions (3 auto-fail: 1.3, 1.4, 1.5)
- starlight-qa-right-first-time.yml: 8 questions (3 auto-fail: 2.3, 2.4, 2.5)
- starlight-qa-signposting.yml: 2 questions (no auto-fail)
- starlight-qa-explaining.yml: 2 questions (no auto-fail)
- starlight-wrap-up-code.yml: call classification into 19 wrap-up codes

Each QA structured output evaluates per-question with result (yes/no/not_applicable), reasoning, and transcript evidence.

Auto-fail logic: if ANY auto-fail question receives "no", the entire evaluation fails across all categories.

All outputs include multilingual transcript support, AI agent adaptation notes, and the full Brent Council Housing Benefits glossary.

Closes PRO-846

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4568ec3 commit 7536c0b

5 files changed

Lines changed: 778 additions & 0 deletions

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
+name: starlight_qa_engagement
+type: ai
+target: messages
+description: |
+  Evaluates the ENGAGEMENT quality of a Brent Council Housing Benefits call.
+  This is 1 of 4 equally-weighted QA categories for the Starlight project.
+
+  IMPORTANT - AUTO-FAIL RULES:
+  Questions 1.3, 1.4, and 1.5 are auto-fail. If ANY of these receives a "no" result,
+  set auto_fail to true. When auto_fail is true across ANY of the 4 QA categories,
+  the ENTIRE call evaluation fails (not just this section).
+
+  MULTILINGUAL TRANSCRIPTS:
+  The call may be conducted in any language. Evaluate the transcript in whatever language
+  it occurs in. Do not penalise the agent for using a language other than English if the
+  caller initiated in that language.
+
+  AI AGENT ADAPTATION NOTES:
+  - Question 1.3 (data security check): Use not_applicable if the call scenario did not
+    require identity verification (e.g. general enquiry with no account lookup).
+  - Question 1.6 (hold time): Use not_applicable if no hold occurred during the call.
+  - Question 1.7 (after call work): Use not_applicable as AI agents do not perform ACW.
+
+  GLOSSARY OF BRENT COUNCIL TERMS:
+  RSF - Resident Support Fund | DHP - Discretionary Housing Payment |
+  CIC/s - Change in Circumstances | CTS - Council Tax Support |
+  HB - Housing Benefit | UC - Universal Credit | Recons - Reconsideration |
+  Portal/My Account/CAS - Citizen Access Service (customer self-service portal) |
+  Non Dep - Non dependants | OP - Overpayments | LHA - Local Housing Allowance |
+  HSF - Household Support Fund | SB - Switchboard |
+  Welfare Benefit - PIP, Disability Allowance, ESA, etc.
+model:
+  provider: openai
+  model: gpt-4.1
+  temperature: 0
+assistant_ids: []
+workflow_ids: []
+schema:
+  type: object
+  description: "Engagement QA evaluation for Brent Council Housing Benefits calls."
+  properties:
+    question_1_1:
+      type: object
+      description: "1.1 Warm greeting, gave service and own name and asked for their name if not SB."
+      properties:
+        result:
+          type: string
+          description: "yes if the agent provided a warm greeting with service name and own name and asked for caller name; no if not; not_applicable if this was a switchboard transfer."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given, referencing specific parts of the conversation."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript supporting the evaluation."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation where this occurred."
+    question_1_2:
+      type: object
+      description: "1.2 Apology given for the long wait / acknowledged and recognised service failure if mentioned."
+      properties:
+        result:
+          type: string
+          description: "yes if an apology or acknowledgement was given when appropriate; no if the caller mentioned a wait or service failure and it was not acknowledged; not_applicable if the caller did not mention any wait or service failure."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation."
+    question_1_3:
+      type: object
+      description: "1.3 Completed data security check. AUTO-FAIL: If result is 'no', the entire evaluation fails."
+      properties:
+        result:
+          type: string
+          description: "yes if identity/security verification was completed before accessing account details; no if account details were accessed without verification; not_applicable if the call did not require account access."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation."
+    question_1_4:
+      type: object
+      description: "1.4 Controlled the call and maintained professionalism throughout. AUTO-FAIL: If result is 'no', the entire evaluation fails."
+      properties:
+        result:
+          type: string
+          description: "yes if the agent maintained control and professionalism throughout; no if the agent lost control or was unprofessional at any point; not_applicable only in exceptional circumstances."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation."
+    question_1_5:
+      type: object
+      description: "1.5 Listened actively, positive tone, showed interest, empathy, patience and helpfulness. AUTO-FAIL: If result is 'no', the entire evaluation fails."
+      properties:
+        result:
+          type: string
+          description: "yes if the agent demonstrated active listening, positive tone, interest, empathy, patience and helpfulness; no if the agent was dismissive, impatient, or unhelpful; not_applicable only in exceptional circumstances."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation."
+    question_1_6:
+      type: object
+      description: "1.6 Explained any hold time, kept the customer updated, apologised for the hold."
+      properties:
+        result:
+          type: string
+          description: "yes if hold time was explained and apology given; no if the caller was put on hold without explanation or apology; not_applicable if no hold occurred during the call."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation."
+    question_1_7:
+      type: object
+      description: "1.7 Was the After Call Work necessary and justified for the full duration?"
+      properties:
+        result:
+          type: string
+          description: "yes if ACW was necessary and justified; no if ACW was unnecessary or excessive; not_applicable if this is an AI agent call (AI agents do not perform ACW)."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation."
+    auto_fail:
+      type: boolean
+      description: "Set to true if ANY auto-fail question (1.3, 1.4, 1.5) received a 'no' result. When true, the ENTIRE call evaluation fails across all categories."
+    overall_pass:
+      type: boolean
+      description: "Set to true only if auto_fail is false. When auto_fail is true, this must be false regardless of other question results."
+    category_score:
+      type: string
+      description: "Fraction of questions that received 'yes' out of total applicable questions, e.g. '5/7' or '4/5'. Exclude not_applicable questions from both numerator and denominator."
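The schema above asks the model to fill in `auto_fail` and `category_score` itself. A consuming application might want to recompute both from the per-question results as a consistency check. The following is a minimal Python sketch under that assumption. The helper names (`question_results`, `derive_auto_fail`, `derive_category_score`) are hypothetical and not part of this commit, and returning `0/0` when every question is not_applicable is an assumed convention that the schema does not specify.

```python
from __future__ import annotations

from typing import Any, Mapping

# Auto-fail question keys for the Engagement category, taken from the schema description.
ENGAGEMENT_AUTO_FAIL = ("question_1_3", "question_1_4", "question_1_5")


def question_results(evaluation: Mapping[str, Any]) -> dict[str, str]:
    """Collect the 'result' of every question_* entry, skipping entries without one."""
    results = {}
    for key, value in evaluation.items():
        if key.startswith("question_") and isinstance(value, Mapping):
            result = value.get("result")
            if result is not None:
                results[key] = result
    return results


def derive_auto_fail(evaluation: Mapping[str, Any],
                     auto_fail_keys: tuple[str, ...] = ENGAGEMENT_AUTO_FAIL) -> bool:
    """True if any auto-fail question received 'no'."""
    results = question_results(evaluation)
    return any(results.get(key) == "no" for key in auto_fail_keys)


def derive_category_score(evaluation: Mapping[str, Any]) -> str:
    """'yes' count over applicable questions; not_applicable is excluded from both."""
    applicable = [r for r in question_results(evaluation).values()
                  if r != "not_applicable"]
    yes_count = sum(1 for r in applicable if r == "yes")
    # Assumption: an evaluation with no applicable questions is reported as "0/0".
    return f"{yes_count}/{len(applicable)}"


# Hypothetical evaluation output, trimmed to the fields used here.
example = {
    "question_1_1": {"result": "yes"},
    "question_1_2": {"result": "not_applicable"},
    "question_1_3": {"result": "yes"},
    "question_1_4": {"result": "yes"},
    "question_1_5": {"result": "no"},
    "question_1_6": {"result": "not_applicable"},
    "question_1_7": {"result": "not_applicable"},
    "auto_fail": True,
}

print(derive_auto_fail(example))       # True, because question 1.5 is "no"
print(derive_category_score(example))  # 3/4
```

Comparing these derived values against the `auto_fail` and `category_score` fields returned by the model is one way to catch an output that contradicts its own per-question results.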
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+name: starlight_qa_explaining
+type: ai
+target: messages
+description: |
+  Evaluates the EXPLAINING quality of a Brent Council Housing Benefits call.
+  This is 1 of 4 equally-weighted QA categories for the Starlight project.
+
+  AUTO-FAIL RULES:
+  This category has no auto-fail questions. However, if any OTHER category (Engagement,
+  Right First Time, Signposting) triggers an auto-fail, the entire call evaluation still
+  fails. The consuming application must check auto_fail across all 4 categories.
+
+  MULTILINGUAL TRANSCRIPTS:
+  The call may be conducted in any language. Evaluate the transcript in whatever language
+  it occurs in. Do not penalise the agent for using a language other than English if the
+  caller initiated in that language.
+
+  EXPLAINING CONTEXT:
+  This category assesses whether the agent clearly communicated what has been done, what
+  will happen next, and any relevant terms, conditions, or timescales. For Housing Benefit
+  calls this includes explaining processing times, required documentation, appeal rights,
+  overpayment recovery terms, and any conditions attached to DHP, RSF, or CTS awards.
+
+  GLOSSARY OF BRENT COUNCIL TERMS:
+  RSF - Resident Support Fund | DHP - Discretionary Housing Payment |
+  CIC/s - Change in Circumstances | CTS - Council Tax Support |
+  HB - Housing Benefit | UC - Universal Credit | Recons - Reconsideration |
+  Portal/My Account/CAS - Citizen Access Service (customer self-service portal) |
+  Non Dep - Non dependants | OP - Overpayments | LHA - Local Housing Allowance |
+  HSF - Household Support Fund | SB - Switchboard |
+  Welfare Benefit - PIP, Disability Allowance, ESA, etc. |
+  T&Cs - Terms and Conditions
+model:
+  provider: openai
+  model: gpt-4.1
+  temperature: 0
+assistant_ids: []
+workflow_ids: []
+schema:
+  type: object
+  description: "Explaining QA evaluation for Brent Council Housing Benefits calls."
+  properties:
+    question_4_1:
+      type: object
+      description: "4.1 Clarified details logged, actions taken and timescales for accuracy."
+      properties:
+        result:
+          type: string
+          description: "yes if the agent clearly explained what details were logged, what actions were taken or will be taken, and provided accurate timescales; no if the agent failed to clarify these to the caller; not_applicable if no actions or timescales were relevant to this call."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given, referencing specific parts of the conversation."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript supporting the evaluation."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation where this occurred."
+    question_4_2:
+      type: object
+      description: "4.2 T&Cs explained/indicated."
+      properties:
+        result:
+          type: string
+          description: "yes if relevant terms and conditions were explained or indicated to the caller (e.g. overpayment recovery terms, DHP conditions, appeal rights, reporting obligations for change in circumstances); no if T&Cs should have been mentioned but were not; not_applicable if no T&Cs were relevant to this call."
+          enum:
+            - "yes"
+            - "no"
+            - "not_applicable"
+        reasoning:
+          type: string
+          description: "Explanation of why this result was given."
+        evidence:
+          type: array
+          description: "Relevant excerpts from the transcript."
+          items:
+            type: object
+            properties:
+              message_text:
+                type: string
+                description: "The exact text from the transcript."
+              timestamp:
+                type: string
+                description: "The timestamp or position in the conversation."
+    auto_fail:
+      type: boolean
+      description: "Always false for this category as it has no auto-fail questions. The consuming application must still check auto_fail across all 4 QA categories."
+    overall_pass:
+      type: boolean
+      description: "Set to true if the agent performed well on explaining. Since there are no auto-fail questions in this category, this is based purely on the question results."
+    category_score:
+      type: string
+      description: "Fraction of questions that received 'yes' out of total applicable questions, e.g. '2/2' or '1/1'. Exclude not_applicable questions from both numerator and denominator."
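The description above leaves the cross-category check to the consuming application: the call fails if `auto_fail` is true in any of the 4 QA categories. A minimal Python sketch of that check follows. The function names and the shape of the `outputs` mapping (structured output name to its parsed result) are assumptions for illustration; only `starlight_qa_engagement` and `starlight_qa_explaining` are names confirmed by the files shown here.

```python
from __future__ import annotations

from typing import Any, Mapping


def auto_failed_categories(outputs: Mapping[str, Mapping[str, Any]]) -> list[str]:
    """Names of the categories whose parsed output reports auto_fail as true."""
    return sorted(
        name for name, result in outputs.items()
        if result.get("auto_fail") is True
    )


def call_auto_failed(outputs: Mapping[str, Mapping[str, Any]]) -> bool:
    """True if ANY category triggered an auto-fail, which fails the entire call."""
    return bool(auto_failed_categories(outputs))


# Hypothetical parsed results, keyed by structured output name.
outputs = {
    "starlight_qa_engagement": {"auto_fail": True, "category_score": "3/4"},
    "starlight_qa_explaining": {"auto_fail": False, "category_score": "2/2"},
}

print(auto_failed_categories(outputs))  # ['starlight_qa_engagement']
print(call_auto_failed(outputs))        # True
```

Returning the list of offending categories alongside the boolean keeps the reason for a failed call visible to whoever reviews it.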