feat(scratchnode): /ask operability telemetry query (PR C) #446
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -903,6 +903,78 @@ export const getAnswers = query({ | |
| }, | ||
| }); | ||
|
|
||
| /** | ||
| * /ask operability telemetry (PR C) — a bounded, read-only aggregate over an | ||
| * event's answers, for launch-ops + host visibility into the /ask pipeline: | ||
| * mode mix, PROVIDER FAILURE RATE (the headline degraded-health signal), | ||
| * quality pass rate, cost, and provider latency. | ||
| * | ||
| * Honesty (agentic_reliability): | ||
| * - BOUND: capped scan (≤1000), `capped` flag surfaced when the window is full. | ||
| * - HONEST_SCORES: every number is computed from real rows; rates are null | ||
| * (not a fake 0/100) when there's no denominator — the UI must show "—", | ||
| * never a fabricated "100% healthy". | ||
| * - No private data: liveEventAnswers are public; never touches userNotes. | ||
| */ | ||
| export const getAskTelemetry = query({ | ||
| args: { eventId: v.id("liveEvents"), limit: v.optional(v.number()) }, | ||
| handler: async (ctx, { eventId, limit }) => { | ||
| const cap = Math.min(Math.max(limit ?? 500, 1), 1000); // BOUND | ||
| const rows = await ctx.db | ||
| .query("liveEventAnswers") | ||
| .withIndex("by_event_time", (q) => q.eq("eventId", eventId)) | ||
| .order("desc") | ||
| .take(cap); | ||
|
|
||
| const modes = { provider: 0, cache: 0, deterministic: 0, provider_fallback: 0 }; | ||
| let costCentsTotal = 0; | ||
| let qualitySum = 0; | ||
| let qualityCount = 0; | ||
| let passCount = 0; | ||
| let providerLatencySum = 0; | ||
| let providerLatencyCount = 0; | ||
| let liveSearchCount = 0; | ||
|
|
||
| for (const r of rows) { | ||
| const mode = (r.agentMode ?? "deterministic") as keyof typeof modes; | ||
| if (mode in modes) modes[mode] += 1; | ||
| costCentsTotal += r.estimatedCostCents ?? 0; | ||
| liveSearchCount += r.externalSearches ?? 0; | ||
| if (r.evaluation) { | ||
| qualitySum += r.evaluation.score ?? 0; | ||
| qualityCount += 1; | ||
| if (r.evaluation.passed) passCount += 1; | ||
| } | ||
| const provStep = (r.trace ?? []).find( | ||
| (s: any) => s.step === "provider_llm" && s.status === "ok", | ||
|
||
| ); | ||
| if (provStep) { | ||
| providerLatencySum += provStep.durationMs ?? 0; | ||
| providerLatencyCount += 1; | ||
| } | ||
| } | ||
|
|
||
| // Provider failure rate = fallbacks / (real provider ATTEMPTS). A provider | ||
| // attempt is a success (mode=provider) OR a fallback (mode=provider_fallback); | ||
| // cache/deterministic never reached the provider, so they're excluded from | ||
| // the denominator. Null when no attempts — no fabricated "0% failures". | ||
| const providerAttempts = modes.provider + modes.provider_fallback; | ||
| const round = (x: number, p: number) => Math.round(x * 10 ** p) / 10 ** p; | ||
| return { | ||
| total: rows.length, | ||
| capped: rows.length >= cap, | ||
| modes, | ||
| providerAttempts, | ||
| providerFailureRate: providerAttempts > 0 ? round(modes.provider_fallback / providerAttempts, 3) : null, | ||
| qualityPassRate: qualityCount > 0 ? round(passCount / qualityCount, 3) : null, | ||
| avgQualityScore: qualityCount > 0 ? Math.round(qualitySum / qualityCount) : null, | ||
| totalCostCents: round(costCentsTotal, 4), | ||
| avgProviderLatencyMs: providerLatencyCount > 0 ? Math.round(providerLatencySum / providerLatencyCount) : null, | ||
| liveSearchCount, | ||
| }; | ||
| }, | ||
| }); | ||
|
|
||
| export const getHostStatus = query({ | ||
| args: { | ||
| eventId: v.id("liveEvents"), | ||
|
|
||
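The null-when-no-denominator rule that `getAskTelemetry` applies to `providerFailureRate` can be isolated as a pure function, which makes it easy to check outside Convex. This is a minimal sketch, not the PR's code: the `AnswerRow` type and the `rateOrNull` and `providerFailureRate` helpers are hypothetical names introduced here, and only the `agentMode` values and the 3-decimal rounding are taken from the diff.

```typescript
// Hypothetical row shape: only the field this sketch reads.
type AgentMode = "provider" | "cache" | "deterministic" | "provider_fallback";
interface AnswerRow {
  agentMode?: AgentMode;
}

// Round x to p decimal places, mirroring the `round` helper in the diff.
const round = (x: number, p: number): number =>
  Math.round(x * 10 ** p) / 10 ** p;

// Return null (not a made-up 0) when there is no denominator.
function rateOrNull(numerator: number, denominator: number): number | null {
  return denominator > 0 ? round(numerator / denominator, 3) : null;
}

// Failure rate = fallbacks / provider attempts. Cache and deterministic
// answers never reached the provider, so they are not counted as attempts.
function providerFailureRate(rows: AnswerRow[]): number | null {
  let succeeded = 0;
  let fellBack = 0;
  for (const r of rows) {
    const mode = r.agentMode ?? "deterministic";
    if (mode === "provider") succeeded += 1;
    else if (mode === "provider_fallback") fellBack += 1;
  }
  return rateOrNull(fellBack, succeeded + fellBack);
}
```

With this shape, an event served entirely from cache yields `null`, so a UI can render "—" instead of a misleading "0% failures".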
convex/__tests__/scratchnode.events.test.ts:1083: `telAnswer()` builds a non-provider trace whenever `providerMs == null`, so the `provider_fallback` fixture ends up with no `provider_llm` step at all. In production, fallbacks include a `provider_llm` step with `status: "error"` + `durationMs`, so this test case isn't actually exercising telemetry behavior against a realistic trace shape.
Severity: low
Other Locations
convex/__tests__/scratchnode.events.test.ts:1124
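One way to address the comment above is to give the fallback fixture the trace shape the reviewer describes. The sketch below is an assumption-laden illustration, not the repository's test code: `TraceStep`, `fallbackTrace`, and `successfulProviderLatency` are hypothetical names, and the `deterministic_fallback` step name is invented for the example. Only the `provider_llm` step name, the `status` values, and `durationMs` come from the diff and the review comment.

```typescript
// Hypothetical trace step shape, limited to the fields the telemetry reads.
interface TraceStep {
  step: string;
  status: "ok" | "error";
  durationMs?: number;
}

// Fixture for a provider_fallback answer: the provider call was attempted
// and failed, so the trace carries a provider_llm step with status "error".
function fallbackTrace(providerMs: number): TraceStep[] {
  return [
    { step: "provider_llm", status: "error", durationMs: providerMs },
    // Invented step name, standing in for whatever produced the fallback answer.
    { step: "deterministic_fallback", status: "ok", durationMs: 2 },
  ];
}

// Mirrors the selector in the diff: only a successful provider_llm step
// contributes a latency sample; anything else yields null.
function successfulProviderLatency(trace: TraceStep[] | undefined): number | null {
  const step = (trace ?? []).find(
    (s) => s.step === "provider_llm" && s.status === "ok",
  );
  return step ? (step.durationMs ?? 0) : null;
}
```

Under these assumptions, a fallback row has a `provider_llm` step in its trace but still adds nothing to the average provider latency, which is the case the current fixture cannot exercise.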