feat(scratchnode): /ask operability telemetry query (PR C) #446
PR C of the /ask launch-readiness sprint. Backend-only, additive (new query,
no schema/contract change).
Launch ops can't run /ask blind. getAskTelemetry(eventId) is a bounded, read-only
aggregate over an event's answers that surfaces the operate-the-launch signals:
- mode mix { provider, cache, deterministic, provider_fallback }
- PROVIDER FAILURE RATE = provider_fallback / provider ATTEMPTS (cache +
deterministic excluded from the denominator — they never reached the provider)
- quality pass rate + avg score (from the deterministic answer evaluation)
- total estimated cost (cents) and avg provider latency (from the provider_llm
trace step)
- live-search count
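The failure-rate definition above can be sketched as follows. This is a minimal illustration, not the PR's code: the `Mode` type, the `summarizeModes` name, and the reading of "provider attempts" as `provider + provider_fallback` are assumptions.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the PR's code.
type Mode = "provider" | "cache" | "deterministic" | "provider_fallback";

interface ModeSummary {
  modeMix: Record<Mode, number>;
  // null when no answer reached the provider (no denominator)
  providerFailureRate: number | null;
}

function summarizeModes(modes: Mode[]): ModeSummary {
  const modeMix: Record<Mode, number> = {
    provider: 0,
    cache: 0,
    deterministic: 0,
    provider_fallback: 0,
  };
  for (const m of modes) modeMix[m] += 1;

  // Assumption: an "attempt" is any answer that reached the provider,
  // whether it succeeded (provider) or degraded (provider_fallback).
  // cache and deterministic answers stay out of the denominator.
  const attempts = modeMix.provider + modeMix.provider_fallback;
  const providerFailureRate =
    attempts === 0 ? null : modeMix.provider_fallback / attempts;

  return { modeMix, providerFailureRate };
}

// Illustrative room (made-up composition): 3 provider, 2 cache,
// 1 deterministic, 1 provider_fallback.
const example = summarizeModes([
  "provider", "provider", "provider",
  "cache", "cache",
  "deterministic",
  "provider_fallback",
]);
console.log(example.providerFailureRate); // 1 fallback / 4 attempts = 0.25
```

Under this reading, a room served entirely from cache reports a `null` failure rate rather than 0%, because no provider attempt was ever made.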
Honesty (agentic_reliability):
- BOUND: scan capped at ≤1000; `capped` flag surfaced when the window is full.
- HONEST_SCORES: every value is computed from real rows; rates are NULL (not a
fabricated 0% / 100%) when there's no denominator — the UI must render "—",
never invent "100% healthy" from zero data.
- No private data: liveEventAnswers are public; the query never touches userNotes.
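The BOUND and HONEST_SCORES rules above might look roughly like this. It is a sketch under stated assumptions: `takeBounded` and `renderRate` are hypothetical helpers, and the in-memory `slice` stands in for whatever bounded read the real query performs (in Convex that would typically be a `.take(limit)` on the query).

```typescript
// Hypothetical helpers; not taken from the PR.
const SCAN_LIMIT = 1000; // the commit message caps the scan at <=1000 rows

interface BoundedWindow<T> {
  rows: T[];
  capped: boolean; // true when the window is full, so older rows may be missing
}

function takeBounded<T>(source: readonly T[], limit: number = SCAN_LIMIT): BoundedWindow<T> {
  // Stand-in for a bounded database read: never look past `limit` rows.
  const rows = source.slice(0, limit);
  return { rows, capped: rows.length >= limit };
}

function renderRate(rate: number | null): string {
  // A null rate means "no denominator"; show a dash instead of inventing 0% or 100%.
  return rate === null ? "—" : `${Math.round(rate * 100)}%`;
}

const small = takeBounded([1, 2, 3], 5);
const full = takeBounded([1, 2, 3, 4, 5, 6], 5);
console.log(small.capped, full.capped); // false true
console.log(renderRate(null), renderRate(0.25)); // — 25%
```

Note that a window holding exactly `limit` rows is also reported as capped, which matches "when the window is full" but can be a false positive when the event has exactly that many answers.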
Tests (convex/__tests__/scratchnode.events.test.ts): +3 scenario tests — the full
aggregate from a 7-answer mixed-mode room, the HONEST_SCORES empty-room null case,
and the BOUND cap/`capped` flag.
Follow-up (separate frontend PR, after PR #445 lands): surface this in a host
"ask health" line + a degraded badge on provider_fallback answers.
Verification: convex codegen 0, tsc 0, vitest 57 passed / 1 skipped, build 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 Augment PR Summary
Summary: Adds a new backend-only Convex query to surface bounded /ask operability telemetry for a live event, enabling launch/host operators to monitor health without scanning unbounded data.
Technical Notes: This is additive/read-only (no schema or contract changes) and relies on the existing …
    if (r.evaluation.passed) passCount += 1;
  }
  const provStep = (r.trace ?? []).find(
    (s: any) => s.step === "provider_llm" && s.status === "ok",
convex/events.ts:949: avgProviderLatencyMs only includes trace steps where step === "provider_llm" && status === "ok", but provider_fallback rows record provider_llm with status: "error" (see existing trace emission in this file). This will systematically under-report provider latency during degraded periods (timeouts/errors) even though those attempts are part of the operability signal.
Severity: medium
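One possible way to address this comment is to count every `provider_llm` step that carries a duration, regardless of status. The sketch below is not the PR's code: the `TraceStep` shape is inferred from the identifiers quoted in the review (`step`, `status`, `durationMs`), and the `cache_lookup` step name in the example is made up.

```typescript
// Hypothetical shape inferred from the review comment; not the real schema.
interface TraceStep {
  step: string;
  status: "ok" | "error";
  durationMs?: number;
}

function avgProviderLatencyMs(traces: TraceStep[][]): number | null {
  let total = 0;
  let count = 0;
  for (const trace of traces) {
    // Include errored attempts: timeouts and failures are part of the latency signal.
    const provStep = trace.find(
      (s) => s.step === "provider_llm" && typeof s.durationMs === "number",
    );
    if (provStep && provStep.durationMs !== undefined) {
      total += provStep.durationMs;
      count += 1;
    }
  }
  // No provider step with a duration means no denominator, so return null.
  return count === 0 ? null : total / count;
}

const latency = avgProviderLatencyMs([
  [{ step: "provider_llm", status: "ok", durationMs: 200 }],
  [{ step: "provider_llm", status: "error", durationMs: 1000 }],
  [{ step: "cache_lookup", status: "ok", durationMs: 5 }], // made-up non-provider step
]);
console.log(latency); // (200 + 1000) / 2 = 600
```

An alternative would be to report success-only and error-only latency as separate fields, so a slow failing provider does not mask healthy response times; which is preferable depends on how the host surface uses the number.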
  },
): TableRecord {
  const { score = 100, passed = true, costCents = 0, providerMs = null, liveSearches = 0, createdAt } = opts;
  const trace = providerMs != null
convex/__tests__/scratchnode.events.test.ts:1083: telAnswer() builds a non-provider trace whenever providerMs == null, so the provider_fallback fixture ends up with no provider_llm step at all. In production, fallbacks include a provider_llm step with status: "error" + durationMs, so this test case isn’t actually exercising telemetry behavior against a realistic trace shape.
Severity: low
Other Locations
convex/__tests__/scratchnode.events.test.ts:1124
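A fixture that mirrors the production trace shape described in this comment could derive the trace from the answer mode instead of from `providerMs` alone. The sketch below is an assumption-laden illustration: `buildTrace` is a hypothetical helper, not part of `telAnswer()`, and the empty trace for non-provider modes is a placeholder for whatever those modes really emit.

```typescript
// Hypothetical fixture helper; names and shapes are assumptions.
type Mode = "provider" | "cache" | "deterministic" | "provider_fallback";

interface TraceStep {
  step: string;
  status: "ok" | "error";
  durationMs?: number;
}

function buildTrace(mode: Mode, providerMs: number | null): TraceStep[] {
  // cache and deterministic answers never reach the provider.
  if (mode !== "provider" && mode !== "provider_fallback") return [];

  // Both provider modes record a provider_llm step; only the status differs.
  const step: TraceStep = {
    step: "provider_llm",
    status: mode === "provider" ? "ok" : "error",
  };
  if (providerMs !== null) step.durationMs = providerMs;
  return [step];
}

const fallbackTrace = buildTrace("provider_fallback", 1200);
console.log(fallbackTrace);
// [ { step: "provider_llm", status: "error", durationMs: 1200 } ]
```

With a fixture like this, the mixed-mode test would exercise the latency aggregate against an errored `provider_llm` step rather than against a trace that has no provider step at all.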
Demo: a walkthrough of the surfaces this PR changed is available as a workflow artifact.
PR C of the /ask production-grade sprint. Backend-only, additive (new query, no schema/contract change).
Why
Launch ops can't run /ask blind. `getAskTelemetry(eventId)` is a bounded, read-only aggregate over an event's answers surfacing the operate-the-launch signals.
What it returns
- Mode mix: { provider, cache, deterministic, provider_fallback }
- Provider failure rate: provider_fallback / provider ATTEMPTS (cache + deterministic excluded; they never reached the provider)
Honesty (agentic_reliability)
- BOUND: `capped` flag when the window is full
- HONEST_SCORES: rates are `null` (not a fabricated 0%/100%) when there's no denominator; the UI renders "—", never invents "100% healthy"
- No private data: `liveEventAnswers` are public; never touches `userNotes`
Tests
+3 scenario tests: full mixed-mode aggregate (7 answers), HONEST_SCORES empty-room null case, BOUND cap/`capped` flag.
Follow-up
Separate frontend PR (now that #445 landed): surface a host `ask health` line + a degraded badge on `provider_fallback` answers.
Verification floor
codegen 0 · tsc 0 · vitest 57 passed/1 skipped · build 0
🤖 Generated with Claude Code