feat(scratchnode): /ask operability telemetry query (PR C) by HomenShum · Pull Request #446 · HomenShum/nodebench-ai

HomenShum · 2026-06-01T17:20:49Z

PR C of the /ask production-grade sprint. Backend-only, additive (new query, no schema/contract change).

Why

Launch ops can't run /ask blind. getAskTelemetry(eventId) is a bounded, read-only aggregate over an event's answers surfacing the operate-the-launch signals.

What it returns

mode mix { provider, cache, deterministic, provider_fallback }
provider failure rate = provider_fallback / provider ATTEMPTS (cache + deterministic excluded — they never reached the provider)
quality pass rate + avg score, total est. cost (cents), avg provider latency, live-search count
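
The failure-rate definition above can be sketched as a small pure function. This is an illustration only, not the code in convex/events.ts: the `mode` field name and the `AnswerRow` shape are hypothetical, and treating "provider attempts" as `provider + provider_fallback` is an inference from the note that cache and deterministic answers never reached the provider.

```typescript
// Hypothetical row shape: only the field this sketch needs.
type AnswerMode = "provider" | "cache" | "deterministic" | "provider_fallback";

interface AnswerRow {
  mode: AnswerMode;
}

interface ModeMixSummary {
  modeMix: Record<AnswerMode, number>;
  providerAttempts: number;
  providerFailureRate: number | null;
}

function summarizeModes(rows: AnswerRow[]): ModeMixSummary {
  const modeMix: Record<AnswerMode, number> = {
    provider: 0,
    cache: 0,
    deterministic: 0,
    provider_fallback: 0,
  };
  for (const row of rows) modeMix[row.mode] += 1;

  // Assumption: an attempt is any answer that reached the provider,
  // whether it succeeded (provider) or fell back (provider_fallback).
  const providerAttempts = modeMix.provider + modeMix.provider_fallback;

  // No attempts means no denominator, so the rate is null rather than 0.
  const providerFailureRate =
    providerAttempts > 0 ? modeMix.provider_fallback / providerAttempts : null;

  return { modeMix, providerAttempts, providerFailureRate };
}
```

With three provider answers, one fallback, two cache hits, and one deterministic answer, this sketch counts four attempts and reports a failure rate of 0.25; the cache and deterministic rows do not dilute it.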

Honesty (agentic_reliability)

BOUND: scan capped ≤1000; capped flag when the window is full
HONEST_SCORES: every value computed from real rows; rates are null (not a fabricated 0%/100%) when there's no denominator. The UI renders "—", never invents "100% healthy"
No private data: liveEventAnswers are public; never touches userNotes
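
A minimal sketch of the null-rate contract and how a consumer might render it. Both helper names (`honestRate`, `renderRate`) are hypothetical and not taken from the PR; the point is only that a missing denominator stays `null` all the way to the display layer instead of collapsing to 0% or 100%.

```typescript
// Hypothetical helper: a rate is only defined when there is a denominator.
function honestRate(numerator: number, denominator: number): number | null {
  return denominator > 0 ? numerator / denominator : null;
}

// Hypothetical display helper: null renders as a placeholder dash,
// so an empty room never shows up as "100% healthy".
function renderRate(rate: number | null): string {
  return rate === null ? "—" : `${Math.round(rate * 100)}%`;
}
```

Note that a real zero is still distinguishable from "no data": `honestRate(0, 4)` is `0` and renders as `0%`, while `honestRate(0, 0)` is `null` and renders as the placeholder.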

Tests

+3 scenario tests: full mixed-mode aggregate (7 answers), HONEST_SCORES empty-room null case, BOUND cap/capped flag.

Follow-up

Separate frontend PR (now that #445 landed): surface a host ask health line + a degraded badge on provider_fallback answers.

Verification floor

codegen 0 · tsc 0 · vitest 57 passed/1 skipped · build 0

🤖 Generated with Claude Code

… (PR C)

PR C of the /ask launch-readiness sprint. Backend-only, additive (new query, no schema/contract change).

Launch ops can't run /ask blind. getAskTelemetry(eventId) is a bounded, read-only aggregate over an event's answers that surfaces the operate-the-launch signals:
- mode mix { provider, cache, deterministic, provider_fallback }
- PROVIDER FAILURE RATE = provider_fallback / provider ATTEMPTS (cache + deterministic excluded from the denominator: they never reached the provider)
- quality pass rate + avg score (from the deterministic answer evaluation)
- total estimated cost (cents) and avg provider latency (from the provider_llm trace step)
- live-search count

Honesty (agentic_reliability):
- BOUND: scan capped at ≤1000; `capped` flag surfaced when the window is full.
- HONEST_SCORES: every value is computed from real rows; rates are NULL (not a fabricated 0% / 100%) when there's no denominator. The UI must render "—", never invent "100% healthy" from zero data.
- No private data: liveEventAnswers are public; the query never touches userNotes.

Tests (convex/__tests__/scratchnode.events.test.ts): +3 scenario tests: the full aggregate from a 7-answer mixed-mode room, the HONEST_SCORES empty-room null case, and the BOUND cap/`capped` flag.

Follow-up (separate frontend PR, after PR #445 lands): surface this in a host "ask health" line + a degraded badge on provider_fallback answers.

Verification: convex codegen 0, tsc 0, vitest 57 passed / 1 skipped, build 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-01T17:20:55Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
nodebench-ai	Ready	Preview, Comment	Jun 1, 2026 5:23pm

augmentcode · 2026-06-01T17:24:17Z

🤖 Augment PR Summary

Summary: Adds a new backend-only Convex query to surface bounded /ask operability telemetry for a live event, enabling launch/host operators to monitor health without scanning unbounded data.

Changes:

Introduced getAskTelemetry(eventId, limit?) query that scans up to a capped window (default 500, max 1000) of liveEventAnswers.
Computes a mode mix (provider, cache, deterministic, provider_fallback) and derives providerAttempts and providerFailureRate.
Aggregates quality metrics (pass rate + average score) from stored evaluation rows.
Aggregates estimated total cost (cents), average provider latency from traces, and counts external/live searches.
Implements “honest scores” semantics by returning null rates when there is no denominator, and returns capped when the scan hits the window size.
Added 3 scenario tests covering mixed-mode aggregation, empty-event null rates, and bounded scanning with a limit cap.

Technical Notes: This is additive/read-only (no schema or contract changes) and relies on the existing liveEventAnswers.by_event_time index plus trace/evaluation fields for metrics.
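
The bounded-window behavior described in this summary can be sketched without Convex. The default (500) and maximum (1000) come from the summary above; the helper names, the lower clamp of 1, and the in-memory array standing in for the indexed scan are assumptions made for illustration. In the actual query the window is presumably read from the liveEventAnswers.by_event_time index rather than sliced from an array.

```typescript
const DEFAULT_LIMIT = 500; // default window, per the summary above
const MAX_LIMIT = 1000; // hard cap, per the summary above

// Hypothetical helper: clamp a caller-supplied limit into [1, MAX_LIMIT].
// The lower bound of 1 is an assumption of this sketch.
function resolveLimit(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(requested)));
}

// Hypothetical helper: take at most `limit` rows and report whether the
// window filled up, in which case more rows may exist beyond the scan.
function boundedWindow<T>(
  rows: readonly T[],
  requested?: number,
): { window: T[]; capped: boolean } {
  const limit = resolveLimit(requested);
  const window = rows.slice(0, limit);
  return { window, capped: window.length >= limit };
}
```

A caller that sees `capped: true` should read the aggregates as covering only the scanned window, not the whole event.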


augmentcode

Review completed. 2 suggestions posted.


augmentcode · 2026-06-01T17:24:19Z

+        if (r.evaluation.passed) passCount += 1;
+      }
+      const provStep = (r.trace ?? []).find(
+        (s: any) => s.step === "provider_llm" && s.status === "ok",


convex/events.ts:949: avgProviderLatencyMs only includes trace steps where step === "provider_llm" && status === "ok", but provider_fallback rows record provider_llm with status: "error" (see existing trace emission in this file). This will systematically under-report provider latency during degraded periods (timeouts/errors) even though those attempts are part of the operability signal.

Severity: medium
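
One possible way to address this, sketched under assumptions: the trace-step shape below is inferred from the diff context and this comment (`step`, `status`, `durationMs`), and the `includeErrors` switch is hypothetical, not something the PR adds. It shows how the average shifts once failed provider_llm attempts are counted.

```typescript
// Assumed trace-step shape, inferred from the review comment above.
interface TraceStep {
  step: string;
  status: "ok" | "error";
  durationMs?: number;
}

interface TracedRow {
  trace?: TraceStep[];
}

// Hypothetical variant of the latency aggregate. With includeErrors = false
// it mirrors the filter quoted in the diff; with true it also counts
// provider_llm steps that ended in an error.
function avgProviderLatencyMs(rows: TracedRow[], includeErrors: boolean): number | null {
  let total = 0;
  let count = 0;
  for (const row of rows) {
    const step = (row.trace ?? []).find(
      (s) => s.step === "provider_llm" && (includeErrors || s.status === "ok"),
    );
    if (step && typeof step.durationMs === "number") {
      total += step.durationMs;
      count += 1;
    }
  }
  // No measured steps means no denominator, so the average is null.
  return count > 0 ? total / count : null;
}
```

For one successful 200 ms call and one failed 1000 ms call, the ok-only average is 200 ms while the inclusive average is 600 ms, which is the under-reporting this comment describes.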


augmentcode · 2026-06-01T17:24:19Z

+    },
+  ): TableRecord {
+    const { score = 100, passed = true, costCents = 0, providerMs = null, liveSearches = 0, createdAt } = opts;
+    const trace = providerMs != null


convex/__tests__/scratchnode.events.test.ts:1083: telAnswer() builds a non-provider trace whenever providerMs == null, so the provider_fallback fixture ends up with no provider_llm step at all. In production, fallbacks include a provider_llm step with status: "error" + durationMs, so this test case isn’t actually exercising telemetry behavior against a realistic trace shape.

Severity: low

Other Locations

convex/__tests__/scratchnode.events.test.ts:1124
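
A sketch of what a more realistic fixture trace could look like. `buildFixtureTrace` is a hypothetical helper, not the `telAnswer()` function in the test file, and the step shape is assumed from the comment above; returning an empty trace for non-provider modes is a simplification of this sketch.

```typescript
type FixtureMode = "provider" | "cache" | "deterministic" | "provider_fallback";

// Assumed trace-step shape, based on the fields named in the review comment.
interface FixtureStep {
  step: string;
  status: "ok" | "error";
  durationMs?: number;
}

// Hypothetical fixture helper: any mode that reached the provider gets a
// provider_llm step, with status "error" for fallbacks, so the fixture
// matches the production trace shape the comment describes.
function buildFixtureTrace(mode: FixtureMode, durationMs: number | null): FixtureStep[] {
  const reachedProvider = mode === "provider" || mode === "provider_fallback";
  if (!reachedProvider || durationMs === null) return [];
  return [
    {
      step: "provider_llm",
      status: mode === "provider_fallback" ? "error" : "ok",
      durationMs,
    },
  ];
}
```

With a fixture like this, the provider_fallback case carries an errored provider_llm step with a duration, so a telemetry test would exercise the same filter path that production rows hit.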


github-actions · 2026-06-01T17:42:15Z

Demo: walkthrough of the surfaces this PR changed is available as a workflow artifact (pr-demo-446) at https://github.com/HomenShum/nodebench-ai/actions/runs/26771171588

HomenShum enabled auto-merge (squash) June 1, 2026 17:20

Merge branch 'main' into feat/ask-observability

a3328a7

vercel Bot deployed to Preview June 1, 2026 17:21 View deployment

vercel Bot deployed to Preview June 1, 2026 17:23 View deployment

augmentcode Bot reviewed Jun 1, 2026


HomenShum mentioned this pull request Jun 1, 2026

feat(scratchnode): degraded badge on provider-fallback /ask answers #447

Merged

HomenShum merged commit ad256d0 into main Jun 1, 2026
16 checks passed

HomenShum deleted the feat/ask-observability branch June 1, 2026 17:34

HomenShum merged 2 commits into main from feat/ask-observability

2 participants
