fix(scratchnode): never silently drop a cold-load send (PR B — queue until live room ready)#445
…the live room is ready

PR B of the /ask launch-readiness sprint.

Root cause (found during PR #443 live verification): in home-v5.html the Convex-routing send override is installed only AFTER the async init (lazy-loading the browser client from esm.sh, then `await joinEvent`). Until that completes, window.sendComposerMessage is still the prototype-only handler: it clears the composer and renders locally but NEVER persists to Convex. A public chat / `/ask` fired in that cold-load window is silently lost, and the user believes it sent. This is exactly the first-send failure observed live (it was NOT an Enter-key bug, and NOT the member-row race; the not_joined auto-retry already covers that).

Fix (additive, frontend-only):
- Install an early queueing shim BEFORE the client-load + join awaits. Public sends are buffered into window._sn_pendingSends (not lost); the composer clears and shows a "Connecting to live room…" hint. Private notes still pass straight to the prototype handler.
- The real Convex override drains the queue the moment it installs, replaying each buffered draft through the full sendMessage → askAgent path, in order.
- If init fails (client load or join), _sn_failPendingSends restores the most recent un-sent draft to the composer + toasts, instead of dropping it.

Verification: all 4 inline <script> blocks pass `node --check` (the live-room block is type=module / strict mode). Behavioral cold-load verification runs post-deploy on the live showcase per .claude/rules/live_dom_verification.md (static proto HTML isn't covered by tsc/build).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
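The queue-then-drain flow in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in home-v5.html: a plain object stands in for `window`, another for the composer input, and `installQueueShim`, `installRealSend`, and `persist` are hypothetical names. Only `sendComposerMessage`, `_sn_pendingSends`, and `_sn_failPendingSends` come from the PR. The private-note pass-through and the "Connecting to live room…" hint are omitted for brevity.

```javascript
// Sketch only: `win` stands in for `window`, `input` for the composer element.
function installQueueShim(win, input) {
  win._sn_pendingSends = [];
  // Early shim: buffer the draft instead of dropping it, then clear the composer.
  win.sendComposerMessage = () => {
    const draft = input.value.trim();
    if (!draft) return;
    win._sn_pendingSends.push(draft);
    input.value = '';
  };
  // On init failure: restore the most recent unsent draft, unless the user
  // has already typed something newer. Returns how many drafts were queued.
  win._sn_failPendingSends = () => {
    const queued = win._sn_pendingSends.splice(0);
    if (queued.length && !input.value) input.value = queued[queued.length - 1];
    return queued.length;
  };
}

// Called once the live client is ready: swap in the real handler, then replay.
// `persist` is a hypothetical stand-in for the real send path.
function installRealSend(win, input, persist) {
  win.sendComposerMessage = () => {
    const draft = input.value.trim();
    if (!draft) return;
    input.value = '';
    persist(draft);
  };
  const queued = win._sn_pendingSends.splice(0);
  for (const draft of queued) {
    input.value = draft;
    win.sendComposerMessage();
  }
}
```

The key design point is ordering: the shim must be installed before any `await` in the init path, so there is never a window in which the prototype-only handler can swallow a public send.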
🤖 Augment PR Summary
Summary: Prevents ScratchNode “cold-load” public sends from being silently dropped before the Convex live-room client finishes initializing.
    if (_ci) {
      for (let _i = 0; _i < _queued.length; _i += 1) {
        _ci.value = _queued[_i];
        window.sendComposerMessage();
public/proto/home-v5.html:5734 — This drains _sn_pendingSends by calling window.sendComposerMessage() in a tight loop, but the real send path is async (sendMessage/askAgent) so these will run concurrently. That can break the stated “in order” replay guarantee and potentially reorder posts under load.
Severity: medium
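One possible way to address this comment is to await each replayed send before starting the next. The sketch below is illustrative only: `drainInOrder` and `sendOne` are hypothetical names, with `sendOne` standing in for whatever async function wraps the real sendMessage/askAgent path.

```javascript
// Sketch only: sequential replay of buffered drafts.
// `sendOne(draft)` is assumed to return a Promise that settles when the
// send has been persisted (or has failed).
async function drainInOrder(queued, sendOne) {
  const failed = [];
  for (const draft of queued) {
    try {
      // Awaiting here means the next draft starts only after this one settles,
      // so posts cannot be reordered by varying network latency.
      await sendOne(draft);
    } catch (err) {
      // Keep the draft so the caller can restore it instead of dropping it.
      failed.push(draft);
    }
  }
  return failed;
}
```

The trade-off is latency: a long queue replays more slowly than a concurrent burst, which is usually acceptable for the handful of drafts a user can type during a cold load.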
    } catch (e) {
      console.warn('[scratchnode] Convex client load failed, live room unavailable:', e.message);
      showLiveRoomError('Could not load the realtime client. <a href="javascript:location.reload()" style="color:#f1d6c8;text-decoration:underline">Retry</a>');
      window._sn_failPendingSends && window._sn_failPendingSends();
public/proto/home-v5.html:5472 — After an init failure, _sn_failPendingSends() runs but the queueing shim remains installed as window.sendComposerMessage, so subsequent clicks can keep clearing/queueing drafts with no path to ever drain (and without another restore call). This can reintroduce “silent loss” behavior after an error and/or allow _sn_pendingSends to grow until reload.
Severity: medium
Other Locations
public/proto/home-v5.html:5495
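A hedged sketch of one way to close this gap: after restoring drafts on init failure, replace the queueing shim so later sends stop accumulating in a buffer that will never drain. `handleInitFailure` and `fallbackSend` are hypothetical names; what the fallback handler should actually do (for example, keep the draft in the composer and show an error) is a product decision the PR does not specify.

```javascript
// Sketch only: `win` stands in for `window`; `fallbackSend` is a hypothetical
// handler to use once the live room is known to be unavailable.
function handleInitFailure(win, fallbackSend) {
  // Restore whatever was buffered, as the PR already does.
  if (win._sn_failPendingSends) win._sn_failPendingSends();
  // Then stop queueing: nothing will drain the buffer after a failed init.
  win._sn_pendingSends = [];
  win.sendComposerMessage = fallbackSend;
}
```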
Demo: walkthrough of the surfaces this PR changed is available as a workflow artifact.
… (PR C) (#446)

PR C of the /ask launch-readiness sprint. Backend-only, additive (new query, no schema/contract change).

Launch ops can't run /ask blind. getAskTelemetry(eventId) is a bounded, read-only aggregate over an event's answers that surfaces the operate-the-launch signals:
- mode mix { provider, cache, deterministic, provider_fallback }
- PROVIDER FAILURE RATE = provider_fallback / provider ATTEMPTS (cache + deterministic excluded from the denominator because they never reached the provider)
- quality pass rate + avg score (from the deterministic answer evaluation)
- total estimated cost (cents) and avg provider latency (from the provider_llm trace step)
- live-search count

Honesty (agentic_reliability):
- BOUND: scan capped at ≤1000; `capped` flag surfaced when the window is full.
- HONEST_SCORES: every value is computed from real rows; rates are NULL (not a fabricated 0% / 100%) when there's no denominator. The UI must render "—", never invent "100% healthy" from zero data.
- No private data: liveEventAnswers are public; the query never touches userNotes.

Tests (convex/__tests__/scratchnode.events.test.ts): +3 scenario tests: the full aggregate from a 7-answer mixed-mode room, the HONEST_SCORES empty-room null case, and the BOUND cap/`capped` flag.

Follow-up (separate frontend PR, after PR #445 lands): surface this in a host "ask health" line + a degraded badge on provider_fallback answers.

Verification: convex codegen 0, tsc 0, vitest 57 passed / 1 skipped, build 0.

Co-authored-by: hshum <hshum@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
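The failure-rate and null-denominator rules above can be illustrated with a small sketch. This is not the actual getAskTelemetry implementation: `summarizeModes` and `rate` are hypothetical names, the input is a plain array rather than a Convex query result, and it assumes "provider attempts" means provider plus provider_fallback answers, which the commit message implies but does not spell out.

```javascript
// Sketch only, under the assumptions stated above.
function rate(numerator, denominator) {
  // No denominator means "unknown", so return null rather than 0 or 1.
  return denominator > 0 ? numerator / denominator : null;
}

function summarizeModes(answers, cap = 1000) {
  const scanned = answers.slice(0, cap); // bounded scan
  const modes = { provider: 0, cache: 0, deterministic: 0, provider_fallback: 0 };
  for (const a of scanned) {
    if (Object.prototype.hasOwnProperty.call(modes, a.agentMode)) {
      modes[a.agentMode] += 1;
    }
  }
  // Cache and deterministic answers never reached the provider,
  // so they are excluded from the denominator.
  const attempts = modes.provider + modes.provider_fallback;
  return {
    modes,
    providerFailureRate: rate(modes.provider_fallback, attempts),
    capped: scanned.length >= cap, // the scan window is full
  };
}
```

Returning `null` instead of `0` lets the UI distinguish "no provider attempts yet" from "no failures", which is the point of the HONEST_SCORES rule.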
… restore + scenario tests

The first /ask sent in the sub-second cold-load window (before the liveEventMembers row commits) can reject not_joined. PR #445 already added the pre-init send queue + idempotent join+resend retry; this closes the remaining gap:
- home-v5.html: the final-catch draft restore now guards `if (!input.value)` like its sibling _sn_failPendingSends, so a total-failure restore never clobbers a newer draft typed while the send was in flight; the toast stays honest about whether it actually repopulated.
- scratchnode.events.test.ts: two scenario tests pin the recovery contract: a pre-join send rejects not_joined and persists nothing, then join+resend lands exactly one message (not lost, not duplicated); and the idempotent re-join during recovery never forks a second member row.

Additive only: no sendMessage/joinEvent contract changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
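The guarded restore described in this commit can be sketched as a small helper. `restoreDraft` and `toastFor` are hypothetical names and the toast wording is invented for illustration; only the `if (!input.value)` guard comes from the commit message.

```javascript
// Sketch only: `input` stands in for the composer element.
function restoreDraft(input, draft) {
  // A newer draft typed while the send was in flight wins; leave it alone.
  if (input.value) return false;
  input.value = draft;
  return true;
}

// The toast text depends on whether the composer was actually repopulated.
// The wording here is illustrative, not the strings used in home-v5.html.
function toastFor(restored) {
  return restored
    ? 'Send failed. Your message is back in the composer.'
    : 'Send failed. Your earlier message was not sent.';
}
```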
…447)

* feat(scratchnode): visible degraded badge on provider-fallback /ask answers

Completes the honest-degraded-UX half of the /ask observability work. When the AI provider is unavailable, askAgent answers from public sources only and records agentMode=provider_fallback. renderAnswer already LABELLED that ("AI fallback · deterministic") but rendered it as neutral text, so a reader could mistake a degraded answer for a full AI one.

Adds an amber "degraded · sources only" pill (icon + text, never colour alone for a11y; role=status so screen readers announce it; title tooltip explains it may be less complete) in the answer head, shown ONLY for provider_fallback answers.

Verified: all 4 inline <script> blocks pass node --check (the live-room block is type=module / strict). Renders only on the rare fallback path; Tier-A live-DOM check post-deploy confirms the code shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(scratchnode): harden cold-load send race: guard not_joined draft restore + scenario tests

The first /ask sent in the sub-second cold-load window (before the liveEventMembers row commits) can reject not_joined. PR #445 already added the pre-init send queue + idempotent join+resend retry; this closes the remaining gap:
- home-v5.html: the final-catch draft restore now guards `if (!input.value)` like its sibling _sn_failPendingSends, so a total-failure restore never clobbers a newer draft typed while the send was in flight; the toast stays honest about whether it actually repopulated.
- scratchnode.events.test.ts: two scenario tests pin the recovery contract: a pre-join send rejects not_joined and persists nothing, then join+resend lands exactly one message (not lost, not duplicated); and the idempotent re-join during recovery never forks a second member row.

Additive only: no sendMessage/joinEvent contract changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(scratchnode): re-trigger CI (Tier B Vercel preview-poll flake on #447)

---------

Co-authored-by: hshum <hshum@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
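A rough sketch of the degraded badge described in the feat commit, assuming it is built as an HTML string. `renderDegradedBadge`, the `sn-degraded-badge` class, the warning glyph, and the tooltip wording are all hypothetical; the source only specifies the visible text, `role=status`, a title tooltip, icon plus text, and the provider_fallback condition.

```javascript
// Sketch only: returns an HTML string, or '' when no badge should render.
function renderDegradedBadge(agentMode) {
  // Shown ONLY for provider_fallback answers.
  if (agentMode !== 'provider_fallback') return '';
  const tip = 'AI provider unavailable: answered from public sources only, may be less complete';
  return '<span class="sn-degraded-badge" role="status" title="' + tip + '">' +
    // Icon plus text, so the state is never conveyed by colour alone.
    '<span aria-hidden="true">⚠</span> degraded · sources only' +
    '</span>';
}
```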
PR B of the /ask production-grade sprint. Frontend-only, additive.

Root cause (found in PR #443 live verification)
The Convex-routing send override installs only AFTER the async init (lazy-load browser client from esm.sh, then `await joinEvent`). Until then, `window.sendComposerMessage` is the prototype-only handler: it clears the composer and renders locally but NEVER persists. A public chat / `/ask` fired in that cold-load window is silently lost. This was the first-send failure observed live (not an Enter-key bug; not the member-row race, which the not_joined auto-retry already covers).

Fix
- An early queueing shim, installed before the client-load and join awaits, buffers public sends into `_sn_pendingSends` (not lost); the composer clears and shows a `Connecting to live room…` hint. Private notes still pass through.
- The real Convex override drains the queue when it installs.
- If init fails, `_sn_failPendingSends` restores the draft instead of dropping it.

Verification
All 4 inline `<script>` blocks pass `node --check` (live-room block is type=module/strict). Behavioral cold-load verification runs post-deploy on the live showcase per live_dom_verification.

🤖 Generated with Claude Code