Skip to content

Fix requestSnapshot publish ordering#4282

Open
KyleAMathews wants to merge 3 commits into
mainfrom
bug/requestSnapshot-race
Open

Fix requestSnapshot publish ordering#4282
KyleAMathews wants to merge 3 commits into
mainfrom
bug/requestSnapshot-race

Conversation

@KyleAMathews
Copy link
Copy Markdown
Contributor

Fix requestSnapshot() so callers can safely read subscribed state immediately after awaiting it.

Previously, requestSnapshot() fetched and injected snapshot data into the stream but did not await the async subscriber publication chain. This meant consumers such as TanStack DB on-demand loading could observe stale state immediately after await requestSnapshot().

Root Cause

requestSnapshot() called #onMessages(...) without awaiting it. That method publishes through the stream’s serialized subscriber chain, so async subscribers could still be processing the injected snapshot batch after requestSnapshot() had already resolved.

Adding the await exposed a second edge case: if a subscriber reentrantly called and awaited requestSnapshot(), the nested publish could be appended behind the currently-running subscriber callback, causing a deadlock.

Approach

  • Await #onMessages(...) in requestSnapshot() so the promise resolves only after the injected snapshot batch has been delivered to subscribers.
  • Make #publish() reentrant-safe by delivering nested publishes immediately when already inside subscriber delivery, avoiding self-deadlock.
  • Strengthen tests around async subscriber completion, control boundary messages, immediate Shape visibility, and reentrant requestSnapshot() calls.

Key Invariants

  • await requestSnapshot() means subscriber callbacks for the injected snapshot batch have completed.
  • Snapshot control messages such as snapshot-end and subset-end are delivered before requestSnapshot() resolves.
  • Reentrant requestSnapshot() calls from subscriber callbacks do not deadlock.
  • Non-reentrant message delivery remains serialized through the existing publish chain.

Non-goals

  • This does not change Electric shape fetch semantics or snapshot filtering.
  • This does not add new public APIs.
  • This does not modify the ShapeStream state-machine spec; the desired behavior was already clear, but the async publication step was not awaited.

Trade-offs

The fix keeps the main publish path serialized while special-casing reentrant delivery to avoid deadlock. Throwing on reentrant requestSnapshot() would have avoided the deadlock too, but it would introduce a new API restriction. Supporting it is safer and more ergonomic for subscribers.

Verification

cd packages/typescript-client

pnpm exec vitest --config vitest.unit.config.ts --run test/stream.test.ts -t "requestSnapshot can be awaited reentrantly"
pnpm exec vitest --config vitest.unit.config.ts --run test/stream.test.ts -t "requestSnapshot"
pnpm exec vitest --config vitest.unit.config.ts --run

Results:

requestSnapshot can be awaited reentrantly: 1 passed
requestSnapshot tests: 2 passed
full unit config: 13 files passed, 383 tests passed

Also red/green verified the reentrant regression: the new reentrant test fails with the old chained publish implementation and passes with the reentrant-safe publish fix.

Files changed

  • packages/typescript-client/src/client.ts

    • Await snapshot message publication in requestSnapshot().
    • Make #publish() reentrant-safe.
  • packages/typescript-client/test/stream.test.ts

    • Add async subscriber regression coverage.
    • Assert snapshot boundary messages are delivered before resolution.
    • Add reentrant requestSnapshot() regression coverage.
  • packages/typescript-client/test/client.test.ts

    • Strengthen high-level assertion so immediate reads after await requestSnapshot() are verified.
  • .changeset/fix-requestsnapshot-publish-ordering.md

    • Add patch changeset for @electric-sql/client.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 6, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@electric-sql/react@4282
npm i https://pkg.pr.new/@electric-sql/client@4282
npm i https://pkg.pr.new/@electric-sql/y-electric@4282

commit: d26eb26

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.21%. Comparing base (28d127b) to head (d26eb26).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4282       +/-   ##
===========================================
+ Coverage   58.64%   70.21%   +11.56%     
===========================================
  Files         218       53      -165     
  Lines       22112     7007    -15105     
  Branches     5698     1981     -3717     
===========================================
- Hits        12968     4920     -8048     
+ Misses       9140     2084     -7056     
+ Partials        4        3        -1     
Flag Coverage Δ
packages/agents ?
packages/agents-runtime ?
packages/agents-server 66.05% <ø> (ø)
packages/agents-server-ui ?
packages/electric-ax 38.59% <ø> (ø)
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start ?
packages/typescript-client 94.51% <100.00%> (+0.23%) ⬆️
packages/y-electric 56.05% <ø> (ø)
typescript 70.21% <100.00%> (+11.56%) ⬆️
unit-tests 70.21% <100.00%> (+11.56%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@msfstef msfstef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this change breaks sequential processing of SSE messages, and that either way if we want snapshot messages to make it into processing out of band we should make the API explicit about that I think

Comment on lines +612 to +616
// Tracks when subscriber callbacks are actively being delivered from
// #messageChain. requestSnapshot can inject a nested batch from inside a
// subscriber; in that reentrant case #publish uses this as an intentional
// escape hatch to deliver the nested snapshot batch immediately rather than
// queueing it behind the subscriber that is awaiting it.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Tracks when subscriber callbacks are actively being delivered from
// #messageChain. requestSnapshot can inject a nested batch from inside a
// subscriber; in that reentrant case #publish uses this as an intentional
// escape hatch to deliver the nested snapshot batch immediately rather than
// queueing it behind the subscriber that is awaiting it.

I think we should try to avoid agent comment-creep - the flag's function is fairly clear with the comment left when used in code further down

Comment on lines +1747 to +1749
if (this.#isPublishing) {
return deliver()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit odd - we have a message chain promise to serialize message processing, which gets bypassed if you publish messages while a batch of messages is being processed.

Effectively, anyone calling this.#publish either ends up on the message chain, or if the message chain is executing it gets published out of order. In fact, since SSE's onmessage doesn't await the call, I would actually expect for batches of messages from the same sse connection to end up being processed/published out of order.

I'd much prefer if we could explicitly specify an API parameter to skip the strict ordering such that the requestSnapshot call in particular avoids it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just seeing this, turns out we have the same concern :)

// behind the currently-running subscriber callback, which would deadlock
// when requestSnapshot awaits publication.
if (this.#isPublishing) {
return deliver()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we call deliver() without awaiting i think we can still get "wrong" execution order.
Concrete scenario with two subscribers A and B:

  1. Batch M1 arrives. #isPublishing flips to true, outer deliver() starts.
  2. Promise.all spins up callbackA(M1) and callbackB(M1) concurrently. Both run up to their first await.
  3. A awaits requestSnapshot(). B is still mid-await processing M1 (e.g. an async write, a fetch, awaiting downstream state).
  4. requestSnapshot() calls #onMessages(snapshot)#publish(snapshot) → reentrant branch fires → deliver() invokes both callbackA(snapshot) and callbackB(snapshot) immediately.
  5. B's callback is now re-entered with the snapshot batch before its M1 invocation has returned.

So the per-subscriber invariant "my callback won't be re-entered while a previous invocation is in flight" is broken for any bystander subscriber whose M1 callback is still awaiting something when the reentrant publish fires. The messageChain-level ordering is preserved (next queued batch still waits), but B can observe M1-start → snapshot-start → snapshot-end → M1-end interleaving in its own callback.

In practice this may be fine — if subscribers are effectively stateless across awaits, the window is invisible. But a subscriber that mutates this.something across an await in its callback could see corruption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants