@KyleAMathews KyleAMathews commented Feb 7, 2026

Adds system wake detection to ShapeStream for non-browser environments (Bun, Node.js, etc.). Without this, when a daemon process wakes from OS sleep, in-flight long-poll or SSE requests hang until the OS TCP timeout (60-120+ seconds depending on platform), causing a gap in data delivery.

Approach

Uses timer gap detection: a setInterval ticks every 10 seconds. If the elapsed wall-clock time since the last tick exceeds 25 seconds (10s interval + 15s threshold), the system was likely asleep. On detection, the stale request is aborted with a SYSTEM_WAKE reason and a fresh non-live request is issued to catch up before resuming live mode.
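A minimal TypeScript sketch of the timer gap check described above, assuming nothing beyond standard timers. `WakeDetector`, its options, and the `check()` method are hypothetical names for illustration, not the client's actual API. The wall clock is injectable so the logic can be exercised without a real sleep.

```typescript
// Hypothetical sketch of timer gap detection; names are illustrative, not the client's API.
export interface WakeDetectorOptions {
  intervalMs: number // how often the timer ticks (10_000 in this description)
  thresholdMs: number // extra slack before a gap counts as a sleep (15_000 here)
  onWake: (gapMs: number) => void
  now?: () => number // injectable wall clock, defaults to Date.now
}

// Node.js and Bun return a Timeout object with unref(); browsers return a number.
function unrefTimer(timer: unknown): void {
  if (typeof timer === 'object' && timer !== null && 'unref' in timer) {
    const unref = (timer as { unref: unknown }).unref
    if (typeof unref === 'function') unref.call(timer)
  }
}

export class WakeDetector {
  private readonly opts: WakeDetectorOptions
  private readonly now: () => number
  private lastTick: number
  private timer: ReturnType<typeof setInterval> | undefined

  constructor(opts: WakeDetectorOptions) {
    this.opts = opts
    this.now = opts.now ?? Date.now
    this.lastTick = this.now()
  }

  // One tick: compare wall-clock time since the previous tick with interval + threshold.
  check(): boolean {
    const current = this.now()
    const elapsed = current - this.lastTick
    this.lastTick = current
    if (elapsed > this.opts.intervalMs + this.opts.thresholdMs) {
      this.opts.onWake(elapsed)
      return true
    }
    return false
  }

  start(): void {
    if (this.timer !== undefined) return
    this.lastTick = this.now()
    this.timer = setInterval(() => this.check(), this.opts.intervalMs)
    unrefTimer(this.timer) // don't keep the process alive just for this timer
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer)
      this.timer = undefined
    }
  }
}
```

Keeping `check()` separate from the timer lets a test drive it with a fake clock, and the `unref` guard reflects the requirement that the interval must not keep a Node.js or Bun process alive.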

This is the non-browser counterpart to visibilitychange-based pause/resume. The two mechanisms are mutually exclusive — #hasBrowserVisibilityAPI() gates which one activates.
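The gate can be sketched as a feature check on `globalThis.document`. What the real `#hasBrowserVisibilityAPI()` inspects is not shown in this PR, so the specific checks below (an object `document` with `addEventListener` and a `hidden` property) are an assumption, as is the hypothetical `chooseStrategy` helper.

```typescript
// Hypothetical sketch; the real #hasBrowserVisibilityAPI() may check different properties.
export function hasBrowserVisibilityAPI(): boolean {
  const doc = (globalThis as unknown as { document?: unknown }).document
  if (typeof doc !== 'object' || doc === null) return false
  // visibilitychange handling needs event listeners and document.hidden.
  return (
    typeof (doc as { addEventListener?: unknown }).addEventListener === 'function' &&
    'hidden' in doc
  )
}

export type ResumeStrategy = 'visibilitychange' | 'wake-detection'

// Exactly one mechanism is chosen, so the two never run side by side.
export function chooseStrategy(): ResumeStrategy {
  return hasBrowserVisibilityAPI() ? 'visibilitychange' : 'wake-detection'
}
```

In Bun or Node.js there is no `document`, so the sketch selects wake detection; in a browser it defers to `visibilitychange`.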

Key invariants:

  • Wake detection only fires when #state === 'active' and an abort controller exists
  • #isRefreshing is set before abort so the reconnect issues a non-live request first (same as forceDisconnectAndRefresh)
  • timer.unref() prevents the interval from keeping the Node.js/Bun process alive
  • Cleanup runs via unsubscribeAll() alongside visibility change cleanup
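The invariants above can be sketched as a small wake handler. `StreamSketch`, its fields, and the string value of `SYSTEM_WAKE` are assumptions for illustration; the PR only states that a `SYSTEM_WAKE` constant is added to src/constants.ts.

```typescript
// Hypothetical sketch of the wake-handling invariants; not the ShapeStream source.
export const SYSTEM_WAKE = 'system-wake' // assumed value

type StreamState = 'active' | 'paused'

export class StreamSketch {
  state: StreamState = 'active'
  isRefreshing = false
  abortController: AbortController | undefined = new AbortController()
  private readonly cleanups: Array<() => void> = []

  // Called from the wake detector; returns whether a stale request was aborted.
  handleWake(): boolean {
    // Only an active stream with an in-flight request has anything stale to abort.
    if (this.state !== 'active' || this.abortController === undefined) return false
    // Set before aborting so the reconnect issues a non-live catch-up request first.
    this.isRefreshing = true
    this.abortController.abort(SYSTEM_WAKE)
    return true
  }

  addCleanup(cleanup: () => void): void {
    this.cleanups.push(cleanup)
  }

  // Wake-detection cleanup runs here alongside any visibility-change cleanup.
  unsubscribeAll(): void {
    for (const cleanup of this.cleanups.splice(0)) cleanup()
  }
}
```

The ordering matters: because `isRefreshing` is already true when the abort lands, whatever code observes the abort can rely on it to choose a non-live request.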

Non-goals:

  • No logging on wake detection — this is normal operational behavior, not an error
  • No handling of wake during paused state — if the stream is paused, there's no stale request to abort

Verification

cd packages/typescript-client
pnpm vitest run --config vitest.unit.config.ts test/wake-detection.test.ts

Tests run under `@vitest-environment node` so that `document` is genuinely absent, rather than relying on `delete globalThis.document` under jsdom.

Files changed

  • src/client.ts — Add #subscribeToWakeDetection(), #hasBrowserVisibilityAPI(), update catch block with isRestartAbort for SYSTEM_WAKE
  • src/constants.ts — Add SYSTEM_WAKE constant
  • test/wake-detection.test.ts — New test file with node environment: timer setup, browser exclusion, and wake gap detection
  • vitest.unit.config.ts — Include wake detection tests in unit config

https://claude.ai/code/session_01VhBX3nM9TfKSngu9u4C1zB

When running the Electric client in a Bun daemon (or any non-browser
environment), in-flight HTTP long-poll requests hang until OS TCP
timeout (~75s on macOS) after a machine sleeps and wakes. The existing
visibility-based pause/resume mechanism requires `document` which
doesn't exist in Bun/Node.js.

This adds timer gap detection: a setInterval runs every 10s, and if
the elapsed wall-clock time between ticks exceeds 25s (10s interval +
15s threshold), the system likely slept. On detection, the stale
hanging fetch is aborted with a SYSTEM_WAKE reason and the fetch loop
immediately restarts with a fresh connection, reducing reconnect time
from 30-90s to near-instant.

Key changes:
- Add SYSTEM_WAKE constant alongside FORCE_DISCONNECT_AND_REFRESH
- Add #subscribeToWakeDetection() using timer gap detection (skipped
  in browser environments where visibilitychange handles this)
- Handle SYSTEM_WAKE abort in #requestShape() to restart the loop
- Timer is unref'd so it doesn't prevent process exit
- Cleanup on unsubscribeAll()
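A sketch of how a fetch loop might treat `SYSTEM_WAKE` and `FORCE_DISCONNECT_AND_REFRESH` aborts as restarts rather than errors. `runFetchLoop`, the `request` callback, and both constant values are hypothetical; the PR describes the behavior (restart, catching up with a non-live request first), but this is not the code in #requestShape().

```typescript
// Hypothetical sketch; constant values and function names are assumptions.
export const FORCE_DISCONNECT_AND_REFRESH = 'force-disconnect-and-refresh'
export const SYSTEM_WAKE = 'system-wake'

// Resolves true when the stream is finished, false to keep looping.
export type ShapeRequest = (signal: AbortSignal, live: boolean) => Promise<boolean>

function isRestartAbort(signal: AbortSignal): boolean {
  return (
    signal.aborted &&
    (signal.reason === FORCE_DISCONNECT_AND_REFRESH || signal.reason === SYSTEM_WAKE)
  )
}

// Returns how many times the loop restarted after an expected abort.
export async function runFetchLoop(
  request: ShapeRequest,
  onController: (controller: AbortController) => void,
): Promise<number> {
  let live = false
  let restarts = 0
  for (;;) {
    const controller = new AbortController()
    onController(controller) // lets the wake handler reach the in-flight request
    try {
      if (await request(controller.signal, live)) return restarts
      live = true // simplified: assume one non-live request is enough to catch up
    } catch (err) {
      if (!isRestartAbort(controller.signal)) throw err
      // Expected abort: drop the stale request and catch up with a non-live one.
      restarts++
      live = false
    }
  }
}
```

Checking the signal's `reason` rather than the thrown value keeps the decision independent of how the underlying fetch surfaces an abort.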

https://claude.ai/code/session_01VhBX3nM9TfKSngu9u4C1zB
pkg-pr-new bot commented Feb 7, 2026


npm i https://pkg.pr.new/@electric-sql/react@3814
npm i https://pkg.pr.new/@electric-sql/client@3814
npm i https://pkg.pr.new/@electric-sql/y-electric@3814

commit: e35c5f2

codecov bot commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 87.72%. Comparing base (586bd4c) to head (e35c5f2).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines                    Patch %   Lines
packages/typescript-client/src/client.ts    96.66%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3814      +/-   ##
==========================================
+ Coverage   87.36%   87.72%   +0.36%     
==========================================
  Files          23       23              
  Lines        2050     2078      +28     
  Branches      543      548       +5     
==========================================
+ Hits         1791     1823      +32     
+ Misses        257      253       -4     
  Partials        2        2              
Flag                         Coverage Δ
packages/experimental        87.73% <ø> (ø)
packages/react-hooks         86.48% <ø> (ø)
packages/start               82.83% <ø> (ø)
packages/typescript-client   93.71% <96.77%> (+0.42%) ⬆️
packages/y-electric          56.05% <ø> (ø)
typescript                   87.72% <96.77%> (+0.36%) ⬆️
unit-tests                   87.72% <96.77%> (+0.36%) ⬆️


KyleAMathews and others added 3 commits February 8, 2026 22:11
…nect on wake

- Move wake detection tests to dedicated file with @vitest-environment node
  to fix unreliable globalThis.document deletion under jsdom
- Add wake-detection.test.ts to unit test config (no Electric server needed)
- Set #isRefreshing before wake abort to ensure non-live request on reconnect
- Assert abort reason is specifically SYSTEM_WAKE
- Broaden JSDoc to cover SSE and document browser exclusion
- Extract #hasBrowserVisibilityAPI() and simplify abort reason check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clean up wake detection interval when #start() exits via error or
normal completion, not just via unsubscribeAll(). Also move
vi.useRealTimers() to afterEach for failure-safe test cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
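The lifecycle fix in the commit above amounts to a try/finally around the start loop. This sketch assumes a `subscribe` callback that starts wake detection and returns its cleanup; `startWithWakeDetection` is a hypothetical name, not the client's #start().

```typescript
// Hypothetical sketch of tying the wake-detection interval to the start loop's lifetime.
export async function startWithWakeDetection(
  subscribe: () => () => void, // starts wake detection and returns its cleanup
  loop: () => Promise<void>, // the fetch loop; may resolve or reject
): Promise<void> {
  const unsubscribeWake = subscribe()
  try {
    await loop()
  } finally {
    // Runs on normal completion and on error, not only via unsubscribeAll().
    unsubscribeWake()
  }
}
```

If the cleanup can also be reached through unsubscribeAll(), it should be safe to call twice (for example, `clearInterval` on an already-cleared timer is a no-op).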

@kevin-dp kevin-dp left a comment


The code looks good to me. My only concern is that the 15-second threshold is quite high: could it miss sleeps and leave the HTTP request stuck anyway?

For instance, imagine that the OS puts the process to sleep for 10 seconds and the request breaks because of that. When the process resumes, the drift isn't big enough, so we ignore it. The request is now stuck and we still wait until it times out. Could we be more aggressive about detecting these sleeps? In the worst case, if we are too aggressive, we abort the request and create a new one, which is totally fine.


EDIT: I asked Claude about this, and it seems to agree on being more aggressive:

The concern is valid — the threshold is too generous and short sleeps that break HTTP connections will go undetected.

Why the threshold matters

The detection fires when elapsed > INTERVAL + THRESHOLD. A lower threshold catches shorter sleeps. Normal timer jitter (GC pauses, system load) is on the order of milliseconds to ~1-2 seconds, so a 5s threshold still leaves a large safety margin against false positives.

Why the interval also matters

The interval determines how much sleep can hide within a normal tick gap. If the system sleeps and wakes before the next tick is due, the sleep is completely invisible — elapsed just equals the normal interval. So the worst-case minimum detectable sleep is INTERVAL + THRESHOLD, not just THRESHOLD:

Interval   Threshold   Worst-case detectable sleep
10s        15s         25s (current)
10s        5s          15s
2s         5s          7s
1s         5s          6s
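The table follows from a simple model, sketched here under the reviewer's assumption that the interval timer's clock keeps running during sleep and an overdue tick fires as soon as the machine wakes. The function names are hypothetical; only the interval and threshold values come from this thread.

```typescript
// Hypothetical model of the worst-case argument above.
// Assumes the last tick happened at t = 0, the machine sleeps from sleepStartMs
// (0 <= sleepStartMs < intervalMs) for sleepMs, and an overdue tick fires on wake.
export function observedGapMs(intervalMs: number, sleepStartMs: number, sleepMs: number): number {
  const wakeAt = sleepStartMs + sleepMs
  // If the machine wakes before the next tick is due, the tick arrives on schedule.
  return Math.max(intervalMs, wakeAt)
}

export function isDetected(
  intervalMs: number,
  thresholdMs: number,
  sleepStartMs: number,
  sleepMs: number,
): boolean {
  return observedGapMs(intervalMs, sleepStartMs, sleepMs) > intervalMs + thresholdMs
}

// Worst case: the sleep starts right after a tick (sleepStartMs = 0), so only sleeps
// longer than interval + threshold are guaranteed to be caught.
export function worstCaseDetectableSleepMs(intervalMs: number, thresholdMs: number): number {
  return intervalMs + thresholdMs
}
```

Under this model, a 10-second sleep like the one in the review is missed with 10s/15s when it starts just after a tick, but is caught in every position with the 2s/4s values, whose worst case is 6s.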

Cost of being aggressive

A shorter interval (1-2s) with a lower threshold (4-5s) has negligible overhead — it's a Date.now() call and a subtraction per tick. The timer is unref()'d so it won't keep the process alive. Even with many ShapeStream instances the cost is effectively zero.

The worst case of a false positive is aborting one request and issuing a fresh one — the same thing forceDisconnectAndRefresh already does.

Suggestion

Reduce to something like INTERVAL_MS = 2_000 and WAKE_THRESHOLD_MS = 4_000 (worst case: 6s) for significantly better coverage with no meaningful cost.

Lower interval from 10s to 2s and threshold from 15s to 4s,
reducing minimum detectable sleep from 25s to 6s. The cost of
a false positive (one extra request) is negligible compared to
a missed sleep leaving a broken connection hanging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@KyleAMathews
Contributor Author

Good call — changed and now merging.

@KyleAMathews KyleAMathews merged commit 091a232 into main Feb 9, 2026
41 checks passed
@KyleAMathews KyleAMathews deleted the claude/bun-daemon-wake-up-sruDu branch February 9, 2026 15:39
@K-Mistele

Awesome, thanks for this, guys! Is there a new client version with this fix yet?


github-actions bot commented Feb 9, 2026

This PR has been released! 🚀

The following packages include changes from this PR:

  • @electric-sql/client@1.5.2

Thanks for contributing to Electric!
