-
Notifications
You must be signed in to change notification settings - Fork 307
Add system wake detection for non-browser environments #3814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When running the Electric client in a Bun daemon (or any non-browser environment), in-flight HTTP long-poll requests hang until OS TCP timeout (~75s on macOS) after a machine sleeps and wakes. The existing visibility-based pause/resume mechanism requires `document` which doesn't exist in Bun/Node.js. This adds timer gap detection: a setInterval runs every 10s, and if the elapsed wall-clock time between ticks exceeds 25s (10s interval + 15s threshold), the system likely slept. On detection, the stale hanging fetch is aborted with a SYSTEM_WAKE reason and the fetch loop immediately restarts with a fresh connection, reducing reconnect time from 30-90s to near-instant. Key changes: - Add SYSTEM_WAKE constant alongside FORCE_DISCONNECT_AND_REFRESH - Add #subscribeToWakeDetection() using timer gap detection (skipped in browser environments where visibilitychange handles this) - Handle SYSTEM_WAKE abort in #requestShape() to restart the loop - Timer is unref'd so it doesn't prevent process exit - Cleanup on unsubscribeAll() https://claude.ai/code/session_01VhBX3nM9TfKSngu9u4C1zB
commit: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3814 +/- ##
==========================================
+ Coverage 87.36% 87.72% +0.36%
==========================================
Files 23 23
Lines 2050 2078 +28
Branches 543 548 +5
==========================================
+ Hits 1791 1823 +32
+ Misses 257 253 -4
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…nect on wake - Move wake detection tests to dedicated file with @vitest-environment node to fix unreliable globalThis.document deletion under jsdom - Add wake-detection.test.ts to unit test config (no Electric server needed) - Set #isRefreshing before wake abort to ensure non-live request on reconnect - Assert abort reason is specifically SYSTEM_WAKE - Broaden JSDoc to cover SSE and document browser exclusion - Extract #hasBrowserVisibilityAPI() and simplify abort reason check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clean up wake detection interval when #start() exits via error or normal completion, not just via unsubscribeAll(). Also move vi.useRealTimers() to afterEach for failure-safe test cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me. My only concern is that the threshold of 15 seconds is quite high and could it potentially miss sleeps and hence still have the issue where the HTTP request is stuck?
For instance, imagine that the OS put the process to sleep for 10 seconds and that because of that the request broke. Now, when the process is resumed, the drift isn't big enough and so we ignore it. Now the request is stuck and we still wait until the request times out. Could we be more aggressive about detecting these sleeps? Worst case if we are too aggressive we abort the request and create a new one, that's totally fine.
EDIT: i asked Claude about this. It seems to agree on being more aggressive:
The concern is valid — the threshold is too generous and short sleeps that break HTTP connections will go undetected.
Why the threshold matters
The detection fires when elapsed > INTERVAL + THRESHOLD. A lower threshold catches shorter sleeps. Normal timer jitter (GC pauses, system load) is on the order of milliseconds to ~1-2 seconds, so a 5s threshold still has massive safety margin over false positives.
Why the interval also matters
The interval determines how much sleep can hide within a normal tick gap. If the system sleeps and wakes before the next tick is due, the sleep is completely invisible — elapsed just equals the normal interval. So the worst-case minimum detectable sleep is INTERVAL + THRESHOLD, not just THRESHOLD:
| Interval | Threshold | Worst-case detectable sleep |
|---|---|---|
| 10s | 15s | 25s (current) |
| 10s | 5s | 15s |
| 2s | 5s | 7s |
| 1s | 5s | 6s |
Cost of being aggressive
A shorter interval (1-2s) with a lower threshold (4-5s) has negligible overhead — it's a Date.now() call and a subtraction per tick. The timer is unref()'d so it won't keep the process alive. Even with many ShapeStream instances the cost is effectively zero.
The worst case of a false positive is aborting one request and issuing a fresh one — the same thing forceDisconnectAndRefresh already does.
Suggestion
Reduce to something like INTERVAL_MS = 2_000 and WAKE_THRESHOLD_MS = 4_000 (worst case: 6s) for significantly better coverage with no meaningful cost.
Lower interval from 10s to 2s and threshold from 15s to 4s, reducing minimum detectable sleep from 25s to 6s. The cost of a false positive (one extra request) is negligible compared to a missed sleep leaving a broken connection hanging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Good call — changed and now merging. |
|
awesome, thanks for this, guys! is there a new client version with this fix yet? |
|
This PR has been released! 🚀 The following packages include changes from this PR:
Thanks for contributing to Electric! |
Adds system wake detection to
ShapeStreamfor non-browser environments (Bun, Node.js, etc). Without this, when a daemon process wakes from OS sleep, in-flight long-poll or SSE requests hang until the OS TCP timeout (60-120+ seconds depending on platform), causing a gap in data delivery.Approach
Uses timer gap detection: a
setIntervalticks every 10 seconds. If the elapsed wall-clock time since the last tick exceeds 25 seconds (10s interval + 15s threshold), the system was likely asleep. On detection, the stale request is aborted with aSYSTEM_WAKEreason and a fresh non-live request is issued to catch up before resuming live mode.This is the non-browser counterpart to
visibilitychange-based pause/resume. The two mechanisms are mutually exclusive —#hasBrowserVisibilityAPI()gates which one activates.Key invariants:
#state === 'active'and an abort controller exists#isRefreshingis set before abort so the reconnect issues a non-live request first (same asforceDisconnectAndRefresh)timer.unref()prevents the interval from keeping the Node.js/Bun process aliveunsubscribeAll()alongside visibility change cleanupNon-goals:
Verification
cd packages/typescript-client pnpm vitest run --config vitest.unit.config.ts test/wake-detection.test.tsTests run in
@vitest-environment nodeto ensuredocumentis genuinely absent (rather than relying ondelete globalThis.documentunder jsdom).Files changed
src/client.ts— Add#subscribeToWakeDetection(),#hasBrowserVisibilityAPI(), update catch block withisRestartAbortforSYSTEM_WAKEsrc/constants.ts— AddSYSTEM_WAKEconstanttest/wake-detection.test.ts— New test file with node environment: timer setup, browser exclusion, and wake gap detectionvitest.unit.config.ts— Include wake detection tests in unit confighttps://claude.ai/code/session_01VhBX3nM9TfKSngu9u4C1zB