fix(typescript-client): add exponential backoff to onError-driven retries #4054
KyleAMathews wants to merge 8 commits into main
Conversation
fix(typescript-client): add exponential backoff to onError-driven retries
When onError returns {} on persistent 4xx errors (e.g. expired auth
tokens returning 403), the stream retried immediately with zero delay,
creating a tight infinite loop that could hammer both Electric and the
upstream database.
Add exponential backoff with jitter (100ms base, 30s cap) to the
onError retry path. The backoff delay is abort-aware so stream teardown
remains responsive. Includes a console.warn on 2nd+ retry for
debuggability.
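A sketch of the delay schedule this describes. The names (`backoffDelay`, `BASE_MS`, `MAX_MS`) are made up for illustration, and the exact jitter scheme may differ from the client's actual implementation:

```typescript
// Illustrative sketch of the backoff schedule described above:
// 100 ms base, 30 s cap, randomized jitter. Not the client's real code.
const BASE_MS = 100
const MAX_MS = 30_000

function backoffDelay(retryCount: number): number {
  // Exponent uses retryCount - 1 so the first retry waits ~BASE_MS,
  // keeping a legitimate auth-token refresh nearly instant.
  const exp = BASE_MS * 2 ** (retryCount - 1)
  // Jitter: scale by a random factor in [1, 2) so clients that failed
  // together don't all retry in lockstep; then clamp to the cap.
  return Math.min(exp * (1 + Math.random()), MAX_MS)
}
```

For example, the first retry waits roughly 100–200 ms, while the tenth is already pinned at the 30 s cap.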
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commit:
- Exponential backoff grows delay between retries on persistent 403s
- Stream tears down immediately when aborted during backoff delay
- Console warning emitted on 2nd+ retry attempt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main    #4054      +/- ##
========================================
+ Coverage   84.85%   88.77%   +3.92%
========================================
  Files          39       25      -14
  Lines        2872     2459     -413
  Branches      614      616       +2
========================================
- Hits         2437     2183     -254
+ Misses        433      274     -159
  Partials        2        2
========================================
```
…hared-field analyzer warning

Use a local `retryCount` variable so the field is not read across the async boundary in `#start`, satisfying the shape-stream-risks analyzer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use `retryCount - 1` in the exponent so the first onError retry has minimal delay (for legitimate auth token refresh), with exponential growth only kicking in on subsequent retries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lyzer

Remove the reset from `#requestShape` so `#start` is the sole writer, satisfying the shared-instance-field analyzer check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace single-gap jitter-sensitive assertion with an early-vs-late sum comparison that is robust against random jitter
- Clean up abort listener when backoff timer expires normally to prevent closure accumulation on long-lived streams with many recoverable errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that abort listeners are removed when the backoff timer expires normally, preventing closure accumulation on long-lived streams.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
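The abort-aware delay with listener cleanup described in these commits can be sketched as a standalone helper. The name `abortableSleep` is hypothetical; the real implementation lives inside `ShapeStream`:

```typescript
// Sketch of an abort-aware backoff delay: resolves after `ms`, but
// resolves early if the signal aborts first. When the timer fires
// normally, the abort listener is removed so long-lived streams with
// many recoverable errors don't accumulate dangling closures.
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal?.aborted) return resolve() // already torn down
    const onAbort = () => {
      clearTimeout(timer) // don't leave the timer pending
      resolve()
    }
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort) // cleanup on normal expiry
      resolve()
    }, ms)
    signal?.addEventListener("abort", onAbort, { once: true })
  })
}
```

With this shape, `stream.abort()` during a 30 s backoff resolves the delay immediately instead of blocking teardown.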
Is it worth slowing down the backoff increases? It seems quite fast, as in: a transitory error lasting a few seconds results in an extra few seconds of delay. If the behavior right now is zero delay, then perhaps a lower-gradient backoff curve solves the problem whilst still keeping the default fairly eager to re-connect? I know this can be overridden with your own onError handler and we don't like options, but I feel like this being configurable could be useful DX. Like a small set of keywords for common strategies.
@thruflo the assumption is that generally people have configured their …
Summary
Adds exponential backoff to `onError`-driven retries in `ShapeStream` to prevent tight infinite loops when `onError` returns `{}` on persistent 4xx errors (e.g., expired auth tokens returning 403).

Previously, the fetch backoff layer correctly skipped retrying 4xx errors, but when `onError` returned `{}` to signal "retry", the stream restarted immediately with zero delay, creating a tight loop that could hammer both Electric and the upstream database. A user reported this causing ~$200/day in Neon network egress from a development app with zero traffic.

Root Cause
The client has two layers of error handling:
1. Fetch backoff (`createFetchWithBackoff`): retries 5xx/429 with exponential backoff; throws 4xx immediately.
2. `onError` callback (`#start`): called after fetch backoff gives up. Returning `{}` means "retry"; returning `void` means stop.

When `onError` returned an object, `#start()` recursively called itself with no delay. The simplest "keep syncing" pattern, `onError: () => ({})`, became the most dangerous one on persistent client errors.

Approach
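The root-cause loop, and how a per-retry delay changes it, can be modeled with this standalone sketch. All names here (`startLoop`, `attempt`, `delayFor`) are hypothetical, not the real `ShapeStream` internals:

```typescript
// Minimal model of the #start retry loop: when onError returns an
// object, the loop re-enters. With delayFor = () => 0 this is the old
// hot loop; with an exponential delayFor each re-entry waits.
type RetryDecision = object | void

async function startLoop(
  attempt: () => Promise<void>,
  onError: (err: Error) => RetryDecision,
  delayFor: (retry: number) => number,
): Promise<number> {
  let retryCount = 0 // local variable, not a shared instance field
  while (true) {
    try {
      await attempt()
      return retryCount // stream reached up-to-date: done
    } catch (err) {
      const decision = onError(err as Error)
      if (decision === undefined) return retryCount // void means stop
      retryCount++
      // abort-awareness omitted here for brevity
      await new Promise((r) => setTimeout(r, delayFor(retryCount)))
    }
  }
}
```

For example, an `attempt` that throws a 403 twice and then succeeds completes after two delayed retries rather than spinning.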
- Exponential backoff on the `onError` retry path: 100 ms base, 30 s cap, same algorithm as the existing fast-loop and SSE backoffs
- The backoff `setTimeout` listens for the abort signal, so `stream.abort()` / component unmount tears down immediately instead of blocking for up to 30 s
- `#onErrorRetryCount` resets when the stream reaches up-to-date, so a successful auth token refresh isn't penalized on the next error

Key Invariants
Non-goals
- No limit on the number of `onError` retries (the user's `onError` controls whether to give up)
- No change to the `onError` API contract (returning `{}` still means "retry")

Verification
```shell
cd packages/typescript-client
pnpm vitest run test/stream.test.ts test/wake-detection.test.ts test/fetch.test.ts test/expired-shapes-cache.test.ts
```

All 73 unit tests pass.
Files changed
- `packages/typescript-client/src/client.ts`: backoff in the `#start()` `onError` path; reset in the `#requestShape()` success path
- `packages/typescript-client/test/wake-detection.test.ts`
- `.changeset/add-onerror-retry-backoff.md`

🤖 Generated with Claude Code