Recovering a header-only WAL (a deploy that opened the WAL but committed nothing, e.g. a crash between UpgradeToWrite and Finalize) advanced the in-memory serial without persisting it. After a second such failure the next WAL header was written two serials ahead of the committed state, and every later command failed WAL recovery until the WAL was deleted by hand. Only advance the serial when the WAL carried entries, i.e. when the merged state is actually written back. Co-authored-by: Isaac
This was referenced Jun 15, 2026
Name it after the WAL condition it recovers, matching sibling tests (empty-wal, stale-wal, future-serial-wal). Use a generic test-bundle name. Co-authored-by: Isaac
Integration test report. Commit: ec65a6b
25 interesting tests: 15 SKIP, 7 KNOWN, 3 flaky
Top 30 slowest tests (at least 2 minutes):
andrewnester approved these changes on Jun 15, 2026.
Changes
The direct engine's local state is `resources.json` (committed) plus a `resources.json.wal` write-ahead log whose serial must be exactly one ahead of the committed serial. When a deploy opened the WAL but committed nothing (a header-only WAL, e.g. a crash between `UpgradeToWrite` and `Finalize`), recovery advanced the in-memory serial without persisting it. After a second such failure the next WAL header was written two serials ahead of the committed state, and every later `bundle` command then failed WAL recovery (`WAL serial (N) is ahead of expected`) until the WAL was deleted by hand.

Fix: only advance the serial when the WAL carried entries, i.e. when the merged state is actually written back, so a no-op WAL is discarded without drift.
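The recovery rule can be illustrated with a minimal Go sketch. It assumes a much-simplified model of the state files: `State`, `Entry`, `WAL` and `recoverWAL` are hypothetical names, not the direct engine's real types. The WAL must sit exactly one serial past the committed state. A header-only WAL is discarded without touching the serial, and the serial only advances together with a merged state that the caller must write back. Discarding an older (stale) WAL is also an assumption made for the sketch.

```go
package main

import "fmt"

// State is a hypothetical stand-in for resources.json: the committed
// serial plus resource entries.
type State struct {
	Serial    int
	Resources map[string]string
}

// Entry is a hypothetical WAL record: one resource key and its new value.
type Entry struct {
	Key, Value string
}

// WAL is a hypothetical stand-in for resources.json.wal. A WAL with no
// entries is header-only: it was opened but nothing was committed.
type WAL struct {
	Serial  int
	Entries []Entry
}

// recoverWAL applies wal on top of committed. It returns the state to use
// in memory and whether that state must be written back to disk.
func recoverWAL(committed State, wal WAL) (State, bool, error) {
	expected := committed.Serial + 1
	if wal.Serial > expected {
		return committed, false, fmt.Errorf("WAL serial (%d) is ahead of expected", wal.Serial)
	}
	if wal.Serial < expected || len(wal.Entries) == 0 {
		// Stale WAL (assumed discardable) or header-only WAL: nothing to
		// merge, so drop it and leave the serial unchanged. Not advancing
		// the serial for a header-only WAL is the fix.
		return committed, false, nil
	}
	merged := State{
		Serial:    wal.Serial,
		Resources: make(map[string]string, len(committed.Resources)+len(wal.Entries)),
	}
	for k, v := range committed.Resources {
		merged.Resources[k] = v
	}
	// Replay entries in order, so a later entry for the same key wins.
	for _, e := range wal.Entries {
		merged.Resources[e.Key] = e.Value
	}
	// The serial advances only together with state that will be persisted.
	return merged, true, nil
}

func main() {
	committed := State{Serial: 5, Resources: map[string]string{"job": "v1"}}

	// Crash between UpgradeToWrite and Finalize: header-only WAL.
	s, write, err := recoverWAL(committed, WAL{Serial: 6})
	fmt.Println(s.Serial, write, err) // 5 false <nil>

	// WAL that carried entries: serial advances and state is written back.
	s, write, err = recoverWAL(committed, WAL{Serial: 6, Entries: []Entry{{Key: "job", Value: "v2"}}})
	fmt.Println(s.Serial, write, err, s.Resources["job"]) // 6 true <nil> v2
}
```

Returning a "must write back" flag alongside the state keeps the serial bump and the persist step tied together, which is the coupling the fix restores.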
Why
Fixes #5557. Reported by a customer: any error that took more than one deploy
attempt to clear wedged the bundle on the second failure.
Tests
- `bundle/deploy/wal/two-crashed-deploys`: two deploys killed mid-apply recover without wedging.
- `TestHeaderOnlyWALRecoveryDoesNotAdvanceSerial`.

Both confirmed to fail when the fix is reverted.
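The two-crashed-deploys scenario can be modeled as a small simulation. This is a hypothetical sketch, not the actual test: `disk`, `recoverWAL`, `crashingDeploy` and `simulate` are illustrative names, the starting serial of 5 is arbitrary, and every WAL in the model is header-only because each deploy crashes before `Finalize`. The `buggy` flag reproduces the pre-fix recovery that bumped the in-memory serial.

```go
package main

import "fmt"

// disk is a hypothetical model of the local files: the serial committed in
// resources.json and, if a resources.json.wal exists, its header serial.
// Every WAL here is header-only, because each modeled deploy crashes
// before Finalize.
type disk struct {
	committed int
	hasWAL    bool
	walSerial int
}

// recoverWAL returns the in-memory serial after recovering any leftover
// WAL. buggy selects the pre-fix behavior that advanced the in-memory
// serial for a header-only WAL without persisting it.
func recoverWAL(d *disk, buggy bool) (int, error) {
	mem := d.committed
	if !d.hasWAL {
		return mem, nil
	}
	if d.walSerial > d.committed+1 {
		return mem, fmt.Errorf("WAL serial (%d) is ahead of expected", d.walSerial)
	}
	if buggy {
		mem = d.walSerial // advanced in memory, never written to disk
	}
	d.hasWAL = false // the header-only WAL is discarded
	return mem, nil
}

// crashingDeploy models one deploy: recover, open a new WAL one serial past
// the in-memory serial (UpgradeToWrite), then die before Finalize.
func crashingDeploy(d *disk, buggy bool) error {
	mem, err := recoverWAL(d, buggy)
	if err != nil {
		return err
	}
	d.hasWAL, d.walSerial = true, mem+1
	return nil
}

// simulate runs two crashing deploys and then one more command that only
// has to recover the WAL.
func simulate(buggy bool) error {
	d := &disk{committed: 5}
	for i := 0; i < 2; i++ {
		if err := crashingDeploy(d, buggy); err != nil {
			return err
		}
	}
	_, err := recoverWAL(d, buggy)
	return err
}

func main() {
	fmt.Println("fixed:", simulate(false)) // fixed: <nil>
	fmt.Println("buggy:", simulate(true))  // buggy: WAL serial (7) is ahead of expected
}
```

In the buggy run the second deploy writes a WAL header of 7 while `resources.json` is still at 5, which is the two-serial gap described above; with the fix both crashed deploys write serial 6 and the next command recovers cleanly.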
This pull request and its description were written by Isaac.