fix(postgres): add default StopGracePeriod to prevent WAL corruption #3607
Open
OthmanHaba wants to merge 1 commit into Dokploy:canary
What is this PR about?
Summary
Adds a default 30-second StopGracePeriod for PostgreSQL Swarm services to prevent WAL corruption on redeployment and on service stop or restart.
Issues related
#3595
Problem
When redeploying or stopping a PostgreSQL service (especially on external servers), Docker Swarm sends SIGTERM, then sends SIGKILL if the container is still running when the grace period expires. Previously, StopGracePeriod was only applied if the user explicitly configured it. Otherwise Docker's default of 10 seconds was used.
10 seconds is often not enough for PostgreSQL to complete its shutdown sequence (flush WAL buffers, write a final checkpoint). If SIGKILL arrives before that finishes, the WAL is left in an inconsistent state, causing this on the next startup:
PANIC: could not locate a valid checkpoint record
This makes the database unrecoverable without manual intervention (pg_resetwal).
Fix
Changed StopGracePeriod in buildPostgres() from opt-in to always-present, with a default fallback of 30 seconds (30,000,000,000 nanoseconds). If the user has configured a custom value, that value is still respected.
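The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `resolveStopGracePeriod`, `buildContainerSpecFragment`, and the constant names are hypothetical, and the sketch assumes the configured value arrives as a number of nanoseconds (or `null`/`undefined` when unset), which is the unit the Docker Engine API expects for `StopGracePeriod`.

```typescript
// Hypothetical sketch; names are illustrative, not Dokploy's actual code.
const NANOSECONDS_PER_SECOND = 1_000_000_000;

// 30 seconds expressed in nanoseconds (30,000,000,000).
const DEFAULT_STOP_GRACE_PERIOD_NS = 30 * NANOSECONDS_PER_SECOND;

// Returns the user's value when one is set, otherwise the 30-second default.
// `??` only falls back on null/undefined, so an explicit 0 is still respected.
function resolveStopGracePeriod(configured?: number | null): number {
	return configured ?? DEFAULT_STOP_GRACE_PERIOD_NS;
}

// Builds the part of the container spec that carries the grace period.
// The field is always present, instead of being added only when configured.
function buildContainerSpecFragment(
	configured?: number | null,
): { StopGracePeriod: number } {
	return { StopGracePeriod: resolveStopGracePeriod(configured) };
}
```

With this shape, an unset value yields `{ StopGracePeriod: 30000000000 }`, while a custom value such as 60 seconds (60,000,000,000 ns) passes through unchanged.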
Closes #3595 (Postgres Template Database corrupted after assigning external port)