
fix(postgres): add default StopGracePeriod to prevent WAL corruption #3607

Open
OthmanHaba wants to merge 1 commit into Dokploy:canary from OthmanHaba:canary

@OthmanHaba

What is this PR about?


Summary

Adds a default 30-second StopGracePeriod for PostgreSQL Swarm services to prevent WAL corruption when the service is redeployed, stopped, or restarted.

Issues related

#3595

Problem

When redeploying or stopping a PostgreSQL service (especially on external servers), Docker Swarm sends SIGTERM to the container and, if the process is still running when the grace period expires, follows up with SIGKILL. Previously, StopGracePeriod was applied only if the user had explicitly configured it; otherwise, Docker's default of 10 seconds was used.

Ten seconds is often not enough for PostgreSQL to complete its shutdown sequence (flushing WAL buffers and writing a final checkpoint). If SIGKILL arrives before that finishes, the WAL can be left in an inconsistent state, which causes the following error on the next startup:

    PANIC: could not locate a valid checkpoint record

This makes the database unrecoverable without manual intervention (pg_resetwal).
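To make the timing argument concrete, here is a small TypeScript model of the stop sequence described above. It is a simplification, not Docker's implementation: it assumes a fixed shutdown duration and treats a shutdown that finishes exactly at the deadline as clean. `stopOutcome` is a hypothetical helper, and the 15-second shutdown time is an illustrative figure, not a measurement.

```typescript
// Hypothetical model of the Swarm stop sequence; not Docker's actual code.
const NS_PER_SECOND = 1_000_000_000;

type StopOutcome = "clean-exit" | "sigkill";

/**
 * SIGTERM is sent first; if the process is still running once the
 * grace period has elapsed, SIGKILL follows.
 * @param shutdownSeconds time the process needs to finish shutting down
 * @param gracePeriodNs   StopGracePeriod in nanoseconds (Docker's unit)
 */
function stopOutcome(shutdownSeconds: number, gracePeriodNs: number): StopOutcome {
  const graceSeconds = gracePeriodNs / NS_PER_SECOND;
  // Simplification: finishing exactly at the deadline counts as clean.
  return shutdownSeconds <= graceSeconds ? "clean-exit" : "sigkill";
}

const DOCKER_DEFAULT_NS = 10 * NS_PER_SECOND; // Docker's default: 10 s
const PR_DEFAULT_NS = 30 * NS_PER_SECOND;     // default proposed in this PR: 30 s

// Illustrative 15 s shutdown (flush WAL buffers + final checkpoint):
console.log(stopOutcome(15, DOCKER_DEFAULT_NS)); // sigkill
console.log(stopOutcome(15, PR_DEFAULT_NS));     // clean-exit
```

The model only captures the race between shutdown time and grace period; a longer default widens the window but cannot guarantee a clean exit for an arbitrarily slow shutdown.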

Fix

Changed StopGracePeriod in buildPostgres() from opt-in to always set, falling back to a default of 30 seconds (30,000,000,000 nanoseconds, since Docker expresses StopGracePeriod in nanoseconds). A custom value configured by the user is still respected.

The previous opt-in spread in buildPostgres():

    ...(StopGracePeriod !== null &&
      StopGracePeriod !== undefined && { StopGracePeriod }),
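A minimal TypeScript sketch of the always-present fallback described in the Fix section. `resolveStopGracePeriod`, `buildContainerSpec`, `PostgresSettings`, and its `stopGracePeriod` field are hypothetical names used for illustration, not Dokploy's actual `buildPostgres()` code; the only detail taken from Docker is that `StopGracePeriod` is expressed in nanoseconds.

```typescript
// Hypothetical sketch; names are illustrative, not Dokploy's real API.
const NS_PER_SECOND = 1_000_000_000;
// 30 s in nanoseconds, the unit Docker uses for StopGracePeriod.
const DEFAULT_STOP_GRACE_PERIOD_NS = 30 * NS_PER_SECOND; // 30_000_000_000

interface PostgresSettings {
  // User-configured value in nanoseconds; null/undefined means "not set".
  stopGracePeriod?: number | null;
}

interface ContainerSpecFragment {
  Image: string;
  StopGracePeriod: number; // always present after this change
}

function resolveStopGracePeriod(configured: number | null | undefined): number {
  // `??` falls back only on null/undefined, mirroring the old
  // `!== null && !== undefined` check, so any explicit value (even 0) is kept.
  return configured ?? DEFAULT_STOP_GRACE_PERIOD_NS;
}

function buildContainerSpec(image: string, settings: PostgresSettings): ContainerSpecFragment {
  return {
    Image: image,
    StopGracePeriod: resolveStopGracePeriod(settings.stopGracePeriod),
  };
}

console.log(buildContainerSpec("postgres:16", {}).StopGracePeriod); // 30000000000
console.log(
  buildContainerSpec("postgres:16", { stopGracePeriod: 60 * NS_PER_SECOND }).StopGracePeriod,
); // 60000000000
```

Using `??` rather than `||` matters here: `||` would also replace an explicit `0`, whereas `??`, like the original null/undefined check, falls back only when no value was configured.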


Development

Successfully merging this pull request may close these issues.

Postgres Template Database corrupted after assigning external port

1 participant