
Increase Caddy retry window to reduce deploy 502s#67

Open
retlehs wants to merge 1 commit into main from fix/zero-downtime-deploy

retlehs commented Mar 25, 2026

Summary

  • Bumps lb_try_duration from 5s → 10s and lb_try_interval from 250ms → 100ms
  • Gives Caddy a longer window, with more frequent attempts, to retry the backend during restarts, so requests are held until the new process accepts connections instead of failing with a 502

Test plan

  • Deploy and monitor for 502s during the restart window

Further improvements if 502s persist

Socket activation (zero-downtime)

Add a wppackages.socket unit so systemd keeps the listening socket open across restarts. Requires a Go code change to accept the inherited fd via systemd's LISTEN_FDS protocol (sd_listen_fds; e.g. the go-systemd activation package). This is the most robust option: connections that arrive while the new process starts wait in the kernel's listen backlog instead of being refused, so none are dropped unless the backlog fills.
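
A minimal sketch of the Go side, using only the standard library rather than go-systemd so that it stays self-contained. It assumes a single socket is passed by the unit; the `listen` helper and the fallback address are hypothetical and not taken from this repository.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// listenFDsStart is the first inherited descriptor under systemd's
// socket-activation protocol (SD_LISTEN_FDS_START in sd-daemon).
const listenFDsStart = 3

// listen returns the systemd-provided listener when the process was socket
// activated, and otherwise binds fallbackAddr itself. The bool reports
// which path was taken.
func listen(fallbackAddr string) (net.Listener, bool, error) {
	pid, _ := strconv.Atoi(os.Getenv("LISTEN_PID"))
	nfds, _ := strconv.Atoi(os.Getenv("LISTEN_FDS"))
	// LISTEN_PID must match our own pid, otherwise the variables were
	// inherited from a parent and the descriptors are not meant for us.
	if pid == os.Getpid() && nfds >= 1 {
		f := os.NewFile(uintptr(listenFDsStart), "systemd-socket")
		defer f.Close() // FileListener works on a copy, so this handle can go
		ln, err := net.FileListener(f)
		return ln, true, err
	}
	ln, err := net.Listen("tcp", fallbackAddr)
	return ln, false, err
}

func main() {
	// Port 0 is a placeholder fallback for running outside systemd.
	ln, activated, err := listen("127.0.0.1:0")
	if err != nil {
		fmt.Fprintln(os.Stderr, "listen:", err)
		os.Exit(1)
	}
	defer ln.Close()
	fmt.Printf("listening on %s (socket activated: %t)\n", ln.Addr(), activated)
	// A real service would now hand ln to http.Serve.
}
```

Keeping the fallback path means the same binary still runs outside systemd, for example in local development.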

Health-checked restart

Replace the systemctl restart in the deploy playbook with a start followed by a health-poll loop, so Ansible doesn't return until the new process is confirmed ready. This catches startup failures earlier but doesn't eliminate the gap on its own.

🤖 Generated with Claude Code

The 5s/250ms retry settings weren't always enough to cover the
litestream + Go startup during deploys, causing brief 502s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

retlehs self-assigned this Mar 25, 2026