Add graceful Fly deploy helper #250 (Open)

nexiumbiz-debug wants to merge 2 commits into algora-io:main from nexiumbiz-debug:codex/graceful-fly-deployments


Conversation

@nexiumbiz-debug

/claim #78

Bounty issue: algora-io/tv#78. This targets algora-io/algora per the maintainer note that the graceful deploy work is still needed in the new app.

Summary

  • Adds Algora.DeploymentHealth and makes /health return 503 while a node is draining so Fly routes around old machines.
  • Adds Algora.Release.prepare_for_deploy/0 for release RPCs to mark a node unhealthy and pause local Oban queues.
  • Adds deploy.exs to warm replacement Fly machines with a supplied image, drain old machines, stop/destroy them, and restore the original process count.
  • Extends Fly/Phoenix shutdown windows and documents the deployment flow.
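For reviewers, here is a rough sketch of how the draining flag and the health endpoint described above could fit together. The module names Algora.DeploymentHealth and AlgoraWeb.HealthController come from this PR; the Agent-based internals and function names shown are illustrative assumptions, not the actual diff.

```elixir
# Sketch only: process-local health state flipped to :draining during deploys.
defmodule Algora.DeploymentHealth do
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> :healthy end, name: __MODULE__)

  # Called from the release RPC before the node is stopped.
  def mark_draining, do: Agent.update(__MODULE__, fn _ -> :draining end)

  def status, do: Agent.get(__MODULE__, & &1)
end

defmodule AlgoraWeb.HealthController do
  use AlgoraWeb, :controller

  # Fly's health checks see the 503 and route traffic away from this machine.
  def index(conn, _params) do
    case Algora.DeploymentHealth.status() do
      :healthy -> send_resp(conn, 200, "ok")
      :draining -> send_resp(conn, 503, "draining")
    end
  end
end
```

In this shape, Algora.Release.prepare_for_deploy/0 would call Algora.DeploymentHealth.mark_draining/0 and then pause the local Oban queues before the machine receives SIGTERM.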

Verification

  • Ran git diff --cached --check before commit.
  • Added AlgoraWeb.HealthControllerTest covering healthy and draining responses.
  • Not run locally: mix test or a live Fly demo, since this Windows environment does not have Elixir/Mix, Docker, flyctl, or Fly app credentials installed.

Demo plan for a Fly app with access

  1. Run fly deploy --build-only --push and copy the produced registry.fly.io/algora:deployment-xxxx image ref.
  2. Run mix run deploy.exs registry.fly.io/algora:deployment-xxxx.
  3. Confirm replacement machines pass Fly checks before old machines are marked draining.
  4. Confirm old machines return 503 from /health, pause local Oban queues, receive SIGTERM, and are destroyed after shutdown.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@nagiexplorer88

deploy.exs drains the old machines before it has verified the replacements are healthy on the new image. The script only waits for the machine count before the update (deploy.exs:44-47, deploy.exs:107-115), then updates the replacement machines (deploy.exs:51-56) and immediately runs prepare_for_deploy on the old machines (deploy.exs:58-61). If the new image boots slowly or fails /health, this can mark the old healthy machines unhealthy and later stop them, which is exactly the livestream interruption this bounty is trying to avoid. Consider waiting for each replacement machine to pass the Fly /health check after machine update before preparing/stopping the old machines.

@nexiumbiz-debug
Author

Thanks, that is a good catch. I pushed 8c02889c to add a replacement-machine health gate before any old machines are marked draining.

After updating the replacement machines to the new image, the deploy helper now polls fly machine status <id> --json for each replacement and requires the machine to be started with passing Fly checks before it runs prepare_for_deploy on any old machine. If a replacement does not become healthy within the timeout, the script raises and leaves the old healthy machines alone.
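The health gate could look roughly like the following. This is a sketch of the approach, not the actual 8c02889c diff: the fly machine status <id> --json call is the one mentioned above, but the module name, helper names, and the assumed JSON fields ("state", "checks", "status") are illustrative, and Jason is assumed as the JSON dependency.

```elixir
# Sketch only: block until a replacement machine is started with passing checks.
defmodule Deploy.HealthGate do
  @poll_interval_ms 5_000

  def await_healthy!(machine_id, timeout_ms \\ 300_000) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    poll(machine_id, deadline)
  end

  defp poll(machine_id, deadline) do
    # A non-zero flyctl exit raises a MatchError, which also aborts the deploy.
    {out, 0} = System.cmd("fly", ["machine", "status", machine_id, "--json"])

    if started_with_passing_checks?(Jason.decode!(out)) do
      :ok
    else
      if System.monotonic_time(:millisecond) > deadline do
        # Raising here leaves the old, still-healthy machines untouched.
        raise "machine #{machine_id} did not become healthy in time"
      end

      Process.sleep(@poll_interval_ms)
      poll(machine_id, deadline)
    end
  end

  defp started_with_passing_checks?(%{"state" => "started", "checks" => checks}) do
    Enum.all?(checks, &(&1["status"] == "passing"))
  end

  defp started_with_passing_checks?(_), do: false
end
```

The script would call Deploy.HealthGate.await_healthy!/1 for every replacement machine before any prepare_for_deploy RPC is issued against the old machines.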

I also documented that ordering in the README. I still cannot run a live Fly demo from this environment because it does not have fly, Elixir/Mix, or app credentials installed.
