deploy.exs drains the old machines before it has verified the replacements are healthy on the new image. The script only waits for the machine count before the update (deploy.exs:44-47, deploy.exs:107-115), then updates the replacement machines (deploy.exs:51-56) and immediately runs prepare_for_deploy on the old machines (deploy.exs:58-61). If the new image boots slowly or fails /health, this can mark the old healthy machines unhealthy and later stop them, which is exactly the livestream interruption this bounty is trying to avoid. Consider waiting for each replacement machine to pass the Fly /health check after machine update before preparing/stopping the old machines.
Thanks, that is a good catch. I pushed 8c02889c to add a replacement-machine health gate before any old machines are marked draining.
After updating the replacement machines to the new image, the deploy helper now polls `fly machine status <id> --json` for each replacement. It requires every replacement to be started with passing Fly checks before it runs `prepare_for_deploy` on any old machine. If a replacement does not become healthy within the timeout, the script raises and leaves the old, healthy machines alone.
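A minimal sketch of that health gate follows. It assumes the status output has already been decoded into a map with a `"state"` string and a `"checks"` list whose entries carry a `"status"` such as `"passing"`. Those field names are an assumption about Fly's machine JSON, and `HealthGateSketch`, `healthy?/1`, `await_healthy!/3` and the default timeouts are hypothetical, not the code in 8c02889c. Status fetching is injected as a function so the polling logic can run without flyctl.

```elixir
defmodule HealthGateSketch do
  @moduledoc """
  Hypothetical health gate: wait for replacement machines before draining old ones.

  Assumes each status is a decoded map such as
  %{"state" => "started", "checks" => [%{"name" => "...", "status" => "passing"}]}.
  The field names are an assumption about Fly's machine JSON, not a verified schema.
  """

  # Healthy only when the machine is started and every reported check passes.
  # A started machine with no checks yet is treated as NOT healthy, so the gate
  # cannot open before the first /health check has run.
  def healthy?(%{"state" => "started", "checks" => [_ | _] = checks}) do
    Enum.all?(checks, &(&1["status"] == "passing"))
  end

  def healthy?(_machine), do: false

  # Polls `fetch_status` (a 1-arity function returning the decoded status map
  # for a machine id) until every id is healthy or `timeout_ms` elapses.
  # Raises on timeout so the caller never reaches the drain step.
  def await_healthy!(ids, fetch_status, opts \\ []) do
    timeout_ms = Keyword.get(opts, :timeout_ms, 120_000)
    interval_ms = Keyword.get(opts, :interval_ms, 2_000)
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    do_await(ids, fetch_status, interval_ms, deadline)
  end

  defp do_await([], _fetch_status, _interval_ms, _deadline), do: :ok

  defp do_await(ids, fetch_status, interval_ms, deadline) do
    # Only machines that are still unhealthy are polled again.
    pending = Enum.reject(ids, &healthy?(fetch_status.(&1)))

    cond do
      pending == [] ->
        :ok

      System.monotonic_time(:millisecond) >= deadline ->
        raise "replacement machines not healthy before timeout: #{Enum.join(pending, ", ")}"

      true ->
        Process.sleep(interval_ms)
        do_await(pending, fetch_status, interval_ms, deadline)
    end
  end
end
```

In deploy.exs, `fetch_status` would wrap the `fly machine status <id> --json` call and decode its output, and `await_healthy!/3` would sit between updating the replacements and calling `prepare_for_deploy` on the old machines. Because it raises on timeout, the drain step is never reached while a replacement is still unhealthy.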
I also documented that ordering in the README. I still cannot run a live Fly demo from this environment because it does not have fly, Elixir/Mix, or app credentials installed.
/claim #78
Bounty issue: algora-io/tv#78. This targets algora-io/algora per the maintainer note that the graceful deploy work is still needed in the new app.

Summary

- Adds `Algora.DeploymentHealth` and makes `/health` return `503` while a node is draining, so Fly routes around old machines.
- Adds `Algora.Release.prepare_for_deploy/0` for release RPCs to mark a node unhealthy and pause its local Oban queues.
- Adds `deploy.exs` to warm replacement Fly machines with a supplied image, drain the old machines, stop and destroy them, and restore the original process count.

Verification

- Ran `git diff --cached --check` before commit.
- Added `AlgoraWeb.HealthControllerTest` covering healthy and draining responses.
- Not run: `mix test` or a live Fly demo. This Windows environment does not have Elixir/Mix, Docker, flyctl, or Fly app credentials installed.

Demo plan for a Fly app with access

1. Run `fly deploy --build-only --push` and copy the produced `registry.fly.io/algora:deployment-xxxx` image ref.
2. Run `mix run deploy.exs registry.fly.io/algora:deployment-xxxx`.
3. Confirm that the old machines return `503` from `/health`, pause local Oban queues, receive `SIGTERM`, and are destroyed after shutdown.
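The draining behavior from the summary and the last demo step (old machines answering `503` from `/health`) can be sketched as a node-local flag that the health endpoint consults. This is a hypothetical illustration, not `Algora.DeploymentHealth` itself: the module name, the `:persistent_term` key, and the `{status, body}` tuples are assumptions. Pausing Oban queues (for example with `Oban.pause_queue/2` and `local_only: true`) is left out so the block runs without dependencies.

```elixir
defmodule DeploymentHealthSketch do
  @moduledoc """
  Hypothetical node-local draining flag; not the PR's Algora.DeploymentHealth.
  The module name, storage key, and response tuples are assumptions.
  """

  @key {__MODULE__, :draining}

  # Would be called from the release RPC (prepare_for_deploy in the PR).
  def mark_draining, do: :persistent_term.put(@key, true)

  # For tests only; a real deploy never un-drains an old machine.
  def reset, do: :persistent_term.erase(@key)

  # Defaults to false when the flag has never been set on this node.
  def draining?, do: :persistent_term.get(@key, false)

  # What a /health endpoint would send: 503 once draining, so the Fly health
  # check fails and the proxy can stop routing new traffic to this machine.
  def health_response do
    if draining?(), do: {503, "draining"}, else: {200, "ok"}
  end
end
```

`:persistent_term` fits a flag that is written once per deploy and read on every health check, since reads are cheap while updates can trigger a scan of every process; an ETS table or Agent would also work. A Phoenix health controller would then send the status and body returned by `health_response/0`.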