
infra: separate VPS provisioning from Vercel crons #414

Open
0xSolace wants to merge 2 commits into dev from fix/vps-provisioning-separation


@0xSolace
Collaborator

Summary

Clean separation of container lifecycle (VPS) from frontend/billing (Vercel).

Problem

Both Vercel and the Milady VPS were running provisioning crons, causing:

  • Env var drift between environments
  • Double-execution of provisioning jobs
  • Confusion about which system owns container state

Changes

vercel.json

Removed 3 container-lifecycle crons (VPS now owns exclusively):

  • /api/v1/cron/process-provisioning-jobs
  • /api/v1/cron/health-check
  • /api/v1/cron/deployment-monitor

All billing, cleanup, and non-lifecycle crons retained on Vercel.
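
One way to guard against the removed crons creeping back into vercel.json is a small check that could run in CI. This is a minimal sketch, assuming cron paths appear as quoted JSON strings in the config file; `check_lifecycle_crons` is a hypothetical helper name, not something in this PR.

```shell
# Sketch: fail if any container-lifecycle cron path is still listed in a
# Vercel config file. The three paths are the ones removed in this PR.
check_lifecycle_crons() {
  local config_file="$1" path status=0
  for path in \
    /api/v1/cron/process-provisioning-jobs \
    /api/v1/cron/health-check \
    /api/v1/cron/deployment-monitor
  do
    # -F matches the quoted path literally; -q suppresses output.
    if grep -Fq "\"$path\"" "$config_file"; then
      echo "still scheduled on Vercel: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: check_lifecycle_crons vercel.json
```

Matching the path together with its surrounding quotes avoids false positives on longer paths that merely share a prefix.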

.github/workflows/deploy-backend.yml

Added sudo systemctl restart milady-provisioning-worker after eliza-cloud restart, so the worker always picks up new code on deploy.

INFRASTRUCTURE.md (new)

Documents the full architecture:

  • Vercel: frontend, billing crons, auth
  • VPS: container provisioning, health checks, docker node SSH
  • Docker nodes: milady-core-1 through core-6
  • Neon DB + Redis: shared, env-synced

VPS env sync (out of band)

Updated /opt/eliza-cloud/.env.local directly:

  • MILADY_DOCKER_IMAGE: v2.0.0-steward-8 (was steward-7)
  • ELIZA_CLOUD_AGENT_BASE_DOMAIN: milady.ai (was waifu.fun)
  • ✅ Added MILADY_SANDBOX_PROVIDER=docker
  • ✅ Added MILADY_BRIDGE_INTERNAL_PORT=2138
  • ✅ Added STEWARD_CONTAINER_URL=http://172.18.0.1:3200
  • ✅ Added REDIS_URL / KV_URL (Upstash)
  • ✅ Added MILADY_SSH_KEY (base64 encoded, replaces path-based var)
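
Because these values were changed out of band, a quick presence check on the VPS can help catch drift. The following is a rough sketch, assuming the file uses plain `KEY=value` lines; `missing_env_keys` is a hypothetical helper, and it checks only that each key exists, not that its value is correct.

```shell
# Sketch: report which expected keys are missing from an env file.
# Returns non-zero if at least one key is absent.
missing_env_keys() {
  local env_file="$1" key missing=0
  shift
  for key in "$@"; do
    # Look for KEY= at the start of a line.
    if ! grep -q "^${key}=" "$env_file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Example invocation (path taken from the PR description):
# missing_env_keys /opt/eliza-cloud/.env.local \
#   MILADY_DOCKER_IMAGE ELIZA_CLOUD_AGENT_BASE_DOMAIN MILADY_SANDBOX_PROVIDER \
#   MILADY_BRIDGE_INTERNAL_PORT STEWARD_CONTAINER_URL REDIS_URL KV_URL MILADY_SSH_KEY
```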

⚠️ GitHub Secrets Gap

The deploy workflow requires secrets not yet configured on this repo:

  • MILADY_VPS_HOST
  • MILADY_VPS_SSH_KEY
  • NEON_DATABASE_URL
  • DISCORD_WEBHOOK

These need to be added in Settings → Secrets → Actions before the deploy workflow will succeed.
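
The gap can also be checked from a terminal. Below is a minimal sketch of a helper that reads configured secret names on stdin and prints the required ones that are absent. `missing_secrets` is a hypothetical name, and the commented usage assumes the secret name is the first column of `gh secret list` output.

```shell
# Sketch: given secret names on stdin (one per line), print each required
# name that is not among them. Returns non-zero if any are missing.
missing_secrets() {
  local configured name status=0
  configured="$(cat)"
  for name in "$@"; do
    # -x: whole-line match, -F: literal string, -q: exit status only
    if ! printf '%s\n' "$configured" | grep -qxF -- "$name"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Possible usage with the GitHub CLI:
# gh secret list | awk '{print $1}' | missing_secrets \
#   MILADY_VPS_HOST MILADY_VPS_SSH_KEY NEON_DATABASE_URL DISCORD_WEBHOOK
```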

- vercel.json: remove container-lifecycle crons (process-provisioning-jobs,
  health-check, deployment-monitor) — these are now exclusively owned by
  milady-provisioning-worker on the VPS
- deploy-backend.yml: add restart of milady-provisioning-worker after
  eliza-cloud restart so worker picks up new code on each deploy
- INFRASTRUCTURE.md: document Vercel vs VPS ownership, docker nodes,
  Neon DB, Redis, and missing GitHub Actions secrets

VPS .env.local updated separately (out of band):
- MILADY_DOCKER_IMAGE bumped to v2.0.0-steward-8
- ELIZA_CLOUD_AGENT_BASE_DOMAIN changed from waifu.fun to milady.ai
- Added: MILADY_SANDBOX_PROVIDER, MILADY_BRIDGE_INTERNAL_PORT,
  STEWARD_CONTAINER_URL, REDIS_URL, KV_URL, MILADY_SSH_KEY (base64)
@vercel

vercel bot commented Mar 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: eliza-cloud-v2
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Mar 26, 2026 2:31pm

@coderabbitai

coderabbitai bot commented Mar 26, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 50d13e0d-72cb-472f-a141-2cf2de157825

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@claude

claude bot commented Mar 26, 2026

Code Review

Overall this is a clean, well-motivated change. The separation of VPS lifecycle from Vercel crons is the right architectural call. A few issues worth addressing:


Bug / Reliability

systemctl restart milady-provisioning-worker will fail on first deploy if the systemd unit does not yet exist on the VPS, which will abort the deploy step after eliza-cloud has already been restarted.

Suggested guard in deploy-backend.yml line 181:

sudo systemctl restart milady-provisioning-worker || echo "milady-provisioning-worker not found, skipping"

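Note that this guard also swallows genuine restart failures. Alternatively, check systemctl is-enabled first, so the restart is skipped only when the unit is absent and fails loudly otherwise. As written, the unguarded restart is a latent failure for any fresh VPS setup.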
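
A guard that skips only when the unit is missing, but still fails on a real restart error, could look roughly like this. It is a sketch, not the workflow's actual step: `restart_if_installed` is a hypothetical helper, and it relies on `systemctl cat` exiting non-zero when no unit file is found.

```shell
# Sketch: restart a unit only if systemd knows about it. A restart failure
# of an existing unit still propagates and fails the deploy step.
restart_if_installed() {
  local unit="$1"
  # `systemctl cat` exits non-zero when no unit file is found.
  if ! systemctl cat "$unit" >/dev/null 2>&1; then
    echo "$unit not installed, skipping restart"
    return 0
  fi
  sudo systemctl restart "$unit"
}

# Example: restart_if_installed milady-provisioning-worker
```

Compared with appending `|| echo ...` to the restart, this keeps the fresh-VPS case non-fatal without hiding a worker that exists but cannot be restarted.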


Dead code — confusing constant

DEFAULT_BRIDGE_PORT (31337) is still injected into containers as BRIDGE_PORT env var (docker-sandbox-provider.ts line 432), but the comment on line 513 says "nothing ever listened there". If that is true, passing it into every container is misleading. Either remove BRIDGE_PORT: DEFAULT_BRIDGE_PORT from baseEnv, or add a comment explaining why it is kept (e.g. legacy compat for older steward images).


No health check for the provisioning worker

The Health Check step polls only eliza-cloud on ports 3000/3334. If milady-provisioning-worker crashes on start, the deploy still reports success while container provisioning is silently broken. Suggest adding:

sudo systemctl is-active milady-provisioning-worker || { echo "provisioning worker failed to start"; exit 1; }
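
A single is-active check right after the restart may pass before a crash-looping worker falls over. One possible refinement is to require the unit to stay active across several checks. This is a sketch under assumptions: `stays_active` is a hypothetical helper, and the defaults (3 checks, 2 seconds apart) are arbitrary.

```shell
# Sketch: succeed only if the unit reports active on every one of several
# consecutive checks; fail as soon as one check finds it inactive.
stays_active() {
  local unit="$1" checks="${2:-3}" delay="${3:-2}" i=1
  while [ "$i" -le "$checks" ]; do
    if ! systemctl is-active --quiet "$unit"; then
      echo "$unit is not active (check $i of $checks)" >&2
      return 1
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 0
}

# Example: stays_active milady-provisioning-worker || exit 1
```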

Security / Info disclosure

INFRASTRUCTURE.md hardcodes the VPS public IP (89.167.63.246) and the Upstash Redis hostname. This is a public repo. Neither is a credential, but exposing the IP and Redis endpoint increases attack surface unnecessarily. Consider a hostname alias for the IP and omitting the Redis endpoint from the doc.
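
A rough way to spot hardcoded addresses before committing a doc is to scan it for IPv4-shaped strings. The helper below is a sketch with a hypothetical name, `find_ipv4`. The pattern is deliberately loose, so it will also flag private addresses and dotted version numbers, and hits need a manual look.

```shell
# Sketch: print "line:match" for every IPv4-shaped string in a file.
# Exits non-zero when nothing matches (standard grep behavior).
find_ipv4() {
  grep -nEo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1"
}

# Example: find_ipv4 INFRASTRUCTURE.md
```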


Minor

  • Both bridgePort and webUiPort now map to the same container port (MILADY_PORT=2138). This is intentional and well-commented, but worth noting in runbook docs that a single listener failure now breaks both bridge and health URLs simultaneously.
  • The four missing GitHub secrets (MILADY_VPS_HOST, MILADY_VPS_SSH_KEY, NEON_DATABASE_URL, DISCORD_WEBHOOK) should be added before merging to main if the deploy workflow is expected to trigger on merge.

The core logic changes (vercel.json cron removal, port mapping fix, MILADY_API_BIND: "0.0.0.0") are correct. The provisioning worker restart guard is the only must-fix before merging.
