From 52f32a6681fffdbfe5e21b06e02991e4b5072c43 Mon Sep 17 00:00:00 2001
From: MorganOnCode <87934408+MorganOnCode@users.noreply.github.com>
Date: Fri, 15 May 2026 08:58:40 +0000
Subject: [PATCH] chore: retire deploy.yml in favor of Tailscale-only manual
 deploys

The VPS is reachable only over Tailscale; SSH is closed to the public
internet. Re-enabling auto-deploy via appleboy/ssh-action would require
widening the firewall to GitHub's runner IP ranges -- a strictly worse
security posture for a payment facilitator with a live mainnet seed
phrase on disk.

Changes:
- Delete .github/workflows/deploy.yml (was broken on every merge anyway:
  parse-time failures before #25, missing DEPLOY_* secrets after #25)
- Document the canonical phased manual deploy in docs/operations.md
  (matches the pattern we used for the 2026-05-15 quick-wins deploy)
- Add a "production deploys are manual by design" section to
  docs/deployment.md explaining why and pointing to the runbook
- CI (.github/workflows/ci.yml) stays untouched -- it runs only inside
  the runner with no outbound SSH

If auto-deploy is ever wanted again, the right shape is the Tailscale
GitHub Action, which adds the runner to the tailnet for the deploy
duration without opening any public port. Deferred until there's need.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .github/workflows/deploy.yml | 51 ------------------------------------
 docs/deployment.md           |  8 ++++++
 docs/operations.md           | 44 +++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 51 deletions(-)
 delete mode 100644 .github/workflows/deploy.yml

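Note on the Phase 4 wait in docs/operations.md: the health poll can also be wrapped in a small helper that returns non-zero on timeout, so later steps can be gated on its exit status. A minimal sketch, assuming a hypothetical function name `wait_healthy` (not in the repo) and a probe command that prints the container's health status:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this patch): run a probe command until it
# prints "healthy", or give up after a fixed number of attempts.
# Usage: wait_healthy <attempts> <delay_seconds> <probe command...>
wait_healthy() {
  local attempts=$1 delay=$2
  shift 2
  local i status
  for i in $(seq 1 "$attempts"); do
    # "|| true" keeps a failing probe from aborting the loop under `set -e`.
    status=$("$@" 2>/dev/null || true)
    if [ "$status" = "healthy" ]; then
      echo "healthy after ${i} poll(s)"
      return 0
    fi
    sleep "$delay"
  done
  # Report the timeout on stderr and signal failure to the caller.
  echo "not healthy after ${attempts} poll(s) (last status: ${status:-none})" >&2
  return 1
}
```

With the container name from the runbook, `wait_healthy 30 2 docker inspect cardano402 --format '{{.State.Health.Status}}' || echo "consider rolling back"` would mirror the runbook's budget of 30 polls at 2-second intervals.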
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
deleted file mode 100644
index 29c7a88..0000000
--- a/.github/workflows/deploy.yml
+++ /dev/null
@@ -1,51 +0,0 @@
-name: Deploy to Production
-
-on:
-  push:
-    branches: [master]
-  workflow_dispatch:  # Manual trigger
-
-# Only run one deploy at a time
-concurrency:
-  group: production-deploy
-  cancel-in-progress: false
-
-jobs:
-  # Run CI checks first
-  ci:
-    uses: ./.github/workflows/ci.yml
-
-  deploy:
-    name: Deploy to cardano402.com
-    needs: ci
-    runs-on: ubuntu-latest
-    timeout-minutes: 10
-    if: github.ref == 'refs/heads/master'
-
-    steps:
-      - name: Deploy via SSH
-        uses: appleboy/ssh-action@v1.2.2
-        with:
-          host: ${{ secrets.DEPLOY_HOST }}
-          username: ${{ secrets.DEPLOY_USER }}
-          key: ${{ secrets.DEPLOY_SSH_KEY }}
-          port: ${{ secrets.DEPLOY_PORT }}
-          script: |
-            set -euo pipefail
-            cd /opt/cardano402
-            git pull origin master
-            docker compose -f docker-compose.prod.yml up -d --build facilitator
-
-            # Poll /health for up to 60s instead of a fixed sleep.
-            for i in $(seq 1 30); do
-              HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/health || echo 000)
-              if [ "$HTTP_CODE" = "200" ]; then
-                echo "Deploy successful -- health check passed after ${i} polls"
-                exit 0
-              fi
-              sleep 2
-            done
-
-            echo "Deploy failed -- /health did not return 200 within 60s (last code: ${HTTP_CODE})"
-            docker compose -f docker-compose.prod.yml logs facilitator --tail 40
-            exit 1
diff --git a/docs/deployment.md b/docs/deployment.md
index 06276d0..969f5cb 100644
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -155,6 +155,14 @@ Image details:
 - **Size:** ~180 MB
 - **Health check:** Built-in (`wget` to `/health` every 30s)
 
+## Production deploys are manual by design
+
+There is no auto-deploy on merge. The VPS is reachable only over Tailscale and SSH is closed to the public internet. Re-enabling SSH-based deploys from GitHub Actions would require opening the firewall to GitHub's runner IP ranges, which is a worse security posture than `bash deploy.sh` from a tailnet-attached laptop. See [`operations.md` § Manual deploy procedure](operations.md#manual-deploy-procedure) for the canonical runbook.
+
+CI (`.github/workflows/ci.yml`) still runs on every push and PR — lint, typecheck, test, build, docker build, security audit. It only runs inside the GitHub-Actions runner and makes no outbound SSH connection.
+
+If auto-deploy ever becomes desirable again, the right approach is the [Tailscale GitHub Action](https://github.com/tailscale/github-action), which attaches the runner to your tailnet for the deploy duration without opening any public port. Deferred until there's actual need.
+
 ## Bare Metal Deployment
 
 If you prefer running without Docker:
diff --git a/docs/operations.md b/docs/operations.md
index 75e719c..9ab907f 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -15,6 +15,50 @@
 4. Start server: `pnpm dev`
 5. Verify: `curl http://localhost:3000/health`
 
+## Manual deploy procedure
+
+Production deploys run manually from a tailnet-attached laptop (the VPS is Tailscale-only, no public SSH). Use the canonical "phased deploy" pattern below for any change that touches `docker-compose.prod.yml` or `Dockerfile`:
+
+```bash
+# On the VPS, in /opt/cardano402
+git pull origin master
+
+# Phase 1 — preserve current image as a rollback tag
+docker tag cardano402:latest cardano402:rollback-$(date +%Y-%m-%d)
+
+# Phase 2 — build the new image (no production impact)
+docker compose -f docker-compose.prod.yml build --no-cache facilitator
+
+# Phase 3 — smoke-test on a side port (no production impact)
+docker run --rm -d --name cardano402-smoke -p 127.0.0.1:3001:3000 \
+  -v /opt/cardano402/config/config.json:/app/config/config.json:ro \
+  --network cardano402_default \
+  -e NODE_ENV=production -e MAINNET=true \
+  cardano402:latest
+sleep 8; curl -fsS http://127.0.0.1:3001/health; docker stop cardano402-smoke
+
+# Phase 4 — swap (~30s downtime, watch for healthy)
+docker compose -f docker-compose.prod.yml up -d facilitator
+for i in $(seq 1 30); do
+  [ "$(docker inspect cardano402 --format '{{.State.Health.Status}}')" = "healthy" ] && break
+  sleep 2
+done
+
+# Phase 5 — verify
+curl -s http://localhost:3000/health
+docker inspect cardano402 --format 'mem_limit: {{.HostConfig.Memory}}  restartCount: {{.RestartCount}}'
+docker logs --since 5m cardano402 2>&1 | grep -iE '"level":(50|40)' | head -5
+```
+
+**Rollback** if Phase 4 or 5 reveals a problem:
+
+```bash
+docker tag cardano402:rollback-<date> cardano402:latest
+docker compose -f docker-compose.prod.yml up -d facilitator
+```
+
+For routine deploys (no Dockerfile or compose change), `bash deploy.sh` runs the same pull + build + restart sequence in one shot — it skips the phased smoke-test gate, so use the phased procedure above whenever the change could affect container behavior.
+
 ## Production Deployment (Docker)
 
 ### 1. Create production config