From baceff5bc5a439d980e76f6e9631a3d4edb4b122 Mon Sep 17 00:00:00 2001
From: MorganOnCode <87934408+MorganOnCode@users.noreply.github.com>
Date: Fri, 15 May 2026 09:02:09 +0000
Subject: [PATCH] feat(backup): nightly encrypted off-host backups via restic

Closes audit #3 (no backups). Biggest remaining production risk after the resource-limits deploy: a VPS disk failure today loses the mainnet seed phrase from config/config.json with no recovery path.

Adds an opinionated, backend-agnostic backup chain:

- `scripts/backup.sh` -- nightly snapshot of:
    - config/config.json (the seed phrase + Blockfrost key)
    - .env (REDIS_PASSWORD, MAINNET flag)
    - Redis AOF volume (dedup keys, UTXO reservations)
    - data/files/ (uploaded payment-gated content)
  Stages to /tmp, runs `restic backup` to an encrypted remote repo, prunes per retention policy (14 daily / 8 weekly / 12 monthly), verifies a 5% pack-file sample. Lock file prevents overlapping runs.

- `scripts/restore.sh` -- restores any snapshot to a target dir (defaults to /tmp). Intentionally NEVER restores over the live tree; recovery is a deliberate operator copy from the restored staging area.

- `scripts/cardano402-backup.cron` -- cron template, 03:00 UTC nightly.

- `scripts/cardano402-restic.env.example` -- credentials template with inline blocks for Backblaze B2 (default), Cloudflare R2, AWS S3, Hetzner Storage Box, and tailnet SFTP. The env file lives at /etc/cardano402/restic.env (mode 0600, root-owned), outside the repo.

- `docs/backup-restore.md` -- operator runbook: one-time setup, restore verification, routine ops, three disaster-recovery scenarios, what's intentionally NOT backed up.

- `docs/operations.md` -- new pointer to backup-restore.md.

Restic gives us: client-side AES-256 encryption, content-addressed dedup (the growing AOF costs almost nothing per night), one binary, many backend choices via one config. The encryption passphrase MUST be stored offline; the runbook is explicit about that.

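
For reviewers, the backup, prune, and verify sequence described above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/backup.sh`: the function name `run_backup_chain` and the `RESTIC_BIN` override are hypothetical, and the exact flag combination is an assumption inferred from the stated retention policy, the 5% sample, and the `automated` tag that `scripts/restore.sh` filters on.

```bash
#!/usr/bin/env bash
# Sketch only: backup -> prune -> verify for a staged directory.
# RESTIC_BIN and run_backup_chain are hypothetical names used for illustration.
set -euo pipefail

# Allow the restic binary to be swapped out (e.g. with `echo` for a dry run).
RESTIC_BIN="${RESTIC_BIN:-restic}"

run_backup_chain() {
  local stage_dir="$1"

  # Snapshot the staged tree; the tag lets `restic snapshots --tag automated` find it.
  "$RESTIC_BIN" backup "$stage_dir" --tag automated

  # Apply retention (14 daily / 8 weekly / 12 monthly) and drop unreferenced data.
  "$RESTIC_BIN" forget --tag automated \
    --keep-daily 14 --keep-weekly 8 --keep-monthly 12 --prune

  # Read back a 5% sample of pack files to catch silent corruption.
  "$RESTIC_BIN" check --read-data-subset=5%
}
```

Sourcing the sketch and running `RESTIC_BIN=echo; run_backup_chain /tmp/stage` prints the three restic command lines without touching a repository, which is a cheap way to review the flags before the first real run.
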
No production state changes from this PR -- it just adds files. The backup loop activates only after the operator installs restic, fills in the env file, runs `restic init`, and copies the cron template into place.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/backup-restore.md                | 211 ++++++++++++++++++++++++++
 docs/operations.md                    |   4 +
 scripts/backup.sh                     | 133 ++++++++++++++++
 scripts/cardano402-backup.cron        |  16 ++
 scripts/cardano402-restic.env.example |  52 +++++++
 scripts/restore.sh                    |  55 +++++++
 6 files changed, 471 insertions(+)
 create mode 100644 docs/backup-restore.md
 create mode 100755 scripts/backup.sh
 create mode 100644 scripts/cardano402-backup.cron
 create mode 100644 scripts/cardano402-restic.env.example
 create mode 100755 scripts/restore.sh

diff --git a/docs/backup-restore.md b/docs/backup-restore.md
new file mode 100644
index 0000000..3494a7e
--- /dev/null
+++ b/docs/backup-restore.md
@@ -0,0 +1,211 @@
+# Backup & Restore Runbook
+
+cardano402 stores four things on the VPS that aren't in git and aren't trivially reconstructible:
+
+1. **`config/config.json`** — mainnet facilitator seed phrase + Blockfrost project ID. **Most valuable target.**
+2. **`.env`** — `REDIS_PASSWORD` + `MAINNET` guardrail flag.
+3. **Redis AOF volume (`cardano402_redis_data`)** — payment dedup keys, UTXO reservations. Rebuildable from chain state but accelerates recovery.
+4. **`data/files/`** — uploaded payment-gated content (per-tenant; can grow over time).
+
+This runbook covers nightly encrypted backups, restore verification, and disaster recovery.
+
+## Architecture
+
+- **Tool:** [restic](https://restic.net/) — client-side AES-256 encryption, content-addressed dedup, single binary, supports many backends through one config.
+- **Backend:** swappable via env file. Default documentation uses Backblaze B2 (cheapest for tiny repos); the example file also shows Cloudflare R2, AWS S3, Hetzner Storage Box, and tailnet SFTP.
+- **Schedule:** nightly at 03:00 UTC via `/etc/cron.d/cardano402-backup`.
+- **Retention:** 14 daily snapshots, 8 weekly, 12 monthly (at most 34 snapshots at steady state; fewer where the windows overlap).
+- **Integrity:** every run does a `restic check --read-data-subset=5%`.
+- **Verification:** documented restore-to-temp procedure, recommended quarterly.
+
+## One-time setup
+
+### 1. Pick and configure a backend
+
+Sign up for whichever backend you want (see `scripts/cardano402-restic.env.example` for B2 / R2 / S3 / Hetzner / SFTP). For B2 (the recommended default):
+
+1. Create account at https://www.backblaze.com/sign-up/cloud-storage
+2. Create a private bucket named `cardano402-backups`
+3. Generate an Application Key scoped to that bucket with read+write access
+4. Save the keyID and applicationKey — they're shown only once
+
+### 2. Generate the encryption passphrase
+
+This passphrase decrypts the entire repository. Without it, the backups are unreadable, including by you.
+
+```bash
+openssl rand -base64 36
+```
+
+**Store this passphrase in at least two durable, offline locations** before the first backup runs:
+
+- Paper copy in a fireproof safe / safety deposit box
+- Password manager you control (1Password, Bitwarden, etc.)
+- Optional: a YubiKey-encrypted file
+- Optional: secret-sharing across trusted parties (Shamir, with thresholds)
+
+If you only have the passphrase on this VPS, a VPS failure means losing both the data and the backups of the data.
+
+### 3. Install restic and the credentials file
+
+On the VPS (via Tailscale SSH):
+
+```bash
+sudo apt update && sudo apt install -y restic
+
+sudo mkdir -p /etc/cardano402
+sudo cp /opt/cardano402/scripts/cardano402-restic.env.example /etc/cardano402/restic.env
+sudo chown root:root /etc/cardano402/restic.env
+sudo chmod 600 /etc/cardano402/restic.env
+
+sudo nano /etc/cardano402/restic.env # fill in RESTIC_PASSWORD + backend creds
+```
+
+### 4. Initialize the restic repository
+
+This creates the encryption keys in the remote bucket. Only run once.
+
+```bash
+sudo -E env $(grep -v '^#' /etc/cardano402/restic.env | xargs) restic init
+```
+
+(The `sudo -E env ...` dance loads the env file into restic's environment for this one command.)
+
+### 5. Run an initial backup
+
+```bash
+sudo bash /opt/cardano402/scripts/backup.sh
+```
+
+Watch the log:
+
+```bash
+sudo tail -f /var/log/cardano402-backup.log
+```
+
+Expected: completes in seconds for a fresh facilitator (~30KB total). Restic outputs the snapshot ID at the end (`snapshot abc12345 saved`).
+
+### 6. Verify the restore loop works
+
+This is the most important post-setup step. **A backup you've never restored is not a backup.**
+
+```bash
+sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/restore-test
+```
+
+Inspect:
+
+```bash
+sudo ls -la /tmp/restore-test
+sudo cat /tmp/restore-test/.../MANIFEST.txt
+sudo diff /opt/cardano402/config/config.json /tmp/restore-test/.../sensitive/config.json
+# Should be identical.
+
+sudo rm -rf /tmp/restore-test
+```
+
+### 7. Enable the cron job
+
+```bash
+sudo cp /opt/cardano402/scripts/cardano402-backup.cron /etc/cron.d/cardano402-backup
+sudo chmod 644 /etc/cron.d/cardano402-backup
+```
+
+Confirm cron will pick it up:
+
+```bash
+sudo grep cardano402 /etc/cron.d/cardano402-backup
+sudo systemctl status cron
+```
+
+The next scheduled run is 03:00 UTC.
+
+## Routine operations
+
+### Check recent runs
+
+```bash
+sudo tail -100 /var/log/cardano402-backup.log
+```
+
+Look for `=== cardano402 backup completed successfully ===` lines.
+
+### List snapshots
+
+```bash
+sudo bash /opt/cardano402/scripts/restore.sh list
+```
+
+### Run a backup on demand
+
+```bash
+sudo bash /opt/cardano402/scripts/backup.sh
+```
+
+### Run a periodic restore test (recommended quarterly)
+
+```bash
+sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/restore-q$(date +%Y%m%d)
+# Inspect a few files, confirm MANIFEST.txt looks right, then:
+sudo rm -rf /tmp/restore-q*
+```
+
+Calendar reminder: every 90 days. A backup chain that hasn't been tested in a year tends to silently break.
+
+## Disaster recovery scenarios
+
+### Scenario A — `config/config.json` got corrupted/deleted on the live VPS
+
+```bash
+sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/recover
+sudo cp /tmp/recover/.../sensitive/config.json /opt/cardano402/config/config.json
+sudo chown morganic:morganic /opt/cardano402/config/config.json
+sudo chmod 644 /opt/cardano402/config/config.json
+docker compose -f /opt/cardano402/docker-compose.prod.yml restart facilitator
+curl http://localhost:3000/health
+```
+
+### Scenario B — VPS disk failure, new VPS available
+
+1. Provision the new VPS, attach to Tailscale, install Docker.
+2. Clone the repo: `sudo git clone https://github.com/MorganOnCode/cardano402 /opt/cardano402`
+3. Install restic, copy the **same** `/etc/cardano402/restic.env` (you have an offline copy of the passphrase).
+4. Restore: `sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/recover`
+5. Put files back:
+   ```bash
+   sudo cp /tmp/recover/.../sensitive/config.json /opt/cardano402/config/config.json
+   sudo cp /tmp/recover/.../sensitive/dotenv /opt/cardano402/.env
+   sudo mkdir -p /opt/cardano402/data
+   sudo cp -a /tmp/recover/.../data-files /opt/cardano402/data/files
+   ```
+6. Restore Redis volume:
+   ```bash
+   docker volume create cardano402_redis_data
+   docker run --rm \
+     -v cardano402_redis_data:/dest \
+     -v /tmp/recover/.../redis:/source:ro \
+     alpine sh -c "cp -a /source/. /dest/"
+   ```
+7. Start: `cd /opt/cardano402 && docker compose -f docker-compose.prod.yml up -d`
+8. Verify: `curl http://localhost:3000/health` returns `healthy`.
+
+### Scenario C — VPS is gone AND the passphrase is gone
+
+You're in trouble. The mainnet seed phrase is no longer recoverable from any encrypted backup — only from whatever offline seed-phrase backup you kept at the moment you set up the facilitator (per Cardano custody best practice, the 24-word phrase should already be written down somewhere safe, completely independent of this backup chain). Re-import the seed into a fresh facilitator install and recover funds. Uploaded payment-gated content is lost.
+
+This is the scenario the offline passphrase storage in step 2 of setup is designed to prevent.
+
+## Failure modes & alerts
+
+The backup script writes to syslog and `/var/log/cardano402-backup.log`. To get notified of failures, the simplest option is a Sentry integration: wrap the cron entry in `curl --data-raw "..." https://` before and after. Defer until repeat failures actually happen — `restic check` failures are rare and `restic backup` failures are usually self-explanatory in the log.
+
+If you want to get fancier: `cronitor.io`, `healthchecks.io`, or a dead-man's-switch via Pingdom.
+
+## What's NOT backed up (by design)
+
+- **Docker images** — rebuildable from `docker compose build` against the same git commit
+- **`node_modules`** — pnpm install reproduces this from `pnpm-lock.yaml` in git
+- **Application logs** — handled by Docker's log rotation (json-file driver, max-size 50m, max-file 5)
+- **The git working tree** — already redundantly stored on GitHub
+
+If the VPS dies and the backups die *and* the GitHub repo dies, that's a three-way disaster the runbook doesn't cover.
diff --git a/docs/operations.md b/docs/operations.md
index 9ab907f..3ea43da 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -15,6 +15,10 @@
 4. Start server: `pnpm dev`
 5. Verify: `curl http://localhost:3000/health`
 
+## Backups
+
+See [`backup-restore.md`](backup-restore.md) for the encrypted off-host backup runbook (restic, nightly cron, retention policy, restore procedure, disaster recovery scenarios).
+
 ## Manual deploy procedure
 
 Production deploys run manually from a tailnet-attached laptop (the VPS is Tailscale-only, no public SSH). The canonical "phased deploy" pattern used for any change that touches `docker-compose.prod.yml` or `Dockerfile`:
diff --git a/scripts/backup.sh b/scripts/backup.sh
new file mode 100755
index 0000000..7180b5a
--- /dev/null
+++ b/scripts/backup.sh
@@ -0,0 +1,133 @@
+#!/usr/bin/env bash
+# cardano402 backup — nightly snapshot of sensitive config, Redis AOF,
+# and uploaded payment-gated files. Encrypted off-host via restic.
+#
+# Usage: bash scripts/backup.sh
+# Cron: see scripts/cardano402-backup.cron
+#
+# Credentials and backend choice live in /etc/cardano402/restic.env
+# (mode 0600, root-owned). See scripts/cardano402-restic.env.example.
+
+set -euo pipefail
+
+REPO_ROOT="${CARDANO402_REPO_ROOT:-/opt/cardano402}"
+ENV_FILE="${CARDANO402_RESTIC_ENV:-/etc/cardano402/restic.env}"
+LOG_FILE="${CARDANO402_BACKUP_LOG:-/var/log/cardano402-backup.log}"
+LOCK_FILE="${CARDANO402_BACKUP_LOCK:-/var/run/cardano402-backup.lock}"
+
+log() {
+  printf '[%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" \
+    | tee -a "$LOG_FILE"
+}
+
+STAGE_DIR=""
+cleanup() {
+  local rc=$?
+  if [ -n "$STAGE_DIR" ] && [ -d "$STAGE_DIR" ]; then
+    rm -rf "$STAGE_DIR"
+  fi
+  [ "$(cat "$LOCK_FILE" 2>/dev/null || true)" = "$$" ] && rm -f "$LOCK_FILE"  # only the lock owner removes it
+  if [ "$rc" -ne 0 ]; then
+    log "=== cardano402 backup FAILED (exit $rc) ==="
+  fi
+  exit "$rc"
+}
+trap cleanup EXIT
+
+# Prevent overlapping runs.
+if [ -f "$LOCK_FILE" ]; then
+  existing_pid=$(cat "$LOCK_FILE" 2>/dev/null || echo "")
+  if [ -n "$existing_pid" ] && kill -0 "$existing_pid" 2>/dev/null; then
+    log "FATAL: backup already running (pid $existing_pid). Aborting."
+    exit 1
+  fi
+  log "Stale lock at $LOCK_FILE (no live pid), reclaiming"
+fi
+echo $$ > "$LOCK_FILE"
+
+log "=== cardano402 backup starting ==="
+
+if [ ! -r "$ENV_FILE" ]; then
+  log "FATAL: $ENV_FILE missing or unreadable."
+  log " Copy scripts/cardano402-restic.env.example to $ENV_FILE, fill it in,"
+  log " chown root:root, chmod 600."
+  exit 1
+fi
+# shellcheck disable=SC1090
+. "$ENV_FILE"
+: "${RESTIC_REPOSITORY:?RESTIC_REPOSITORY not set in $ENV_FILE}"
+: "${RESTIC_PASSWORD:?RESTIC_PASSWORD not set in $ENV_FILE}"
+export RESTIC_REPOSITORY RESTIC_PASSWORD
+# Pass-through any backend env vars that the env file may have exported.
+[ -n "${B2_ACCOUNT_ID:-}" ] && export B2_ACCOUNT_ID B2_ACCOUNT_KEY
+[ -n "${AWS_ACCESS_KEY_ID:-}" ] && export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
+
+if ! command -v restic >/dev/null 2>&1; then
+  log "FATAL: restic not installed. Install with: apt install restic"
+  exit 1
+fi
+
+STAGE_DIR=$(mktemp -d /tmp/cardano402-backup-XXXXXX)
+chmod 700 "$STAGE_DIR"
+log "Staging to $STAGE_DIR"
+
+# 1. Sensitive config (the most valuable target — seed phrase + Blockfrost key).
+mkdir -p "$STAGE_DIR/sensitive"
+cp -p "$REPO_ROOT/config/config.json" "$STAGE_DIR/sensitive/config.json"
+cp -p "$REPO_ROOT/.env" "$STAGE_DIR/sensitive/dotenv"
+log "Staged: sensitive config ($(du -sh "$STAGE_DIR/sensitive" | cut -f1))"
+
+# 2. Redis AOF volume. AOF is append-only; copying the on-disk state while
+# redis is running yields a valid replica that may be slightly behind
+# the in-memory state. Restic deduplicates so growing AOFs are cheap.
+log "Snapshotting Redis volume"
+docker run --rm \
+  -v cardano402_redis_data:/source:ro \
+  -v "$STAGE_DIR/redis":/dest \
+  alpine sh -c "cp -a /source/. /dest/" \
+  >> "$LOG_FILE" 2>&1
+log "Staged: redis ($(du -sh "$STAGE_DIR/redis" 2>/dev/null | cut -f1))"
+
+# 3. Uploaded payment-gated files (the storage backend's filesystem root).
+if [ -d "$REPO_ROOT/data/files" ]; then
+  cp -a "$REPO_ROOT/data/files" "$STAGE_DIR/data-files"
+  log "Staged: data-files ($(du -sh "$STAGE_DIR/data-files" | cut -f1))"
+else
+  log "Skipped: $REPO_ROOT/data/files does not exist yet"
+fi
+
+# 4. Manifest with metadata for forensic traceability.
+container_image=$(docker inspect cardano402 --format '{{.Image}}' 2>/dev/null || echo "container-not-running")
+git_sha=$(git -C "$REPO_ROOT" rev-parse HEAD 2>/dev/null || echo "unknown")
+cat > "$STAGE_DIR/MANIFEST.txt" <> /var/log/cardano402-backup.log 2>&1
diff --git a/scripts/cardano402-restic.env.example b/scripts/cardano402-restic.env.example
new file mode 100644
index 0000000..c3a9241
--- /dev/null
+++ b/scripts/cardano402-restic.env.example
@@ -0,0 +1,52 @@
+# /etc/cardano402/restic.env — sourced by scripts/backup.sh and scripts/restore.sh
+#
+# Install:
+#   sudo mkdir -p /etc/cardano402
+#   sudo cp /opt/cardano402/scripts/cardano402-restic.env.example /etc/cardano402/restic.env
+#   sudo chmod 600 /etc/cardano402/restic.env
+#   sudo chown root:root /etc/cardano402/restic.env
+#   sudo nano /etc/cardano402/restic.env # fill in credentials
+#
+# This file contains the encryption passphrase + backend credentials.
+# Lose it -> lose access to the backups. Keep an offline copy.
+
+# =====================================================================
+# Encryption passphrase. WITHOUT THIS, BACKUPS ARE UNREADABLE.
+# Generate with: openssl rand -base64 36
+# Store an offline copy: paper, password manager, YubiKey.
+# =====================================================================
+RESTIC_PASSWORD=""
+
+# =====================================================================
+# Backend selection -- uncomment ONE of the blocks below.
+# =====================================================================
+
+# --- Backblaze B2 (recommended default; cheapest for small repos) ---
+# Sign up: https://www.backblaze.com/sign-up/cloud-storage
+# 1. Create a private bucket, e.g. cardano402-backups
+# 2. App Keys -> Add a New Application Key
+#    - Allow access to: that bucket only
+#    - Type of Access: Read and Write
+#    - Save the keyID and applicationKey -- they're shown ONCE
+RESTIC_REPOSITORY="b2:cardano402-backups:/"
+B2_ACCOUNT_ID=""
+B2_ACCOUNT_KEY=""
+
+# --- Cloudflare R2 (no egress fees -- good for frequent restores) ---
+# RESTIC_REPOSITORY="s3:https://.r2.cloudflarestorage.com/cardano402-backups"
+# AWS_ACCESS_KEY_ID=""
+# AWS_SECRET_ACCESS_KEY=""
+
+# --- AWS S3 (most familiar; pricier on egress) ---
+# RESTIC_REPOSITORY="s3:s3.amazonaws.com/cardano402-backups"
+# AWS_ACCESS_KEY_ID=""
+# AWS_SECRET_ACCESS_KEY=""
+
+# --- Hetzner Storage Box (cheap flat-rate; SFTP) ---
+# 1. Order a Storage Box: https://www.hetzner.com/storage/storage-box
+# 2. Add your SSH key in the Storage Box admin
+# RESTIC_REPOSITORY="sftp:u123456@u123456.your-storagebox.de:/cardano402"
+
+# --- SFTP to another machine on your tailnet (zero third-party deps) ---
+# Set up: ssh-copy-id backup@nas.tailnet.ts.net (from this VPS)
+# RESTIC_REPOSITORY="sftp:backup@nas.tailnet.ts.net:/path/to/backups/cardano402"
diff --git a/scripts/restore.sh b/scripts/restore.sh
new file mode 100755
index 0000000..3770ccb
--- /dev/null
+++ b/scripts/restore.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+# cardano402 restore — restore a backup snapshot to a target directory.
+#
+# Usage:
+#   bash scripts/restore.sh # restore latest to /tmp
+#   bash scripts/restore.sh latest /tmp/restore
+#   bash scripts/restore.sh
+#   bash scripts/restore.sh list # list available snapshots
+#
+# This script intentionally does NOT restore over the live /opt/cardano402
+# tree. Restores always go to a target dir for inspection. Production
+# recovery is a manual operator step that copies specific files back.
+
+set -euo pipefail
+
+ACTION="${1:-latest}"
+TARGET="${2:-/tmp/cardano402-restore-$(date +%s)}"
+ENV_FILE="${CARDANO402_RESTIC_ENV:-/etc/cardano402/restic.env}"
+
+if [ ! -r "$ENV_FILE" ]; then
+  echo "FATAL: $ENV_FILE missing or unreadable" >&2
+  exit 1
+fi
+# shellcheck disable=SC1090
+. "$ENV_FILE"
+: "${RESTIC_REPOSITORY:?RESTIC_REPOSITORY not set in $ENV_FILE}"
+: "${RESTIC_PASSWORD:?RESTIC_PASSWORD not set in $ENV_FILE}"
+export RESTIC_REPOSITORY RESTIC_PASSWORD
+[ -n "${B2_ACCOUNT_ID:-}" ] && export B2_ACCOUNT_ID B2_ACCOUNT_KEY
+[ -n "${AWS_ACCESS_KEY_ID:-}" ] && export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
+
+if ! command -v restic >/dev/null 2>&1; then
+  echo "FATAL: restic not installed. apt install restic" >&2
+  exit 1
+fi
+
+if [ "$ACTION" = "list" ]; then
+  restic snapshots --tag automated --compact
+  exit 0
+fi
+
+mkdir -p "$TARGET"
+
+echo "Restoring snapshot '$ACTION' to $TARGET ..."
+restic restore "$ACTION" --target "$TARGET"
+
+echo
+echo "Restore complete."
+echo "Target: $TARGET"
+echo
+echo "Inventory (top-level):"
+find "$TARGET" -maxdepth 4 -type f | sort | head -40 || true  # head may close the pipe early; don't abort under pipefail
+echo
+echo "MANIFEST.txt contents (if present):"
+find "$TARGET" -name MANIFEST.txt -exec cat {} \; 2>/dev/null