211 changes: 211 additions & 0 deletions docs/backup-restore.md
@@ -0,0 +1,211 @@
# Backup & Restore Runbook

cardano402 stores four things on the VPS that aren't in git and aren't trivially reconstructible:

1. **`config/config.json`** — mainnet facilitator seed phrase + Blockfrost project ID. **Most valuable target.**
2. **`.env`** — `REDIS_PASSWORD` + `MAINNET` guardrail flag.
3. **Redis AOF volume (`cardano402_redis_data`)** — payment dedup keys, UTXO reservations. Rebuildable from chain state but accelerates recovery.
4. **`data/files/`** — uploaded payment-gated content (per-tenant; can grow over time).

This runbook covers nightly encrypted backups, restore verification, and disaster recovery.

## Architecture

- **Tool:** [restic](https://restic.net/) — client-side AES-256 encryption, content-addressed dedup, single binary, supports many backends through one config.
- **Backend:** swappable via env file. Default documentation uses Backblaze B2 (cheapest for tiny repos); the example file also shows Cloudflare R2, AWS S3, Hetzner Storage Box, and tailnet SFTP.
- **Schedule:** nightly at 03:00 UTC via `/etc/cron.d/cardano402-backup`.
- **Retention:** 14 daily, 8 weekly, and 12 monthly snapshots (at most 34 at steady state, fewer in practice because one snapshot can satisfy several tiers).
- **Integrity:** every run does a `restic check --read-data-subset=5%`.
- **Verification:** documented restore-to-temp procedure, recommended quarterly.

## One-time setup

### 1. Pick and configure a backend

Sign up for whichever backend you want (see `scripts/cardano402-restic.env.example` for B2 / R2 / S3 / Hetzner / SFTP). For B2 (the recommended default):

1. Create account at https://www.backblaze.com/sign-up/cloud-storage
2. Create a private bucket named `cardano402-backups`
3. Generate an Application Key scoped to that bucket with read+write access
4. Save the keyID and applicationKey — they're shown only once

### 2. Generate the encryption passphrase

This passphrase decrypts the entire repository. Without it, the backups are unreadable, including by you.

```bash
openssl rand -base64 36
```

**Store this passphrase in at least two durable, offline locations** before the first backup runs:

- Paper copy in a fireproof safe / safety deposit box
- Password manager you control (1Password, Bitwarden, etc.)
- Optional: a YubiKey-encrypted file
- Optional: secret-sharing across trusted parties (Shamir, with thresholds)

If you only have the passphrase on this VPS, a VPS failure means losing both the data and the backups of the data.
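As a small sanity check before writing the passphrase down, the sketch below generates it and confirms the expected shape: 36 random bytes encode to exactly 48 base64 characters with no padding. The `generate_passphrase` helper and the `/dev/urandom` fallback are illustrative assumptions, not part of the repo's scripts.

```bash
# Generate a passphrase from 36 random bytes (48 base64 characters).
generate_passphrase() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -base64 36
    else
        # Fallback, assuming coreutils `base64` and /dev/urandom are available.
        head -c 36 /dev/urandom | base64
    fi
}

passphrase="$(generate_passphrase)"

# Only print the passphrase if it has the expected length.
if [ "${#passphrase}" -eq 48 ]; then
    printf '%s\n' "$passphrase"
else
    echo "unexpected passphrase length: ${#passphrase}" >&2
fi
```

Run it in a terminal whose scrollback you can clear afterwards, since the passphrase is printed in plaintext.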

### 3. Install restic and the credentials file

On the VPS (via Tailscale SSH):

```bash
sudo apt update && sudo apt install -y restic

sudo mkdir -p /etc/cardano402
sudo cp /opt/cardano402/scripts/cardano402-restic.env.example /etc/cardano402/restic.env
sudo chown root:root /etc/cardano402/restic.env
sudo chmod 600 /etc/cardano402/restic.env

sudo nano /etc/cardano402/restic.env # fill in RESTIC_PASSWORD + backend creds
```
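Since this file holds the repository passphrase and backend credentials, it can be worth confirming the permissions actually took effect. The following is a minimal sketch; `env_file_is_private` and the `RESTIC_ENV_FILE` override are hypothetical names, and `stat -c` assumes GNU coreutils as shipped on Debian/Ubuntu.

```bash
# Return 0 only if the file exists and its mode is exactly 600.
# Assumes GNU stat (`-c '%a'` prints the octal permission bits).
env_file_is_private() {
    local file="$1"
    [ -f "$file" ] || return 1
    [ "$(stat -c '%a' "$file")" = "600" ]
}

# Hypothetical usage; RESTIC_ENV_FILE is an illustrative override.
env_file="${RESTIC_ENV_FILE:-/etc/cardano402/restic.env}"
if env_file_is_private "$env_file"; then
    echo "OK: $env_file is mode 600"
else
    echo "WARNING: $env_file is missing or not mode 600" >&2
fi
```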

### 4. Initialize the restic repository

This creates the encryption keys in the remote bucket. Only run once.

```bash
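sudo bash -c 'set -a; . /etc/cardano402/restic.env; set +a; restic init'
```

(`set -a` exports every variable the env file assigns, so restic sees them for this one command. The whole thing runs inside a root shell because the file is mode 600 and root-owned; a `$(grep ...)` substitution placed before `sudo` would run as your login user and fail to read it.)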
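For other one-off restic commands (for example `restic snapshots`), the same loading step can be wrapped in a small helper. This is only a sketch: `with_env_file` is a hypothetical function, not something the repo ships, and it has to be used from a root shell because the env file is readable by root only.

```bash
# Run a command with every variable from an env file exported into its
# environment. The subshell keeps the secrets out of the calling shell.
with_env_file() {
    local env_file="$1"
    shift
    (
        set -a    # export everything assigned while sourcing
        # shellcheck disable=SC1090
        . "$env_file"
        set +a
        "$@"
    )
}

# Hypothetical usage from a root shell (e.g. after `sudo -i`):
#   with_env_file /etc/cardano402/restic.env restic snapshots
```

Sourcing the file, rather than splitting it with `xargs`, also handles quoted values and `export VAR=...` lines.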

### 5. Run an initial backup

```bash
sudo bash /opt/cardano402/scripts/backup.sh
```

Watch the log:

```bash
sudo tail -f /var/log/cardano402-backup.log
```

Expected: completes in seconds for a fresh facilitator (~30KB total). Restic outputs the snapshot ID at the end (`snapshot abc12345 saved`).
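To pull the most recent snapshot ID out of the log without scrolling, something like the sketch below may help. It assumes restic's usual `snapshot <8 hex chars> saved` line, as in the example above; `last_snapshot_id` is a hypothetical helper name.

```bash
# Print the ID from the most recent "snapshot <id> saved" line in a log file.
# Prints nothing if no such line exists.
last_snapshot_id() {
    local log_file="$1"
    grep -oE 'snapshot [0-9a-f]{8} saved' "$log_file" | tail -n 1 | awk '{print $2}'
}

# Hypothetical usage (as root, since the log is written by root):
#   last_snapshot_id /var/log/cardano402-backup.log
```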

### 6. Verify the restore loop works

This is the most important post-setup step. **A backup you've never restored is not a backup.**

```bash
sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/restore-test
```

Inspect:

```bash
sudo ls -la /tmp/restore-test
sudo cat /tmp/restore-test/.../MANIFEST.txt
sudo diff /opt/cardano402/config/config.json /tmp/restore-test/.../sensitive/config.json
# Should be identical.

sudo rm -rf /tmp/restore-test
```
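Because each backup stages into a randomly named temp directory, the restored path differs from run to run. A possible way to script the comparison is to search for the file by its suffix and compare bytes, as sketched here. `find_restored` and `verify_restored` are hypothetical helpers, and the sketch assumes the restore reproduces the original staging path underneath the target directory.

```bash
# Find the first restored file whose path ends in the given suffix.
find_restored() {
    local restore_root="$1" suffix="$2"
    find "$restore_root" -type f -path "*/$suffix" -print -quit
}

# Succeed only if the live file and its restored copy are byte-identical.
verify_restored() {
    local live="$1" restore_root="$2" suffix="$3" restored
    restored="$(find_restored "$restore_root" "$suffix")"
    if [ -z "$restored" ]; then
        echo "not found in restore: $suffix" >&2
        return 1
    fi
    cmp -s "$live" "$restored"
}

# Hypothetical usage (as root):
#   verify_restored /opt/cardano402/config/config.json /tmp/restore-test sensitive/config.json \
#     && echo "config.json matches"
```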

### 7. Enable the cron job

```bash
sudo cp /opt/cardano402/scripts/cardano402-backup.cron /etc/cron.d/cardano402-backup
sudo chmod 644 /etc/cron.d/cardano402-backup
```

Confirm cron will pick it up:

```bash
sudo grep cardano402 /etc/cron.d/cardano402-backup
sudo systemctl status cron
```

The next scheduled run is 03:00 UTC.

## Routine operations

### Check recent runs

```bash
sudo tail -100 /var/log/cardano402-backup.log
```

Look for `=== cardano402 backup completed successfully ===` lines.
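The same check can be scripted. The sketch below reads the last status marker that `backup.sh` writes (the success line above, or its `FAILED (exit N)` counterpart) and reduces it to one word; `last_backup_status` is a hypothetical helper name.

```bash
# Print "ok", "failed", or "unknown" based on the last status marker in the log.
last_backup_status() {
    local log_file="$1" last
    last="$(grep -E '=== cardano402 backup (completed successfully|FAILED)' "$log_file" 2>/dev/null | tail -n 1)"
    case "$last" in
        *"completed successfully"*) echo ok ;;
        *FAILED*)                   echo failed ;;
        *)                          echo unknown ;;
    esac
}

# Hypothetical usage (as root):
#   last_backup_status /var/log/cardano402-backup.log
```

A run that is still in progress, or that was killed before the exit trap fired, leaves the previous run's marker as the last one, so treat `ok` as "the last finished run succeeded".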

### List snapshots

```bash
sudo bash /opt/cardano402/scripts/restore.sh list
```

### Run a backup on demand

```bash
sudo bash /opt/cardano402/scripts/backup.sh
```

### Run a periodic restore test (recommended quarterly)

```bash
sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/restore-q$(date +%Y%m%d)
# Inspect a few files, confirm MANIFEST.txt looks right, then:
sudo rm -rf /tmp/restore-q*
```

Calendar reminder: every 90 days. A backup chain that hasn't been tested in a year tends to silently break.

## Disaster recovery scenarios

### Scenario A — `config/config.json` got corrupted/deleted on the live VPS

```bash
sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/recover
sudo cp /tmp/recover/.../sensitive/config.json /opt/cardano402/config/config.json
sudo chown morganic:morganic /opt/cardano402/config/config.json
sudo chmod 644 /opt/cardano402/config/config.json
docker compose -f /opt/cardano402/docker-compose.prod.yml restart facilitator
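curl http://localhost:3000/health
# The restored tree holds the seed phrase in plaintext; remove it once the service is healthy.
sudo rm -rf /tmp/recover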
```

### Scenario B — VPS disk failure, new VPS available

1. Provision the new VPS, attach to Tailscale, install Docker.
2. Clone the repo: `sudo git clone https://github.com/MorganOnCode/cardano402 /opt/cardano402`
3. Install restic, then recreate `/etc/cardano402/restic.env` with the **same** repository URL, backend credentials, and passphrase as before (taken from your offline copies), owned by root with mode 600.
4. Restore: `sudo bash /opt/cardano402/scripts/restore.sh latest /tmp/recover`
5. Put files back:
```bash
sudo cp /tmp/recover/.../sensitive/config.json /opt/cardano402/config/config.json
sudo cp /tmp/recover/.../sensitive/dotenv /opt/cardano402/.env
sudo mkdir -p /opt/cardano402/data
sudo cp -a /tmp/recover/.../data-files /opt/cardano402/data/files
```
6. Restore Redis volume:
```bash
docker volume create cardano402_redis_data
docker run --rm \
-v cardano402_redis_data:/dest \
-v /tmp/recover/.../redis:/source:ro \
alpine sh -c "cp -a /source/. /dest/"
```
7. Start: `cd /opt/cardano402 && docker compose -f docker-compose.prod.yml up -d`
8. Verify: `curl http://localhost:3000/health` returns `healthy`.
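Containers can take a few seconds to come up, so a single `curl` right after `up -d` may fail spuriously. The sketch below polls instead. `probe_health` and `wait_for_health` are hypothetical helpers, and the only assumption about the response is that it contains the word `healthy`, as stated in the final step above.

```bash
# Fetch the health endpoint body; prints nothing if the request fails.
probe_health() {
    curl -fsS -m 5 "$1" 2>/dev/null || true
}

# Poll until the body contains the whole word "healthy" or attempts run out.
# `grep -w` keeps "unhealthy" from counting as a match.
wait_for_health() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i
    for ((i = 1; i <= attempts; i++)); do
        if probe_health "$url" | grep -qw 'healthy'; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
    done
    echo "not healthy after $attempts attempts" >&2
    return 1
}

# Hypothetical usage:
#   wait_for_health http://localhost:3000/health 30 2
```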

### Scenario C — VPS is gone AND the passphrase is gone

You're in trouble. The mainnet seed phrase is no longer recoverable from any encrypted backup — only from whatever offline seed-phrase backup you kept at the moment you set up the facilitator (per Cardano custody best practice, the 24-word phrase should already be written down somewhere safe, completely independent of this backup chain). Re-import the seed into a fresh facilitator install and recover funds. Uploaded payment-gated content is lost.

This is the scenario the offline passphrase storage in step 2 of setup is designed to prevent.

## Failure modes & alerts

The backup script writes to `/var/log/cardano402-backup.log`; cron's own start and exit records go to syslog (`journalctl -t CRON`). To get notified of failures, the simplest option is a Sentry cron monitor: have the cron entry call `curl --data-raw "..." https://<sentry-cron-monitor-url>` before and after the backup. This can wait until failures actually recur, since `restic check` failures are rare and `restic backup` failures are usually self-explanatory in the log.

If you want to get fancier: `cronitor.io`, `healthchecks.io`, or a dead-man's-switch via Pingdom.
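A dead-man's-switch wrapper could look roughly like the sketch below. Everything service-specific is an assumption: `PING_URL` is a placeholder, and the `/start` and `/fail` suffixes follow the healthchecks.io convention, so other services will need different URLs. `ping_monitor` and `run_monitored` are hypothetical helper names.

```bash
# Base URL of the monitor's ping endpoint (hypothetical placeholder).
PING_URL="${PING_URL:-https://hc.example.invalid/ping/your-check-id}"

# Best-effort ping: a monitoring outage must never fail the backup itself.
ping_monitor() {
    curl -fsS -m 10 --retry 3 -o /dev/null "$1" || true
}

# Run a command, signalling start, then success or failure, to the monitor.
# Returns the wrapped command's exit status.
run_monitored() {
    local rc=0
    ping_monitor "$PING_URL/start"
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        ping_monitor "$PING_URL"
    else
        ping_monitor "$PING_URL/fail"
    fi
    return "$rc"
}

# Hypothetical usage from a wrapper script called by cron:
#   run_monitored bash /opt/cardano402/scripts/backup.sh
```

The value of the start ping is that the monitor can also alert when a run begins but never reports back, which a failure-only notification would miss.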

## What's NOT backed up (by design)

- **Docker images** — rebuildable from `docker compose build` against the same git commit
- **`node_modules`** — pnpm install reproduces this from `pnpm-lock.yaml` in git
- **Application logs** — handled by Docker's log rotation (json-file driver, max-size 50m, max-file 5)
- **The git working tree** — already redundantly stored on GitHub

If the VPS dies and the backups die *and* the GitHub repo dies, that's a three-way disaster the runbook doesn't cover.
4 changes: 4 additions & 0 deletions docs/operations.md
@@ -15,6 +15,10 @@
4. Start server: `pnpm dev`
5. Verify: `curl http://localhost:3000/health`

## Backups

See [`backup-restore.md`](backup-restore.md) for the encrypted off-host backup runbook (restic, nightly cron, retention policy, restore procedure, disaster recovery scenarios).

## Manual deploy procedure

Production deploys run manually from a tailnet-attached laptop (the VPS is Tailscale-only, no public SSH). The canonical "phased deploy" pattern used for any change that touches `docker-compose.prod.yml` or `Dockerfile`:
133 changes: 133 additions & 0 deletions scripts/backup.sh
@@ -0,0 +1,133 @@
#!/usr/bin/env bash
# cardano402 backup — nightly snapshot of sensitive config, Redis AOF,
# and uploaded payment-gated files. Encrypted off-host via restic.
#
# Usage: bash scripts/backup.sh
# Cron: see scripts/cardano402-backup.cron
#
# Credentials and backend choice live in /etc/cardano402/restic.env
# (mode 0600, root-owned). See scripts/cardano402-restic.env.example.

set -euo pipefail

REPO_ROOT="${CARDANO402_REPO_ROOT:-/opt/cardano402}"
ENV_FILE="${CARDANO402_RESTIC_ENV:-/etc/cardano402/restic.env}"
LOG_FILE="${CARDANO402_BACKUP_LOG:-/var/log/cardano402-backup.log}"
LOCK_FILE="${CARDANO402_BACKUP_LOCK:-/var/run/cardano402-backup.lock}"

log() {
printf '[%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" \
| tee -a "$LOG_FILE"
}

STAGE_DIR=""
LOCK_HELD=0
cleanup() {
  local rc=$?
  if [ -n "$STAGE_DIR" ] && [ -d "$STAGE_DIR" ]; then
    rm -rf "$STAGE_DIR"
  fi
  # Only remove the lock if this process created it. Otherwise a run that
  # aborts because another backup is active would delete that run's lock.
  if [ "$LOCK_HELD" -eq 1 ]; then
    rm -f "$LOCK_FILE"
  fi
  if [ "$rc" -ne 0 ]; then
    log "=== cardano402 backup FAILED (exit $rc) ==="
  fi
  exit "$rc"
}
trap cleanup EXIT

# Prevent overlapping runs.
if [ -f "$LOCK_FILE" ]; then
  existing_pid=$(cat "$LOCK_FILE" 2>/dev/null || echo "")
  if [ -n "$existing_pid" ] && kill -0 "$existing_pid" 2>/dev/null; then
    log "FATAL: backup already running (pid $existing_pid). Aborting."
    exit 1
  fi
  log "Stale lock at $LOCK_FILE (no live pid), reclaiming"
fi
echo $$ > "$LOCK_FILE"
LOCK_HELD=1

log "=== cardano402 backup starting ==="

if [ ! -r "$ENV_FILE" ]; then
log "FATAL: $ENV_FILE missing or unreadable."
log " Copy scripts/cardano402-restic.env.example to $ENV_FILE, fill it in,"
log " chown root:root, chmod 600."
exit 1
fi
# shellcheck disable=SC1090
. "$ENV_FILE"
: "${RESTIC_REPOSITORY:?RESTIC_REPOSITORY not set in $ENV_FILE}"
: "${RESTIC_PASSWORD:?RESTIC_PASSWORD not set in $ENV_FILE}"
export RESTIC_REPOSITORY RESTIC_PASSWORD
# Pass-through any backend env vars that the env file may have exported.
[ -n "${B2_ACCOUNT_ID:-}" ] && export B2_ACCOUNT_ID B2_ACCOUNT_KEY
[ -n "${AWS_ACCESS_KEY_ID:-}" ] && export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY

if ! command -v restic >/dev/null 2>&1; then
log "FATAL: restic not installed. Install with: apt install restic"
exit 1
fi

STAGE_DIR=$(mktemp -d /tmp/cardano402-backup-XXXXXX)
chmod 700 "$STAGE_DIR"
log "Staging to $STAGE_DIR"

# 1. Sensitive config (the most valuable target — seed phrase + Blockfrost key).
mkdir -p "$STAGE_DIR/sensitive"
cp -p "$REPO_ROOT/config/config.json" "$STAGE_DIR/sensitive/config.json"
cp -p "$REPO_ROOT/.env" "$STAGE_DIR/sensitive/dotenv"
log "Staged: sensitive config ($(du -sh "$STAGE_DIR/sensitive" | cut -f1))"

# 2. Redis AOF volume. AOF is append-only; copying the on-disk state while
# redis is running yields a valid replica that may be slightly behind
# the in-memory state. Restic deduplicates so growing AOFs are cheap.
log "Snapshotting Redis volume"
docker run --rm \
-v cardano402_redis_data:/source:ro \
-v "$STAGE_DIR/redis":/dest \
alpine sh -c "cp -a /source/. /dest/" \
>> "$LOG_FILE" 2>&1
log "Staged: redis ($(du -sh "$STAGE_DIR/redis" 2>/dev/null | cut -f1))"

# 3. Uploaded payment-gated files (the storage backend's filesystem root).
if [ -d "$REPO_ROOT/data/files" ]; then
cp -a "$REPO_ROOT/data/files" "$STAGE_DIR/data-files"
log "Staged: data-files ($(du -sh "$STAGE_DIR/data-files" | cut -f1))"
else
log "Skipped: $REPO_ROOT/data/files does not exist yet"
fi

# 4. Manifest with metadata for forensic traceability.
container_image=$(docker inspect cardano402 --format '{{.Image}}' 2>/dev/null || echo "container-not-running")
git_sha=$(git -C "$REPO_ROOT" rev-parse HEAD 2>/dev/null || echo "unknown")
cat > "$STAGE_DIR/MANIFEST.txt" <<EOF
cardano402 backup manifest
host: $(hostname)
date_utc: $(date -u +%Y-%m-%dT%H:%M:%SZ)
git_sha: $git_sha
container_image: $container_image
EOF

# 5. Run restic backup. Tags make `restic forget --tag automated` safe.
log "Running restic backup"
restic backup \
--tag automated \
--tag cardano402 \
--host "$(hostname)" \
"$STAGE_DIR" \
| tee -a "$LOG_FILE"

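# 6. Prune old snapshots per retention policy. Each run stages to a fresh
#    mktemp path, and restic groups snapshots by host AND paths by default,
#    which would put every snapshot in its own group and never expire any.
#    Group by host only so the keep-* rules see a single series.
log "Pruning old snapshots (keep 14d / 8w / 12m)"
restic forget \
  --tag automated \
  --group-by host \
  --keep-daily 14 \
  --keep-weekly 8 \
  --keep-monthly 12 \
  --prune \
  | tee -a "$LOG_FILE"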

# 7. Cheap integrity check on a 5% sample of pack files.
log "Verifying repo integrity (5% sample)"
restic check --read-data-subset=5% | tee -a "$LOG_FILE"

log "=== cardano402 backup completed successfully ==="
16 changes: 16 additions & 0 deletions scripts/cardano402-backup.cron
@@ -0,0 +1,16 @@
# cardano402 nightly backup
#
# Install:
# sudo cp /opt/cardano402/scripts/cardano402-backup.cron /etc/cron.d/cardano402-backup
# sudo chmod 644 /etc/cron.d/cardano402-backup
#
# Logs:
# /var/log/cardano402-backup.log (the script writes here directly)
# journalctl -t CRON (cron's own logs)

SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAILTO=""

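# Nightly at 03:00 UTC (cron uses the system time zone, so this assumes the
# VPS clock is set to UTC). Adjust if your VPS has a busier window then.
# backup.sh already tees its own output into the log, so stdout is discarded
# here to avoid duplicate lines; only stray stderr is appended.
0 3 * * * root /bin/bash /opt/cardano402/scripts/backup.sh >/dev/null 2>>/var/log/cardano402-backup.log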