feat(backup): nightly encrypted off-host backups via restic #41

Merged
MorganOnCode merged 1 commit into `master` from `feat/backup-infrastructure` on May 15, 2026

@MorganOnCode
Owner

Closes audit #3 (no backups). Biggest remaining production risk after the resource-limits deploy: a VPS disk failure today loses the mainnet seed phrase from `config/config.json` with no recovery path.

Architecture

  • Tool: restic — client-side AES-256 encryption, content-addressed dedup, single binary
  • Backend: swappable via env file. Default docs use Backblaze B2 (cheapest); template also covers R2, S3, Hetzner Storage Box, tailnet SFTP
  • Schedule: nightly 03:00 UTC via cron
  • Retention: 14 daily / 8 weekly / 12 monthly (~24 snapshots steady-state)
  • Integrity: `restic check --read-data-subset=5%` per run
  • Encryption passphrase: lives at `/etc/cardano402/restic.env` (mode 0600, root-owned). Runbook is explicit that it MUST be stored offline (paper / password-manager / YubiKey) — losing it loses the backups
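
The retention and integrity settings above map onto restic's `forget` and `check` subcommands. The following is a minimal sketch, not the contents of `scripts/backup.sh`: it assumes the repository and passphrase are already exported from the env file, and the `run` helper and `DRY_RUN` switch are hypothetical names added so the commands can be inspected without touching a repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the command unless DRY_RUN=0 is set explicitly,
# so the sketch is safe to run on a machine without restic or a repository.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

prune_and_verify() {
  # Retention from the PR: 14 daily / 8 weekly / 12 monthly, then prune
  # the data that no remaining snapshot references.
  run restic forget --keep-daily 14 --keep-weekly 8 --keep-monthly 12 --prune

  # Integrity: read back and verify a 5% sample of the pack files.
  run restic check --read-data-subset=5%
}

prune_and_verify
```

Running it with `DRY_RUN=0` would execute the two restic commands for real; the default only prints them.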

What's backed up

| Target | Why |
| --- | --- |
| `config/config.json` | Mainnet seed phrase + Blockfrost project ID; most valuable |
| `.env` | `REDIS_PASSWORD`, `MAINNET` guardrail flag |
| Redis AOF volume (`cardano402_redis_data`) | Dedup keys, UTXO reservations; rebuildable but accelerates recovery |
| `data/files/` | Uploaded payment-gated content; grows with usage |
`data/files/` Uploaded payment-gated content — grows with usage

Not backed up (by design): Docker images, `node_modules`, application logs, git working tree.

Files

  • `scripts/backup.sh` — main backup script. Stages to /tmp, runs restic backup, prunes, verifies. Lock file prevents overlapping runs.
  • `scripts/restore.sh` — restore to a target dir (defaults to /tmp/). Intentionally never restores over the live tree.
  • `scripts/cardano402-backup.cron` — cron template (03:00 UTC nightly)
  • `scripts/cardano402-restic.env.example` — credentials template with backend blocks
  • `docs/backup-restore.md` — full operator runbook (setup, restore verification, three DR scenarios, what's NOT backed up)
  • `docs/operations.md` — new "Backups" pointer section
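
To illustrate how a lock can prevent overlapping runs, here is a minimal sketch. It swaps in an atomic `mkdir` as the mutex rather than reproducing whatever mechanism `scripts/backup.sh` actually uses, and the `LOCK_DIR` path and function names are assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lock location; the real script may use a different path.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/cardano402-backup.lock}"

# mkdir is atomic: it succeeds for exactly one caller and fails if the
# directory already exists, so it works as a simple mutex.
acquire_lock() {
  mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
  rmdir "$LOCK_DIR" 2>/dev/null || true
}

if acquire_lock; then
  # Release the lock on any exit path, including failures under set -e.
  trap release_lock EXIT
  echo "lock acquired, backup would run here"
else
  echo "another backup run holds $LOCK_DIR, exiting" >&2
  exit 1
fi
```

One trade-off of this approach: a run killed with `SIGKILL` leaves the directory behind, so the next run refuses to start until an operator removes it. A `flock`-based lock avoids that at the cost of depending on util-linux.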

What happens on merge

Nothing in production changes. This PR only adds files to the repo. The backup loop activates only after a manual operator setup on the VPS:

  1. `apt install restic`
  2. `cp scripts/cardano402-restic.env.example /etc/cardano402/restic.env`, fill in passphrase + backend credentials, chmod 600
  3. `restic init`
  4. `bash scripts/backup.sh` (first backup)
  5. `bash scripts/restore.sh latest /tmp/test` (verify the loop works)
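  6. `cp scripts/cardano402-backup.cron /etc/cron.d/cardano402-backup` (on Debian and Ubuntu, cron skips files in `/etc/cron.d` whose names contain a dot, so the `.cron` suffix is dropped on install)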

Full step-by-step in `docs/backup-restore.md` § "One-time setup".
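
For orientation, a filled-in env file for the default Backblaze B2 backend might look roughly like the sketch below. The variable names are the ones restic reads from the environment; the bucket name, repository path, and the use of `export` (rather than bare `KEY=value` lines) are assumptions, and the shipped template may be laid out differently.

```shell
# /etc/cardano402/restic.env (mode 0600, root-owned)
# All values below are placeholders, not real credentials.

# Backblaze B2 backend: b2:<bucket>:<path inside bucket>
export RESTIC_REPOSITORY="b2:example-bucket:cardano402"
export B2_ACCOUNT_ID="replace-with-b2-key-id"
export B2_ACCOUNT_KEY="replace-with-b2-application-key"

# Repository passphrase. Keep an offline copy: losing it loses the backups.
export RESTIC_PASSWORD="replace-with-a-long-random-passphrase"
```

With the file sourced, `restic init` and `restic snapshots` pick up the repository and credentials from the environment, so no flags are needed on the command line.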

Why these design choices

  • Backend-agnostic via restic — one tool, swap backends by changing env file. Avoids lock-in to any particular cloud
  • Stages to /tmp before backup — gets a consistent point-in-time view of all four targets, lets restic dedupe across them, and avoids long-held locks on Redis or open file handles on data/
  • AOF live-copy (no BGSAVE) — append-only files yield a valid replica even copied live; restic dedupes the growing tail efficiently. Avoids the extra coordination of a BGSAVE+wait
  • Restore never overlays the live tree — recovery is an explicit operator copy from /tmp/restored to /opt/cardano402. Prevents an accidental `bash scripts/restore.sh` from wiping live state
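
The "never overlays the live tree" rule can be enforced with a guard before any restore runs. The sketch below is one way to do that, not the logic in `scripts/restore.sh`: the function name `is_safe_restore_target` is hypothetical, and the check is purely lexical, so it assumes an absolute target path without symlinks or `..` components.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Live deployment path as named in the PR text; overridable for testing.
LIVE_TREE="${LIVE_TREE:-/opt/cardano402}"

# Returns 0 when the target lies outside the live tree, 1 when it is the
# live tree itself or any directory beneath it.
is_safe_restore_target() {
  local target="${1%/}"          # drop one trailing slash, if present
  case "$target/" in
    "$LIVE_TREE"/*) return 1 ;;  # the live tree or a path inside it
    *) return 0 ;;
  esac
}

for candidate in /tmp/restored /opt/cardano402 /opt/cardano402/config; do
  if is_safe_restore_target "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "refused: $candidate"
  fi
done
```

A restore script built this way would call the guard first and only then run something like `restic restore latest --target "$target"`, leaving the copy into `/opt/cardano402` as a separate, deliberate operator step.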

Test plan

  • `bash -n` syntax-checks both scripts (clean)
  • CI passes on this branch
  • On the VPS, after install/configure: `bash scripts/backup.sh` completes successfully
  • `bash scripts/restore.sh latest /tmp/restore-test` recovers files; `diff` against live config.json is empty
  • After 24h, cron fires automatically at 03:00 UTC and adds a new snapshot
  • After 14+ days, retention policy prunes the oldest daily snapshot

🤖 Generated with Claude Code

Closes audit #3 (no backups). Biggest remaining production risk after
the resource-limits deploy: a VPS disk failure today loses the mainnet
seed phrase from config/config.json with no recovery path.

Adds an opinionated, backend-agnostic backup chain:

- `scripts/backup.sh` -- nightly snapshot of:
  - config/config.json (the seed phrase + Blockfrost key)
  - .env (REDIS_PASSWORD, MAINNET flag)
  - Redis AOF volume (dedup keys, UTXO reservations)
  - data/files/ (uploaded payment-gated content)
  Stages to /tmp, runs `restic backup` to an encrypted remote repo,
  prunes per retention policy (14 daily / 8 weekly / 12 monthly),
  verifies a 5% pack-file sample. Lock file prevents overlapping runs.

- `scripts/restore.sh` -- restores any snapshot to a target dir
  (defaults to /tmp). Intentionally NEVER restores over the live tree;
  recovery is a deliberate operator copy from the restored staging area.

- `scripts/cardano402-backup.cron` -- cron template, 03:00 UTC nightly.

- `scripts/cardano402-restic.env.example` -- credentials template with
  inline blocks for Backblaze B2 (default), Cloudflare R2, AWS S3,
  Hetzner Storage Box, and tailnet SFTP. The env file lives at
  /etc/cardano402/restic.env (mode 0600, root-owned), outside the repo.

- `docs/backup-restore.md` -- operator runbook: one-time setup, restore
  verification, routine ops, three disaster-recovery scenarios, what's
  intentionally NOT backed up.

- `docs/operations.md` -- new pointer to backup-restore.md.

Restic gives us: client-side AES-256 encryption, content-addressed
dedup (the growing AOF costs almost nothing per night), one binary,
many backend choices via one config. The encryption passphrase MUST
be stored offline; the runbook is explicit about that.

No production state changes from this PR -- it just adds files. The
backup loop activates only after the operator installs restic, fills
in the env file, runs `restic init`, and copies the cron template
into place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MorganOnCode MorganOnCode merged commit ec532f6 into master May 15, 2026
5 checks passed
@MorganOnCode MorganOnCode deleted the feat/backup-infrastructure branch May 15, 2026 09:03