@@ -0,0 +1,23 @@
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cloudnative-pg
  namespace: cnpg-system
spec:
  interval: 30m
  chart:
    spec:
      chart: cloudnative-pg
      version: ">=0.23.0 <1.0.0"
      sourceRef:
        kind: HelmRepository
        name: cloudnative-pg
        namespace: flux-system
  install:
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
    remediation:
      retries: 3
@@ -0,0 +1,8 @@
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: cloudnative-pg
  namespace: flux-system
spec:
  interval: 24h
  url: https://cloudnative-pg.github.io/charts
@@ -0,0 +1,6 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- helmrepository.yaml
- helmrelease.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: cnpg-system
15 changes: 15 additions & 0 deletions clusters/hlcl1/infra/databases/cloudnativepg/ks-chart.yaml
@@ -0,0 +1,15 @@
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-cloudnativepg
  namespace: flux-system
spec:
  interval: 30m
  retryInterval: 1m
  path: ./clusters/hlcl1/infra/databases/cloudnativepg/chart
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
  timeout: 10m
@@ -0,0 +1,4 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ks-chart.yaml
5 changes: 5 additions & 0 deletions clusters/hlcl1/infra/databases/kustomization.yaml
@@ -0,0 +1,5 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- cloudnativepg/
- postgres/
21 changes: 21 additions & 0 deletions clusters/hlcl1/infra/databases/postgres/config/ff-dev.yaml
@@ -0,0 +1,21 @@
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: ff-postgres
  namespace: ff-dev
spec:
  instances: 1

  imageName: ghcr.io/cloudnative-pg/postgresql:16

  storage:
    size: 5Gi

  postgresql:
    parameters:
      timezone: UTC

  bootstrap:
    initdb:
      database: flockfeed
      owner: flockfeed
21 changes: 21 additions & 0 deletions clusters/hlcl1/infra/databases/postgres/config/ff-production.yaml
@@ -0,0 +1,21 @@
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: ff-postgres
  namespace: ff-production
spec:
  instances: 3

  imageName: ghcr.io/cloudnative-pg/postgresql:16

  storage:
    size: 10Gi

  postgresql:
    parameters:
      timezone: UTC

  bootstrap:
    initdb:
      database: flockfeed
      owner: flockfeed
@@ -0,0 +1,5 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ff-dev.yaml
- ff-production.yaml
17 changes: 17 additions & 0 deletions clusters/hlcl1/infra/databases/postgres/ks-config.yaml
@@ -0,0 +1,17 @@
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-ff-postgres
  namespace: flux-system
spec:
  interval: 30m
  retryInterval: 1m
  path: ./clusters/hlcl1/infra/databases/postgres/config
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infra-shared-namespaces
    - name: infra-cloudnativepg
  timeout: 10m
4 changes: 4 additions & 0 deletions clusters/hlcl1/infra/databases/postgres/kustomization.yaml
@@ -0,0 +1,4 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ks-config.yaml
14 changes: 14 additions & 0 deletions clusters/hlcl1/infra/shared-namespaces/ks-namespaces.yaml
@@ -0,0 +1,14 @@
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-shared-namespaces
  namespace: flux-system
spec:
  interval: 30m
  retryInterval: 1m
  path: ./clusters/hlcl1/infra/shared-namespaces/namespaces
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  timeout: 5m
4 changes: 4 additions & 0 deletions clusters/hlcl1/infra/shared-namespaces/kustomization.yaml
@@ -0,0 +1,4 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ks-namespaces.yaml
@@ -0,0 +1,4 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespaces.yaml
@@ -0,0 +1,9 @@
apiVersion: v1
kind: Namespace
metadata:
  name: ff-dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: ff-production
2 changes: 2 additions & 0 deletions clusters/hlcl1/kustomization.yaml
@@ -5,5 +5,7 @@ resources:
- infra/storage/nfs
- infra/network/metallb
- infra/secrets/external-secrets
- infra/shared-namespaces
- infra/databases
- apps/pihole
- apps/monitoring
17 changes: 17 additions & 0 deletions docs/adr.md
@@ -127,3 +127,20 @@ Deploy the NFS CSI driver (csi-driver-nfs) as a second StorageClass alongside Lo
- NFS CSI driver deploys a DaemonSet (node plugin) and a Deployment (controller) — lightweight resource footprint.
- StorageClass `nfs` is explicitly not default — workloads must opt-in by specifying `storageClassName: nfs`.
- If the NAS is down, Prometheus stops writing, but the cluster keeps running; this is acceptable.

## 008 - Shared namespaces for multi-Kustomization workloads

### Context

When an application has multiple independent Flux Kustomizations that share a namespace (e.g. a CNPG database cluster managed by infra and an app deployment managed via a separate GitRepository), namespace ownership becomes ambiguous. If the namespace is created by one Kustomization and Flux prune is enabled, deleting that Kustomization would delete the namespace and take down the other workload with it.

### Decision

Introduce `infra/shared-namespaces/` as the single owner of any namespace that is shared between two or more independent Flux Kustomizations. Neither the app nor the database config creates or owns these namespaces — they are pre-created infrastructure.

### Impacts

- Deleting an app or its database config will not accidentally delete the namespace or affect other workloads in it.
- Namespaces are created early in the reconciliation order, before anything that depends on them.
- New apps with shared namespaces add their namespace entry to `infra/shared-namespaces/namespaces.yaml` rather than creating their own namespace resource.
- Single-namespace apps that own their namespace entirely (e.g. pihole) are unaffected and continue to manage their namespace locally.
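The onboarding flow implied by the impacts above can be sketched for a hypothetical new environment (the `ff-staging` name is an example, not part of this change): append a Namespace document to `namespaces.yaml`, then have the consuming Flux Kustomization declare a `dependsOn` entry on `infra-shared-namespaces`, as `infra-ff-postgres` does.

```yaml
# clusters/hlcl1/infra/shared-namespaces/namespaces/namespaces.yaml
# Appended entry; "ff-staging" is a hypothetical example namespace.
---
apiVersion: v1
kind: Namespace
metadata:
  name: ff-staging
```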
20 changes: 20 additions & 0 deletions docs/disaster-recovery.md
@@ -6,6 +6,8 @@
- [ ] Longhorn backup target configured (NAS NFS share)
- [ ] Longhorn recurring snapshots scheduled
- [X] Age private key backed up offline
- [ ] CNPG backup target configured (NAS NFS share or object storage)
- [ ] CNPG scheduled backups enabled for ff-postgres (production)
- [ ] NAS app config directories backed up
- [ ] Find per app solution for NAS backed apps
- [ ] Sonarr
@@ -29,6 +31,8 @@
| Longhorn volumes | Last snapshot | ~1 hr | Restore from Longhorn backup target (NFS Share) |
| OpenBao data | Last Raft snapshot | ~1 hr | Requires restore + unseal |
| Prometheus/Grafana data | Last Longhorn snapshot | ~1 hr | Dashboards are in Git (ConfigMaps) |
| ff-postgres (production) | Last CNPG backup | ~30 min | 3-replica HA; restore from CNPG backup target |
| ff-postgres (dev) | N/A | ~5 min | Single instance, dev data is disposable — recreate from scratch |
| Media library | N/A | N/A | Not backed up (re-downloadable; open question: can *arr config backups track what currently exists?) |
| App configs (Sonarr, Radarr, etc.) | Last NAS backup | ~30 min | Restore config dirs, redeploy |
| SOPS encryption key (age) | Offline backup and external Password Manager | Manual | Required to decrypt all secrets |
@@ -65,6 +69,22 @@ MetalLB is fully managed by Flux via the chart/config split pattern (see [ADR-00

No manual intervention required.

### ff-postgres (CloudNativePG) recovery

ff-postgres is managed by the CloudNativePG operator, reconciled by Flux via the `infra-ff-postgres` Kustomization.

**Production** runs 3 replicas (1 primary, 2 standbys). On single-node failure, CNPG promotes a standby automatically — no data loss, no manual intervention.

**On total cluster loss:**

1. Flux reconciles `infra-shared-namespaces` → creates `ff-dev` and `ff-production` namespaces
2. Flux reconciles `infra-cloudnativepg` → installs CNPG operator (`wait: true`)
3. Flux reconciles `infra-ff-postgres` → creates `Cluster` resources
4. CNPG restores from backup target (once backup is configured — see checklist above)
5. `ff-postgres-app` Secret is recreated by CNPG and available to the ff app

**Dev** instance (single replica) is treated as disposable — recreate from scratch, run migrations.
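The backup checklist items above are still open. A sketch of what the backup target and scheduled backups might look like, assuming an S3-compatible endpoint on the NAS and a `cnpg-backup-creds` Secret (the bucket, endpoint URL, and credential names are all assumptions, not part of this change):

```yaml
# Sketch only — the backup target is not yet configured (see checklist).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: ff-postgres
  namespace: ff-production
spec:
  # ...existing spec from ff-production.yaml...
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://cnpg-backups/ff-postgres  # assumed bucket
      endpointURL: https://nas.example.local:9000     # assumed NAS S3 endpoint
      s3Credentials:
        accessKeyId:
          name: cnpg-backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: cnpg-backup-creds
          key: SECRET_ACCESS_KEY
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: ff-postgres-daily
  namespace: ff-production
spec:
  schedule: "0 0 2 * * *"  # CNPG uses a six-field cron (with seconds): daily at 02:00
  backupOwnerReference: self
  cluster:
    name: ff-postgres
```

With this in place, step 4 of the recovery sequence would restore from the `barmanObjectStore` target via a `bootstrap.recovery` stanza on the recreated Cluster.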

### PiHole recovery

1. One instance runs on k8s. If it dies, it should migrate to another node. If the entire cluster is down, we still have a second instance running on