Writeup still a WIP, please pardon the dust.
Below are mostly braindumps & rough commands for creating/tweaking these services. A formal writeup is coming soon!
| Service | Uptime (1mo) | ArgoCD |
|---|---|---|
| copyparty | | |
| gogs | | |
| plex | | |
| homeassistant | | |
| jellyfin | | |
| miniflux | | |
| ntfy | | |
| Application | ArgoCD |
|---|---|
| argocd | |
| rook | |
| cloudflared | |
| media-automation | |
| traefik | |
| monitoring | |
| upgrade-plan | |
| duplicati | |
TODO
# First node
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.34.3+k3s1 INSTALL_K3S_EXEC="server --cluster-init" sh -
export NODE_TOKEN=$(cat /var/lib/rancher/k3s/server/node-token)
# Remaining nodes
curl -sfL https://get.k3s.io | K3S_TOKEN=$NODE_TOKEN INSTALL_K3S_VERSION=v1.34.3+k3s1 INSTALL_K3S_EXEC="server --server https://<server node ip>:6443 --kubelet-arg=allowed-unsafe-sysctls=net.ipv4.*,net.ipv6.conf.all.forwarding" sh -
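After the install script finishes on each node, it's worth confirming everything joined & went Ready (k3s bundles kubectl):

```shell
# List cluster members from any server node
sudo k3s kubectl get nodes -o wide
```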
# All nodes
# /etc/sysctl.d/01-kube.conf
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 4096
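To pick up the sysctl file without a reboot, `sysctl --system` re-reads everything under /etc/sysctl.d:

```shell
# Reload all sysctl configuration files, including /etc/sysctl.d/01-kube.conf
sudo sysctl --system

# Spot-check the applied values
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
```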
https://docs.k3s.io/upgrades/automated
Ensure you account for any node taints. Anecdotally, I had one node fail to run upgrade pods due to a taint, & it appeared upgrades were postponed across the entire cluster.
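For reference, system-upgrade-controller Plans accept tolerations. A hedged sketch of tolerating a custom taint (the `storage-node` key is just an example from my setup, not necessarily the taint that bit me):

```yaml
# Fragment of an upgrade.cattle.io/v1 Plan spec
spec:
  tolerations:
    - key: storage-node
      operator: Exists
      effect: NoSchedule
```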
$ sudo crictl rmi --prune
https://github.com/containerd/containerd/blob/main/docs/content-flow.md
containerd really doesn't want you batch-deleting snapshots.
Run the below command a few times until it stops returning results:
sudo k3s ctr -n k8s.io i rm $(sudo k3s ctr -n k8s.io i ls -q)
This other command below has given me problems before, but may purge more images. Beware of errors like:

error unpacking image: failed to extract layer sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68: failed to get reader from content store: content digest sha256:fbe1a72f5dcd08ba4ca3ce3468c742786c1f6578c1f6bb401be1c4620d6ff705: not found

(if the content is "not found"... re-pulling the affected image should restore it??)
for sha in $(sudo k3s ctr snapshot usage | awk '{print $1}'); do sudo k3s ctr snapshot rm $sha && echo $sha; done
Uses traefik, the k3s default.
`externalTrafficPolicy: Local` is used to preserve client source IPs.
A `cluster-ingress=true` label is given to the node my router points to. Some services use a nodeAffinity to request it. (ex: for pods with `hostNetwork: true`, this ensures they run on the node with the right IP)
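A sketch of the nodeAffinity in question, matching the `cluster-ingress=true` label:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cluster-ingress
              operator: In
              values: ["true"]
```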
https://argo-cd.readthedocs.io/en/stable/getting_started/
kubectl create namespace argocd
kubectl apply -n argocd --server-side --force-conflicts -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.3.1/manifests/install.yaml
& install the CLI: https://argo-cd.readthedocs.io/en/stable/cli_installation/
The default admin account does not have the ability to generate API keys, so make a dedicated webhook user:
$ kubectl -n argocd edit configmap argocd-cm
...
data:
accounts.webhook: apiKey
...
Generate a token for the user:
argocd account generate-token --account webhook
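The token can then be used with the CLI or the REST API. A sketch (the server URL is a placeholder, & the webhook account still needs RBAC permissions for whatever it touches):

```shell
# CLI: act as the webhook account without an interactive login
argocd app list --auth-token "$TOKEN" --server argocd.example.com --grpc-web

# REST: the same token works as a bearer token
curl -H "Authorization: Bearer $TOKEN" https://argocd.example.com/api/v1/applications
```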
See rook/rook-ceph-operator-values.yaml and rook/rook-ceph-cluster-values.yaml.
https://rook.io/docs/rook/latest-release/Upgrade/rook-upgrade/?h=upgrade
https://rook.io/docs/rook/latest-release/Upgrade/ceph-upgrade/?h=upgrade
Map an OSD to its host & backing device:
ceph osd metadata <id> | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'
$ ceph osd metadata osd.1 | grep -e '"hostname"' -e '"bluestore_bdev_dev_node"'
"bluestore_bdev_dev_node": "/dev/sdd",
"hostname": "node1",
My setup divides k8s nodes into ceph & non-ceph nodes (using the label storage-node=true).
Ensure labels & a toleration are set properly, so non-rook nodes can still run the PV plugin DaemonSets. I accomplished this with a storage-node=false label on non-rook nodes, plus a toleration checking for storage-node.
Otherwise, any pod scheduled on a non-ceph node won't be able to mount ceph-backed PVCs.
See rook-ceph-cluster-values.yaml->cephClusterSpec->placement for an example.
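For the CSI plugin DaemonSets specifically, the rook-ceph operator chart exposes toleration values. A hedged sketch of what I mean (key names per the chart's `csi` section):

```yaml
# rook-ceph-operator-values.yaml (sketch)
csi:
  pluginTolerations:
    - key: storage-node
      operator: Exists
```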
EC-backed filesystems require a regular replicated pool as a default.
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QI42CLL3GJ6G7PZEMAD3CXBHA5BNWSYS/ https://tracker.ceph.com/issues/42450
Then setfattr a directory on the filesystem with an EC-backed pool. Any new data written to the folder will go to the EC-backed pool.
setfattr -n ceph.dir.layout.pool -v cephfs-erasurecoded /mnt/cephfs/my-erasure-coded-dir
https://docs.ceph.com/en/quincy/cephfs/file-layouts/
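To confirm the new layout, the attribute can be read back with getfattr:

```shell
# Read the layout attribute back; the pool name should appear in the output
getfattr -n ceph.dir.layout.pool /mnt/cephfs/my-erasure-coded-dir
```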
- Create CephFilesystem
- Create SC backed by Filesystem & Pool
- Ensure the CSI subvolumegroup was created. If not: `ceph fs subvolumegroup create <fsname> csi`
- Create a PVC without a specified PV: a PV will be auto-created
- Super important: set the created PV's persistentVolumeReclaimPolicy to Retain
- Save the PV yaml, remove any extra information (see rook/data/data-static-pv.yaml for an example of what's required). Give it a more descriptive name.
- Delete the PVC, and PV.
- Apply your new PV YAML. Create a new PVC, pointing at this new PV.

To grow the volume later:

- Grow resources->storage on the PV
- Grow resources->storage on the PVC
- Verify the new limit: `getfattr -n ceph.quota.max_bytes /mnt/volumes/csi/csi-vol-<uuid>/<uuid>`
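The "super important" Retain step can be done with a patch. A sketch, with a placeholder PV name:

```shell
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```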
Removing a cephfs instance with a subvolume group requires deleting the group + all snapshots.
Simply deleting the CephFileSystem CRD may result in this error appearing in operator logs:
2026-02-08 17:27:15.558449 E | ceph-file-controller: failed to reconcile CephFilesystem "rook-ceph/data" will not be deleted until all dependents are removed: filesystem subvolume groups that contain subvolumes (could be from CephFilesystem PVCs or CephNFS exports): [csi]
Trying to remove the subvolumegroup may indicate it has snapshots:
$ kubectl rook-ceph ceph fs subvolumegroup rm data csi
Info: running 'ceph' command with args: [fs subvolumegroup rm data csi]
Error ENOTEMPTY: subvolume group csi contains subvolume(s) or retained snapshots of deleted subvolume(s)
Error: . failed to run command. command terminated with exit code 39
$ kubectl rook-ceph ceph fs subvolume ls data csi
Info: running 'ceph' command with args: [fs subvolume ls data csi]
[
{
"name": "csi-vol-42675a4d-052f-11ed-8662-4a986e7745e3"
}
]
$ kubectl rook-ceph ceph fs subvolume rm data csi-vol-42675a4d-052f-11ed-8662-4a986e7745e3 csi
Info: running 'ceph' command with args: [fs subvolume rm data csi-vol-42675a4d-052f-11ed-8662-4a986e7745e3 csi]
After this, CephFileSystem deletion should proceed normally.
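If a subvolumegroup has accumulated many subvolumes, the `ls` output can drive the deletes. A sketch assuming `jq` is installed & the filesystem/group are named `data`/`csi` as above (if the plugin's Info lines land on stdout instead of stderr, they'll need filtering first):

```shell
# Delete every subvolume in the csi group of the data filesystem
for v in $(kubectl rook-ceph ceph fs subvolume ls data csi | jq -r '.[].name'); do
  kubectl rook-ceph ceph fs subvolume rm data "$v" csi
done
```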
List the crush rule for each pool:
for i in $(ceph osd pool ls); do echo "$i: $(ceph osd pool get $i crush_rule)"; done
On EC-backed pools, device class information is in the erasure code profile, not the crush rule. https://docs.ceph.com/en/latest/dev/erasure-coded-pool/
for i in $(ceph osd erasure-code-profile ls); do echo "$i: $(ceph osd erasure-code-profile get $i)"; done
If hostNetwork is enabled on the cluster, ensure rook-ceph-operator is not running with hostNetwork enabled. It doesn't need host network access to orchestrate the cluster, & it impedes orchestration of objectstores & associated resources.
This is great for setting up easy public downloads.
- Create a user (see rook/buckets/user-josh.yaml)
- Fetch the user's credentials:

  kubectl -n rook-ceph get secret rook-ceph-object-user-ceph-objectstore-josh -o go-template='{{range $k,$v := .data}}{{printf "%s: " $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}'

- Create bucket (rook/buckets/bucket.py::create_bucket)
- Set policy (rook/buckets/bucket.py::set_public_read_policy)
- Upload a file:

  from bucket import *
  conn = connect()
  conn.upload_file('path/to/s3-bucket-listing/index.html', 'public', 'index.html', ExtraArgs={'ContentType': 'text/html'})

https://github.com/TheJJ/ceph-balancer
See the README for how this balancing strategy compares to ceph's balancer module.
TLDR:
kubectl -n rook-ceph cp placementoptimizer.py $(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}'):/tmp/
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- bash -c 'python3 /tmp/placementoptimizer.py -v balance --max-pg-moves 50 | tee /tmp/balance-upmaps'
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- bash /tmp/balance-upmaps
https://docs.ceph.com/en/latest/man/8/mount.ceph/
sudo mount -t ceph user@<cluster FSID>.<filesystem name>=/ /mnt/ceph -o secret=<secret key>,x-systemd.requires=ceph.target,x-systemd.mount-timeout=5min,_netdev,mon_addr=192.168.1.1
sudo vi /etc/fstab
192.168.1.1,192.168.1.2:/ /ceph ceph name=admin,secret=<secret key>,x-systemd.mount-timeout=5min,_netdev,mds_namespace=data
# /etc/ceph/ceph.conf
[global]
fsid = <my cluster uuid>
mon_host = [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] [v2:192.168.1.2:3300/0,v1:192.168.1.2:6789/0]
# /etc/ceph/ceph.client.admin.keyring
[client.admin]
key = <my key>
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"
# /etc/fstab
none /ceph fuse.ceph ceph.id=admin,ceph.client_fs=data,x-systemd.requires=ceph.target,x-systemd.mount-timeout=5min,_netdev 0 0
192.168.1.1:/seedbox /nfs/seedbox nfs rw,soft 0 0
https://unix.stackexchange.com/questions/554908/disable-spectre-and-meltdown-mitigations
The monitoring folder is mostly the manifests from https://rpi4cluster.com/monitoring/monitor-intro/.
I tried https://github.com/prometheus-operator/kube-prometheus, & when I did, the only way to persist dashboards was to add them to the Jsonnet & apply the generated configmap. I don't need that kind of IaC commitment for monitoring personal-use dashboards.
kubectl expose svc/some-service --name=some-service-external --port 1234 --target-port 1234 --type LoadBalancer
The service will then be available on port 1234 of any k8s node.
An A record for lan.jibby.org & *.lan.jibby.org points to an internal IP.
To be safe, a middleware is included to filter out source IPs outside of the LAN network & k3s CIDR. See traefik/middleware-lanonly.yaml.
Then, internal services can be exposed with an Ingress, as a subdomain of lan.jibby.org. See examples/nginx's Ingress.
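A sketch of such an Ingress, with the lanonly middleware attached via annotation (names are from my setup; traefik's middleware reference format is `<namespace>-<name>@kubernetescrd`):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: default-lanonly@kubernetescrd
spec:
  rules:
    - host: nginx.lan.jibby.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
```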
TODO: k3s, argocd, rook
These backups can be restored to the remote k3s instance to ensure functionality, or function as a secondary service instance.
https://velero.io/docs/v1.3.0/restore-reference/
Velero does not support hostPath PVCs, but works just fine with the openebs-hostpath storageClass.
KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm install openebs --namespace openebs openebs/openebs --create-namespace --set localprovisioner.basePath=/k3s-storage/openebs
This is a nice PVC option for simpler backup target setups.
- move to https://argo-workflows.readthedocs.io/en/latest/quick-start/
- https://external-secrets.io/latest/introduction/getting-started/
- upgrade rook https://rook.io/docs/rook/v1.14/Upgrade/rook-upgrade/
- rook CSI snapshots https://rook.io/docs/rook/v1.19/Storage-Configuration/Ceph-CSI/ceph-csi-snapshot/
- velero CSI snapshots https://velero.io/docs/v1.17/csi/
- redo backup target
  - argocd + lan ui domain
  - I think about my backup target way less often, IaC would be very helpful for it
  - single host ceph
    - removes openebs & minio requirement, plus self-healing
  - external-secrets
  - weekly restore + validation
- argocd + lan ui domain
- redo paperless, with dedicated postgres cluster (applicationset)
- Use https://github.com/dgzlopes/cdk8s-on-argocd to deduplicate main/backup manifests
- write up: node affinity + eviction, how I limit non-rook pods running on rook nodes
  - PreferNoSchedule taint on rook nodes
- write up: seedbox setup & sharing the disk w/ NFS
- update gogs write up for "next" image
- finish this writeup
- try https://kubevirt.io/
- metallb failover, or cilium?
- logs
- backup over tailscale?