Skip to content

NeptuneHub/k3s-supreme-waffle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

494 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

k3s-supreme-waffle

k3s-supreme-waffle goals is track the progress of an K3S home study project: is not for a production envirorment

Several guide was followed to create this script, at the end of this page (and each sub directory page) you can find the full reference list.

This repository contains script is to configure a K3S cluster in High Availability Embedded etcd with at least 3 server node. As O.S is used Ubuntu 24.04 server AMD64.

This script was tested on a (old) laptop machine with Virtualbox on top, in this case is suggested 3 node with 1vcpu and 8gb ram each one.

This script was also test on Hetzner cloud service, in this case 3 node with 2vcpu (intel) 4gb ram each one works fine too.

In the root of this repository you can find several script for install K3S on main server node and additional node server, deploy longhorn and run a Wikijs/MariaDB web application for testing purpose.

The file in this root folder are:

  • K3S_first_UBUNTU.sh - Run on the first server node, install K3S, Longhorn with all the dependencies, mariadb and wikijs by applying different yaml file. You need to edit it to change the ip of your machine in it;
  • K3S_second_server_UBUNTU.sh - Install the additionals servers node. You need to change the TOKEN of your cluster, you can found it in your first server node installation;
  • wikijs-config.yaml - Contains service and deployment of MariaDb;
  • wikijs-deployment.yaml - Contains the deployment of Wikijs;
  • wikijs-secret.yaml - Contains secret for the configruation of MariaDB, you need to edit it to change different user and password;
  • wikijs-pvc.yaml - Contains the persistent volume claims using longhorn;
  • wikijs-pvc-local-path.yaml - Alternative persistent volume claim using a local path on the node;
  • wikijs-service.yaml - Contains the service;
  • restart-longhorn.sh - Can be run in order to restart longhorn
  • mount-longhorn.yaml - is an utility pod that start with your longhorn pvc mounted, useful to edit file with the vi command on longhorn volume. You can access it with the command kubectl -n < namespace > exec -it longhorn-editor -- sh
  • ingress.yaml - ingress to expose longhorn frontend (or can be a template for other kind of front-end.

Additional application for your server

If you want to continue your learning you can also follow this additional guide:

  • /velero - Is an application useful for schedule backup of the entire cluster on an external Bucket (AWS S3 or even a minio local server)
  • /minio - Is an application useful to create your own bucket on your local machine. For development purphose you can run two different cluster and one of them having minio for keeping the backup made by velero
  • /cert-manager - In an application used to manager your TLS certificate. Useful for application that need encryption like nextcloud
  • /nextcloud - Is an application that can be used to storage file
  • /storagebox - How to configure and mount an SSH folder on K3S, for example the Storage Box offered by hetzner. Is useful if you want to use an external storage for the data of nextcloud
  • /hardening - Same hardening suggestions if your "home lab" is on the cloud
  • /prometheus-stack - Prometheus stack, usefull to deploy a prometheus full stack with Grafana dashboard to monitor your K3S cluster
  • /imaginary - Image preview generator, useful to be used with Nextcloud
  • /raspberrypi5 and raspberrypi02w - some test and configuration tested and work on Raspberry
  • /elasticsearc - aggregating and monitor the log of your cluster
  • /pihole - DNS server (and AD blocker) on K3S. Useful also if you want to resolve internal domain on your lan
  • /hetzner DDNS - SH script that interact with hetzner DNS api for updating the ip of your DNS name
  • /Authentik - Identity provider that you can integrate with Traefik in K3S
  • /Metallb - software loadbalancer, very useful in case of multiple node cluster
  • /homer - static light home page for your homelab

Kubernetes useful commands

  • Change the replica number of an already existing deployment
    • kubectl -n < namespace > scale deployment/< deployment_name > --replicas=0
  • Edit external IPs of a service already deployed and check the changes
    • kubectl patch svc serviceName -n < namespace > -p '{"spec":{"externalIPs":["< Put.here.the.ip >"]}}'
    • kubectl describe service serviceName -n < namespace >
  • Get pod log
    • kubectl logs < pod-specific-instance > -n < namespace >
    • kubectl describe pod -n < namespace >
  • Apply a Yaml file
    • kubectl apply -f file.yaml
  • Get Yaml file of a deployment
    • kubectl get deploy -o yaml -n < namespace >
  • Delete all resource in a namespace and then delete the namespace
  • kubectl delete all --all -n < namespace >
  • kubectl delete namespace < namespace >
  • Delete a resource, like a PVC, stucked on terminating
    • kubectl delete pvc nextcloud-db-pvc -n --force –grace-period=0
  • Delete a namspace locked on Terminating status
(
NAMESPACE=namespace
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
)
  • Some error checking error
CRUL:
kubectl exec -n nextcloud nextcloud-6cb95f8c6d-fz6qx -- curl -v http://imaginary.imaginary.svc.cluster.local:9000

DNS CHECK:
kubectl run -i --tty dnsutils --image=busybox --restart=Never --rm --namespace nextcloud -- /bin/sh -c "nslookup imaginary.imaginary.svc.cluster.local"

ENTER IN THE POD AND NAVIGATE IT
kubectl exec -it nextcloud-6cb95f8c6d-2dtj2 -n nextcloud -- /bin/bash
  • Storage FIX (need to be unmounted. maybe attached with an external USB adapter)
lsblk
sudo fsck -f -y /dev/sda2sudo fsck -f /dev/sda2
  • Assign label to a node
kubectl label node ubuntu2 k3s-upgrade=server
  • Restore etcd snapshot on 1 node cluster
/usr/local/bin/k3s server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-ubuntu1-1723888802
  • Restore etcd snapshot on 3 node cluster
on each node:
systemctl stop k3s

on first node (the one used to initializate the Cluster the first time):
/usr/local/bin/k3s server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-ubuntu1-1724752804
systemctl start k3s

on other nodes (not first)
rm -rf /var/lib/rancher/k3s/server/db/
systemctl start k3s
  • Delete image from a node to force the download (work even with pullpolicy:ifnotpresent
sudo crictl rmi nextcloud:stable
sudo ctr -n k8s.io images rm docker.io/library/nextcloud:stable
  • Delete a cert-manager certificate to force the re-issuing
kubectl delete challenge -n <namespace> <certificate-challenge>
kubectl delete certificate -n <namespace> <certificate-name>
  • Lis all the image already downloaded in K3S
sudo crictl images | grep collabora
  • Delete collabora to force the re-downloading last version
sudo ctr -n k8s.io images rm docker.io/collabora/code:latest
  • Delete one node from K3S cluster, for example ubuntu4
kubectl delete node ubuntu4
kubectl get pods --all-namespaces --field-selector spec.nodeName=ubuntu4 -o json | jq -r '.items[].metadata | "kubectl delete pod \(.name) -n \(.namespace) --force --grace-period=0"' | sh
  • Install/Update Traefik CRDs
kubectl apply -f https://raw.githubusercontent.com/traefik/traefik/v3.2/docs/content/reference/dynamic-configuration/kubernetes-crd-definition-v1.yml
  • Fix ETCD that show a node as already present when is not true
curl -LO https://github.com/etcd-io/etcd/releases/download/v3.5.15/etcd-v3.6.1-linux-arm64.tar.gz
tar -zxvf etcd-v3.6.1-linux-amd64.tar.gz -C ./
cd etcd-v3.6.1-linux-amd64

sudo etcdctl version \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key

  sudo ./etcdctl member remove 5f3b9d6a7c34d1a   --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt   --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt   --key=/var/lib/rancher/k3s/server/tls/etcd/client.key

Linux useful commands

  • Disable use of swap
    • sudo swapoff -a
  • Enable use of swap
    • sudo swapon -a
  • Disable IPV6
    • net.ipv6.conf.all.disable_ipv6 = 1
    • net.ipv6.conf.default.disable_ipv6 = 1
    • net.ipv6.conf.lo.disable_ipv6 = 1
  • check disk mounted to the machine
    • lsblk
  • Check the open port and open one port
    • sudo ufw status
    • sudo ufw allow 6443/tcp
  • Check space used from a directory
    • du -sh
  • Check disk usage
    • df -h
  • Check number of file in a directory and sub directory
    • find . -type f | wc -l
  • Check the space used by a directory and the number of file in it
DIRECTORY="./"; echo "Total size: $(du -sh "$DIRECTORY" | awk '{print $1}')"; echo "Total number of files: $(find "$DIRECTORY" -type f | wc -l)"
  • Install Docker with nvidia driver
# Install NVIDIA driver + utils explicitly
sudo apt update
sudo apt install -y nvidia-driver-535 nvidia-utils-535
sudo reboot

# Verify
nvidia-smi

# Install Docker
sudo apt install -y docker.io
sudo systemctl enable --now docker

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker


# Test
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
  • Different command to check log on linux (tested on ubuntu 24.04)
List of reboot
last reboot

List of error in the last 2 days
journalctl --since "2 days ago" -p 3

Journalctl general LOG:
sudo journalctl --since "2024-08-02 04:29" --until "2024-08-02 04:35"

Journalctl LOG only for K3S:
sudo journalctl -u k3s --since "2024-08-02 04:29" --until "2024-08-02 04:35"

Journalctl LOG only for SSH:
sudo journalctl -u ssh --since "2024-08-02 04:29" --until "2024-08-02 04:35"

NETWORK LOG:
sudo grep "wlan0" /var/log/syslog | awk '$0 ~ /2024-07-28T02:/'

AUTH LOG:
sudo grep "error\|fail\|critical" /var/log/auth.log | grep "2024-07-29T02:"

ALL K3S container log:
grep -i "error\|fail\|critical" /var/log/containers/*.log
grep '2024-07-29T11:' /var/log/containers/*.log | grep -i 'error\|fail\|critical'

CHECK NVME
journalctl --since "2 days ago" | grep -i nvme
sudo journalctl --since "7 days ago" | grep "controller is down"
  • DNS setting (tested on ubuntu 24.04)
sudo systemctl stop systemd-resolved 
sudo systemctl disable systemd-resolved

sudo rm /etc/resolv.conf 
echo -e "nameserver 8.8.8.8\nnameserver 8.8.4.4" | sudo tee /etc/resolv.conf

sudo systemctl restart systemd-networkd
  • Check video configuration
lshw -C display
lspci -nn | grep -Ei "3d|display|vga"
  • TOP commmand but for gpu
sudo apt install intel-gpu-tools
intel_gpu_top
  • iNotify error fix
sudo vim /etc/sysctl.conf

Set this value

fs.inotify.max_user_instances=32768
fs.inotify.max_user_watches=2097152

then run this

sudo sysctl -p

OR

sudo vim /etc/sysctl.d/99-inotify.conf

Put:

fs.inotify.max_user_instances=32768
fs.inotify.max_user_watches=2097152

Then run:

sudo sysctl --system

References:

Disclaimer: some of this references wasn't used in the script but was useful for the learning purpose.

Releases

No releases published

Packages

 
 
 

Contributors

Languages