|
| 1 | +# Skill: Hetzner K3s Cluster Manager |
| 2 | + |
| 3 | +> **Purpose:** Manage the Hetzner CAX31 K3s cluster for TaskFlow and related projects |
| 4 | +> **When to use:** Any task involving deployment, debugging, scaling, or monitoring on the Hetzner VPS |
| 5 | +
|
| 6 | +## Cluster Identity |
| 7 | + |
| 8 | +```yaml |
| 9 | +Server: |
| 10 | + hostname: junaid-k8-lab |
| 11 | + ip: 46.224.224.56 |
| 12 | + ssh: root@46.224.224.56 |
| 13 | + specs: CAX31 ARM64 | 8 vCPU | 16GB RAM | 160GB SSD |
| 14 | + cost: $13.49/mo |
| 15 | + provider: Hetzner Cloud |
| 16 | + |
| 17 | +Kubernetes: |
| 18 | + distribution: K3s v1.34.3 |
| 19 | + kubeconfig_local: ~/.kube/config-hetzner |
| 20 | + kubeconfig_server: /etc/rancher/k3s/k3s.yaml |
| 21 | + ingress: Traefik (built-in) |
| 22 | + load_balancer: ServiceLB (built-in, no cost) |
| 23 | + |
| 24 | +Domain: avixato.com |
| 25 | + subdomains: |
| 26 | + - avixato.com (web-dashboard) |
| 27 | + - sso.avixato.com (sso-platform) |
| 28 | + - api.avixato.com (taskflow-api) |
| 29 | + - mcp.avixato.com (mcp-server) |
| 30 | +``` |
| 31 | +
|
| 32 | +## Quick Access Commands |
| 33 | +
|
| 34 | +```bash |
| 35 | +# Always prefix with KUBECONFIG or export it |
| 36 | +export KUBECONFIG=~/.kube/config-hetzner |
| 37 | + |
| 38 | +# Or prefix each command |
| 39 | +KUBECONFIG=~/.kube/config-hetzner kubectl get pods -A |
| 40 | +``` |
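
Since every command in this skill carries the same prefix, a small wrapper can cut the repetition. This is a minimal sketch, assuming bash with `kubectl` and `helm` on `PATH`; the names `hk`, `hh`, and `HETZNER_KUBECONFIG` are hypothetical and not defined anywhere in the repository.

```bash
# Hypothetical wrappers: run kubectl/helm against the Hetzner cluster
# without exporting KUBECONFIG into the whole shell session.
HETZNER_KUBECONFIG="${HETZNER_KUBECONFIG:-$HOME/.kube/config-hetzner}"

hk() {
  KUBECONFIG="$HETZNER_KUBECONFIG" kubectl "$@"
}

hh() {
  KUBECONFIG="$HETZNER_KUBECONFIG" helm "$@"
}
```

With these loaded, `hk get pods -A` and `hh list -n taskflow` behave like the prefixed forms above while leaving the shell's default cluster untouched.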
| 41 | + |
| 42 | +## Installed Components |
| 43 | + |
| 44 | +| Component | Namespace | Purpose | Version | |
| 45 | +|-----------|-----------|---------|---------| |
| 46 | +| K3s | - | Kubernetes distribution | v1.34.3 | |
| 47 | +| Traefik | kube-system | Ingress controller | bundled | |
| 48 | +| ServiceLB | kube-system | Load balancer (host ports) | bundled | |
| 49 | +| CoreDNS | kube-system | Cluster DNS | bundled | |
| 50 | +| Dapr | dapr-system | Pub/sub, state, workflows | v1.15.13 | |
| 51 | +| cert-manager | cert-manager | Auto SSL via Let's Encrypt | v1.14.0 | |
| 52 | + |
| 53 | +## Namespaces & Applications |
| 54 | + |
| 55 | +### taskflow (Production App) |
| 56 | +``` |
| 57 | +Pods: |
| 58 | + - sso-platform (Better Auth SSO) |
| 59 | + - taskflow-api (FastAPI + Dapr sidecar) |
| 60 | + - web-dashboard (Next.js 15) |
| 61 | + - mcp-server (FastMCP) |
| 62 | + - notification-service (FastAPI + Dapr sidecar) |
| 63 | +
|
| 64 | +Resources: |
| 65 | + CPU Requests: 450m | Limits: 2250m |
| 66 | + Memory Requests: 1152Mi | Limits: 2304Mi |
| 67 | +``` |
| 68 | + |
| 69 | +## External Managed Services |
| 70 | + |
| 71 | +### Neon PostgreSQL (Free Tier) |
| 72 | +```yaml |
| 73 | +console: https://console.neon.tech |
| 74 | +databases: |
| 75 | + - sso-v1 (SSO users, sessions) |
| 76 | + - api-v1 (Tasks, projects, workers) |
| 77 | + - chatkit-v1 (Chat conversations) |
| 78 | + - notify-v1 (Notifications, reminders) |
| 79 | +connection: Use pooler endpoint with ?sslmode=require |
| 80 | +``` |
| 81 | +
|
| 82 | +### Upstash Redis (Free Tier) |
| 83 | +```yaml |
| 84 | +console: https://console.upstash.com |
| 85 | +host: refined-ant-42302.upstash.io:6379 |
| 86 | +tls: Required (enableTLS: "true") |
| 87 | +usage: Dapr pub/sub messaging |
| 88 | +``` |
| 89 | +
|
| 90 | +## GitHub Actions Integration |
| 91 | +
|
| 92 | +### Repository: mjunaidca/taskforce |
| 93 | +
|
| 94 | +**Secrets (sensitive):** |
| 95 | +| Secret | Purpose | |
| 96 | +|--------|---------| |
| 97 | +| KUBECONFIG | Base64 K3s kubeconfig | |
| 98 | +| UPSTASH_REDIS_HOST | Redis host:port | |
| 99 | +| UPSTASH_REDIS_PASSWORD | Redis password | |
| 100 | +| NEON_SSO_DATABASE_URL | SSO PostgreSQL | |
| 101 | +| NEON_API_DATABASE_URL | API PostgreSQL | |
| 102 | +| NEON_CHATKIT_DATABASE_URL | ChatKit PostgreSQL | |
| 103 | +| NEON_NOTIFICATION_DATABASE_URL | Notification PostgreSQL | |
| 104 | +| OPENAI_API_KEY | AI agent responses | |
| 105 | +| BETTER_AUTH_SECRET | SSO encryption | |
| 106 | +| SMTP_USER / SMTP_PASSWORD | Email sending | |
| 107 | +
|
| 108 | +**Variables (non-sensitive):** |
| 109 | +| Variable | Value | |
| 110 | +|----------|-------| |
| 111 | +| CLOUD_PROVIDER | kubeconfig | |
| 112 | +| INGRESS_CLASS | traefik | |
| 113 | +| DOMAIN | avixato.com | |
| 114 | +
|
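
The `KUBECONFIG` secret in the table above is described as a Base64 K3s kubeconfig. The sketch below shows one way such a value could be produced. It assumes the server-side file still points at `https://127.0.0.1:6443` (the K3s default), which has to be rewritten to the public IP before a CI runner can reach the API. The function name `encode_kubeconfig` is hypothetical, and how the workflow decodes the secret is not shown in this document.

```bash
# Hypothetical helper: rewrite the loopback API address in a K3s kubeconfig
# to the server's public IP, then emit it as single-line Base64.
encode_kubeconfig() {
  local file="$1"       # local copy of /etc/rancher/k3s/k3s.yaml
  local public_ip="$2"  # e.g. 46.224.224.56
  sed "s#https://127\.0\.0\.1:6443#https://${public_ip}:6443#" "$file" \
    | base64 | tr -d '\n'
}
```

For example, after copying the file from the server, `encode_kubeconfig ./k3s.yaml 46.224.224.56` prints a value that can be pasted into the repository secret. Treat the output as a credential.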
| 115 | +## Common Operations |
| 116 | +
|
| 117 | +### Check Cluster Health |
| 118 | +```bash |
| 119 | +# Node status |
| 120 | +KUBECONFIG=~/.kube/config-hetzner kubectl get nodes |
| 121 | + |
| 122 | +# All pods across namespaces |
| 123 | +KUBECONFIG=~/.kube/config-hetzner kubectl get pods -A |
| 124 | + |
| 125 | +# Resource usage |
| 126 | +KUBECONFIG=~/.kube/config-hetzner kubectl top nodes |
| 127 | +KUBECONFIG=~/.kube/config-hetzner kubectl top pods -A |
| 128 | +``` |
| 129 | + |
| 130 | +### Check TaskFlow Status |
| 131 | +```bash |
| 132 | +# Pods |
| 133 | +KUBECONFIG=~/.kube/config-hetzner kubectl get pods -n taskflow |
| 134 | + |
| 135 | +# Logs |
| 136 | +KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/taskflow-api -c api --tail=100 |
| 137 | +KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/sso-platform --tail=100 |
| 138 | + |
| 139 | +# SSL Certificates |
| 140 | +KUBECONFIG=~/.kube/config-hetzner kubectl get certificates -n taskflow |
| 141 | +``` |
| 142 | + |
| 143 | +### Restart Services |
| 144 | +```bash |
| 145 | +# Restart a deployment (picks up new secrets/configmaps) |
| 146 | +KUBECONFIG=~/.kube/config-hetzner kubectl rollout restart deployment/taskflow-api -n taskflow |
| 147 | + |
| 148 | +# Restart all TaskFlow deployments |
| 149 | +KUBECONFIG=~/.kube/config-hetzner kubectl rollout restart deployment -n taskflow |
| 150 | +``` |
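
`kubectl rollout restart` returns as soon as the restart is requested, so a follow-up check is useful before declaring the service healthy. A minimal sketch, assuming bash; the function name `restart_and_wait` and the 120-second default timeout are hypothetical choices.

```bash
# Hypothetical helper: restart one deployment and block until the rollout
# finishes or the timeout expires (non-zero exit on failure).
restart_and_wait() {
  local deploy="$1" ns="${2:-taskflow}" timeout="${3:-120s}"
  local kc="$HOME/.kube/config-hetzner"
  KUBECONFIG="$kc" kubectl rollout restart "deployment/$deploy" -n "$ns" &&
    KUBECONFIG="$kc" kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"
}
```

For example, `restart_and_wait taskflow-api` restarts the API and waits for the new pods to become ready.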
| 151 | + |
| 152 | +### View Logs for Debugging |
| 153 | +```bash |
| 154 | +# API errors |
| 155 | +KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/taskflow-api -c api --tail=200 | grep -i error |
| 156 | + |
| 157 | +# Dapr sidecar logs |
| 158 | +KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/taskflow-api -c daprd --tail=100 |
| 159 | + |
| 160 | +# SSO logs |
| 161 | +KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/sso-platform --tail=200 |
| 162 | +``` |
| 163 | + |
| 164 | +### Scale Deployments |
| 165 | +```bash |
| 166 | +# Scale up |
| 167 | +KUBECONFIG=~/.kube/config-hetzner kubectl scale deployment/taskflow-api -n taskflow --replicas=2 |
| 168 | + |
| 169 | +# Scale down |
| 170 | +KUBECONFIG=~/.kube/config-hetzner kubectl scale deployment/taskflow-api -n taskflow --replicas=1 |
| 171 | +``` |
| 172 | + |
| 173 | +### Helm Operations |
| 174 | +```bash |
| 175 | +# List releases |
| 176 | +KUBECONFIG=~/.kube/config-hetzner helm list -n taskflow |
| 177 | + |
| 178 | +# Upgrade/redeploy |
| 179 | +KUBECONFIG=~/.kube/config-hetzner helm upgrade taskflow ./infrastructure/helm/taskflow \ |
| 180 | + -n taskflow -f infrastructure/helm/taskflow/values-hetzner.yaml \ |
| 181 | + --set "api.openai.apiKey=$OPENAI_API_KEY" \ |
| 182 | + # ... other --set flags |
| 183 | + |
| 184 | +# Uninstall (CAREFUL!) |
| 185 | +KUBECONFIG=~/.kube/config-hetzner helm uninstall taskflow -n taskflow |
| 186 | +``` |
| 187 | + |
| 188 | +## Deploying New Projects |
| 189 | + |
| 190 | +### Step 1: Create Namespace |
| 191 | +```bash |
| 192 | +KUBECONFIG=~/.kube/config-hetzner kubectl create namespace <project-name> |
| 193 | +``` |
| 194 | + |
| 195 | +### Step 2: Create GHCR Pull Secret |
| 196 | +```bash |
| 197 | +KUBECONFIG=~/.kube/config-hetzner kubectl create secret docker-registry ghcr-secret \ |
| 198 | + --namespace <project-name> \ |
| 199 | + --docker-server=ghcr.io \ |
| 200 | + --docker-username=<github-user> \ |
| 201 | + --docker-password="$(gh auth token)" |
| 202 | +``` |
| 203 | + |
| 204 | +### Step 3: Configure Ingress (Traefik + cert-manager) |
| 205 | +```yaml |
| 206 | +ingress: |
| 207 | + enabled: true |
| 208 | + className: traefik # MUST be traefik, not nginx |
| 209 | + host: myapp.avixato.com |
| 210 | + annotations: |
| 211 | + cert-manager.io/cluster-issuer: letsencrypt-prod |
| 212 | + tls: |
| 213 | + enabled: true |
| 214 | + secretName: myapp-tls |
| 215 | +``` |
| 216 | +
|
| 217 | +### Step 4: Add DNS Record |
| 218 | +Add A record: `myapp.avixato.com` → `46.224.224.56` |
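
Because cert-manager cannot complete its challenge until the record resolves, it can help to confirm the A record before deploying. A minimal sketch, assuming `dig` is installed; the function name `dns_points_to_cluster` is hypothetical.

```bash
# Hypothetical helper: succeed only if the host's A record resolves to the
# expected IP (defaults to the cluster IP from this document).
dns_points_to_cluster() {
  local host="$1" expected="${2:-46.224.224.56}" resolved
  # The last line of the short answer is the address, even behind a CNAME.
  resolved="$(dig +short "$host" A | tail -n 1)"
  [ "$resolved" = "$expected" ]
}
```

For example, `dns_points_to_cluster myapp.avixato.com && echo "DNS ready"`.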
| 219 | + |
| 220 | +### Step 5: Cross-Namespace SSO Access |
| 221 | +```yaml |
| 222 | +env: |
| 223 | + SSO_URL: http://sso-platform.taskflow.svc.cluster.local:3001 |
| 224 | +``` |
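
The `SSO_URL` above follows the standard in-cluster DNS pattern `<service>.<namespace>.svc.cluster.local`. The sketch below builds such URLs for other services; the function name `svc_url` is hypothetical.

```bash
# Hypothetical helper: build the in-cluster URL for a Service in any namespace.
svc_url() {
  local service="$1" namespace="$2" port="$3"
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$service" "$namespace" "$port"
}
```

For example, `svc_url sso-platform taskflow 3001` prints the `SSO_URL` value shown above.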
| 225 | + |
| 226 | +## Resource Capacity |
| 227 | + |
| 228 | +``` |
| 229 | +Total: 8000m CPU | 16384Mi Memory |
| 230 | +Used: ~250m CPU | ~1800Mi Memory (3% | 11%) |
| 231 | +Available: ~7750m CPU | ~14500Mi Memory |
| 232 | + |
| 233 | +Estimate: Can fit 8-10 more TaskFlow-sized projects |
| 234 | +``` |
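
The capacity estimate can be sanity-checked with integer arithmetic on resource requests. This is a rough sketch: the function name `projects_that_fit` is hypothetical, and it considers requests only, ignoring limits, bursts, and system overhead.

```bash
# Hypothetical helper: how many projects fit if each one requests
# cpu_m millicores and mem_mi MiB, given the available headroom.
projects_that_fit() {
  local avail_cpu_m="$1" avail_mem_mi="$2" cpu_m="$3" mem_mi="$4"
  local by_cpu=$(( avail_cpu_m / cpu_m ))
  local by_mem=$(( avail_mem_mi / mem_mi ))
  # The scarcer resource sets the limit.
  if [ "$by_cpu" -lt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}
```

With the figures from this document, `projects_that_fit 7750 14500 450 1152` prints `12`: CPU alone would allow 17, so memory is the binding resource. The 8-10 estimate above is more conservative than this requests-only ceiling.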
| 235 | +
|
| 236 | +## Troubleshooting Playbook |
| 237 | +
|
| 238 | +### Pod Not Starting |
| 239 | +```bash |
| 240 | +# Check events |
| 241 | +KUBECONFIG=~/.kube/config-hetzner kubectl describe pod <pod> -n <ns> |
| 242 | +
|
| 243 | +# Common causes: |
| 244 | +# - ImagePullBackOff: GHCR secret missing or wrong |
| 245 | +# - CrashLoopBackOff: Check logs, likely env var or DB connection |
| 246 | +# - Pending: node lacks free CPU/memory for the pod's resource requests |
| 247 | +``` |
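
The common causes listed above can be turned into a small lookup for quick triage. A minimal sketch; the function name `pod_hint` and the hint wording are hypothetical.

```bash
# Hypothetical helper: map a pod status to the first thing to check,
# following the common causes listed above.
pod_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "Check the GHCR pull secret in the pod's namespace" ;;
    CrashLoopBackOff)
      echo "Check container logs for env var or DB connection errors" ;;
    Pending)
      echo "Check events for scheduling failures or insufficient resources" ;;
    *)
      echo "Run kubectl describe pod and read the events" ;;
  esac
}
```

For example, `pod_hint CrashLoopBackOff` points at the container logs as the first stop.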
| 248 | + |
| 249 | +### SSL Certificate Not Issuing |
| 250 | +```bash |
| 251 | +# Check cert-manager |
| 252 | +KUBECONFIG=~/.kube/config-hetzner kubectl logs -n cert-manager deploy/cert-manager |
| 253 | + |
| 254 | +# Check challenges |
| 255 | +KUBECONFIG=~/.kube/config-hetzner kubectl get challenges -A |
| 256 | +KUBECONFIG=~/.kube/config-hetzner kubectl describe challenge <name> -n <ns> |
| 257 | + |
| 258 | +# Common causes: |
| 259 | +# - DNS not pointing to 46.224.224.56 |
| 260 | +# - Wrong ingress class (must be traefik) |
| 261 | +# - Rate limited by Let's Encrypt |
| 262 | +``` |
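
Once the likely cause is fixed, `kubectl wait` can block until the certificate is actually issued. A minimal sketch, assuming the Certificate resource carries the same name as the ingress TLS secret (for example `myapp-tls`); the function name `wait_for_cert` and the 300-second default are hypothetical.

```bash
# Hypothetical helper: block until a cert-manager Certificate reports Ready.
wait_for_cert() {
  local name="$1" ns="${2:-taskflow}" timeout="${3:-300s}"
  KUBECONFIG="$HOME/.kube/config-hetzner" kubectl wait --for=condition=Ready \
    "certificate/$name" -n "$ns" --timeout="$timeout"
}
```

For example, `wait_for_cert myapp-tls myapp` exits non-zero if the certificate is still not ready after five minutes.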
| 263 | + |
| 264 | +### Dapr Sidecar Issues |
| 265 | +```bash |
| 266 | +# Check Dapr system |
| 267 | +KUBECONFIG=~/.kube/config-hetzner kubectl get pods -n dapr-system |
| 268 | + |
| 269 | +# Check component |
| 270 | +KUBECONFIG=~/.kube/config-hetzner kubectl get components -n taskflow |
| 271 | + |
| 272 | +# Common causes: |
| 273 | +# - Redis connection failed (check Upstash host/password) |
| 274 | +# - Component misconfigured |
| 275 | +``` |
| 276 | + |
| 277 | +### Service Unreachable |
| 278 | +```bash |
| 279 | +# Test from inside cluster |
| 280 | +KUBECONFIG=~/.kube/config-hetzner kubectl run curl --rm -it --restart=Never --image=curlimages/curl -- \ |
| 281 | + curl http://taskflow-api.taskflow.svc.cluster.local:8000/health |
| 282 | + |
| 283 | +# Check service endpoints |
| 284 | +KUBECONFIG=~/.kube/config-hetzner kubectl get endpoints -n taskflow |
| 285 | +``` |
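
The in-cluster test above bypasses Traefik and TLS, so a check from outside the cluster is a useful complement. A minimal sketch, assuming `curl` is installed locally; the function name `check_public_endpoints` is hypothetical, and the host list is taken from the Cluster Identity section.

```bash
# Hypothetical helper: print the HTTP status code each public hostname returns.
check_public_endpoints() {
  local host code
  for host in avixato.com sso.avixato.com api.avixato.com mcp.avixato.com; do
    # curl reports 000 when the connection or TLS handshake fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$host")"
    echo "$host $code"
  done
}
```

A `000` next to a host points at DNS, ingress, or certificate problems rather than at the application itself.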
| 286 | + |
| 287 | +## SSH Server Access |
| 288 | + |
| 289 | +```bash |
| 290 | +# Direct SSH |
| 291 | +ssh root@46.224.224.56 |
| 292 | + |
| 293 | +# On server, use K3s kubectl |
| 294 | +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml |
| 295 | +kubectl get pods -A |
| 296 | + |
| 297 | +# Server-side Helm (if needed) |
| 298 | +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml |
| 299 | +helm list -A |
| 300 | +``` |
| 301 | + |
| 302 | +## Backup & Recovery |
| 303 | + |
| 304 | +### Get Current State |
| 305 | +```bash |
| 306 | +# Export workloads and services ("get all" omits ConfigMaps, Secrets, Ingresses, and CRDs) |
| 307 | +KUBECONFIG=~/.kube/config-hetzner kubectl get all -n taskflow -o yaml > taskflow-backup.yaml |
| 308 | + |
| 309 | +# Export secrets (sensitive!) |
| 310 | +KUBECONFIG=~/.kube/config-hetzner kubectl get secrets -n taskflow -o yaml > taskflow-secrets.yaml |
| 311 | +``` |
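
`kubectl get all` covers only a subset of resource kinds, so a fuller export has to name the extra kinds explicitly. A minimal sketch, assuming bash; the function name `backup_namespace` and the output file names are hypothetical.

```bash
# Hypothetical helper: export a namespace into two files, keeping Secrets
# separate and readable only by the current user.
backup_namespace() {
  local ns="$1" dir="${2:-.}"
  local kc="$HOME/.kube/config-hetzner"
  # "get all" skips ConfigMaps and Ingresses, so name them explicitly.
  KUBECONFIG="$kc" kubectl get all,configmap,ingress -n "$ns" -o yaml \
    > "$dir/$ns-backup.yaml" || return 1
  # The subshell limits the restrictive umask to the secrets file.
  ( umask 077; KUBECONFIG="$kc" kubectl get secrets -n "$ns" -o yaml \
    > "$dir/$ns-secrets.yaml" )
}
```

For example, `backup_namespace taskflow ./backups` writes `taskflow-backup.yaml` and `taskflow-secrets.yaml`. Custom resources such as `components.dapr.io` and `certificates.cert-manager.io` could be appended to the resource list in the same way. Secret values in the export are only Base64-encoded, not encrypted.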
| 312 | + |
| 313 | +### Restore (if needed) |
| 314 | +```bash |
| 315 | +# Reapply resources |
| 316 | +KUBECONFIG=~/.kube/config-hetzner kubectl apply -f taskflow-backup.yaml |
| 317 | +``` |
| 318 | + |
| 319 | +## Maintenance |
| 320 | + |
| 321 | +### Update Dapr |
| 322 | +```bash |
| 323 | +KUBECONFIG=~/.kube/config-hetzner helm repo update |
| 324 | +KUBECONFIG=~/.kube/config-hetzner helm upgrade dapr dapr/dapr -n dapr-system |
| 325 | +``` |
| 326 | + |
| 327 | +### Update cert-manager |
| 328 | +```bash |
| 329 | +KUBECONFIG=~/.kube/config-hetzner kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml  # v1.14.0 is the installed version; change the tag to the target release |
| 330 | +``` |
| 331 | + |
| 332 | +### K3s Updates (on server) |
| 333 | +```bash |
| 334 | +ssh root@46.224.224.56 |
| 335 | +curl -sfL https://get.k3s.io | sh -  # run on the server after SSH; re-running the installer upgrades K3s to the current stable release |
| 336 | +``` |
| 337 | + |
| 338 | +## Cost Tracking |
| 339 | + |
| 340 | +| Item | Monthly Cost | |
| 341 | +|------|-------------| |
| 342 | +| Hetzner CAX31 | $13.49 | |
| 343 | +| Neon PostgreSQL | $0 (free tier) | |
| 344 | +| Upstash Redis | $0 (free tier) | |
| 345 | +| Domain (avixato.com) | ~$1 (amortized) | |
| 346 | +| **Total** | **~$14.50/mo** | |
| 347 | + |
| 348 | +--- |
| 349 | + |
| 350 | +## When User Asks About Hetzner/K3s/Deployment |
| 351 | + |
| 352 | +1. **Always use** `KUBECONFIG=~/.kube/config-hetzner` prefix |
| 353 | +2. **Check pods first** - most issues show in pod status |
| 354 | +3. **Check logs** - errors are usually in container logs |
| 355 | +4. **Ingress = traefik** - never nginx on this cluster |
| 356 | +5. **SSL = cert-manager** - with letsencrypt-prod issuer |
| 357 | +6. **Secrets via GitHub Actions** - never hardcode in values files |