Commit d8954e7

bump: logs
1 parent 0126261 commit d8954e7

4 files changed

Lines changed: 1561 additions & 0 deletions

Lines changed: 357 additions & 0 deletions
@@ -0,0 +1,357 @@
1+
# Skill: Hetzner K3s Cluster Manager
2+
3+
> **Purpose:** Manage the Hetzner CAX31 K3s cluster for TaskFlow and related projects
4+
> **When to use:** Any task involving deployment, debugging, scaling, or monitoring on the Hetzner VPS
5+
6+
## Cluster Identity
7+
8+
```yaml
9+
Server:
10+
  hostname: junaid-k8-lab
11+
  ip: 46.224.224.56
12+
  ssh: root@46.224.224.56
13+
  specs: CAX31 ARM64 | 8 vCPU | 16GB RAM | 160GB SSD
14+
  cost: $13.49/mo
15+
  provider: Hetzner Cloud
16+
17+
Kubernetes:
18+
  distribution: K3s v1.34.3
19+
  kubeconfig_local: ~/.kube/config-hetzner
20+
  kubeconfig_server: /etc/rancher/k3s/k3s.yaml
21+
  ingress: Traefik (built-in)
22+
  load_balancer: ServiceLB (built-in, no cost)
23+
24+
Domain: avixato.com
25+
subdomains:
26+
- avixato.com (web-dashboard)
27+
- sso.avixato.com (sso-platform)
28+
- api.avixato.com (taskflow-api)
29+
- mcp.avixato.com (mcp-server)
30+
```
31+
32+
## Quick Access Commands
33+
34+
```bash
35+
# Always prefix with KUBECONFIG or export it
36+
export KUBECONFIG=~/.kube/config-hetzner
37+
38+
# Or prefix each command
39+
KUBECONFIG=~/.kube/config-hetzner kubectl get pods -A
40+
```
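To avoid repeating the prefix, a small wrapper function can pin the kubeconfig for a single command. This is a minimal sketch; the function name `hk` is a hypothetical choice and not part of the original setup.

```bash
# hk: run kubectl against the Hetzner cluster without exporting KUBECONFIG
# globally. The name "hk" is illustrative; define it in ~/.bashrc or ~/.zshrc.
hk() {
  KUBECONFIG="$HOME/.kube/config-hetzner" kubectl "$@"
}
```

Usage: `hk get pods -A`. Other clusters configured in the same shell stay unaffected because the variable is only set for that one call.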
41+
42+
## Installed Components
43+
44+
| Component | Namespace | Purpose | Version |
45+
|-----------|-----------|---------|---------|
46+
| K3s | - | Kubernetes distribution | v1.34.3 |
47+
| Traefik | kube-system | Ingress controller | bundled |
48+
| ServiceLB | kube-system | Load balancer (host ports) | bundled |
49+
| CoreDNS | kube-system | Cluster DNS | bundled |
50+
| Dapr | dapr-system | Pub/sub, state, workflows | v1.15.13 |
51+
| cert-manager | cert-manager | Auto SSL via Let's Encrypt | v1.14.0 |
52+
53+
## Namespaces & Applications
54+
55+
### taskflow (Production App)
56+
```
57+
Pods:
58+
- sso-platform (Better Auth SSO)
59+
- taskflow-api (FastAPI + Dapr sidecar)
60+
- web-dashboard (Next.js 15)
61+
- mcp-server (FastMCP)
62+
- notification-service (FastAPI + Dapr sidecar)
63+
64+
Resources:
65+
CPU Requests: 450m | Limits: 2250m
66+
Memory Requests: 1152Mi | Limits: 2304Mi
67+
```
68+
69+
## External Managed Services
70+
71+
### Neon PostgreSQL (Free Tier)
72+
```yaml
73+
console: https://console.neon.tech
74+
databases:
75+
- sso-v1 (SSO users, sessions)
76+
- api-v1 (Tasks, projects, workers)
77+
- chatkit-v1 (Chat conversations)
78+
- notify-v1 (Notifications, reminders)
79+
connection: Use pooler endpoint with ?sslmode=require
80+
```
81+
82+
### Upstash Redis (Free Tier)
83+
```yaml
84+
console: https://console.upstash.com
85+
host: refined-ant-42302.upstash.io:6379
86+
tls: required  # Dapr component metadata: enableTLS: "true"
87+
usage: Dapr pub/sub messaging
88+
```
89+
90+
## GitHub Actions Integration
91+
92+
### Repository: mjunaidca/taskforce
93+
94+
**Secrets (sensitive):**
95+
| Secret | Purpose |
96+
|--------|---------|
97+
| KUBECONFIG | Base64-encoded K3s kubeconfig |
98+
| UPSTASH_REDIS_HOST | Redis host:port |
99+
| UPSTASH_REDIS_PASSWORD | Redis password |
100+
| NEON_SSO_DATABASE_URL | SSO PostgreSQL |
101+
| NEON_API_DATABASE_URL | API PostgreSQL |
102+
| NEON_CHATKIT_DATABASE_URL | ChatKit PostgreSQL |
103+
| NEON_NOTIFICATION_DATABASE_URL | Notification PostgreSQL |
104+
| OPENAI_API_KEY | AI agent responses |
105+
| BETTER_AUTH_SECRET | SSO encryption |
106+
| SMTP_USER / SMTP_PASSWORD | Email sending |
107+
108+
**Variables (non-sensitive):**
109+
| Variable | Value |
110+
|----------|-------|
111+
| CLOUD_PROVIDER | kubeconfig |
112+
| INGRESS_CLASS | traefik |
113+
| DOMAIN | avixato.com |
114+
115+
## Common Operations
116+
117+
### Check Cluster Health
118+
```bash
119+
# Node status
120+
KUBECONFIG=~/.kube/config-hetzner kubectl get nodes
121+
122+
# All pods across namespaces
123+
KUBECONFIG=~/.kube/config-hetzner kubectl get pods -A
124+
125+
# Resource usage
126+
KUBECONFIG=~/.kube/config-hetzner kubectl top nodes
127+
KUBECONFIG=~/.kube/config-hetzner kubectl top pods -A
128+
```
129+
130+
### Check TaskFlow Status
131+
```bash
132+
# Pods
133+
KUBECONFIG=~/.kube/config-hetzner kubectl get pods -n taskflow
134+
135+
# Logs
136+
KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/taskflow-api -c api --tail=100
137+
KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/sso-platform --tail=100
138+
139+
# SSL Certificates
140+
KUBECONFIG=~/.kube/config-hetzner kubectl get certificates -n taskflow
141+
```
142+
143+
### Restart Services
144+
```bash
145+
# Restart a deployment (picks up new secrets/configmaps)
146+
KUBECONFIG=~/.kube/config-hetzner kubectl rollout restart deployment/taskflow-api -n taskflow
147+
148+
# Restart all TaskFlow deployments
149+
KUBECONFIG=~/.kube/config-hetzner kubectl rollout restart deployment -n taskflow
150+
```
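A restart returns immediately, so it helps to wait for the rollout to finish before checking logs. The sketch below chains `kubectl rollout restart` with `kubectl rollout status`; the function name `restart_and_wait` and the default timeout are assumptions.

```bash
# restart_and_wait DEPLOYMENT [TIMEOUT]
# Restarts one deployment in the taskflow namespace, then blocks until the
# rollout completes or the timeout expires (non-zero exit status on failure).
restart_and_wait() {
  KUBECONFIG="$HOME/.kube/config-hetzner" kubectl rollout restart "deployment/$1" -n taskflow &&
    KUBECONFIG="$HOME/.kube/config-hetzner" kubectl rollout status "deployment/$1" -n taskflow \
      --timeout="${2:-120s}"   # 120s is an illustrative default
}
```

Usage: `restart_and_wait taskflow-api 180s`.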
151+
152+
### View Logs for Debugging
153+
```bash
154+
# API errors
155+
KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/taskflow-api -c api --tail=200 | grep -i error
156+
157+
# Dapr sidecar logs
158+
KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/taskflow-api -c daprd --tail=100
159+
160+
# SSO logs
161+
KUBECONFIG=~/.kube/config-hetzner kubectl logs -n taskflow deployment/sso-platform --tail=200
162+
```
163+
164+
### Scale Deployments
165+
```bash
166+
# Scale up
167+
KUBECONFIG=~/.kube/config-hetzner kubectl scale deployment/taskflow-api -n taskflow --replicas=2
168+
169+
# Scale down
170+
KUBECONFIG=~/.kube/config-hetzner kubectl scale deployment/taskflow-api -n taskflow --replicas=1
171+
```
172+
173+
### Helm Operations
174+
```bash
175+
# List releases
176+
KUBECONFIG=~/.kube/config-hetzner helm list -n taskflow
177+
178+
# Upgrade/redeploy
179+
KUBECONFIG=~/.kube/config-hetzner helm upgrade taskflow ./infrastructure/helm/taskflow \
180+
-n taskflow -f infrastructure/helm/taskflow/values-hetzner.yaml \
181+
--set "api.openai.apiKey=$OPENAI_API_KEY" \
182+
# ... other --set flags
183+
184+
# Uninstall (CAREFUL!)
185+
KUBECONFIG=~/.kube/config-hetzner helm uninstall taskflow -n taskflow
186+
```
187+
188+
## Deploying New Projects
189+
190+
### Step 1: Create Namespace
191+
```bash
192+
KUBECONFIG=~/.kube/config-hetzner kubectl create namespace <project-name>
193+
```
194+
195+
### Step 2: Create GHCR Pull Secret
196+
```bash
197+
KUBECONFIG=~/.kube/config-hetzner kubectl create secret docker-registry ghcr-secret \
198+
--namespace <project-name> \
199+
--docker-server=ghcr.io \
200+
--docker-username=<github-user> \
201+
--docker-password="$(gh auth token)"
202+
```
203+
204+
### Step 3: Configure Ingress (Traefik + cert-manager)
205+
```yaml
206+
ingress:
207+
  enabled: true
208+
  className: traefik # MUST be traefik, not nginx
209+
  host: myapp.avixato.com
210+
  annotations:
211+
    cert-manager.io/cluster-issuer: letsencrypt-prod
212+
  tls:
213+
    enabled: true
214+
    secretName: myapp-tls
215+
```
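For a project whose chart does not template an Ingress, the values above correspond roughly to a plain `networking.k8s.io/v1` manifest. The sketch below prints one; the function name `render_ingress` and the backend service name and port arguments are hypothetical, since the document does not specify them.

```bash
# render_ingress NAME HOST SERVICE PORT
# Prints an Ingress for Traefik with the cert-manager cluster-issuer annotation.
# SERVICE and PORT are placeholders for the project's own backend Service.
render_ingress() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: $1
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts: [$2]
      secretName: $1-tls
  rules:
    - host: $2
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: $3
                port:
                  number: $4
EOF
}
```

Usage: `render_ingress myapp myapp.avixato.com myapp-web 3000 | KUBECONFIG=~/.kube/config-hetzner kubectl apply -n <project-name> -f -`.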
216+
217+
### Step 4: Add DNS Record
218+
Add an A record: `myapp.avixato.com` → `46.224.224.56`
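Certificate issuance fails until the record has propagated, so it is worth checking DNS before deploying. A minimal sketch, assuming `dig` is installed locally; the function name `dns_points_here` is hypothetical.

```bash
# dns_points_here HOST
# Succeeds (exit 0) if one of HOST's A records is the cluster IP 46.224.224.56.
dns_points_here() {
  dig +short A "$1" | grep -qx '46\.224\.224\.56'
}
```

Usage: `dns_points_here myapp.avixato.com && echo "DNS ready"`.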
219+
220+
### Step 5: Cross-Namespace SSO Access
221+
```yaml
222+
env:
223+
SSO_URL: http://sso-platform.taskflow.svc.cluster.local:3001
224+
```
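The URL above follows the in-cluster DNS pattern `<service>.<namespace>.svc.cluster.local`. The helper below only builds that string; it assumes the default `cluster.local` cluster domain, and the name `svc_url` is hypothetical.

```bash
# svc_url SERVICE NAMESPACE PORT
# Prints the in-cluster HTTP URL of a Service in another namespace.
svc_url() {
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$1" "$2" "$3"
}
```

Usage: `svc_url sso-platform taskflow 3001` prints the `SSO_URL` value shown above.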
225+
226+
## Resource Capacity
227+
228+
```
229+
Total: 8000m CPU | 16384Mi Memory
230+
Used: ~250m CPU | ~1800Mi Memory (3% | 11%)
231+
Available: ~7750m CPU | ~14500Mi Memory
232+
233+
Estimate: Can fit 8-10 more TaskFlow-sized projects
234+
```
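The estimate depends on whether a new project is sized by its requests or by its limits. The sketch below works through both cases with the TaskFlow figures from this document. It assumes the "Available" numbers above can be treated as schedulable capacity, although they are derived from current usage rather than from summed requests. The function name `fits` is hypothetical.

```bash
# fits AVAIL_CPU_M AVAIL_MEM_MI PER_CPU_M PER_MEM_MI
# Prints how many copies fit, limited by whichever resource runs out first.
fits() {
  by_cpu=$(( $1 / $3 ))
  by_mem=$(( $2 / $4 ))
  if [ "$by_cpu" -lt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

fits 7750 14500 450 1152    # sized by TaskFlow requests -> 12 (memory-bound)
fits 7750 14500 2250 2304   # sized by TaskFlow limits   -> 3  (CPU-bound)
```

The 8-10 figure sits between these two bounds. The scheduler places pods by requests, so more projects will schedule than the limits-based count suggests, but they can contend for CPU when several burst at once.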
235+
236+
## Troubleshooting Playbook
237+
238+
### Pod Not Starting
239+
```bash
240+
# Check events
241+
KUBECONFIG=~/.kube/config-hetzner kubectl describe pod <pod> -n <ns>
242+
243+
# Common causes:
244+
# - ImagePullBackOff: GHCR secret missing or wrong
245+
# - CrashLoopBackOff: Check logs, likely env var or DB connection
246+
# - Pending: node cannot satisfy the pod's resource requests
247+
```
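Before describing a specific pod, it can help to find which pods are unhealthy and what the cluster last reported. A minimal sketch using standard `kubectl` field selectors and event sorting; the function names are hypothetical.

```bash
# unhealthy_pods: list pods in any namespace whose phase is neither Running
# nor Succeeded (this catches Pending, including ImagePullBackOff).
unhealthy_pods() {
  KUBECONFIG="$HOME/.kube/config-hetzner" kubectl get pods -A \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# recent_events NAMESPACE: show events sorted by time, newest at the bottom.
recent_events() {
  KUBECONFIG="$HOME/.kube/config-hetzner" kubectl get events -n "$1" --sort-by=.lastTimestamp
}
```

Pods in CrashLoopBackOff still report phase `Running`, so they do not appear in `unhealthy_pods`. Look for a high restart count in `kubectl get pods` output instead.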
248+
249+
### SSL Certificate Not Issuing
250+
```bash
251+
# Check cert-manager
252+
KUBECONFIG=~/.kube/config-hetzner kubectl logs -n cert-manager deploy/cert-manager
253+
254+
# Check challenges
255+
KUBECONFIG=~/.kube/config-hetzner kubectl get challenges -A
256+
KUBECONFIG=~/.kube/config-hetzner kubectl describe challenge <name> -n <ns>
257+
258+
# Common causes:
259+
# - DNS not pointing to 46.224.224.56
260+
# - Wrong ingress class (must be traefik)
261+
# - Rate limited by Let's Encrypt
262+
```
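cert-manager issues a certificate through a chain of objects: Certificate, CertificateRequest, Order, then Challenge. Listing them together shows where issuance stalled. A minimal sketch; the function name `cert_chain` is hypothetical.

```bash
# cert_chain NAMESPACE
# Lists every cert-manager object in the ACME issuance chain for a namespace.
cert_chain() {
  KUBECONFIG="$HOME/.kube/config-hetzner" kubectl get \
    certificate,certificaterequest,order,challenge -n "$1"
}
```

Usage: `cert_chain taskflow`. A Certificate that is not Ready with no Order behind it usually points at the issuer, and a pending Challenge usually points at DNS or the ingress class.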
263+
264+
### Dapr Sidecar Issues
265+
```bash
266+
# Check Dapr system
267+
KUBECONFIG=~/.kube/config-hetzner kubectl get pods -n dapr-system
268+
269+
# Check component
270+
KUBECONFIG=~/.kube/config-hetzner kubectl get components -n taskflow
271+
272+
# Common causes:
273+
# - Redis connection failed (check Upstash host/password)
274+
# - Component misconfigured
275+
```
276+
277+
### Service Unreachable
278+
```bash
279+
# Test from inside cluster
280+
KUBECONFIG=~/.kube/config-hetzner kubectl run curl --rm -it --image=curlimages/curl -- \
281+
curl http://taskflow-api.taskflow.svc.cluster.local:8000/health
282+
283+
# Check service endpoints
284+
KUBECONFIG=~/.kube/config-hetzner kubectl get endpoints -n taskflow
285+
```
286+
287+
## SSH Server Access
288+
289+
```bash
290+
# Direct SSH
291+
ssh root@46.224.224.56
292+
293+
# On server, use K3s kubectl
294+
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
295+
kubectl get pods -A
296+
297+
# Server-side Helm (if needed)
298+
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
299+
helm list -A
300+
```
301+
302+
## Backup & Recovery
303+
304+
### Get Current State
305+
```bash
306+
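# Export core workload resources ("get all" omits ingresses, configmaps, secrets, and CRDs such as certificates and Dapr components)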
307+
KUBECONFIG=~/.kube/config-hetzner kubectl get all -n taskflow -o yaml > taskflow-backup.yaml
308+
309+
# Export secrets (sensitive!)
310+
KUBECONFIG=~/.kube/config-hetzner kubectl get secrets -n taskflow -o yaml > taskflow-secrets.yaml
311+
```
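`kubectl get all` returns only core workload kinds, so a fuller backup needs the remaining kinds exported explicitly. The sketch below writes one file per kind; the function name `export_extras` is hypothetical and the kind list is an assumption based on the components named in this document.

```bash
# export_extras NAMESPACE OUTDIR
# Dumps kinds that "kubectl get all" leaves out, one YAML file per kind.
export_extras() {
  ns="$1"; out="$2"
  mkdir -p "$out"
  for kind in ingress configmap certificate components.dapr.io; do
    KUBECONFIG="$HOME/.kube/config-hetzner" kubectl get "$kind" -n "$ns" -o yaml > "$out/$kind.yaml"
  done
}
```

Usage: `export_extras taskflow ./taskflow-backup`.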
312+
313+
### Restore (if needed)
314+
```bash
315+
# Reapply resources
316+
KUBECONFIG=~/.kube/config-hetzner kubectl apply -f taskflow-backup.yaml
317+
```
318+
319+
## Maintenance
320+
321+
### Update Dapr
322+
```bash
323+
KUBECONFIG=~/.kube/config-hetzner helm repo update
324+
KUBECONFIG=~/.kube/config-hetzner helm upgrade dapr dapr/dapr -n dapr-system
325+
```
326+
327+
### Update cert-manager
328+
```bash
329+
# v1.14.0 is the currently installed release; change the version in the URL to upgrade
KUBECONFIG=~/.kube/config-hetzner kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
330+
```
331+
332+
### K3s Updates (on server)
333+
```bash
334+
ssh root@46.224.224.56
335+
curl -sfL https://get.k3s.io | sh -
336+
```
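Re-running the installer without arguments moves the server to the latest release of the stable channel. To control which release is installed, the installer reads the `INSTALL_K3S_VERSION` environment variable. A minimal sketch to run on the server; the function name `k3s_install` is hypothetical, and the tag in the usage line is an assumed example of the tag format, not a verified release.

```bash
# k3s_install VERSION
# Re-runs the K3s installer pinned to one release tag instead of the latest
# stable release. VERSION must be a full K3s tag from the K3s releases page.
k3s_install() {
  curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="$1" sh -
}
```

Usage: `k3s_install v1.34.3+k3s1` (assumed tag format).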
337+
338+
## Cost Tracking
339+
340+
| Item | Monthly Cost |
341+
|------|-------------|
342+
| Hetzner CAX31 | $13.49 |
343+
| Neon PostgreSQL | $0 (free tier) |
344+
| Upstash Redis | $0 (free tier) |
345+
| Domain (avixato.com) | ~$1 (amortized) |
346+
| **Total** | **~$14.50/mo** |
347+
348+
---
349+
350+
## When User Asks About Hetzner/K3s/Deployment
351+
352+
1. **Always use** `KUBECONFIG=~/.kube/config-hetzner` prefix
353+
2. **Check pods first** - most issues show in pod status
354+
3. **Check logs** - errors are usually in container logs
355+
4. **Ingress = traefik** - never nginx on this cluster
356+
5. **SSL = cert-manager** - with letsencrypt-prod issuer
357+
6. **Secrets via GitHub Actions** - never hardcode in values files
