feat: add default-deny network policies and security hardening #1497
Draft
- Add Kyverno ClusterPolicy to generate default-deny CiliumNetworkPolicy and allow-dns policy in every namespace (except kube-system, kube-public, kube-node-lease)
- Add per-namespace CiliumNetworkPolicies co-located with each controller and app, opening only the specific connections needed
- Harden auth-proxy Deployment with runAsNonRoot, capabilities drop, and readOnlyRootFilesystem (fixes kubescape C-0013)
- Harden minio Deployment and Job (Docker-only) with non-root security context (fixes kubescape C-0013)
- Replace flux-operator's narrow gateway-only networkpolicy with a broader flux-system allow policy covering all Flux controllers

Namespaces with network policies: cert-manager, cnpg-system, dex, external-dns, flux-system, headlamp, homepage, keda, kubescape, kyverno, kubelet-serving-cert-approver, longhorn-system, monitoring, oauth2-proxy, opencost, reloader, velero, vertical-pod-autoscaler, wedding-app, whoami

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR introduces namespace-level network isolation across the platform by generating default-deny policies, adding explicit CiliumNetworkPolicies for selected workloads, and tightening a few pod security contexts. It fits the repo’s GitOps/Kustomize layout by wiring the new security resources into base and provider-specific kustomizations.
Changes:
- Adds a Kyverno `ClusterPolicy` that generates `default-deny` and DNS-allow Cilium policies for namespaces.
- Adds explicit allow-list `CiliumNetworkPolicy` manifests for selected apps and controllers, plus kustomization entries to include them.
- Hardens `auth-proxy` and Docker-only `minio` workloads with stricter security contexts.
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| k8s/providers/hetzner/infrastructure/controllers/longhorn/networkpolicy.yaml | Adds Longhorn namespace traffic policy. |
| k8s/providers/hetzner/infrastructure/controllers/longhorn/kustomization.yaml | Includes Longhorn network policy in overlay. |
| k8s/providers/hetzner/infrastructure/controllers/kubelet-serving-cert-approver/networkpolicy.yaml | Adds kubelet cert approver traffic policy. |
| k8s/providers/hetzner/infrastructure/controllers/kubelet-serving-cert-approver/kustomization.yaml | Includes kubelet cert approver policy. |
| k8s/providers/hetzner/infrastructure/controllers/external-dns/networkpolicy.yaml | Adds ExternalDNS traffic policy. |
| k8s/providers/hetzner/infrastructure/controllers/external-dns/kustomization.yaml | Includes ExternalDNS policy. |
| k8s/providers/docker/infrastructure/controllers/minio/deployment.yaml | Hardens local MinIO deployment and bucket-init job. |
| k8s/bases/infrastructure/controllers/vertical-pod-autoscaler/networkpolicy.yaml | Adds VPA namespace traffic policy. |
| k8s/bases/infrastructure/controllers/vertical-pod-autoscaler/kustomization.yaml | Includes VPA policy. |
| k8s/bases/infrastructure/controllers/velero/networkpolicy.yaml | Adds Velero traffic policy. |
| k8s/bases/infrastructure/controllers/velero/kustomization.yaml | Includes Velero policy. |
| k8s/bases/infrastructure/controllers/reloader/networkpolicy.yaml | Adds Reloader traffic policy. |
| k8s/bases/infrastructure/controllers/reloader/kustomization.yaml | Includes Reloader policy. |
| k8s/bases/infrastructure/controllers/opencost/networkpolicy.yaml | Adds OpenCost traffic policy. |
| k8s/bases/infrastructure/controllers/opencost/kustomization.yaml | Includes OpenCost policy. |
| k8s/bases/infrastructure/controllers/oauth2-proxy/networkpolicy.yaml | Adds oauth2-proxy namespace traffic policy. |
| k8s/bases/infrastructure/controllers/oauth2-proxy/kustomization.yaml | Includes oauth2-proxy policy. |
| k8s/bases/infrastructure/controllers/kyverno/networkpolicy.yaml | Adds Kyverno traffic policy. |
| k8s/bases/infrastructure/controllers/kyverno/kustomization.yaml | Includes Kyverno policy. |
| k8s/bases/infrastructure/controllers/kubescape/networkpolicy.yaml | Adds Kubescape traffic policy. |
| k8s/bases/infrastructure/controllers/kubescape/kustomization.yaml | Includes Kubescape policy. |
| k8s/bases/infrastructure/controllers/kube-prometheus-stack/networkpolicy.yaml | Adds monitoring namespace traffic policy. |
| k8s/bases/infrastructure/controllers/kube-prometheus-stack/kustomization.yaml | Includes monitoring policy. |
| k8s/bases/infrastructure/controllers/keda/networkpolicy.yaml | Adds KEDA namespace traffic policy. |
| k8s/bases/infrastructure/controllers/keda/kustomization.yaml | Includes KEDA policy. |
| k8s/bases/infrastructure/controllers/flux-operator/networkpolicy.yaml | Expands Flux namespace policy to broader allow-list rules. |
| k8s/bases/infrastructure/controllers/dex/networkpolicy.yaml | Adds Dex traffic policy. |
| k8s/bases/infrastructure/controllers/dex/kustomization.yaml | Includes Dex policy. |
| k8s/bases/infrastructure/controllers/cloudnative-pg/networkpolicy.yaml | Adds CNPG operator traffic policy. |
| k8s/bases/infrastructure/controllers/cloudnative-pg/kustomization.yaml | Includes CNPG policy. |
| k8s/bases/infrastructure/controllers/cert-manager/networkpolicy.yaml | Adds cert-manager namespace traffic policy. |
| k8s/bases/infrastructure/controllers/cert-manager/kustomization.yaml | Includes cert-manager policy. |
| k8s/bases/infrastructure/controllers/auth-proxy/deployment.yaml | Hardens auth-proxy pod/container security context. |
| k8s/bases/infrastructure/cluster-policies/kustomization.yaml | Registers the new default-deny Kyverno policy. |
| k8s/bases/infrastructure/cluster-policies/best-practices/add-default-deny.yaml | Generates namespace default-deny and DNS allow policies. |
| k8s/bases/apps/whoami/networkpolicy.yaml | Adds whoami app traffic policy. |
| k8s/bases/apps/whoami/kustomization.yaml | Includes whoami policy. |
| k8s/bases/apps/wedding-app/networkpolicy.yaml | Adds wedding-app namespace traffic policy. |
| k8s/bases/apps/wedding-app/kustomization.yaml | Includes wedding-app policy. |
| k8s/bases/apps/homepage/networkpolicy.yaml | Adds homepage app traffic policy. |
| k8s/bases/apps/homepage/kustomization.yaml | Includes homepage policy. |
| k8s/bases/apps/headlamp/networkpolicy.yaml | Adds Headlamp app traffic policy. |
| k8s/bases/apps/headlamp/kustomization.yaml | Includes Headlamp policy. |
Comment on lines +20 to +24
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:

Comment on lines +7 to +12
  endpointSelector: {}
  ingress:
    # Gateway ingress
    - fromEntities:
        - ingress
      toPorts:

Comment on lines +31 to +35
    # Reach backend services in any namespace
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists
Adds scan: true and scan-framework: nsa to the ksail-cluster action. Requires devantler-tech/ksail#4620 to be merged and the action SHA bumped — until then the inputs are silently ignored (unknown inputs are allowed by composite actions). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CX23 workers (2 vCPU / 4 GB) are at 90-98% CPU request allocation, blocking FleetDM and other workloads from scheduling. CX33 (4 vCPU / 8 GB) doubles the available resources per worker.

Availability check:
- fsn1 (Falkenstein): ✅ available
- nbg1 (Nuremberg): ❌ resource_unavailable
- hel1 (Helsinki): ✅ available

Keeping fsn1 as primary location since CX33 is available there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 45 out of 45 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
k8s/bases/infrastructure/controllers/keda/networkpolicy.yaml:36
- This namespace-wide policy also selects the HTTP add-on’s `interceptor` and `scaler` pods in `keda`, but it never allows traffic from other `keda` pods. The add-on is deployed as separate interceptor/scaler components in the same namespace, so once default-deny is active their internal calls will be blocked and scale-to-zero HTTP routing will stop working.

  endpointSelector: {}
  ingress:
    # Gateway ingress to interceptor proxy
    - fromEntities:
        - ingress
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
    # Webhook from kube-apiserver
    - fromEntities:
        - kube-apiserver
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
    # Metrics scraping
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: monitoring
  egress:
    # Kube API for watching scalers
    - toEntities:
        - kube-apiserver
    # Reach backend services in any namespace
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists
              protocol: TCP
            - port: "4180"
              protocol: TCP
  egress:

  egress:
    # Kube API for dashboard
    - toEntities:
        - kube-apiserver

Comment on lines +26 to +33
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:

  hetzner:
    controlPlaneServerType: cx23
-   workerServerType: cx23
+   workerServerType: cx33

Comment on lines +21 to +27
    # S3-compatible backup target
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
- Update auto-vpa ClusterPolicy to control both CPU and memory (was memory-only), add DaemonSet rule for full workload coverage
- Lower LimitRange defaults from 200m/256Mi to 50m/128Mi to prevent over-requesting on new pods before VPA recommendations take effect
- Increase ResourceQuota limits to accommodate actual cluster capacity
- Enable VPA updater (was 0 replicas) so recommendations are applied continuously via pod eviction
- Disable VPA Helm tests (certgen hook can't schedule on loaded nodes)
- Remove helm-test label from VPA HelmRelease to prevent Kyverno mutation policy from re-enabling tests

Replaces goldilocks VPAs (deleted from cluster) with Kyverno-generated VPAs that actively right-size all workloads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
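As a rough illustration of the lowered LimitRange defaults, a manifest along the following lines would give new containers a small starting request until VPA recommendations arrive. The resource name and namespace are hypothetical, and the commit does not say whether 50m/128Mi are applied as default requests or default limits; this sketch assumes default requests.

```yaml
# Sketch only: name and namespace are hypothetical, and the choice of
# defaultRequest (rather than default limits) is an assumption.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: example
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 50m       # previously 200m per the commit message
        memory: 128Mi  # previously 256Mi per the commit message
```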
…uth-proxy runAsUser

- Fix webhook ingress ports to use pod ports (not service port 443): kyverno=9443, VPA=8000, cert-manager=10250, trust-manager=6443, KEDA=9443+6443, CNPG=9443, kubescape=8443, prometheus-operator=10250
- Add remote-node and host entities to all webhook ingress rules (required for Talos hostNetwork kube-apiserver on Hetzner)
- Add DNS egress (kube-dns:53 UDP+TCP) to ALL CiliumNetworkPolicies
- Add FleetDM CiliumNetworkPolicy
- Fix auth-proxy deployment: add runAsUser: 65532 for traefik container
- Add host/remote-node egress for Longhorn iSCSI communication

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
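The webhook and DNS fixes listed above could look roughly like the following policy for the kyverno namespace. This is a sketch rather than the PR's actual manifest: the policy name is hypothetical, while the port (9443), the entities, and the kube-dns selector are taken from the commit message and the review snippets.

```yaml
# Sketch only: metadata.name is hypothetical.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-kyverno
  namespace: kyverno
spec:
  endpointSelector: {}
  ingress:
    # Admission webhook calls. The rule targets the pod port (9443),
    # not the Service port (443).
    - fromEntities:
        - kube-apiserver
        - remote-node   # kube-apiserver runs with hostNetwork on other nodes
        - host
      toPorts:
        - ports:
            - port: "9443"
              protocol: TCP
  egress:
    # DNS resolution via kube-dns
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
```

The same shape would apply to the other webhooks named in the commit, with the port swapped for the relevant pod port.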
Comment on lines +27 to +32
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease

      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP

Comment on lines +7 to +11
  endpointSelector: {}
  ingress:
    # Gateway ingress
    - fromEntities:
        - ingress

Comment on lines +17 to +20
  egress:
    # Kube API for dashboard
    - toEntities:
        - kube-apiserver
      toPorts:
        - ports:
            - port: "9443"
              protocol: TCP

Comment on lines +27 to +30
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists

Comment on lines +35 to +39
    # Reach backend services in any namespace
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists

Comment on lines +68 to +71
  # applies them at pod start. Updater evicts pods to apply new resource
  # recommendations continuously.
  vpa_recommender_replicas: "1"
- vpa_updater_replicas: "0"
+ vpa_updater_replicas: "1"
The add-default-deny ClusterPolicy generates CiliumNetworkPolicy resources in namespaces. Kyverno needs list/get/create/update/patch/delete permissions for cilium.io/ciliumnetworkpolicies to fulfill this. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
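A minimal sketch of the extra permissions described in this commit is shown below. The ClusterRole name is hypothetical, and how the role reaches Kyverno's controller (aggregation labels, a ClusterRoleBinding, or Helm values) depends on the Kyverno version and chart configuration, so that part is left out as an assumption.

```yaml
# Sketch only: the name is hypothetical, and the binding/aggregation
# to Kyverno's service account is not shown.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kyverno:generate-ciliumnetworkpolicies
rules:
  - apiGroups: ["cilium.io"]
    resources: ["ciliumnetworkpolicies"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
```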
WORKAROUND: The MySQL StatefulSet fails due to a PVC format issue on Longhorn. Suspending the release to unblock the apps kustomization while the root cause is investigated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment on lines +20 to +33
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:
Comment on lines +17 to +25
  egress:
    # Kube API for dashboard
    - toEntities:
        - kube-apiserver
    # DNS resolution
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns

Comment on lines +7 to +17
  endpointSelector: {}
  ingress:
    # Gateway ingress
    - fromEntities:
        - ingress
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
            - port: "4180"
              protocol: TCP
Comment on lines +8 to +21
  ingress:
    # Intra-namespace (prometheus → alertmanager, etc)
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: monitoring
    # Webhook from kube-apiserver (hostNetwork on control plane nodes)
    - fromEntities:
        - kube-apiserver
        - remote-node
        - host
      toPorts:
        - ports:
            - port: "10250"
              protocol: TCP
The external-scaler pod needs to reach the interceptor on port 9090 within the keda namespace. Without an intra-namespace ingress rule, the default-deny CiliumNetworkPolicy blocks this communication. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- CNPG operator in cnpg-system needs egress to port 8000 (status) and 5432 (postgres) on managed pods in other namespaces
- Wedding-app CNP ingress was referencing wrong namespace (cloudnative-pg → cnpg-system) for the operator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
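The two CNPG fixes above can be sketched as a pair of policies, one on each side of the connection. Policy names are hypothetical, and the port restriction on the wedding-app side is an assumption; the namespaces and ports come from the commit message.

```yaml
# Sketch only: metadata.name values are hypothetical.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cnpg-operator-egress
  namespace: cnpg-system
spec:
  endpointSelector: {}
  egress:
    # Managed PostgreSQL pods can live in any namespace
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists
      toPorts:
        - ports:
            - port: "8000"   # instance status endpoint
              protocol: TCP
            - port: "5432"   # postgres
              protocol: TCP
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cnpg-operator-ingress
  namespace: wedding-app
spec:
  endpointSelector: {}
  ingress:
    # The operator runs in cnpg-system, not cloudnative-pg
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: cnpg-system
      toPorts:
        - ports:
            - port: "8000"
              protocol: TCP
            - port: "5432"
              protocol: TCP
```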
Comment on lines +27 to +37
  egress:
    # Kube API
    - toEntities:
        - kube-apiserver
    # OCI registries (GHCR, etc)
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

Comment on lines +27 to +30
    # Upstream backends (homepage, etc)
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: homepage
              protocol: TCP
    # Intra-namespace communication (scaler→interceptor:9090, etc.)
    - fromEndpoints:
        - {}

Comment on lines +22 to +26
    # Metrics server
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
    # DNS resolution
- Add letsencrypt-prod ClusterIssuer with Cloudflare DNS01 solver
- Update prod variables to use cert-manager.io/ClusterIssuer
- Remove --cloudflare-proxied from external-dns (DNS-only records)
- Remove FleetDM suspend workaround
- Add 20Gi secondary persistence for FleetDM MySQL

Cloudflare Universal SSL only covers *.devantler.tech, not nested *.platform.devantler.tech. Switching to LE with dns01 challenge allows valid browser-trusted TLS for all subdomains via direct LB access.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
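A ClusterIssuer of the kind this commit describes might be sketched as follows. Only the issuer name and the Cloudflare DNS01 solver come from the commit; the email address and both Secret names are placeholders.

```yaml
# Sketch only: email and Secret names are placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
```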
DNS-01 challenge verification requires querying external authoritative nameservers (e.g. Cloudflare's 108.162.192.142:53) directly. The previous CNP only allowed DNS to cluster kube-dns, causing challenge timeout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
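One way to permit the external nameserver lookups described here is an extra egress rule in the cert-manager namespace, sketched below. The policy name is hypothetical, and the PR may scope the rule more narrowly than the `world` entity.

```yaml
# Sketch only: metadata.name is hypothetical.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-dns-lookups
  namespace: cert-manager
spec:
  endpointSelector: {}
  egress:
    # Query authoritative nameservers outside the cluster for DNS-01 checks
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
```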
Longhorn reports 'insufficient storage' for Redis replica volumes despite available disk space (scheduling issue with 3-replica policy). hcloud block storage is more reliable for persistent data. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StatefulSet volumeClaimTemplates are immutable - force: true tells Helm to delete+recreate instead of patch when upgrade would fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HTTPRoute for FleetDM routes through KEDA HTTP interceptor (namespace: keda) for scale-to-zero. FleetDM CNP was only allowing fromEntities: [ingress] but the KEDA pod has a regular identity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
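Because the KEDA interceptor is an ordinary pod rather than the `ingress` entity, the FleetDM policy needs an endpoint-based rule as well. A sketch under assumed names (the policy name and the `fleetdm` namespace are hypothetical, and no port restriction is shown because the source does not give one):

```yaml
# Sketch only: metadata.name and namespace are hypothetical.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-fleetdm-ingress
  namespace: fleetdm
spec:
  endpointSelector: {}
  ingress:
    # Direct gateway traffic
    - fromEntities:
        - ingress
    # Traffic proxied through the KEDA HTTP interceptor
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: keda
```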
Summary
Production E2E testing identified several issues. This PR adds security hardening:
Changes
Default-deny network policies (Kyverno ClusterPolicy)
- `add-default-deny` ClusterPolicy generates a `default-deny` CiliumNetworkPolicy (whitelist mode) and an `allow-dns` CiliumNetworkPolicy (DNS egress to kube-dns) in every namespace except kube-system/kube-public/kube-node-lease
Per-namespace CiliumNetworkPolicies (co-located with controllers/apps)
Security context hardening (kubescape C-0013)
- auth-proxy: `runAsNonRoot`, `capabilities.drop: ALL`, `readOnlyRootFilesystem`
- minio: `runAsNonRoot`, `runAsUser: 1000`, `capabilities.drop: ALL`
Issues filed
- `cluster update` still missing `cluster-autoscaler-config` and `hcloud` secrets
Validation
- `ksail workload scan` compliance: 80% (remaining failures are Docker-local minio or Helm-chart-managed resources)
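The auth-proxy hardening mentioned in the summary could be expressed as a Deployment fragment like the one below. It is a partial sketch: only the security fields named in the PR are shown, the `traefik` container name and `runAsUser: 65532` come from an earlier commit message, and the split between pod-level and container-level fields is an assumption.

```yaml
# Fragment only: image, selectors, and other required Deployment fields
# are omitted; field placement is an assumption.
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
      containers:
        - name: traefik
          securityContext:
            runAsUser: 65532
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
```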