feat: add default-deny network policies and security hardening #1497

Draft
devantler wants to merge 14 commits into main from devantler/fix-prod-e2e-testing

@devantler

Summary

Production E2E testing identified several issues. This PR adds security hardening:

Changes

Default-deny network policies (Kyverno ClusterPolicy)

  • New add-default-deny ClusterPolicy generates a default-deny CiliumNetworkPolicy (whitelist mode) and an allow-dns CiliumNetworkPolicy (DNS egress to kube-dns) in every namespace except kube-system/kube-public/kube-node-lease
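
A rough sketch of what such a generating policy could look like. The policy name, the rule name generate-default-deny, and the excluded namespaces come from this PR; the rule name generate-allow-dns, the generated resource specs, and the synchronize setting are assumptions for illustration and may differ from the actual add-default-deny.yaml.

```yaml
# Sketch only: the generated specs below are assumptions, not the PR's file.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-deny
spec:
  rules:
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:
        apiVersion: cilium.io/v2
        kind: CiliumNetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            endpointSelector: {}
            # An empty rule selects every pod for enforcement
            # without allowing any peer.
            ingress:
              - {}
            egress:
              - {}
    - name: generate-allow-dns # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:
        apiVersion: cilium.io/v2
        kind: CiliumNetworkPolicy
        name: allow-dns
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            endpointSelector: {}
            egress:
              - toEndpoints:
                  - matchLabels:
                      k8s:io.kubernetes.pod.namespace: kube-system
                      k8s-app: kube-dns
                toPorts:
                  - ports:
                      - port: "53"
                        protocol: UDP
                      - port: "53"
                        protocol: TCP
```

Because Cilium policies are additive allow-lists, the per-namespace policies only need to add the connections each workload requires on top of these two generated resources.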

Per-namespace CiliumNetworkPolicies (co-located with controllers/apps)

  • 20 CiliumNetworkPolicies added, each opening only the specific connections needed
  • Covers: cert-manager, cnpg-system, dex, external-dns, flux-system, headlamp, homepage, keda, kubescape, kyverno, kubelet-serving-cert-approver, longhorn-system, monitoring, oauth2-proxy, opencost, reloader, velero, vertical-pod-autoscaler, wedding-app, whoami

Security context hardening (kubescape C-0013)

  • auth-proxy Deployment: runAsNonRoot, capabilities.drop: ALL, readOnlyRootFilesystem
  • minio Deployment/Job (Docker-only): runAsNonRoot, runAsUser: 1000, capabilities.drop: ALL
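
For illustration, the auth-proxy settings listed above could be expressed roughly as the following Deployment fragment. The container name and image are placeholders, and the split between pod-level and container-level fields is an assumption rather than a copy of the PR's manifest.

```yaml
# Fragment of a Deployment pod template (sketch; names are placeholders).
spec:
  template:
    spec:
      securityContext:
        # Refuse to start containers that would run as UID 0.
        runAsNonRoot: true
      containers:
        - name: auth-proxy # hypothetical container name
          image: example.com/auth-proxy:latest # placeholder image
          securityContext:
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
```

With runAsNonRoot set, the kubelet must be able to verify a numeric non-root UID, so an image that declares no numeric user also needs an explicit runAsUser, which is presumably why the minio workloads set runAsUser: 1000.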

Issues filed

  • wedding-app#25: Admin login broken with multiple replicas (in-memory sessions)
  • ksail#4619: cluster update still missing cluster-autoscaler-config and hcloud secrets
  • FleetDM suspended due to insufficient node capacity (blocked on autoscaler)

Validation

  • All kustomize builds pass (prod, local, hetzner/docker providers)
  • ksail workload scan compliance: 80% (remaining failures are Docker-local minio or Helm-chart-managed resources)

- Add Kyverno ClusterPolicy to generate default-deny CiliumNetworkPolicy
  and allow-dns policy in every namespace (except kube-system, kube-public,
  kube-node-lease)
- Add per-namespace CiliumNetworkPolicies co-located with each controller
  and app, opening only the specific connections needed
- Harden auth-proxy Deployment with runAsNonRoot, capabilities drop, and
  readOnlyRootFilesystem (fixes kubescape C-0013)
- Harden minio Deployment and Job (Docker-only) with non-root security
  context (fixes kubescape C-0013)
- Replace flux-operator's narrow gateway-only networkpolicy with a broader
  flux-system allow policy covering all Flux controllers

Namespaces with network policies: cert-manager, cnpg-system, dex,
external-dns, flux-system, headlamp, homepage, keda, kubescape, kyverno,
kubelet-serving-cert-approver, longhorn-system, monitoring, oauth2-proxy,
opencost, reloader, velero, vertical-pod-autoscaler, wedding-app, whoami

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR introduces namespace-level network isolation across the platform by generating default-deny policies, adding explicit CiliumNetworkPolicies for selected workloads, and tightening a few pod security contexts. It fits the repo’s GitOps/Kustomize layout by wiring the new security resources into base and provider-specific kustomizations.

Changes:

  • Adds a Kyverno ClusterPolicy that generates default-deny and DNS-allow Cilium policies for namespaces.
  • Adds explicit allow-list CiliumNetworkPolicy manifests for selected apps and controllers, plus kustomization entries to include them.
  • Hardens auth-proxy and Docker-only minio workloads with stricter security contexts.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| k8s/providers/hetzner/infrastructure/controllers/longhorn/networkpolicy.yaml | Adds Longhorn namespace traffic policy. |
| k8s/providers/hetzner/infrastructure/controllers/longhorn/kustomization.yaml | Includes Longhorn network policy in overlay. |
| k8s/providers/hetzner/infrastructure/controllers/kubelet-serving-cert-approver/networkpolicy.yaml | Adds kubelet cert approver traffic policy. |
| k8s/providers/hetzner/infrastructure/controllers/kubelet-serving-cert-approver/kustomization.yaml | Includes kubelet cert approver policy. |
| k8s/providers/hetzner/infrastructure/controllers/external-dns/networkpolicy.yaml | Adds ExternalDNS traffic policy. |
| k8s/providers/hetzner/infrastructure/controllers/external-dns/kustomization.yaml | Includes ExternalDNS policy. |
| k8s/providers/docker/infrastructure/controllers/minio/deployment.yaml | Hardens local MinIO deployment and bucket-init job. |
| k8s/bases/infrastructure/controllers/vertical-pod-autoscaler/networkpolicy.yaml | Adds VPA namespace traffic policy. |
| k8s/bases/infrastructure/controllers/vertical-pod-autoscaler/kustomization.yaml | Includes VPA policy. |
| k8s/bases/infrastructure/controllers/velero/networkpolicy.yaml | Adds Velero traffic policy. |
| k8s/bases/infrastructure/controllers/velero/kustomization.yaml | Includes Velero policy. |
| k8s/bases/infrastructure/controllers/reloader/networkpolicy.yaml | Adds Reloader traffic policy. |
| k8s/bases/infrastructure/controllers/reloader/kustomization.yaml | Includes Reloader policy. |
| k8s/bases/infrastructure/controllers/opencost/networkpolicy.yaml | Adds OpenCost traffic policy. |
| k8s/bases/infrastructure/controllers/opencost/kustomization.yaml | Includes OpenCost policy. |
| k8s/bases/infrastructure/controllers/oauth2-proxy/networkpolicy.yaml | Adds oauth2-proxy namespace traffic policy. |
| k8s/bases/infrastructure/controllers/oauth2-proxy/kustomization.yaml | Includes oauth2-proxy policy. |
| k8s/bases/infrastructure/controllers/kyverno/networkpolicy.yaml | Adds Kyverno traffic policy. |
| k8s/bases/infrastructure/controllers/kyverno/kustomization.yaml | Includes Kyverno policy. |
| k8s/bases/infrastructure/controllers/kubescape/networkpolicy.yaml | Adds Kubescape traffic policy. |
| k8s/bases/infrastructure/controllers/kubescape/kustomization.yaml | Includes Kubescape policy. |
| k8s/bases/infrastructure/controllers/kube-prometheus-stack/networkpolicy.yaml | Adds monitoring namespace traffic policy. |
| k8s/bases/infrastructure/controllers/kube-prometheus-stack/kustomization.yaml | Includes monitoring policy. |
| k8s/bases/infrastructure/controllers/keda/networkpolicy.yaml | Adds KEDA namespace traffic policy. |
| k8s/bases/infrastructure/controllers/keda/kustomization.yaml | Includes KEDA policy. |
| k8s/bases/infrastructure/controllers/flux-operator/networkpolicy.yaml | Expands Flux namespace policy to broader allow-list rules. |
| k8s/bases/infrastructure/controllers/dex/networkpolicy.yaml | Adds Dex traffic policy. |
| k8s/bases/infrastructure/controllers/dex/kustomization.yaml | Includes Dex policy. |
| k8s/bases/infrastructure/controllers/cloudnative-pg/networkpolicy.yaml | Adds CNPG operator traffic policy. |
| k8s/bases/infrastructure/controllers/cloudnative-pg/kustomization.yaml | Includes CNPG policy. |
| k8s/bases/infrastructure/controllers/cert-manager/networkpolicy.yaml | Adds cert-manager namespace traffic policy. |
| k8s/bases/infrastructure/controllers/cert-manager/kustomization.yaml | Includes cert-manager policy. |
| k8s/bases/infrastructure/controllers/auth-proxy/deployment.yaml | Hardens auth-proxy pod/container security context. |
| k8s/bases/infrastructure/cluster-policies/kustomization.yaml | Registers the new default-deny Kyverno policy. |
| k8s/bases/infrastructure/cluster-policies/best-practices/add-default-deny.yaml | Generates namespace default-deny and DNS allow policies. |
| k8s/bases/apps/whoami/networkpolicy.yaml | Adds whoami app traffic policy. |
| k8s/bases/apps/whoami/kustomization.yaml | Includes whoami policy. |
| k8s/bases/apps/wedding-app/networkpolicy.yaml | Adds wedding-app namespace traffic policy. |
| k8s/bases/apps/wedding-app/kustomization.yaml | Includes wedding-app policy. |
| k8s/bases/apps/homepage/networkpolicy.yaml | Adds homepage app traffic policy. |
| k8s/bases/apps/homepage/kustomization.yaml | Includes homepage policy. |
| k8s/bases/apps/headlamp/networkpolicy.yaml | Adds Headlamp app traffic policy. |
| k8s/bases/apps/headlamp/kustomization.yaml | Includes Headlamp policy. |

Comment on lines +20 to +24
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:

Comment on lines +7 to +12
  endpointSelector: {}
  ingress:
    # Gateway ingress
    - fromEntities:
        - ingress
      toPorts:

Comment on lines +31 to +35
    # Reach backend services in any namespace
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists

Adds scan: true and scan-framework: nsa to the ksail-cluster action.
Requires devantler-tech/ksail#4620 to be merged and the action SHA
bumped — until then the inputs are silently ignored (unknown inputs
are allowed by composite actions).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CX23 workers (2 vCPU / 4 GB) are at 90-98% CPU request allocation,
blocking FleetDM and other workloads from scheduling.

CX33 (4 vCPU / 8 GB) doubles the available resources per worker.

Availability check:
- fsn1 (Falkenstein): ✅ available
- nbg1 (Nuremberg):   ❌ resource_unavailable
- hel1 (Helsinki):    ✅ available

Keeping fsn1 as primary location since CX33 is available there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 5, 2026 20:19

Copilot AI left a comment

Pull request overview

Copilot reviewed 45 out of 45 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

k8s/bases/infrastructure/controllers/keda/networkpolicy.yaml:36

  • This namespace-wide policy also selects the HTTP add-on’s interceptor and scaler pods in keda, but it never allows traffic from other keda pods. The add-on is deployed as separate interceptor/scaler components in the same namespace, so once default-deny is active their internal calls will be blocked and scale-to-zero HTTP routing will stop working.
  endpointSelector: {}
  ingress:
    # Gateway ingress to interceptor proxy
    - fromEntities:
        - ingress
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
    # Webhook from kube-apiserver
    - fromEntities:
        - kube-apiserver
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
    # Metrics scraping
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: monitoring
  egress:
    # Kube API for watching scalers
    - toEntities:
        - kube-apiserver
    # Reach backend services in any namespace
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists

              protocol: TCP
            - port: "4180"
              protocol: TCP
  egress:
    # Kube API for dashboard
    - toEntities:
        - kube-apiserver

Comment on lines +26 to +33
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:

Comment thread ksail.prod.yaml
  hetzner:
    controlPlaneServerType: cx23
-   workerServerType: cx23
+   workerServerType: cx33

Comment on lines +21 to +27
    # S3-compatible backup target
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

- Update auto-vpa ClusterPolicy to control both CPU and memory (was
  memory-only), add DaemonSet rule for full workload coverage
- Lower LimitRange defaults from 200m/256Mi to 50m/128Mi to prevent
  over-requesting on new pods before VPA recommendations take effect
- Increase ResourceQuota limits to accommodate actual cluster capacity
- Enable VPA updater (was 0 replicas) so recommendations are applied
  continuously via pod eviction
- Disable VPA Helm tests (certgen hook can't schedule on loaded nodes)
- Remove helm-test label from VPA HelmRelease to prevent Kyverno
  mutation policy from re-enabling tests

Replaces goldilocks VPAs (deleted from cluster) with Kyverno-generated
VPAs that actively right-size all workloads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
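
As a hedged illustration of the lowered defaults described in the commit above, a LimitRange along these lines would apply 50m/128Mi to containers that declare no requests. The resource name is hypothetical, and whether the repository sets defaultRequest, default, or both is an assumption.

```yaml
# Sketch only: name and field choice are assumptions.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits # hypothetical name
spec:
  limits:
    - type: Container
      # Applied when a container specifies no requests of its own,
      # until VPA recommendations take over.
      defaultRequest:
        cpu: 50m
        memory: 128Mi
```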
…uth-proxy runAsUser

- Fix webhook ingress ports to use pod ports (not service port 443):
  kyverno=9443, VPA=8000, cert-manager=10250, trust-manager=6443,
  KEDA=9443+6443, CNPG=9443, kubescape=8443, prometheus-operator=10250
- Add remote-node and host entities to all webhook ingress rules
  (required for Talos hostNetwork kube-apiserver on Hetzner)
- Add DNS egress (kube-dns:53 UDP+TCP) to ALL CiliumNetworkPolicies
- Add FleetDM CiliumNetworkPolicy
- Fix auth-proxy deployment: add runAsUser: 65532 for traefik container
- Add host/remote-node egress for Longhorn iSCSI communication

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
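
A minimal sketch of the webhook and DNS rules described in the commit above, using Kyverno's pod port 9443 as the example. The policy name is hypothetical and the exact rule layout in the PR may differ.

```yaml
# Sketch only: metadata.name is a placeholder.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-kyverno # hypothetical name
  namespace: kyverno
spec:
  endpointSelector: {}
  ingress:
    # Admission webhook calls. Policy is enforced after service
    # translation, so the rule names the pod port rather than 443.
    - fromEntities:
        - kube-apiserver
        # A hostNetwork kube-apiserver appears as a node identity.
        - remote-node
        - host
      toPorts:
        - ports:
            - port: "9443"
              protocol: TCP
  egress:
    - toEntities:
        - kube-apiserver
    # DNS resolution via kube-dns
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
```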
Copilot AI review requested due to automatic review settings May 5, 2026 22:39

Copilot AI left a comment

Pull request overview

Copilot reviewed 51 out of 51 changed files in this pull request and generated 10 comments.

Comment on lines +27 to +32
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease

      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP

Comment on lines +7 to +11
  endpointSelector: {}
  ingress:
    # Gateway ingress
    - fromEntities:
        - ingress

Comment on lines +17 to +20
  egress:
    # Kube API for dashboard
    - toEntities:
        - kube-apiserver

Comment on lines +21 to +27
    # S3-compatible backup target
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

Comment thread ksail.prod.yaml
  hetzner:
    controlPlaneServerType: cx23
-   workerServerType: cx23
+   workerServerType: cx33

      toPorts:
        - ports:
            - port: "9443"
              protocol: TCP

Comment on lines +27 to +30
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists

Comment on lines +35 to +39
    # Reach backend services in any namespace
    - toEndpoints:
        - matchExpressions:
            - key: k8s:io.kubernetes.pod.namespace
              operator: Exists

Comment on lines +68 to +71
  # applies them at pod start. Updater evicts pods to apply new resource
  # recommendations continuously.
  vpa_recommender_replicas: "1"
- vpa_updater_replicas: "0"
+ vpa_updater_replicas: "1"

The add-default-deny ClusterPolicy generates CiliumNetworkPolicy
resources in namespaces. Kyverno needs list/get/create/update/patch/delete
permissions for cilium.io/ciliumnetworkpolicies to fulfill this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
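
One way the permissions described in the commit above could be granted is an aggregated ClusterRole, sketched below. The role name is hypothetical, and the aggregation label is an assumption that depends on the installed Kyverno version and chart. The verb list is taken from the commit message.

```yaml
# Sketch only: name and aggregation label are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kyverno:generate-ciliumnetworkpolicies # hypothetical name
  labels:
    # Assumed label for aggregating into the background controller role.
    rbac.kyverno.io/aggregate-to-background-controller: "true"
rules:
  - apiGroups:
      - cilium.io
    resources:
      - ciliumnetworkpolicies
    verbs:
      - list
      - get
      - create
      - update
      - patch
      - delete
```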
WORKAROUND: The MySQL StatefulSet fails due to a PVC format issue on
Longhorn. Suspending the release to unblock the apps kustomization
while the root cause is investigated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 5, 2026 23:06

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated 6 comments.

Comment on lines +20 to +33
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:

Comment on lines +17 to +25
  egress:
    # Kube API for dashboard
    - toEntities:
        - kube-apiserver
    # DNS resolution
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns

Comment on lines +7 to +17
  endpointSelector: {}
  ingress:
    # Gateway ingress
    - fromEntities:
        - ingress
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
            - port: "4180"
              protocol: TCP

Comment on lines +21 to +27
    # S3-compatible backup target
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

Comment on lines +68 to +71
  # applies them at pod start. Updater evicts pods to apply new resource
  # recommendations continuously.
  vpa_recommender_replicas: "1"
- vpa_updater_replicas: "0"
+ vpa_updater_replicas: "1"

Comment on lines +8 to +21
  ingress:
    # Intra-namespace (prometheus → alertmanager, etc)
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: monitoring
    # Webhook from kube-apiserver (hostNetwork on control plane nodes)
    - fromEntities:
        - kube-apiserver
        - remote-node
        - host
      toPorts:
        - ports:
            - port: "10250"
              protocol: TCP

The external-scaler pod needs to reach the interceptor on port 9090
within the keda namespace. Without an intra-namespace ingress rule,
the default-deny CiliumNetworkPolicy blocks this communication.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
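
The intra-namespace rule described in the commit above can be sketched as the following fragment of the keda CiliumNetworkPolicy spec. In a namespaced CiliumNetworkPolicy, an empty endpoint selector matches only endpoints in the policy's own namespace; leaving out toPorts (so that every port is allowed) is an assumption here.

```yaml
# Fragment of spec.ingress in the keda CiliumNetworkPolicy (sketch).
ingress:
  # Intra-namespace communication, e.g. scaler to interceptor on 9090.
  # The empty selector is scoped to the policy's own namespace.
  - fromEndpoints:
      - {}
```

A stricter variant would add a toPorts block limited to port 9090, at the cost of having to list every other internal port the add-on uses.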
- CNPG operator in cnpg-system needs egress to port 8000 (status) and
  5432 (postgres) on managed pods in other namespaces
- Wedding-app CNP ingress was referencing wrong namespace
  (cloudnative-pg → cnpg-system) for the operator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
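
A hedged sketch of the CNPG operator egress rule described in the commit above. The ports come from the commit message; narrowing the selector to pods carrying the cnpg.io/cluster label is an assumption added for illustration, and the PR may simply match any namespace.

```yaml
# Fragment of spec.egress for the cnpg-system policy (sketch).
egress:
  - toEndpoints:
      - matchExpressions:
          # Any namespace
          - key: k8s:io.kubernetes.pod.namespace
            operator: Exists
          # Assumption: restrict to pods managed by a CNPG Cluster
          - key: cnpg.io/cluster
            operator: Exists
    toPorts:
      - ports:
          - port: "8000" # instance status endpoint
            protocol: TCP
          - port: "5432" # PostgreSQL
            protocol: TCP
```

The matching ingress rule on the database side has to name the operator's namespace, cnpg-system, which is the namespace mix-up the wedding-app fix corrects.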
Copilot AI review requested due to automatic review settings May 6, 2026 00:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated 7 comments.

Comment on lines +26 to +33
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:

Comment on lines +27 to +37
  egress:
    # Kube API
    - toEntities:
        - kube-apiserver
    # OCI registries (GHCR, etc)
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

Comment on lines +27 to +30
    # Upstream backends (homepage, etc)
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: homepage

Comment on lines +17 to +20
  egress:
    # Kube API for dashboard
    - toEntities:
        - kube-apiserver

Comment on lines +21 to +27
    # S3-compatible backup target
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

              protocol: TCP
    # Intra-namespace communication (scaler→interceptor:9090, etc.)
    - fromEndpoints:
        - {}

Comment on lines +22 to +26
    # Metrics server
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
    # DNS resolution

devantler and others added 5 commits May 6, 2026 08:18
- Add letsencrypt-prod ClusterIssuer with Cloudflare DNS01 solver
- Update prod variables to use cert-manager.io/ClusterIssuer
- Remove --cloudflare-proxied from external-dns (DNS-only records)
- Remove FleetDM suspend workaround
- Add 20Gi secondary persistence for FleetDM MySQL

Cloudflare Universal SSL only covers *.devantler.tech, not nested
*.platform.devantler.tech. Switching to LE with dns01 challenge allows
valid browser-trusted TLS for all subdomains via direct LB access.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
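
For reference, a ClusterIssuer of the kind described in the commit above might look roughly like this. The issuer name and the Cloudflare DNS01 solver come from the commit message; the email address and both Secret names are placeholders, not values from the repository.

```yaml
# Sketch only: email and secret names are placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key # placeholder
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token # placeholder Secret
              key: api-token
```

DNS01 validation does not depend on the hostname being reachable over HTTP, which is what lets it cover the nested *.platform.devantler.tech names.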
DNS-01 challenge verification requires querying external authoritative
nameservers (e.g. Cloudflare's 108.162.192.142:53) directly. The previous
CNP only allowed DNS to cluster kube-dns, causing challenge timeout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
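
The extra egress described in the commit above could be sketched as the rule below, added to the cert-manager CiliumNetworkPolicy. Allowing the whole world entity on port 53, rather than a specific list of nameserver CIDRs, is an assumption.

```yaml
# Fragment of spec.egress for the cert-manager policy (sketch).
egress:
  # Direct queries to external authoritative nameservers,
  # needed for DNS-01 propagation checks.
  - toEntities:
      - world
    toPorts:
      - ports:
          - port: "53"
            protocol: UDP
          - port: "53"
            protocol: TCP
```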
Longhorn reports 'insufficient storage' for Redis replica volumes
despite available disk space (scheduling issue with 3-replica policy).
hcloud block storage is more reliable for persistent data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StatefulSet volumeClaimTemplates are immutable - force: true tells
Helm to delete+recreate instead of patch when upgrade would fail.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
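
In a Flux HelmRelease, the setting described in the commit above would sit under spec.upgrade, as in this fragment. The release name and namespace are placeholders because the commit message does not say which release was changed, and the chart reference and interval are omitted.

```yaml
# Fragment of a HelmRelease (sketch; metadata is a placeholder).
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-release # placeholder
  namespace: example # placeholder
spec:
  upgrade:
    # Replace resources that cannot be patched in place,
    # such as a StatefulSet with changed volumeClaimTemplates.
    force: true
```

Forced replacement recreates the affected resource, so it is worth confirming that the underlying PersistentVolumeClaims are retained before relying on it for stateful workloads.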
The HTTPRoute for FleetDM routes through KEDA HTTP interceptor
(namespace: keda) for scale-to-zero. FleetDM CNP was only allowing
fromEntities: [ingress] but the KEDA pod has a regular identity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
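
A sketch of the resulting FleetDM ingress rules, under the assumption that all pods in the keda namespace are allowed and that no port restriction is applied; the actual policy may be narrower.

```yaml
# Fragment of spec.ingress for the FleetDM policy (sketch).
ingress:
  # Traffic arriving directly from the gateway
  - fromEntities:
      - ingress
  # Traffic proxied by the KEDA HTTP interceptor, which is an ordinary
  # pod and therefore not covered by the ingress entity.
  - fromEndpoints:
      - matchLabels:
          k8s:io.kubernetes.pod.namespace: keda
```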