2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -14,6 +14,7 @@ on:

env:
GO_VERSION: '1.24.9'
CERT_MANAGER_VERSION: 'v1.16.2'

jobs:
detect-noop:
@@ -125,6 +126,7 @@ jobs:
PROPERTY_PROVIDER: 'azure'
RESOURCE_SNAPSHOT_CREATION_MINIMUM_INTERVAL: ${{ matrix.resource-snapshot-creation-minimum-interval }}
RESOURCE_CHANGES_COLLECTION_DURATION: ${{ matrix.resource-changes-collection-duration }}
CERT_MANAGER_VERSION: ${{ env.CERT_MANAGER_VERSION }}

- name: Collect logs
if: always()
97 changes: 96 additions & 1 deletion charts/hub-agent/README.md
@@ -2,11 +2,33 @@

## Install Chart

### Default Installation (Self-Signed Certificates)

```console
# Helm install with fleet-system namespace already created
helm install hub-agent ./charts/hub-agent/
```

### Installation with cert-manager

To have cert-manager manage the webhook certificates, install cert-manager before installing the chart:

```console
# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.16.2 \
  --set crds.enabled=true

# Then install hub-agent with cert-manager enabled
helm install hub-agent ./charts/hub-agent --set useCertManager=true --set enableWorkload=true --set enableWebhook=true
```

With these values, the chart creates a cert-manager `Certificate` and a self-signed `Issuer`, and cert-manager issues and rotates the webhook certificates.

## Upgrade Chart

```console
@@ -32,6 +54,14 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
| `affinity` | Node affinity for hub-agent pods | `{}` |
| `tolerations` | Tolerations for hub-agent pods | `[]` |
| `logVerbosity` | Log level (klog V logs) | `5` |
| `enableWebhook` | Enable webhook server | `true` |
| `webhookServiceName` | Webhook service name | `fleetwebhook` |
| `enableGuardRail` | Enable guard rail webhook configurations | `true` |
| `webhookClientConnectionType` | Connection type for webhook client (service or url) | `service` |
| `useCertManager` | Use cert-manager for webhook certificate management (requires `enableWebhook=true` and `enableWorkload=true`) | `false` |
| `webhookCertDir` | Directory where webhook certificates are stored/mounted | `/tmp/k8s-webhook-server/serving-certs` |
| `webhookCertName` | Name of the Certificate resource created by cert-manager | `fleet-webhook-server-cert` |
| `webhookCertSecretName` | Name of the Secret containing webhook certificates | `fleet-webhook-server-cert` |
| `enableV1Beta1APIs` | Watch for v1beta1 APIs | `true` |
| `hubAPIQPS` | QPS for fleet-apiserver (not including events/node heartbeat) | `250` |
| `hubAPIBurst` | Burst for fleet-apiserver (not including events/node heartbeat) | `1000` |
@@ -41,4 +71,69 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
| `MaxFleetSizeSupported` | Max number of member clusters supported | `100` |
| `resourceSnapshotCreationMinimumInterval` | The minimum interval at which resource snapshots could be created. | `30s` |
| `resourceChangesCollectionDuration` | The duration for collecting resource changes into one snapshot. | `15s` |
| `enableWorkload` | Enable kubernetes builtin workload to run in hub cluster. | `false` |
| `enableWorkload` | Enable kubernetes builtin workload to run in hub cluster. | `false` |

## Certificate Management

The hub-agent supports two modes for webhook certificate management:

### Automatic Certificate Generation (Default)

By default, the hub-agent generates certificates automatically at startup. This mode:
- Requires no external dependencies
- Works out of the box
- Issues certificates that are valid for 10 years

### cert-manager (Optional)

When `useCertManager=true`, certificates are managed by cert-manager. This mode:
- Requires cert-manager to be installed as a prerequisite
- Requires `enableWorkload=true` to allow cert-manager pods to run in the hub cluster (without this, pod creation would be blocked by the webhook)
- Requires `enableWebhook=true` because cert-manager is only used for webhook certificate management
- Handles certificate rotation automatically (90-day certificates)
- Follows industry-standard certificate management practices
- Suitable for production environments
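
The requirements listed above can be summarized as a small validation rule. The Go sketch below is illustrative only: `chartValues` and `validate` are hypothetical names that are not part of the chart or the hub-agent, and the chart itself may enforce these combinations differently (or not at all).

```go
package main

import (
	"errors"
	"fmt"
)

// chartValues is a hypothetical mirror of the three Helm values discussed above.
type chartValues struct {
	UseCertManager bool
	EnableWebhook  bool
	EnableWorkload bool
}

// validate reports every documented requirement of cert-manager mode that is not met.
// It returns nil when cert-manager mode is off or all requirements hold.
func validate(v chartValues) error {
	if !v.UseCertManager {
		return nil // the default (self-signed) mode has no extra requirements
	}
	var errs []error
	if !v.EnableWebhook {
		errs = append(errs, errors.New("useCertManager=true requires enableWebhook=true"))
	}
	if !v.EnableWorkload {
		errs = append(errs, errors.New("useCertManager=true requires enableWorkload=true"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	// enableWorkload is left at its default (false), so one requirement is unmet.
	fmt.Println(validate(chartValues{UseCertManager: true, EnableWebhook: true}))
}
```

Collecting all unmet requirements with `errors.Join` (Go 1.20+) lets a user fix every problem in one pass instead of discovering them one at a time.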

To switch to cert-manager mode:
```console
# Install cert-manager first
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.16.2 \
  --set crds.enabled=true

# Then install hub-agent with cert-manager enabled
helm install hub-agent ./charts/hub-agent --set useCertManager=true --set enableWorkload=true --set enableWebhook=true
```

### Certificate Directory Configuration

The `webhookCertDir` parameter allows you to customize where webhook certificates are stored:
- Default: `/tmp/k8s-webhook-server/serving-certs`
- Must match the volumeMount path when using cert-manager
- Configurable via both Helm values and `--webhook-cert-dir` flag

The `webhookCertName` parameter specifies the Certificate resource name:
- Default: `fleet-webhook-server-cert`
- When using cert-manager, this is the name of the Certificate resource
- Referenced in the `cert-manager.io/inject-ca-from` annotation
- Configurable via both Helm values and `--webhook-cert-name` flag

The `webhookCertSecretName` parameter specifies the Secret name for webhook certificates:
- Default: `fleet-webhook-server-cert`
- When using cert-manager, this is the Secret name created by the Certificate resource
- Must match the secretName in the Certificate spec
- Configurable via both Helm values and `--webhook-cert-secret-name` flag

Example with custom certificate directory and names:
```console
helm install hub-agent ./charts/hub-agent \
  --set useCertManager=true \
  --set enableWorkload=true \
  --set webhookCertDir=/custom/cert/path \
  --set webhookCertName=my-webhook-certificate \
  --set webhookCertSecretName=my-webhook-secret
```
62 changes: 62 additions & 0 deletions charts/hub-agent/templates/certificate.yaml
@@ -0,0 +1,62 @@
{{- if and .Values.enableWebhook .Values.useCertManager }}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ .Values.webhookCertName }}
  namespace: {{ .Values.namespace }}
  labels:
    {{- include "hub-agent.labels" . | nindent 4 }}
spec:
  # Secret name where cert-manager will store the certificate
  secretName: {{ .Values.webhookCertSecretName }}

  # Certificate duration (90 days is cert-manager's default and recommended)
  duration: 2160h # 90 days

  # Renew certificate 30 days before expiry
  renewBefore: 720h # 30 days

  # Subject configuration
  subject:
    organizations:
      - KubeFleet

  # Common name
  commonName: fleet-webhook.{{ .Values.namespace }}.svc

  # DNS names for the certificate
  dnsNames:
    - {{ .Values.webhookServiceName }}
    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}
    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc
    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc.cluster.local

  # Issuer reference - using self-signed issuer
  issuerRef:
    name: fleet-selfsigned-issuer
    kind: Issuer
    group: cert-manager.io

  # Private key configuration
  privateKey:
    algorithm: ECDSA
    size: 256

  # Key usages
  usages:
    - digital signature
    - key encipherment
    - server auth
---
# Self-signed issuer for generating the certificate
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: fleet-selfsigned-issuer
  namespace: {{ .Values.namespace }}
labels:
    {{- include "hub-agent.labels" . | nindent 4 }}
spec:
  selfSigned: {}
{{- end }}
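
For reference, the `dnsNames` list in the template expands to four in-cluster names for the webhook Service. A minimal Go sketch of that expansion follows; `webhookDNSNames` is a hypothetical helper written for illustration, and it hard-codes the `cluster.local` cluster domain exactly as the template does.

```go
package main

import "fmt"

// webhookDNSNames returns the four subject alternative names that the
// Certificate template lists for a Service in a given namespace.
// The cluster domain "cluster.local" is hard-coded, mirroring the template.
func webhookDNSNames(service, namespace string) []string {
	return []string{
		service,
		service + "." + namespace,
		service + "." + namespace + ".svc",
		service + "." + namespace + ".svc.cluster.local",
	}
}

func main() {
	// Chart defaults: webhookServiceName=fleetwebhook, namespace=fleet-system.
	for _, name := range webhookDNSNames("fleetwebhook", "fleet-system") {
		fmt.Println(name)
	}
}
```

With the chart defaults, the `.svc` entry is `fleetwebhook.fleet-system.svc`, whereas the template's `commonName` uses the fixed prefix `fleet-webhook`, so it is the `dnsNames` entries that follow `webhookServiceName`.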
23 changes: 23 additions & 0 deletions charts/hub-agent/templates/deployment.yaml
@@ -1,3 +1,6 @@
{{- if and (not .Values.useCertManager) (gt (.Values.replicaCount | int) 1) }}
{{- fail "ERROR: replicaCount > 1 requires useCertManager=true (self-signed certificates cannot be shared across replicas)" }}
{{- end }}
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -6,6 +9,7 @@ metadata:
labels:
{{- include "hub-agent.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
{{- include "hub-agent.selectorLabels" . | nindent 6 }}
@@ -25,6 +29,10 @@ spec:
- --webhook-service-name={{ .Values.webhookServiceName }}
- --enable-guard-rail={{ .Values.enableGuardRail }}
- --enable-workload={{ .Values.enableWorkload }}
- --use-cert-manager={{ .Values.useCertManager }}
- --webhook-cert-dir={{ .Values.webhookCertDir }}
- --webhook-cert-name={{ .Values.webhookCertName }}
- --webhook-cert-secret-name={{ .Values.webhookCertSecretName }}
- --whitelisted-users=system:serviceaccount:fleet-system:hub-agent-sa
- --webhook-client-connection-type={{.Values.webhookClientConnectionType}}
- --v={{ .Values.logVerbosity }}
@@ -73,6 +81,21 @@ spec:
fieldPath: metadata.namespace
resources:
{{- toYaml .Values.resources | nindent 12 }}
{{- if .Values.useCertManager }}
volumeMounts:
- name: webhook-cert
mountPath: {{ .Values.webhookCertDir }}
readOnly: true
{{- end }}
{{- if .Values.useCertManager }}
volumes:
- name: webhook-cert
secret:
secretName: {{ .Values.webhookCertSecretName }}
# defaultMode 0444 (read for all) allows the container process to read the certs
# regardless of the user/group it runs as
defaultMode: 0444
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
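
In cert-manager mode the Secret is mounted read-only at `webhookCertDir`, so the process expects the certificate files to already be there. The Go sketch below shows one way to check for them. It assumes the standard `kubernetes.io/tls` Secret keys `tls.crt` and `tls.key`, which are also controller-runtime's default file names; `missingCertFiles` is a hypothetical helper and not part of this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// certFiles are the file names assumed to appear in the mounted Secret volume.
var certFiles = []string{"tls.crt", "tls.key"}

// missingCertFiles returns the expected files that cannot be stat'ed in dir
// (absent, or unreachable for any other reason).
func missingCertFiles(dir string) []string {
	var missing []string
	for _, name := range certFiles {
		if _, err := os.Stat(filepath.Join(dir, name)); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	dir := "/tmp/k8s-webhook-server/serving-certs" // default webhookCertDir
	if m := missingCertFiles(dir); len(m) > 0 {
		fmt.Printf("missing in %s: %v\n", dir, m)
		return
	}
	fmt.Println("certificate files present")
}
```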
17 changes: 15 additions & 2 deletions charts/hub-agent/values.yaml
@@ -17,13 +17,26 @@ webhookServiceName: fleetwebhook
enableGuardRail: true
webhookClientConnectionType: service
enableWorkload: false
# useCertManager enables cert-manager for webhook certificate management
# When enabled, cert-manager must be installed as a prerequisite (it is not installed automatically by this chart)
# and a Certificate resource will be created
useCertManager: false
# webhookCertDir is the directory where webhook certificates are mounted
# This is configurable via --webhook-cert-dir flag and must match the volumeMount path in deployment
webhookCertDir: /tmp/k8s-webhook-server/serving-certs
# webhookCertName is the name of the Certificate resource created by cert-manager
# This is referenced in the cert-manager.io/inject-ca-from annotation
webhookCertName: fleet-webhook-server-cert
# webhookCertSecretName is the name of the Secret containing webhook certificates
# When using cert-manager, this is the Secret name created by the Certificate resource
webhookCertSecretName: fleet-webhook-server-cert

forceDeleteWaitTime: 15m0s
clusterUnhealthyThreshold: 3m0s
resourceSnapshotCreationMinimumInterval: 30s
resourceChangesCollectionDuration: 15s

namespace:
fleet-system
namespace: fleet-system

resources:
limits:
39 changes: 27 additions & 12 deletions cmd/hubagent/main.go
@@ -20,6 +20,7 @@ import (
"flag"
"fmt"
"math"
"net/http"
"os"
"strings"
"sync"
@@ -65,8 +66,7 @@ var (
)

const (
FleetWebhookCertDir = "/tmp/k8s-webhook-server/serving-certs"
FleetWebhookPort = 9443
FleetWebhookPort = 9443
)

func init() {
Expand Down Expand Up @@ -121,7 +121,7 @@ func main() {
},
WebhookServer: ctrlwebhook.NewServer(ctrlwebhook.Options{
Port: FleetWebhookPort,
CertDir: FleetWebhookCertDir,
CertDir: opts.WebhookCertDir,
}),
}
if opts.EnablePprof {
@@ -158,11 +158,25 @@ func main() {

if opts.EnableWebhook {
whiteListedUsers := strings.Split(opts.WhiteListedUsers, ",")
if err := SetupWebhook(mgr, options.WebhookClientConnectionType(opts.WebhookClientConnectionType), opts.WebhookServiceName, whiteListedUsers,
opts.EnableGuardRail, opts.EnableV1Beta1APIs, opts.DenyModifyMemberClusterLabels, opts.EnableWorkload, opts.NetworkingAgentsEnabled); err != nil {
webhookConfig, err := SetupWebhook(mgr, options.WebhookClientConnectionType(opts.WebhookClientConnectionType), opts.WebhookServiceName, whiteListedUsers,
opts.EnableGuardRail, opts.EnableV1Beta1APIs, opts.DenyModifyMemberClusterLabels, opts.EnableWorkload, opts.NetworkingAgentsEnabled, opts.UseCertManager, opts.WebhookCertDir, opts.WebhookCertName, opts.WebhookCertSecretName)
if err != nil {
klog.ErrorS(err, "unable to set up webhook")
exitWithErrorFunc()
}

// When using cert-manager, add a readiness check to ensure CA bundles are injected before marking ready.
// This prevents the pod from accepting traffic before cert-manager has populated the webhook CA bundles,
// which would cause webhook calls to fail.
if opts.UseCertManager {
if err := mgr.AddReadyzCheck("cert-manager-ca-injection", func(req *http.Request) error {
return webhookConfig.CheckCAInjection(req.Context())
}); err != nil {
klog.ErrorS(err, "unable to set up cert-manager CA injection readiness check")
exitWithErrorFunc()
}
klog.V(2).InfoS("Added cert-manager CA injection readiness check")
}
}

ctx := ctrl.SetupSignalHandler()
@@ -201,21 +215,22 @@ func main() {
}

// SetupWebhook generates the webhook cert and then sets up the webhook configurator.
// Returns the webhook Config so it can be used for readiness checks.
func SetupWebhook(mgr manager.Manager, webhookClientConnectionType options.WebhookClientConnectionType, webhookServiceName string,
whiteListedUsers []string, enableGuardRail, isFleetV1Beta1API bool, denyModifyMemberClusterLabels bool, enableWorkload bool, networkingAgentsEnabled bool) error {
// Generate self-signed key and crt files in FleetWebhookCertDir for the webhook server to start.
w, err := webhook.NewWebhookConfig(mgr, webhookServiceName, FleetWebhookPort, &webhookClientConnectionType, FleetWebhookCertDir, enableGuardRail, denyModifyMemberClusterLabels, enableWorkload)
whiteListedUsers []string, enableGuardRail, isFleetV1Beta1API bool, denyModifyMemberClusterLabels bool, enableWorkload bool, networkingAgentsEnabled bool, useCertManager bool, webhookCertDir string, webhookCertName string, webhookCertSecretName string) (*webhook.Config, error) {
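// In self-signed mode, generate key and crt files in webhookCertDir for the webhook server to start.
// With cert-manager, the files are expected to be mounted into webhookCertDir from the Secret instead.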
w, err := webhook.NewWebhookConfig(mgr, webhookServiceName, FleetWebhookPort, &webhookClientConnectionType, webhookCertDir, enableGuardRail, denyModifyMemberClusterLabels, enableWorkload, useCertManager, webhookCertName, webhookCertSecretName)
if err != nil {
klog.ErrorS(err, "failed to generate WebhookConfig")
return err
return nil, err
}
if err = mgr.Add(w); err != nil {
klog.ErrorS(err, "unable to add WebhookConfig")
return err
return nil, err
}
if err = webhook.AddToManager(mgr, whiteListedUsers, denyModifyMemberClusterLabels, networkingAgentsEnabled); err != nil {
klog.ErrorS(err, "unable to register webhooks to the manager")
return err
return nil, err
}
return nil
return w, nil
}
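
The readiness check registered in `main` has the shape `func(*http.Request) error`. The sketch below illustrates that shape using only the standard library. It is not the PR's `CheckCAInjection` implementation: `caBundleSource`, `newCAInjectionCheck` and the webhook name `example-webhook` are hypothetical stand-ins for whatever reads the `caBundle` fields of the real webhook configurations.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// caBundleSource is a hypothetical lookup that returns the current CA bundle
// of each webhook, keyed by webhook name.
type caBundleSource func() map[string][]byte

// newCAInjectionCheck returns a readiness check that fails while any webhook
// still has an empty CA bundle, and succeeds once all bundles are populated.
func newCAInjectionCheck(src caBundleSource) func(*http.Request) error {
	return func(_ *http.Request) error {
		for name, bundle := range src() {
			if len(bundle) == 0 {
				return fmt.Errorf("webhook %q has no CA bundle yet", name)
			}
		}
		return nil
	}
}

func main() {
	bundles := map[string][]byte{"example-webhook": nil}
	check := newCAInjectionCheck(func() map[string][]byte { return bundles })
	req := httptest.NewRequest(http.MethodGet, "/readyz", nil)

	fmt.Println("before injection:", check(req))
	bundles["example-webhook"] = []byte("PEM data") // simulate CA injection
	fmt.Println("after injection:", check(req))
}
```

Because the check is re-evaluated on every probe, the pod becomes ready on its own once the bundles are injected, with no restart needed.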
16 changes: 16 additions & 0 deletions cmd/hubagent/options/options.go
@@ -110,6 +110,18 @@ type Options struct {
// EnableWorkload enables workload resources (pods and replicasets) to be created in the hub cluster.
// When set to true, the pod and replicaset validating webhooks are disabled.
EnableWorkload bool
// UseCertManager indicates whether to use cert-manager for webhook certificate management.
// When enabled, webhook certificates are managed by cert-manager instead of self-signed generation.
UseCertManager bool
// WebhookCertDir is the directory where webhook certificates are stored/mounted.
// This must match the mountPath in the Helm chart deployment when using cert-manager.
WebhookCertDir string
// WebhookCertName is the name of the Certificate resource created by cert-manager.
// This is referenced in the cert-manager.io/inject-ca-from annotation.
WebhookCertName string
// WebhookCertSecretName is the name of the Secret containing webhook certificates.
// When using cert-manager, this is the Secret name created by the Certificate resource.
WebhookCertSecretName string
// ResourceSnapshotCreationMinimumInterval is the minimum interval at which resource snapshots could be created.
// Whether the resource snapshot is created or not depends on both ResourceSnapshotCreationMinimumInterval and ResourceChangesCollectionDuration.
ResourceSnapshotCreationMinimumInterval time.Duration
@@ -185,6 +197,10 @@ func (o *Options) AddFlags(flags *flag.FlagSet) {
flags.IntVar(&o.PprofPort, "pprof-port", 6065, "The port for pprof profiling.")
flags.BoolVar(&o.DenyModifyMemberClusterLabels, "deny-modify-member-cluster-labels", false, "If set, users not in the system:masters cannot modify member cluster labels.")
flags.BoolVar(&o.EnableWorkload, "enable-workload", false, "If set, workloads (pods and replicasets) can be created in the hub cluster. This disables the pod and replicaset validating webhooks.")
flags.BoolVar(&o.UseCertManager, "use-cert-manager", false, "If set, cert-manager will be used for webhook certificate management instead of self-signed certificates.")
flags.StringVar(&o.WebhookCertDir, "webhook-cert-dir", "/tmp/k8s-webhook-server/serving-certs", "The directory where webhook certificates are stored. Must match the volumeMount path in deployment when using cert-manager.")
flags.StringVar(&o.WebhookCertName, "webhook-cert-name", "fleet-webhook-server-cert", "The name of the Certificate resource created by cert-manager. Referenced in cert-manager.io/inject-ca-from annotation.")
flags.StringVar(&o.WebhookCertSecretName, "webhook-cert-secret-name", "fleet-webhook-server-cert", "The name of the Secret containing webhook certificates. Must match the secretName in deployment and Certificate resource when using cert-manager.")
flags.DurationVar(&o.ResourceSnapshotCreationMinimumInterval, "resource-snapshot-creation-minimum-interval", 30*time.Second, "The minimum interval at which resource snapshots could be created.")
flags.DurationVar(&o.ResourceChangesCollectionDuration, "resource-changes-collection-duration", 15*time.Second,
"The duration for collecting resource changes into one snapshot. The default is 15 seconds, which means that the controller will collect resource changes for 15 seconds before creating a resource snapshot.")
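
To see how the four new flags behave in isolation, the Go sketch below registers them on a standalone `flag.FlagSet` with the same names and defaults as in the diff. `certOptions` and `parseCertOptions` are hypothetical names for this illustration, and the help strings are shortened; the real `Options` struct carries many more fields.

```go
package main

import (
	"flag"
	"fmt"
)

// certOptions holds only the certificate-related options added in this change.
type certOptions struct {
	UseCertManager        bool
	WebhookCertDir        string
	WebhookCertName       string
	WebhookCertSecretName string
}

// parseCertOptions parses args with the flag names and defaults from the diff.
func parseCertOptions(args []string) (certOptions, error) {
	var o certOptions
	fs := flag.NewFlagSet("hubagent-sketch", flag.ContinueOnError)
	fs.BoolVar(&o.UseCertManager, "use-cert-manager", false, "use cert-manager for webhook certificates")
	fs.StringVar(&o.WebhookCertDir, "webhook-cert-dir", "/tmp/k8s-webhook-server/serving-certs", "webhook certificate directory")
	fs.StringVar(&o.WebhookCertName, "webhook-cert-name", "fleet-webhook-server-cert", "Certificate resource name")
	fs.StringVar(&o.WebhookCertSecretName, "webhook-cert-secret-name", "fleet-webhook-server-cert", "Secret name")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseCertOptions([]string{"--use-cert-manager=true", "--webhook-cert-dir=/custom/cert/path"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

Flags that are not passed keep their defaults, which is why the Helm chart can always render all four arguments from `values.yaml` without changing behavior for existing installations.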