Problem
In a homelab cluster (v0.19.0, meta+tunnel mode, one external-jacaudi-dev Gateway with three attached HTTPRoutes), the httproute-source controller's workqueue p95 dwell sits at ~1 s under default settings:
controller_runtime_reconcile_total{controller="httproute-source",result="success"} 245
controller_runtime_reconcile_total{controller="httproute-source",result="requeue_after"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="httproute-source",le="0.005"} ≈ p95
workqueue_queue_duration_seconds_bucket{controller="httproute-source",le="1.0"} ≈ p95
- 245 reconciles in ~11 h, all event-driven (zero requeue_after), so the watch path is hot
- Each reconcile completes in ~5 ms (p95) — work itself is fast
- But workqueue dwell is ~1 s p95 — events sit in the queue waiting for the single reconcile worker
Concrete impact: applying a new HTTPRoute via Flux took ~1m40s from HTTPRoute creation to cloudflared connector receiving Updated to new configuration. The operator-side share of that gap (reconciles + CF API push) is ~1.6 s; the remaining 98 s is Cloudflare control-plane → connector propagation, which is out of our hands. But the 1 s dwell on httproute-source is the operator-side component we can tighten — and it matters more when several HTTPRoutes change in the same window (e.g. a multi-app Flux apply).
Root cause: each source controller in internal/controller/tunnel/setup.go is built with ctrl.NewControllerManagedBy(mgr).Named(...).For(...).Complete(...) — no .WithOptions(controllerpkg.Options{MaxConcurrentReconciles: N}), so controller-runtime defaults to 1 concurrent reconcile per controller. Concurrent HTTPRoute events serialize.
Suggested path
Expose MaxConcurrentReconciles per source controller via a new typed ConcurrencyOptions struct on tunnel.Options. Zero value passes through to controller-runtime's default of 1, preserving existing behavior.
Per-controller rather than a single global knob because:
- Load profiles differ: httproute-source is hot in this cluster; service-source and gateway-source are cooler; tlsroute-source is often disabled.
- Per go-standards §1 ("Compile-time over runtime: prefer named types and enums over stringly-typed APIs"), a map[string]int keyed by controller name would invite typos. A struct of named fields is type-safe and IDE-discoverable.
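A small sketch of the typo argument, using a trimmed stand-in for the proposed struct. The names `byName` and `concurrency` are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// byName is the stringly-typed alternative: a misspelled key is not an
// error, the lookup just yields the zero value.
var byName = map[string]int{"httproute-source": 4}

// concurrency is a trimmed stand-in for the proposed ConcurrencyOptions.
type concurrency struct {
	HTTPRoute int
}

func main() {
	// Typo in the key ("httprout-source"): compiles and silently returns 0,
	// which would fall back to the default of one worker.
	fmt.Println(byName["httprout-source"])

	c := concurrency{HTTPRoute: 4}
	// A misspelled field such as c.HTTPRout would fail to compile instead.
	fmt.Println(c.HTTPRoute)
}
```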
internal/controller/tunnel/setup.go
import (
	// …existing imports…
	controllerpkg "sigs.k8s.io/controller-runtime/pkg/controller"
)
// ConcurrencyOptions controls MaxConcurrentReconciles per controller in the
// tunnel bundle. A zero value passes through to controller-runtime's default
// of 1, preserving pre-feature behavior. Raise per controller to drain
// event-driven workqueues faster when many objects change concurrently.
//
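// Setting any field > 1 lets that controller reconcile several objects in
// parallel. controller-runtime's workqueue never hands the same object key to
// two workers at once, regardless of MaxConcurrentReconciles, but reconciles
// of different objects can overlap, so any state shared across objects must
// be safe for concurrent use.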
type ConcurrencyOptions struct {
	// Tunnel sets MaxConcurrentReconciles for the CloudflareTunnel controller.
	Tunnel int
	// Service sets MaxConcurrentReconciles for the service-source controller.
	Service int
	// Gateway sets MaxConcurrentReconciles for the gateway-source controller.
	Gateway int
	// HTTPRoute sets MaxConcurrentReconciles for the httproute-source controller.
	HTTPRoute int
	// TLSRoute sets MaxConcurrentReconciles for the tlsroute-source controller.
	TLSRoute int
}

type Options struct {
	// …existing fields unchanged…
	// Concurrency configures MaxConcurrentReconciles per controller. Zero
	// values defer to controller-runtime's default of 1.
	Concurrency ConcurrencyOptions
}
In each ctrl.NewControllerManagedBy(...) chain in AddToManager:
// --- CloudflareTunnel reconciler ----------------------------------------
if err := ctrl.NewControllerManagedBy(mgr).
	WithOptions(controllerpkg.Options{MaxConcurrentReconciles: opts.Concurrency.Tunnel}).
	For(&v2alpha1.CloudflareTunnel{}).
	Complete(tunnelR); err != nil {
	return fmt.Errorf("setup CloudflareTunnel: %w", err)
}
// --- ServiceSource reconciler -------------------------------------------
if err := ctrl.NewControllerManagedBy(mgr).
	Named("service-source").
	WithOptions(controllerpkg.Options{MaxConcurrentReconciles: opts.Concurrency.Service}).
	For(&corev1.Service{}).
	Owns(&v2alpha1.CloudflareDNSRecord{}).
	Watches(&v2alpha1.CloudflareTunnel{}, handler.EnqueueRequestsFromMapFunc(tunnelToServices(mgr))).
	Complete(svcR); err != nil {
	return fmt.Errorf("setup ServiceSource: %w", err)
}
// …same WithOptions(...) on GatewaySource, HTTPRouteSource, TLSRouteSource…
No change needed to applyOptionDefaults — zero values are valid pass-through.
cmd/manager/main.go
Five new fields on the Options struct + five fs.Int flags + plumb through to runTunnel's tunnel.AddToManager(..., tunnel.Options{Concurrency: tunnel.ConcurrencyOptions{...}}) call:
tcTunnel := fs.Int("tunnel-concurrency-tunnel", 0, "MaxConcurrentReconciles for the CloudflareTunnel controller (0 = controller-runtime default of 1)")
tcService := fs.Int("tunnel-concurrency-service", 0, "MaxConcurrentReconciles for the service-source controller (0 = controller-runtime default of 1)")
tcGateway := fs.Int("tunnel-concurrency-gateway", 0, "MaxConcurrentReconciles for the gateway-source controller (0 = controller-runtime default of 1)")
tcHTTPRoute := fs.Int("tunnel-concurrency-httproute", 0, "MaxConcurrentReconciles for the httproute-source controller (0 = controller-runtime default of 1)")
tcTLSRoute := fs.Int("tunnel-concurrency-tlsroute", 0, "MaxConcurrentReconciles for the tlsroute-source controller (0 = controller-runtime default of 1)")
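A minimal sketch of how the five flags could land in a ConcurrencyOptions value, using only the standard `flag` package. `parseConcurrency` is a hypothetical helper for illustration; the real main.go would register these on its existing FlagSet and pass the result to tunnel.AddToManager.

```go
package main

import (
	"flag"
	"fmt"
)

// ConcurrencyOptions mirrors the struct proposed for the tunnel package.
type ConcurrencyOptions struct {
	Tunnel, Service, Gateway, HTTPRoute, TLSRoute int
}

// parseConcurrency is a hypothetical helper: it registers the five flags on a
// fresh FlagSet and returns the parsed values without validating them.
func parseConcurrency(args []string) (ConcurrencyOptions, error) {
	var c ConcurrencyOptions
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	const usage = "MaxConcurrentReconciles (0 = controller-runtime default of 1)"
	fs.IntVar(&c.Tunnel, "tunnel-concurrency-tunnel", 0, usage)
	fs.IntVar(&c.Service, "tunnel-concurrency-service", 0, usage)
	fs.IntVar(&c.Gateway, "tunnel-concurrency-gateway", 0, usage)
	fs.IntVar(&c.HTTPRoute, "tunnel-concurrency-httproute", 0, usage)
	fs.IntVar(&c.TLSRoute, "tunnel-concurrency-tlsroute", 0, usage)
	if err := fs.Parse(args); err != nil {
		return ConcurrencyOptions{}, err
	}
	return c, nil
}

func main() {
	c, err := parseConcurrency([]string{"--tunnel-concurrency-httproute=4"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Unset flags stay at 0 and defer to controller-runtime's default.
	fmt.Printf("%+v\n", c)
	// Prints: {Tunnel:0 Service:0 Gateway:0 HTTPRoute:4 TLSRoute:0}
}
```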
internal/bootstrap/{config.go,deployments.go}
Add the same five fields to bootstrap.Config, append --tunnel-concurrency-<x>=N to the spawned tunnel deployment's args only when positive (so the deployment spec stays clean at defaults — same pattern used for --tunnel-replicas etc.):
if c.TunnelConcurrencyHTTPRoute > 0 {
	args = append(args, fmt.Sprintf("--tunnel-concurrency-httproute=%d", c.TunnelConcurrencyHTTPRoute))
}
// …same for tunnel, service, gateway, tlsroute…
chart/values.yaml
controllers:
  tunnel:
    # -- Per-controller MaxConcurrentReconciles. Zero (the default) defers to
    # controller-runtime's default of 1. Raise per controller to drain
    # event-driven workqueues faster when many objects of that kind change
    # concurrently. The `httpRoute` knob is the most likely to need raising
    # in clusters with many HTTPRoutes — measured workqueue p95 dwell on
    # `httproute-source` can reach ~1s under the default. Other source
    # controllers and the tunnel reconciler itself are typically fast enough
    # at the default; raise only on evidence.
    concurrency:
      tunnel: 0
      service: 0
      gateway: 0
      httpRoute: 0
      tlsRoute: 0
Meta-operator chart template
Append five conditional args mirroring the existing --tunnel-replicas / --tunnel-log-level shape. with is false-equivalent on integer 0, so this naturally omits the flag at the default:
{{- with .Values.controllers.tunnel.concurrency.tunnel }}
- --tunnel-concurrency-tunnel={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.service }}
- --tunnel-concurrency-service={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.gateway }}
- --tunnel-concurrency-gateway={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.httpRoute }}
- --tunnel-concurrency-httproute={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.tlsRoute }}
- --tunnel-concurrency-tlsroute={{ . }}
{{- end }}
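The `with` behavior on 0 can be checked outside Helm with Go's own text/template, which Helm templates build on. This is a sketch: the map key `httpRoute` stands in for `.Values.controllers.tunnel.concurrency.httpRoute`, and `render` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// argTemplate reproduces one of the conditional args on a single line.
const argTemplate = `{{- with .httpRoute }}- --tunnel-concurrency-httproute={{ . }}{{- end }}`

// render executes the template against a single integer value.
func render(httpRoute int) (string, error) {
	t, err := template.New("arg").Parse(argTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, map[string]int{"httpRoute": httpRoute})
	return sb.String(), err
}

func main() {
	for _, v := range []int{0, 4} {
		out, err := render(v)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		// 0 is an "empty" value for with, so the body is skipped entirely.
		fmt.Printf("%d -> %q\n", v, out)
	}
	// Prints:
	// 0 -> ""
	// 4 -> "- --tunnel-concurrency-httproute=4"
}
```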
Tests
Three new unit tests, matching the existing table-driven style in internal/controller/tunnel/*_test.go and cmd/manager/main_test.go:
- TestAddToManager_AppliesConcurrency: verifies non-zero Concurrency.HTTPRoute propagates as MaxConcurrentReconciles on the built controller (introspect the resulting *controller.Controller's field via the fake manager already used elsewhere in setup_test.go).
- TestParseFlags_TunnelConcurrency: table-driven coverage of defaults / positive values. Negative values pass through to controller-runtime unvalidated (as far as I can tell, it treats any value <= 0 as its default of 1 rather than rejecting it). Preserve that pass-through; don't pre-validate in the flag parser.
- TestTunnelDeploymentArgs_Concurrency: verifies --tunnel-concurrency-httproute=N appears in the spawned deployment's args only when the config field is positive.
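A rough sketch of the {name, in, want} shape for the deployment-args case, written as a standalone program so it runs without the repo. `config`, `deploymentArgs`, and `failures` are hypothetical stand-ins; in the repo the loop would sit inside a Test function and report through t.Run / t.Errorf.

```go
package main

import (
	"fmt"
	"reflect"
)

// config is a trimmed, hypothetical stand-in for bootstrap.Config.
type config struct {
	TunnelConcurrencyHTTPRoute int
}

// deploymentArgs stands in for the arg-building code under test: it appends
// the flag only when the field is positive.
func deploymentArgs(c config) []string {
	var args []string
	if c.TunnelConcurrencyHTTPRoute > 0 {
		args = append(args, fmt.Sprintf("--tunnel-concurrency-httproute=%d", c.TunnelConcurrencyHTTPRoute))
	}
	return args
}

// cases follows the {name, in, want} table shape.
var cases = []struct {
	name string
	in   config
	want []string
}{
	{"default omits flag", config{}, nil},
	{"positive emits flag", config{TunnelConcurrencyHTTPRoute: 4}, []string{"--tunnel-concurrency-httproute=4"}},
	{"negative omits flag", config{TunnelConcurrencyHTTPRoute: -1}, nil},
}

// failures runs every case and returns the names of those that fail.
func failures() []string {
	var failed []string
	for _, tc := range cases {
		if got := deploymentArgs(tc.in); !reflect.DeepEqual(got, tc.want) {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", failures())
}
```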
Standards adherence
| go-standards reference | How |
| --- | --- |
| §1 Compile-time over runtime | Named struct fields per controller, not map[string]int |
| §1 Convention over magic | Field names mirror existing controller Named() identifiers |
| §4 Errors with %w wrapping | All fmt.Errorf calls preserved as-is |
| §8 Table-driven tests | All three new tests structured as {name, in, want} |
| §8.5 golangci-lint | No new lint exceptions; unused is clean (zero-value fields are read in every builder) |
| Existing-project override | Matches the established Options-struct-of-named-fields pattern (DefaultImage, DefaultConnector, etc.) |
Variants
If this is too much surface area, the minimal variant is HTTPRoute only — one field on ConcurrencyOptions, one flag, one chart value. Same code shape, just narrower. Happy to scope down before sending a PR if preferred.
Default-bump alternative
Worth considering as a separate question: should the defaults change too? Setting httpRoute: 2 or 3 as the chart default (not the Go zero-value default) would benefit every user without forcing them to read the metrics. Conservative path: keep all defaults at 0 (= CR default 1), document in the chart README that homelab/medium-cluster operators may want to raise httpRoute. Happy to do whichever path you prefer.
Recommendation
Happy to send the PR if this direction looks right — let me know which variant (full vs HTTPRoute-only) and whether you'd want a default bump alongside.
Measurement context: cluster running v0.19.0 with meta+tunnel mode, one tunnel CR (network-external-jacaudi-dev), three attached HTTPRoutes (https-redirect, jellyfin-jellyfin-jacaudi, weatherstar4000-external). Metrics pulled from controller_runtime_reconcile_* and workqueue_* series on the tunnel controller pod over an 11h window.