tunnel: expose per-controller MaxConcurrentReconciles for source reconcilers #134

@jacaudi

Problem

In a homelab cluster (v0.19.0, meta+tunnel mode, one external-jacaudi-dev Gateway with three attached HTTPRoutes), the httproute-source controller's workqueue p95 dwell sits at ~1 s under default settings:

controller_runtime_reconcile_total{controller="httproute-source",result="success"} 245
controller_runtime_reconcile_total{controller="httproute-source",result="requeue_after"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="httproute-source",le="0.005"} ≈ p95
workqueue_queue_duration_seconds_bucket{controller="httproute-source",le="1.0"} ≈ p95
  • 245 reconciles in ~11 h, all event-driven (zero requeue_after) — the watch path is hot
  • Each reconcile completes in ~5 ms (p95) — work itself is fast
  • But workqueue dwell is ~1 s p95 — events sit in the queue waiting for the single reconcile worker
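
For readers unfamiliar with how a "≈ p95" figure is read off cumulative histogram buckets, here is a minimal sketch. The bucket counts are hypothetical and chosen only for illustration, not taken from the cluster, and quantileUpperBound is an illustrative helper rather than anything in the operator or in Prometheus.

```go
package main

import "fmt"

// bucket is one cumulative histogram bucket: the number of observations
// less than or equal to the upper bound le (in seconds).
type bucket struct {
	le    float64
	count uint64
}

// quantileUpperBound returns the upper bound of the first bucket whose
// cumulative count reaches fraction q of all observations. Buckets must be
// sorted by le; the last bucket's count is treated as the total.
func quantileUpperBound(buckets []bucket, q float64) float64 {
	if len(buckets) == 0 {
		return 0
	}
	total := float64(buckets[len(buckets)-1].count)
	for _, b := range buckets {
		if float64(b.count) >= q*total {
			return b.le
		}
	}
	return buckets[len(buckets)-1].le
}

func main() {
	// Hypothetical queue-duration counts, for illustration only.
	dwell := []bucket{{0.1, 120}, {0.5, 200}, {1.0, 240}, {10, 245}}
	// 95% of 245 observations is 232.75, first reached in the le=1.0 bucket.
	fmt.Println(quantileUpperBound(dwell, 0.95))
}
```

This gives only the bucket boundary, so "~1 s" means the p95 falls somewhere at or below the 1.0 s bucket edge and above the previous one.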

Concrete impact: applying a new HTTPRoute via Flux took ~1m40s from HTTPRoute creation to cloudflared connector receiving Updated to new configuration. The operator-side share of that gap (reconciles + CF API push) is ~1.6 s; the remaining 98 s is Cloudflare control-plane → connector propagation, which is out of our hands. But the 1 s dwell on httproute-source is the operator-side component we can tighten — and it matters more when several HTTPRoutes change in the same window (e.g. a multi-app Flux apply).

Root cause: each source controller in internal/controller/tunnel/setup.go is built with ctrl.NewControllerManagedBy(mgr).Named(...).For(...).Complete(...) — no .WithOptions(controllerpkg.Options{MaxConcurrentReconciles: N}), so controller-runtime defaults to 1 concurrent reconcile per controller. Concurrent HTTPRoute events serialize.

Suggested path

Expose MaxConcurrentReconciles per source controller via a new typed ConcurrencyOptions struct on tunnel.Options. Zero value passes through to controller-runtime's default of 1, preserving existing behavior.

Per-controller rather than a single global knob because:

  • Load profiles differ — httproute-source is hot in this cluster; service-source and gateway-source are cooler; tlsroute-source is often disabled.
  • Per go-standards §1 ("Compile-time over runtime — prefer named types and enums over stringly-typed APIs"), a map[string]int keyed by controller name would invite typos. A struct of named fields is type-safe and IDE-discoverable.

internal/controller/tunnel/setup.go

import (
    // …existing imports…
    controllerpkg "sigs.k8s.io/controller-runtime/pkg/controller"
)

// ConcurrencyOptions controls MaxConcurrentReconciles per controller in the
// tunnel bundle. A zero value passes through to controller-runtime's default
// of 1, preserving pre-feature behavior. Raise per controller to drain
// event-driven workqueues faster when many objects change concurrently.
//
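// Setting any field > 1 lets that controller reconcile different objects in
// parallel. Reconciles for the same object stay serialized, because the
// workqueue never hands one key to two workers at once, regardless of
// MaxConcurrentReconciles. State shared across objects inside a reconciler
// must still be safe for concurrent use.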
type ConcurrencyOptions struct {
    // Tunnel sets MaxConcurrentReconciles for the CloudflareTunnel controller.
    Tunnel int
    // Service sets MaxConcurrentReconciles for the service-source controller.
    Service int
    // Gateway sets MaxConcurrentReconciles for the gateway-source controller.
    Gateway int
    // HTTPRoute sets MaxConcurrentReconciles for the httproute-source controller.
    HTTPRoute int
    // TLSRoute sets MaxConcurrentReconciles for the tlsroute-source controller.
    TLSRoute int
}

type Options struct {
    // …existing fields unchanged…

    // Concurrency configures MaxConcurrentReconciles per controller. Zero
    // values defer to controller-runtime's default of 1.
    Concurrency ConcurrencyOptions
}

In each ctrl.NewControllerManagedBy(...) chain in AddToManager:

// --- CloudflareTunnel reconciler ----------------------------------------
if err := ctrl.NewControllerManagedBy(mgr).
    WithOptions(controllerpkg.Options{MaxConcurrentReconciles: opts.Concurrency.Tunnel}).
    For(&v2alpha1.CloudflareTunnel{}).
    Complete(tunnelR); err != nil {
    return fmt.Errorf("setup CloudflareTunnel: %w", err)
}

// --- ServiceSource reconciler -------------------------------------------
if err := ctrl.NewControllerManagedBy(mgr).
    Named("service-source").
    WithOptions(controllerpkg.Options{MaxConcurrentReconciles: opts.Concurrency.Service}).
    For(&corev1.Service{}).
    Owns(&v2alpha1.CloudflareDNSRecord{}).
    Watches(&v2alpha1.CloudflareTunnel{}, handler.EnqueueRequestsFromMapFunc(tunnelToServices(mgr))).
    Complete(svcR); err != nil {
    return fmt.Errorf("setup ServiceSource: %w", err)
}

// …same WithOptions(...) on GatewaySource, HTTPRouteSource, TLSRouteSource…

No change needed to applyOptionDefaults — zero values are valid pass-through.

cmd/manager/main.go

Add five new fields to the Options struct and five fs.Int flags, then plumb them through to the tunnel.AddToManager(..., tunnel.Options{Concurrency: tunnel.ConcurrencyOptions{...}}) call in runTunnel:

tcTunnel    := fs.Int("tunnel-concurrency-tunnel",    0, "MaxConcurrentReconciles for the CloudflareTunnel controller (0 = controller-runtime default of 1)")
tcService   := fs.Int("tunnel-concurrency-service",   0, "MaxConcurrentReconciles for the service-source controller (0 = controller-runtime default of 1)")
tcGateway   := fs.Int("tunnel-concurrency-gateway",   0, "MaxConcurrentReconciles for the gateway-source controller (0 = controller-runtime default of 1)")
tcHTTPRoute := fs.Int("tunnel-concurrency-httproute", 0, "MaxConcurrentReconciles for the httproute-source controller (0 = controller-runtime default of 1)")
tcTLSRoute  := fs.Int("tunnel-concurrency-tlsroute",  0, "MaxConcurrentReconciles for the tlsroute-source controller (0 = controller-runtime default of 1)")
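
As a rough, self-contained sketch of how those five flags could map onto the proposed struct: parseConcurrency and the local ConcurrencyOptions copy below are illustrative stand-ins, not the actual code in cmd/manager/main.go, and the real flag set name and wiring may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// ConcurrencyOptions is a local copy of the struct proposed for
// tunnel.Options, so that this sketch compiles on its own.
type ConcurrencyOptions struct {
	Tunnel, Service, Gateway, HTTPRoute, TLSRoute int
}

// parseConcurrency is a hypothetical helper: it registers the five flags on
// a fresh FlagSet and returns the parsed values. Values are not validated.
func parseConcurrency(args []string) (ConcurrencyOptions, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	var c ConcurrencyOptions
	for _, f := range []struct {
		dst        *int
		name, what string
	}{
		{&c.Tunnel, "tunnel", "CloudflareTunnel"},
		{&c.Service, "service", "service-source"},
		{&c.Gateway, "gateway", "gateway-source"},
		{&c.HTTPRoute, "httproute", "httproute-source"},
		{&c.TLSRoute, "tlsroute", "tlsroute-source"},
	} {
		fs.IntVar(f.dst, "tunnel-concurrency-"+f.name, 0,
			"MaxConcurrentReconciles for the "+f.what+
				" controller (0 = controller-runtime default of 1)")
	}
	if err := fs.Parse(args); err != nil {
		return ConcurrencyOptions{}, err
	}
	return c, nil
}

func main() {
	opts, err := parseConcurrency([]string{"--tunnel-concurrency-httproute=4"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", opts)
}
```

Binding with IntVar directly into the struct avoids five separate *int variables; the fs.Int form shown above works equally well if it matches the existing style in main.go better.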

internal/bootstrap/{config.go,deployments.go}

Add the same five fields to bootstrap.Config, and append --tunnel-concurrency-<x>=N to the spawned tunnel deployment's args only when the value is positive, so the deployment spec stays clean at defaults (the same pattern used for --tunnel-replicas etc.):

if c.TunnelConcurrencyHTTPRoute > 0 {
    args = append(args, fmt.Sprintf("--tunnel-concurrency-httproute=%d", c.TunnelConcurrencyHTTPRoute))
}
// …same for tunnel, service, gateway, tlsroute…
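
To avoid five near-identical if blocks, the same logic could be written as a small loop. This is only a sketch: concurrencyArgs is a hypothetical helper, the Config type here is a cut-down stand-in for bootstrap.Config, and every field name other than TunnelConcurrencyHTTPRoute is assumed by analogy.

```go
package main

import "fmt"

// Config holds only the five concurrency fields for this sketch. Field names
// other than TunnelConcurrencyHTTPRoute are assumed by analogy.
type Config struct {
	TunnelConcurrencyTunnel    int
	TunnelConcurrencyService   int
	TunnelConcurrencyGateway   int
	TunnelConcurrencyHTTPRoute int
	TunnelConcurrencyTLSRoute  int
}

// concurrencyArgs returns one --tunnel-concurrency-<x>=N argument for each
// positive field, in a fixed order, and nothing for zero or negative fields.
func concurrencyArgs(c Config) []string {
	flags := []struct {
		name  string
		value int
	}{
		{"tunnel", c.TunnelConcurrencyTunnel},
		{"service", c.TunnelConcurrencyService},
		{"gateway", c.TunnelConcurrencyGateway},
		{"httproute", c.TunnelConcurrencyHTTPRoute},
		{"tlsroute", c.TunnelConcurrencyTLSRoute},
	}
	var args []string
	for _, f := range flags {
		if f.value > 0 {
			args = append(args, fmt.Sprintf("--tunnel-concurrency-%s=%d", f.name, f.value))
		}
	}
	return args
}

func main() {
	fmt.Println(concurrencyArgs(Config{TunnelConcurrencyHTTPRoute: 4}))
}
```

The fixed ordering keeps the generated deployment args stable across reconciles, which matters if the args are compared to detect drift.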

chart/values.yaml

controllers:
  tunnel:
    # -- Per-controller MaxConcurrentReconciles. Zero (the default) defers to
    # controller-runtime's default of 1. Raise per controller to drain
    # event-driven workqueues faster when many objects of that kind change
    # concurrently. The `httpRoute` knob is the most likely to need raising
    # in clusters with many HTTPRoutes — measured workqueue p95 dwell on
    # `httproute-source` can reach ~1s under the default. Other source
    # controllers and the tunnel reconciler itself are typically fast enough
    # at the default; raise only on evidence.
    concurrency:
      tunnel: 0
      service: 0
      gateway: 0
      httpRoute: 0
      tlsRoute: 0

Meta-operator chart template

Append five conditional args mirroring the existing --tunnel-replicas / --tunnel-log-level shape. A with block is skipped when its value is empty, and integer 0 counts as empty, so the flag is naturally omitted at the default:

{{- with .Values.controllers.tunnel.concurrency.tunnel }}
- --tunnel-concurrency-tunnel={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.service }}
- --tunnel-concurrency-service={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.gateway }}
- --tunnel-concurrency-gateway={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.httpRoute }}
- --tunnel-concurrency-httproute={{ . }}
{{- end }}
{{- with .Values.controllers.tunnel.concurrency.tlsRoute }}
- --tunnel-concurrency-tlsroute={{ . }}
{{- end }}
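
The behavior of with on a zero value can be checked outside Helm using Go's text/template package, which Helm templates are built on. A minimal sketch, assuming a plain map[string]int in place of .Values; the value types Helm passes after YAML decoding may differ, and render is an illustrative helper.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// argTemplate reproduces one of the conditional args with a shortened path.
const argTemplate = `{{- with .httpRoute }}
- --tunnel-concurrency-httproute={{ . }}
{{- end }}`

// render executes argTemplate against values and returns the output.
func render(values map[string]int) (string, error) {
	t, err := template.New("args").Parse(argTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, values); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, n := range []int{0, 4} {
		out, err := render(map[string]int{"httpRoute": n})
		if err != nil {
			panic(err)
		}
		// %q makes the empty output for n=0 visible.
		fmt.Printf("httpRoute=%d -> %q\n", n, out)
	}
}
```

With the value at 0 the template emits nothing at all, so the rendered deployment carries no concurrency flag.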

Tests

Three new unit tests, matching the existing table-driven style in internal/controller/tunnel/*_test.go and cmd/manager/main_test.go:

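  1. TestAddToManager_AppliesConcurrency: verifies that a non-zero Concurrency.HTTPRoute propagates as MaxConcurrentReconciles on the built controller, using the fake manager already used elsewhere in setup_test.go. Note that controller.Controller is an interface and controller-runtime's concrete controller type lives in an internal package, so the assertion probably has to check the controllerpkg.Options value handed to the builder rather than read a field off the built controller.
  2. TestParseFlags_TunnelConcurrency: table-driven coverage of defaults and positive values. Negative values pass through to controller-runtime unchanged. As far as I can tell, controller-runtime treats any value <= 0 as the default of 1 rather than rejecting it, which is worth confirming against the pinned version. Either way, preserve the pass-through and don't pre-validate in the flag parser.
  3. TestTunnelDeploymentArgs_Concurrency: verifies that --tunnel-concurrency-httproute=N appears in the spawned deployment's args only when the config field is positive.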

Standards adherence

| go-standards reference | How |
| --- | --- |
| §1 Compile-time over runtime | Named struct fields per controller, not map[string]int |
| §1 Convention over magic | Field names mirror existing controller Named() identifiers |
| §4 Errors with %w wrapping | All fmt.Errorf calls preserved as-is |
| §8 Table-driven tests | All three new tests structured as {name, in, want} |
| §8.5 golangci-lint | No new lint exceptions; unused is clean (zero-value fields are read in every builder) |
| Existing-project override | Matches the established Options-struct-of-named-fields pattern (DefaultImage, DefaultConnector, etc.) |

Variants

If this is too much surface area, the minimal variant is HTTPRoute only — one field on ConcurrencyOptions, one flag, one chart value. Same code shape, just narrower. Happy to scope down before sending a PR if preferred.

Default-bump alternative

Worth considering as a separate question: should the defaults change too? Setting httpRoute: 2 or 3 as the chart default (not the Go zero-value default) would benefit every user without forcing them to read the metrics. Conservative path: keep all defaults at 0 (= CR default 1), document in the chart README that homelab/medium-cluster operators may want to raise httpRoute. Happy to do whichever path you prefer.

Recommendation

Happy to send the PR if this direction looks right — let me know which variant (full vs HTTPRoute-only) and whether you'd want a default bump alongside.


Measurement context: cluster running v0.19.0 with meta+tunnel mode, one tunnel CR (network-external-jacaudi-dev), three attached HTTPRoutes (https-redirect, jellyfin-jellyfin-jacaudi, weatherstar4000-external). Metrics pulled from controller_runtime_reconcile_* and workqueue_* series on the tunnel controller pod over an 11h window.

Labels: bug (Something isn't working)
