bug: fluent operator is consuming lot of cpu cores (~27 of the 32 available cores on a Node) #1990

@btalakola

Describe the issue

When deploying Fluent Operator on a Kubernetes cluster, the operator process consumes a very high number of CPU cores (~27 out of 32 available cores on a single Node). This is far beyond expected usage and causes resource contention.

Deleting the fluent-operator pod does not help: once the new pod is recreated, CPU usage climbs back to the same level. I need assistance identifying the cause of the high CPU load, thanks.

fluent-operator-df4555f76-ht6f2.log

2025-09-15T04:49:19Z INFO setup starting manager
2025-09-15T04:49:19Z INFO controller-runtime.metrics Starting metrics server
2025-09-15T04:49:19Z INFO starting server {"name": "health probe", "addr": "[::]:8081"}
2025-09-15T04:49:19Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}
2025-09-15T04:49:19Z INFO Starting EventSource {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBit"}

Trace[833216769]: [208.550494ms] [208.550494ms] END
I0915 04:49:45.438100 1 trace.go:236] Trace[442998739]: "DeltaFIFO Pop Process" ID:XXXXX,Depth:16,Reason:slow event handlers blocking the queue (15-Sep-2025 04:49:45.204) (total time: 186ms):
Trace[442998739]: [186.678635ms] [186.678635ms] END
I0915 04:49:45.830454 1 trace.go:236] Trace[989326455]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.30.3/tools/cache/reflector.go:232 (15-Sep-2025 04:49:19.799) (total time: 25965ms):
Trace[989326455]: ---"Objects listed" error: 25832ms (04:49:45.631)
Trace[989326455]: [25.965573843s] [25.965573843s] END
2025-09-15T04:57:02Z INFO controllers.FluentBitConfig Fluent Bit main configuration has updated {"logging-control-plane": "monitoring", "fluentbitconfig": "fluent-bit-config", "secret": "fluent-bit-config"}
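To make the slow operations in a log like the one above easier to spot, here is a minimal sketch that pulls the client-go trace headers out of the operator log and keeps those above a duration threshold. It assumes the header layout seen in the excerpt (`Trace[<id>]: "<name>" ... (total time: <N>ms)`); the helper name `slow_traces` and the 1000 ms default are illustrative choices, not part of Fluent Operator.

```python
import re

# Matches trace header lines as they appear in the excerpt above, e.g.
#   ... Trace[989326455]: "Reflector ListAndWatch" name:... (total time: 25965ms):
TRACE_RE = re.compile(
    r'Trace\[(?P<id>\d+)\]: "(?P<name>[^"]+)".*\(total time: (?P<ms>\d+)ms\)'
)


def slow_traces(lines, threshold_ms=1000):
    """Return (trace_id, name, total_ms) for traces at or above threshold_ms."""
    found = []
    for line in lines:
        match = TRACE_RE.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            found.append(
                (match.group("id"), match.group("name"), int(match.group("ms")))
            )
    return found
```

For example, `slow_traces(open("fluent-operator-df4555f76-ht6f2.log"))` would list the traces that took at least one second. On the excerpt above, that flags the `Reflector ListAndWatch` call (25965 ms) but not the 186 ms `DeltaFIFO Pop Process` entry.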

I deployed fluent-operator 3.4.0 in the "operators" namespace using the fluent-operator Helm chart, with the following configuration:

Fluent Operator configurations

fluent-operator:
  enabled: true
  fluent-operator:
    containerRuntime: containerd
    operator:
      # The init container is to get the actual storage path of the docker log files so that it can be mounted to collect the logs.
      # see https://github.com/fluent/fluent-operator/blob/master/manifests/setup/fluent-operator-deployment.yaml#L26
      logPath:
        # The operator currently assumes a Docker container runtime path for the logs as the default, for other container runtimes you can set the location explicitly below.
        containerd: /var/log/containers/*.log
      disableComponentControllers: "fluentd"
      

    fluentbit:
      # Installs a sub chart carrying the CRDs for the fluent-bit controller. The sub chart is enabled by default.
      crdsEnable: true
      enable: false

    fluentd:
      # Installs a sub chart carrying the CRDs for the fluentd controller. The sub chart is enabled by default.
      crdsEnable: false

I deployed fluent-bit in the "monitoring" namespace using the fluent-operator Helm chart, with the following configuration:

Fluent bit configurations

fluent-operator:
  containerRuntime: containerd
  operator:
    enable: false
    disableComponentControllers: "fluentd"
      

    fluentbit:
       # Installs a sub chart carrying the CRDs for the fluent-bit controller. The sub chart is enabled by default.
      enable: true

Please let me know if you want me to provide any other details.
Old bug ref : #1717

To Reproduce

Deploy Fluent Operator version v3.4.0 on a Kubernetes cluster.

Observe CPU usage on the Node where the operator is running.

Notice that the operator consumes ~27 cores out of 32.
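To put a number on the observation step, a minimal sketch follows that converts the CPU column of `kubectl top pod -n operators --no-headers` into whole cores. It assumes metrics-server is installed so that `kubectl top` works. The function names are hypothetical, and the sample pod name and values in the usage note are illustrative, not measurements from this cluster.

```python
def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ("27000m", "1.5", "250000000n") to cores."""
    divisors = {"n": 1e9, "u": 1e6, "m": 1e3}
    quantity = quantity.strip()
    if quantity and quantity[-1] in divisors:
        return float(quantity[:-1]) / divisors[quantity[-1]]
    return float(quantity)


def parse_top_pod(output: str) -> dict:
    """Map pod name -> CPU cores from `kubectl top pod --no-headers` text."""
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: NAME, CPU(cores), MEMORY(bytes)
        if len(fields) >= 2:
            usage[fields[0]] = cpu_to_cores(fields[1])
    return usage
```

With a hypothetical output line such as `fluent-operator-example 27000m 512Mi`, `parse_top_pod` reports 27.0 cores for that pod, which can then be compared against the 32 cores on the node.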

Expected behavior

Fluent Operator should run normally with minimal CPU usage, not consuming nearly all available cores.

Your Environment

Fluent Operator version: 3.4.0

Container Runtime: containerd

Operating System: Ubuntu 22.04



Kubernetes version: 1.35

How did you install fluent operator?

No response

Additional context

No response
