[🐛 Bug]: ScaledJob default scalingStrategy "accurate" (since chart 0.56.0) causes runaway node job creation; "default" no longer selectable on KEDA 2.20+ #3167

Description

@KyriosGN0

What happened?

Chart 0.56.0 (commit 9cc943b1, #3149: "KEDA 2.20.1+ unsupported job scale strategy 'default'") changed the default autoscaling.scaledJobOptions.scalingStrategy.strategy from default to accurate.

accurate has a long-known over-provisioning / "node pods stay without active session" problem for scalingType: job (#2133, #2068, #1904, kedacore/keda#4833). The historical guidance in those issues was to set strategy: default. That escape hatch is now gone: KEDA 2.20 removed default from the ScaledJob CRD scalingStrategy.strategy enum (it is now only custom | accurate | eager), so the previous fix can no longer be applied on KEDA ≥ 2.20.

Net effect: upgrading to chart 0.56.0 (which also moves to the KEDA 2.20.1-based image) silently switches every node ScaledJob to accurate and reintroduces the runaway, with no default fallback.

Observed in production (KEDA 2.20.1, chart 0.56.0, scalingType: job): chrome node jobs run away to maxReplicaCount (we hit our cap of 1000 pods) and never scale back to the idle baseline. During multi-hour windows with 0 active grid sessions, ~254 node pods stayed Running. Before the 0.56.0 rollout (same config, but with strategy: default), node counts reliably returned to a small idle baseline after each burst.

Root cause (KEDA v2.20.1 pkg/scaling/executor/scale_jobs.go)

  • default: effectiveMaxScale = maxScale - runningJobCount
  • accurate: effectiveMaxScale = maxScale - pendingJobCount (ignores running jobs)

pendingJobCount only counts jobs whose pod has not yet reached Running. A Selenium node pod reaches Running well before it registers with the Distributor and claims its queued session (node registration + browser startup). During that window:

  • the session is still in the New Session Queue → maxScale still counts it, but
  • the job is no longer pending (pod is Running) → pendingJobCount does not count it.

So accurate computes roughly maxScale - 0 and, with a short pollingInterval, KEDA creates a fresh duplicate job for the same still-queued session on every poll, piling up to maxReplicaCount. default never did this because it subtracts runningJobCount, which includes those Running-but-not-yet-registered pods.
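The pile-up can be illustrated with a toy model. This is only a sketch of the two formulas as summarized above, not KEDA's actual code: the `simulate` helper, the single permanently queued session, the assumption that every created pod is Running (but unregistered) by the next poll, and the cap against `max_replica_count` are all simplifications introduced here for illustration.

```python
def effective_max_scale(strategy: str, max_scale: int, running: int, pending: int) -> int:
    """Formulas as summarized in this issue (simplified, hypothetical helper)."""
    if strategy == "default":
        return max_scale - running
    if strategy == "accurate":
        return max_scale - pending
    raise ValueError(f"unknown strategy: {strategy}")


def simulate(strategy: str, polls: int, max_replica_count: int = 1000) -> int:
    """Return how many jobs exist after `polls` polling intervals.

    Assumptions (not taken from KEDA): one session stays in the queue for the
    whole run, and every job created in one poll is Running, but not yet
    registered with the Distributor, by the time of the next poll.
    """
    queued_sessions = 1  # the scaler metric still reports the queued session
    running = 0          # Running-but-not-yet-registered pods
    pending = 0          # pods have always left Pending before the next poll
    for _ in range(polls):
        to_create = effective_max_scale(strategy, queued_sessions, running, pending)
        # Never create a negative number of jobs or exceed the replica cap.
        to_create = max(0, min(to_create, max_replica_count - running))
        running += to_create
    return running


if __name__ == "__main__":
    for name in ("default", "accurate"):
        print(name, simulate(name, polls=10))
```

Under these assumptions, ten polls create a single job with default and ten jobs with accurate, and accurate only stops growing once it reaches the replica cap.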

This is aggravated by large node pods on autoprovisioned nodes (long Pending → Running → registered path) and a small pollingInterval, but the underlying mismatch is that the Selenium scaler's queue metric is not compatible with accurate's "subtract only pending jobs" assumption. That is exactly the "calculation problem" that the old values comment, # Change this to "accurate" when the calculation problem is fixed, referred to.

Suggested fix

Because default is no longer a valid enum value on KEDA ≥ 2.20, default the chart to custom with parameters that reproduce default's formula:

scaledJobOptions:
  scalingStrategy:
    strategy: custom
    customScalingQueueLengthDeduction: 0
    customScalingRunningJobPercentage: "1"   # maxScale - 0 - runningJobCount*1.0 == maxScale - runningJobCount

custom is accepted by every supported KEDA version and, with these parameters, computes the same effective scale as the old default strategy. At minimum, the README / values comment should document this as the replacement for the previous strategy: default guidance, since existing users upgrading past KEDA 2.20 will otherwise silently regress.
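As a quick sanity check of that equivalence, here is a minimal sketch that compares the two formulas as written in this issue. The function names are hypothetical, the truncation to an integer is an assumption, and any further capping against maxReplicaCount is left out.

```python
def default_effective_max_scale(max_scale: int, running_job_count: int) -> int:
    """default, as described in this issue: maxScale - runningJobCount."""
    return max_scale - running_job_count


def custom_effective_max_scale(
    max_scale: int,
    running_job_count: int,
    queue_length_deduction: int = 0,
    running_job_percentage: float = 1.0,
) -> int:
    """custom, per the values comment above:
    maxScale - deduction - runningJobCount * percentage (truncated, by assumption)."""
    return max_scale - queue_length_deduction - int(running_job_count * running_job_percentage)


def formulas_agree(limit: int = 50) -> bool:
    """Check that custom(deduction=0, percentage=1.0) matches default on a small grid."""
    return all(
        custom_effective_max_scale(m, r) == default_effective_max_scale(m, r)
        for m in range(limit)
        for r in range(limit)
    )


if __name__ == "__main__":
    print("custom(0, 1.0) matches default:", formulas_agree())
```

With a deduction of 0 and a percentage of 1.0 the two expressions reduce to the same subtraction, which is why the suggested values act as a drop-in replacement for strategy: default.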

Relevant log output

KEDA scaleexecutor repeatedly creating jobs while sessions are already being served; node pods remain Running with 0 active sessions.

Environment

  • Chart: selenium-grid 0.56.0 (image 4.45.0-20260606)
  • KEDA: 2.20.1
  • scalingType: job, SE_NODE_MAX_SESSIONS=1, SE_DRAIN_AFTER_SESSION_COUNT=1
  • Kubernetes: GKE 1.3x
