/area autoscale
/area API
/kind spec
What version of Knative?
Knative Serving v1.20.0 (observed). Root-cause code is unchanged on main and in the latest
release knative-v1.22.1, so all released versions through v1.22.1 are affected.
Related (not duplicates): #11531, #2674, #11373.
Expected Behavior
Per the scale-bounds docs,
initial-scale is "the initial target scale a Revision must reach ... before it is marked as Ready."
A new Revision should not become Ready, should not be promoted to
Configuration.status.latestReadyRevisionName, and should not receive route traffic until it reaches
its initial-scale. Traffic should stay on the previous, fully-scaled Revision until then.
Actual Behavior
A new Revision can be marked Ready=True and latched as latestReadyRevisionName while still far below
its initial-scale, so the Route shifts 100% of traffic to an under-provisioned Revision. Because
latestReadyRevisionName is monotonic, the route never reverts — even after the Revision later flips
Ready=False (e.g. ProgressDeadlineExceeded). In production this abandoned a healthy fully-scaled
Revision for a new one running at ~15% of target replicas, which then returned 503/504s.
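The latching behaviour described above can be modelled in a few lines. This is a minimal sketch, not Knative's code: the `revision` and `configStatus` types and the `advance` function are hypothetical stand-ins for the Configuration reconciler. It encodes only the rule this report relies on, namely that the pointer moves forward to a newer Ready Revision and is never moved back.

```go
package main

import "fmt"

// revision is a hypothetical stand-in for a Knative Revision.
type revision struct {
	name       string
	generation int
	ready      bool
}

// configStatus is a hypothetical stand-in for Configuration.status.
type configStatus struct {
	latestReadyRevisionName string
	latestReadyGeneration   int
}

// advance moves the latch forward to a Ready revision whose generation is
// newer than the latched one. It never moves the latch backwards.
func advance(s *configStatus, revs []revision) {
	for _, r := range revs {
		if r.ready && r.generation > s.latestReadyGeneration {
			s.latestReadyRevisionName = r.name
			s.latestReadyGeneration = r.generation
		}
	}
}

func main() {
	s := &configStatus{}
	revs := []revision{
		{name: "rt-00001", generation: 1, ready: true},
		{name: "rt-00002", generation: 2, ready: false},
	}

	advance(s, revs)
	fmt.Println(s.latestReadyRevisionName) // rt-00001

	revs[1].ready = true // the under-scaled revision is briefly Ready
	advance(s, revs)
	fmt.Println(s.latestReadyRevisionName) // rt-00002

	revs[1].ready = false // e.g. ProgressDeadlineExceeded
	advance(s, revs)
	fmt.Println(s.latestReadyRevisionName) // still rt-00002
}
```

In this model a single reconcile in which the new Revision reports Ready is enough to move the pointer permanently, even though the older Revision stays Ready throughout.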
Root cause (same on release-1.20 and main):
- Revision readiness is not gated on initial-scale.
  pkg/apis/serving/v1/revision_lifecycle.go:
  revisionCondSet = apis.NewLivingConditionSet(ResourcesAvailable, ContainerHealthy) (no
  ScaleTargetInitialized/Active term).
- ResourcesAvailable comes from the Deployment's Progressing condition, not Available.
  pkg/reconciler/revision/reconcile_resources.go PropagateDeploymentStatus →
  TransformDeploymentStatus reads {Progressing, ReplicaSetReady}, never appsv1.DeploymentAvailable.
  So ResourcesAvailable=True at any replica count, and ContainerHealthy=True once ReadyReplicas > 0.
- The only step that holds Ready back below initial-scale (PropagateAutoscalerStatus in reconcilePA,
  which pulls ResourcesAvailable back to Unknown while !IsScaleTargetInitialized()) is skipped
  when an earlier phase errors. The reconcile is an early-return phase loop
  (pkg/reconciler/revision/revision.go):
    for _, phase := range []func(context.Context, *v1.Revision) error{
        c.reconcileDeployment, // sets ResourcesAvailable=True (Progressing) + ContainerHealthy=True
        c.reconcileImageCache, // if this errors, the loop returns...
        c.reconcilePA,         // ...so reconcilePA (which would reset ResourcesAvailable) is SKIPPED
    } {
        if err := phase(ctx, rev); err != nil {
            return err
        }
    }
A transient reconcileImageCache error (e.g. createImageCache AlreadyExists from image-lister lag,
common under heavy concurrent-write churn) persists the below-initial-scale Ready=True. Then
pkg/reconciler/configuration/configuration.go findAndSetLatestReadyRevision advances
latestReadyRevisionName to it and never reverts (monotonic); the Route follows.
(Note: a reconcileDeployment Update conflict does NOT trigger this — it returns before
PropagateDeploymentStatus, so ResourcesAvailable is never set True that cycle.)
Steps to Reproduce the Problem
In production this fires stochastically under churn. To make it deterministic, the steps below
(a) wedge a new Revision below initial-scale and (b) force reconcileImageCache to error via a
ResourceQuota (a stand-in for the transient AlreadyExists). Requires image caching enabled
(caching.internal.knative.dev Image CRD + controller).
1. Label 3 schedulable nodes and create a namespace:
    kubectl label node <node-1> <node-2> <node-3> repro=true
    kubectl create ns repro
2. Deploy a healthy baseline Revision at scale 1, pinned to the labeled nodes; wait until Ready:
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata: { name: rt, namespace: repro }
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/initial-scale: "1"
            autoscaling.knative.dev/min-scale: "1"
            autoscaling.knative.dev/max-scale: "1"
        spec:
          nodeSelector: { repro: "true" }
          containers:
            - image: ghcr.io/knative/helloworld-go:latest
              env: [{ name: TARGET, value: "v1" }]
3. Cap the namespace's Image count at its current value so the next Revision's image cache can't be created:
    kubectl -n repro create quota capimg \
      --hard=count/images.caching.internal.knative.dev=$(kubectl -n repro get images.caching.internal.knative.dev --no-headers | wc -l)
4. Roll a new Revision that wants initial-scale=6 but can only schedule 2 pods (one per labeled node;
   one node is held by the baseline), wedging it at 2/6. Re-apply the same Service with:
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/initial-scale: "6"
            autoscaling.knative.dev/min-scale: "6"
            autoscaling.knative.dev/max-scale: "6"
        spec:
          nodeSelector: { repro: "true" }
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - topologyKey: kubernetes.io/hostname
                  labelSelector:
                    matchLabels: { serving.knative.dev/service: rt }
          containers:
            - image: ghcr.io/knative/helloworld-go:latest
              env: [{ name: TARGET, value: "v2" }]  # forces a new Revision
5. Observe the new Revision become Ready=True and capture the route while below initial-scale:
    kubectl -n repro get revision                             # new rev READY=True
    kubectl -n repro get pods                                 # 2 Running, 4 Pending (wedged at 2/6)
    kubectl -n repro get images.caching.internal.knative.dev  # only the baseline's image exists
    kubectl -n repro get configuration rt -o jsonpath='{.status.latestReadyRevisionName}'  # -> new rev
    kubectl -n repro get route rt -o jsonpath='{.status.traffic}'                          # -> 100% new rev
6. The new Revision is Ready=True at 2/6 replicas with no image cache; latestReadyRevisionName and the
   Route both point at it, and never revert to the fully-scaled baseline.
Suggested fix
Make readiness reflect replica sufficiency: include the Deployment's Available condition in
TransformDeploymentStatus, or gate Revision Ready / findAndSetLatestReadyRevision advancement on the
PodAutoscaler's ScaleTargetInitialized (enforcing the documented initial-scale semantics). Either
prevents latestReadyRevisionName from advancing to an under-scaled Revision.
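Both options can be compared against the wedged 2/6 case with a small model. This is only a sketch of the intended semantics, not a patch: the `status` struct and the `readyToday`, `readyGated`, and `deploymentAvailable` functions are hypothetical. `deploymentAvailable` assumes the usual Deployment minimum-availability rule (available replicas at least desired minus maxUnavailable), and the `maxUnavailable` value of 0 is an assumption for illustration.

```go
package main

import "fmt"

// status is a hypothetical flattened view of the inputs discussed above.
type status struct {
	resourcesAvailable     bool // derived from Deployment Progressing today
	containerHealthy       bool // true once ReadyReplicas > 0
	scaleTargetInitialized bool // PodAutoscaler has reached initial-scale
}

// readyToday models the current two-term condition set.
func readyToday(s status) bool {
	return s.resourcesAvailable && s.containerHealthy
}

// readyGated models the second option: also require ScaleTargetInitialized.
func readyGated(s status) bool {
	return readyToday(s) && s.scaleTargetInitialized
}

// deploymentAvailable models the first option: the Deployment's Available
// condition under the minimum-availability rule.
func deploymentAvailable(available, desired, maxUnavailable int) bool {
	return available >= desired-maxUnavailable
}

func main() {
	wedged := status{resourcesAvailable: true, containerHealthy: true, scaleTargetInitialized: false}
	fmt.Println(readyToday(wedged), readyGated(wedged)) // true false

	// maxUnavailable=0 is an assumed value for this illustration.
	fmt.Println(deploymentAvailable(2, 6, 0)) // false
	fmt.Println(deploymentAvailable(6, 6, 0)) // true
}
```

In this model either gate keeps the 2/6 Revision from reporting Ready, regardless of whether reconcilePA ran in that cycle.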