Description
When a spot/preemptible GPU node is preempted and GKE creates a replacement, all Knative services with existing revisions attempt to schedule pods — including services that were scaled to zero before the preemption event.
This is a continuation of #12538 (closed as stale, never fixed).
Environment
- GKE regional cluster (us-central1) with spot GPU node pool (max 1 node)
- Knative Serving (latest)
- Multiple Knative services sharing a single GPU node via a custom extended resource (`yubot.io/gpu-mem`)
- Services configured with `min-scale: "0"` and `initial-scale: "0"`
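For reference, a minimal Service manifest matching this setup might look like the following sketch. The `yubot.io/gpu-mem` resource name is our extended resource; the service name, image, and resource amount are placeholders. (Note that `initial-scale: "0"` also requires `allow-zero-initial-scale: "true"` in `config-autoscaler`.)

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-trainer  # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/initial-scale: "0"
    spec:
      containers:
        - image: gcr.io/example/llm-trainer:latest  # placeholder image
          resources:
            limits:
              yubot.io/gpu-mem: "4"  # share of GPU memory on the single spot node
```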
Steps to Reproduce
- Deploy multiple Knative services on a spot GPU node, each requesting a portion of GPU memory
- Some services are actively serving (pods running), others are scaled to zero (no pods)
- Wait for GKE to preempt the spot node
- GKE provisions a replacement spot node
- All services — including those that were at zero replicas — attempt to schedule pods simultaneously
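To reproduce without waiting for a real preemption event, deleting the VM backing the spot node behaves similarly (instance name and zone below are placeholders for our setup):

```shell
# Find the node backing the spot pool, then delete its VM to simulate preemption
kubectl get nodes -l cloud.google.com/gke-spot=true
gcloud compute instances delete <instance-name> --zone=us-central1-a

# After GKE recreates the node, watch every revision schedule at once
kubectl get pods -o wide --sort-by=.status.startTime
```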
Expected Behavior
Services that were scaled to zero (no pods) before preemption should remain at zero after the replacement node comes up. Only services that had active pods should be restored.
Actual Behavior
All services with existing revisions attempt to create pods, regardless of their pre-preemption replica count. This causes resource contention — on a shared GPU node, low-priority services (e.g., trainers) can grab GPU memory slots before high-priority services (e.g., inference), leaving critical services stuck in Pending.
Evidence
After node recreation, all pods started at the exact same timestamp:
```
image-trainer-00004  2026-03-11T04:51:27Z  (was scaled to 0 before preemption)
llm-trainer-00004    2026-03-11T04:51:27Z  (was scaled to 0 before preemption)
stt-00005            2026-03-11T04:51:27Z  (was running before preemption)
tts-00006            2026-03-11T04:51:27Z  (was running before preemption)
llm-00008            Pending               (was running, now blocked by trainers)
```
All trainer revisions were created with initial-scale: "0" and had no active pods before the preemption. They should not have restarted.
Workarounds
- PriorityClass: Assign higher priority to critical services so K8s preempts low-priority pods. This mitigates but doesn't prevent the unnecessary scheduling.
- Delete ksvc when not in use: Services with no ksvc object can't be resurrected. But this defeats the purpose of scale-to-zero.
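For the PriorityClass workaround, a sketch of what we use (the class name, value, and description are ours, not Knative defaults; setting `priorityClassName` on the revision template also requires the `kubernetes.podspec-priorityclassname` feature flag in `config-features`):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: inference-critical  # assumed name
value: 1000000
preemptionPolicy: PreemptLowerPriority
description: "Inference services preempt trainers on the shared GPU node."
```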
Root Cause Hypothesis
Knative's reconciler does not persist or respect the "scaled to zero" state across node disruptions. When pods are evicted, the revision controller sees "desired replicas > 0" (from the last reconciliation before scale-down) and recreates pods, rather than checking the autoscaler's current decision.
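One way to check this hypothesis is to compare the autoscaler's recorded decision against what the revision controller actually does, via the per-revision PodAutoscaler resource:

```shell
# If the autoscaler state survived the disruption, desiredScale should still
# be 0 for the trainer revisions while their pods are being recreated
kubectl get podautoscaler -o custom-columns=NAME:.metadata.name,DESIRED:.status.desiredScale,ACTUAL:.status.actualScale
```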
/kind bug