## Problem
When a provider's model becomes unhealthy (stuck predictions, repeated timeouts, infrastructure issues), every subsequent request to that model hangs for up to 10 minutes before timing out. This wastes time and API credits and degrades the user experience — especially when the failure is systemic and predictable after the first few attempts.
Recent debugging of a stuck Replicate image generation request revealed that:
- A prediction stayed in "processing" state indefinitely on Replicate's side
- The system polled 300+ times with no timeout enforcement
- No error was recorded and no feedback was given to the user
- Every subsequent request to the same model would have hit the same issue
We've since improved error classification, timeout handling, and provider error tracking (see recent commits). But we're still missing the ability to fail fast when a model is known to be unhealthy.
## Proposal: Model-level circuit breaker

### Why model-level?
- Provider-level is too broad. `flux-1.1-pro` being stuck on Replicate shouldn't block `minimax/video-01`, which also runs on Replicate.
- Key-level is too narrow. The existing auto-disable system handles bad credentials. A circuit breaker is about operational health — the model infrastructure is misbehaving regardless of which key is used.
- Model mapping is the natural unit. Requests route through model alias → model mapping → provider + model ID. This is the granularity at which failures are correlated.
### Circuit breaker states

```
Closed (healthy)
  │
  ├─ failure threshold exceeded
  ▼
Open (failing fast)
  │
  ├─ cooldown elapsed
  ▼
Half-Open (probing)
  │
  ├─ probe succeeds → Closed
  └─ probe fails → Open (reset cooldown)
```
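As a sketch, the states and transitions above map to a small amount of C# (type and method names here are illustrative, not from the codebase):

```csharp
// Hypothetical sketch of the three states and their transitions.
public enum CircuitState
{
    Closed,   // healthy: requests proceed, failures are counted
    Open,     // failing fast: requests are rejected until the cooldown elapses
    HalfOpen  // probing: one request is let through to test recovery
}

public static class CircuitTransitions
{
    public static CircuitState OnFailure(CircuitState state, int recentFailures, int threshold) =>
        state switch
        {
            CircuitState.Closed when recentFailures >= threshold => CircuitState.Open,
            CircuitState.HalfOpen => CircuitState.Open, // probe failed: reopen, reset cooldown
            _ => state
        };

    public static CircuitState OnSuccess(CircuitState state) =>
        state == CircuitState.HalfOpen ? CircuitState.Closed : state;

    public static CircuitState OnCooldownElapsed(CircuitState state) =>
        state == CircuitState.Open ? CircuitState.HalfOpen : state;
}
```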
### Proposed behavior

| State | Behavior |
|-------|----------|
| Closed | Requests proceed normally. Failures are counted. |
| Open | Requests fail immediately with HTTP 503: "Model temporarily unavailable due to repeated failures. Retry after {cooldown}s." No polling, no API calls, no wasted credits. |
| Half-Open | One probe request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit reopens with a reset cooldown. |
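The Open-state behavior could be a guard invoked after mapping resolution, along these lines (the helper and its parameters are hypothetical; the 503 mapping via `ExceptionToResponseMapper` is the existing behavior noted under "Existing infrastructure to leverage"):

```csharp
// Hypothetical guard, invoked after mapping resolution and before client creation.
// ServiceUnavailableException is mapped to HTTP 503 by the existing pipeline.
private static void ThrowIfCircuitOpen(CircuitState state, DateTimeOffset openedAt, TimeSpan cooldown)
{
    if (state != CircuitState.Open)
        return;

    var retryAfter = openedAt + cooldown - DateTimeOffset.UtcNow;
    throw new ServiceUnavailableException(
        "Model temporarily unavailable due to repeated failures. " +
        $"Retry after {Math.Max(0, (int)retryAfter.TotalSeconds)}s.");
}
```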
### Suggested defaults (configurable)

| Parameter | Image Generation | Video Generation |
|-----------|------------------|------------------|
| Failure threshold | 3 failures in 5 minutes | 2 failures in 10 minutes |
| Open duration (cooldown) | 30 seconds | 60 seconds |
| Half-open max probes | 1 | 1 |
| Success threshold to close | 1 | 1 |
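These defaults could be bound from configuration into an options class, one instance per media type (a sketch with assumed property names; the values shown are the image-generation column):

```csharp
// Hypothetical options type; one instance would be bound per media type.
// Defaults below are the image-generation column of the table above.
public sealed class ModelCircuitBreakerOptions
{
    public int FailureThreshold { get; set; } = 3;                          // failures within the window
    public TimeSpan FailureWindow { get; set; } = TimeSpan.FromMinutes(5);  // sliding window
    public TimeSpan OpenDuration { get; set; } = TimeSpan.FromSeconds(30);  // cooldown
    public int HalfOpenMaxProbes { get; set; } = 1;
    public int SuccessThresholdToClose { get; set; } = 1;
}
```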
### Failure criteria

A "failure" for circuit breaker purposes includes:

- `RequestTimeoutException` (polling timeout exceeded)
- `ServiceUnavailableException` (provider 503)
- `LLMCommunicationException` with 5xx status codes
- Network errors (`HttpRequestException`)
Not counted as failures (these are request-specific, not model health issues):

- `InvalidRequestException` (bad prompt, content policy)
- `RateLimitExceededException` (handled by backoff/retry)
- `InvalidApiKey` / `InsufficientBalance` (handled by key auto-disable)
- `ModelNotFoundException` (configuration issue)
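Taken together, the classification could be a single predicate (a sketch; it assumes `LLMCommunicationException` exposes a numeric `StatusCode`):

```csharp
// Hypothetical predicate deciding whether an exception should trip the circuit.
// Assumes LLMCommunicationException exposes an int? StatusCode.
private static bool CountsAsCircuitFailure(Exception ex) => ex switch
{
    RequestTimeoutException => true,                                   // polling timeout exceeded
    ServiceUnavailableException => true,                               // provider 503
    LLMCommunicationException llm when llm.StatusCode >= 500 => true,  // provider 5xx
    HttpRequestException => true,                                      // network-level failure
    _ => false  // request-specific errors (bad prompts, rate limits, key issues) don't count
};
```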
## Implementation notes

### State storage

Use `IDistributedCache` (Redis) so circuit state is shared across scaled instances, following existing cache key conventions:

- `circuit:model:{mappingId}:state` → `"closed"` | `"open"` | `"half-open"`
- `circuit:model:{mappingId}:failures` → failure count with sliding window
- `circuit:model:{mappingId}:opened_at` → timestamp when the circuit opened
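A minimal sketch of a state store over `IDistributedCache` using the keys above (the class name and the expiry-based half-open trick are assumptions; production code would also need atomic failure counting):

```csharp
using Microsoft.Extensions.Caching.Distributed;

// Hypothetical state store. Letting the "state" key expire after the cooldown
// means an absent key can be read as permission to probe (half-open).
public sealed class DistributedCircuitStateStore(IDistributedCache cache)
{
    public async Task<string?> GetStateAsync(string mappingId, CancellationToken ct) =>
        await cache.GetStringAsync($"circuit:model:{mappingId}:state", ct);

    public async Task OpenAsync(string mappingId, TimeSpan cooldown, CancellationToken ct)
    {
        var options = new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = cooldown
        };
        await cache.SetStringAsync($"circuit:model:{mappingId}:state", "open", options, ct);
        await cache.SetStringAsync($"circuit:model:{mappingId}:opened_at",
            DateTimeOffset.UtcNow.ToString("O"), options, ct);
    }
}
```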
### Integration point

The natural check point is during model mapping lookup — every request already goes through `GetMappingByModelAliasAsync`. The circuit breaker check can be added to `CachedModelProviderMappingService` or as a wrapper that checks circuit state before returning the mapping.

Alternatively, it could live in the controller layer (`ImagesController`, `VideosController`) after mapping resolution but before client creation — this keeps the mapping service clean and makes the circuit breaker explicit.
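The wrapper option might look like this sketch (the interface and mapping type names are placeholders, not the actual `CachedModelProviderMappingService` types):

```csharp
// Hypothetical decorator over the existing mapping service; interface and
// model names are illustrative, not the actual codebase types.
public sealed class CircuitAwareMappingService(
    IModelProviderMappingService inner,
    DistributedCircuitStateStore circuits) : IModelProviderMappingService
{
    public async Task<ModelProviderMapping?> GetMappingByModelAliasAsync(
        string modelAlias, CancellationToken ct)
    {
        var mapping = await inner.GetMappingByModelAliasAsync(modelAlias, ct);

        if (mapping is not null &&
            await circuits.GetStateAsync(mapping.Id, ct) == "open")
        {
            // Fail fast: no provider call, no polling, no wasted credits.
            throw new ServiceUnavailableException(
                "Model temporarily unavailable due to repeated failures.");
        }

        return mapping;
    }
}
```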
### Existing infrastructure to leverage

- `RedisCircuitBreaker` — already exists for cache operations; could inform the pattern
- `CacheKeys` — established key naming conventions
- `ProviderErrorTrackingService` — already tracks errors per provider/key; the circuit breaker could consume these events
- `OperationTimeoutProvider` — already has per-operation timeout configuration; circuit breaker config could follow the same pattern
- `ExceptionToResponseMapper` — already maps `ServiceUnavailableException` to HTTP 503
## Related: reduce `MaxPollingDuration` for image generation

The current 10-minute `MaxPollingDuration` in `ReplicateClient` applies to all prediction types. Image generation should have a much shorter timeout (60 seconds is reasonable — most image models complete in 10-30 seconds), while video generation legitimately takes longer. The shorter timeout could be:

- A parameter passed to `PollPredictionUntilCompletedAsync`
- Derived from `OperationTimeoutProvider` configuration
- Set per media type in the orchestrator/controller
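For example, the first option could look like this (a sketch; the `MediaType` enum and the extra timeout parameter on `PollPredictionUntilCompletedAsync` are the proposed additions, not existing code):

```csharp
// Hypothetical: the orchestrator picks the polling budget by media type
// instead of relying on the single 10-minute MaxPollingDuration.
var maxPollingDuration = mediaType switch
{
    MediaType.Image => TimeSpan.FromSeconds(60),  // most image models finish in 10-30s
    MediaType.Video => TimeSpan.FromMinutes(10),  // video legitimately takes longer
    _ => TimeSpan.FromMinutes(10)
};

var prediction = await replicateClient.PollPredictionUntilCompletedAsync(
    predictionId, maxPollingDuration, cancellationToken);
```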
## Open questions

- Should the circuit breaker emit events? Publishing `ModelCircuitOpened` / `ModelCircuitClosed` events via MassTransit would allow the WebAdmin to show real-time model health status and could trigger notifications.
- Admin override? Should there be an Admin API endpoint to manually close/open a circuit (e.g., after a known provider outage is resolved)?
- Metrics? Should circuit state changes be recorded as Prometheus metrics for Grafana dashboards?
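If the events question is answered yes, the contracts could be as small as the following (hypothetical records; MassTransit can publish any serializable message type):

```csharp
// Hypothetical MassTransit event contracts for circuit state changes.
public sealed record ModelCircuitOpened(
    string MappingId,
    string ModelAlias,
    string Reason,
    DateTimeOffset OpenedAt);

public sealed record ModelCircuitClosed(
    string MappingId,
    string ModelAlias,
    DateTimeOffset ClosedAt);
```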