Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
288 changes: 259 additions & 29 deletions content/en/containers/autoscaling/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -219,35 +219,265 @@

- Deploy a `DatadogPodAutoscaler` custom resource.

Use your existing deploy process to target and configure Autoscaling for your workload.

{{% collapse-content title="Example DatadogPodAutoscaler CRD" level="h4" expanded=false id="id-for-anchoring" %}}
```yaml
apiVersion: datadoghq.com/v1alpha2
kind: DatadogPodAutoscaler
metadata:
name: <name, usually same as Deployment object name>
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: <your Deployment name>
constraints:
# Adjust constraints as safeguards
maxReplicas: 50
minReplicas: 1
owner: Local
applyPolicy:
mode: Apply
objectives:
- type: PodResource
podResource:
name: cpu
value:
type: Utilization
utilization: 75
```
{{% /collapse-content %}}
Use your existing deploy process to target and configure Autoscaling for your workload. See the [example configurations](#example-datadogpodautoscaler-configurations) below.

### Example DatadogPodAutoscaler configurations

The following examples demonstrate common `DatadogPodAutoscaler` configurations for different scaling strategies. You can use these as starting points and adjust the values to match your workload's requirements.

{{< tabs >}}
{{% tab "Optimize Cost" %}}

The **Optimize Cost** profile uses multidimensional scaling to aggressively reduce resource waste. It sets a high CPU utilization target (85%), allows scaling down to a single replica, and uses aggressive scale-down rules for fast response to reduced load.

Check notice on line 231 in content/en/containers/autoscaling/_index.md

View workflow job for this annotation

GitHub Actions / vale

Datadog.sentencelength

Suggestion: Try to keep your sentence length to 25 words or fewer.

```yaml
apiVersion: datadoghq.com/v1alpha2
kind: DatadogPodAutoscaler
metadata:
name: <WORKLOAD_NAME>
namespace: <NAMESPACE>
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: <WORKLOAD_NAME>
owner: Local
applyPolicy:
mode: Apply
scaleDown:
rules:
# Aggressive: allow 50% reduction every 2 minutes
- periodSeconds: 120
type: Percent
value: 50
stabilizationWindowSeconds: 300
strategy: Max
scaleUp:
rules:
- periodSeconds: 120
type: Percent
value: 50
stabilizationWindowSeconds: 300
strategy: Max
update:
strategy: Auto
constraints:
containers:
- enabled: true
name: '*'
maxReplicas: 100
# Allow scaling down to 1 replica for maximum savings
minReplicas: 1
objectives:
# High utilization target to maximize cost efficiency
- type: PodResource
podResource:
name: cpu
value:
type: Utilization
utilization: 85
fallback:
horizontal:
direction: ScaleUp
enabled: true
objectives:
- type: PodResource
podResource:
name: cpu
value:
type: Utilization
utilization: 85
triggers:
staleRecommendationThresholdSeconds: 600
```

{{% /tab %}}
{{% tab "Optimize Balance" %}}

The **Optimize Balance** profile provides a middle ground between cost optimization and stability. It uses a moderate CPU utilization target (70%), maintains at least 2 replicas, and applies conservative scale-down rules to avoid disruptive scaling events.

```yaml
apiVersion: datadoghq.com/v1alpha2
kind: DatadogPodAutoscaler
metadata:
name: <WORKLOAD_NAME>
namespace: <NAMESPACE>
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: <WORKLOAD_NAME>
owner: Local
applyPolicy:
mode: Apply
scaleDown:
rules:
# Conservative: allow only 20% reduction every 20 minutes
- periodSeconds: 1200
type: Percent
value: 20
stabilizationWindowSeconds: 600
strategy: Max
scaleUp:
rules:
- periodSeconds: 120
type: Percent
value: 50
stabilizationWindowSeconds: 600
strategy: Max
update:
strategy: Auto
constraints:
containers:
- enabled: true
name: '*'
maxReplicas: 100
# Maintain at least 2 replicas for availability
minReplicas: 2
objectives:
# Moderate utilization target balances cost and performance
- type: PodResource
podResource:
name: cpu
value:
type: Utilization
utilization: 70
fallback:
horizontal:
direction: ScaleUp
enabled: true
objectives:
- type: PodResource
podResource:
name: cpu
value:
type: Utilization
utilization: 70
triggers:
staleRecommendationThresholdSeconds: 600
```

{{% /tab %}}
{{% tab "Vertical CPU and Memory" %}}

The **Vertical only** profile scales by adjusting CPU and memory requests and limits on existing pods, without changing the replica count. This is useful for workloads that cannot be horizontally scaled, or where you want to rightsize individual pods.

```yaml
apiVersion: datadoghq.com/v1alpha2
kind: DatadogPodAutoscaler
metadata:
name: <WORKLOAD_NAME>
namespace: <NAMESPACE>
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: <WORKLOAD_NAME>
owner: Local
applyPolicy:
mode: Apply
# Horizontal scaling disabled; only vertical resizing
scaleDown:
strategy: Disabled
scaleUp:
strategy: Disabled
update:
strategy: Auto
constraints:
containers:
- enabled: true
name: '*'
maxReplicas: 100
minReplicas: 2
# No objectives defined — Datadog recommends CPU and memory
# values based on observed usage patterns
fallback:
horizontal:
direction: ScaleUp
# Horizontal fallback disabled for vertical-only scaling
enabled: false
objectives:
- type: PodResource
podResource:
name: cpu
value:
type: Utilization
utilization: 70
triggers:
staleRecommendationThresholdSeconds: 600
```

{{% /tab %}}
{{% tab "Horizontal Custom Query" %}}

The **Horizontal only with Custom Query** profile scales replica count based on a custom Datadog metric query instead of CPU or memory utilization. This is useful for workloads where application-level metrics (such as queue depth, request latency, or throughput) are better indicators of scaling need.

```yaml
apiVersion: datadoghq.com/v1alpha2
kind: DatadogPodAutoscaler
metadata:
name: <WORKLOAD_NAME>
namespace: <NAMESPACE>
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: <WORKLOAD_NAME>
owner: Local
applyPolicy:
mode: Apply
scaleDown:
rules:
- periodSeconds: 1200
type: Percent
value: 20
stabilizationWindowSeconds: 600
strategy: Min
scaleUp:
rules:
- periodSeconds: 120
type: Percent
value: 50
stabilizationWindowSeconds: 600
strategy: Min
# Vertical updates disabled — horizontal only
update:
strategy: Disabled
constraints:
maxReplicas: 100
minReplicas: 2
objectives:
- type: CustomQuery
customQuery:
# Replace with your own Datadog metric query
request:
formula: usage
queries:
- name: usage
source: Metrics
metrics:
query: avg:redis.info.latency_ms{kube_cluster_name:<CLUSTER_NAME>,kube_namespace:<NAMESPACE>,kube_deployment:<WORKLOAD_NAME>}
value:
type: AbsoluteValue
absoluteValue: 500M
window: 5m0s
fallback:
horizontal:
direction: ScaleUp
enabled: false
objectives:
- type: PodResource
podResource:
name: cpu
value:
type: Utilization
utilization: 70
triggers:
staleRecommendationThresholdSeconds: 600
```

{{% /tab %}}
{{< /tabs >}}

### Deploy recommendations manually

Expand Down
Loading