diff --git a/content/en/containers/autoscaling/_index.md b/content/en/containers/autoscaling/_index.md index f71da88bd01..38b8837439b 100644 --- a/content/en/containers/autoscaling/_index.md +++ b/content/en/containers/autoscaling/_index.md @@ -219,35 +219,265 @@ When you are ready to proceed with enabling Autoscaling for a workload, you have - Deploy a `DatadogPodAutoscaler` custom resource. - Use your existing deploy process to target and configure Autoscaling for your workload. - - {{% collapse-content title="Example DatadogPodAutoscaler CRD" level="h4" expanded=false id="id-for-anchoring" %}} - ```yaml - apiVersion: datadoghq.com/v1alpha2 - kind: DatadogPodAutoscaler - metadata: - name: - spec: - targetRef: - apiVersion: apps/v1 - kind: Deployment - name: - constraints: - # Adjust constraints as safeguards - maxReplicas: 50 - minReplicas: 1 - owner: Local - applyPolicy: - mode: Apply - objectives: - - type: PodResource - podResource: - name: cpu - value: - type: Utilization - utilization: 75 - ``` - {{% /collapse-content %}} + Use your existing deploy process to target and configure Autoscaling for your workload. See the [example configurations](#example-datadogpodautoscaler-configurations) below. + +### Example DatadogPodAutoscaler configurations + +The following examples demonstrate common `DatadogPodAutoscaler` configurations for different scaling strategies. You can use these as starting points and adjust the values to match your workload's requirements. + +{{< tabs >}} +{{% tab "Optimize Cost" %}} + +The **Optimize Cost** profile uses multidimensional scaling to aggressively reduce resource waste. It sets a high CPU utilization target (85%), allows scaling down to a single replica, and uses aggressive scale-down rules for fast response to reduced load. + +```yaml +apiVersion: datadoghq.com/v1alpha2 +kind: DatadogPodAutoscaler +metadata: + name: + namespace: +spec: + targetRef: + apiVersion: apps/v1 + kind: Deployment + name: + owner: Local + applyPolicy: + mode: Apply + scaleDown: + rules: + # Aggressive: allow 50% reduction every 2 minutes + - periodSeconds: 120 + type: Percent + value: 50 + stabilizationWindowSeconds: 300 + strategy: Max + scaleUp: + rules: + - periodSeconds: 120 + type: Percent + value: 50 + stabilizationWindowSeconds: 300 + strategy: Max + update: + strategy: Auto + constraints: + containers: + - enabled: true + name: '*' + maxReplicas: 100 + # Allow scaling down to 1 replica for maximum savings + minReplicas: 1 + objectives: + # High utilization target to maximize cost efficiency + - type: PodResource + podResource: + name: cpu + value: + type: Utilization + utilization: 85 + fallback: + horizontal: + direction: ScaleUp + enabled: true + objectives: + - type: PodResource + podResource: + name: cpu + value: + type: Utilization + utilization: 85 + triggers: + staleRecommendationThresholdSeconds: 600 +``` + +{{% /tab %}} +{{% tab "Optimize Balance" %}} + +The **Optimize Balance** profile provides a middle ground between cost optimization and stability. It uses a moderate CPU utilization target (70%), maintains at least 2 replicas, and applies conservative scale-down rules to avoid disruptive scaling events. + +```yaml +apiVersion: datadoghq.com/v1alpha2 +kind: DatadogPodAutoscaler +metadata: + name: + namespace: +spec: + targetRef: + apiVersion: apps/v1 + kind: Deployment + name: + owner: Local + applyPolicy: + mode: Apply + scaleDown: + rules: + # Conservative: allow only 20% reduction every 20 minutes + - periodSeconds: 1200 + type: Percent + value: 20 + stabilizationWindowSeconds: 600 + strategy: Max + scaleUp: + rules: + - periodSeconds: 120 + type: Percent + value: 50 + stabilizationWindowSeconds: 600 + strategy: Max + update: + strategy: Auto + constraints: + containers: + - enabled: true + name: '*' + maxReplicas: 100 + # Maintain at least 2 replicas for availability + minReplicas: 2 + objectives: + # Moderate utilization target balances cost and performance + - type: PodResource + podResource: + name: cpu + value: + type: Utilization + utilization: 70 + fallback: + horizontal: + direction: ScaleUp + enabled: true + objectives: + - type: PodResource + podResource: + name: cpu + value: + type: Utilization + utilization: 70 + triggers: + staleRecommendationThresholdSeconds: 600 +``` + +{{% /tab %}} +{{% tab "Vertical CPU and Memory" %}} + +The **Vertical only** profile scales by adjusting CPU and memory requests and limits on existing pods, without changing the replica count. This is useful for workloads that cannot be horizontally scaled, or where you want to rightsize individual pods. + +```yaml +apiVersion: datadoghq.com/v1alpha2 +kind: DatadogPodAutoscaler +metadata: + name: + namespace: +spec: + targetRef: + apiVersion: apps/v1 + kind: Deployment + name: + owner: Local + applyPolicy: + mode: Apply + # Horizontal scaling disabled; only vertical resizing + scaleDown: + strategy: Disabled + scaleUp: + strategy: Disabled + update: + strategy: Auto + constraints: + containers: + - enabled: true + name: '*' + maxReplicas: 100 + minReplicas: 2 + # No objectives defined — Datadog recommends CPU and memory + # values based on observed usage patterns + fallback: + horizontal: + direction: ScaleUp + # Horizontal fallback disabled for vertical-only scaling + enabled: false + objectives: + - type: PodResource + podResource: + name: cpu + value: + type: Utilization + utilization: 70 + triggers: + staleRecommendationThresholdSeconds: 600 +``` + +{{% /tab %}} +{{% tab "Horizontal Custom Query" %}} + +The **Horizontal only with Custom Query** profile scales replica count based on a custom Datadog metric query instead of CPU or memory utilization. This is useful for workloads where application-level metrics (such as queue depth, request latency, or throughput) are better indicators of scaling need. + +```yaml +apiVersion: datadoghq.com/v1alpha2 +kind: DatadogPodAutoscaler +metadata: + name: + namespace: +spec: + targetRef: + apiVersion: apps/v1 + kind: Deployment + name: + owner: Local + applyPolicy: + mode: Apply + scaleDown: + rules: + - periodSeconds: 1200 + type: Percent + value: 20 + stabilizationWindowSeconds: 600 + strategy: Min + scaleUp: + rules: + - periodSeconds: 120 + type: Percent + value: 50 + stabilizationWindowSeconds: 600 + strategy: Min + # Vertical updates disabled — horizontal only + update: + strategy: Disabled + constraints: + maxReplicas: 100 + minReplicas: 2 + objectives: + - type: CustomQuery + customQuery: + # Replace with your own Datadog metric query + request: + formula: usage + queries: + - name: usage + source: Metrics + metrics: + query: avg:redis.info.latency_ms{kube_cluster_name:,kube_namespace:,kube_deployment:} + value: + type: AbsoluteValue + absoluteValue: 500M + window: 5m0s + fallback: + horizontal: + direction: ScaleUp + enabled: false + objectives: + - type: PodResource + podResource: + name: cpu + value: + type: Utilization + utilization: 70 + triggers: + staleRecommendationThresholdSeconds: 600 +``` + +{{% /tab %}} +{{< /tabs >}} ### Deploy recommendations manually