Add measured pods feature for accurate resource measurement on isolated nodes
Implements DPTP-4613 to address the issue where pod-scaler recommendations are
skewed by node contention. When multiple pods with poor CPU configurations are
scheduled on the same node, the node's CPU is maxed out, yet the pods still
finish eventually. Because each pod is starved by contention, the pod-scaler
observes low per-pod CPU utilization and incorrectly concludes that requests
should not be increased, leading to a cycle of reduced limits and tighter
packing.
This change introduces a measured pods system:
- Pods are classified as 'normal' or 'measured' based on whether they need fresh
measurement data (a pod is measured if its last measurement is more than 10 days
old or it has never been measured)
- Measured pods use podAntiAffinity rules to run on isolated nodes with no other
CI workloads, ensuring accurate CPU/memory utilization measurement
- BigQuery integration queries and caches max CPU/memory utilization from measured
pod runs, refreshing daily to keep data current
- Resource recommendations are applied only to the longest-running container
in each pod, using actual measured utilization data instead of Prometheus
metrics that may be skewed by node contention
The feature is opt-in via the --enable-measured-pods flag and requires BigQuery
configuration (--bigquery-project-id and --bigquery-dataset-id).
  fs.Int64Var(&o.cpuCap, "cpu-cap", 10, "The maximum CPU request value, ex: 10")
  fs.StringVar(&o.memoryCap, "memory-cap", "20Gi", "The maximum memory request value, ex: '20Gi'")
  fs.Int64Var(&o.cpuPriorityScheduling, "cpu-priority-scheduling", 8, "Pods with CPU requests at, or above, this value will be admitted with priority scheduling")
+ fs.BoolVar(&o.enableMeasuredPods, "enable-measured-pods", false, "Enable measured pods feature. When enabled, pods are classified as 'normal' or 'measured' and measured pods run on isolated nodes to get accurate CPU/memory utilization data.")
+ fs.StringVar(&o.bigQueryProjectID, "bigquery-project-id", "", "Google Cloud project ID for BigQuery queries (required if enable-measured-pods is true)")
+ fs.StringVar(&o.bigQueryDatasetID, "bigquery-dataset-id", "", "BigQuery dataset ID for pod metrics (required if enable-measured-pods is true)")
+ fs.StringVar(&o.bigQueryCredentialsFile, "bigquery-credentials-file", "", "Path to Google Cloud credentials file for BigQuery access")