ci: use pod-level Kubernetes resource variables #11836

Draft
orioly13 wants to merge 2 commits into master from ciexe/enable-pod-level-resources

@orioly13 orioly13 commented Jul 2, 2026

Summary

Migrates dd-trace-java GitLab CI from per-container Kubernetes resource
variables to pod-level variables, as part of CIEXE-2021
(epic: CIEXE-2150).

Reference: DataDog/datadog-static-analyzer#924

What changed

  • Replaced KUBERNETES_CPU_REQUEST / KUBERNETES_MEMORY_REQUEST /
    KUBERNETES_MEMORY_LIMIT in .tier_m and .tier_l anchors with
    KUBERNETES_POD_CPU_REQUEST, KUBERNETES_POD_CPU_LIMIT,
    KUBERNETES_POD_MEMORY_REQUEST, KUBERNETES_POD_MEMORY_LIMIT.
  • Updated two native-image Gradle builds (quarkus-native,
    spring-boot-3.0-native) to read the renamed env var.
  • Budgets unchanged: tier_m = 6 CPU / 16 Gi, tier_l = 10 CPU / 20 Gi.
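As a rough illustration of the two bullets above, here is a minimal Python sketch of the new variable set and of how a build might size its parallelism from it. The tier budgets come from this description; the helper names, the choice to set limit equal to request, and the choice of `KUBERNETES_POD_CPU_REQUEST` as the variable the Gradle builds read are assumptions made for the example, not a copy of the actual YAML or Gradle code.

```python
import math

# Tier budgets as stated in this PR description.
TIER_BUDGETS = {
    "tier_m": {"cpu": 6, "memory_gi": 16},
    "tier_l": {"cpu": 10, "memory_gi": 20},
}


def pod_level_variables(tier: str) -> dict[str, str]:
    """Build the KUBERNETES_POD_* variables for a tier (hypothetical helper).

    Assumptions: limit equals request for both CPU and memory, and every
    value is emitted as a string.
    """
    budget = TIER_BUDGETS[tier]
    cpu = str(budget["cpu"])
    memory = f"{budget['memory_gi']}Gi"
    return {
        "KUBERNETES_POD_CPU_REQUEST": cpu,
        "KUBERNETES_POD_CPU_LIMIT": cpu,
        "KUBERNETES_POD_MEMORY_REQUEST": memory,
        "KUBERNETES_POD_MEMORY_LIMIT": memory,
    }


def parallelism_from_env(env: dict[str, str], default: int = 1) -> int:
    """Derive a worker count from the pod-level CPU request (hypothetical helper).

    Accepts whole cores ("6") or millicores ("6000m"), rounds down, and
    never returns less than 1. Falls back to `default` when the variable
    is unset or empty.
    """
    raw = env.get("KUBERNETES_POD_CPU_REQUEST", "").strip()
    if not raw:
        return default
    cores = float(raw[:-1]) / 1000 if raw.endswith("m") else float(raw)
    return max(1, math.floor(cores))
```

In a real job the second function would be called with `os.environ`; passing a plain dict keeps the sketch easy to check in isolation.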

Behavior changes

  • Pod-level budget covers the full pod (build + helper + init containers
    share one quota) instead of stacking per-container reservations.
  • KUBERNETES_POD_CPU_LIMIT is new: no CPU limit existed before, so jobs
    are now throttled at the limit. They lose burst headroom in exchange
    for tighter scheduling isolation.
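To make the "stacking" point concrete, the sketch below compares the CPU a pod reserves under the two models. It uses the standard Kubernetes rule for per-container requests (the larger of the sum of regular containers and the largest init container). The helper and init container values are invented for illustration; only the 6 CPU tier_m budget comes from this PR.

```python
def per_container_footprint(containers: dict[str, float],
                            init_containers: dict[str, float]) -> float:
    """Effective pod CPU request when each container reserves its own share.

    Regular containers run together, so their requests add up. Init
    containers run one at a time before them, so only the largest counts.
    """
    running = sum(containers.values())
    largest_init = max(init_containers.values(), default=0.0)
    return max(running, largest_init)


def pod_level_footprint(pod_request: float) -> float:
    """With a pod-level request, the pod reserves exactly that budget."""
    return pod_request


# Hypothetical tier_m job: 6 CPU for the build container (from this PR),
# plus made-up reservations for the helper and init containers.
stacked = per_container_footprint(
    containers={"build": 6.0, "helper": 1.0},
    init_containers={"init-permissions": 0.5},
)
shared = pod_level_footprint(6.0)
```

With these example numbers the per-container model reserves 7 CPU while the pod-level model reserves 6, because the helper now draws from the same budget as the build container.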

Rollout order (flag must precede merge)

  1. Enable ci.gitlab-runner.enable-pod-level-resources rule v3 for
    DataDog/dd-trace-java at https://mosaic.us1.ddbuild.io/feature-flags/ci.gitlab-runner.enable-pod-level-resources?targeting-rule=v3
  2. Trigger a pipeline on the draft branch and inspect one tier_m, one
    tier_l, and one arm64 pod spec. Confirm that spec.resources shows the
    pod-level budget and that every container shows empty resources.
  3. Mark PR ready → review → merge.
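The pod-spec check in step 2 can be scripted. The following is a minimal sketch, assuming the pod has been dumped as JSON (for example with `kubectl get pod <pod-name> -o json`) and loaded into a dict; the function name and the wording of the messages are hypothetical.

```python
def check_pod_level_resources(pod: dict) -> list[str]:
    """Return a list of problems; an empty list means the pod looks right.

    Checks that spec.resources carries CPU and memory requests and limits,
    and that no container or init container still declares its own resources.
    """
    problems = []
    spec = pod.get("spec") or {}

    pod_resources = spec.get("resources") or {}
    for kind in ("requests", "limits"):
        for resource in ("cpu", "memory"):
            if resource not in (pod_resources.get(kind) or {}):
                problems.append(f"spec.resources.{kind}.{resource} is missing")

    for group in ("initContainers", "containers"):
        for container in spec.get(group) or []:
            # A missing key and an empty mapping both count as "empty resources".
            if container.get("resources"):
                name = container.get("name", "?")
                problems.append(f"{group}/{name} still has per-container resources")

    return problems
```

Running it against one pod from each of the tier_m, tier_l, and arm64 jobs and expecting an empty list from all three mirrors the manual inspection described above.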

Rollback

Revert the merge commit first, then disable the flag, not the other way
around. With the flag off while the YAML is on master, jobs run with
scheduler defaults, which creates OOM risk on tier_l native-image jobs.
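The ordering constraint can be read as a small state model: the only state to avoid is "pod-level YAML on master, flag off". The sketch below encodes that and walks the rollout and rollback sequences. It assumes that running the old per-container YAML with the flag already on is safe, which the rollout order implies but this PR does not state outright; all names are hypothetical.

```python
from typing import Optional

# Each state is (yaml_on_master, flag_enabled).
State = tuple[bool, bool]


def is_risky(state: State) -> bool:
    """Pod-level YAML without the flag means jobs fall back to scheduler defaults."""
    yaml_on_master, flag_enabled = state
    return yaml_on_master and not flag_enabled


def first_risky_step(sequence: list[State]) -> Optional[int]:
    """Index of the first risky state in a sequence, or None if there is none."""
    for index, state in enumerate(sequence):
        if is_risky(state):
            return index
    return None


# Rollout: enable the flag, then merge.
ROLLOUT = [(False, False), (False, True), (True, True)]
# Rollback as prescribed: revert the merge, then disable the flag.
ROLLBACK = [(True, True), (False, True), (False, False)]
# Rollback in the wrong order: disable the flag, then revert the merge.
REVERSED_ROLLBACK = [(True, True), (True, False), (False, False)]
```

Only the reversed rollback passes through the risky state, at its second step.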

@orioly13 orioly13 added the tag: no release notes and type: refactoring labels Jul 2, 2026

dd-octo-sts Bot commented Jul 2, 2026

Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

| Suite | Status |
|---|---|
| Startup | 🟢 pass |

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
| Scenario | Candidate | master | Δ (95% CI of mean) |
|---|---|---|---|
| startup:insecure-bank:iast:Agent | 14.06 s | 14.04 s | [-0.8%; +1.1%] (no difference) |
| startup:insecure-bank:tracing:Agent | 12.91 s | 13.03 s | [-1.6%; -0.2%] (maybe better) |
| startup:petclinic:appsec:Agent | 16.87 s | 16.77 s | [-0.3%; +1.6%] (no difference) |
| startup:petclinic:iast:Agent | 16.82 s | 16.78 s | [-0.7%; +1.2%] (no difference) |
| startup:petclinic:profiling:Agent | 16.30 s | 16.83 s | [-7.6%; +1.2%] (no difference) |
| startup:petclinic:sca:Agent | 16.30 s | 16.66 s | [-6.5%; +2.2%] (no difference) |
| startup:petclinic:tracing:Agent | 16.13 s | 15.62 s | [-0.9%; +7.4%] (no difference) |

Commit: dfb8bc9a · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

orioly13 commented Jul 2, 2026

Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Reviewed commit: 7e69ce22c7


orioly13 added 2 commits July 2, 2026 16:33
Replace per-container KUBERNETES_CPU_REQUEST / KUBERNETES_MEMORY_*
with pod-level KUBERNETES_POD_* vars in both tier_m and tier_l
anchors. Two native-image Gradle builds are updated to read the
renamed env var for CPU parallelism sizing.

Behavior changes:
- Resources now budget the full pod (build + helper + init
  containers share a single quota) instead of only the build
  container, reducing the effective per-job cluster footprint.
- KUBERNETES_POD_CPU_LIMIT added (no CPU limit existed before);
  jobs lose burst headroom in exchange for tighter scheduling
  isolation. Tier_m: 6 CPU / 16Gi. Tier_l: 10 CPU / 20Gi.

Feature flag ci.gitlab-runner.enable-pod-level-resources (rule v3)
must be enabled for this repo before merging to master. Draft PR
opened for pod-spec validation on the flag-enabled branch first.

Refs: CIEXE-2021, CIEXE-2150

Runner requires string type for KUBERNETES_POD_* vars.
Match the exact format used in the reference PR
(DataDog/datadog-static-analyzer#924).
@orioly13 orioly13 force-pushed the ciexe/enable-pod-level-resources branch from 7e69ce2 to dfb8bc9 July 2, 2026 14:34
@PerfectSlayer PerfectSlayer added the comp: tooling label Jul 2, 2026
Labels

comp: tooling, tag: no release notes, type: refactoring

2 participants