Fix Cilium e2e metric readiness #23638

Draft
nubtron wants to merge 1 commit into master from enrico-donnici_ddog/cilium-repro-20260506111628

nubtron commented May 8, 2026

What does this PR do?

Adds an adaptive readiness gate to the Cilium e2e fixture. After the existing Cilium API limiter warmup (cilium endpoint list), the fixture now polls Cilium's own metric registry until the raw metric families backing the historically missing Datadog metrics are available.

This targets intermittent Test Agent release failures like:

Needed at least 1 candidates for 'cilium.forward_bytes.count', got 0
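
The readiness gate described above amounts to polling with a deadline. The following is a minimal sketch of that pattern, not the code in this PR: the function name `wait_for_metric_families`, its parameters, and the default timeout and interval are all hypothetical, and the `is_present` probe is injected so the loop can be exercised without a cluster.

```python
import time
from typing import Callable, Iterable


def wait_for_metric_families(
    is_present: Callable[[str], bool],
    families: Iterable[str],
    timeout: float = 60.0,   # hypothetical default, not taken from the PR
    interval: float = 1.0,   # hypothetical default, not taken from the PR
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until every family is reported present, or raise TimeoutError."""
    pending = set(families)
    deadline = clock() + timeout
    while True:
        # Re-check only the families that have not been seen yet.
        pending = {family for family in pending if not is_present(family)}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"metric families never appeared: {sorted(pending)}")
        sleep(interval)
```

Because the loop returns as soon as the last family shows up, a fast runner pays almost nothing and a slow runner gets up to the full timeout, which is the property a fixed sleep lacks.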

Motivation

Kubernetes pod/deployment readiness is not enough for this fixture: Cilium can be Ready while event-driven metric families such as cilium_forward_bytes_total have not yet appeared. The recent CI path removed incidental warmup delay, making this race easier to hit.

Rather than adding a fixed sleep (sensitive to noisy CI runners) or introducing a larger workload/traffic fixture, this waits directly on Cilium's metrics surface using:

cilium metrics list --match-pattern <family>
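
As a rough illustration of how a fixture could turn that command into a yes/no probe, here is a sketch under stated assumptions. The helper names are hypothetical, the `prefix` argument stands in for however the fixture reaches the Cilium agent (for example a `kubectl exec ... --` wrapper, which is an assumption), and the output parsing assumes each listed metric starts its line with the family name.

```python
import subprocess
from typing import Callable, List, Sequence


def metrics_list_command(family: str, prefix: Sequence[str] = ()) -> List[str]:
    """Build the argv for `cilium metrics list --match-pattern <family>`."""
    # `prefix` is a hypothetical hook for a wrapper such as `kubectl exec ... --`.
    return [*prefix, "cilium", "metrics", "list", "--match-pattern", family]


def family_listed(output: str, family: str) -> bool:
    """Return True if any output line begins with the family name."""
    # Assumption: the metric name is the first whitespace-separated token
    # of each data row; header and blank lines never match.
    return any(line.split()[:1] == [family] for line in output.splitlines())


def family_present(
    family: str,
    prefix: Sequence[str] = (),
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> bool:
    """Run the command once and report whether the family is registered."""
    result = run(metrics_list_command(family, prefix), capture_output=True, text=True)
    return result.returncode == 0 and family_listed(result.stdout, family)
```

Treating a non-zero exit code as "not present yet" keeps transient command failures inside the retry path instead of failing the fixture immediately.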

Testing

Ran formatting/lint:

cd ddev
hatch run -- ddev --no-interactive test -fs cilium

Before the fix, reproduced the failure locally with a warm --new-env run and the rc.3 Agent. After the fix, verified with:

hatch run -- ddev --no-interactive env test \
  --new-env \
  --agent registry.datadoghq.com/agent:7.79.0-rc.3 \
  cilium py3.13-1.11 \
  -- -k 'test_check_ok and not fips'

Additional local verification:

  • py3.13-1.11: 3/3 warm attempts passed
  • py3.13-1.10: 1/1 warm attempt passed
  • py3.13-1.9: 1/1 warm attempt passed

nubtron added the qa/skip-qa label (Automatically skip this PR for the next QA) May 8, 2026

dd-octo-sts Bot commented May 8, 2026

Validation Report

All 20 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | passed |
| ci | Validate CI configuration and Codecov settings | passed |
| codeowners | Validate every integration has a CODEOWNERS entry | passed |
| config | Validate default configuration files against spec.yaml | passed |
| dep | Verify dependency pins are consistent and Agent-compatible | passed |
| http | Validate integrations use the HTTP wrapper correctly | passed |
| imports | Validate check imports do not use deprecated modules | passed |
| integration-style | Validate check code style conventions | passed |
| jmx-metrics | Validate JMX metrics definition files and config | passed |
| labeler | Validate PR labeler config matches integration directories | passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | passed |
| license-headers | Validate Python files have proper license headers | passed |
| licenses | Validate third-party license attribution list | passed |
| metadata | Validate metadata.csv metric definitions | passed |
| models | Validate configuration data models match spec.yaml | passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | passed |
| package | Validate Python package metadata and naming | passed |
| readmes | Validate README files have required sections | passed |
| saved-views | Validate saved view JSON file structure and fields | passed |
| version | Validate version consistency between package and changelog | passed |

@datadog-prod-us1-5

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 16.67%
Overall Coverage: 74.90% (-12.35%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 5941beb


codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 16.66667% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.01%. Comparing base (891aa96) to head (5941beb).
⚠️ Report is 16 commits behind head on master.


Labels

integration/cilium, qa/skip-qa (Automatically skip this PR for the next QA)
