
[n8n] Fix metric mappings and add full v2 metric coverage#23635

Draft
AAraKKe wants to merge 7 commits into master from aarakke/fix-n8n-metrics

Conversation

@AAraKKe
Contributor

@AAraKKe AAraKKe commented May 8, 2026

What does this PR do?

Overhauls the n8n integration's metric mapping, test environment, fixtures, and public documentation so the check matches what n8n actually emits across both tested major versions.

  • Metric map and metadata: fixes the incorrect workflow_executions_duration_seconds mapping, removes mappings that n8n does not emit, keeps valid runtime/queue metrics, and adds the missing families verified against n8n 1.118.1 and n8n 2.19.5.
  • n8n 2.x metric coverage: adds the 2.x-only families for workflow_execution_duration_seconds, audit_workflow_*, embed_login_*, token_exchange_*, process_pss_bytes, and the optional workflow_statistics_* gauges. The metadata descriptions call out metrics that require n8n 2.x or an opt-in n8n flag.
  • Worker scrape coverage: adds a worker instance to the test environment so worker-only families such as node_started, node_finished, queue_job_dequeued, and runner_task_requested are covered. Main and worker instances are tagged with n8n_process:main and n8n_process:worker.
  • Queue-mode test environment: runs n8n in queue mode with Redis and validates both n8n 1.118.1 and 2.19.5. The compose file pulls n8nio/n8n:${N8N_VERSION} directly, and host ports are allocated dynamically to avoid CI/local port conflicts.
  • Stable E2E setup: imports workflows with stable IDs, activates them, generates traffic, and waits for workflow metrics during docker_run setup conditions. This keeps the dynamic port configuration intact and avoids running setup work again during teardown.
  • Rare-event metric handling: keeps real but timing/event-dependent metrics in mapping and metadata, including auth failure counters and the libuv nodejs.active.requests gauge, while excluding them from live symmetric assertions that cannot reliably force those events at scrape time. Unit fixtures include synthetic samples for these metrics.
  • Readiness behavior: continues to emit n8n.readiness.check, but no longer gates the OpenMetrics scrape on the readiness endpoint. This preserves metric flow when readiness degrades while the OpenMetrics health service check still reports scrape failures.
  • Documentation: updates the public README with customer-facing n8n configuration guidance, queue-mode worker scraping instructions, required n8n environment variables, and version-specific metric notes.
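To make the mapping fix concrete, here is a minimal sketch of the kind of Prometheus-to-Datadog metric map the first bullet describes. The actual contents of `datadog_checks/n8n/metrics.py` are not shown on this page; the names below are taken from the PR description, and the dict name is illustrative.

```python
# Hypothetical fragment of the check's Prometheus-to-Datadog metric map.
# Keys are raw n8n Prometheus names; values are Datadog metric names.
METRIC_MAP = {
    # The bug from issue #23633: the old map used the invented name
    # workflow_executions_duration_seconds. n8n actually emits
    # workflow_execution_duration_seconds (singular "execution").
    'workflow_execution_duration_seconds': 'workflow.execution.duration',
    # Worker-only families, covered once a worker instance is scraped.
    'node_started': 'node.started',
    'node_finished': 'node.finished',
    # n8n 2.x-only family, per the version notes above.
    'process_pss_bytes': 'process.pss.bytes',
}
```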

Motivation

Issue #23633 reported that the integration exposed the wrong Datadog metric name for n8n workflow execution duration. Validating the integration against live n8n containers showed a broader gap: some mapped metrics were invented or stale, several real metrics were missing, and the test environment did not exercise queue mode, worker metrics, or version-specific metric differences.

This PR makes the integration empirically grounded and keeps coverage for both the older supported n8n line and the current 2.x line.

Validation

  • ddev test -fs n8n
  • ddev validate config -s n8n
  • ddev validate models -s n8n
  • ddev validate metadata n8n
  • ddev validate readmes n8n
  • ddev --no-interactive test n8n
  • ddev env test --dev n8n py3.13-1
  • ddev env test --dev n8n py3.13-2
  • ddev validate all n8n was also run. It reported no n8n-owned validation failures; the remaining failures were unrelated (stale global labeler state for rate_limiter and a network timeout during the licenses check).

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

- Drop fabricated metric names that n8n never emitted; map only what is empirically present.
- Add the n8n 2.x metric families: workflow.execution.duration histogram, audit.workflow.*, embed.login.*, token.exchange.*, process.pss.bytes, runner.task.requested, and the workflow_statistics gauges.
- Add worker-only families (node.started, node.finished, queue.job.dequeued, runner.task.requested) by introducing a worker-scrape instance.
- Stop gating the OpenMetrics scrape on /healthz/readiness; emit n8n.readiness.check unconditionally so metrics still flow when the readiness endpoint is unhealthy.
- Replace the custom Dockerfile with a direct n8nio/n8n image reference and parameterise the version via hatch.toml so the test matrix can run against both 1.118.1 and 2.19.5.
- Allocate free host ports via datadog_checks.dev.utils.find_free_ports and forward them through docker_run env_vars to avoid port collisions on re-runs.
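The dynamic port allocation in the last bullet can be sketched with the stdlib alone. The `find_free_ports` name is taken from the bullet; this is an equivalent stand-in, not the actual `datadog_checks.dev` implementation.

```python
import socket

def find_free_ports(count):
    """Reserve `count` distinct free TCP ports by binding ephemeral sockets,
    then release them so the compose stack can claim them."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind(('127.0.0.1', 0))  # port 0: the OS picks an unused port
            sockets.append(s)
        # All sockets are held open simultaneously, so the ports are distinct.
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()

# Hypothetical usage: forward the ports into docker_run via env_vars.
main_port, worker_port = find_free_ports(2)
env_vars = {'N8N_MAIN_PORT': str(main_port), 'N8N_WORKER_PORT': str(worker_port)}
```

There is a small race window between releasing a port and the container binding it, which is usually acceptable in CI and avoids hard-coded port collisions on re-runs.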
@AAraKKe AAraKKe added the qa/skip-qa Automatically skip this PR for the next QA label May 8, 2026
@AAraKKe AAraKKe requested review from a team as code owners May 8, 2026 10:53
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 8, 2026

⚠️ The qa/skip-qa label has been added with shippable changes

The following files, which will be shipped with the agent, were modified in this PR and
the qa/skip-qa label has been added.

You can ignore this if you are sure the changes in this PR do not require QA. Otherwise,
consider removing the label.

List of modified files that will be shipped with the agent
n8n/changelog.d/23635.added
n8n/datadog_checks/n8n/check.py
n8n/datadog_checks/n8n/data/conf.yaml.example
n8n/datadog_checks/n8n/metrics.py
n8n/hatch.toml

@AAraKKe AAraKKe marked this pull request as draft May 8, 2026 11:07
@codecov

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 94.91525% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.03%. Comparing base (1befb90) to head (43e7fc8).


@datadog-prod-us1-3

datadog-prod-us1-3 Bot commented May 8, 2026

Tests


⚠️ Warnings

❄️ 2 new flaky tests detected

  • test_all_metadata_metrics_emitted from test_integration.py: workflow_started_total never went non-zero
  • test_readiness_check_metric from test_integration.py: workflow_started_total never went non-zero
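Both flaky tests failed the same way: `workflow_started_total` never became non-zero before the assertion ran. One common remedy is to poll the scrape endpoint during setup until the metric has a positive sample. The helper names below are hypothetical, not the PR's actual test code.

```python
import time

def metric_is_nonzero(exposition, metric_name):
    """Return True if any sample of metric_name in Prometheus text format is > 0."""
    for line in exposition.splitlines():
        if line.startswith('#') or not line.startswith(metric_name):
            continue  # skip HELP/TYPE lines and other families
        try:
            if float(line.rsplit(' ', 1)[-1]) > 0:
                return True
        except ValueError:
            continue
    return False

def wait_for_nonzero(fetch, metric_name, timeout=60.0, interval=2.0):
    """Poll fetch() (returns exposition text) until metric_name goes non-zero."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if metric_is_nonzero(fetch(), metric_name):
            return True
        time.sleep(interval)
    return False
```

Generating traffic first and then gating on a helper like this (as the "Stable E2E setup" bullet describes) removes the race between workflow activation and the first scrape.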

ℹ️ Info

No other issues found

🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 94.92%
Overall Coverage: 95.85% (+8.60%)


@AAraKKe AAraKKe marked this pull request as ready for review May 8, 2026 12:44

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1be3b3dc6f


Comment on lines 83 to 86

```python
'queue_job_completed': 'queue.job.completed',
'queue_job_delayed_total': 'queue.job.delayed.total',
'queue_job_dequeued': 'queue.job.dequeued',
'queue_job_enqueued': 'queue.job.enqueued',
'queue_job_failed': 'queue.job.failed',
```

P2: Add the stalled queue counter mapping

In n8n 2.x queue mode, a stalled job event is emitted as n8n.queue.job.stalled, which the Prometheus service exposes as n8n_queue_job_stalled_total; with this map limited to completed/dequeued/enqueued/failed, that counter is silently ignored even when message event bus metrics are enabled. Since this block adds queue-job coverage, include queue_job_stalled (and corresponding metadata) so stalled jobs are collected.
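The suggestion amounts to one more entry in the queue-job block. The surrounding entries come from the snippet above; the Datadog-side name for the proposed addition follows the existing queue.job.* convention and is an assumption.

```python
# Queue-job block from the review snippet, with the proposed addition.
QUEUE_JOB_METRIC_MAP = {
    'queue_job_completed': 'queue.job.completed',
    'queue_job_delayed_total': 'queue.job.delayed.total',
    'queue_job_dequeued': 'queue.job.dequeued',
    'queue_job_enqueued': 'queue.job.enqueued',
    'queue_job_failed': 'queue.job.failed',
    # Proposed: n8n 2.x queue mode exposes n8n_queue_job_stalled_total when
    # message event bus metrics are enabled; map it so stalled jobs are collected.
    'queue_job_stalled': 'queue.job.stalled',
}
```

A matching metadata.csv row would be needed alongside the mapping for the metric to pass the metadata validation.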


A long-running n8n simulation that layers on top of the integration test
environment so a real Datadog Agent can ship metrics to a Datadog org for
dashboard / monitor iteration.

- tests/lab/workflows/: five lab-only workflow JSONs covering distinct shapes
  (fast, slow Wait node, always-fail Code, flaky 30%, four-step chain).
- tests/lab/traffic_generator.py: click CLI (start/generate/stop) that runs
  ddev env start --base, copies + imports + activates the lab workflows,
  restarts n8n, and drives a configurable async traffic mix against the
  webhooks and REST API.
- tests/lab/config.yaml: webhook + REST probabilities and tick / reload
  intervals; hot-reloaded while the generator runs.
- tests/lab/.ddev.toml: pins the lab to an `n8nlab` ddev org.
- tests/lab/run_lab.sh: bash entrypoint with an EXIT trap so Ctrl+C always
  runs lab:stop.
- hatch.toml: new [envs.lab] env with click/httpx/pyyaml/rich and
  start/generate/stop scripts.
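The configurable traffic mix in the lab generator can be sketched as a weighted choice over targets. The config shape and target names below are hypothetical illustrations of what tests/lab/config.yaml might hold, not its actual contents.

```python
import random

# Hypothetical shape of the loaded lab config: each traffic target (a webhook
# path or a REST action) maps to its selection probability per tick.
TRAFFIC_MIX = {
    'webhook:fast': 0.4,
    'webhook:flaky': 0.3,
    'rest:list_executions': 0.3,
}

def pick_target(mix, rng=random):
    """Pick one traffic target per tick according to its configured probability."""
    targets = list(mix)
    weights = [mix[t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]
```

Because the config is hot-reloaded while the generator runs, re-reading the mix each tick lets you shift webhook/REST probabilities without restarting the lab.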
@AAraKKe AAraKKe marked this pull request as draft May 8, 2026 12:59
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 8, 2026

Validation Report

All 20 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | ✅ |
| ci | Validate CI configuration and Codecov settings | ✅ |
| codeowners | Validate every integration has a CODEOWNERS entry | ✅ |
| config | Validate default configuration files against spec.yaml | ✅ |
| dep | Verify dependency pins are consistent and Agent-compatible | ✅ |
| http | Validate integrations use the HTTP wrapper correctly | ✅ |
| imports | Validate check imports do not use deprecated modules | ✅ |
| integration-style | Validate check code style conventions | ✅ |
| jmx-metrics | Validate JMX metrics definition files and config | ✅ |
| labeler | Validate PR labeler config matches integration directories | ✅ |
| legacy-signature | Validate no integration uses the legacy Agent check signature | ✅ |
| license-headers | Validate Python files have proper license headers | ✅ |
| licenses | Validate third-party license attribution list | ✅ |
| metadata | Validate metadata.csv metric definitions | ✅ |
| models | Validate configuration data models match spec.yaml | ✅ |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | ✅ |
| package | Validate Python package metadata and naming | ✅ |
| readmes | Validate README files have required sections | ✅ |
| saved-views | Validate saved view JSON file structure and fields | ✅ |
| version | Validate version consistency between package and changelog | ✅ |


