Parallelize microbenchmarks and run them more times#5313
Conversation
Thank you for updating the Change log entry section 👏 Visited at: 2026-02-20 09:48:24 UTC
✅ Tests: 🎉 All green! ❄️ No new flaky tests detected. 🎯 Code Coverage (details). 🔗 Commit SHA: 61a7f68 | Docs | Datadog PR Page
Benchmarks
Benchmark execution time: 2026-03-03 11:20:54
Comparing candidate commit 61a7f68 in PR branch.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 0 unstable metrics.
I think the PR seems in good shape? My only question is -- I see the new jobs coming up in GitLab -- is there a way to check that the way we're exposing the files is still correct? E.g. how do we test that we haven't broken our benchmark reporting code?
Hey @ivoanjo and @p-datadog, thank you for the reviews! I'll answer Oleg on the conversation thread. To answer Ivo:
Great point, and while reports on the BP UI are working as expected, PR comments from one microbenchmarking job will overwrite the other. Results have to be combined somehow.
Like on other benchmarks
Force-pushed 4fe6f54 to c7a672e
Hi! For visibility, I re-requested reviews from everyone who had already reviewed, since you each pointed out different fixes.
ivoanjo
left a comment
👍 LGTM
We're in the middle of a release, so merging to master is blocked; it should be unblocked later today or by Monday.
At this point I don't see any reason not to give this a try, and if some extra adjustment is needed we'll do it as a follow-up PR.
#5313 (comment) is awesome. We're comparing benchmarks from this branch against master. Since there were no code changes that should impact performance, we would indeed expect to see 0 improvements/regressions. This corroborates the Ruby microbenchmark stability experiments, which showed that 10 repetitions removed all flakiness; 6 repetitions seem to be sufficient.
I asked Claude to review this PR and it came up with a couple of pages of feedback. I DM'd @igoragoli to go over it, since for about half of it I can't tell whether it addresses real issues or only theoretical ones. I wanted to see about DRYing the file names (which take up a large part of the diff), but Claude also spotted some missing validation, which seems at least plausible, and some improvements in error reporting.
To be clear, is any of that feedback blocking? In this age of "feedback from AI is free", I think it's even more important for the line between "hey, maybe this would be a cool improvement" and "hey, we should really fix this, it's a confusing/dangerous/weird footgun" to be clear for folks :)
I am not sure -- that's what I wanted to find out. I think we are all at the summit anyway at the moment.
Adds a safety check to ensure BENCHMARKS variable is populated before
proceeding with benchmark execution. Without this validation, if the
GROUP variable is invalid or misspelled, the job would silently run
with zero benchmarks and report success.
This change prevents silent failures by:
- Validating that the evaluated BENCHMARKS_${GROUP} variable is non-empty
- Providing a clear error message identifying the invalid group name
- Failing the CI job early before attempting to run bp-runner
Example failure message: "Error: No benchmarks defined for group 'typo'"
Related to PR #5313 - Parallelize microbenchmarks and run them more times
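A minimal sketch of that check, assuming the groups are exposed as `BENCHMARKS_<GROUP>` variables; the group and benchmark names below are illustrative, not taken from the actual CI config:

```shell
#!/bin/sh
# Hypothetical group definition; the real groups live in the GitLab CI config.
BENCHMARKS_tracing="tracing_trace_bench tracing_propagation_bench"

resolve_benchmarks() {
  group="$1"
  # Indirect lookup: BENCHMARKS_<group> -> BENCHMARKS
  eval "BENCHMARKS=\${BENCHMARKS_${group}:-}"
  if [ -z "$BENCHMARKS" ]; then
    echo "Error: No benchmarks defined for group '${group}'" >&2
    return 1
  fi
  printf '%s\n' "$BENCHMARKS"
}

resolve_benchmarks tracing                        # prints the group's benchmarks
resolve_benchmarks typo || echo "CI job fails early here"
```

With an invalid group the lookup expands to an empty string, so without the `-z` check the job would happily run zero benchmarks and report success.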
Validates that the GROUP variable contains only alphanumeric characters and underscores before using it in an eval statement. This prevents:
- Shell injection risks if GROUP contains special characters
- Cryptic eval errors from malformed variable names
It also yields better error messages for configuration mistakes.
Example: If someone adds GROUP: "my-new-group" (with hyphens) to the parallel matrix, the job will now fail with a clear error message explaining the format requirement, rather than succeeding with zero benchmarks or producing an eval syntax error.
Related to PR #5313 - Parallelize microbenchmarks and run them more times
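A sketch of such a format check in POSIX shell (the function name is made up; the real job inlines the check in its script block):

```shell
#!/bin/sh
# Reject GROUP values that are empty or contain anything besides [A-Za-z0-9_],
# so they are safe to splice into an eval'd variable name.
valid_group() {
  case "$1" in
    ""|*[!A-Za-z0-9_]*)
      echo "Error: GROUP '$1' must contain only alphanumeric characters and underscores" >&2
      return 1 ;;
  esac
}

valid_group tracing_1_23                       # accepted
valid_group "my-new-group" || echo "rejected"  # hyphens are refused
```

A `case` pattern avoids spawning `grep` and works in any POSIX shell, which matters for minimal CI images.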
Ensures the DD_API_KEY is successfully retrieved from AWS SSM Parameter Store before proceeding with benchmark execution. Without this check, if the AWS command fails (due to permissions, network issues, or a missing parameter), the variable would be set to an empty string and the job would continue, causing silent failures when attempting to upload results.
This prevents:
- Benchmark results being lost due to failed uploads
- Jobs appearing successful when API key retrieval failed
- Difficult debugging of upload failures
The job now fails early with a clear error message if the API key cannot be retrieved.
Related to PR #5313 - Parallelize microbenchmarks and run them more times
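The fail-fast shape of that check can be sketched as below. In CI the wrapped command would be the real `aws ssm get-parameter --with-decryption ...` call; here it is passed in as arguments so the failure handling can be exercised without AWS:

```shell
#!/bin/sh
# Treat both a failing command and an empty result as fatal, instead of
# silently continuing with DD_API_KEY="".
fetch_api_key() {
  DD_API_KEY=$("$@" 2>/dev/null) || DD_API_KEY=""
  if [ -z "$DD_API_KEY" ]; then
    echo "Error: Failed to retrieve DD_API_KEY from SSM" >&2
    return 1
  fi
}

fetch_api_key echo "dummy-key"                   # succeeds, key is non-empty
fetch_api_key false || echo "job stops before uploading anything"
```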
Validates that the artifacts directory is successfully created before proceeding with benchmark execution. This change addresses three issues:
1. ddprof-benchmark job: Removes the dangerous `|| :` pattern that suppressed all mkdir errors, which could hide real failures like permission issues or disk-full errors.
2. microbenchmarks job: Adds explicit validation that directory creation succeeded.
3. microbenchmarks-pr-comment job: Adds explicit validation that directory creation succeeded.
Without this validation, if directory creation fails, the job would continue and benchmark results would be lost. The job might appear successful even though no artifacts were collected. Now the jobs fail early with a clear error message if the artifacts directory cannot be created.
Related to PR #5313 - Parallelize microbenchmarks and run them more times
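A minimal sketch of the replacement for the `|| :` pattern (directory path and function name are illustrative):

```shell
#!/bin/sh
# Explicit check instead of `mkdir -p "$DIR" || :`, which swallowed
# permission and disk-space errors.
ensure_artifacts_dir() {
  if ! mkdir -p "$1" 2>/dev/null; then
    echo "Error: Failed to create artifacts directory '$1'" >&2
    return 1
  fi
}

ensure_artifacts_dir "${TMPDIR:-/tmp}/microbenchmark-artifacts"
ensure_artifacts_dir "" || echo "job fails early instead of losing results"
```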
Validates that CI_JOB_TOKEN is set before using it in git URL configuration. If the token is empty or undefined, git config would succeed but create a malformed URL, leading to authentication failures during git clone with misleading error messages.
This affects two jobs:
- microbenchmarks: Uses the token to clone benchmarking-platform
- microbenchmarks-pr-comment: Uses the token to clone benchmarking-platform
Without this check, an empty CI_JOB_TOKEN would cause:
- Malformed git URLs like "https://gitlab-ci-token:@gitlab.ddbuild.io/..."
- Cryptic authentication errors instead of clear token validation errors
- Difficult debugging of CI configuration issues
The jobs now fail early with a clear error message if the token is missing.
Related to PR #5313 - Parallelize microbenchmarks and run them more times
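Sketched as a guard in front of the URL construction (the URL shape is taken from the commit message above; the function name is made up):

```shell
#!/bin/sh
# Refuse to build the clone URL at all if the token is missing, instead of
# producing "https://gitlab-ci-token:@gitlab.ddbuild.io/..." and a cryptic
# authentication failure later.
require_job_token() {
  if [ -z "${CI_JOB_TOKEN:-}" ]; then
    echo "Error: CI_JOB_TOKEN is not set; git clone URLs would be malformed" >&2
    return 1
  fi
  printf 'https://gitlab-ci-token:%s@gitlab.ddbuild.io/\n' "$CI_JOB_TOKEN"
}

CI_JOB_TOKEN="example-token"
require_job_token        # prints the well-formed URL
```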
Improves error handling for git clone operations by:
1. Providing explicit error messages when clones fail
2. Separating git clone from cd command for clearer error messages
3. Including branch name in error message for easier debugging
Previously, when git clone failed, the error would appear to be about
the subsequent 'cd' command ("cd: platform: No such file or directory"),
which is misleading. The actual issue was the clone failure, not the cd.
This affects four jobs:
- .macrobenchmarks (clones ruby/gitlab branch)
- ddprof-benchmark (clones ruby/ddprof-benchmark branch)
- microbenchmarks (clones dd-trace-rb branch)
- microbenchmarks-pr-comment (clones dd-trace-rb branch)
Benefits:
- Clear error messages identifying clone failures
- Branch name in error helps identify wrong branch configurations
- Easier debugging of repository access or branch name issues
- No more misleading "directory not found" errors
Related to PR #5313 - Parallelize microbenchmarks and run them more times
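The clone/cd separation can be sketched as two individually-checked steps, so a clone failure is reported as a clone failure (repo URL, branch, and directory below are placeholders, and the helper function is made up):

```shell
#!/bin/sh
# Check the clone on its own before cd'ing into the result, and name the
# branch in the error for easier debugging of branch misconfigurations.
clone_and_enter() {
  repo="$1" branch="$2" dir="$3"
  if ! git clone --branch "$branch" "$repo" "$dir" 2>/dev/null; then
    echo "Error: Failed to clone branch '$branch' from '$repo'" >&2
    return 1
  fi
  cd "$dir" || return 1
}

clone_and_enter "https://example.invalid/repo.git" "main" "/tmp/dst" \
  || echo "clear clone error instead of a misleading 'cd: No such file or directory'"
```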
Validates that CI_COMMIT_SHA is set before executing the ddprof-benchmark job. This variable is used as LATEST_COMMIT_ID to tag benchmark results in the monitoring system. While GitLab CI normally sets CI_COMMIT_SHA automatically, this validation provides defense in depth against:
- Manual job execution without proper CI context
- Broken CI configurations
- Edge cases in CI platform behavior
Without this check, if CI_COMMIT_SHA were somehow empty, benchmark results would be tagged with an empty commit SHA, making them:
- Impossible to correlate with specific commits
- Orphaned in the monitoring system
- Useless for tracking performance over time
The job now fails early with a clear error message if the commit SHA is missing, rather than proceeding with invalid metadata.
Related to PR #5313 - Parallelize microbenchmarks and run them more times
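This is the same fail-fast shape as the token check, sketched here with a made-up function name:

```shell
#!/bin/sh
# Refuse to run if the commit SHA that tags benchmark results is missing,
# so results never land in the monitoring system with empty metadata.
require_commit_sha() {
  if [ -z "${CI_COMMIT_SHA:-}" ]; then
    echo "Error: CI_COMMIT_SHA is not set; results could not be correlated with a commit" >&2
    return 1
  fi
  LATEST_COMMIT_ID="$CI_COMMIT_SHA"
}

CI_COMMIT_SHA="61a7f68"
require_commit_sha && echo "tagging results with $LATEST_COMMIT_ID"
```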
Rewrites all validation checks from the `|| (echo "..." && exit 1)` pattern to explicit if statements with proper stderr redirection.
Issues with the previous pattern:
1. Parentheses create a subshell, so exit 1 only exits the subshell, not the main script, in some contexts
2. Error messages went to stdout instead of stderr
Changes:
- All validations now use `if ! command` or `if [ -z "$VAR" ]`
- All error messages redirect to stderr with `>&2`
- Uses multi-line YAML blocks (`|`) for readability
- Eliminates subshell exit issues
Affects all validation checks added in previous commits:
- CI_COMMIT_SHA validation
- ARTIFACTS_DIR creation validation
- DD_API_KEY retrieval validation
- GROUP variable format validation
- BENCHMARKS variable validation
- CI_JOB_TOKEN validation
- Git clone error handling
Related to PR #5313 - Parallelize microbenchmarks and run them more times
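The subshell pitfall the commit describes can be demonstrated directly; both functions below are toy examples:

```shell
#!/bin/sh
# Old pattern: the parenthesised group runs in a subshell, so `exit 1` only
# terminates that subshell; without `set -e`, execution continues on the
# next line, and the error message goes to stdout.
demo_old() {
  false || (echo "Error: step failed" && exit 1)
  echo "still running"   # this line IS reached
}

# New pattern: explicit if, message on stderr, status propagated via return.
demo_new() {
  if ! false; then
    echo "Error: step failed" >&2
    return 1
  fi
  echo "still running"   # never reached
}

demo_old
demo_new || echo "caller sees the failure"
```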
What does this PR do?
- Raises REPETITIONS (in benchmarks/execution.yml) to reduce inter-run variability.
- Sets CPUS_PER_BENCHMARK on the benchmarks/execution.yml job.
Motivation:
https://datadoghq.atlassian.net/browse/APMSP-2544
Change log entry
None.
Additional Notes:
How to test the change?
Execution and reporting
Reducing flakiness
The effect of multiple repetitions and CPU isolation on result variability was tested and reported in this document: https://datadoghq.atlassian.net/wiki/x/egJ3cAE
25 out of ~45 scenarios were flaky before fixes, 0 are flaky after fixes.
These tests used 10 repetitions. While this PR introduces 6 repetitions, it should already bring the flakiness down.