Skip to content

perf(profiling): separate queue and waking thread#3611

Draft
morrisonlevi wants to merge 6 commits intomasterfrom
levi/messages
Draft

perf(profiling): separate queue and waking thread#3611
morrisonlevi wants to merge 6 commits intomasterfrom
levi/messages

Conversation

@morrisonlevi
Copy link
Collaborator

@morrisonlevi morrisonlevi commented Jan 30, 2026

Description

On master, we send every sample to another thread and attempt to wake that thread to process the sample. The problem is the syscall here is almost as expensive as collecting the sample if the other thread is asleep, which is often the case in NTS builds. Here's a profile showing this (but note it's not always this close):

Screenshot 2026-01-30 at 4 30 44 PM

This git branch avoids that. Instead it writes to a queue and samples are handled when that background thread wakes up for other reasons. Notably one of those reasons is every 10ms when wall-time is enabled (default) so probably this should work pretty well without filling the queues. The queues are larger on ZTS builds, though by a fixed multiplier (16).


Unfortunately, thus far I haven't been able to improve things. On my tests, I'm getting an average of 129.07% CPU vs the baseline version of the tracer having 127% CPU on the same benchmark. Using the ebpf full host profiler, the profiles show reduced time spent in prepare_and_send_message, from 191ms to 14ms. There's no obvious place that the CPU time shifted to. Here are some places I checked (the data is from one run, I did multiple runs in the past, this isn't an isolate result):

  • TimeCollector::run, down from 1.08s to 845ms.
  • collect_stack_sample, down from 260ms to 140ms.

The PR barely touches anything else.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

On every sample we send data to another thread and attempt to wake that
thread. The problem is the syscall here is almost as expensive as
collecting the sample if the other thread is asleep, which is often the
case.

This branch avoids that. Instead it writes to a queue and samples are
handled when that background thread wakes up for other reasons. Notably
one of those reasons is every 10ms when wall-time is enabled (default) so
_probably_ this should work pretty well without filling the queues.
@morrisonlevi morrisonlevi added cat:performance profiling Relates to the Continuous Profiler labels Jan 30, 2026
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jan 30, 2026

⚠️ Tests

Fix all issues with Cursor

⚠️ Warnings

🧪 1024 Tests failed

    testSearchPhpBinaries from integration.DDTrace\Tests\Integration\PHPInstallerTest (Fix with Cursor)

    testSimplePushAndProcess from laravel-58-test.DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest (Fix with Cursor)

testSimplePushAndProcess from laravel-8x-test.DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 698105fa000000003f67fb15bab07bb3
tid: 698105fa00000000
hexProcessTraceId: 3f67fb15bab07bb3
hexProcessSpanId: e83a07cc71a0ac0a
processTraceId: 4568896417735670707
processSpanId: 16733695940109577226
View all

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: a401846 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.20%. Comparing base (f7f9271) to head (a401846).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3611      +/-   ##
==========================================
- Coverage   62.24%   62.20%   -0.05%     
==========================================
  Files         141      141              
  Lines       13387    13387              
  Branches     1753     1753              
==========================================
- Hits         8333     8327       -6     
- Misses       4255     4263       +8     
+ Partials      799      797       -2     

see 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7f9271...a401846. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Under sustained load in ZTS, this could otherwise run forever.
@pr-commenter
Copy link

pr-commenter bot commented Jan 30, 2026

Benchmarks [ profiler ]

Benchmark execution time: 2026-02-02 20:19:12

Comparing candidate commit a401846 in PR branch levi/messages with baseline commit f7f9271 in branch master.

Found 10 performance improvements and 1 performance regressions! Performance is the same for 20 metrics, 5 unstable metrics.

scenario:php-profiler-exceptions-control

  • 🟥 cpu_user_time [+3.424ms; +7.658ms] or [+6.406%; +14.330%]

scenario:php-profiler-exceptions-with-profiler

  • 🟩 cpu_usage_percentage [-4.014%; -3.529%]

scenario:php-profiler-exceptions-with-profiler-and-timeline

  • 🟩 cpu_usage_percentage [-3.961%; -3.320%]

scenario:php-profiler-timeline-memory-with-profiler

  • 🟩 cpu_system_time [-230.609ms; -207.922ms] or [-79.231%; -71.437%]
  • 🟩 cpu_usage_percentage [-50.569%; -49.011%]
  • 🟩 cpu_user_time [-459.730ms; -417.027ms] or [-39.400%; -35.740%]
  • 🟩 execution_time [-190.703ms; -171.048ms] or [-20.381%; -18.281%]

scenario:php-profiler-timeline-memory-with-profiler-and-timeline

  • 🟩 cpu_system_time [-397.609ms; -368.814ms] or [-78.364%; -72.689%]
  • 🟩 cpu_usage_percentage [-57.514%; -56.134%]
  • 🟩 cpu_user_time [-738.219ms; -716.610ms] or [-46.735%; -45.367%]
  • 🟩 execution_time [-370.376ms; -359.978ms] or [-30.268%; -29.419%]

This reverts commit 07df26e.

This was just an experiment, but it turns out that yes, the run time
cache is valuable. We had not run it on doe before, since doe is a
newer tool.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:performance profiling Relates to the Continuous Profiler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants