
Pool memqueue producer response channels via sync.Pool#49337

Open
Copilot wants to merge 3 commits into main from copilot/reduce-memqueue-allocations

Conversation

Contributor

Copilot AI commented Mar 7, 2026

Summary

Pool and reuse chan queue.EntryID response channels in the memqueue producer with sync.Pool, matching the existing batchPool pattern in broker.go. This eliminates the per-publish channel allocation that dominated the allocation profile (~96% of alloc_space on BenchmarkProducerThroughput).

Changes

  • Add respChanPool (sync.Pool) producing make(chan queue.EntryID, 1)
  • makePushRequest acquires from pool via respChanPool.Get().(chan queue.EntryID)
  • publish and tryPublish return channels via defer respChanPool.Put(req.resp) — safe because handlePendingResponse fully drains the channel before returning

Benchmark

BenchmarkProducerThroughput  (before)  ~10,007 allocs/op  ~1,120,530 B/op
BenchmarkProducerThroughput  (after)        ~5 allocs/op       ~230 B/op

Test plan

  • All existing memqueue tests pass
  • All queue tests pass (memqueue + diskqueue)

🤖 Generated with Claude Code

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 7, 2026
@mergify
Contributor

mergify bot commented Mar 7, 2026

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @Copilot? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8.\d is the label to automatically backport to the 8.\d branch, where \d is the version digit.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

Copilot AI changed the title [WIP] Reduce memqueue producer allocations by pooling response channels Pool memqueue producer response channels via sync.Pool Mar 7, 2026
@strawgate strawgate marked this pull request as ready for review March 7, 2026 00:50
@strawgate strawgate requested a review from a team as a code owner March 7, 2026 00:50
@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Introduces a package-level sync.Pool that provides reusable buffered chan queue.EntryID channels. Producers (forgetfulProducer and ackProducer) obtain response channels from the pool instead of allocating per request. openState.publish and openState.tryPublish defer returning non-nil req.resp to the pool, relying on existing draining behavior. Channel lifecycle and reuse are centralized around the open-state publish paths. No public APIs or exported signatures were changed.

Pre-merge checks (2 passed)

  • Linked Issues check — ✅ Passed. All coding requirements from issue #49192 are met: sync.Pool pooling implemented, response channels reused across producers, defer-based lifecycle management in publish/tryPublish, and benchmarks confirm a 99.95% allocation reduction.
  • Out of Scope Changes check — ✅ Passed. All changes are scoped to the memqueue producer channel pooling objective. No unrelated modifications detected in the affected file.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libbeat/publisher/queue/memqueue/produce.go`:
- Around line 151-152: The publish function currently defers
putRespChan(req.resp) immediately, which can return the response channel to the
pool while the queue goroutine may still send to it (race between st.events <-
req and handlePendingResponse), causing a pooled channel to receive a stale
EntryID; remove the early defer in openState.publish and instead call
putRespChan(req.resp) only on the error/early-return paths where the request is
not enqueued (i.e., when you know the queue goroutine will not send), and let
the consumer of the request (the queue goroutine/handler) be responsible for
returning req.resp to the pool after it has sent or closed the response; update
paths that call handlePendingResponse and any return branches to avoid returning
the channel to the pool when st.events <- req succeeded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8403320f-eba1-45f2-99e6-674c31af592e

📥 Commits

Reviewing files that changed from the base of the PR and between 13108e8 and e98267e.

📒 Files selected for processing (1)
  • libbeat/publisher/queue/memqueue/produce.go

Comment thread libbeat/publisher/queue/memqueue/produce.go Outdated
@leehinman leehinman requested review from faec and removed request for leehinman March 9, 2026 14:33
@strawgate
Contributor

/test

@strawgate
Contributor

strawgate commented Mar 26, 2026

Profiling and benchmarking confirm that the allocation is gone. In our Filebeat perf benchmark of about 10M events, it decreases total allocations by a little over 1%. https://buildkite.com/elastic/filebeat-benchmark/builds/2282

In the baseline makePushRequest shows up as 2,576,698 allocations (1.05%) — in the PR version it's completely gone (0 flat allocs).

@strawgate strawgate enabled auto-merge (squash) March 26, 2026 22:01
@strawgate
Contributor

/test

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Mar 28, 2026
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 28, 2026
Comment thread libbeat/publisher/queue/memqueue/produce.go Outdated
Comment thread libbeat/publisher/queue/memqueue/produce.go
Comment thread libbeat/publisher/queue/memqueue/produce.go
Comment thread libbeat/publisher/queue/memqueue/produce.go Outdated
@strawgate strawgate force-pushed the copilot/reduce-memqueue-allocations branch from e0bb981 to c7de87a Compare April 9, 2026 15:03
strawgate added a commit to strawgate/beats that referenced this pull request Apr 9, 2026
PR elastic#49337

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mergify
Contributor

mergify bot commented Apr 10, 2026

This pull request is now in conflicts. Could you fix it? 🙏
To fix this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b copilot/reduce-memqueue-allocations upstream/copilot/reduce-memqueue-allocations
git merge upstream/main
git push upstream copilot/reduce-memqueue-allocations

Rebase onto current main (post queue-generics refactor #49954) and
adapt the respChanPool optimization to the new generic types.

No logic change: respChanPool.Get/Put wraps chan queue.EntryID exactly
as before; makePushRequest on both forgetfulProducer[T] and
ackProducer[T] now calls getRespChan() and publish/tryPublish defer
respChanPool.Put. Benchmarks confirm: 10009 → 7 allocs/op on
BenchmarkProducerThroughput (Apple M2 Pro, arm64).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@strawgate strawgate force-pushed the copilot/reduce-memqueue-allocations branch from c7de87a to c18400f Compare April 17, 2026 14:54
@strawgate
Copy link
Copy Markdown
Contributor

Rebased onto current main to resolve the merge conflict from the queue generics refactor (#49954). The pool optimization required adapting all type references to the new generic forms — forgetfulProducer[T], ackProducer[T], openState[T], pushRequest[T] — but the logic is identical. Re-ran BenchmarkProducerThroughput to confirm the win still holds on the rebased branch: 10,009 → 7 allocs/op (Apple M2 Pro, arm64).

@strawgate strawgate requested a review from faec April 17, 2026 15:00
Two tests and one benchmark addition:

- TestRespChanAlwaysEmptyOnAcquire: verifies the safety invariant that
  channels returned to the pool by a completed publish are always empty
  when subsequently acquired. A stale buffered EntryID would corrupt the
  return value of the next publish that reuses the channel.

- TestRespChanPoolNoAllocs: verifies 0 heap allocs per getRespChan/Put
  cycle after warmup, confirming channels are genuinely reused. GC is
  disabled during measurement (debug.SetGCPercent(-1)) to make the
  result deterministic.

- BenchmarkProducerThroughput: add b.ReportAllocs() to surface the
  alloc reduction in CI output. With pool: 6-7 allocs/op for 10,000
  events; without pool: ~10,009 allocs/op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@strawgate
Contributor

Added tests for the pool optimization:

  • TestRespChanAlwaysEmptyOnAcquire — correctness invariant: a channel returned to the pool by a completed publish must always be empty on next acquisition. A stale buffered EntryID would silently corrupt the return value of the next publish reusing that channel.

  • TestRespChanPoolNoAllocs — directly proves pool reuse: getRespChan() + pool.Put produces 0 heap allocs after warmup. GC is disabled during measurement (debug.SetGCPercent(-1)) to make the result deterministic. Without the pool this would be 1 alloc per call.

  • BenchmarkProducerThroughput — added b.ReportAllocs(). CI will now surface the alloc numbers: 6–7 allocs/op for a full 10,000-event batch vs. ~10,009 allocs/op on main.

The previous two tests were testing pool mechanics in isolation
(raw Get/Put and empty-channel checks) rather than the actual
publish path. TestRespChanAlwaysEmptyOnAcquire in particular was
unreliable: getRespChan() after a publish might return a freshly
allocated channel via New rather than a recycled one, making it
trivially pass even with a broken pool.

Replace both with TestPublishPoolNoAllocsInSteadyState, which
runs testing.AllocsPerRun on the real p.Publish path (with a live
queue and background consumer). This directly validates the PR's
claim: with the pool warm and GC disabled, Publish allocates 0
channels per op. Without the pool, this would be 1 alloc/op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

TL;DR

golangci-lint failed in all matrix jobs (ubuntu, macos, windows) on the same two testifylint findings in memqueue tests. Update those two assertions, then re-run CI.

Remediation

  • In libbeat/publisher/queue/memqueue/produce_pool_test.go:95, replace the float direct-compare assertion with assert.InEpsilon (or assert.InDelta) to satisfy testifylint float-compare.
  • In libbeat/publisher/queue/memqueue/queue_test.go:316, fix the require.Nilf format call: the message has %v but no arg. Add the missing index argument (or remove %v).
  • Push the fix commit and re-run the PR checks (golangci-lint workflow).
Investigation details

Root Cause

The golangci-lint step fails due to two deterministic testifylint violations in test code; this is not platform-specific, which is why all three OS jobs fail identically.

Evidence

  • Workflow: https://github.com/elastic/beats/actions/runs/24572208781
  • Job/step: lint (ubuntu-latest) → golangci-lint (the same errors also appear in lint (macos-latest) and lint (windows-latest))
  • Key log excerpt:
    • libbeat/publisher/queue/memqueue/produce_pool_test.go:95:2: float-compare: use assert.InEpsilon (or InDelta) (testifylint)
    • libbeat/publisher/queue/memqueue/queue_test.go:316:3: formatter: require.Nilf format %v reads arg #1, but call has 0 args (testifylint)

Validation

  • Not run locally in this workflow context (read-only detective run).

Follow-up

  • After the two assertion fixes, re-trigger CI; if lint still fails, capture the updated golangci-lint output and compare against this report.

From workflow: PR Actions Detective



Labels

skip-changelog · Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants