
perf: pool FetchResponse objects to reduce consumer GC pressure #581

Closed
thomhurst wants to merge 5 commits into main from perf/pool-fetch-response-objects

Conversation

@thomhurst
Owner

Summary

  • Pool FetchResponse, FetchResponseTopic, and FetchResponsePartition using ConcurrentStack<T>-based pools (matching the existing RecordBatch pooling pattern), eliminating per-fetch-cycle allocations of these envelope objects
  • Cascade return-to-pool from FetchResponse.ReturnToPool() through topics and partitions, clearing all fields to prevent reference leaks while preserving Records ownership transfer to PendingFetchData
  • Add unit tests verifying pool reuse, field clearing, cascading return, overflow handling, and concurrent thread-safety

Pool sizes: 64 for FetchResponse, 256 for FetchResponseTopic, 1024 for FetchResponsePartition -- scaled to typical consumer cardinality per fetch cycle.
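The pooling pattern described above can be sketched as follows. This is a minimal illustration of the ConcurrentStack<T> + soft-limit-counter design, not the PR's actual code — the type name PooledResponse and the size constant are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative sketch of the ConcurrentStack<T>-based pool pattern.
// Names (PooledResponse, MaxPoolSize) are placeholders, not the PR's types.
public sealed class PooledResponse
{
    private const int MaxPoolSize = 64;
    private static readonly ConcurrentStack<PooledResponse> s_pool = new();
    private static int s_poolCount; // O(1) counter; ConcurrentStack<T>.Count is O(n)

    public static PooledResponse RentFromPool()
    {
        if (s_pool.TryPop(out var pooled))
        {
            Interlocked.Decrement(ref s_poolCount);
            return pooled;
        }
        return new PooledResponse(); // pool empty: fall back to allocation
    }

    public void ReturnToPool()
    {
        // Clear instance fields here to avoid leaking references, then push back.
        // Soft limit: intentionally non-atomic check-then-act; a small overshoot
        // under contention is acceptable and cheaper than locking.
        if (Volatile.Read(ref s_poolCount) < MaxPoolSize)
        {
            s_pool.Push(this);
            Interlocked.Increment(ref s_poolCount);
        }
        // Over the cap: drop the object and let the GC collect it.
    }
}
```

On a single thread, a rent after a return pops the same instance back off the stack, which is what eliminates the per-fetch-cycle allocations.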

Test plan

  • Unit tests pass: FetchResponsePoolingTests (10 tests covering rent/return, field clearing, cascading, overflow, concurrency)
  • Integration tests pass with Docker (consumer fetch cycles reuse pooled objects correctly)
  • Verify no regression in existing protocol encoding tests

@claude bot left a comment

PR Review: Pool FetchResponse Objects to Reduce Consumer GC Pressure

Overall, the approach is sound and follows the established RecordBatch pooling pattern. The cascade return-to-pool design and the use of ConcurrentStack<T> + soft-limit counter are consistent with the existing codebase. However, there are several correctness and safety issues that should be addressed before merging.


Critical: No Exception Safety — Pool Objects Can Leak or Return Early

Both PrefetchFromBrokerAsync and FetchFromBrokerAsync call response.ReturnToPool() inline, with no try/finally guard. If any exception is thrown between the response = assignment and the ReturnToPool() call (e.g., from _prefetchChannel.Writer.WriteAsync, UpdateWatermarksFromFetchResponse, or channel backpressure OperationCanceledException), the pooled object is never returned to the pool. This is a resource leak on exception paths.

The fix is to wrap the body in try/finally:

// Correct pattern:
try
{
    // ... process response ...
}
finally
{
    response.ReturnToPool();
}

PrefetchFromBrokerWithErrorHandlingAsync (line 927) catches all exceptions except OperationCanceledException when wakeup-requested, so the leak is real and not just theoretical.


Correctness: Responses = [] in ReturnToPool() Allocates on Every Return

In FetchResponse.ReturnToPool() (line 78) and FetchResponseTopic.ReturnToPool() (line 175), the assignments Responses = [] and Partitions = [] are not guaranteed to lower to the cached Array.Empty<T>() singleton: a collection expression [] targeting IReadOnlyList<T> may emit a new empty array instance rather than the singleton, which would allocate on every return.

Check whether this is Array.Empty<T>() or new T[0] at the IL level. If it is not the singleton, replace with the explicit singleton:

Responses = Array.Empty<FetchResponseTopic>();
Partitions = Array.Empty<FetchResponsePartition>();

The field initializers (= []) on lines 41 and 144 have the same concern for the initial object construction path.


Concern: ReturnToPool() Called Before Channel Write Completes

In PrefetchFromBrokerAsync, response.ReturnToPool() is called at line 1095, but the channel writes happen after this (lines 1107–1109). Because Records ownership was transferred to PendingFetchData beforehand, this is safe for the records themselves. However, the AbortedTransactions list from partitionResponse.AbortedTransactions is also transferred directly to PendingFetchData (line 1063), and ReturnToPool() sets AbortedTransactions = null on the partition objects before those PendingFetchData items are written to the channel. This is safe only if PendingFetchData captures the reference at construction time and never reads back through the partition; verify that this is the case.

This appears to be correct based on the constructor call at line 1059-1063, but it is a subtle invariant that should be documented with a comment in FetchResponsePartition.ReturnToPool().
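The invariant at stake can be illustrated with a small sketch (PendingData and PartitionEnvelope are hypothetical stand-ins for PendingFetchData and FetchResponsePartition): the outliving object must capture the list reference before the pooled object is cleared.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the ownership-transfer invariant discussed above.
// PendingData / PartitionEnvelope are illustrative, not the PR's types.
public sealed class PendingData
{
    public IReadOnlyList<long>? AbortedTransactions { get; }

    // Capture the reference at construction time — after this point the
    // source partition may be cleared and returned to the pool safely.
    public PendingData(IReadOnlyList<long>? aborted) => AbortedTransactions = aborted;
}

public sealed class PartitionEnvelope
{
    public List<long>? AbortedTransactions { get; set; }

    public void ReturnToPool()
    {
        // Safe only because any consumer (PendingData) already holds the list
        // reference directly; nulling the field here does not affect it.
        AbortedTransactions = null;
    }
}
```

If a future refactor moved ReturnToPool() before the PendingData construction, the captured reference would be null — which is exactly why the invariant deserves a comment.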


Design: Breaking init API Contract Without a Clear Replacement Guarantee

All init properties on FetchResponse, FetchResponseTopic, and FetchResponsePartition are changed to internal set. The init modifier previously provided a compiler-enforced guarantee that these objects were immutable after construction, making them safe to inspect at any point. The internal set replacement is weaker:

  • The objects are now mutable at any point from within the assembly (not just during Read()).
  • Any internal consumer code that holds a FetchResponseTopic or FetchResponsePartition reference for longer than the fetch cycle (e.g., caching a reference for logging) will observe silently cleared/mutated fields after ReturnToPool().

Consider documenting the expected lifetime contract clearly, or use a marker (e.g., a bool _returned guard in debug builds) to detect use-after-return.
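One way to realize the suggested debug-build guard is sketched below (the _returned flag, OnRented hook, and class name are hypothetical, not the PR's code):

```csharp
using System;
using System.Diagnostics;

// Sketch of a debug-mode use-after-return guard as suggested above.
// Debug.Assert compiles away in release builds; in debug builds a stale
// reference trips the assert at the first property access.
public sealed class PooledEnvelope
{
    private bool _returned;
    private string[] _items = Array.Empty<string>();

    public string[] Items
    {
        get
        {
            Debug.Assert(!_returned, "Use-after-return: object was given back to the pool.");
            return _items;
        }
        internal set => _items = value;
    }

    public void ReturnToPool()
    {
        _items = Array.Empty<string>(); // clear to avoid reference leaks
        _returned = true;
        // ... push to the shared pool here ...
    }

    internal void OnRented() => _returned = false; // reset when re-rented
}
```

This keeps the release-build hot path free of checks while making lifetime violations loud during development.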


Design: FetchResponsePartition Pool Max Size (1024) May Be Misleading

The comment says "typically 1-50" partitions, but the pool size is set to 1024. The existing RecordBatch pool uses 2048 but the pool holds batches, not partitions. A consumer with 50 partitions fetching from 2 brokers concurrently needs at most 100 pooled partitions. A pool of 1024 wastes ~80KB of GC roots for no benefit. Consider aligning the pool size with the realistic cardinality, or at minimum document the reasoning for the 1024 cap.


Minor: s_poolCount Tracking Is Cosmetic for ConcurrentStack<T>

The pool tracks its size in a separate s_poolCount field (read via Volatile.Read) instead of using ConcurrentStack<T>.Count, which is O(n). Tracking the count separately with Interlocked is the correct approach and matches the RecordBatch pattern — this is fine. The comment "Soft limit: intentionally non-atomic check-then-act" correctly documents why a TOCTOU race is acceptable here. No action needed.


Minor: [AggressiveInlining] on RentFromPool() Is Questionable

RentFromPool() contains a branch (TryPop success vs. new()). [AggressiveInlining] for a two-branch method that invokes ConcurrentStack.TryPop (which is not trivially inlinable itself) is unlikely to provide measurable benefit and inflates the caller's code size. This matches the RecordBatch pattern, so it's consistent, but both could be revisited.


Tests: Good Coverage, One Gap

The 10 unit tests cover the key scenarios well (rent/return, field clearing, cascading, overflow, concurrency). One gap: there is no test verifying behavior when ReturnToPool() is called twice on the same object (double-return). A re-rented object's fields look cleared but the object could appear twice in the pool stack if the caller has a bug. A debug-mode ObjectDisposedException or similar guard would make this detectable.


Summary

Issue | Severity
No try/finally around ReturnToPool() — pool objects leaked on exception | High
Responses = [] / Partitions = [] may allocate on every return | Medium
AbortedTransactions lifetime invariant not documented | Low
init → internal set contract weakening not documented | Low
Pool size 1024 for FetchResponsePartition oversized | Low
No double-return detection in tests | Low

The try/finally issue is the most important to fix before merging. The PR achieves meaningful GC pressure reduction on the consumer path and the core design is correct.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

@claude bot left a comment

Follow-up Review: All Previous Concerns Addressed

The author has responded to my earlier review with three follow-up commits. This is an assessment of the final state of the PR at HEAD (6c3bbcbf).


Resolved: All High/Medium Issues from Previous Review

try/finally around ReturnToPool() — Fixed. Both PrefetchFromBrokerAsync and FetchFromBrokerAsync now wrap the response-processing loop in try/finally, ensuring response.ReturnToPool() is called even when exceptions occur. This is the correct pattern.

Array.Empty() field initializers and ReturnToPool() cleanup — Fixed. The field initializers and all ReturnToPool() clearing assignments now use Array.Empty<FetchResponseTopic>() / Array.Empty<FetchResponsePartition>() explicitly, guaranteeing zero allocation via the singleton.

Double-return guard — Fixed. The _pooled flag prevents duplicate pool entries. ReturnToPool() silently no-ops on a second call, and the property guards detect use-after-return.

Use-after-return detection — Added. Responses, Partitions, and Records properties now throw ObjectDisposedException when _pooled == true. This turns silent data corruption into a loud, immediately detectable error — the right tradeoff for a pooled object.

Flaky IsSameReferenceAs test assertions — Fixed. The three ReturnAndRent_* tests no longer assert object identity; they only assert field clearing, which is the actual behavioral contract. This is correct per the CLAUDE.md guidance: pool identity is non-deterministic under parallel test execution.


Remaining Observations (Non-Blocking)

_pooled is a plain bool, not volatile

_pooled is read and written from ReturnToPool() and the property getters without any memory fence. In the normal consumer usage pattern this is single-threaded (one fetch loop per broker), so this is not a practical correctness issue. However, if a future consumer implementation ever shares a FetchResponsePartition reference across threads, the non-volatile flag could be missed. This is consistent with the RecordBatch pattern in the codebase (which also uses a plain bool for similar guards), so this is acceptable as-is.

ReturnToPool() iterates _responses (the backing field) rather than going through the guarded property

This is correct. If ReturnToPool() read through the Responses property it would throw ObjectDisposedException if called twice. Using _responses directly is the right choice.

FetchResponse_ReturnToPool_ClearsNestedTopicsAndPartitions test has a subtle pool-sharing assumption

The test rents a FetchResponse, FetchResponseTopic, and FetchResponsePartition, returns them via response.ReturnToPool(), then immediately re-rents three objects and asserts their fields are cleared. Under parallel test execution, the three re-rented objects are not guaranteed to be the same instances that were just returned — another test thread could drain one or more slots from the pool. The field-clearing assertions are correct regardless of object identity, but the test implicitly assumes it gets the same three objects back. If a different (freshly constructed) object is returned on re-rent, the field assertions still pass trivially (a new object always has cleared fields). The test does not actually prove that the cascading return clears fields on the pooled instances. To make it airtight, capture references to the inner objects before returning and assert on those specific references — but since the use-after-return ObjectDisposedException tests already cover the cascading behavior, this is low priority.


Summary

The implementation is correct, safe, and well-tested. The core design is sound: the ConcurrentStack<T> + soft-limit counter + _pooled guard + try/finally return pattern matches the established RecordBatch pooling approach. The ObjectDisposedException use-after-return guards are a good addition that makes lifecycle violations visible immediately.

The remaining items above are all non-blocking. This PR is ready to merge.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

@claude bot left a comment

Follow-Up Review: Pool FetchResponse Objects (Updated PR)

The three follow-up commits address all the critical and medium-severity issues from the previous review. The PR is substantially improved. Here is an updated assessment of what remains.


What Was Fixed (Good)

  • try/finally around ReturnToPool() — Both PrefetchFromBrokerAsync and FetchFromBrokerAsync now use try/finally to guarantee pool return on exception paths. This was the most important fix.
  • Array.Empty<T>() instead of [] — The field initializers and ReturnToPool() reset paths now use the explicit singleton, eliminating the allocation-per-return concern.
  • ObjectDisposedException guard on Responses, Partitions, and Records — Use-after-return is now detectable at runtime, not silent data corruption.
  • Double-return guard via _pooled flag — Idempotent ReturnToPool() prevents pool corruption from accidental double-return.
  • Tests for use-after-return and double-return — Coverage is now comprehensive.

Remaining Concern: _pooled Flag Has a Non-Atomic Read-Then-Use Race

The _pooled field is a plain bool, not volatile, not protected by any synchronization. The guard in the Responses / Partitions / Records getters reads _pooled without a memory barrier:

get
{
    if (_pooled) throw new ObjectDisposedException(nameof(FetchResponse));
    return _responses;
}

Meanwhile, ReturnToPool() sets _pooled = true and then clears _responses without any barrier either:

_pooled = true;

if (Volatile.Read(ref s_poolCount) < MaxPoolSize)
{
    s_pool.Push(this);
    ...
}

If two threads race — one reading Responses while another calls ReturnToPool() — the getter can observe _pooled == false (stale read due to lack of volatile), bypass the guard, and then read a partially cleared _responses. This is particularly relevant since PrefetchFromBrokerAsync runs on a background thread and ReturnToPool() is in the finally of the same method, but if a consumer thread also holds a reference (e.g., from logging or error handling), the race window exists.

The fix is to mark _pooled as volatile:

private volatile bool _pooled;

The RecordBatch pool uses the same pattern — if that field is also non-volatile, both should be fixed together.


Remaining Concern: Double-Return Test Is Now a No-Op

Commit 6c3bbcbf0d removed the IsSameReferenceAs assertions from the double-return tests with the stated reason "flaky". The resulting tests are:

response.ReturnToPool();
response.ReturnToPool();   // <-- idempotent guard prevents double-push

var first = FetchResponse.RentFromPool();
var second = FetchResponse.RentFromPool();

if (ReferenceEquals(first, response) && ReferenceEquals(second, response))
{
    Assert.Fail("...");
}

first.ReturnToPool();
second.ReturnToPool();

This test can never fail, even if the double-return guard is removed. The pool is a shared static ConcurrentStack. When first and second are rented after a double-return of response, there may be other objects in the pool from other tests (since this is a static pool with no reset between tests). The condition ReferenceEquals(first, response) && ReferenceEquals(second, response) requires the same instance to be popped twice consecutively, which is impossible unless the stack is corrupted — but the stack being corrupted by double-push would produce the same object at two different stack positions, not necessarily at the top two positions.

The original intent (verify idempotency) is sound, but the test as written does not actually verify what it claims. The flakiness in the original version stemmed from the shared pool state. A correct, deterministic test would isolate pool state or verify the _pooled flag directly:

var response = FetchResponse.RentFromPool();
response.ReturnToPool();

// Second return should be silently ignored — no exception, _pooled guard fires
response.ReturnToPool(); // Should not throw

// Verify the object is not rented out twice by checking pool count via a known drain

Consider either reverting to a meaningful test or documenting that the double-return guard is tested via the ObjectDisposedException use-after-return tests (which indirectly cover it).


Remaining Minor: AbortedTransactions Lifetime Invariant Still Undocumented

FetchResponsePartition.ReturnToPool() sets AbortedTransactions = null, but PendingFetchData was constructed with the reference already captured before ReturnToPool() is called. This is safe because PendingFetchData stores the reference directly — but this invariant is not documented anywhere. A future refactor that changes when ReturnToPool() is called (e.g., moving it before PendingFetchData construction) would silently introduce a null dereference.

A comment in ReturnToPool() like // Safe: AbortedTransactions reference already captured by PendingFetchData before this point would make the invariant explicit and reviewable.


Remaining Minor: Pool Size 1024 for FetchResponsePartition Is Unjustified

The comment still says "typically 1-50" partitions while the pool size is 1024. A consumer with 50 partitions across 2 brokers needs at most ~100 pooled partitions. 1024 means ~900 objects are perpetually allocated but never used. This is GC roots that work against the goal of reducing GC pressure. Either reduce the cap to ~256 (matching FetchResponseTopic's 256) or add a comment explaining the reasoning for the larger cap.


Summary

Issue | Severity | Status
try/finally around ReturnToPool() | High | Fixed
Array.Empty<T>() allocation on return | Medium | Fixed
_pooled flag not volatile — potential race | Medium | Open
Double-return test is a no-op after commit 6c3bbcbf | Medium | Open
AbortedTransactions lifetime invariant undocumented | Low | Open
FetchResponsePartition pool size 1024 unjustified | Low | Open

The core design is correct and the critical exception-safety issue is resolved. The volatile concern and the test no-op are the items worth addressing before merging.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

…ion objects

Reduce GC pressure in the consumer fetch cycle by pooling the response
envelope objects instead of allocating new ones each cycle.

…Array.Empty, double-return guard

- Wrap response processing in try/finally in both prefetch and non-prefetch
  paths in KafkaConsumer.cs to ensure ReturnToPool() is always called even
  if an exception occurs during response iteration.
- Replace [] with Array.Empty<T>() in FetchResponse.cs field initializers
  and ReturnToPool() methods to guarantee zero allocation.
- Add _pooled flag to FetchResponse, FetchResponseTopic, and
  FetchResponsePartition to guard against double-return corrupting the pool.
- Add double-return detection tests for all three pooled types.

Convert Responses, Partitions, and Records properties from auto-properties
to backing fields with a guard that throws ObjectDisposedException when
accessed after ReturnToPool(). This prevents silent use of zeroed/nulled
fields when code holds a stale reference to a pooled object.

The _pooled flag (already present for double-return detection) is reused
for the guard check. ReturnToPool() methods now use backing fields directly
to avoid triggering the guard during cleanup.

Pool identity assertions are non-deterministic under parallel test
execution — another test can drain the pool between ReturnToPool()
and RentFromPool(). Assert field clearing instead, which is the
actual contract being tested.
@thomhurst force-pushed the perf/pool-fetch-response-objects branch from 6c3bbcb to 10a264f on March 25, 2026 14:12

@claude bot left a comment

Follow-Up Review (Round 4): Status of Outstanding Issues

This review checks the current diff against the open items from the previous round.


Open: _pooled Flag Is Still Not volatile

The _pooled field across all three classes is still declared as private bool _pooled, without volatile. This was raised as a medium-severity issue in the previous review and has not been addressed.

The concern: ReturnToPool() sets _pooled = true and then clears backing fields without a memory fence. The Responses/Partitions/Records property getters read _pooled without a fence. In the normal single-threaded per-broker fetch loop this is fine in practice, but the absence of volatile is a correctness hazard that grows as the codebase evolves.

The fix is a one-line change per class:

private volatile bool _pooled;

This is the same fix needed in RecordBatch if that class uses the same non-volatile pattern. The cost is negligible (one extra memory fence on pool return, which is off the hot path), and the benefit is a documented, correct happens-before relationship between the writer and any reader on a different thread.


Open: Double-Return Test Is Still a No-Op

The FetchResponse_DoubleReturn_DoesNotDuplicateInPool test (and equivalents for Topic/Partition) has this structure:

// Drain the pool first to get a clean state
while (FetchResponse.RentFromPool() is not null)
{
    // RentFromPool always returns non-null (creates new if pool empty), so just rent a few
    break;  // ← immediately breaks, drains nothing
}

var response = FetchResponse.RentFromPool();
response.ReturnToPool();
response.ReturnToPool(); // second call is a no-op due to _pooled guard

var first = FetchResponse.RentFromPool();
var second = FetchResponse.RentFromPool();

if (ReferenceEquals(first, response) && ReferenceEquals(second, response))
{
    Assert.Fail("...");
}

Two problems:

  1. The "drain" loop immediately breaks and drains nothing, so the comment is misleading.
  2. The _pooled guard prevents the second ReturnToPool() from pushing to the stack at all. The Assert.Fail condition (first === response AND second === response) is therefore structurally impossible — response can appear at most once in the pool, so second can never be the same object as response if first already was. The test cannot fail regardless of whether the guard works.

The test as written does not verify the double-return guard. What it does demonstrate is that calling ReturnToPool() twice does not throw. That behavior is already covered by the use-after-return tests (the second ReturnToPool() will trigger the _pooled early-return path silently).

To actually test the guard, check the _pooled field state directly (requires making it accessible to tests, e.g., via internal + InternalsVisibleTo), or verify via the ObjectDisposedException behavior: after the first return, accessing a guarded property should throw; after the second return, accessing that same property should still throw (not reset to accessible). The latter is already covered by the existing use-after-return tests.

The cleanest fix is to either remove the duplicate tests (since the coverage they intended is already provided by the use-after-return tests) or replace them with a meaningful assertion:

var response = FetchResponse.RentFromPool();
response.ReturnToPool();

// Second return should not throw
response.ReturnToPool();

// The object must still be considered "returned" (use-after-return guard still active)
await Assert.That(() => response.Responses).Throws<ObjectDisposedException>();

Open: AbortedTransactions Lifetime Invariant Undocumented

FetchResponsePartition.ReturnToPool() sets AbortedTransactions = null while the caller has already transferred this reference to PendingFetchData. The invariant (safe because PendingFetchData captures the reference at construction, before ReturnToPool() is called) is still undocumented. A single comment in ReturnToPool() is sufficient:

// Safe: AbortedTransactions reference already captured by PendingFetchData before ReturnToPool() is called.
AbortedTransactions = null;

Open: Pool Size 1024 Comment Is Internally Inconsistent

The comment on FetchResponsePartition still reads "typically 1-50" with a MaxPoolSize = 1024. With "typically 1-50" partitions per broker and "typically 1-2" concurrent brokers, the steady-state working set is 50–100 objects. A pool of 1024 holds ~900 idle objects permanently on the GC heap. Either reduce the cap to 128 or 256 (matching FetchResponseTopic), or replace the comment with justification for the larger cap.


Summary

Issue | Severity | Status
try/finally around ReturnToPool() | High | Fixed
Array.Empty<T>() allocation on return | Medium | Fixed
ObjectDisposedException use-after-return guards | Medium | Fixed
_pooled flag not volatile | Medium | Still open
Double-return tests are no-ops | Medium | Still open (test logic flawed)
AbortedTransactions lifetime invariant undocumented | Low | Still open
FetchResponsePartition pool size 1024 unjustified | Low | Still open

The implementation is functionally correct and safe. The volatile issue and the test no-op are the items most worth fixing before merging, as the test currently provides false confidence in the double-return guard.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

Tests share a global ConcurrentStack pool, so parallel execution can
rent objects returned by other tests, resetting their _pooled flag
and causing use-after-return guard assertions to fail. Added
[NotInParallel] to serialize these tests.
@claude bot left a comment

Follow-Up Review (Round 5): Current State at HEAD

The two most recent commits (10a264f0, 5d5168c2) address test reliability only — removing flaky reference-identity assertions and adding [NotInParallel] to prevent pool state bleed between tests. These are good, necessary fixes. However, the four open issues from Round 4 are still unaddressed.


Still Open: _pooled Flag Is Not volatile

All three classes declare:

private bool _pooled;

The RecordBatch class in the same codebase uses private volatile bool _disposed for its equivalent guard. Without volatile, there is no guaranteed memory ordering between the thread that calls ReturnToPool() (which sets _pooled = true and then clears fields) and any thread that calls a property getter (which reads _pooled to check for use-after-return). In the current consumer loop, all accesses happen on the same thread, so this is not an active bug today — but it is a correctness hazard as the codebase evolves, and it deviates from the established pattern in RecordBatch.

Fix (one line per class):

private volatile bool _pooled;

Still Open: Double-Return Tests Are No-Ops

The FetchResponse_DoubleReturn_DoesNotDuplicateInPool test and its Topic/Partition equivalents do not exercise the guard they claim to test. The _pooled guard prevents the second ReturnToPool() call from pushing to the stack, so the Assert.Fail condition (first === response && second === response) is structurally impossible — response can appear in the stack at most once. The test cannot fail regardless of whether the guard is present or absent.

The simplest fix that actually validates the guard:

[Test]
public async Task FetchResponse_DoubleReturn_ObjectStillConsideredReturned()
{
    var response = FetchResponse.RentFromPool();
    response.ReturnToPool();

    // Second return should not throw
    response.ReturnToPool();

    // Object must still be in the "returned" state — the guard should not reset on the second call
    await Assert.That(() => response.Responses).Throws<ObjectDisposedException>();
}

This replaces the current meaningless reference-identity check with a meaningful behavioral assertion: calling ReturnToPool() twice leaves the object in the "disposed" state, not in a halfway-reset state.


Still Open: AbortedTransactions Lifetime Invariant Is Undocumented

FetchResponsePartition.ReturnToPool() nulls out AbortedTransactions after the caller has already transferred that reference to PendingFetchData. This is safe because the transfer happens before ReturnToPool() is called, but the ReturnToPool() method has no comment explaining why it is safe to null this field despite it being observable from PendingFetchData.

A future maintainer reading ReturnToPool() in isolation would see a live reference being zeroed and might reasonably worry about a use-after-free. A single comment resolves this:

// Safe: AbortedTransactions reference already captured by PendingFetchData before ReturnToPool() is called.
AbortedTransactions = null;

Still Open: Pool Size 1024 Is Inconsistent With "Typically 1–50" Comment

FetchResponsePartition has MaxPoolSize = 1024 with the comment "typically 1-50" partitions per fetch cycle. With 1–2 concurrent brokers that is a steady-state working set of 50–100 objects. The pool will permanently hold ~900 idle objects on the GC heap. Either:

  • Reduce to 128 or 256 (matching FetchResponseTopic), or
  • Update the comment with the justification for 1024 (e.g., burst tolerance, pre-warming intent).

As written, the comment argues against the cap value, which creates confusion.


Summary

Issue | Severity | Status
try/finally around ReturnToPool() | High | Fixed ✓
Array.Empty<T>() allocation on return | Medium | Fixed ✓
ObjectDisposedException use-after-return guards | Medium | Fixed ✓
Flaky reference-identity assertions in tests | Medium | Fixed ✓
Parallel test pool interference | Medium | Fixed ✓
_pooled flag not volatile | Medium | Still open
Double-return tests are no-ops | Medium | Still open
AbortedTransactions lifetime invariant undocumented | Low | Still open
FetchResponsePartition pool size 1024 unjustified | Low | Still open

The implementation is functionally correct. The volatile issue and the no-op double-return tests are the most important remaining items — the former for correctness consistency with RecordBatch, the latter because the tests currently provide false confidence in the double-return guard.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

@thomhurst
Owner Author

Closing: this PR's core pooling work was superseded by #591 (merged to main). The beneficial additions (use-after-return guards, pooling tests) will be recreated in a new PR on top of the current ObjectPool implementation.

@thomhurst closed this on Mar 26, 2026
thomhurst added a commit that referenced this pull request Mar 26, 2026
Add volatile _pooled flag and ObjectDisposedException guards to
FetchResponse, FetchResponseTopic, and FetchResponsePartition. This
detects use-after-return bugs at runtime instead of allowing silent
data corruption when a pooled object is accessed after being returned.

Also adds double-return idempotency (second ReturnToPool is a no-op)
and comprehensive pooling tests covering field clearing, cascading
return, thread safety, and use-after-return detection.

Salvaged from #581 which was superseded by #591.
thomhurst added a commit that referenced this pull request Mar 26, 2026
* fix: add use-after-return guards to pooled FetchResponse objects

Add volatile _pooled flag and ObjectDisposedException guards to
FetchResponse, FetchResponseTopic, and FetchResponsePartition. This
detects use-after-return bugs at runtime instead of allowing silent
data corruption when a pooled object is accessed after being returned.

Also adds double-return idempotency (second ReturnToPool is a no-op)
and comprehensive pooling tests covering field clearing, cascading
return, thread safety, and use-after-return detection.

Salvaged from #581 which was superseded by #591.

* fix: address review — atomic pool guard, guard AbortedTransactions

- Replace volatile bool with int + Interlocked.CompareExchange in
  ReturnToPool() on all three classes, eliminating the TOCTOU race
  where two concurrent callers could both pass the check and double-
  return the same object to the pool.

- Add use-after-return guard to AbortedTransactions on
  FetchResponsePartition (was inconsistently unguarded while Records
  was guarded). Uses same backing field + ObjectDisposedException
  pattern.

- Add test for AbortedTransactions guard.

* fix: address review feedback — cache Records reference, strengthen tests

- Cache partitionResponse.Records in a local variable in both consumer
  fetch paths to avoid repeated Volatile.Read from the pool guard
- Add IsSameReferenceAs assertions to ReturnAndRent tests to prove
  Reset() actually clears the dirty instance (not a fresh default)
- Use non-pooled FetchResponsePartition in topic test to avoid leaking
  a rented partition into the pool via cascading ReturnToPool
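The atomic double-return guard described in the commit message above (int + Interlocked.CompareExchange replacing the volatile bool) can be sketched like this — GuardedEnvelope and OnRented are illustrative names, not the merged code:

```csharp
using System;
using System.Threading;

// Sketch of the int + Interlocked.CompareExchange double-return guard
// described in the commit above. Names are placeholders.
public sealed class GuardedEnvelope
{
    private int _pooled; // 0 = in use, 1 = returned to pool

    public bool ReturnToPool()
    {
        // Only the single caller that wins the 0 -> 1 transition performs the
        // push; concurrent or repeated callers observe 1 and no-op, so the
        // object can never be pushed to the pool twice (no TOCTOU window).
        if (Interlocked.CompareExchange(ref _pooled, 1, 0) != 0)
            return false; // already returned — idempotent no-op

        // ... clear fields and push to the shared pool here ...
        return true;
    }

    internal void OnRented() => Volatile.Write(ref _pooled, 0); // reset on re-rent
}
```

Unlike a plain (or even volatile) bool check-then-set, the compare-exchange makes the guard itself atomic, which is what closes the race between two concurrent returners.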