Skip to content

Extract pool blocking-period error state into BlockingPeriodErrorState#4395

Open
mdaigle wants to merge 3 commits into
mainfrom
dev/mdaigle/pool-blocking-period-refactor
Open

Extract pool blocking-period error state into BlockingPeriodErrorState#4395
mdaigle wants to merge 3 commits into
mainfrom
dev/mdaigle/pool-blocking-period-refactor

Conversation

@mdaigle

@mdaigle mdaigle commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

This is Part 1 of 3 splitting #4376 ("Dev/mdaigle/pool rate limit") into stacked, reviewable PRs.

This PR extracts the connection pool's blocking-period (error backoff) logic out of WaitHandleDbConnectionPool into a reusable, testable BlockingPeriodErrorState class. This keeps the pool's connection-acquisition path focused on capacity/queue concerns. Part 2 will use the new class to implement blocking period support in the ChannelDbConnectionPool.

Changes

  • New BlockingPeriodErrorState — encapsulates cached-exception fast-fail, exponential backoff (5s → 60s cap), exit timer, and synchronization. Takes an injectable TimeProvider so timer scheduling is deterministic in tests.
  • WaitHandleDbConnectionPool — refactored to use BlockingPeriodErrorState instead of inline error-state fields/logic.
  • DbConnectionPoolGroup.IsBlockingPeriodEnabled() — new helper centralizing the PoolBlockingPeriod decision (Auto → not an Azure SQL endpoint, AlwaysBlock → true, NeverBlock → false). Consumed by Part 2.
  • ADP.UnsafeCreateTimer(TimeProvider, ...) — new overload returning ITimer that suppresses ExecutionContext flow while honoring the injected TimeProvider; required by BlockingPeriodErrorState.
  • BlockingPeriodErrorStateTest — unit tests using FakeTimeProvider (29 tests).

Stacking

  • Part 1 (this PR) → main
  • Part 2 (channel-pool rate limiting) → this branch
  • Part 3 (GitHub instructions) → Part 2

Checklist

  • Tests added (29 passing via FakeTimeProvider)
  • No public API changes
  • No breaking changes
  • Verified against customer repro (n/a — internal refactor)

Move the connection pool's blocking-period (error backoff) logic out of WaitHandleDbConnectionPool into a reusable, testable BlockingPeriodErrorState class with exponential backoff (5s..60s), cached-exception fast-fail, and an injectable TimeProvider for deterministic tests. Add DbConnectionPoolGroup.IsBlockingPeriodEnabled() and the ADP.UnsafeCreateTimer(TimeProvider,...) overload it depends on. Includes BlockingPeriodErrorStateTest.
Copilot AI review requested due to automatic review settings June 23, 2026 17:24
@github-project-automation github-project-automation Bot moved this to To triage in SqlClient Board Jun 23, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the legacy wait-handle connection pool’s blocking-period (error backoff) logic into a dedicated BlockingPeriodErrorState component, adds a pool-group helper for determining when blocking is enabled, and introduces a TimeProvider-based timer factory to enable deterministic unit testing.

Changes:

  • Added BlockingPeriodErrorState (cached exception + exponential backoff + exit timer) and corresponding unit tests using FakeTimeProvider.
  • Refactored WaitHandleDbConnectionPool to delegate blocking-period logic to the new state object and moved the blocking-period enablement decision into DbConnectionPoolGroup.
  • Added an ADP.UnsafeCreateTimer(TimeProvider, ...) overload returning ITimer to support TimeProvider-driven timers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/BlockingPeriodErrorStateTest.cs Adds comprehensive unit tests for the new blocking-period state logic using FakeTimeProvider.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/WaitHandleDbConnectionPool.cs Replaces inlined error/backoff fields and timer logic with BlockingPeriodErrorState.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/DbConnectionPoolGroup.cs Centralizes the PoolBlockingPeriod decision into IsBlockingPeriodEnabled().
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/BlockingPeriodErrorState.cs Introduces the new reusable blocking-period error state implementation.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs Adds a TimeProvider-aware UnsafeCreateTimer overload returning ITimer.

Comment on lines +102 to +106
newTimer = ADP.UnsafeCreateTimer(
_timeProvider,
ExitCallback,
null,
Timeout.InfiniteTimeSpan,
Comment on lines +5 to +9
using System;
using System.Threading;
using Microsoft.Extensions.Time.Testing;
using Microsoft.Data.SqlClient.ConnectionPool;
using Xunit;
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.87179% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.67%. Comparing base (44e1b72) to head (b7dbf34).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs 75.00% 2 Missing ⚠️
.../SqlClient/ConnectionPool/DbConnectionPoolGroup.cs 75.00% 2 Missing ⚠️
...lient/ConnectionPool/WaitHandleDbConnectionPool.cs 88.23% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4395      +/-   ##
==========================================
- Coverage   65.32%   63.67%   -1.66%     
==========================================
  Files         285      281       -4     
  Lines       43373    66403   +23030     
==========================================
+ Hits        28335    42279   +13944     
- Misses      15038    24124    +9086     
Flag Coverage Δ
CI-SqlClient ?
PR-SqlClient-Project 63.67% <94.87%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

// so callers can inject a test double for deterministic scheduling.
if (ExecutionContext.IsFlowSuppressed())
{
return timeProvider.CreateTimer(callback, state, dueTime, period);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a timer via a TimeProvider allows for faking time in unit tests. I chose not to change existing overloads to feed into this one because there are many spots already use them and it would be a more extensive refactor.

// instead obtained creation mutex

DbConnectionInternal obj = null;
if (ErrorOccurred)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a check/act race condition here. The new class uses a local variable to check and throw.

SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Errors are set.", Id);
Interlocked.Decrement(ref _waitCount);
throw TryCloneCachedException();
_errorState.ThrowIfActive();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a chance this may not throw, in which case, loop back around to check available wait handles.


// Reset the error wait:
_errorWait = ERROR_WAIT_DEFAULT;
// A successful creation clears any prior error state and resets backoff.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior was missing before. If we somehow successfully create a connection while in the blocking period (via some race condition with checking the error state), then we should reset the whole error state including the ManualResetEvent and the cached exception, not just the wait period. There's no need to wait until the timer fires to allow more creates if we know we're succeeding now.

newObj = null; // set to null, so we do not return bad new object

// Failed to create instance
_resError = e;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of race conditions possible here. I decided to move this all under a lock so that all of the relevant values are updated atomically.


// Clone SqlExceptions so stack traces are not shared across callers; other
// exception types are rethrown as-is.
throw cached is SqlException sqlEx ? sqlEx.InternalClone() : cached;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the existing cloning logic

@mdaigle mdaigle marked this pull request as ready for review June 23, 2026 20:45
Copilot AI review requested due to automatic review settings June 23, 2026 20:45
@mdaigle mdaigle requested a review from a team as a code owner June 23, 2026 20:45
@mdaigle mdaigle added this to the 7.1.0-preview2 milestone Jun 23, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment on lines +195 to +200
internal static ITimer UnsafeCreateTimer(
TimeProvider timeProvider,
TimerCallback callback,
object state,
TimeSpan dueTime,
TimeSpan period)
Comment on lines +137 to +141
_timeProvider,
ExitCallback,
null,
wait,
Timeout.InfiniteTimeSpan);
Comment on lines +132 to +135
// Create the exit timer disarmed (infinite due time); it is armed below, still
// under the lock. ADP.UnsafeCreateTimer suppresses execution-context flow so the timer
// doesn't capture and pin the current ExecutionContext and its AsyncLocals for its
// lifetime, while still honoring the injected TimeProvider for testability.
Comment on lines +598 to +608
/// <summary>
/// Verifies that the onEnter callback is invoked after the internal lock is released.
/// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> and calls
/// <see cref="BlockingPeriodErrorState.ThrowIfActive"/> — operations that are safe only
/// when the lock is not held. If the implementation were changed to hold the lock during
/// the callback invocation, any re-entrant call from the callback that tries to acquire the
/// same lock (on a non-re-entrant lock) would deadlock.
/// </summary>
[Fact]
public void OnEnter_CalledOutsideLock_CanCallBackIntoState()
{
Comment on lines +633 to +643
/// <summary>
/// Verifies that the onExit callback is invoked after the internal lock is released.
/// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> — confirming it
/// observes the cleared state — and calls <see cref="BlockingPeriodErrorState.ThrowIfActive"/>
/// without deadlocking. If the implementation were changed to hold the lock during the
/// callback, any re-entrant call from the callback would deadlock on a non-re-entrant lock.
/// </summary>
[Fact]
public void OnExit_CalledOutsideLock_CanCallBackIntoState()
{
// Arrange
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To triage

Development

Successfully merging this pull request may close these issues.

4 participants