
@praboud praboud commented Dec 4, 2025

Description of change

Adds idle connection management to ConnectionPool. This is disabled by default, but clients can configure an idle_connection_timeout. If set, connections that go unused for longer than the configured timeout are closed automatically by a background thread. (A singleton IdleConnectionCleanupManager schedules these checks for all connection pools, which avoids allocating a separate thread for each pool.)

Right now, because connections are never closed once opened, clients have a "high water mark" property - the client will hold open as many connections as it needed during the busiest period that client has experienced. This can cause far more connections to be used than are really needed, as well as making the number of connections used difficult to reason about. I've deployed this patch to production, and seen a roughly 2x reduction in connections used in practice, on a large deployment of clients.
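To make the idea concrete, here is a minimal, self-contained sketch of idle cleanup on a toy pool. It is not the code from this PR: `ToyIdlePool`, `release`, and `cleanup_idle` are hypothetical names, and connections are plain strings. Only the `idle_connection_timeout` parameter name is taken from the description above.

```python
import time
from typing import List, Optional, Tuple


class ToyIdlePool:
    """Hypothetical stand-in for a pool; not the PR's ConnectionPool."""

    def __init__(self, idle_connection_timeout: Optional[float] = None) -> None:
        # None means the feature is disabled, mirroring the described default.
        self.idle_connection_timeout = idle_connection_timeout
        # Each entry is (last_used_timestamp, connection_name).
        self._free: List[Tuple[float, str]] = []

    def release(self, name: str, now: Optional[float] = None) -> None:
        """Return a connection to the pool and record when it was last used."""
        stamp = time.monotonic() if now is None else now
        self._free.append((stamp, name))

    def cleanup_idle(self, now: Optional[float] = None) -> Optional[float]:
        """Drop connections that have been idle longer than the timeout.

        Returns the last-used timestamp of the oldest surviving connection,
        or None if the pool is empty, so a caller could plan the next check.
        """
        current = time.monotonic() if now is None else now
        timeout = self.idle_connection_timeout
        if timeout is not None:
            self._free = [(t, n) for t, n in self._free if current - t <= timeout]
        return min((t for t, _ in self._free), default=None)

    def __len__(self) -> int:
        return len(self._free)


pool = ToyIdlePool(idle_connection_timeout=30.0)
pool.release("a", now=0.0)
pool.release("b", now=20.0)
pool.release("c", now=40.0)
oldest = pool.cleanup_idle(now=45.0)  # "a" has been idle for 45s and is dropped
```

Without a timeout, the toy pool keeps all three connections indefinitely, which is the "high water mark" behavior described above.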

Pull Request check-list

Please make sure to review and check all of these items:

  • Do tests and lints pass with this change?
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Is there an example added to the examples folder (if applicable)?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

@praboud praboud changed the title from "Idle connections" to "Idle connection management" Dec 4, 2025
@petyaslavova
Collaborator

Hi @praboud, thank you for your contribution! We will take a look at it soon.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds idle connection management to ConnectionPool and BlockingConnectionPool to automatically close connections that haven't been used for a configurable timeout period. This feature is disabled by default and activated by setting the idle_connection_timeout parameter. A singleton IdleConnectionCleanupManager with a background thread handles cleanup for all pools efficiently, reducing connection overhead during low-traffic periods.

Key Changes:

  • Introduced PooledConnection wrapper to track last-used timestamps
  • Implemented IdleConnectionCleanupManager singleton with heapq-based scheduling
  • Added idle_connection_timeout and idle_check_interval parameters to connection pools and Redis client
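As a rough illustration of heapq-based scheduling, the sketch below keeps pools in a min-heap ordered by their next check time, so a single worker only ever needs to look at the earliest entry. `ToySchedule` and its methods are hypothetical names chosen for this example; the actual IdleConnectionCleanupManager in the PR may be structured differently.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class ToySchedule:
    """Hypothetical min-heap of (next_check_time, tie_breaker, pool_name)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        # The counter breaks ties so equal check times stay in insertion order.
        self._counter = itertools.count()

    def schedule(self, pool_name: str, next_check: float) -> None:
        heapq.heappush(self._heap, (next_check, next(self._counter), pool_name))

    def pop_due(self, now: float) -> Optional[str]:
        """Pop the earliest pool if its check time has arrived, else None."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[2]
        return None

    def seconds_until_next(self, now: float) -> Optional[float]:
        """How long a worker thread could sleep before the next check."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - now)


schedule = ToySchedule()
schedule.schedule("pool-a", 60.0)
schedule.schedule("pool-b", 30.0)
```

One thread serving every pool this way is what lets the design avoid a thread per pool.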

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

File | Description
redis/connection.py | Core implementation: added PooledConnection class, IdleConnectionCleanupManager singleton, and _cleanup_idle_connections methods to both pool types
redis/client.py | Added idle_connection_timeout and idle_check_interval parameters to Redis client constructor
redis/cluster.py | Added new parameters to cluster connection pool configuration allowlist
tests/test_connection_pool.py | Comprehensive test suite for idle connection cleanup including mock time utilities and integration tests
tests/test_multiprocessing.py | Updated to handle PooledConnection wrapper in assertions
tests/test_maint_notifications.py | Updated to extract connection from PooledConnection wrapper
tests/test_maint_notifications_handling.py | Refactored to use new _get_free_connections() method
Comments suppressed due to low confidence (1)

redis/connection.py:551

  • This import of module logging is redundant, as it was previously imported on line 4.
                import logging


e,
exc_info=True,
)

Copilot AI Dec 5, 2025

Missing return statement. The function signature declares -> Optional[float] and the docstring states it returns "Timestamp of the oldest remaining connection, or None if pool is empty." However, the function doesn't return oldest_connection_time at the end. It should return this value so the cleanup manager can properly reschedule the pool.

Add at the end of the function:

return oldest_connection_time
Suggested change
return oldest_connection_time

Copilot uses AI. Check for mistakes.
Comment on lines 3310 to 3321
try:
    if self._in_maintenance:
        self._lock.acquire()
        self._locked = True
    yield
finally:
    if self._locked:
        try:
            self._lock.release()
        except Exception:
            pass
        self._locked = False

Copilot AI Dec 5, 2025

Potential concurrency issue with _locked flag. The _locked flag is used to track whether the lock was acquired, but it's not thread-safe. If multiple threads call _maintenance_lock() simultaneously when _in_maintenance is True, they could both see _locked=False, both acquire the lock (one will block), and both set _locked=True. Then in the finally block, the first thread to exit will set _locked=False, causing the second thread to potentially not release the lock or encounter issues.

Consider using a thread-local variable or a more robust locking pattern. For example:

@contextmanager
def _maintenance_lock(self):
    acquired = False
    try:
        if self._in_maintenance:
            self._lock.acquire()
            acquired = True
        yield
    finally:
        if acquired:
            try:
                self._lock.release()
            except Exception:
                pass
Suggested change
- try:
-     if self._in_maintenance:
-         self._lock.acquire()
-         self._locked = True
-     yield
- finally:
-     if self._locked:
-         try:
-             self._lock.release()
-         except Exception:
-             pass
-         self._locked = False
+ acquired = False
+ try:
+     if self._in_maintenance:
+         self._lock.acquire()
+         acquired = True
+     yield
+ finally:
+     if acquired:
+         try:
+             self._lock.release()
+         except Exception:
+             pass
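
The pattern suggested above can be exercised in isolation. The following is a small runnable sketch, assuming a hypothetical `ToyPool` class with a plain `threading.Lock`; it only illustrates why a local `acquired` variable is safe under concurrency and is not the pool implementation from the PR.

```python
import threading
from contextlib import contextmanager
from typing import List


class ToyPool:
    """Hypothetical stand-in; only the locking pattern is illustrated."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_maintenance = False

    @contextmanager
    def maintenance_lock(self):
        # `acquired` is local to this call, so concurrent callers
        # cannot overwrite each other's state the way a shared
        # instance attribute could.
        acquired = False
        try:
            if self._in_maintenance:
                self._lock.acquire()
                acquired = True
            yield acquired
        finally:
            if acquired:
                self._lock.release()


pool = ToyPool()
pool._in_maintenance = True
results: List[bool] = []


def worker() -> None:
    with pool.maintenance_lock() as held:
        results.append(held)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread releases exactly the acquisition it made, so the lock is free again once all workers have finished.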

Comment on lines +253 to +254
idle_connection_timeout: Optional[float] = None,
idle_check_interval: float = 60.0,

Copilot AI Dec 5, 2025

Missing documentation for new parameters. The idle_connection_timeout and idle_check_interval parameters are added to the __init__ method but are not documented in the docstring. These parameters should be documented in the Args section to help users understand their purpose and usage.

Consider adding documentation like:

idle_connection_timeout:
    If set, connections that have been idle (not used) for longer than
    this timeout (in seconds) will be automatically closed. If None (default),
    idle connections are never closed. Only used when connection_pool is not provided.
idle_check_interval:
    How frequently (in seconds) to check for idle connections. Defaults to 60 seconds.
    Only used when idle_connection_timeout is set and connection_pool is not provided.

if wait_time > 0:
    # Sleep until next scheduled check (or until notified)
    self._condition.wait(timeout=wait_time)
    return

Copilot AI Dec 5, 2025

Missing return statement after wait(). When wait_time > 0, the function waits but then falls through to execute the code below (lines 2536-2548) instead of returning None. This could cause the function to pop from the schedule and process the pool immediately after waking up, even if the wait was interrupted before the scheduled time.

Add return None after the wait:

if wait_time > 0:
    # Sleep until next scheduled check (or until notified)
    self._condition.wait(timeout=wait_time)
    return None
Suggested change
- return
+ return None
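
For context on why returning after the wait matters: `threading.Condition.wait(timeout=...)` can return before the timeout when another thread calls `notify()`, so the caller should recompute the remaining time rather than assume the deadline has passed. The sketch below shows one loop iteration under that assumption; `wait_or_run` and its parameters are hypothetical and not taken from the PR.

```python
import threading
from typing import Callable


def wait_or_run(
    condition: threading.Condition,
    next_check: float,
    now: Callable[[], float],
) -> bool:
    """One iteration of a scheduler loop.

    Returns True if the check is due and the caller should process it,
    or False if the function slept (or was notified) and the caller
    should loop again and recompute the wait time.
    """
    with condition:
        wait_time = next_check - now()
        if wait_time > 0:
            # wait() may wake early on notify(), so return and let
            # the caller re-evaluate instead of falling through.
            condition.wait(timeout=wait_time)
            return False
        return True


clock = [0.0]  # fake clock so the example does not depend on real time
cond = threading.Condition()
first = wait_or_run(cond, 0.01, lambda: clock[0])   # not due yet: sleeps briefly
clock[0] = 1.0
second = wait_or_run(cond, 0.01, lambda: clock[0])  # now due
```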


with self._condition:
self._reschedule_pool(metadata, oldest_conn_time)
# the pool after the pool is rescheduled, we can clean up the WAL

Copilot AI Dec 5, 2025

Comment has a grammatical error. Should be "After the pool is rescheduled" instead of "the pool after the pool is rescheduled".

Suggested change
- # the pool after the pool is rescheduled, we can clean up the WAL
+ # After the pool is rescheduled, we can clean up the WAL

self.timeout = timeout
self._in_maintenance = False
self._locked = False
self.pool: Queue[PooledConnection | None] = self.queue_class(max_connections)

Copilot AI Dec 5, 2025

Incorrect initialization order. The self.pool attribute is assigned on line 3071 before calling super().__init__() on line 3072. However, super().__init__() calls reset() (via the parent ConnectionPool), and reset() (defined at line 3080) also assigns self.pool at line 3083. This means:

  1. Line 3071 initializes self.pool
  2. Line 3072 calls super().__init__() which eventually calls reset()
  3. Line 3083 (in reset()) overwrites self.pool, making line 3071 redundant

The assignment on line 3071 should be removed, as reset() will properly initialize the pool.

Suggested change
- self.pool: Queue[PooledConnection | None] = self.queue_class(max_connections)
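
The initialization-order point can be reproduced with two toy classes. `BasePool` and `ChildPool` below are hypothetical and much simpler than the real pools; they only demonstrate that an attribute assigned before `super().__init__()` is overwritten when the base constructor calls an overridden `reset()`.

```python
class BasePool:
    def __init__(self) -> None:
        # The base constructor delegates setup to reset().
        self.reset()

    def reset(self) -> None:
        self.pool = []


class ChildPool(BasePool):
    def __init__(self) -> None:
        # This assignment is redundant: reset() replaces it below.
        self.pool = ["assigned before super().__init__()"]
        super().__init__()  # dispatches to ChildPool.reset()

    def reset(self) -> None:
        self.pool = ["assigned in reset()"]


child = ChildPool()
```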

Comment on lines 990 to 992
mock_datetime.datetime.side_effect = lambda *args, **kwargs: datetime.datetime(
*args, **kwargs
)

Copilot AI Dec 5, 2025

This 'lambda' is just a simple wrapper around a callable object. Use that object directly.

Suggested change
- mock_datetime.datetime.side_effect = lambda *args, **kwargs: datetime.datetime(
-     *args, **kwargs
- )
+ mock_datetime.datetime.side_effect = datetime.datetime
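
A quick standalone check of the simplified form, using only the standard library. The `mock_datetime` object here is a bare `MagicMock` created for illustration, not the patched module from the test suite.

```python
import datetime
from unittest import mock

mock_datetime = mock.MagicMock()
# Assigning the class itself as side_effect forwards all positional
# and keyword arguments to datetime.datetime, so no lambda is needed.
mock_datetime.datetime.side_effect = datetime.datetime

built = mock_datetime.datetime(2025, 12, 4, hour=9)
```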

Comment on lines 3319 to 3320
except Exception:
pass

Copilot AI Dec 5, 2025

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
- except Exception:
-     pass
+ except Exception as e:
+     logger.warning(
+         "Error releasing maintenance lock in BlockingConnectionPool: %s",
+         e,
+         exc_info=True,
+     )
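
To show what logging buys over a silent `pass`, here is a minimal sketch. `release_quietly` and the logger name are hypothetical; the one library fact relied on is that `threading.Lock.release()` raises `RuntimeError` when the lock is not held.

```python
import logging
import threading

logger = logging.getLogger("toy_pool")  # hypothetical logger name


def release_quietly(lock) -> bool:
    """Release a lock, logging instead of silently swallowing failures."""
    try:
        lock.release()
        return True
    except RuntimeError as e:
        # Raised by threading.Lock.release() when the lock is not held.
        logger.warning("Error releasing lock: %s", e, exc_info=True)
        return False


lock = threading.Lock()
lock.acquire()
first_release = release_quietly(lock)   # lock was held: succeeds
second_release = release_quietly(lock)  # lock no longer held: logged
```

With the warning in place, a double release shows up in the logs instead of disappearing.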
