Describe the bug
The plugins at version 2.6.3 seem to aggressively allocate database connections and do not release them.
Expected Behavior
Downgrading to 2.6.1 results in stable behaviour, with the application's DB user holding a constant number of connections that does not grow over time.
What plugins are used? What other connection properties were set?
failover,efm2
Current Behavior
After upgrading from 2.6.1 to 2.6.3, the RDS Aurora instance is flooded with idle connections. When the application is restarted, the connection count drops to pool size * number of pods and then immediately starts growing again. Over the course of a day the number of connections spikes almost to the maximum (401 for our instance), preventing other components from connecting and overloading the database.
Nothing was changed in the application code or configuration; only the library version was upgraded.
JDBC driver debug logs showed nothing apart from numerous connection openings. Turning on debug logs for HikariCP did not reveal any leaks, and the housekeeper logged only the configured pool size of 3 connections (leak detection is sketched after the configuration below).
The application itself is a Kotlin-based API service that connects to the database via HikariCP datasource. The connections accumulate despite no API traffic. Here's our configuration:
import com.zaxxer.hikari.HikariConfig
import java.util.Properties

val hikariConfig = HikariConfig()
hikariConfig.jdbcUrl = "jdbc:aws-wrapper:postgresql://<rds_host>:5432/<db_name>"
hikariConfig.driverClassName = "software.amazon.jdbc.Driver"
hikariConfig.username = "<username>"
hikariConfig.password = "<password>"
hikariConfig.maximumPoolSize = 3
hikariConfig.exceptionOverrideClassName = "software.amazon.jdbc.util.HikariCPSQLException"

// The wrapper plugins are passed through as data source properties
val dataSourceProperties = Properties()
dataSourceProperties.setProperty("wrapperPlugins", "failover, efm2")
hikariConfig.dataSourceProperties = dataSourceProperties
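For completeness, the pool is created from this config in the usual way, and HikariCP's built-in leak detection can be enabled alongside the debug logs to rule out application-side leaks. A minimal sketch; the variable name and threshold value are illustrative, not our exact production code:

import com.zaxxer.hikari.HikariDataSource

// Illustrative: logs a warning with a stack trace if the application holds
// a borrowed connection longer than the threshold without returning it.
hikariConfig.leakDetectionThreshold = 10_000 // milliseconds; example value

// The service obtains all connections through this pool.
val dataSource = HikariDataSource(hikariConfig)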
Database: Aurora PostgreSQL, a single db.t3.medium instance, engine version 14.9. Maximum number of connections: 401.
Inspecting the activity of the DB user accumulating the connections revealed that all of these connections only perform health checks:
<db_name>=> SELECT LEFT(query, 200) AS query
FROM pg_stat_activity
WHERE usename = '<username>'
ORDER BY backend_start DESC;
| query |
|-------|
| SELECT SERVER_ID, CASE WHEN SESSION_ID = 'MASTER_SESSION_ID' THEN TRUE ELSE FALSE END, CPU, COALESCE(REPLICA_LAG_IN_MSEC, 0), LAST_UPDATE_TIMESTAMP FROM aurora_replica_status() WHERE EXTRACT(EPOCH FROM(NOW() - LAST_UPDATE_TIMESTAMP)) <= 300 OR SESSION_ID = ' MASTER_SESSION_ID' OR LAST_UPDATE_TIMESTAMP IS NULL |
...
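To quantify the growth from the application side as well, a query of this shape can be run periodically. A hedged sketch; the helper name is hypothetical, and dataSource refers to the pool built from the config above:

import javax.sql.DataSource

// Hypothetical helper: counts this DB user's backends by state so the
// accumulation of idle connections can be tracked over time.
fun printConnectionCounts(dataSource: DataSource, user: String) {
    dataSource.connection.use { conn ->
        conn.prepareStatement(
            "SELECT state, count(*) FROM pg_stat_activity WHERE usename = ? GROUP BY state"
        ).use { stmt ->
            stmt.setString(1, user)
            stmt.executeQuery().use { rs ->
                while (rs.next()) {
                    println("${rs.getString(1)}: ${rs.getLong(2)}")
                }
            }
        }
    }
}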
In the graph below, you can see the connection growth after each application restart. Yesterday the library was downgraded and the connection count dropped to a stable level; the slight remaining spikes come from a Lambda function that connects to the database via RDS Proxy:
Reproduction Steps
Reproduction requires starting a JVM application that connects to an Aurora PostgreSQL database through a HikariCP connection pool, with the AWS Advanced JDBC Wrapper configured as in the snippet above. In our case it took >20 hours to reach over 390 connections.
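A minimal, self-contained sketch of such an application (placeholders as above; this is not our exact service code):

import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import java.util.Properties

// Minimal idle application: configure the wrapper as in the report, create
// the pool, and then simply stay alive. The connection growth on the Aurora
// side is observable even without any application traffic.
fun main() {
    val config = HikariConfig().apply {
        jdbcUrl = "jdbc:aws-wrapper:postgresql://<rds_host>:5432/<db_name>"
        driverClassName = "software.amazon.jdbc.Driver"
        username = "<username>"
        password = "<password>"
        maximumPoolSize = 3
        exceptionOverrideClassName = "software.amazon.jdbc.util.HikariCPSQLException"
        dataSourceProperties = Properties().apply {
            setProperty("wrapperPlugins", "failover, efm2")
        }
    }
    HikariDataSource(config).use {
        Thread.sleep(Long.MAX_VALUE) // in the real deployment this is the running API service
    }
}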
Possible Solution
No response
Additional Information/Context
No response
The AWS Advanced JDBC Driver version used
2.6.3
JDK version used
21
Operating System and version
Linux