Skip to content

FIX: Cache SQLDescribeParam results for NULL parameters to reduce server round-trips#614

Open
jahnvi480 wants to merge 11 commits into
mainfrom
jahnvi/fix-null-param-describe-610
Open

FIX: Cache SQLDescribeParam results for NULL parameters to reduce server round-trips#614
jahnvi480 wants to merge 11 commits into
mainfrom
jahnvi/fix-null-param-describe-610

Conversation

@jahnvi480
Copy link
Copy Markdown
Contributor

@jahnvi480 jahnvi480 commented Jun 2, 2026

Work Item / Issue Reference

AB#45399

GitHub Issue: #610


Summary

This pull request implements a robust caching mechanism for SQL parameter type resolution when binding None (NULL) parameters in the MSSQL Python driver, addressing issues with repeated round-trips to SQLDescribeParam and previous incorrect type fallbacks. The changes affect both the Python and C++ layers, improving performance and correctness, especially for all-NULL columns and VARBINARY types. Comprehensive tests are added to ensure correctness and coverage of the new cache logic.

Enhancements to NULL parameter type resolution and caching:

  • Introduced a thread-safe cache in ddbc_bindings.cpp to store SQLDescribeParam results per statement handle and parameter index, reducing redundant round-trips and ensuring correct type inference for NULL parameters. This replaces the previous fallback to SQL_VARCHAR, which caused issues with certain column types.
  • Updated both single and array parameter binding (BindParameters and BindParameterArray) to use the new cache for resolving types of NULL parameters, ensuring consistency and correctness for all-NULL columns.

Cache invalidation and integration:

  • Added cache invalidation logic when a new SQL statement is prepared, ensuring that parameter type caches are not reused across different statements. This is handled in both SQLExecute_wrap and SQLExecuteMany_wrap.

Python interface and type mapping:

  • Modified _map_sql_type in cursor.py to always return SQL_UNKNOWN_TYPE for None parameters, delegating type resolution to the C++ cache. Removed the previous Python-side fallback to SQL_VARCHAR for all-NULL columns in executemany.

Testing improvements:

  • Added extensive tests in test_004_cursor.py to cover cache hit/miss scenarios, cache invalidation, all-NULL columns, and bypassing the cache with explicit type hints, ensuring the new logic is robust and correct.

Codebase maintenance:

  • Included necessary C++ headers for thread-safety and updated function pointer loading logs for clarity.

  • Minor loop index and logging improvements for clarity and correctness in parameter binding.

These changes collectively resolve previous issues with NULL parameter handling, improve driver efficiency, and provide comprehensive test coverage for the new behavior.

…lared_parameters round-trips (GH-610)

When a parameterized query includes None values, the driver previously
sent SQL_UNKNOWN_TYPE to the C++ layer, which triggered a SQLDescribeParam
call (internally sp_describe_undeclared_parameters) for every NULL param
on every execute(). This caused thousands of unnecessary server round-trips
per minute for workloads with frequent NULLs (e.g. sp_set_session_context).

The describe call fails most of the time — especially for stored procedure
calls — and the C++ layer already falls back to SQL_VARCHAR, so the
round-trip adds latency with no benefit.

Fix: Return SQL_VARCHAR directly from _map_sql_type() for None params,
matching the existing executemany() all-NULL column optimisation.

Closes #610
Copilot AI review requested due to automatic review settings June 2, 2026 03:16
@github-actions github-actions Bot added the pr-size: small Minimal code update label Jun 2, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes NULL (None) parameter type mapping in mssql_python to avoid triggering SQLDescribeParam (and the associated sp_describe_undeclared_parameters server round-trips) on every execution involving NULL parameters, improving performance for heavily parameterized workloads.

Changes:

  • Update Cursor._map_sql_type to map None parameters to SQL_VARCHAR instead of SQL_UNKNOWN_TYPE.
  • Add a unit test asserting _map_sql_type returns the expected tuple for None parameters (GH-610).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
mssql_python/cursor.py Changes NULL parameter SQL type mapping to SQL_VARCHAR to avoid SQLDescribeParam round-trips.
tests/test_001_globals.py Adds a regression test verifying _map_sql_type(None, ...) returns SQL_VARCHAR and related defaults.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

📊 Code Coverage Report

🔥 Diff Coverage

67%


🎯 Overall Coverage

80%


📈 Total Lines Covered: 6686 out of 8286
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/pybind/ddbc_bindings.cpp (67.5%): Missing lines 443-444,470-476,557-558,1569-1571,1646-1648,1758-1760,1843-1844,2607-2611,2613-2617,2916-2919,3416-3417

Summary

  • Total: 117 lines
  • Missing: 38 lines
  • Coverage: 67%

mssql_python/pybind/ddbc_bindings.cpp

Lines 439-448

  439     SQLULEN size;
  440     LOG("ResolveNullParamType: Cache MISS for hStmt=%p param[%d], calling "
  441         "SQLDescribeParam",
  442         (void*)hStmt, paramIndex);
! 443     RETCODE rc = SQLDescribeParam_ptr(hStmt, static_cast<SQLUSMALLINT>(paramIndex + 1), &type,
! 444                                       &size, &digits, &nullable);
  445 
  446     DescribedParamInfo info;
  447     if (SQL_SUCCEEDED(rc)) {
  448         info = {type, size, digits, true};

Lines 466-480

  466 }
  467 
  468 static void InvalidateDescribeCache(SQLHANDLE hStmt) {
  469     std::unique_lock<std::shared_mutex> lock(g_describeCacheMutex);
! 470     auto erased = g_describeCache.erase(hStmt);
! 471     if (erased) {
! 472         LOG("InvalidateDescribeCache: Cleared cache for hStmt=%p", (void*)hStmt);
! 473     }
! 474 }
! 475 // --- End GH-610 cache ---
! 476 
  477 namespace {
  478 
  479 const char* GetSqlCTypeAsString(const SQLSMALLINT cType) {
  480     switch (cType) {

Lines 553-562

  553                          const std::string& charEncoding = "utf-8") {
  554     LOG("BindParameters: Starting parameter binding for statement handle %p "
  555         "with %zu parameters",
  556         (void*)hStmt, params.size());
! 557 
! 558     for (int paramIndex = 0; paramIndex < static_cast<int>(params.size()); paramIndex++) {
  559         const auto& param = params[paramIndex];
  560         ParamInfo& paramInfo = paramInfos[paramIndex];
  561         LOG("BindParameters: Processing param[%d] - C_Type=%d, SQL_Type=%d, "
  562             "ColumnSize=%lu, DecimalDigits=%d, InputOutputType=%d",

Lines 1565-1575

  1565     std::u16string schema = schemaObj.is_none() ? u"" : schemaObj.cast<std::u16string>();
  1566 
  1567     // Release the GIL during the blocking ODBC catalog call
  1568     py::gil_scoped_release release;
! 1569     return SQLPrimaryKeys_ptr(StatementHandle->get(),
! 1570                               catalog.empty() ? nullptr : reinterpretU16stringAsSqlWChar(catalog),
! 1571                               catalog.empty() ? 0 : SQL_NTS,
  1572                               schema.empty() ? nullptr : reinterpretU16stringAsSqlWChar(schema),
  1573                               schema.empty() ? 0 : SQL_NTS,
  1574                               table.empty() ? nullptr : reinterpretU16stringAsSqlWChar(table),
  1575                               table.empty() ? 0 : SQL_NTS);

Lines 1642-1652

  1642         SQLWCHAR sqlState[6], message[SQL_MAX_MESSAGE_LENGTH_SQLSERVER];
  1643         SQLINTEGER nativeError;
  1644         SQLSMALLINT messageLen;
  1645 
! 1646         SQLRETURN diagReturn =
! 1647             SQLGetDiagRec_ptr(handleType, rawHandle, 1, sqlState, &nativeError, message,
! 1648                               SQL_MAX_MESSAGE_LENGTH_SQLSERVER, &messageLen);
  1649 
  1650         if (SQL_SUCCEEDED(diagReturn)) {
  1651             std::u16string sqlStateUtf16 = dupeSqlWCharAsUtf16Le(sqlState, 5);
  1652             std::u16string messageUtf16 = dupeSqlWCharAsUtf16Le(

Lines 1754-1764

  1754         py::gil_scoped_release release;
  1755         ret = SQLTables_ptr(StatementHandle->get(),
  1756                             catalog.empty() ? nullptr : reinterpretU16stringAsSqlWChar(catalog),
  1757                             catalog.empty() ? 0 : SQL_NTS,
! 1758                             schema.empty() ? nullptr : reinterpretU16stringAsSqlWChar(schema),
! 1759                             schema.empty() ? 0 : SQL_NTS,
! 1760                             table.empty() ? nullptr : reinterpretU16stringAsSqlWChar(table),
  1761                             table.empty() ? 0 : SQL_NTS,
  1762                             tableType.empty() ? nullptr : reinterpretU16stringAsSqlWChar(tableType),
  1763                             tableType.empty() ? 0 : SQL_NTS);
  1764     }

Lines 1839-1848

  1839                     "statement_handle=%p",
  1840                     rc, (void*)hStmt);
  1841                 return rc;
  1842             }
! 1843             // GH-610: Invalidate describe cache (new prepare = new param types)
! 1844             InvalidateDescribeCache(hStmt);
  1845             isStmtPrepared[0] = py::cast(true);
  1846         } else {
  1847             // Make sure the statement has been prepared earlier if we're not
  1848             // preparing now

Lines 2603-2621

  2603                     LOG("BindParameterArray: Binding SQL_C_DEFAULT (NULL) array - param_index=%d, "
  2604                         "count=%zu",
  2605                         paramIndex, paramSetSize);
  2606 
! 2607                     // GH-610: resolve SQL type for all-NULL columns via cache
! 2608                     SQLSMALLINT resolvedSqlType = info.paramSQLType;
! 2609                     SQLULEN resolvedColSize = info.columnSize;
! 2610                     SQLSMALLINT resolvedDecDigits = info.decimalDigits;
! 2611                     if (resolvedSqlType == SQL_UNKNOWN_TYPE) {
  2612                         auto resolved = ResolveNullParamType(hStmt, paramIndex);
! 2613                         resolvedSqlType = resolved.sqlType;
! 2614                         resolvedColSize = resolved.columnSize;
! 2615                         resolvedDecDigits = resolved.decimalDigits;
! 2616                     }
! 2617 
  2618                     // For NULL parameters, we need to allocate a minimal buffer and set all
  2619                     // indicators to SQL_NULL_DATA Use SQL_C_CHAR as a safe default C type for NULL
  2620                     // values
  2621                     char* nullBuffer = AllocateParamBufferArray<char>(tempBuffers, paramSetSize);

Lines 2912-2923

  2912     std::u16string catalog = catalogObj.is_none() ? u"" : catalogObj.cast<std::u16string>();
  2913     std::u16string schema = schemaObj.is_none() ? u"" : schemaObj.cast<std::u16string>();
  2914 
  2915     py::gil_scoped_release release;
! 2916     return SQLSpecialColumns_ptr(StatementHandle->get(), identifierType,
! 2917                                  catalog.empty() ? nullptr
! 2918                                                  : reinterpretU16stringAsSqlWChar(catalog),
! 2919                                  catalog.empty() ? 0 : SQL_NTS,
  2920                                  schema.empty() ? nullptr : reinterpretU16stringAsSqlWChar(schema),
  2921                                  schema.empty() ? 0 : SQL_NTS,
  2922                                  table.empty() ? nullptr : reinterpretU16stringAsSqlWChar(table),
  2923                                  table.empty() ? 0 : SQL_NTS, scope, nullable);

Lines 3412-3421

  3412                                 // exact number of characters via dataLen, so do not rely on
  3413                                 // null termination. This preserves embedded NULs and avoids
  3414                                 // any risk of reading past the valid range if the driver
  3415                                 // omits the terminator.
! 3416                                 row.append(py::cast(
! 3417                                     dupeSqlWCharAsUtf16Le(dataBuffer.data(), numCharsInData)));
  3418                                 LOG("SQLGetData: Appended NVARCHAR string "
  3419                                     "length=%lu for column %d",
  3420                                     (unsigned long)numCharsInData, i);
  3421                             } else {


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 76.4%
mssql_python.row.py: 76.9%
mssql_python.__init__.py: 77.3%
mssql_python.pybind.connection.connection.cpp: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.logging.py: 85.5%
mssql_python.connection.py: 85.6%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

@jahnvi480 jahnvi480 force-pushed the jahnvi/fix-null-param-describe-610 branch from 34b05de to b6f1478 Compare June 2, 2026 08:33
subrata-ms
subrata-ms previously approved these changes Jun 2, 2026
…o longer reachable since _map_sql_type(None) now returns SQL_VARCHAR directly.
Comment thread mssql_python/cursor.py Outdated
@jahnvi480 jahnvi480 marked this pull request as draft June 3, 2026 10:29
jahnvi480 added 2 commits June 5, 2026 09:13
…declared_parameters round-trips (GH-610)

When cursor.execute() includes None (NULL) params, the C++ BindParameters
layer calls SQLDescribeParam for each NULL param on every execution. This
triggers sp_describe_undeclared_parameters on SQL Server — a full network
round-trip per NULL param per call.

Fix: Add a statement-level cache in C++ that stores SQLDescribeParam
results per (hStmt, paramIndex). First execution describes and caches;
all subsequent executions of the same prepared statement get cache hits
with zero round-trips. Cache is cleared on SQLPrepare (new SQL = new
param types).

Changes:
- ddbc_bindings.cpp: Add DescribedParamInfo cache struct, ResolveNullParamType
  helper, cache invalidation on SQLPrepare in both SQLExecute_wrap and
  SQLExecuteMany_wrap. Both BindParameters and BindParameterArray use the
  cache for SQL_UNKNOWN_TYPE NULL params.
- ddbc_bindings.h: Restore SQLDescribeParamFunc typedef and extern.
- cursor.py: Revert _map_sql_type to return SQL_UNKNOWN_TYPE for None
  (let C++ cache handle resolution). Remove executemany SQL_VARCHAR
  hardcoded fallback (C++ BindParameterArray now resolves via cache).
- test_004_cursor.py: Update test to verify SQL_UNKNOWN_TYPE for None.

Closes #610
@github-actions github-actions Bot added pr-size: medium Moderate update size and removed pr-size: small Minimal code update labels Jun 5, 2026
Add 7 integration tests covering all cache code paths:
- Cache miss (first execute with NULL param)
- Cache hit (repeated execute with same SQL + NULL)
- Cache invalidation (different SQL triggers re-prepare)
- executemany all-NULL column (BindParameterArray path)
- executemany multiple all-NULL columns
- All-NULL params in execute
- setinputsizes bypass (explicit type skips cache)
@jahnvi480 jahnvi480 marked this pull request as ready for review June 5, 2026 08:05
@jahnvi480 jahnvi480 changed the title FIX: Skip SQLDescribeParam for NULL params to avoid sp_describe_undeclared_parameters round-trips FIX: Cache SQLDescribeParam results for NULL parameters to reduce server round-trips Jun 5, 2026
};

static std::shared_mutex g_describeCacheMutex;
static std::unordered_map<SQLHANDLE, std::unordered_map<int, DescribedParamInfo>> g_describeCache;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is only ever evicted via the two InvalidateDescribeCache calls after SQLPrepare. Correctness today relies on the invariant that any code path reaching ResolveNullParamType has either just run SQLPrepare on the handle, or is re-executing the same SQL on the same HSTMT. Under that invariant, stale-cache-on-recycled-pointer reads cannot happen.

But the one issue remain:

Memory leak. When an HSTMT is freed without a subsequent SQLPrepare on the recycled pointer, its entries are orphaned forever. Three concrete sources (please confirm this yourself):
Connection::disconnect() marks child STMTs as _implicitly_freed, their cache entries are never touched.
Pooled connections (SQL_ATTR_RESET_CONNECTION path) keep the DBC alive across checkin, so freed STMT pointers rarely get recycled back to the same DBC and cache entries accumulate over the pool's lifetime.
Cursors that never call SQLPrepare (no-param execute >> SQLExecDirect, metadata methods, failed executes) leak on close even with pointer recycling.

Suggested fix — make cache lifetime ≤ handle lifetime by construction:

void SqlHandle::free() {
if (_handle && SQLFreeHandle_ptr) {
// ... existing shutdown / implicitly_freed checks ...
if (_type == SQL_HANDLE_STMT) {
InvalidateDescribeCache(_handle); // GH-610
}
SQLFreeHandle_ptr(_type, _handle);
_handle = nullptr;
}
}

Plus the same call in SQLFreeHandle_wrap and inside the _implicitly_freed short-circuit branch so disconnect-time STMTs are evicted too. The existing SQLPrepare invalidations can stay as harmless safety nets.

static DescribedParamInfo ResolveNullParamType(SQLHANDLE hStmt, int paramIndex) {
// 1. Check cache (shared/read lock — concurrent readers allowed)
{
std::shared_lock<std::shared_mutex> lock(g_describeCacheMutex);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Process-global mutex g_describeCacheMutex becomes a serialization point for all cursors across all connections that bind NULL parameters. While the read path uses shared_lock, every cache miss takes the exclusive lock for the writeback. Under multi-threaded workloads with many distinct prepared statements, this is a contention hotspot.

One way to avoid this contention is to move the cache onto SqlHandle (one mutex per statement handle, no contention across statements).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: medium Moderate update size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants