Follow-ups to #1398 fix: consolidate DBR capability cache writers and surface query failures #1413

sd-db · 2026-04-21T10:06:03Z · Maintainer

Once the fix for #1398 lands (it gates the cache write on a non-None DBR version, so a transient `_query_dbr_version` failure can no longer permanently disable every capability-gated feature), there are two small cleanups that I deliberately left out of the bug-fix PR to keep it surgical. Tracking them here for later.

1. Consolidate `_cache_dbr_capabilities` and `_try_cache_dbr_capabilities`

After #1398's fix the two methods are byte-identical:

- `_cache_dbr_capabilities`: called from `open()` after a successful handshake.
- `_try_cache_dbr_capabilities`: called eagerly from `_create_fresh_connection`, before the `credentials_manager` is set.

Both now guard the write on `dbr_version is not None`. A follow-up PR should:

- Delete `_try_cache_dbr_capabilities`.
- Rename `_cache_dbr_capabilities` to something that conveys its idempotent, fail-safe semantics (e.g. `_ensure_dbr_capabilities`).
- Update the call sites in `_create_fresh_connection` and `open()` to use the renamed method.
- Collapse the duplicated `TestCacheDbr` / `TestTryCacheDbr` classes in `tests/unit/test_connection_manager.py` into a single parametrized test class covering the one remaining method.
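A minimal sketch of what the consolidated method could look like, assuming a cache keyed by `http_path` that stores the parsed DBR version (the real cache may hold derived capability flags instead). `ConnectionManagerSketch`, the injected `query_dbr_version` callable, the example path, and the `(major, minor)` tuple representation are all hypothetical, chosen so the example runs standalone. It shows the two properties the new name should convey: the method is a no-op once an entry exists (idempotent), and a `None` result leaves the cache untouched so a later call can try again (fail-safe).

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed representation of a parsed DBR version: (major, minor).
DbrVersion = Tuple[int, int]


class ConnectionManagerSketch:
    """Hypothetical stand-in for the adapter's connection manager."""

    def __init__(self, query_dbr_version: Callable[[str], Optional[DbrVersion]]) -> None:
        # The version query is injected so transient failures can be simulated.
        self._query_dbr_version = query_dbr_version
        # Assumed cache shape: http_path -> parsed DBR version.
        self._dbr_capabilities: Dict[str, DbrVersion] = {}

    def _ensure_dbr_capabilities(self, http_path: str) -> None:
        """Populate the cache once; never record an unknown (None) version."""
        if http_path in self._dbr_capabilities:
            return  # Idempotent: already cached, skip the query.
        dbr_version = self._query_dbr_version(http_path)
        if dbr_version is not None:
            self._dbr_capabilities[http_path] = dbr_version
        # Fail-safe: on None the cache stays empty, so the next caller retries.


# Simulate one transient failure followed by a successful query.
HTTP_PATH = "/sql/1.0/warehouses/example"  # hypothetical path
responses = iter([None, (15, 4)])
manager = ConnectionManagerSketch(lambda path: next(responses))

manager._ensure_dbr_capabilities(HTTP_PATH)  # query fails: nothing cached
manager._ensure_dbr_capabilities(HTTP_PATH)  # query succeeds: (15, 4) cached
manager._ensure_dbr_capabilities(HTTP_PATH)  # already cached: no query issued
```

If the third call did reach the query, the exhausted iterator would raise `StopIteration`, so the example also exercises the idempotent path.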

Expected size: roughly 40 LOC net removed. Low risk: both call sites already tolerate a missing cache entry.

2. Stop silently swallowing every exception inside `_query_dbr_version`

`dbt/adapters/databricks/connections.py` currently has:

```python
except Exception:
    pass
```

This is what made #1398 painful to diagnose in the first place: a permission error on `SET spark.databricks.clusterUsageTags.sparkVersion`, a transient network blip, or any other runtime error during version detection is indistinguishable from a cluster whose version we simply couldn't parse.

Minimum follow-up: log the exception at debug level.

```python
except Exception as exc:
    logger.debug(f"DBR version query failed for {http_path}: {exc}")
```

Ideally we would also narrow the catch to the exception types we actually expect and let genuine bugs propagate, but that is a larger change.

Non-goals

- Not proposing a retry layer inside the cache writer. The `open()` path already has retry logic, and once the fix for #1398 (`_cache_dbr_capabilities` permanently poisons capability cache when version query fails) is in, each subsequent `open()` can re-attempt the version query cleanly. Adding another retry layer would only obscure the first failure.
- Not changing the default-on-unknown-version behavior (`dbr_version is None` → capability = `False`). That conservative default is correct: the bug was that we got stuck in it, not that it existed.
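For illustration, the conservative default amounts to a gate like the following, sketched with an assumed `(major, minor)` version tuple and a hypothetical minimum of `(14, 1)`; the adapter's real capability checks may be structured differently. Python compares tuples element by element, so `(15, 4) >= (14, 1)` holds while `(13, 3) >= (14, 1)` does not.

```python
from typing import Optional, Tuple

DbrVersion = Tuple[int, int]  # assumed (major, minor) representation


def supports(dbr_version: Optional[DbrVersion], minimum: DbrVersion) -> bool:
    """Capability gate: an unknown version never enables a feature."""
    if dbr_version is None:
        return False  # conservative default for an unknown version
    return dbr_version >= minimum


HYPOTHETICAL_MINIMUM = (14, 1)  # illustrative threshold, not a real feature's
checks = {
    "unknown": supports(None, HYPOTHETICAL_MINIMUM),
    "new enough": supports((15, 4), HYPOTHETICAL_MINIMUM),
    "too old": supports((13, 3), HYPOTHETICAL_MINIMUM),
}
```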

Happy to carry these out myself once #1398's fix merges.
