You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Once the fix for #1398 lands — it gates the cache write on a non-None DBR version so a transient _query_dbr_version failure can't permanently disable every capability-gated feature — there are two small cleanups that I deliberately left out of the bug-fix PR to keep it surgical. Tracking them here for later.
1. Consolidate _cache_dbr_capabilities and _try_cache_dbr_capabilities
After #1398's fix the two methods are byte-identical:
_cache_dbr_capabilities — called from open() after a successful handshake.
_try_cache_dbr_capabilities — called eagerly from _create_fresh_connection before the credentials_manager is set.
Both now guard the write on dbr_version is not None. A follow-up PR should:
Delete _try_cache_dbr_capabilities.
Rename _cache_dbr_capabilities to convey the idempotent + fail-safe semantics (e.g. _ensure_dbr_capabilities).
Update the call site in _create_fresh_connection.
Parametrize the duplicated TestCacheDbr / TestTryCacheDbr classes in tests/unit/test_connection_manager.py over the single remaining method.
Expected size: ~40 LOC removed net. Low risk — both call sites already tolerate a missing cache entry.
2. Stop silently swallowing every exception inside _query_dbr_version
dbt/adapters/databricks/connections.py currently has:
except Exception:
pass
This is what made #1398 painful to diagnose in the first place — a permission error on SET spark.databricks.clusterUsageTags.sparkVersion, a transient network blip, or any other runtime error during version detection is indistinguishable from a cluster whose version we just couldn't parse.
Minimum follow-up: log the exception at debug level.
except Exception as exc:
logger.debug(f"DBR version query failed for {http_path}: {exc}")
Ideally we would also narrow the catch to the exception types we actually expect and let genuine bugs propagate, but that is a larger change.
Not changing the default-on-unknown-version behavior (dbr_version is None → capability = False). That conservative default is correct — the bug was that we got stuck in it, not that it existed.
Happy to carry these out myself once #1398's fix merges.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Once the fix for #1398 lands — it gates the cache write on a non-None DBR version so a transient
_query_dbr_versionfailure can't permanently disable every capability-gated feature — there are two small cleanups that I deliberately left out of the bug-fix PR to keep it surgical. Tracking them here for later.1. Consolidate
_cache_dbr_capabilitiesand_try_cache_dbr_capabilitiesAfter #1398's fix the two methods are byte-identical:
_cache_dbr_capabilities— called fromopen()after a successful handshake._try_cache_dbr_capabilities— called eagerly from_create_fresh_connectionbefore thecredentials_manageris set.Both now guard the write on
dbr_version is not None. A follow-up PR should:_try_cache_dbr_capabilities._cache_dbr_capabilitiesto convey the idempotent + fail-safe semantics (e.g._ensure_dbr_capabilities)._create_fresh_connection.TestCacheDbr/TestTryCacheDbrclasses intests/unit/test_connection_manager.pyover the single remaining method.Expected size: ~40 LOC removed net. Low risk — both call sites already tolerate a missing cache entry.
2. Stop silently swallowing every exception inside
_query_dbr_versiondbt/adapters/databricks/connections.pycurrently has:This is what made #1398 painful to diagnose in the first place — a permission error on
SET spark.databricks.clusterUsageTags.sparkVersion, a transient network blip, or any other runtime error during version detection is indistinguishable from a cluster whose version we just couldn't parse.Minimum follow-up: log the exception at
debuglevel.Ideally we would also narrow the catch to the exception types we actually expect and let genuine bugs propagate, but that is a larger change.
Non-goals
open()path already has retry logic, and post-bug: _cache_dbr_capabilities permanently poisons capability cache when version query fails #1398 each subsequentopen()can re-attempt the version query cleanly. Adding another retry layer would just obscure the first failure.dbr_version is None → capability = False). That conservative default is correct — the bug was that we got stuck in it, not that it existed.Happy to carry these out myself once #1398's fix merges.
Beta Was this translation helpful? Give feedback.
All reactions