Describe the feature
The incremental materialization should reuse the columns already fetched
by process_schema_changes instead of letting the downstream strategy
macros issue a second DESCRIBE TABLE EXTENDED on the same target
relation.
Concretely, capture the return value of process_schema_changes, fall
back to a single adapter.get_columns_in_relation(existing_relation)
when it returns empty (i.e. on_schema_change == 'ignore'), and thread
the result through strategy_arg_dict['dest_columns'] so that
databricks__get_merge_sql, get_delete_insert_sql, and
get_insert_into_sql skip their own DESCRIBE when columns are already
provided. This eliminates one metadata round-trip per incremental model,
per run.
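A minimal sketch of the materialization-side change, using the names from
this description (process_schema_changes, existing_relation,
strategy_arg_dict); the surrounding V1 incremental materialization code is
elided, and the exact argument order and variable names there may differ:

```jinja
{# reuse the columns process_schema_changes already fetched #}
{% set dest_columns = process_schema_changes(on_schema_change, tmp_relation, existing_relation) %}

{# on_schema_change == 'ignore' returns an empty value, so fall back to a
   single DESCRIBE here instead of one per strategy macro #}
{% set dest_columns = dest_columns or adapter.get_columns_in_relation(existing_relation) %}

{# hand the columns to the strategy macro via the existing dest_columns slot #}
{% do strategy_arg_dict.update({'dest_columns': dest_columns}) %}
```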
Describe alternatives you've considered
- Adapter-level per-run cache on get_columns_in_relation — would also
eliminate the redundant calls, but correct cache invalidation across all
the code paths that mutate a relation's schema (ALTER, schema sync, DROP)
is hard to get right.
- Skip DESCRIBE entirely by using manifest-known columns — possible in
theory, but it requires dbt-core to expose the compiled output schema,
which is a broader change.
- Status quo — accept one extra DESCRIBE per model. Low effort but
measurably slower on projects with many small incremental models.
The proposed change is the simplest of the three: capture the columns
already computed inside process_schema_changes and pass them through
the existing strategy_arg_dict['dest_columns'] slot.
Additional context
Observed on a project with 9 incremental stg models under the V1 path
(use_materialization_v2: false) with on_schema_change: 'fail':
- Current: 2 × DESCRIBE TABLE EXTENDED <target> AS JSON per model
(one from process_schema_changes, one from the strategy macro)
- With this change: 1 × DESCRIBE TABLE EXTENDED <target> AS JSON per model
Affected strategy macros and line numbers
(dbt/include/databricks/macros/materializations/incremental/strategies.sql);
a sketch of the shared guard follows this list:
- databricks__get_merge_sql — get_columns_in_relation(target) at L276
- get_delete_insert_sql — get_columns_in_relation(target_relation) at L141
- get_insert_into_sql — get_columns_in_relation(target_relation) at L224
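A hedged sketch of the guard each of these macros could add so that an
already-provided dest_columns short-circuits the lookup. The relation
variable differs per macro (target vs. target_relation, as listed above),
and get_quoted_csv is assumed here as the usual dbt-core helper for
building the column list:

```jinja
{# only DESCRIBE when the caller did not pass dest_columns #}
{% if not dest_columns %}
  {% set dest_columns = adapter.get_columns_in_relation(target_relation) %}
{% endif %}

{# downstream usage is unchanged, e.g. building the quoted column list #}
{% set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute='name')) %}
```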
Who will this benefit?
Anyone running incremental models on dbt-databricks, especially users
with many small models where per-model metadata overhead dominates.
Most impactful for scheduled no-op / small-delta dbt run invocations
on SQL warehouses where each round-trip is 100–300 ms.
Are you interested in contributing this feature?
Yes — I'll open a PR linking back to this issue right after filing it.
I have a branch ready and have verified locally (V1 path,
on_schema_change: 'fail') that target DESCRIBE calls drop from 2 to
1 per incremental model across merge, append, and delete+insert
strategies.