Incremental strategies fire a redundant DESCRIBE on the target even after process_schema_changes #1411

@moomindani

Description

Describe the feature

The incremental materialization should reuse the columns already fetched
by process_schema_changes instead of letting the downstream strategy
macros re-issue another DESCRIBE TABLE EXTENDED on the same target
relation.

Concretely, capture the return value of process_schema_changes, fall
back to a single adapter.get_columns_in_relation(existing_relation)
when it returns empty (i.e. on_schema_change == 'ignore'), and thread
the result through strategy_arg_dict['dest_columns'] so that
databricks__get_merge_sql, get_delete_insert_sql, and
get_insert_into_sql skip their own DESCRIBE when columns are already
provided. This eliminates one metadata round-trip per incremental model,
per run.
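
The wiring described above might look roughly like this inside the incremental materialization (a sketch only: the exact signature of process_schema_changes and the surrounding variable names in dbt-core / dbt-databricks may differ; strategy_arg_dict and the relation names are taken from this issue's description):

```sql
{# Sketch: capture the columns process_schema_changes already fetched,
   fall back to a single DESCRIBE when it returned nothing
   (on_schema_change == 'ignore'), and hand the result to the strategy. #}
{% set dest_columns = process_schema_changes(on_schema_change, tmp_relation, existing_relation) %}
{% if not dest_columns %}
  {% set dest_columns = adapter.get_columns_in_relation(existing_relation) %}
{% endif %}
{% do strategy_arg_dict.update({'dest_columns': dest_columns}) %}
```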

Describe alternatives you've considered

  1. Adapter-level per-run cache on get_columns_in_relation — would
    also eliminate the redundant calls, but correct cache invalidation
    across all the code paths that mutate a relation's schema (ALTER,
    schema sync, DROP) is hard to get right.
  2. Skip DESCRIBE entirely by using manifest-known columns:
    possible in theory, but it requires dbt-core to expose the compiled
    output schema, which is a broader change.
  3. Status quo — accept one extra DESCRIBE per model. Low effort
    but measurably slower on projects with many small incremental models.

The proposed change is the simplest of the three: capture the columns
already computed inside process_schema_changes and pass them through
the existing strategy_arg_dict['dest_columns'] slot.

Additional context

Observed on a project with 9 incremental stg models under the V1 path
(use_materialization_v2: false) with on_schema_change: 'fail':

  • Current: 2 × DESCRIBE TABLE EXTENDED <target> AS JSON per model
    (one from process_schema_changes, one from the strategy macro)
  • With this change: 1 × DESCRIBE TABLE EXTENDED <target> AS JSON
    per model

Affected strategy macros and line numbers (dbt/include/databricks/macros/materializations/incremental/strategies.sql):

  • databricks__get_merge_sql → get_columns_in_relation(target) at L276
  • get_delete_insert_sql → get_columns_in_relation(target_relation) at L141
  • get_insert_into_sql → get_columns_in_relation(target_relation) at L224
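
On the strategy side, each macro listed above would only need a guard of roughly this shape (a sketch; the real signatures in strategies.sql differ, and the dest_columns=none default parameter is an assumption introduced here for illustration):

```sql
{% macro get_insert_into_sql(target_relation, source_relation, dest_columns=none) %}
  {# Only issue a DESCRIBE when the caller did not already supply columns #}
  {% if dest_columns is none %}
    {% set dest_columns = adapter.get_columns_in_relation(target_relation) %}
  {% endif %}
  {% set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') %}
  insert into {{ target_relation }} ({{ dest_cols_csv }})
  select {{ dest_cols_csv }} from {{ source_relation }}
{% endmacro %}
```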

Who will this benefit?

Anyone running incremental models on dbt-databricks, especially users
with many small models where per-model metadata overhead dominates.
Most impactful for scheduled no-op / small-delta dbt run invocations
on SQL warehouses where each round-trip is 100–300 ms.
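
A back-of-envelope estimate of the saving, using the 100–300 ms round-trip range above and the 9-model project from the "Additional context" section (project sizes other than 9 are hypothetical):

```python
def saved_seconds(n_models: int, roundtrip_ms: float) -> float:
    """Seconds saved per run: one DESCRIBE eliminated per incremental model."""
    return n_models * roundtrip_ms / 1000.0

# For the 9-model project: roughly 0.9 s to 2.7 s shaved off every run.
low = saved_seconds(9, 100)
high = saved_seconds(9, 300)
```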

Are you interested in contributing this feature?

Yes — I'll open a PR linking back to this issue right after filing it.
I have a branch ready and have verified locally (V1 path,
on_schema_change: 'fail') that target DESCRIBE calls drop from 2 to
1 per incremental model across merge, append, and delete+insert
strategies.
