perf: skip unnecessary metadata fetch calls for tags when not configured#1387
perf: skip unnecessary metadata fetch calls for tags when not configured#1387tejassp-db wants to merge 17 commits intomainfrom
Conversation
Skip fetch_tags and fetch_column_tags information_schema queries during incremental and view materializations when the model has no tags configured. This avoids unnecessary server roundtrips on every run for models that don't use tags, while preserving full fetch behavior when tags are present or when the model config is unavailable. PECOBLR-2497
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||||||||||||||||||||||||||
Add unit and functional coverage for skipping tag metadata queries when model config does not require them. This protects the new fetch-planning logic across incremental, view, streaming table, and materialized view test paths without changing unrelated unstaged work.
partial_parse.msgpack does not persist compiled_code, causing get_config_from_model to raise "Cannot compile model ... with no SQL query" for materialized view and streaming table change tests. Co-authored-by: Isaac
Co-authored-by: Isaac
The previous `dict and len(...) > 0` short-circuited to a dict literal when empty, breaking the declared `-> bool` return type. Co-authored-by: Isaac
sd-db
left a comment
There was a problem hiding this comment.
Overall intentation of the change looks good, there are a few things specifically where we need changes
- Function signature - make new param optional + naming + other related changes
- Coverage - Can you confirm on coverage across all supported materialisation types (MV/ST missing; Views don't have support for column_tags)
- Functional test changes
| else: | ||
| results["information_schema.tags"] = None | ||
|
|
||
| column_tag_config = ( |
There was a problem hiding this comment.
Similar comments on naming + checks on desried_column_tag_config
There was a problem hiding this comment.
Suffix _config and surrounding code makes it obvious that its coming from the user configuration.
There was a problem hiding this comment.
Variable naming should still be intentional and descriptive, we should update it to model_column_tag_config
- Add functional tests for optional tag fetching in Materialized view. - Fix tests to check exceptions, and not log lines. - Make parameter optional to be backward compatible with jinja calls.
sd-db
left a comment
There was a problem hiding this comment.
Added some more comments, overall much better. I still feel we should add a comment in all the materialisation types where we are doing this optimisation that this only works because we don't support deleting/removing tags through dbt-databricks, as that will break the optimisations done and better to encode as a comment
| results["information_schema.tags"] = adapter.execute_macro("fetch_tags", kwargs=kwargs) | ||
| results["show_tblproperties"] = adapter.execute_macro("fetch_tbl_properties", kwargs=kwargs) | ||
|
|
||
| table_tag_config = model_config.config.get(TagsProcessor.name) if model_config else None |
There was a problem hiding this comment.
rename to model_table_tag_config we should differentiate between desired(model) configs and existing (relation) configs better
|
|
||
| from tests.functional.adapter.helpers import FAIL_IF_TAG_FETCH_CALLED_MACROS | ||
|
|
||
| VIEW_WITHOUT_TAGS_SQL = """ |
There was a problem hiding this comment.
we should move this inside fixtures file
| """ | ||
|
|
||
|
|
||
| def get_model_config(project, relation: BaseRelation): |
There was a problem hiding this comment.
Do we need this ? Since model_config is optional in the tests we can just not pass it and that also tests for other behaviour overall. Look at the places where this gets called it is most a no-op. I think this is mostly an artifact of the config not being Optional earlier. Since now it is we can look to remove as well.
There was a problem hiding this comment.
Currently this is adding extra complexity while mostly being no-op
| adapter.execute_macro.return_value = MagicMock() | ||
|
|
||
| results = MaterializedViewAPI._describe_relation(adapter, MagicMock()) | ||
| results = MaterializedViewAPI._describe_relation(adapter, MagicMock(), Mock()) |
There was a problem hiding this comment.
nit: while the change is okay, it would be better if just did not pass model_config here, i.e no change as model_config being None, triggers the actual tag fetch macro which is what the test is looking for. This test was added when there no optimisation to not fetch and better to test that path instead of the else path.
| ) | ||
|
|
||
|
|
||
| class TestDescribeRelationMetadataFetchPlanning: |
There was a problem hiding this comment.
nit: These tests look really good, but we missed MV materealisation here which would have this even more complete. Fine without as well.
Summary
What changed
Test plan
Unit tests: empty tags (skip), non-empty tags (fetch), null config fallback (fetch), both tags + column tags present, hive_metastore (skip)
Functional tests: override fetch_tags/fetch_column_tags macros to raise errors, confirming calls are actually skipped when expected
PECOBLR-2497