
Commit 99c3384

Feat: allow overriding the dialects' normalization strategies (#2779)
1 parent c062c91 commit 99c3384

3 files changed, +46 -1 lines changed

docs/guides/configuration.md

Lines changed: 21 additions & 0 deletions
@@ -893,6 +893,27 @@ Example configuration:

 The default model kind is `VIEW` unless overridden with the `kind` key. For more information on model kinds, refer to [model concepts page](../concepts/models/model_kinds.md).

+##### Identifier resolution
+
+When a SQL engine receives a query such as `SELECT id FROM "some_table"`, it eventually needs to understand what database objects the identifiers `id` and `"some_table"` correspond to. This process is usually referred to as identifier (or name) resolution.
+
+Different SQL dialects implement different rules when resolving identifiers in queries. For example, certain identifiers may be treated as case-sensitive (e.g. if they're quoted), and a case-insensitive identifier is usually either lowercased or uppercased before the engine actually looks up what object it corresponds to.
+
+SQLMesh analyzes model queries so that it can extract useful information from them, such as computing Column-Level Lineage. To facilitate this analysis, it _normalizes_ and _quotes_ all identifiers in those queries, [respecting each dialect's resolution rules](https://sqlglot.com/sqlglot/dialects/dialect.html#Dialect.normalize_identifier).
+
+The "normalization strategy", i.e. whether case-insensitive identifiers are lowercased or uppercased, is configurable per dialect. For example, to treat all identifiers as case-sensitive in a BigQuery project, one can do:
+
+=== "YAML"
+
+    ```yaml linenums="1"
+    model_defaults:
+      dialect: "bigquery,normalization_strategy=case_sensitive"
+    ```
+
+This may be useful when the casing of names needs to be preserved, since SQLMesh will then not normalize them.
+
+See [here](https://sqlglot.com/sqlglot/dialects/dialect.html#NormalizationStrategy) to learn more about the supported normalization strategies.
+
 #### Model Kinds

 Model kinds are required in each model file's `MODEL` DDL statement. They may optionally be used to specify a default kind in the model defaults configuration key.
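
Note on the docs change: the behaviour described above can be illustrated with a small standalone sketch. It is a simplified toy model, not sqlglot's or SQLMesh's implementation. The names `Strategy`, `normalize` and `parse_dialect_setting` are hypothetical, the sketch assumes quoted identifiers always keep their casing, and it covers only three strategies.

```python
from enum import Enum
from typing import Dict, Tuple


class Strategy(Enum):
    """Hypothetical stand-in for a dialect's normalization strategy."""

    LOWERCASE = "lowercase"
    UPPERCASE = "uppercase"
    CASE_SENSITIVE = "case_sensitive"


def normalize(name: str, quoted: bool, strategy: Strategy) -> str:
    """Return the identifier as a toy engine would look it up."""
    # Assumption: quoted identifiers are always treated as case-sensitive.
    if quoted or strategy is Strategy.CASE_SENSITIVE:
        return name
    if strategy is Strategy.UPPERCASE:
        return name.upper()
    return name.lower()


def parse_dialect_setting(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name,key=value,...' into the dialect name and its settings."""
    name, *pairs = [part.strip() for part in value.split(",")]
    settings = dict(pair.split("=", 1) for pair in pairs)
    return name, settings


name, settings = parse_dialect_setting("bigquery,normalization_strategy=case_sensitive")
strategy = Strategy(settings["normalization_strategy"])

print(name, strategy)                              # bigquery Strategy.CASE_SENSITIVE
print(normalize("Id", False, Strategy.LOWERCASE))  # id
print(normalize("Id", False, Strategy.UPPERCASE))  # ID
print(normalize("Id", False, strategy))            # Id
```

With the case-sensitive setting, `Id` and `id` stay distinct names, which is the case the documentation paragraph describes.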

sqlmesh/core/context.py

Lines changed: 8 additions & 1 deletion
@@ -49,7 +49,7 @@
 from types import MappingProxyType

 import pandas as pd
-from sqlglot import exp
+from sqlglot import Dialect, exp
 from sqlglot.lineage import GraphHTML

 from sqlmesh.core import analytics
@@ -325,6 +325,13 @@ def __init__(

         self.path, self.config = t.cast(t.Tuple[Path, C], next(iter(self.configs.items())))

+        # This allows overriding the default dialect's normalization strategy, so for example
+        # one can do `dialect="duckdb,normalization_strategy=lowercase"` and this will be
+        # applied to the DuckDB dialect globally
+        if "normalization_strategy" in str(self.config.dialect):
+            dialect = Dialect.get_or_raise(self.config.dialect)
+            type(dialect).NORMALIZATION_STRATEGY = dialect.normalization_strategy
+
         if self.config.disable_anonymized_analytics:
             analytics.disable_analytics()

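
Note on the `context.py` change: the assignment through `type(dialect)` is what makes the override global, because it writes to the class rather than to the single instance. A minimal sketch of that pattern, using a hypothetical `ToyDialect` class instead of sqlglot's `Dialect` (whose constructor and internals are not reproduced here):

```python
from typing import Optional


class ToyDialect:
    """Hypothetical dialect with a class-level default strategy."""

    NORMALIZATION_STRATEGY = "case_insensitive"

    def __init__(self, normalization_strategy: Optional[str] = None) -> None:
        # An instance may carry its own strategy; otherwise it uses the class default.
        self.normalization_strategy = normalization_strategy or self.NORMALIZATION_STRATEGY


# An instance built from a setting such as "normalization_strategy=lowercase".
override = ToyDialect(normalization_strategy="lowercase")

before = ToyDialect().normalization_strategy  # still the class default

# Same shape as the line in the diff: copy the instance value onto the class.
type(override).NORMALIZATION_STRATEGY = override.normalization_strategy

after = ToyDialect().normalization_strategy  # new instances now see the override

print(before, after)  # case_insensitive lowercase
```

The consequence is that every later user of the class in the same process observes the new default, which is why the accompanying test has to undo the change.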

tests/core/test_context.py

Lines changed: 17 additions & 0 deletions
@@ -735,3 +735,20 @@ def test_disabled_model(copy_to_temp_path):

     assert (path[0] / "models" / "disabled.sql").exists()
     assert not context.get_model("sushi.disabled")
+
+
+def test_override_dialect_normalization_strategy():
+    config = Config(
+        model_defaults=ModelDefaultsConfig(dialect="duckdb,normalization_strategy=lowercase")
+    )
+
+    # This has the side-effect of mutating DuckDB globally to override its normalization strategy
+    Context(config=config)
+
+    from sqlglot.dialects import DuckDB
+    from sqlglot.dialects.dialect import NormalizationStrategy
+
+    assert DuckDB.NORMALIZATION_STRATEGY == NormalizationStrategy.LOWERCASE
+
+    # The above change is applied globally so we revert it to avoid breaking other tests
+    DuckDB.NORMALIZATION_STRATEGY = NormalizationStrategy.CASE_INSENSITIVE
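
Note on the test: the manual revert on the last line only runs if the preceding assertion passes. One possible alternative, sketched here under assumptions, is to save and restore the class attribute in a `try`/`finally`. The helper `preserve_class_attr` and the class `ToyDialect` are hypothetical and stand in for the real `DuckDB` class. This is not what the commit does.

```python
from contextlib import contextmanager


@contextmanager
def preserve_class_attr(cls, name):
    """Restore cls.<name> on exit, even if the body raises."""
    original = getattr(cls, name)
    try:
        yield
    finally:
        setattr(cls, name, original)


class ToyDialect:
    """Hypothetical stand-in for a dialect class with a global default."""

    NORMALIZATION_STRATEGY = "case_insensitive"


try:
    with preserve_class_attr(ToyDialect, "NORMALIZATION_STRATEGY"):
        # Stands in for the global mutation performed by Context(config=config).
        ToyDialect.NORMALIZATION_STRATEGY = "lowercase"
        inside = ToyDialect.NORMALIZATION_STRATEGY
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

# The default is back in place although the body failed.
restored = ToyDialect.NORMALIZATION_STRATEGY
print(inside, restored)  # lowercase case_insensitive
```

In a pytest suite the same save-and-restore logic could live in a fixture, so that a failing assertion does not leak the override into other tests.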
