Commit 6564b4e

Merge branch 'main' into add_fabric_warehouse
2 parents f5a562e + 50b57db commit 6564b4e

221 files changed

+14489
-1019
lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -157,4 +157,4 @@ metastore_db/
157157
spark-warehouse/
158158

159159
# claude
160-
.claude/
160+
.claude/

docs/concepts/models/overview.md

Lines changed: 6 additions & 2 deletions
@@ -507,11 +507,15 @@ Some properties are only available in specific model kinds - see the [model conf
507507
: Set this to true to indicate that all changes to this model should be [forward-only](../plans.md#forward-only-plans).
508508

509509
### on_destructive_change
510-
: What should happen when a change to a [forward-only model](../../guides/incremental_time.md#forward-only-models) or incremental model in a [forward-only plan](../plans.md#forward-only-plans) causes a destructive modification to the table schema (i.e., requires dropping an existing column).
510+
: What should happen when a change to a [forward-only model](../../guides/incremental_time.md#forward-only-models) or incremental model in a [forward-only plan](../plans.md#forward-only-plans) causes a destructive modification to the table schema (i.e., requires dropping an existing column or modifying column constraints in ways that could cause data loss).
511511

512512
SQLMesh checks for destructive changes at plan time based on the model definition and run time based on the model's underlying physical tables.
513513

514-
Must be one of the following values: `allow`, `warn`, or `error` (default).
514+
Must be one of the following values: `allow`, `warn`, `error` (default), or `ignore`.
515+
516+
!!! warning "Ignore is Dangerous"
517+
518+
`ignore` is dangerous since it can result in error or data loss. It likely should never be used but could be useful as an "escape-hatch" or a way to workaround unexpected behavior.
515519

516520
### disable_restatement
517521
: Set this to true to indicate that [data restatement](../plans.md#restatement-plans) is disabled for this model.
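
The four `on_destructive_change` values documented in the hunk above can be summarized as a small decision function. This is only an illustrative sketch based on the wording of the docs in this commit, not SQLMesh's implementation: the names `OnDestructiveChange` and `handle_destructive_change` are hypothetical, and the return value (whether the schema change is applied) is an assumption made for the example.

```python
import warnings
from enum import Enum
from typing import List


class OnDestructiveChange(Enum):
    """Hypothetical mirror of the documented setting values."""

    ALLOW = "allow"
    WARN = "warn"
    ERROR = "error"  # documented default
    IGNORE = "ignore"


def handle_destructive_change(
    setting: OnDestructiveChange, dropped_columns: List[str]
) -> bool:
    """Return True if the schema change is applied, False if it is skipped."""
    if not dropped_columns:
        # Nothing destructive was detected, so the setting does not matter.
        return True
    if setting is OnDestructiveChange.ERROR:
        raise ValueError(f"Destructive change would drop: {dropped_columns}")
    if setting is OnDestructiveChange.WARN:
        warnings.warn(f"Destructive change drops: {dropped_columns}")
        return True
    if setting is OnDestructiveChange.ALLOW:
        return True
    # IGNORE: the schema change is not performed, so the physical table
    # may diverge from the model definition (hence the docs' warning).
    return False
```

Under these assumptions, `ignore` is the only value that leaves the table untouched while still letting the plan proceed, which is why the docs flag it as dangerous.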

docs/guides/custom_materializations.md

Lines changed: 4 additions & 0 deletions
@@ -64,6 +64,7 @@ class CustomFullMaterialization(CustomMaterialization):
6464
query_or_df: QueryOrDF,
6565
model: Model,
6666
is_first_insert: bool,
67+
render_kwargs: t.Dict[str, t.Any],
6768
**kwargs: t.Any,
6869
) -> None:
6970
self.adapter.replace_query(table_name, query_or_df)
@@ -78,6 +79,7 @@ Let's unpack this materialization:
7879
* `query_or_df` - a query (of SQLGlot expression type) or DataFrame (Pandas, PySpark, or Snowpark) instance to be inserted
7980
* `model` - the model definition object used to access model parameters and user-specified materialization arguments
8081
* `is_first_insert` - whether this is the first insert for the current version of the model (used with batched or multi-step inserts)
82+
* `render_kwargs` - a dictionary of arguments used to render the model query
8183
* `kwargs` - additional and future arguments
8284
* The `self.adapter` instance is used to interact with the target engine. It comes with a set of useful high-level APIs like `replace_query`, `columns`, and `table_exists`, but also supports executing arbitrary SQL expressions with its `execute` method.
8385

@@ -150,6 +152,7 @@ class CustomFullMaterialization(CustomMaterialization):
150152
query_or_df: QueryOrDF,
151153
model: Model,
152154
is_first_insert: bool,
155+
render_kwargs: t.Dict[str, t.Any],
153156
**kwargs: t.Any,
154157
) -> None:
155158
config_value = model.custom_materialization_properties["config_key"]
@@ -232,6 +235,7 @@ class CustomFullMaterialization(CustomMaterialization[MyCustomKind]):
232235
query_or_df: QueryOrDF,
233236
model: Model,
234237
is_first_insert: bool,
238+
render_kwargs: t.Dict[str, t.Any],
235239
**kwargs: t.Any,
236240
) -> None:
237241
assert isinstance(model.kind, MyCustomKind)
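
The signature change in the hunks above (a new `render_kwargs` parameter between `is_first_insert` and `**kwargs`) can be exercised without a SQLMesh installation. The sketch below uses stand-in classes only: `FakeAdapter` and `SketchMaterialization` are hypothetical names, and just the parameter names and the `replace_query` call are taken from the diff.

```python
import typing as t


class FakeAdapter:
    """Stand-in for the engine adapter; records calls instead of running SQL."""

    def __init__(self) -> None:
        self.calls: t.List[t.Tuple[str, t.Any]] = []

    def replace_query(self, table_name: str, query_or_df: t.Any) -> None:
        self.calls.append((table_name, query_or_df))


class SketchMaterialization:
    """Hypothetical class that mirrors the updated `insert` signature."""

    def __init__(self, adapter: FakeAdapter) -> None:
        self.adapter = adapter
        self.last_render_kwargs: t.Dict[str, t.Any] = {}

    def insert(
        self,
        table_name: str,
        query_or_df: t.Any,
        model: t.Any,
        is_first_insert: bool,
        render_kwargs: t.Dict[str, t.Any],  # new parameter in this commit
        **kwargs: t.Any,
    ) -> None:
        # Keep a copy so the render arguments can be inspected after the call.
        self.last_render_kwargs = dict(render_kwargs)
        self.adapter.replace_query(table_name, query_or_df)
```

An override written against the old signature would have received `render_kwargs` inside `**kwargs` when it is passed by keyword; naming it explicitly, as the updated docs do, makes it available to forward to helpers.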

docs/guides/incremental_time.md

Lines changed: 6 additions & 1 deletion
@@ -171,7 +171,12 @@ The check is performed at plan time based on the model definition. SQLMesh may n
171171

172172
A model's `on_destructive_change` [configuration setting](../reference/model_configuration.md#incremental-models) determines what happens when SQLMesh detects a destructive change.
173173

174-
By default, SQLMesh will error so no data is lost. You can set `on_destructive_change` to `warn` or `allow` in the model's `MODEL` block to allow destructive changes.
174+
By default, SQLMesh will error so no data is lost. You can set `on_destructive_change` to `warn` or `allow` in the model's `MODEL` block to allow destructive changes.
175+
`ignore` can be used to not perform the schema change and allow the table's definition to diverge from the model definition.
176+
177+
!!! warning "Ignore is Dangerous"
178+
179+
`ignore` is dangerous since it can result in error or data loss. It likely should never be used but could be useful as an "escape-hatch" or a way to workaround unexpected behavior.
175180

176181
This example configures a model to silently `allow` destructive changes:
177182

docs/integrations/dbt.md

Lines changed: 88 additions & 26 deletions
@@ -44,16 +44,36 @@ Prepare an existing dbt project to be run by SQLMesh by executing the `sqlmesh i
4444
$ sqlmesh init -t dbt
4545
```
4646

47-
SQLMesh will use the data warehouse connection target in your dbt project `profiles.yml` file. The target can be changed at any time.
47+
This will create a file called `sqlmesh.yaml` containing the [default model start date](../reference/model_configuration.md#model-defaults). This configuration file is a minimum starting point for enabling SQLMesh to work with your DBT project.
48+
49+
As you become more comfortable with running your project under SQLMesh, you may specify additional SQLMesh [configuration](../reference/configuration.md) as required to unlock more features.
50+
51+
!!! note "profiles.yml"
52+
53+
SQLMesh will use the existing data warehouse connection target from your dbt project's `profiles.yml` file so the connection configuration does not need to be duplicated in `sqlmesh.yaml`. You may change the target at any time in the dbt config and SQLMesh will pick up the new target.
4854

4955
### Setting model backfill start dates
5056

51-
Models **require** a start date for backfilling data through use of the `start` configuration parameter. `start` can be defined individually for each model in its `config` block or globally in the `dbt_project.yml` file as follows:
57+
Models **require** a start date for backfilling data through use of the `start` configuration parameter. `start` can be defined individually for each model in its `config` block or globally in the `sqlmesh.yaml` file as follows:
5258

53-
```
54-
> models:
55-
> +start: Jan 1 2000
56-
```
59+
=== "sqlmesh.yaml"
60+
61+
```yaml
62+
model_defaults:
63+
start: '2000-01-01'
64+
```
65+
66+
=== "dbt Model"
67+
68+
```jinja
69+
{{
70+
config(
71+
materialized='incremental',
72+
start='2000-01-01',
73+
...
74+
)
75+
}}
76+
```
5777

5878
### Configuration
5979

@@ -63,47 +83,89 @@ SQLMesh derives a project's configuration from its dbt configuration files. This
6383

6484
[Certain engines](https://sqlmesh.readthedocs.io/en/stable/guides/configuration/?h=unsupported#state-connection), like Trino, cannot be used to store SQLMesh's state.
6585

66-
As a workaround, we recommend specifying a supported state engine using the `state_connection` argument instead.
86+
In addition, even if your warehouse is supported for state, you may find that you get better performance by using a [traditional database](../concepts/state.md) to store state as these are a better fit for the state workload than a warehouse optimized for analytics workloads.
6787

68-
Learn more about how to configure state connections in Python [here](https://sqlmesh.readthedocs.io/en/stable/guides/configuration/#state-connection).
88+
In these cases, we recommend specifying a [supported production state engine](../concepts/state.md#state) using the `state_connection` configuration.
6989

70-
#### Runtime vars
90+
This involves updating `sqlmesh.yaml` to add a gateway configuration for the state connection:
7191

72-
dbt supports passing variable values at runtime with its [CLI `vars` option](https://docs.getdbt.com/docs/build/project-variables#defining-variables-on-the-command-line).
92+
```yaml
93+
gateways:
94+
"": # "" (empty string) is the default gateway
95+
state_connection:
96+
type: postgres
97+
...
7398

74-
In SQLMesh, these variables are passed via configurations. When you initialize a dbt project with `sqlmesh init`, a file `config.py` is created in your project directory.
99+
model_defaults:
100+
start: '2000-01-01'
101+
```
75102
76-
The file creates a SQLMesh `config` object pointing to the project directory:
103+
Or, for a specific dbt profile defined in `profiles.yml`, eg `dev`:
77104

78-
```python
79-
config = sqlmesh_config(Path(__file__).parent)
105+
```yaml
106+
gateways:
107+
dev: # must match the target dbt profile name
108+
state_connection:
109+
type: postgres
110+
...
111+
112+
model_defaults:
113+
start: '2000-01-01'
80114
```
81115

82-
Specify runtime variables by adding a Python dictionary to the `sqlmesh_config()` `variables` argument.
116+
Learn more about how to configure state connections [here](https://sqlmesh.readthedocs.io/en/stable/guides/configuration/#state-connection).
117+
118+
#### Runtime vars
119+
120+
dbt supports passing variable values at runtime with its [CLI `vars` option](https://docs.getdbt.com/docs/build/project-variables#defining-variables-on-the-command-line).
121+
122+
In SQLMesh, these variables are passed via configurations. When you initialize a dbt project with `sqlmesh init`, a file `sqlmesh.yaml` is created in your project directory.
123+
124+
You may define global variables in the same way as a native project by adding a `variables` section to the config.
83125

84126
For example, we could specify the runtime variable `is_marketing` and its value `no` as:
85127

86-
```python
87-
config = sqlmesh_config(
88-
Path(__file__).parent,
89-
variables={"is_marketing": "no"}
90-
)
128+
```yaml
129+
variables:
130+
is_marketing: no
131+
132+
model_defaults:
133+
start: '2000-01-01'
91134
```
92135

136+
Variables can also be set at the gateway/profile level which override variables set at the project level. See the [variables documentation](../concepts/macros/sqlmesh_macros.md#gateway-variables) to learn more about how to specify them at different levels.
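
The precedence described in the added line above (gateway-level variables override project-level ones) amounts to a dictionary merge. The sketch below is illustrative only: `resolve_variables` is a hypothetical helper, not a SQLMesh function.

```python
from typing import Any, Dict


def resolve_variables(
    project_vars: Dict[str, Any], gateway_vars: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge variables so that gateway-level values win on key conflicts."""
    merged = dict(project_vars)
    merged.update(gateway_vars)  # later update overrides project-level values
    return merged


project = {"is_marketing": "no", "include_pii": "no"}
gateway = {"is_marketing": "yes"}
resolved = resolve_variables(project, gateway)
```

Here `resolved["is_marketing"]` takes the gateway value, while `include_pii` falls back to the project-level value.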
137+
138+
#### Combinations
139+
93140
Some projects use combinations of runtime variables to control project behavior. Different combinations can be specified in different `sqlmesh_config` objects, with the relevant configuration passed to the SQLMesh CLI command.
94141

142+
!!! info "Python config"
143+
144+
Switching between different config objects requires the use of [Python config](../guides/configuration.md#python) instead of the default YAML config.
145+
146+
You will need to create a file called `config.py` in the root of your project with the following contents:
147+
148+
```py
149+
from pathlib import Path
150+
from sqlmesh.dbt.loader import sqlmesh_config
151+
152+
config = sqlmesh_config(Path(__file__).parent)
153+
```
154+
155+
Note that any config from `sqlmesh.yaml` will be overlayed on top of the active Python config so you dont need to remove the `sqlmesh.yaml` file
156+
95157
For example, consider a project with a special configuration for the `marketing` department. We could create separate configurations to pass at runtime like this:
96158

97159
```python
98160
config = sqlmesh_config(
99-
Path(__file__).parent,
100-
variables={"is_marketing": "no", "include_pii": "no"}
101-
)
161+
Path(__file__).parent,
162+
variables={"is_marketing": "no", "include_pii": "no"}
163+
)
102164
103165
marketing_config = sqlmesh_config(
104-
Path(__file__).parent,
105-
variables={"is_marketing": "yes", "include_pii": "yes"}
106-
)
166+
Path(__file__).parent,
167+
variables={"is_marketing": "yes", "include_pii": "yes"}
168+
)
107169
```
108170

109171
By default, SQLMesh will use the configuration object named `config`. Use a different configuration by passing the object name to SQLMesh CLI commands with the `--config` option. For example, we could run a `plan` with the marketing configuration like this:

docs/integrations/engines/bigquery.md

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ pip install "sqlmesh[bigquery]"
145145
| Option | Description | Type | Required |
146146
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------:|:--------:|
147147
| `type` | Engine type name - must be `bigquery` | string | Y |
148-
| `method` | Connection methods - see [allowed values below](#connection-methods). Default: `oauth`. | string | N |
148+
| `method` | Connection methods - see [allowed values below](#authentication-methods). Default: `oauth`. | string | N |
149149
| `project` | The ID of the GCP project | string | N |
150150
| `location` | The location of for the datasets (can be regional or multi-regional) | string | N |
151151
| `execution_project` | The name of the GCP project to bill for the execution of the models. If not set, the project associated with the model will be used. | string | N |

examples/custom_materializations/custom_materializations/custom_kind.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,8 +24,9 @@ def insert(
2424
query_or_df: QueryOrDF,
2525
model: Model,
2626
is_first_insert: bool,
27+
render_kwargs: t.Dict[str, t.Any],
2728
**kwargs: t.Any,
2829
) -> None:
2930
assert type(model.kind).__name__ == "ExtendedCustomKind"
3031

31-
self._replace_query_for_model(model, table_name, query_or_df)
32+
self._replace_query_for_model(model, table_name, query_or_df, render_kwargs)

examples/custom_materializations/custom_materializations/full.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ def insert(
1717
query_or_df: QueryOrDF,
1818
model: Model,
1919
is_first_insert: bool,
20+
render_kwargs: t.Dict[str, t.Any],
2021
**kwargs: t.Any,
2122
) -> None:
22-
self._replace_query_for_model(model, table_name, query_or_df)
23+
self._replace_query_for_model(model, table_name, query_or_df, render_kwargs)
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
{
2+
"sqlmesh.projectPaths": ["./repo_1", "./repo_2"]
3+
}
