Commit 6f3e9a9

Merge branch 'main' into feat/clustered-by-auto-none
2 parents 6ad7244 + 192fbe9 commit 6f3e9a9

40 files changed

Lines changed: 1799 additions & 60 deletions

Makefile

Lines changed: 4 additions & 2 deletions
@@ -49,11 +49,13 @@ install-dev-dbt-%:
 	$(MAKE) install-dev; \
 	if [ "$$version" = "1.6.0" ]; then \
 		echo "Applying overrides for dbt 1.6.0"; \
-		$(PIP) install 'pydantic>=2.0.0' 'google-cloud-bigquery==3.30.0' 'databricks-sdk==0.28.0' --reinstall; \
+		$(PIP) install 'pydantic>=2.0.0' 'google-cloud-bigquery==3.30.0' 'databricks-sdk==0.28.0' \
+			'pyOpenSSL>=24.0.0' --reinstall; \
 	fi; \
 	if [ "$$version" = "1.7.0" ]; then \
 		echo "Applying overrides for dbt 1.7.0"; \
-		$(PIP) install 'databricks-sdk==0.28.0' --reinstall; \
+		$(PIP) install 'databricks-sdk==0.28.0' \
+			'pyOpenSSL>=24.0.0' --reinstall; \
 	fi; \
 	if [ "$$version" = "1.5.0" ]; then \
 		echo "Applying overrides for dbt 1.5.0"; \

docs/concepts/models/overview.md

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ This table lists each engine's support for `TABLE` and `VIEW` object comments:
 | DuckDB <=0.9 | N | N |
 | DuckDB >=0.10 | Y | Y |
 | MySQL | Y | Y |
-| MSSQL | N | N |
+| MSSQL | Y | Y |
 | Postgres | Y | Y |
 | GCP Postgres | Y | Y |
 | Redshift | Y | N |

docs/concepts/models/python_models.md

Lines changed: 27 additions & 0 deletions
@@ -369,6 +369,33 @@ def entrypoint(
     )
 ```

+Blueprint variables can also be used as **column names and column types** in the `columns` dictionary. For example, if each blueprint produces a model with a different set of column names and types, both can be parameterized using the same `@{variable}` syntax:
+
+```python linenums="1"
+import pandas as pd
+from sqlmesh import ExecutionContext, model
+
+@model(
+    "@{customer}.metrics",
+    kind="FULL",
+    blueprints=[
+        {"customer": "customer1", "primary_metric": "revenue", "primary_type": "int", "secondary_metric": "cost", "secondary_type": "double"},
+        {"customer": "customer2", "primary_metric": "sales", "primary_type": "text", "secondary_metric": "profit", "secondary_type": "double"},
+    ],
+    columns={
+        "@{primary_metric}": "@{primary_type}",
+        "@{secondary_metric}": "@{secondary_type}",
+    },
+)
+def entrypoint(context: ExecutionContext, **kwargs) -> pd.DataFrame:
+    return pd.DataFrame({
+        context.blueprint_var("primary_metric"): [1],
+        context.blueprint_var("secondary_metric"): [1.5],
+    })
+```
+
+Global variables (defined in the project config) can also be used as column names and types in the same way.
+
 Note the use of curly brace syntax `@{customer}` in the model name above. It is used to ensure SQLMesh can combine the macro variable into the model name identifier correctly - learn more [here](../../concepts/macros/sqlmesh_macros.md#embedding-variables-in-strings).

 Blueprint variable mappings can also be constructed dynamically, e.g., by using a macro: `blueprints="@gen_blueprints()"`. This is useful in cases where the `blueprints` list needs to be sourced from external sources, such as CSV files.
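
To make the substitution described in the hunk above concrete, here is a minimal sketch of how `@{variable}` placeholders in the model name and the `columns` dictionary could expand once per blueprint. It is plain Python and does not use SQLMesh; `render` and `render_columns` are hypothetical helpers written for illustration, not SQLMesh functions, and SQLMesh's actual macro rendering may differ.

```python
import re

# Hypothetical helpers for illustration only; not part of SQLMesh.
_PLACEHOLDER = re.compile(r"@\{(\w+)\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace every @{name} placeholder with the blueprint's value for name."""
    return _PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


def render_columns(columns: dict[str, str], variables: dict[str, str]) -> dict[str, str]:
    """Expand placeholders in both the column names and the column types."""
    return {render(name, variables): render(type_, variables) for name, type_ in columns.items()}


# Blueprint and column definitions copied from the documentation example.
blueprints = [
    {"customer": "customer1", "primary_metric": "revenue", "primary_type": "int",
     "secondary_metric": "cost", "secondary_type": "double"},
    {"customer": "customer2", "primary_metric": "sales", "primary_type": "text",
     "secondary_metric": "profit", "secondary_type": "double"},
]
columns = {
    "@{primary_metric}": "@{primary_type}",
    "@{secondary_metric}": "@{secondary_type}",
}

# One (model name, columns) pair per blueprint.
rendered = [(render("@{customer}.metrics", bp), render_columns(columns, bp)) for bp in blueprints]

for model_name, model_columns in rendered:
    print(model_name, model_columns)
```

Under these assumptions, each blueprint yields its own model name and its own column mapping, which is the behavior the added documentation describes.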

docs/integrations/dlt.md

Lines changed: 5 additions & 5 deletions
@@ -28,12 +28,12 @@ This will create the configuration file and directories, which are found in all

 SQLMesh will also automatically generate models to ingest data from the pipeline incrementally. Incremental loading is ideal for large datasets where recomputing entire tables is resource-intensive. In this case utilizing the [`INCREMENTAL_BY_TIME_RANGE` model kind](../concepts/models/model_kinds.md#incremental_by_time_range). However, these model definitions can be customized to meet your specific project needs.

-#### Specify the path to the pipelines directory
+#### Specify the path to the pipelines working directory

-The default location for dlt pipelines is `~/.dlt/pipelines/<pipeline_name>`. If your pipelines are in a [different directory](https://dlthub.com/docs/general-usage/pipeline#separate-working-environments-with-pipelines_dir), use the `--dlt-path` argument to specify the path explicitly:
+The default location for dlt pipeline working state is `~/.dlt/pipelines/<pipeline_name>`. If dlt stores your pipeline state in a [different pipelines working directory](https://dlthub.com/docs/general-usage/pipeline#separate-working-environments-with-pipelines_dir), use the `--dlt-path` argument to specify that directory explicitly. This should be the directory where dlt stores pipeline state, not the directory containing your pipeline scripts:

 ```bash
-sqlmesh init -t dlt --dlt-pipeline <pipeline-name> --dlt-path <pipelines-directory> dialect
+sqlmesh init -t dlt --dlt-pipeline <pipeline-name> --dlt-path <pipelines-working-directory> dialect
 ```

 ### Generating models on demand
@@ -58,10 +58,10 @@ sqlmesh dlt_refresh <pipeline-name> --force
 sqlmesh dlt_refresh <pipeline-name> --table <dlt-table>
 ```

-- **Provide the explicit path to the pipelines directory** (using `--dlt-path`):
+- **Provide the explicit path to the pipelines working directory** (using `--dlt-path`):

 ```bash
-sqlmesh dlt_refresh <pipeline-name> --dlt-path <pipelines-directory>
+sqlmesh dlt_refresh <pipeline-name> --dlt-path <pipelines-working-directory>
 ```

 #### Configuration
#### Configuration

docs/integrations/engines/clickhouse.md

Lines changed: 50 additions & 1 deletion
@@ -420,6 +420,54 @@ If a model has many records in each partition, you may see additional performanc

 Choose a model's time partitioning granularity based on the characteristics of the data it will process, making sure the total number of partitions is 1000 or fewer.

+## Multi-gateway setup
+
+ClickHouse does not have a catalog concept — its fully-qualified table names are two-level (`database.table`), not three-level (`catalog.database.table`).
+
+When a SQLMesh project uses ClickHouse alongside a catalog-aware gateway such as Trino or BigQuery, the two gateway types produce FQNs with different nesting depths. SQLMesh's internal schema tracking requires uniform nesting, so it assigns a **virtual catalog** to ClickHouse models at load time.
+
+### How the virtual catalog works
+
+- SQLMesh automatically detects the nesting mismatch and injects a virtual catalog into each ClickHouse adapter when a catalog-aware gateway is also present.
+- ClickHouse models will appear with three-level FQNs in `sqlmesh plan` output and logs — for example, `__ch_prod__.mydb.mytable` for a gateway named `ch_prod`.
+- The virtual catalog prefix is **never sent to ClickHouse**. It is stripped from every DDL and DML statement before execution.
+- When ClickHouse is the only gateway in a project, no virtual catalog is assigned and models remain two-level.
+
+### Adding a second gateway to an existing ClickHouse-only project
+
+!!! warning "Re-materialization required"
+    Adding a catalog-aware gateway (such as Trino or BigQuery) to a project that previously used ClickHouse as the only gateway triggers a **full re-materialization of every ClickHouse model** on the next `sqlmesh apply`. Plan for this before making the change.
+
+If your project previously used ClickHouse as the only gateway, your models were fingerprinted with 2-level FQNs (`db.table`). Adding a catalog-aware gateway causes all ClickHouse models to be treated as new versions (their FQNs change to `__{gateway_name}__.db.table`):
+
+- `FULL` models are recreated once — cost is proportional to the size of each table.
+- `INCREMENTAL_BY_TIME_RANGE` models require a **full historical backfill** from the model's configured start date.
+- The old 2-level model names appear as **Removed** in the plan and will be cleaned up after the environment TTL expires.
+
+This is a one-time cost at the transition point and does not recur. There is no way to skip it — `--forward-only` does not apply because SQLMesh treats the 3-level names as new models, not modified ones.
+
+### Virtual catalog naming
+
+By default, the virtual catalog name is derived from **the gateway name you chose in your config**, wrapped in double underscores — for example, a gateway named `clickhouse` produces `__clickhouse__`, and a gateway named `ch_prod` produces `__ch_prod__`. The double-underscore wrapping makes it visually clear that this is an internal SQLMesh concept, not a real ClickHouse object.
+
+You can override the default name by setting `virtual_catalog` in your ClickHouse connection configuration:
+
+```yaml
+gateways:
+  clickhouse:
+    connection:
+      type: clickhouse
+      host: my-clickhouse-host
+      username: default
+      virtual_catalog: ch_virtual  # optional; defaults to __{gateway_name}__ (e.g. __clickhouse__)
+  trino:
+    connection:
+      type: trino
+      ...
+```
+
+With this configuration, ClickHouse models will appear as `ch_virtual.mydb.mytable` in plan output instead of `__clickhouse__.mydb.mytable`.
+
 ## Local/Built-in Scheduler

 **Engine Adapter Type**: `clickhouse`
@@ -446,4 +494,5 @@ If a model has many records in each partition, you may see additional performanc
 | `server_host_name` | The ClickHouse server hostname as identified by the CN or SNI of its TLS certificate. Set this to avoid SSL errors when connecting through a proxy or tunnel with a different hostname. | string | N |
 | `tls_mode` | Controls advanced TLS behavior. proxy and strict do not invoke ClickHouse mutual TLS connection, but do send client cert and key. mutual assumes ClickHouse mutual TLS auth with a client certificate. | string | N |
 | `connection_settings` | Additional [connection settings](https://clickhouse.com/docs/integrations/python#settings-argument) | dict | N |
-| `connection_pool_options` | Additional [options](https://clickhouse.com/docs/integrations/python#customizing-the-http-connection-pool) for the HTTP connection pool | dict | N |
+| `connection_pool_options` | Additional [options](https://clickhouse.com/docs/integrations/python#customizing-the-http-connection-pool) for the HTTP connection pool | dict | N |
+| `virtual_catalog` | Override the virtual catalog name used when ClickHouse runs alongside a catalog-aware gateway (e.g. Trino). Defaults to `__{gateway_name}__`. See [Multi-gateway setup](#multi-gateway-setup) for details. | string | N |
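
The naming rules documented in the hunk above can be sketched in a few lines of Python. This is only an illustration of the described behavior, assuming simple dotted names; `default_virtual_catalog`, `tracked_fqn`, and `engine_fqn` are hypothetical helpers, not SQLMesh APIs, and the real adapter presumably rewrites parsed SQL rather than raw strings.

```python
from typing import Optional


def default_virtual_catalog(gateway_name: str) -> str:
    """Default name: the gateway name wrapped in double underscores."""
    return f"__{gateway_name}__"


def tracked_fqn(
    gateway_name: str,
    database: str,
    table: str,
    virtual_catalog: Optional[str] = None,
    catalog_aware_gateway_present: bool = True,
) -> str:
    """Name as it would appear in plan output (hypothetical helper)."""
    if not catalog_aware_gateway_present:
        # ClickHouse-only project: names stay two-level.
        return f"{database}.{table}"
    catalog = virtual_catalog or default_virtual_catalog(gateway_name)
    return f"{catalog}.{database}.{table}"


def engine_fqn(fqn: str, catalog: str) -> str:
    """Strip the virtual catalog prefix before a name is sent to ClickHouse."""
    prefix = f"{catalog}."
    return fqn[len(prefix):] if fqn.startswith(prefix) else fqn


print(tracked_fqn("ch_prod", "mydb", "mytable"))
print(tracked_fqn("clickhouse", "mydb", "mytable", virtual_catalog="ch_virtual"))
print(engine_fqn("__ch_prod__.mydb.mytable", "__ch_prod__"))
```

The sketch also shows why adding a second gateway changes every model's name: the two-level and three-level forms of the same table are different strings, so they are tracked as different models.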

docs/integrations/engines/databricks.md

Lines changed: 22 additions & 0 deletions
@@ -271,6 +271,28 @@ The only relevant SQLMesh configuration parameter is the optional `catalog` para
 | `disable_databricks_connect` | When running locally, disable the use of Databricks Connect for all model operations (so use SQL Connector for all models) | bool | N |
 | `disable_spark_session` | Do not use SparkSession if it is available (like when running in a notebook). | bool | N |

+### Query tags
+
+Databricks SQL Connector supports per-query tags through the `query_tags` model session property. Specify tags as a `MAP(...)` of string keys to string or `NULL` values:
+
+```sql
+MODEL (
+  name sqlmesh_example.tagged_model,
+  dialect databricks,
+  session_properties (
+    query_tags = MAP(
+      'team', 'data-eng',
+      'app', 'sqlmesh',
+      'feature', NULL
+    )
+  )
+);
+
+SELECT 1 AS id;
+```
+
+Query tags are only applied when SQLMesh executes SQL through the Databricks SQL Connector. They are not applied when SQLMesh routes execution through Databricks Connect, a Databricks notebook SparkSession, or the Spark engine adapter.
+
 ## Model table properties to support altering tables

 If you are making a change to the structure of a table that is [forward only](../../guides/incremental_time.md#forward-only-models), then you may need to add the following to your model's `physical_properties`:

docs/integrations/github.md

Lines changed: 5 additions & 1 deletion
@@ -293,8 +293,10 @@ Below is an example of how to define the default config for the bot in either YA
 | `enable_deploy_command` | Indicates if the `/deploy` command should be enabled in order to allowed synchronized deploys to production. Default: `False` | bool | N |
 | `command_namespace` | The namespace to use for SQLMesh commands. For example if you provide `#SQLMesh` as a value then commands will be expected in the format of `#SQLMesh/<command>`. Default: `None` meaning no namespace is used. | string | N |
 | `auto_categorize_changes` | Auto categorization behavior to use for the bot. If not provided then the project-wide categorization behavior is used. See [Auto-categorize model changes](https://sqlmesh.readthedocs.io/en/stable/guides/configuration/#auto-categorize-model-changes) for details. | dict | N |
-| `default_pr_start` | Default start when creating PR environment plans. If running in a mode where the bot automatically backfills models (based on `auto_categorize_changes` behavior) then this can be used to limit the amount of data backfilled. Defaults to `None` meaning the start date is set to the earliest model's start or to 1 day ago if [data previews](../concepts/plans.md#data-preview) need to be computed. | str | N |
+| `default_pr_start` | Default start when creating PR environment plans. If running in a mode where the bot automatically backfills models (based on `auto_categorize_changes` behavior) then this can be used to limit the amount of data backfilled. Defaults to `None` meaning the start date is set to the earliest model's start. | str | N |
 | `pr_min_intervals` | Intended for use when `default_pr_start` is set to a relative time, eg `1 week ago`. This ensures that at least this many intervals across every model are included for backfill in the PR environment. Without this, models with an interval unit wider than `default_pr_start` (such as `@monthly` models if `default_pr_start` was set to `1 week ago`) will be excluded from backfill entirely. | int | N |
+| `default_pr_preview_start` | Default start when computing [data previews](../concepts/plans.md#data-preview) for forward-only changes in PR environments. Defaults to `yesterday`, independent of `default_pr_start`, so preview data can be limited without reducing the regular PR backfill window. | str | N |
+| `pr_preview_min_intervals` | Intended for use when `default_pr_preview_start` is set to a relative time. This ensures that at least this many intervals are included for forward-only previews in the PR environment. Default: `1` | int | N |
 | `skip_pr_backfill` | Indicates if the bot should skip backfilling models in the PR environment. Default: `True` | bool | N |
 | `pr_include_unmodified` | Indicates whether to include unmodified models in the PR environment. Default to the project's config value (which defaults to `False`) | bool | N |
 | `run_on_deploy_to_prod` | Indicates whether to run latest intervals when deploying to prod. If set to false, the deployment will backfill only the changed models up to the existing latest interval in production, ignoring any missing intervals beyond this point. Default: `False` | bool | N |
@@ -320,6 +322,7 @@ Example with all properties defined:
     sql: full
     seed: full
   default_pr_start: "1 week ago"
+  default_pr_preview_start: "yesterday"
   skip_pr_backfill: false
   run_on_deploy_to_prod: false
   prod_branch_name: production
@@ -344,6 +347,7 @@ Example with all properties defined:
             seed=AutoCategorizationMode.FULL,
         ),
         default_pr_start="1 week ago",
+        default_pr_preview_start="yesterday",
         skip_pr_backfill=False,
         run_on_deploy_to_prod=False,
         prod_branch_name="production",

docs/reference/cli.md

Lines changed: 2 additions & 1 deletion
@@ -279,7 +279,8 @@ Options:
                                   empty.
   --dlt-pipeline TEXT             DLT pipeline for which to generate a SQLMesh project.
                                   Use alongside template: dlt
-  --dlt-path TEXT                 The directory where the DLT pipeline resides. Use
+  --dlt-path TEXT                 The DLT pipelines working directory, where DLT stores
+                                  pipeline state (by default ~/.dlt/pipelines). Use
                                   alongside template: dlt
   --help                          Show this message and exit.
 ```

docs/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -4,3 +4,4 @@ mkdocs-material==9.0.5
 mkdocs-material-extensions==1.1.1
 mkdocs-glightbox==0.3.7
 pdoc==14.5.1
+pygments==2.19.2  # Temporary pin: 2.20.0 breaks docs build; revisit after the fix

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ bigquery = [
 # pinned an older SQLGlot which is incompatible with SQLMesh
 bigframes = ["bigframes>=1.32.0"]
 clickhouse = ["clickhouse-connect"]
-databricks = ["databricks-sql-connector[pyarrow]"]
+databricks = ["databricks-sql-connector[pyarrow]>=4.2.6"]
 dev = [
     "agate",
     "beautifulsoup4",
