Commit 5c4ce6c

Docs: add info on specifying model kinds in python models (#1518)
* Elaborate on how to specify model kinds in python models
* Add model kind specification to python models concepts doc
* Streamline model configuration ref info
1 parent 28b6c86 commit 5c4ce6c

4 files changed, +153 -87 lines changed


docs/concepts/models/python_models.md

Lines changed: 38 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -37,6 +37,37 @@ The function takes an `ExecutionContext` that is able to run queries and to retr
3737

3838
If the function output is too large, it can also be returned in chunks using Python generators.
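The chunking idea above can be sketched in plain Python. This is only an illustration of the generator pattern, not a SQLMesh model: the function name `execute_in_chunks` and its parameters are hypothetical, and the chunks here are lists of dictionaries, whereas a real Python model would yield DataFrames.

```python
from typing import Dict, Iterator, List

Row = Dict[str, int]


def execute_in_chunks(total_rows: int, chunk_size: int) -> Iterator[List[Row]]:
    """Yield rows in fixed-size chunks instead of building one large result."""
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        # Only one chunk is held in memory at a time.
        yield [{"id": i} for i in range(start, stop)]
```

Because the function is a generator, the caller pulls one chunk at a time, so the full output never has to exist in memory at once.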
3939

40+
## `@model` specification
41+
42+
The arguments provided in the `@model` specification have the same names as those provided in a SQL model's `MODEL` DDL.
43+
44+
Most of the arguments are simply Python-formatted equivalents of the SQL version, but Python model `kind`s are specified with model kind objects. All model kind arguments are listed in the [models configuration reference page](../reference/model_configuration.md#model-kind-properties). A model's `kind` object must be imported at the beginning of the model definition file before use in the model specification.
45+
46+
Supported model kind objects include:
47+
48+
- [ViewKind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#ViewKind)
49+
- [FullKind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#FullKind)
50+
- [SeedKind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#SeedKind)
51+
- [IncrementalByTimeRangeKind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#IncrementalByTimeRangeKind)
52+
- [IncrementalByUniqueKeyKind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#IncrementalByUniqueKeyKind)
53+
- [SCDType2Kind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#SCDType2Kind)
54+
- [EmbeddedKind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#EmbeddedKind)
55+
- [ExternalKind()](https://sqlmesh.readthedocs.io/en/stable/_readthedocs/html/sqlmesh/core/model/kind.html#ExternalKind)
56+
57+
This example demonstrates how to specify an incremental by time range model kind in Python:
58+
59+
```python linenums="1"
60+
from sqlmesh import ExecutionContext, model
61+
from sqlmesh.core.model import IncrementalByTimeRangeKind
62+
63+
@model(
64+
"docs_example.incremental_model",
65+
kind=IncrementalByTimeRangeKind(
66+
time_column="ds"
67+
)
68+
)
69+
```
70+
4071
## Execution context
4172
Python models can do anything you want, but it is strongly recommended for all models to be [idempotent](../glossary.md#idempotency). Python models can fetch data from upstream models or even data outside of SQLMesh.
4273

@@ -50,7 +81,7 @@ df = context.fetchdf("SELECT * FROM my_table")
5081
In order to fetch data from an upstream model, you first get the table name using `context`'s `table` method. This returns the appropriate table name for the current runtime [environment](../environments.md):
5182

5283
```python linenums="1"
53-
table = context.table("upstream_model")
84+
table = context.table("docs_example.upstream_model")
5485
df = context.fetchdf(f"SELECT * FROM {table}")
5586
```
5687

@@ -63,7 +94,7 @@ In this example, only `upstream_dependency` will be captured, while `another_dep
6394
```python linenums="1"
6495
@model(
6596
"my_model.with_explicit_dependencies",
66-
depends_on=["upstream_dependency"], # captured
97+
depends_on=["docs_example.upstream_dependency"], # captured
6798
)
6899
def execute(
69100
context: ExecutionContext,
@@ -74,7 +105,7 @@ def execute(
74105
) -> pd.DataFrame:
75106

76107
# ignored due to @model dependency "upstream_dependency"
77-
context.table("another_dependency")
108+
context.table("docs_example.another_dependency")
78109
```
79110

80111
## Examples
@@ -91,7 +122,7 @@ import pandas as pd
91122
from sqlmesh import ExecutionContext, model
92123

93124
@model(
94-
"basic",
125+
"docs_example.basic",
95126
owner="janet",
96127
cron="@daily",
97128
columns={
@@ -123,7 +154,7 @@ import pandas as pd
123154
from sqlmesh import ExecutionContext, model
124155

125156
@model(
126-
"sql_pandas",
157+
"docs_example.sql_pandas",
127158
columns={
128159
"id": "int",
129160
"name": "text",
@@ -161,7 +192,7 @@ from pyspark.sql import DataFrame, functions
161192
from sqlmesh import ExecutionContext, model
162193

163194
@model(
164-
"pyspark",
195+
"docs_example.pyspark",
165196
columns={
166197
"id": "int",
167198
"name": "text",
@@ -194,7 +225,7 @@ This example uses the Python generator `yield` to batch the model output:
194225

195226
```python linenums="1" hl_lines="20"
196227
@model(
197-
"batching",
228+
"docs_example.batching",
198229
columns={
199230
"id": "int",
200231
},

docs/guides/configuration.md

Lines changed: 57 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -775,6 +775,8 @@ Example configuration specifying a Postgres default connection, in-memory DuckDB
775775

776776
### Models
777777

778+
#### Model defaults
779+
778780
The `model_defaults` key is **required** and must contain a value for the `dialect` key. All SQL dialects [supported by the SQLGlot library](https://github.com/tobymao/sqlglot/blob/main/sqlglot/dialects/dialect.py) are allowed. Other values are set automatically unless explicitly overridden in the model definition.
779781

780782
All supported `model_defaults` keys are listed in the [models configuration reference page](../reference/model_configuration.md#model-defaults).
@@ -804,82 +806,82 @@ Example configuration:
804806
)
805807
```
806808

807-
#### Model Kind
808-
The default model kind is 'view' unless overridden with the `kind` key. For more information, refer to [model kinds](../concepts/models/model_kinds.md).
809+
The default model kind is `VIEW` unless overridden with the `kind` key. For more information on model kinds, refer to the [model kinds concepts page](../concepts/models/model_kinds.md).
809810

810-
Example:
811+
#### Model Kinds
811812

812-
=== "YAML"
813+
Model kinds are required in each model file's `MODEL` DDL statement. They may optionally be used to specify a default kind in the model defaults configuration key.
813814

814-
```yaml linenums="1"
815-
model_defaults:
816-
dialect: snowflake
817-
kind: full
818-
```
815+
All model kind specification keys are listed in the [models configuration reference page](../reference/model_configuration.md#model-kind-properties).
819816

820-
=== "Python"
817+
The `VIEW`, `FULL`, and `EMBEDDED` model kinds are specified by name only, while other model kinds require additional parameters, which are provided as an array:
821818

822-
```python linenums="1"
823-
from sqlmesh.core.config import Config, ModelDefaultsConfig
819+
=== "SQL"
824820

825-
config = Config(
826-
model_defaults=ModelDefaultsConfig(
827-
dialect="snowflake",
828-
kind="full",
829-
),
830-
)
831-
```
821+
The `FULL` model kind only requires a name:
832822

833-
If a kind requires additional parameters it can be provided as an object:
823+
```sql linenums="1"
824+
MODEL(
825+
name docs_example.full_model,
826+
kind FULL
827+
);
828+
```
834829

835-
=== "YAML"
830+
`INCREMENTAL_BY_TIME_RANGE` requires an array specifying the model's `time_column`:
836831

837-
```yaml linenums="1"
838-
model_defaults:
839-
dialect: snowflake,
840-
kind:
841-
name: incremental_by_time_range
842-
time_column: ds
832+
```sql linenums="1"
833+
MODEL(
834+
name docs_example.incremental_model,
835+
kind INCREMENTAL_BY_TIME_RANGE (
836+
time_column ds
837+
)
838+
);
843839
```
844840

845-
=== "Python"
841+
Python model kinds are specified with model kind objects. Python model kind objects have the same arguments as their SQL counterparts, listed in the [models configuration reference page](../reference/model_configuration.md#model-kind-properties).
846842

847-
The Python `model_defaults` `kind` argument takes a model kind object with a value of:
843+
This example demonstrates how to specify an incremental by time range model kind in Python:
848844

849-
- EmbeddedKind
850-
- ExternalKind
851-
- FullKind
852-
- IncrementalByTimeRangeKind
853-
- IncrementalByUniqueKeyKind
854-
- IncrementalUnmanagedKind
855-
- SeedKind
856-
- ViewKind
845+
=== "Python"
857846

858847
```python linenums="1"
859-
from sqlmesh.core.config import (
860-
Config,
861-
ModelDefaultsConfig,
862-
IncrementalByTimeRangeKind
863-
)
864-
865-
config = Config(
866-
model_defaults=ModelDefaultsConfig(
867-
dialect="snowflake",
868-
kind=IncrementalByTimeRangeKind(
869-
time_column="ds",
870-
),
871-
),
848+
from sqlmesh import ExecutionContext, model
849+
from sqlmesh.core.model import IncrementalByTimeRangeKind
850+
851+
@model(
852+
"docs_example.incremental_model",
853+
kind=IncrementalByTimeRangeKind(
854+
time_column="ds"
855+
)
872856
)
873857
```
874858

859+
Learn more about specifying Python models at the [Python models concepts page](../concepts/models/python_models.md#model-specification).
860+
875861
### Debug mode
876862

877-
To enable debug mode set the `SQLMESH_DEBUG` environment variable to one of the following values: `1`, `true`, `t`, `yes` or `y`.
863+
To enable debug mode, set the `SQLMESH_DEBUG` environment variable to one of the following values: "1", "true", "t", "yes" or "y".
878864

879-
Enabling this mode ensures that full backtraces are printed when using CLI. Additionally, the default log level is set to `DEBUG` when this mode is enabled.
865+
Enabling this mode ensures that full backtraces are printed when using the CLI. The default log level is set to `DEBUG` when this mode is enabled.
880866

881867
Example enabling debug mode for the CLI command `sqlmesh plan`:
882868

883-
```bash
884-
$ SQLMESH_DEBUG=1 sqlmesh plan
885-
```
869+
=== "Bash"
870+
871+
```bash
872+
$ SQLMESH_DEBUG=1 sqlmesh plan
873+
```
874+
875+
=== "MS PowerShell"
876+
877+
```powershell
878+
PS> $env:SQLMESH_DEBUG=1
879+
PS> sqlmesh plan
880+
```
881+
882+
=== "MS CMD"
883+
884+
```cmd
885+
C:\> set SQLMESH_DEBUG=1
886+
C:\> sqlmesh plan
887+
```
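The same setting can also be applied when launching the CLI from a script. The sketch below assumes the `sqlmesh` executable is on `PATH`; the helper names `debug_env` and `run_plan_with_debug` are hypothetical and not part of SQLMesh.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def debug_env(base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with SQLMESH_DEBUG enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # "1" is one of the documented values that enable debug mode.
    env["SQLMESH_DEBUG"] = "1"
    return env


def run_plan_with_debug() -> int:
    """Run `sqlmesh plan` with debug mode on and return its exit code."""
    # Assumption: `sqlmesh` is installed and resolvable on PATH.
    completed = subprocess.run(["sqlmesh", "plan"], env=debug_env())
    return completed.returncode
```

Passing a modified copy through `env=` keeps the variable scoped to the child process, so the calling process's own environment is left unchanged.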

docs/reference/configuration.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -172,7 +172,9 @@ For example, you might have a specific connection where your tests should run re
172172

173173
## Debug mode
174174

175-
To enable debug mode set the `SQLMESH_DEBUG` environment variable to one of the following values: "1", "true", "t", "yes" or "y". Enabling this mode ensures that full backtraces are printed when using CLI. The default log level is set to `DEBUG` when this mode is enabled.
175+
To enable debug mode, set the `SQLMESH_DEBUG` environment variable to one of the following values: "1", "true", "t", "yes" or "y".
176+
177+
Enabling this mode ensures that full backtraces are printed when using the CLI. The default log level is set to `DEBUG` when this mode is enabled.
176178

177179
Example enabling debug mode for the CLI command `sqlmesh plan`:
178180
