Commit 89a7e05

Docs: expand custom materializations guide (#2851)
1 parent 3c1152d commit 89a7e05

1 file changed

Lines changed: 87 additions & 44 deletions

@@ -1,36 +1,66 @@
11
# Custom materializations guide
22

3-
SQLMesh supports a variety of [model kinds](../concepts/models/model_kinds.md) to capture the most common semantics of how transformations can be evaluated and materialized.
3+
SQLMesh supports a variety of [model kinds](../concepts/models/model_kinds.md) that reflect the most common approaches to evaluating and materializing data transformations.
44

5-
There are times, however, when a specific use case doesn't align with any of the supported materialization strategies. For scenarios like this, SQLMesh allows users to create their own materialization implementation using Python.
5+
Sometimes, however, a specific use case cannot be addressed with an existing model kind. For scenarios like this, SQLMesh allows users to create their own materialization implementation using Python.
66

7-
Please note that this is an advanced feature and should only be considered if all other approaches to addressing a use case have been exhausted. If you're at this decision point, we recommend you reach out to our team in the community slack: [here](https://tobikodata.com/community.html)
7+
__NOTE__: this is an advanced feature and should only be considered if all other approaches have been exhausted. If you're at this decision point, we recommend you reach out to our team in the [community Slack](https://tobikodata.com/community.html) before investing time building a custom materialization. If an existing model kind can solve your problem, we want to clarify the SQLMesh documentation; if an existing kind can _almost_ solve your problem, we want to consider modifying the kind so all SQLMesh users can solve the problem as well.
88

9-
## Creating a materialization
9+
## Background
1010

11-
The fastest way to add a new custom materialization is to add a new `.py` file with the implementation to the `materializations/` folder of the project. SQLMesh will automatically import all Python modules in this folder at project load time and register the custom materializations accordingly.
11+
A SQLMesh model kind consists of methods for executing and managing the outputs of data transformations - collectively, these are the kind's "materialization."
1212

13-
To create a custom materialization strategy, you need to inherit the `CustomMaterialization` base class and, at a very minimum, provide an implementation for the `insert` method.
13+
Some materializations are relatively simple. For example, the SQL [FULL model kind](../concepts/models/model_kinds.md#full) completely replaces existing data each time it is run, so its materialization boils down to executing `CREATE OR REPLACE TABLE [table name] AS [your model query]`.
1414

15-
For example, a simple custom full-refresh materialization strategy might look like the following:
15+
The materializations for other kinds, such as [INCREMENTAL BY TIME RANGE](../concepts/models/model_kinds.md#incremental_by_time_range), require additional logic to process the correct time intervals and replace/insert their results into an existing table.
1616

17-
```python linenums="1"
18-
from __future__ import annotations
17+
A model kind's materialization may differ based on the SQL engine executing the model. For example, PostgreSQL does not support `CREATE OR REPLACE TABLE`, so `FULL` model kinds instead `DROP` the existing table then `CREATE` a new table. SQLMesh already contains the logic needed to materialize existing model kinds on all [supported engines](../integrations/overview.md#execution-engines).
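
The engine-dependent behavior described above can be sketched as a small function that chooses the statements for a full refresh. This is an illustration only, not SQLMesh's internal implementation: the function name `full_refresh_statements` and the `supports_create_or_replace` flag are hypothetical, and the `IF EXISTS` clause is an assumption added so the sketch also covers a first run.

```python
from typing import List


def full_refresh_statements(
    table_name: str,
    model_query: str,
    supports_create_or_replace: bool,
) -> List[str]:
    """Return the SQL statements for a full refresh of `table_name`.

    Hypothetical helper for illustration; SQLMesh's engine adapters
    handle this choice internally.
    """
    if supports_create_or_replace:
        # A single statement replaces the table and its contents.
        return [f"CREATE OR REPLACE TABLE {table_name} AS {model_query}"]
    # Engines without CREATE OR REPLACE TABLE (for example, PostgreSQL)
    # drop the existing table, then create a new one.
    return [
        f"DROP TABLE IF EXISTS {table_name}",
        f"CREATE TABLE {table_name} AS {model_query}",
    ]
```

Returning a list of statements keeps the caller the same on every engine: it executes whatever it receives, in order.
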
1918

20-
import typing as t
19+
## Overview
20+
21+
Custom materializations are analogous to new model kinds. Users [specify them by name](#using-custom-materializations-in-models) in a model definition's `MODEL` block, and they may accept user-specified arguments.
22+
23+
A custom materialization must:
24+
25+
- Be written in Python
26+
- Be a Python class that inherits the SQLMesh `CustomMaterialization` base class
27+
- Use or override the `insert` method from the SQLMesh [`MaterializableStrategy`](https://github.com/TobikoData/sqlmesh/blob/034476e7f64d261860fd630c3ac56d8a9c9f3e3a/sqlmesh/core/snapshot/evaluator.py#L1146) class/subclasses
28+
- Be loaded or imported by SQLMesh at runtime
29+
30+
A custom materialization may:
31+
32+
- Use or override methods from the SQLMesh [`MaterializableStrategy`](https://github.com/TobikoData/sqlmesh/blob/034476e7f64d261860fd630c3ac56d8a9c9f3e3a/sqlmesh/core/snapshot/evaluator.py#L1146) class/subclasses
33+
- Use or override methods from the SQLMesh [`EngineAdapter`](https://github.com/TobikoData/sqlmesh/blob/034476e7f64d261860fd630c3ac56d8a9c9f3e3a/sqlmesh/core/engine_adapter/base.py#L67) class/subclasses
34+
- Execute arbitrary SQL code and fetch results with the engine adapter `execute` and related methods
2135

22-
from sqlmesh import CustomMaterialization, Model
36+
A custom materialization may perform arbitrary Python processing with Pandas or other libraries, but in most cases that logic should reside in a [Python model](../concepts/models/python_models.md) instead of the materialization.
2337

38+
A SQLMesh project will automatically load any custom materializations present in its `materializations/` directory. Alternatively, the materialization may be bundled into a [Python package](#python-packaging) and installed with standard methods.
39+
40+
## Creating a custom materialization
41+
42+
Create a new custom materialization by adding a `.py` file containing the implementation to the `materializations/` folder in the project directory. SQLMesh will automatically import all Python modules in this folder at project load time and register the custom materializations. (Find more information about sharing and packaging custom materializations [below](#sharing-custom-materializations).)
43+
44+
A custom materialization must be a class that inherits the `CustomMaterialization` base class and provides an implementation for the `insert` method.
45+
46+
For example, a minimal full-refresh custom materialization might look like the following:
47+
48+
```python linenums="1"
49+
from __future__ import annotations # needed for argument typing; must be the first statement in the file
50+
51+
from sqlmesh import CustomMaterialization # required
52+
# argument typing: strongly recommended but optional best practice
53+
from sqlmesh import Model
54+
import typing as t
2455
if t.TYPE_CHECKING:
2556
    from sqlmesh import QueryOrDF
2657

27-
2858
class CustomFullMaterialization(CustomMaterialization):
2959
    NAME = "my_custom_full"
3060

3161
    def insert(
3262
        self,
33-
        table_name: str,
63+
        table_name: str, # ": str" is optional argument typing
3464
        query_or_df: QueryOrDF,
3565
        model: Model,
3666
        is_first_insert: bool,
@@ -40,28 +70,29 @@ class CustomFullMaterialization(CustomMaterialization):
4070

4171
```
4272

43-
Let's unpack the above implementation:
73+
Let's unpack this materialization:
4474

45-
* `NAME` - determines the name of the custom materialization. This name will be used in model definitions to reference a specific strategy. If not specified, the name of the class will be used instead.
46-
* The `insert` method comes with the following arguments:
47-
* `table_name` - the name of a target table (or a view) into which the data should be inserted.
48-
* `query_or_df` - a query (a SQLGlot expression) or a DataFrame (pandas, PySpark, or Snowpark) instance which has to be inserted.
49-
* `model` - the associated model definition object which can be used to get any model parameters as well as custom materialization settings.
50-
* `is_first_insert` - whether this is the first insert for the current version of the model.
51-
* `kwargs` - contains additional and future arguments.
52-
* The `self.adapter` instance is used to interact with the target engine. It comes with a set of useful high-level APIs like `replace_query`, `create_table`, and `table_exists`, but also supports execution of arbitrary SQL expressions with its `execute` method.
75+
* `NAME` - name of the custom materialization. This name is used to specify the materialization in a model definition `MODEL` block. If not specified in the custom materialization, the name of the class is used in the `MODEL` block instead.
76+
* The `insert` method has the following arguments:
77+
* `table_name` - the name of a target table or view into which the data should be inserted
78+
* `query_or_df` - a query (of SQLGlot expression type) or DataFrame (Pandas, PySpark, or Snowpark) instance to be inserted
79+
* `model` - the model definition object used to access model parameters and user-specified materialization arguments
80+
* `is_first_insert` - whether this is the first insert for the current version of the model (used with batched or multi-step inserts)
81+
* `kwargs` - additional and future arguments
82+
* The `self.adapter` instance is used to interact with the target engine. It comes with a set of useful high-level APIs like `replace_query`, `columns`, and `table_exists`, but also supports executing arbitrary SQL expressions with its `execute` method.
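
The pieces described in this list can be exercised without a SQL engine by using stand-ins. The sketch below is not SQLMesh code: `RecordingAdapter` and `StandInMaterialization` are hypothetical classes that only mimic the shape of the engine adapter and the `CustomMaterialization` base class, and `resolved_name` is an assumed, simplified version of the `NAME` fallback rule. Only `replace_query`, `NAME`, and the `insert` arguments come from the text above.

```python
from typing import Any, List, Optional, Tuple


class RecordingAdapter:
    """Stand-in for an engine adapter; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Any]] = []

    def replace_query(self, table_name: str, query_or_df: Any) -> None:
        self.calls.append(("replace_query", table_name, query_or_df))


class StandInMaterialization:
    """Hypothetical stand-in for the `CustomMaterialization` base class."""

    NAME: Optional[str] = None

    def __init__(self, adapter: RecordingAdapter) -> None:
        self.adapter = adapter

    @classmethod
    def resolved_name(cls) -> str:
        # Fall back to the class name when NAME is not specified.
        return cls.NAME or cls.__name__


class CustomFullMaterialization(StandInMaterialization):
    NAME = "my_custom_full"

    def insert(
        self,
        table_name: str,
        query_or_df: Any,
        model: Any,
        is_first_insert: bool,
        **kwargs: Any,
    ) -> None:
        # Full refresh: replace the target's contents with the new results.
        self.adapter.replace_query(table_name, query_or_df)


class UnnamedMaterialization(StandInMaterialization):
    """No NAME attribute, so the class name identifies it."""


adapter = RecordingAdapter()
CustomFullMaterialization(adapter).insert(
    "my_db.my_model", "SELECT 1", model=None, is_first_insert=True
)
```

Testing the `insert` logic against a recording stand-in like this is one way to check which adapter calls a materialization makes before running it against a real engine.
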
5383

54-
You can also control how the associated data objects (tables, views, etc.) are created and deleted by overriding the `create` and `delete` methods accordingly:
84+
You can control how data objects (tables, views, etc.) are created and deleted by overriding the `MaterializableStrategy` class's `create` and `delete` methods:
5585

5686
```python linenums="1"
57-
from __future__ import annotations
87+
from __future__ import annotations # needed for argument typing; must be the first statement in the file
5888

89+
from sqlmesh import CustomMaterialization # required
90+
# argument typing: strongly recommended but optional best practice
91+
from sqlmesh import Model
5992
import typing as t
6093

61-
from sqlmesh import CustomMaterialization, Model
62-
63-
6494
class CustomFullMaterialization(CustomMaterialization):
95+
    # NAME and `insert` method code here
6596
    ...
6697

6798
    def create(
@@ -72,28 +103,30 @@ class CustomFullMaterialization(CustomMaterialization):
72103
        render_kwargs: t.Dict[str, t.Any],
73104
        **kwargs: t.Any,
74105
    ) -> None:
75-
        # Custom creation logic.
106+
        # Custom table/view creation logic.
107+
        # Likely uses `self.adapter` methods like `create_table`, `create_view`, or `ctas`.
76108

77109
    def delete(self, name: str, **kwargs: t.Any) -> None:
78-
        # Custom deletion logic.
110+
        # Custom table/view deletion logic.
111+
        # Likely uses `self.adapter` methods like `drop_table` or `drop_view`.
79112
```
80113

81-
## Using custom materializations in models
114+
## Using a custom materialization
82115

83-
In order to use the newly created materialization, use the special model kind `CUSTOM`:
116+
Specify the model kind `CUSTOM` in a model definition `MODEL` block to use the custom materialization. Specify the `NAME` from the custom materialization code in the `materialization` attribute of the `CUSTOM` kind:
84117

85118
```sql linenums="1"
86119
MODEL (
87120
  name my_db.my_model,
88-
  kind CUSTOM (materialization 'my_custom_full')
121+
  kind CUSTOM (
122+
    materialization 'my_custom_full'
123+
  )
89124
);
90125
```
91126

92-
The name of the materialization strategy is provided in the `materialization` attribute of the `CUSTOM` kind.
93-
94-
Additionally, you can provide an optional list of arbitrary key-value pairs in the `materialization_properties` attribute:
127+
A custom materialization may accept arguments specified in an array of key-value pairs in the `CUSTOM` kind's `materialization_properties` attribute:
95128

96-
```sql linenums="1"
129+
```sql linenums="1" hl_lines="5-7"
97130
MODEL (
98131
  name my_db.my_model,
99132
  kind CUSTOM (
@@ -105,7 +138,7 @@ MODEL (
105138
);
106139
```
107140

108-
These properties can be accessed with the model reference within the materialization implementation:
141+
The custom materialization implementation accesses the `materialization_properties` via the `model` object's `custom_materialization_properties` dictionary:
109142

110143
```python linenums="1" hl_lines="12"
111144
class CustomFullMaterialization(CustomMaterialization):
@@ -121,18 +154,28 @@ class CustomFullMaterialization(CustomMaterialization):
121154
    ) -> None:
122155
        config_value = model.custom_materialization_properties["config_key"]
123156
        # Proceed with implementing the insertion logic.
124-
        # Example for existing materialization for look and feel: https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/snapshot/evaluator.py
157+
        # Example existing materialization for look and feel: https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/snapshot/evaluator.py
125158
```
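
Because the properties arrive as a dictionary, indexing a key that the user did not set in the `MODEL` block raises a `KeyError`. A minimal sketch of one way to handle optional and required arguments follows. The helper `read_property` and the key `batch_size` are hypothetical, and a plain dictionary stands in for `model.custom_materialization_properties`.

```python
from typing import Any, Dict


def read_property(
    properties: Dict[str, Any],
    key: str,
    default: Any = None,
    required: bool = False,
) -> Any:
    """Read one materialization argument, with a default or a clear error."""
    if key in properties:
        return properties[key]
    if required:
        raise ValueError(f"Missing required materialization property: {key!r}")
    return default


# Stand-in for `model.custom_materialization_properties`.
properties = {"config_key": "config_value"}

config_value = read_property(properties, "config_key", required=True)
batch_size = read_property(properties, "batch_size", default=1000)
```

Raising an error that names the missing key tends to be easier for model authors to act on than a bare `KeyError` from inside the `insert` method.
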
126159

127-
## Packaging custom materializations
160+
## Sharing custom materializations
161+
162+
### Copying files
163+
164+
The simplest (but least robust) way to use a custom materialization in multiple SQLMesh projects is for each project to place a copy of the materialization's Python code in its `materializations/` directory.
165+
166+
If you use this approach, we strongly recommend storing the materialization code in a version-controlled repository and creating a reliable method of notifying users when it is updated.
167+
168+
This approach may be appropriate for smaller organizations.
169+
170+
### Python packaging
128171

129-
To share custom materializations across multiple SQLMesh projects, you need to create and publish a Python package containing your implementation.
172+
A more complex (but robust) way to use a custom materialization in multiple SQLMesh projects is to create and publish a Python package containing the implementation.
130173

131-
When using SQLMesh with Airflow or other external schedulers, note that the `materializations/` folder might not be available on the Airflow cluster side. Therefore, you'll need a package that can be installed there.
174+
One scenario that requires Python packaging is when a SQLMesh project uses Airflow or other external schedulers, and the scheduler cluster does not have the `materializations/` folder available. The cluster will use standard Python package installation methods to import the custom materialization.
132175

133-
Custom materializations can be packaged into a Python package and exposed via [setuptools entrypoints](https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata) mechanism. Once the package is installed, SQLMesh will automatically load custom materializations from the entrypoint list.
176+
Package and expose custom materializations with the [setuptools entrypoints](https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata) mechanism. Once the package is installed, SQLMesh will automatically load custom materializations from the entrypoint list.
134177

135-
If your custom materialization class is defined in the `my_package/my_materialization.py` module, you can expose it as an entry point in the `pyproject.toml` file as follows:
178+
For example, if your custom materialization class is defined in the `my_package/my_materialization.py` module, you can expose it as an entrypoint in the `pyproject.toml` file as follows:
136179

137180
```toml
138181
[project.entry-points."sqlmesh.materializations"]
@@ -152,4 +195,4 @@ setup(
152195
)
153196
```
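
After installing a package, it can be useful to confirm that its entry points are visible to Python. The sketch below lists whatever is registered under the `sqlmesh.materializations` group using the standard library's `importlib.metadata`. It is a diagnostic illustration, not a description of how SQLMesh itself loads materializations, and the function name `discover_materializations` is hypothetical.

```python
from importlib.metadata import entry_points
from typing import Dict

GROUP = "sqlmesh.materializations"


def discover_materializations(group: str = GROUP) -> Dict[str, str]:
    """Map each entry point name in `group` to its "module:attribute" target."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and newer expose a `select` method.
        selected = all_entry_points.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = all_entry_points.get(group, [])
    return {ep.name: ep.value for ep in selected}


found = discover_materializations()
```

An empty result suggests that the package is not installed in the active environment or that the group name in its metadata does not match.
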
154197

155-
Refer to the [custom_materializations](https://github.com/TobikoData/sqlmesh/tree/main/examples/custom_materializations) package example for more details.
198+
Refer to the SQLMesh GitHub [custom_materializations](https://github.com/TobikoData/sqlmesh/tree/main/examples/custom_materializations) example for more details on Python packaging.
