SQLMesh supports a variety of [model kinds](../concepts/models/model_kinds.md) to capture the most common semantics of how transformations can be evaluated and materialized.
3
+
SQLMesh supports a variety of [model kinds](../concepts/models/model_kinds.md) that reflect the most common approaches to evaluating and materializing data transformations.
4
4
5
-
There are times, however, when a specific use case doesn't align with any of the supported materialization strategies. For scenarios like this, SQLMesh allows users to create their own materialization implementation using Python.
5
+
Sometimes, however, a specific use case cannot be addressed with an existing model kind. For scenarios like this, SQLMesh allows users to create their own materialization implementation using Python.
6
6
7
-
Please note that this is an advanced feature and should only be considered if all other approaches to addressing a use case have been exhausted. If you're at this decision point, we recommend you reach out to our team in the community slack: [here](https://tobikodata.com/community.html)
7
+
__NOTE__: this is an advanced feature and should only be considered if all other approaches have been exhausted. If you're at this decision point, we recommend you reach out to our team in the [community slack](https://tobikodata.com/community.html) before investing time building a custom materialization. If an existing model kind can solve your problem, we want to clarify the SQLMesh documentation; if an existing kind can _almost_ solve your problem, we want to consider modifying the kind so all SQLMesh users can solve the problem as well.
8
8
9
-
## Creating a materialization
9
+
## Background
10
10
11
-
The fastest way to add a new custom materialization is to add a new `.py` file with the implementation to the `materializations/` folder of the project. SQLMesh will automatically import all Python modules in this folder at project load time and register the custom materializations accordingly.
11
+
A SQLMesh model kind consists of methods for executing and managing the outputs of data transformations - collectively, these are the kind's "materialization."
12
12
13
-
To create a custom materialization strategy, you need to inherit the `CustomMaterialization` base class and, at a very minimum, provide an implementation for the `insert` method.
13
+
Some materializations are relatively simple. For example, the SQL [FULL model kind](../concepts/models/model_kinds.md#full) completely replaces existing data each time it is run, so its materialization boils down to executing `CREATE OR REPLACE [table name] AS [your model query]`.
14
14
15
-
For example, a simple custom full-refresh materialization strategy might look like the following:
15
+
The materializations for other kinds, such as [INCREMENTAL BY TIME RANGE](../concepts/models/model_kinds.md#incremental_by_time_range), require additional logic to process the correct time intervals and replace/insert their results into an existing table.
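To make the difference concrete, here is a minimal sketch of the two strategies using Python's built-in `sqlite3` module. It is not SQLMesh's implementation: the table names (`src`, `target`), the `ds` date column, and the helper functions `full_refresh` and `incremental_by_time_range` are hypothetical and chosen only for illustration. A full refresh rebuilds the whole table, while a time-range load deletes and reloads only the rows in the requested interval.

```python
import sqlite3


def full_refresh(conn: sqlite3.Connection, target: str, query: str) -> None:
    """Rebuild the target from scratch.

    SQLite has no CREATE OR REPLACE TABLE, so this drops and recreates.
    """
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} AS {query}")


def incremental_by_time_range(
    conn: sqlite3.Connection, target: str, query: str, start: str, end: str
) -> None:
    """Replace only the rows whose `ds` falls within [start, end]."""
    conn.execute(f"DELETE FROM {target} WHERE ds BETWEEN ? AND ?", (start, end))
    conn.execute(
        f"INSERT INTO {target} SELECT * FROM ({query}) WHERE ds BETWEEN ? AND ?",
        (start, end),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (ds TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20)],
)

# Initial load: the whole table is built from the query.
full_refresh(conn, "target", "SELECT ds, amount FROM src")

# Upstream data changes for one day; only that interval is reprocessed.
conn.execute("UPDATE src SET amount = 25 WHERE ds = '2024-01-02'")
incremental_by_time_range(
    conn, "target", "SELECT ds, amount FROM src", "2024-01-02", "2024-01-02"
)

rows = conn.execute("SELECT ds, amount FROM target ORDER BY ds").fetchall()
```

The incremental path touches only the requested interval, which is the extra bookkeeping a time-range materialization has to carry compared with a full refresh.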
16
16
17
-
```python linenums="1"
18
-
from __future__ import annotations
17
+
A model kind's materialization may differ based on the SQL engine executing the model. For example, PostgreSQL does not support `CREATE OR REPLACE TABLE`, so `FULL` model kinds instead `DROP` the existing table then `CREATE` a new table. SQLMesh already contains the logic needed to materialize existing model kinds on all [supported engines](../integrations/overview.md#execution-engines).
19
18
20
-
import typing as t
19
+
## Overview
20
+
21
+
Custom materializations are analogous to new model kinds. Users [specify them by name](#using-custom-materializations-in-models) in a model definition's `MODEL` block, and they may accept user-specified arguments.
22
+
23
+
A custom materialization must:
24
+
25
+
- Be written in Python code
26
+
- Be a Python class that inherits the SQLMesh `CustomMaterialization` base class
27
+
- Use or override the `insert` method from the SQLMesh [`MaterializableStrategy`](https://github.com/TobikoData/sqlmesh/blob/034476e7f64d261860fd630c3ac56d8a9c9f3e3a/sqlmesh/core/snapshot/evaluator.py#L1146) class/subclasses
28
+
- Be loaded or imported by SQLMesh at runtime
29
+
30
+
A custom materialization may:
31
+
32
+
- Use or override methods from the SQLMesh [`MaterializableStrategy`](https://github.com/TobikoData/sqlmesh/blob/034476e7f64d261860fd630c3ac56d8a9c9f3e3a/sqlmesh/core/snapshot/evaluator.py#L1146) class/subclasses
33
+
- Use or override methods from the SQLMesh [`EngineAdapter`](https://github.com/TobikoData/sqlmesh/blob/034476e7f64d261860fd630c3ac56d8a9c9f3e3a/sqlmesh/core/engine_adapter/base.py#L67) class/subclasses
34
+
- Execute arbitrary SQL code and fetch results with the engine adapter `execute` and related methods
21
35
22
-
from sqlmesh import CustomMaterialization, Model
36
+
A custom materialization may perform arbitrary Python processing with Pandas or other libraries, but in most cases that logic should reside in a [Python model](../concepts/models/python_models.md) instead of the materialization.
23
37
38
+
A SQLMesh project will automatically load any custom materializations present in its `materializations/` directory. Alternatively, the materialization may be bundled into a [Python package](#python-packaging) and installed with standard methods.
39
+
40
+
## Creating a custom materialization
41
+
42
+
Create a new custom materialization by adding a `.py` file containing the implementation to the `materializations/` folder in the project directory. SQLMesh will automatically import all Python modules in this folder at project load time and register the custom materializations. (Find more information about sharing and packaging custom materializations [below](#sharing-custom-materializations).)
43
+
44
+
A custom materialization must be a class that inherits the `CustomMaterialization` base class and provides an implementation for the `insert` method.
45
+
46
+
For example, a minimal full-refresh custom materialization might look like the following:
47
+
48
+
```python linenums="1"
49
+
from sqlmesh import CustomMaterialization # required
50
+
51
+
# argument typing: strongly recommended but optional best practice
table_name: str,  # ": str" is optional argument typing
34
64
query_or_df: QueryOrDF,
35
65
model: Model,
36
66
is_first_insert: bool,
@@ -40,28 +70,29 @@ class CustomFullMaterialization(CustomMaterialization):
40
70
41
71
```
42
72
43
-
Let's unpack the above implementation:
73
+
Let's unpack this materialization:
44
74
45
-
* `NAME` - determines the name of the custom materialization. This name will be used in model definitions to reference a specific strategy. If not specified, the name of the class will be used instead.
46
-
* The `insert` method comes with the following arguments:
47
-
    * `table_name` - the name of a target table (or a view) into which the data should be inserted.
48
-
    * `query_or_df` - a query (a SQLGlot expression) or a DataFrame (pandas, PySpark, or Snowpark) instance which has to be inserted.
49
-
    * `model` - the associated model definition object which can be used to get any model parameters as well as custom materialization settings.
50
-
    * `is_first_insert` - whether this is the first insert for the current version of the model.
51
-
    * `kwargs` - contains additional and future arguments.
52
-
* The `self.adapter` instance is used to interact with the target engine. It comes with a set of useful high-level APIs like `replace_query`, `create_table`, and `table_exists`, but also supports execution of arbitrary SQL expressions with its `execute` method.
75
+
* `NAME` - name of the custom materialization. This name is used to specify the materialization in a model definition `MODEL` block. If not specified in the custom materialization, the name of the class is used in the `MODEL` block instead.
76
+
* The `insert` method has the following arguments:
77
+
    * `table_name` - the name of a target table or view into which the data should be inserted
78
+
    * `query_or_df` - a query (of SQLGlot expression type) or DataFrame (Pandas, PySpark, or Snowpark) instance to be inserted
79
+
    * `model` - the model definition object used to access model parameters and user-specified materialization arguments
80
+
    * `is_first_insert` - whether this is the first insert for the current version of the model (used with batched or multi-step inserts)
81
+
    * `kwargs` - additional and future arguments
82
+
* The `self.adapter` instance is used to interact with the target engine. It comes with a set of useful high-level APIs like `replace_query`, `columns`, and `table_exists`, but also supports executing arbitrary SQL expressions with its `execute` method.
53
83
54
-
You can also control how the associated data objects (tables, views, etc.) are created and deleted by overriding the `create` and `delete` methods accordingly:
84
+
You can control how data objects (tables, views, etc.) are created and deleted by overriding the `MaterializableStrategy` class's `create` and `delete` methods:
55
85
56
86
```python linenums="1"
57
-
from __future__ import annotations
87
+
from sqlmesh import CustomMaterialization # required
58
88
89
+
# argument typing: strongly recommended but optional best practice
# Likely uses `self.adapter` methods like `drop_table` or `drop_view`.
79
112
```
80
113
81
-
## Using custom materializations in models
114
+
## Using a custom materialization
82
115
83
-
In order to use the newly created materialization, use the special model kind `CUSTOM`:
116
+
Specify the model kind `CUSTOM` in a model definition `MODEL` block to use the custom materialization. Specify the `NAME` from the custom materialization code in the `materialization` attribute of the `CUSTOM` kind:
84
117
85
118
```sql linenums="1"
86
119
MODEL (
87
120
name my_db.my_model,
88
-
kind CUSTOM (materialization 'my_custom_full')
121
+
kind CUSTOM (
122
+
materialization 'my_custom_full'
123
+
)
89
124
);
90
125
```
91
126
92
-
The name of the materialization strategy is provided in the `materialization` attribute of the `CUSTOM` kind.
93
-
94
-
Additionally, you can provide an optional list of arbitrary key-value pairs in the `materialization_properties` attribute:
127
+
A custom materialization may accept arguments specified in an array of key-value pairs in the `CUSTOM` kind's `materialization_properties` attribute:
95
128
96
-
```sql linenums="1"
129
+
```sql linenums="1" hl_lines="5-7"
97
130
MODEL (
98
131
name my_db.my_model,
99
132
kind CUSTOM (
@@ -105,7 +138,7 @@ MODEL (
105
138
);
106
139
```
107
140
108
-
These properties can be accessed with the model reference within the materialization implementation:
141
+
The custom materialization implementation accesses the `materialization_properties` via the `model` object's `custom_materialization_properties` dictionary:
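As a rough illustration of that lookup, the sketch below uses a stand-in `FakeModel` class instead of a real SQLMesh `Model` so that it runs without SQLMesh installed. Only the `custom_materialization_properties` attribute name comes from the text above; `FakeModel`, `read_config`, and the `config_key` property are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FakeModel:
    """Hypothetical stand-in exposing the one attribute this sketch needs."""

    custom_materialization_properties: Dict[str, Any] = field(default_factory=dict)


def read_config(model: FakeModel, key: str, default: Any = None) -> Any:
    """Look up a user-specified materialization argument, with a fallback."""
    return model.custom_materialization_properties.get(key, default)


# Simulates a model whose MODEL block set one materialization property.
model = FakeModel(custom_materialization_properties={"config_key": "config_value"})

config_value = read_config(model, "config_key")
missing = read_config(model, "not_set", default="fallback")
```

Reading properties with a default keeps the materialization usable by models that omit an optional argument.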
# Example for existing materialization for look and feel: https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/snapshot/evaluator.py
157
+
# Example existing materialization for look and feel: https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/snapshot/evaluator.py
125
158
```
126
159
127
-
## Packaging custom materializations
160
+
## Sharing custom materializations
161
+
162
+
### Copying files
163
+
164
+
The simplest (but least robust) way to use a custom materialization in multiple SQLMesh projects is for each project to place a copy of the materialization's Python code in its `materializations/` directory.
165
+
166
+
If you use this approach, we strongly recommend storing the materialization code in a version-controlled repository and creating a reliable method of notifying users when it is updated.
167
+
168
+
This approach may be appropriate for smaller organizations, but it is not robust.
169
+
170
+
### Python packaging
128
171
129
-
To share custom materializations across multiple SQLMesh projects, you need to create and publish a Python package containing your implementation.
172
+
A more complex (but robust) way to use a custom materialization in multiple SQLMesh projects is to create and publish a Python package containing the implementation.
130
173
131
-
When using SQLMesh with Airflow or other external schedulers, note that the `materializations/` folder might not be available on the Airflow cluster side. Therefore, you'll need a package that can be installed there.
174
+
One scenario that requires Python packaging is when a SQLMesh project uses Airflow or other external schedulers, and the scheduler cluster does not have the `materializations/` folder available. The cluster will use standard Python package installation methods to import the custom materialization.
132
175
133
-
Custom materializations can be packaged into a Python package and exposed via [setuptools entrypoints](https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata) mechanism. Once the package is installed, SQLMesh will automatically load custom materializations from the entrypoint list.
176
+
Package and expose custom materializations with the [setuptools entrypoints](https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata) mechanism. Once the package is installed, SQLMesh will automatically load custom materializations from the entrypoint list.
134
177
135
-
If your custom materialization class is defined in the `my_package/my_materialization.py` module, you can expose it as an entry point in the `pyproject.toml` file as follows:
178
+
For example, if your custom materialization class is defined in the `my_package/my_materialization.py` module, you can expose it as an entrypoint in the `pyproject.toml` file as follows:
136
179
137
180
```toml
138
181
[project.entry-points."sqlmesh.materializations"]
@@ -152,4 +195,4 @@ setup(
152
195
)
153
196
```
154
197
155
-
Refer to the [custom_materializations](https://github.com/TobikoData/sqlmesh/tree/main/examples/custom_materializations) package example for more details.
198
+
Refer to the SQLMesh GitHub [custom_materializations](https://github.com/TobikoData/sqlmesh/tree/main/examples/custom_materializations) example for more details on Python packaging.
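For intuition about how entrypoint-based discovery works in general, the sketch below lists whatever is registered under the `sqlmesh.materializations` group using the standard library's `importlib.metadata`. This is not SQLMesh's own loading code, and the `discover_materializations` helper is a hypothetical name. On a machine where no package registers that group, it simply returns an empty dictionary.

```python
from importlib.metadata import entry_points
from typing import Any, Dict


def discover_materializations(
    group: str = "sqlmesh.materializations",
) -> Dict[str, Any]:
    """Load every object registered under the given entry point group."""
    eps = entry_points()
    # Newer Python versions expose `.select`; older ones return a plain dict.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])

    found: Dict[str, Any] = {}
    for ep in selected:
        # `load()` imports the module and returns the referenced object.
        found[ep.name] = ep.load()
    return found


materializations = discover_materializations()
print(sorted(materializations))
```

Running this after installing a package that declares the entry point is a quick way to check that the registration in `pyproject.toml` or `setup.py` was picked up.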