Commit aec6b00

Docs: add observer guide (#1803)
* Add observer guide
* Add table chart toggle, sqlmesh observe command
* Optimize image
1 parent 14e5d8a commit aec6b00

22 files changed: +210 -0 lines changed
docs/guides/observer.md

Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
1+
# SQLMesh Observer
2+
3+
Data pipelines break. Upstream sources change without warning, buggy code gets merged, and cloud services randomly time out. These problems are ubiquitous, and someone is responsible for fixing them (probably you if you're reading this).
4+
5+
SQLMesh Observer provides the information you need to rapidly detect, understand, and remedy problems with SQLMesh data transformation pipelines.
6+
7+
This page describes how to install, run, and use SQLMesh Observer.
8+
9+
## The Challenge
10+
11+
Remediating problems with data pipelines is challenging because there are so many potential causes. For transformation pipelines, those range from upstream source timeouts to SQL query errors to Python library conflicts (and more!).
12+
13+
A useful observation tool should enable answering the following questions:
14+
15+
- Did a problem occur?
16+
- When did it occur?
17+
- What type of problem is it?
18+
- Where is the problem coming from?
19+
- What is causing the problem?
20+
21+
SQLMesh Observer supports answering these questions in four ways:
22+
23+
1. Automatically [notifying users](./notifications.md) if a problem occurs
24+
2. Capturing, storing, and displaying historical measures to reveal when a problem occurred
25+
3. Enabling easy navigation from aggregated to granular information about pipeline components to identify the problem source
26+
4. Centralizing error information from multiple sources to debug the problem
27+
28+
## Measures
29+
30+
SQLMesh Observer automatically captures and stores measures from all SQLMesh actions. We now briefly review the SQLMesh workflow before describing the different measures Observer captures.
31+
32+
### SQLMesh workflow
33+
34+
The core of a SQLMesh project is its **models**. Roughly, each model consists of one SQL query and metadata that tells SQLMesh about how the model should be processed.
35+
36+
Each model may have **audits** that validate the data returned by a model (e.g., verifying that a column contains no `NULL` values). By default, SQLMesh will stop running a project if an audit fails for any of its models.
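
To make the `NULL` example concrete, here is a minimal sketch of the idea behind such an audit. It uses plain Python with an in-memory SQLite database rather than SQLMesh's own audit syntax, and it assumes an audit can be expressed as a query that selects the offending rows, so an empty result means the audit passes. The `orders` table, `price` column, and `failing_rows` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def failing_rows(conn, table, column):
    """Return the rows where `column` is NULL; an empty list means the check passes."""
    # Identifiers are interpolated directly, so only pass trusted table/column names.
    query = f"SELECT * FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchall()


# Hypothetical model output with one bad row (id 2 has no price).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, None), (3, 4.5)])

bad_rows = failing_rows(conn, "orders", "price")
audit_passed = len(bad_rows) == 0
print(f"audit passed: {audit_passed}, failing rows: {bad_rows}")
```

Returning the offending rows, rather than a bare pass/fail flag, keeps the evidence needed to debug the failure.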
37+
38+
When you run a project on a SQL engine, you must choose an **environment** in which to run it. Environments allow people to modify projects in an isolated space that won't interfere with anyone else (or the version of the project running in production).
39+
40+
SQLMesh stores a unique fingerprint of the project's content on each run so it can determine if any of that content has changed the next time you run it in that environment.
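
The general idea of content fingerprinting can be sketched in a few lines of Python. This is not SQLMesh's actual fingerprinting algorithm, which this guide does not describe. It only illustrates how hashing content lets a tool detect changes between runs, and the `fingerprint` helper and the example queries are hypothetical.

```python
import hashlib


def fingerprint(model_text: str) -> str:
    """Hash the model text so that any edit produces a different fingerprint."""
    return hashlib.sha256(model_text.encode("utf-8")).hexdigest()


# Fingerprint stored on the previous run.
stored = fingerprint("SELECT id, price FROM orders")

# On the next run, recompute and compare.
unchanged = fingerprint("SELECT id, price FROM orders")
edited = fingerprint("SELECT id, price, qty FROM orders")

print("unchanged content differs from stored:", stored != unchanged)
print("edited content differs from stored:", stored != edited)
```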
41+
42+
When a project's content has changed, an environment is updated to reflect those changes with a SQLMesh **plan**. The plan identifies all the changes and determines which data will be affected by them so it only has to re-run the relevant models.
43+
44+
After changes have been applied with a plan, the project is **run** on a schedule to process new data that has arrived since the previous run.
45+
46+
The five entities in bold - models, audits, environments, runs, and plans - provide the information SQLMesh Observer captures to help you efficiently identify and remediate problems with your transformation pipeline.
47+
48+
### Data
49+
50+
We now describe the specific measures SQLMesh Observer captures about each entity.
51+
52+
SQLMesh performs its primary actions during **plans** and **runs**, so most measures are generated when they occur. Both plans and runs are executed in a specific **environment**, so all of their measures are environment-specific.
53+
54+
These measures are recorded and stored for each plan or run in a specific environment:
55+
56+
- When it began and ended
57+
- Total run time
58+
- Whether it failed
59+
- Whether and how any model audits failed
60+
- The model versions evaluated during the plan/run
61+
- Each model's run time
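
One way to picture the measures listed above is as a single record per plan or run. The sketch below is purely illustrative: the `ExecutionMeasures` class and its field names are hypothetical and do not reflect how Observer stores its data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ExecutionMeasures:
    """Hypothetical record of the measures captured for one plan or run."""

    environment: str
    kind: str  # "plan" or "run"
    began: datetime
    ended: datetime
    failed: bool = False
    audit_failures: list[str] = field(default_factory=list)
    model_versions: dict[str, str] = field(default_factory=dict)
    model_run_times: dict[str, timedelta] = field(default_factory=dict)

    @property
    def total_run_time(self) -> timedelta:
        # Total run time follows directly from the start and end timestamps.
        return self.ended - self.began


record = ExecutionMeasures(
    environment="dev",
    kind="run",
    began=datetime(2023, 12, 4, 9, 0, 0),
    ended=datetime(2023, 12, 4, 9, 2, 30),
    model_run_times={"orders": timedelta(seconds=90)},
)
print(record.total_run_time)
```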
62+
63+
Additionally, you can define custom measures that will be captured for each model. Defined with a SQL query, the measures are calculated for each model in the project. For example, you might record the total number of rows returned by each model so you can detect significant increases or decreases over time.
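
As a rough illustration of why a row-count measure is useful, the following sketch flags large swings in a series of row counts. It is not part of Observer: the `significant_changes` helper, the 50% threshold, and the sample counts are all assumptions made for the example.

```python
def significant_changes(row_counts, threshold=0.5):
    """Return (index, relative_change) pairs where a count moved more than `threshold`
    relative to the previous value."""
    flagged = []
    for i in range(1, len(row_counts)):
        previous, current = row_counts[i - 1], row_counts[i]
        if previous == 0:
            continue  # relative change is undefined without a nonzero baseline
        change = (current - previous) / previous
        if abs(change) > threshold:
            flagged.append((i, change))
    return flagged


# Hypothetical row counts recorded for one model over five runs.
daily_row_counts = [1000, 1040, 990, 400, 410]
for index, change in significant_changes(daily_row_counts):
    print(f"run {index}: row count changed by {change:.0%}")
```

Here only the fourth run is flagged, because its row count dropped by roughly 60% compared with the run before it.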
64+
65+
## Installation
66+
67+
SQLMesh Observer is part of the `sqlmesh-enterprise` Python library and is installed via `pip`.
68+
69+
Installation requires a license key provided by Tobiko Data. You include the license key in the `pip` install command executed from the command line. The key is quite long, so we recommend placing it in a file that the installation command reads. In this example, we have stored the key in a `txt` file:
70+
71+
![SQLMesh Enterprise key stored in txt file](./observer/observer_key-file.png)
72+
73+
Install the library with the following command. The key is passed to the `--extra-index-url` argument, either by pasting it directly into the command or by reading it from the file with an embedded `cat` command. Replace `<path to key file>` with the path to your key file:
74+
75+
```bash
pip install "sqlmesh-enterprise" --extra-index-url "$(cat <path to key file>)"
```
78+
79+
`sqlmesh-enterprise` works by overriding components of `sqlmesh` open source, and installing `sqlmesh-enterprise` will automatically install open-source `sqlmesh`.
80+
81+
SQLMesh extras, such as SQL engine drivers, can be passed directly to the `sqlmesh-enterprise` installation command. This example installs the SQLMesh Slack notification and Snowflake engine driver extras:
82+
83+
```bash
pip install "sqlmesh-enterprise[slack,snowflake]" --extra-index-url "$(cat <path to key file>)"
```
86+
87+
NOTE: `sqlmesh-enterprise` will not function properly if open-source `sqlmesh` is installed after it.
88+
89+
## Startup
90+
91+
As with the open-source [SQLMesh Browser UI](../quickstart/ui.md), SQLMesh Observer is started from the command line and then opened in a web browser.
92+
93+
First, navigate to your project directory in the CLI. Then start Observer by running the `sqlmesh observe` command:
94+
95+
```bash
sqlmesh observe
```
98+
99+
After starting up, SQLMesh Observer is served at `http://127.0.0.1:8000` by default:
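
If you want to confirm from a script that something is answering at that address, a generic HTTP check like the sketch below may help. It uses only the Python standard library and is not a SQLMesh feature. The `is_serving` helper is a hypothetical name, and the check only tells you that some HTTP server responded, not that it is Observer.

```python
import urllib.error
import urllib.request


def is_serving(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # a server answered, just with an error status code
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timed out, or otherwise unreachable


if __name__ == "__main__":
    # Default address from this guide; adjust if you serve Observer elsewhere.
    print(is_serving("http://127.0.0.1:8000"))
```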
100+
101+
![SQLMesh Observer startup on CLI](./observer/observer_cli.png)
102+
103+
Navigate to the URL by clicking the link in your terminal (if supported) or copy-pasting it into your web browser:
104+
105+
![SQLMesh Observer dashboard interface](./observer/observer_dashboard.png)
106+
107+
## Interface
108+
109+
We now describe the components of the SQLMesh Observer user interface.
110+
111+
### Dashboard
112+
113+
The "Dashboard" page is displayed when Observer starts - it consists of the following components:
114+
115+
1. Links to the other two pages, "Environments" and "Plan Applications," in the top left
116+
2. Counts and links to key information about environments, models, and plans in the top center
117+
3. Interactive chart of historical `run` run times in the middle center
118+
4. Interactive chart of historical audit failure counts in the bottom left
119+
5. Interactive chart of historical `run` failures in the bottom right
120+
121+
![SQLMesh Observer dashboard](./observer/observer_dashboard-components.png)
122+
123+
### Charts
124+
125+
Observer presents historical information via charts and tables. Most charts represent time on the x-axis and share the same appearance and user options.
126+
127+
In a chart's top left corner is the `Time` selector, which sets the range of the x-axis. For example, the first chart displays 1 week of data, from November 27 through December 4. The second chart displays the same data but includes 3 months of historical data beginning on September 4:
128+
129+
![SQLMesh Observer chart x-axis time selector](./observer/observer_chart-time-selector.png)
130+
131+
In a chart's top right corner is the `Scale` selector, which toggles between a linear and log y-axis scale. A log scale may be helpful for comparing highly variable data series over time. This example displays the data from the second chart in the previous figure with a log y-axis scale:
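
To see why a log scale helps with highly variable series, consider this small sketch with made-up run times. Values that grow by a constant factor end up evenly spaced on a base-10 log axis, so small and large values stay readable on the same chart. The numbers are illustrative and are not taken from Observer.

```python
import math

# Hypothetical run times in seconds, each 10x the previous one.
run_times_seconds = [2, 20, 200, 2000]

# Position of each value on a base-10 log axis.
log_positions = [math.log10(value) for value in run_times_seconds]

for value, position in zip(run_times_seconds, log_positions):
    print(f"{value:>5} s -> log10 = {position:.2f}")
```

On a linear axis the first three points would be squeezed into the bottom tenth of the chart, while on the log axis each step occupies the same vertical distance.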
132+
133+
![SQLMesh Observer chart y-axis scale selector](./observer/observer_chart-scale-selector.png)
134+
135+
Charts also display the data underlying a specific data point when the mouse hovers over it:
136+
137+
![SQLMesh Observer chart mouse hover](./observer/observer_chart-hover.png)
138+
139+
Many charts display purple `Plan` markers, which provide contextual information about when changes to the project occurred. Clicking on the marker will open a page containing [more information about the plan](#plan-applications).
140+
141+
Some Observer tables include a button that toggles a chart of the measures in the table:
142+
143+
![SQLMesh Observer table chart toggle](./observer/observer_table-chart-toggle.png)
144+
145+
146+
### Environments
147+
148+
Access the `Environments` landing page via the navigation links in the dashboard's top left. It displays a table listing each SQLMesh environment, the date it was created, the date it was last updated, and the date it expires (after which the SQLMesh janitor will delete it). The `prod` environment is always present and has no expiration date.
149+
150+
![SQLMesh Observer environment landing page](./observer/observer_environments-landing.png)
151+
152+
Clicking an environment's name in the table opens the environment's information page. The page begins with historical charts of run time, audit failures, and evaluation failures:
153+
154+
![SQLMesh Observer environment information page](./observer/observer_environments-info-1.png)
155+
156+
The page continues with lists of recent audit failures, evaluation failures, and model evaluations:
157+
158+
![SQLMesh Observer environment information: recent occurrences](./observer/observer_environments-info-2.png)
159+
160+
The page finishes with a list of models that differ from those currently in the `prod` environment, a list of the audits that have historically failed most frequently, a list of the models that have historically failed most frequently, and a list of the models with the longest run times:
161+
162+
![SQLMesh Observer environment information: historical outliers](./observer/observer_environments-info-3.png)
163+
164+
Each model differing from the `prod` environment may be expanded to view the text diff between the two. The models are listed separately based on whether the plan directly or indirectly modified them, and breaking changes are indicated with an orange "Breaking" label:
165+
166+
![SQLMesh Observer environment information: model text diff](./observer/observer_environments-info-prod-diff.png)
167+
168+
### Plan Applications
169+
170+
Access the `Plan Applications` landing page via the navigation links in the dashboard's top left. It displays a table listing each SQLMesh project plan that has been applied and includes the following information about each:
171+
172+
- Plan ID
173+
- Previous plan ID (most recent plan executed prior)
174+
- Environment to which the plan was applied (with link to environment information page)
175+
- A count of models in the plan (with link to the plan's models)
176+
- Whether the plan included model restatements
177+
- Whether the plan was in forward-only mode
178+
- The start and end dates of the time interval covered by the plan
179+
- The start and end times of the plan application
180+
181+
![SQLMesh Observer plans list](./observer/observer_plans-list.png)
182+
183+
Clicking a Plan ID opens its information page, which lists the information included in the landing page table and links to models added or modified by the plan:
184+
185+
![SQLMesh Observer plan information page](./observer/observer_plans-information.png)
186+
187+
Modified models can be expanded to display a text diff of the change:
188+
189+
![SQLMesh Observer plan text diff](./observer/observer_plans-text-diff.png)
190+
191+
### Models
192+
193+
A model can change over time, so its information is associated with a specific SQLMesh environment and plan. Access a model's page via links in a plan or environment page.
194+
195+
The model information page begins with historical charts of model run time, audit failures, and evaluation failures:
196+
197+
![SQLMesh Observer model charts](./observer/observer_model-information-1.png)
198+
199+
It continues with details about the model, including its metadata (e.g., model dialect and kind), model text, and list of previous model versions and text diffs:
200+
201+
![SQLMesh Observer model details](./observer/observer_model-information-2.png)
202+
203+
Next, the Loaded Intervals section displays the time intervals that have been loaded and are currently present in the model's physical table, and the Recent Model Evaluations section lists the time interval each evaluation processed and the evaluation's start and end times:
204+
205+
![SQLMesh Observer model time intervals](./observer/observer_model-information-3.png)
206+
207+
The model information page concludes with lists of the audits the model has failed most frequently, the time intervals that have failed most frequently, and the longest historical model run times:
208+
209+
![SQLMesh Observer historical outliers](./observer/observer_model-information-4.png)