6 changes: 6 additions & 0 deletions config/_default/menus/main.en.yaml
@@ -5070,6 +5070,12 @@ menu:
identifier: data_catalog
parent: data_observability_heading
weight: 65000
- name: Lineage
url: data_observability/lineage/
pre: data-observability-wui
identifier: data_lineage
parent: data_observability_heading
weight: 67500
- name: Quality Monitoring
url: data_observability/quality_monitoring/
pre: data-observability-wui
4 changes: 4 additions & 0 deletions content/en/data_observability/_index.md
@@ -5,6 +5,9 @@ further_reading:
- link: '/data_observability/data_catalog/'
tag: 'Documentation'
text: 'Data Catalog'
- link: '/data_observability/lineage/'
tag: 'Documentation'
text: 'Lineage'
- link: '/data_observability/quality_monitoring/'
tag: 'Documentation'
text: 'Quality Monitoring'
@@ -32,6 +35,7 @@ Data Observability (DO) helps data teams improve the reliability of data for ana

{{< whatsnext desc="Data Observability consists of the following:" >}}
{{< nextlink href="/data_observability/data_catalog/" >}}Data Catalog: Browse and search a centralized inventory of your data assets across connected integrations.{{< /nextlink >}}
{{< nextlink href="/data_observability/lineage/" >}}Lineage: Trace upstream dependencies and downstream consumers across your data stack.{{< /nextlink >}}
{{< nextlink href="/data_observability/quality_monitoring/" >}}Quality Monitoring: Identify data issues before downstream BI and AI applications are impacted.{{< /nextlink >}}
{{< nextlink href="/data_observability/jobs_monitoring/" >}}Jobs Monitoring: Observe, troubleshoot, and optimize jobs across your data pipelines.{{< /nextlink >}}
{{< /whatsnext >}}
7 changes: 6 additions & 1 deletion content/en/data_observability/data_catalog.md
@@ -8,6 +8,9 @@ further_reading:
- link: '/data_observability/quality_monitoring/'
tag: 'Documentation'
text: 'Quality Monitoring'
- link: '/data_observability/lineage/'
tag: 'Documentation'
text: 'Lineage'
- link: '/data_observability/jobs_monitoring/'
tag: 'Documentation'
text: 'Jobs Monitoring'
@@ -27,7 +30,7 @@ When you open the catalog at [/data-obs/catalog](https://app.datadoghq.com/data-
- **Links to the source system**: direct references back to the origin platform so you can navigate from the catalog to the source in one click
- **Tags**: `key:value` metadata pairs pulled from the source system if available
- **Monitor Status**: displays the state of any active [Data Quality Monitors](/data_observability/quality_monitoring/) on the asset
- **Lineage**: upstream and downstream dependencies, where supported by the integration
- **Lineage**: upstream and downstream dependencies, where supported by the integration. To explore lineage across assets, see [Lineage][1].

Use the left sidebar to filter assets by type: {{< ui >}}All assets{{< /ui >}}, {{< ui >}}Databases{{< /ui >}}, {{< ui >}}Schemas{{< /ui >}}, or {{< ui >}}Tables{{< /ui >}}. Connected integrations (such as Snowflake, dbt, and BigQuery) are also listed individually in the sidebar.

@@ -45,3 +48,5 @@ Wildcards and unions are also supported:
- **Intersection**: `dim_zendesk AND data_owner:TS-OPS-ANALYTICS`

Recent searches are saved and surfaced in the dropdown for quick reuse.

[1]: /data_observability/lineage/
5 changes: 5 additions & 0 deletions content/en/data_observability/jobs_monitoring/_index.md
@@ -4,6 +4,9 @@ description: "Monitor performance, reliability, and cost efficiency of data proc
aliases:
- /data_jobs/
further_reading:
- link: '/data_observability/lineage/'
tag: 'Documentation'
text: 'Lineage'
- link: '/data_streams'
tag: 'Documentation'
text: 'Data Streams Monitoring'
@@ -19,6 +22,7 @@ Data Observability: Jobs Monitoring provides visibility into the performance, re
- Track the health and performance of data processing jobs across your accounts and workspaces. See which take up the most compute resources or have inefficiencies.
- Receive an alert when a job fails—or when a job is taking too long to complete.
- Analyze job execution details and stack traces.
- Use [Lineage][2] to assess the upstream causes and downstream impact of failing or delayed jobs.
- Correlate infrastructure metrics, Spark metrics from the Spark UI, logs, and cluster configuration.
- Compare multiple runs to facilitate troubleshooting, and to optimize provisioning and configuration during deployment.

@@ -71,3 +75,4 @@ To determine why a stage is taking a long time to complete, you can use the {{<
{{< partial name="whats-next/whats-next.html" >}}

[1]: https://app.datadoghq.com/monitors/templates
[2]: /data_observability/lineage/
132 changes: 132 additions & 0 deletions content/en/data_observability/lineage.md
@@ -0,0 +1,132 @@
---
title: Lineage
description: Trace upstream dependencies and downstream consumers across data assets, jobs, dashboards, and applications.
further_reading:
- link: "/data_observability/data_catalog/"
tag: "Documentation"
text: "Data Catalog"
- link: "/data_observability/quality_monitoring/"
tag: "Documentation"
text: "Quality Monitoring"
- link: "/data_observability/jobs_monitoring/"
tag: "Documentation"
text: "Jobs Monitoring"
- link: "https://www.datadoghq.com/blog/data-lineage/"
tag: "Blog"
text: "Understanding data lineage"
---

## Overview

Lineage shows how data flows through your stack—from source systems and warehouse tables, through transformations and jobs, to the dashboards and applications that consume it. Use it to trace quality issues to their root cause, assess the blast radius of a failing job or a planned schema change, and route incidents to the right owner.

Datadog builds lineage automatically from metadata collected through your [Quality Monitoring][1] and [Jobs Monitoring][2] integrations (Snowflake, BigQuery, Databricks, dbt, Airflow, Fivetran, Looker, Tableau, and others). Any asset in the Data Catalog can appear in the graph.

{{< img src="data_observability/lineage/lineage-overview.png" alt="The Lineage page showing upstream and downstream dependencies for an anchored Snowflake table" style="width:100%;" >}}

To open Lineage, go to **Data Observability > Lineage**.

## Select anchor assets

Every lineage view centers on one or more **anchors**: the assets whose upstream and downstream neighbors you want to explore. Datadog marks each anchor node with an `ANCHOR` badge.

To set an anchor, use the search bar at the top of the page:

1. Choose an asset type from the **Any asset** dropdown (for example, **Table**, **Column**, **Dashboard**, or **Job**). Leave it set to **Any asset** to search across all types.
2. Enter the asset name. Datadog searches all connected sources in the Data Catalog.
3. Select a result to anchor the graph.

**One Anchor**
{{< img src="data_observability/lineage/anchor-search-bar-one.png" alt="The anchor search bar with one anchor selected" style="width:100%;" >}}

**Multiple Anchors**
{{< img src="data_observability/lineage/anchor-search-bar-multiple.png" alt="The anchor search bar with multiple anchors selected" style="width:100%;" >}}

**Search Query**
{{< img src="data_observability/lineage/anchor-search-bar-query.png" alt="The anchor search bar with multiple anchors selected via a search query" style="width:100%;" >}}

The graph renders with the selected anchors at its center.

## Navigate the graph

After you set an anchor, the lineage graph renders in the main panel. Upstream dependencies appear to the left; downstream consumers appear to the right. Each node shows the asset's name, type, source, and basic stats such as row or column count where available.

The toolbar on the right of the canvas provides **zoom in**, **zoom out**, **Reset view**, and **Center anchors**.

The time selector in the top-right corner (`1w`, `Past 1 Week`, and so on) sets the window used to evaluate lineage. Datadog derives relationships from query history and job runs within this window. Widen the window to surface older or less frequent dependencies; narrow it to show only dependencies that are currently active.
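
As a rough mental model of the time window, the sketch below keeps only the relationships observed inside the selected window. It is an illustration only, not Datadog's implementation: the `OBSERVATIONS` data, the asset names, and the `edges_in_window` helper are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical observations: (upstream, downstream, time the relationship
# was last seen in query history or a job run). Not real Datadog data.
OBSERVATIONS = [
    ("stg.orders", "mart.revenue", datetime(2024, 6, 9)),
    ("stg.refunds", "mart.revenue", datetime(2024, 5, 1)),
]


def edges_in_window(observations, now, window):
    """Return the (upstream, downstream) pairs seen within `window` of `now`."""
    cutoff = now - window
    return {
        (upstream, downstream)
        for upstream, downstream, seen_at in observations
        if cutoff <= seen_at <= now
    }


now = datetime(2024, 6, 10)
past_week = edges_in_window(OBSERVATIONS, now, timedelta(weeks=1))
past_two_months = edges_in_window(OBSERVATIONS, now, timedelta(days=60))
```

In this sketch, the one-week window drops the `stg.refunds` relationship because it was last seen more than a week before `now`; the wider window brings it back.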

## Lineage Controls

The **Lineage Controls** panel on the left configures the shape and contents of the graph.

### Map, List, and Find

Toggle between **Map** (the default graph view) and **List** (a flat, sortable list of every asset in the current slice, that is, the set of assets loaded for the current anchors and depth). Use **List** to export, copy, or scan a large lineage; use **Map** to understand structure visually.

The magnifying-glass icon next to the toggle fits the graph to the viewport.

The **Find in map** search box highlights nodes in the current graph by name. Unlike the top-of-page search, it does not change the anchor—it only locates nodes already on screen.

### Depth

**Depth** controls how many hops of lineage to load on either side of the anchor.

- The left selector sets **upstream depth** (levels of parents).
- The right selector sets **downstream depth** (levels of children).
- Set either to `∞` to load all available hops in that direction.

{{< img src="data_observability/lineage/depth-controls.png" alt="The Depth controls showing upstream and downstream selectors flanking the ANCHOR badge" style="width:40%;" >}}

Increase depth to find a distant root cause or downstream consumer. Decrease depth when the graph is too dense to navigate.
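
Conceptually, depth behaves like a hop limit on a graph traversal that starts at the anchor. The sketch below is a minimal illustration under that assumption, not Datadog's implementation: the `EDGES` list, the asset names, and the `neighbors_within` helper are hypothetical, and `math.inf` stands in for the `∞` setting.

```python
import math
from collections import deque

# Hypothetical lineage edges as (upstream, downstream) pairs.
EDGES = [
    ("raw.orders", "stg.orders"),
    ("stg.orders", "mart.revenue"),
    ("raw.customers", "stg.customers"),
    ("stg.customers", "mart.revenue"),
    ("mart.revenue", "dash.revenue_overview"),
]


def neighbors_within(anchor, edges, depth, direction):
    """Return the assets reachable from `anchor` in at most `depth` hops.

    `direction` is "upstream" (walk edges backward) or "downstream".
    Pass math.inf as `depth` to load every reachable hop.
    """
    adjacency = {}
    for parent, child in edges:
        src, dst = (child, parent) if direction == "upstream" else (parent, child)
        adjacency.setdefault(src, []).append(dst)

    seen = {anchor}
    found = set()
    queue = deque([(anchor, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops >= depth:
            continue  # hop limit reached; do not expand this node further
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, hops + 1))
    return found


one_hop_up = neighbors_within("mart.revenue", EDGES, 1, "upstream")
all_up = neighbors_within("mart.revenue", EDGES, math.inf, "upstream")
no_down = neighbors_within("mart.revenue", EDGES, 0, "downstream")
```

Here, an upstream depth of `1` returns only the two staging tables, an unlimited upstream depth also returns the raw tables, and a downstream depth of `0` returns nothing on that side.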

### Filter

The **Filter** section controls which asset types are displayed. For Snowflake, the available types are **Column** and **Table**. BI integrations add dashboards and reports, and job integrations add tasks and DAGs. The number next to each type shows how many assets of that type exist in the current slice.

Filter when the slice contains the right assets but the graph is too noisy. For example, when scoping the blast radius of a column change, uncheck **Table** to remove table-level clutter and leave **Column** checked.

Filtering does not change the anchor or the depth—it only hides nodes from the rendered graph.
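
The distinction between the loaded slice and the rendered graph can be sketched as follows. This is an illustrative model only: the `SLICE` mapping and both helper functions are hypothetical and do not reflect Datadog's internals.

```python
from collections import Counter

# Hypothetical slice: asset name -> asset type.
SLICE = {
    "analytics.orders": "Table",
    "analytics.orders.amount": "Column",
    "analytics.orders.status": "Column",
    "analytics.customers": "Table",
}


def type_counts(slice_assets):
    """Count assets per type, like the numbers shown next to each filter."""
    return Counter(slice_assets.values())


def visible_nodes(slice_assets, checked_types):
    """Return the assets to render; the slice itself is left untouched."""
    return {name for name, kind in slice_assets.items() if kind in checked_types}


counts = type_counts(SLICE)
columns_only = visible_nodes(SLICE, {"Column"})
```

Unchecking **Table** in this model changes only `columns_only`; `SLICE` and its per-type counts stay the same.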

### Group by

**Group by** sets the level of aggregation. Available levels depend on the source. For Snowflake, you can group by **Accounts**, **Databases**, **Schemas**, **Tables**, or **Columns**.

Grouping is most useful for zooming out: group by **Schemas** to see how data flows across a warehouse, then drill down to **Tables** or **Columns** after you find the area of interest.
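
One way to picture grouping is as a roll-up of fine-grained edges into coarser ones. The sketch below assumes fully qualified `database.schema.table` names and assumes that edges falling inside a single group are dropped; both are assumptions made for illustration, and `TABLE_EDGES` and `group_edges` are hypothetical names.

```python
# Hypothetical table-level edges using database.schema.table names.
TABLE_EDGES = [
    ("prod.raw.orders", "prod.staging.orders"),
    ("prod.raw.customers", "prod.staging.customers"),
    ("prod.staging.orders", "prod.staging.orders_dedup"),
    ("prod.staging.orders_dedup", "prod.marts.revenue"),
    ("prod.staging.customers", "prod.marts.revenue"),
]


def group_edges(edges, level):
    """Roll edges up to the first `level` name parts and deduplicate them.

    level=1 groups by database, level=2 by schema, level=3 by table.
    Edges whose endpoints land in the same group are dropped (an assumption).
    """

    def group_key(name):
        return ".".join(name.split(".")[:level])

    grouped = set()
    for upstream, downstream in edges:
        up_group, down_group = group_key(upstream), group_key(downstream)
        if up_group != down_group:
            grouped.add((up_group, down_group))
    return grouped


by_schema = group_edges(TABLE_EDGES, 2)
by_table = group_edges(TABLE_EDGES, 3)
```

At the schema level, the five table edges collapse into two: `prod.raw` feeds `prod.staging`, and `prod.staging` feeds `prod.marts`. Drilling down to the table level restores the original edges.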

## Common workflows

### Root cause analysis

When a downstream asset—a dashboard, a model, an ML feature—is broken or stale, lineage helps you walk backward to the source.

1. Anchor on the broken asset (a Looker dashboard, a Snowflake table, a dbt model).
2. Set downstream depth to `0` to focus on upstream assets.
3. Group by **Tables** for the broad structure; switch to **Columns** if the issue is at column level.
4. Step backward through upstream nodes. Failures, freshness anomalies, and schema changes flagged by Quality Monitoring or Jobs Monitoring appear as status indicators on the graph.
5. Open a flagged node to jump to its quality monitors, recent job runs, or schema history.

### Impact analysis (blast radius)

Before changing or dropping a column, table, or model, use lineage to see what depends on it.

1. Anchor on the asset you plan to change.
2. Set downstream depth to `∞` and upstream depth to `0`.
3. Filter to the asset types you care about—for example, leave dashboards and reports visible to identify affected BI consumers.
4. Switch to **List** view to export the full list of affected assets or share it with the owning teams.

{{< img src="data_observability/lineage/impact-analysis-list-view.png" alt="The List view showing every downstream asset that depends on a given Snowflake table, with type and source columns" style="width:100%;" >}}

### Tracing a column end-to-end

Most supported integrations provide column-level lineage, which lets you follow a single column from its source to the assets that consume it.

1. In the search bar, change the asset type to **Column** and search for the column to trace.
2. Anchor on the result.
3. Group by **Columns** to keep the graph at column granularity.

## Further Reading

{{< partial name="whats-next/whats-next.html" >}}

[1]: /data_observability/quality_monitoring/
[2]: /data_observability/jobs_monitoring/
5 changes: 5 additions & 0 deletions content/en/data_observability/quality_monitoring/_index.md
@@ -10,6 +10,9 @@ further_reading:
- link: '/data_observability/jobs_monitoring'
tag: 'Documentation'
text: 'Jobs Monitoring'
- link: '/data_observability/lineage/'
tag: 'Documentation'
text: 'Lineage'
---

## Overview
@@ -23,6 +26,7 @@ With Quality Monitoring, you can:
- Surface changes in column-level metrics such as null counts or uniqueness
- Set up monitors using static thresholds or historical baselines
- Trace quality issues using lineage views that show upstream jobs and downstream impact
- Explore end-to-end dependencies in [Lineage][1] to identify root causes and downstream impact

## Supported data sources

@@ -65,3 +69,4 @@ With Quality Monitoring, you can:

{{< partial name="whats-next/whats-next.html" >}}

[1]: /data_observability/lineage/