Skip to content

BigQuery: FROM UNNEST(...) AS alias not recognized as a scope source, causing incorrect column lineage #209

@eitsupi

Description

@eitsupi

Summary

When analyzing BigQuery SQL that uses FROM UNNEST(...) AS alias (lateral unnest), the lineage tracker fails to recognize the UNNEST as a scope source. Column references to the UNNEST alias in the SELECT list become unresolved leaf nodes, producing an empty table name in the lineage output.

Root Cause

Three functions in the lineage/scope pipeline have no handling for Expression::Unnest, causing it to be silently ignored:

lineage.rsget_select_sources() (line ~479):

fn extract_source(expr: &Expression) -> Option<SourceInfo> {
    match expr {
        Expression::Table(t) => { ... }
        Expression::Subquery(s) => { ... }
        Expression::Paren(p) => extract_source(&p.this),
        _ => None,   // ← Unnest falls here, never registered as a source
    }
}

lineage.rssource_names_from_from_join() (line ~961):

fn source_name(expr: &Expression) -> Option<String> {
    match expr {
        Expression::Table(table) => { ... }
        Expression::Subquery(subquery) => { ... }
        Expression::Paren(paren) => source_name(&paren.this),
        _ => None,   // ← Unnest falls here, alias never added to source names
    }
}

scope.rsadd_table_to_scope() (line ~638):

fn add_table_to_scope(expr: &Expression, scope: &mut Scope) {
    match expr {
        Expression::Table(table) => { ... }
        Expression::Subquery(subquery) => { ... }
        Expression::Paren(paren) => add_table_to_scope(&paren.this, scope),
        _ => {}   // ← Unnest falls here, never added to scope
    }
}

Because the UNNEST alias is never registered, when resolving date_val AS week_start, the column reference date_val finds zero from-sources (from_source_names.len() == 0) and takes the "multiple sources / unresolved" path in resolve_unqualified_column(), producing a bare leaf node named "date_val" with an empty table.

Reproduction

SELECT
    date_val AS week_start
FROM
    UNNEST(GENERATE_DATE_ARRAY('2024-01-01', '2024-12-31', INTERVAL 1 WEEK)) AS date_val

Lineage for week_start:

week_start
  └── (leaf)  table="", column="date_val"   ← unresolved; table should not be empty

For comparison, a string literal in the same pattern:

SELECT 'app' AS data_source FROM some_table

Lineage for data_source:

data_source
  └── (leaf)  table="", column="data_source"

Both produce table="", but the UNNEST case resolves to the intermediate UNNEST alias (date_val) rather than being properly tracked to the UNNEST source — creating an asymmetry that makes it impossible to distinguish "no real upstream" from "unresolved UNNEST alias".

Expected Behavior

The UNNEST alias should be recognized as a virtual table source, and the leaf node should reflect that. Concretely, date_val should resolve to the UNNEST expression, producing something analogous to {table: "date_val", column: "date_val"} rather than {table: "", column: "date_val"}.

Comparison with sqlglot

sqlglot handles this correctly. It defines Unnest as a UDTF (User-Defined Table Function), creates a UDTF scope for it, and exposes the alias column names via UDTF.selects. The resulting lineage tree is:

week_start (Alias)
  └── _0.date_val (Identifier)   ← UNNEST treated as virtual table _0, column date_val

The leaf's .source points to the UNNEST expression itself:

UNNEST(GENERATE_DATE_ARRAY('2024-01-01', '2024-12-31', INTERVAL '1' WEEK)) AS _0(date_val)

Proposed Fix

Add Expression::Unnest arms to the three functions above so that the UNNEST alias is registered as a scope source. For example, in source_names_from_from_join():

Expression::Unnest(u) => u.alias.as_ref().map(|a| a.name.clone()),

And in add_table_to_scope(), add the UNNEST expression as a source under its alias name, similar to how Expression::Table is handled. Once the alias is in scope, resolve_unqualified_column() will have one source to resolve against and will produce a properly attributed leaf node.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions