Summary
When analyzing BigQuery SQL that uses FROM UNNEST(...) AS alias (lateral unnest), the lineage tracker fails to recognize the UNNEST as a scope source. Column references to the UNNEST alias in the SELECT list become unresolved leaf nodes, producing an empty table name in the lineage output.
Root Cause
Three functions in the lineage/scope pipeline have no handling for Expression::Unnest, causing it to be silently ignored:
lineage.rs — get_select_sources() (line ~479):
fn extract_source(expr: &Expression) -> Option<SourceInfo> {
match expr {
Expression::Table(t) => { ... }
Expression::Subquery(s) => { ... }
Expression::Paren(p) => extract_source(&p.this),
_ => None, // ← Unnest falls here, never registered as a source
}
}
lineage.rs — source_names_from_from_join() (line ~961):
fn source_name(expr: &Expression) -> Option<String> {
match expr {
Expression::Table(table) => { ... }
Expression::Subquery(subquery) => { ... }
Expression::Paren(paren) => source_name(&paren.this),
_ => None, // ← Unnest falls here, alias never added to source names
}
}
scope.rs — add_table_to_scope() (line ~638):
fn add_table_to_scope(expr: &Expression, scope: &mut Scope) {
match expr {
Expression::Table(table) => { ... }
Expression::Subquery(subquery) => { ... }
Expression::Paren(paren) => add_table_to_scope(&paren.this, scope),
_ => {} // ← Unnest falls here, never added to scope
}
}
Because the UNNEST alias is never registered, when resolving date_val AS week_start, the column reference date_val finds zero from-sources (from_source_names.len() == 0) and takes the "multiple sources / unresolved" path in resolve_unqualified_column(), producing a bare leaf node named "date_val" with an empty table.
Reproduction
SELECT
date_val AS week_start
FROM
UNNEST(GENERATE_DATE_ARRAY('2024-01-01', '2024-12-31', INTERVAL 1 WEEK)) AS date_val
Lineage for week_start:
week_start
└── (leaf) table="", column="date_val" ← unresolved; table should not be empty
For comparison, a string literal in the same pattern:
SELECT 'app' AS data_source FROM some_table
Lineage for data_source:
data_source
└── (leaf) table="", column="data_source"
Both produce table="", but the UNNEST case resolves to the intermediate UNNEST alias (date_val) rather than being properly tracked to the UNNEST source — creating an asymmetry that makes it impossible to distinguish "no real upstream" from "unresolved UNNEST alias".
Expected Behavior
The UNNEST alias should be recognized as a virtual table source, and the leaf node should reflect that. Concretely, date_val should resolve to the UNNEST expression, producing something analogous to {table: "date_val", column: "date_val"} rather than {table: "", column: "date_val"}.
Comparison with sqlglot
sqlglot handles this correctly. It defines Unnest as a UDTF (User-Defined Table Function), creates a UDTF scope for it, and exposes the alias column names via UDTF.selects. The resulting lineage tree is:
week_start (Alias)
└── _0.date_val (Identifier) ← UNNEST treated as virtual table _0, column date_val
The leaf's .source points to the UNNEST expression itself:
UNNEST(GENERATE_DATE_ARRAY('2024-01-01', '2024-12-31', INTERVAL '1' WEEK)) AS _0(date_val)
Proposed Fix
Add Expression::Unnest arms to the three functions above so that the UNNEST alias is registered as a scope source. For example, in source_names_from_from_join():
Expression::Unnest(u) => u.alias.as_ref().map(|a| a.name.clone()),
And in add_table_to_scope(), add the UNNEST expression as a source under its alias name, similar to how Expression::Table is handled. Once the alias is in scope, resolve_unqualified_column() will have one source to resolve against and will produce a properly attributed leaf node.
Summary
When analyzing BigQuery SQL that uses
FROM UNNEST(...) AS alias(lateral unnest), the lineage tracker fails to recognize the UNNEST as a scope source. Column references to the UNNEST alias in the SELECT list become unresolved leaf nodes, producing an empty table name in the lineage output.Root Cause
Three functions in the lineage/scope pipeline have no handling for
Expression::Unnest, causing it to be silently ignored:lineage.rs—get_select_sources()(line ~479):lineage.rs—source_names_from_from_join()(line ~961):scope.rs—add_table_to_scope()(line ~638):Because the UNNEST alias is never registered, when resolving
date_val AS week_start, the column referencedate_valfinds zero from-sources (from_source_names.len() == 0) and takes the "multiple sources / unresolved" path inresolve_unqualified_column(), producing a bare leaf node named"date_val"with an empty table.Reproduction
Lineage for
week_start:For comparison, a string literal in the same pattern:
Lineage for
data_source:Both produce
table="", but the UNNEST case resolves to the intermediate UNNEST alias (date_val) rather than being properly tracked to the UNNEST source — creating an asymmetry that makes it impossible to distinguish "no real upstream" from "unresolved UNNEST alias".Expected Behavior
The UNNEST alias should be recognized as a virtual table source, and the leaf node should reflect that. Concretely,
date_valshould resolve to the UNNEST expression, producing something analogous to{table: "date_val", column: "date_val"}rather than{table: "", column: "date_val"}.Comparison with sqlglot
sqlglot handles this correctly. It defines
Unnestas aUDTF(User-Defined Table Function), creates a UDTF scope for it, and exposes the alias column names viaUDTF.selects. The resulting lineage tree is:The leaf's
.sourcepoints to the UNNEST expression itself:Proposed Fix
Add
Expression::Unnestarms to the three functions above so that the UNNEST alias is registered as a scope source. For example, insource_names_from_from_join():And in
add_table_to_scope(), add the UNNEST expression as a source under its alias name, similar to howExpression::Tableis handled. Once the alias is in scope,resolve_unqualified_column()will have one source to resolve against and will produce a properly attributed leaf node.