EXPLAIN COST gives wrong estimated input size when querying via VIEW instead of physical Delta path #56990

Description

@omarMseddi0

[SQL] EXPLAIN COST gives wrong estimated input size when querying via VIEW instead of physical Delta path

Component

Spark SQL (Catalyst statistics / EXPLAIN COST)

Describe the problem

EXPLAIN COST reports the correct estimated input size (sizeInBytes) when a Delta table is queried by its physical path. If a VIEW (or table) is created on top of that same path, EXPLAIN COST on the view reports an incorrect estimated input size for the identical data. Query results are correct; only the cost/size estimate is wrong.

Steps to reproduce

-- 1. Correct size
EXPLAIN COST SELECT * FROM delta.`/path/to/table_x`;

-- 2. Wrap in a view
CREATE OR REPLACE VIEW table_x AS
SELECT * FROM delta.`/path/to/table_x`;

-- 3. Wrong size
EXPLAIN COST SELECT * FROM table_x;

Observed

Step 1: correct sizeInBytes.
Step 3: incorrect sizeInBytes for the same underlying data.
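
To put numbers on the difference between Step 1 and Step 3, the two EXPLAIN COST outputs can be compared with a small helper. The following is only a sketch: it assumes the plan text contains fragments of the form `Statistics(sizeInBytes=<number> <unit>)` with binary units (B, KiB, MiB, ...), and the function names and the two sample strings are hypothetical placeholders, not output captured from this issue.

```python
import re

# Binary unit multipliers; assumed to match the suffixes printed in the plan text.
_UNITS = {
    "B": 1,
    "KiB": 1024,
    "MiB": 1024 ** 2,
    "GiB": 1024 ** 3,
    "TiB": 1024 ** 4,
    "PiB": 1024 ** 5,
    "EiB": 1024 ** 6,
}

# Matches e.g. "sizeInBytes=2.5 MiB" or "sizeInBytes=1.5E+20 B".
_SIZE_RE = re.compile(
    r"sizeInBytes=([0-9.]+(?:E[+-]?\d+)?)\s*(EiB|PiB|TiB|GiB|MiB|KiB|B)"
)


def size_estimates(plan_text):
    """Return every sizeInBytes value found in the plan text, converted to bytes."""
    return [float(num) * _UNITS[unit] for num, unit in _SIZE_RE.findall(plan_text)]


def root_size(plan_text):
    """Return the first sizeInBytes in the text (the topmost plan node)."""
    sizes = size_estimates(plan_text)
    if not sizes:
        raise ValueError("no sizeInBytes found in plan text")
    return sizes[0]


if __name__ == "__main__":
    # Hypothetical sample strings, NOT taken from the reported environment.
    path_plan = "Relation [...] parquet, Statistics(sizeInBytes=2.5 MiB)"
    view_plan = "Project [...], Statistics(sizeInBytes=10.0 GiB)"
    ratio = root_size(view_plan) / root_size(path_plan)
    print(f"view estimate is {ratio:.1f}x the path estimate")
```

In practice the two strings would be replaced by the text returned from the Step 1 and Step 3 queries, which makes it easy to attach concrete byte counts to a report like this one.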

Expected

EXPLAIN COST should report the same, accurate estimated input size whether queried via the physical path or via a view/logical table name wrapping it.

Environment

  • Spark: 3.5.6

  • Delta Lake: 3.3.0

  • Deployment: Kubernetes (k8s)

Willingness to contribute

  • Yes, independently
  • Yes, with guidance
  • [] No
