[Improvement]: Collect table_summary metrics independently of self-optimizing #4099

@j1wonpark

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, table_summary metrics (total files, file sizes, health score, etc.) are only collected when self-optimizing.enabled=true. This is because the metric update path (setTableSummary()) is gated behind the optimizingConfig.isEnabled() check in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput().

As a result, tables with self-optimizing.enabled=false always show 0 or N/A for key monitoring metrics (Health Score, Total Files, Total Files Size) in Grafana dashboards, even when they contain data files.

Metric collection and self-optimizing execution are conceptually independent concerns. Users should be able to monitor table health without enabling the optimizing process.

How should we improve?

Introduce a new table property self-optimizing.table-summary.enabled that allows metric collection to be enabled independently of self-optimizing.

Behavior matrix:

self-optimizing.enabled   table-summary.enabled   Optimizing runs   Metrics collected
true                      (any)                   Yes               Yes
false                     true                    No                Yes
false                     false (default)         No                No
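
The matrix above can be read as two small boolean rules. The following is only an illustrative sketch: the class and method names (BehaviorMatrixSketch, optimizingRuns, metricsCollected) are hypothetical and are not part of the existing code base.

```java
// Hypothetical sketch of the behavior matrix; none of these names exist in the project.
public class BehaviorMatrixSketch {

    // Optimizing depends only on self-optimizing.enabled; the second flag is ignored.
    static boolean optimizingRuns(boolean selfOptimizingEnabled, boolean tableSummaryEnabled) {
        return selfOptimizingEnabled;
    }

    // Metrics are collected when either flag is true.
    static boolean metricsCollected(boolean selfOptimizingEnabled, boolean tableSummaryEnabled) {
        return selfOptimizingEnabled || tableSummaryEnabled;
    }

    public static void main(String[] args) {
        boolean[] values = {true, false};
        // Print every combination of the two flags.
        for (boolean so : values) {
            for (boolean ts : values) {
                System.out.printf(
                        "self-optimizing=%b table-summary=%b -> optimizing=%b metrics=%b%n",
                        so, ts, optimizingRuns(so, ts), metricsCollected(so, ts));
            }
        }
    }
}
```

Expressed this way, the new property can only widen metric collection; it never changes whether optimizing runs.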

Implementation approach:

  1. Add SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED constant in TableProperties
  2. Add tableSummaryEnabled field to OptimizingConfig
  3. Parse the new property in TableConfigurations.parseOptimizingConfig()
  4. Add an else if branch in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput() to collect summary metrics when optimizing is disabled but table-summary is enabled
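
As a rough sketch of how the four steps above might fit together, the snippet below uses simplified stand-ins rather than the real classes. The OptimizingConfig here is a minimal hypothetical version, parseOptimizingConfig takes a plain Map instead of the actual table properties, the RefreshAction enum is invented for illustration, and the default of true for self-optimizing.enabled is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins; the real classes carry much more state.
public class TableSummarySketch {

    static final String SELF_OPTIMIZING_ENABLED = "self-optimizing.enabled";
    static final boolean SELF_OPTIMIZING_ENABLED_DEFAULT = true; // assumed default

    // Step 1: the new property constant, defaulting to false.
    static final String SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED =
            "self-optimizing.table-summary.enabled";
    static final boolean SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED_DEFAULT = false;

    // Step 2: the config object gains a tableSummaryEnabled field.
    static final class OptimizingConfig {
        final boolean enabled;
        final boolean tableSummaryEnabled;

        OptimizingConfig(boolean enabled, boolean tableSummaryEnabled) {
            this.enabled = enabled;
            this.tableSummaryEnabled = tableSummaryEnabled;
        }

        boolean isEnabled() {
            return enabled;
        }

        boolean isTableSummaryEnabled() {
            return tableSummaryEnabled;
        }
    }

    // Step 3: parse both flags, falling back to the defaults when a key is absent.
    static OptimizingConfig parseOptimizingConfig(Map<String, String> properties) {
        boolean enabled = Boolean.parseBoolean(properties.getOrDefault(
                SELF_OPTIMIZING_ENABLED, String.valueOf(SELF_OPTIMIZING_ENABLED_DEFAULT)));
        boolean summary = Boolean.parseBoolean(properties.getOrDefault(
                SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED,
                String.valueOf(SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED_DEFAULT)));
        return new OptimizingConfig(enabled, summary);
    }

    // Invented labels for what a refresh cycle would do with the table.
    enum RefreshAction { EVALUATE_AND_SUMMARIZE, SUMMARIZE_ONLY, SKIP }

    // Step 4: the extra else-if branch collects the summary without optimizing.
    static RefreshAction decideRefreshAction(OptimizingConfig config) {
        if (config.isEnabled()) {
            return RefreshAction.EVALUATE_AND_SUMMARIZE; // existing path
        } else if (config.isTableSummaryEnabled()) {
            return RefreshAction.SUMMARIZE_ONLY; // proposed new path
        }
        return RefreshAction.SKIP; // unchanged behavior for existing tables
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put(SELF_OPTIMIZING_ENABLED, "false");
        properties.put(SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED, "true");
        System.out.println(decideRefreshAction(parseOptimizingConfig(properties)));
    }
}
```

In the actual change, the SUMMARIZE_ONLY branch would presumably end in the same setTableSummary() call that the enabled path already makes; how the summary is computed without a full pending-input evaluation is left open here.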

The default value of false ensures no behavioral change for existing tables.

Usage example:

ALTER TABLE db.my_table SET TBLPROPERTIES (
  'self-optimizing.enabled'               = 'false',
  'self-optimizing.table-summary.enabled' = 'true'
);

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
