Description
Search before asking
- I have searched in the issues and found no similar issues.
What would you like to be improved?
Currently, table_summary metrics (total files, file sizes, health score, etc.) are only collected when self-optimizing.enabled=true. This is because the metric update path (setTableSummary()) is gated behind the optimizingConfig.isEnabled() check in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput().
As a result, tables with self-optimizing.enabled=false always show 0 or N/A for key monitoring metrics (Health Score, Total Files, Total Files Size) in Grafana dashboards, even though they contain actual data files.
Metric collection and self-optimizing execution are conceptually independent concerns. Users should be able to monitor table health without enabling the optimizing process.
How should we improve?
Introduce a new table property self-optimizing.table-summary.enabled that allows metric collection to be enabled independently of self-optimizing.
Behavior matrix:
| self-optimizing.enabled | table-summary.enabled | Optimizing runs | Metrics collected |
|---|---|---|---|
| true | (any) | Yes | Yes |
| false | true | No | Yes |
| false | false (default) | No | No |
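The matrix above can be read as a three-way decision. The following is a minimal, self-contained sketch of that decision only; the class name `SummaryGateSketch`, the `Action` enum, and the helper methods are hypothetical names for illustration and are not taken from the Amoro code base.

```java
public class SummaryGateSketch {

    // Hypothetical labels for the three rows of the behavior matrix.
    enum Action { EVALUATE_FOR_OPTIMIZING, COLLECT_SUMMARY_ONLY, SKIP }

    // Maps the two table properties to one of the three behaviors.
    static Action decide(boolean selfOptimizingEnabled, boolean tableSummaryEnabled) {
        if (selfOptimizingEnabled) {
            // Existing path: optimizing runs and summary metrics are updated as today.
            return Action.EVALUATE_FOR_OPTIMIZING;
        } else if (tableSummaryEnabled) {
            // Proposed path: no optimizing, but summary metrics are still collected.
            return Action.COLLECT_SUMMARY_ONLY;
        }
        // Default path: neither optimizing nor metric collection.
        return Action.SKIP;
    }

    static boolean optimizingRuns(Action action) {
        return action == Action.EVALUATE_FOR_OPTIMIZING;
    }

    static boolean metricsCollected(Action action) {
        return action != Action.SKIP;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        for (boolean selfOptimizing : flags) {
            for (boolean tableSummary : flags) {
                Action action = decide(selfOptimizing, tableSummary);
                System.out.printf("self-optimizing=%b table-summary=%b -> %s%n",
                        selfOptimizing, tableSummary, action);
            }
        }
    }
}
```

When self-optimizing.enabled is true, the value of the new property has no effect, which matches the "(any)" row.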
Implementation approach:
- Add SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED constant in TableProperties
- Add tableSummaryEnabled field to OptimizingConfig
- Parse the new property in TableConfigurations.parseOptimizingConfig()
- Add an else if branch in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput() to collect summary metrics when optimizing is disabled but table-summary is enabled
The default value of false ensures no behavioral change for existing tables.
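As a rough illustration of the parsing step and its default, here is a standalone sketch that reads the proposed property from a plain properties map. The class `TableSummaryPropertySketch`, the `_DEFAULT` constant name, and the `parseTableSummaryEnabled` method are assumptions made for this example; the real change would live in TableProperties and TableConfigurations.parseOptimizingConfig(), whose existing helpers are not shown here.

```java
import java.util.Map;

public class TableSummaryPropertySketch {

    // Property key as proposed in this issue.
    static final String SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED =
            "self-optimizing.table-summary.enabled";

    // Assumed constant name; the default of false keeps existing tables unchanged.
    static final boolean SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED_DEFAULT = false;

    // Returns the configured value, or the default when the property is absent.
    static boolean parseTableSummaryEnabled(Map<String, String> properties) {
        String raw = properties.get(SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED);
        if (raw == null) {
            return SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED_DEFAULT;
        }
        // Boolean.parseBoolean is true only for "true" (case-insensitive).
        return Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> existingTable = Map.of("self-optimizing.enabled", "false");
        Map<String, String> monitoredTable = Map.of(
                "self-optimizing.enabled", "false",
                SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED, "true");

        System.out.println(parseTableSummaryEnabled(existingTable));
        System.out.println(parseTableSummaryEnabled(monitoredTable));
    }
}
```

A table that never sets the property resolves to false, so it behaves exactly as it does today.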
Usage example:
ALTER TABLE db.my_table SET TBLPROPERTIES (
'self-optimizing.enabled' = 'false',
'self-optimizing.table-summary.enabled' = 'true'
);
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Subtasks
No response
Code of Conduct
- I agree to follow this project's Code of Conduct