Search before asking
Motivation
Paimon Rust scan planning has several metadata pruning paths, including partition pruning, bucket pruning, file min/max stats pruning, LIMIT split reduction, COUNT(*) statistics rewrite, and time-travel snapshot selection.
Today these pruning decisions are hard to inspect from tests or physical plan output. This makes it difficult to identify cases that still do full scans or produce too many splits.
There is also a specific gap for non-partition `IN` predicates: file min/max stats can prove that some files cannot match, but the `IN` path currently fails open (it treats every file as a possible match) and keeps those files.
Proposal
Add lightweight scan planning trace counters so tests and DataFusion physical plan display can show how many manifests, manifest entries, splits, and files survive each pruning stage.
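As a rough illustration of what such counters might look like, here is a minimal sketch. The names `Stage`, `StageCounts`, and `ScanTrace` are hypothetical and are not taken from the Paimon Rust code base; the `Display` impl only stands in for whatever the DataFusion physical plan display would eventually print.

```rust
use std::fmt;

/// Hypothetical pruning stages (illustrative names, not the real Paimon Rust types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Partition,
    Bucket,
    FileStats,
    Limit,
}

/// What is still alive after a given pruning stage.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct StageCounts {
    manifests: usize,
    manifest_entries: usize,
    splits: usize,
    files: usize,
}

/// Append-only trace filled in by the planner, one record per stage.
#[derive(Debug, Default)]
struct ScanTrace {
    stages: Vec<(Stage, StageCounts)>,
}

impl ScanTrace {
    /// Record the surviving counts after `stage` has run.
    fn record(&mut self, stage: Stage, counts: StageCounts) {
        self.stages.push((stage, counts));
    }

    /// Look up the counts recorded for `stage`, if that stage ran.
    fn get(&self, stage: Stage) -> Option<StageCounts> {
        self.stages.iter().find(|(s, _)| *s == stage).map(|(_, c)| *c)
    }
}

impl fmt::Display for ScanTrace {
    /// Compact one-line rendering, suitable for plan output or test snapshots.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (i, (stage, c)) in self.stages.iter().enumerate() {
            if i > 0 {
                write!(f, ", ")?;
            }
            write!(
                f,
                "{:?}[manifests={}, entries={}, splits={}, files={}]",
                stage, c.manifests, c.manifest_entries, c.splits, c.files
            )?;
        }
        Ok(())
    }
}
```

A test could then assert on `trace.get(Stage::FileStats)` to pin down how many files survive stats pruning, instead of inferring it from query results.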
Use the trace to add self-contained pruning baselines for:
- partition pruning
- bucket-key pruning
- SQL BETWEEN partition pruning
- LIMIT split reduction
- COUNT(*) statistics rewrite
- time-travel snapshot selection
Separately, improve non-partition `IN` stats pruning by checking whether any `IN` literal overlaps the file min/max range. Keep conservative behavior for `NOT IN`, missing stats, corrupt stats, and unsupported comparisons.
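The overlap check could look roughly like the sketch below. It assumes a single `i64` column and a hypothetical `ColumnStats` type, neither of which reflects the actual Paimon Rust stats representation. A file is pruned only when stats are present, consistent, and no literal falls inside `[min, max]`; every other case keeps the file.

```rust
/// Hypothetical per-column file statistics; `None` models missing or unreadable stats.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Returns `true` when the file may contain a row matching `col IN (literals)`.
/// Returning `false` means the file can be skipped safely.
fn file_may_match_in(stats: &ColumnStats, literals: &[i64]) -> bool {
    match (stats.min, stats.max) {
        // Usable stats: keep the file only if some literal lies within [min, max].
        (Some(min), Some(max)) if min <= max => {
            literals.iter().any(|v| (min..=max).contains(v))
        }
        // Missing stats, or corrupt stats with min > max: fail open and keep the file.
        _ => true,
    }
}

/// `NOT IN` stays conservative in this sketch: min/max stats are never used to prune.
fn file_may_match_not_in(_stats: &ColumnStats, _literals: &[i64]) -> bool {
    true
}
```

For example, a file with min 10 and max 20 is kept for `IN (1, 15, 99)` because 15 falls in range, and skipped for `IN (1, 5, 30)` because no literal does.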
Solution
Anything else?
No response
Willingness to contribute