refactor: Use cast preimages for cast predicate rewrites #22906
discord9 wants to merge 2 commits into
Signed-off-by: discord9 <discord9@163.com>
This PR looks like a very nice solution for the cast pattern. I'm comfortable proceeding with it, but please forgive me for briefly advocating an alternative approach (one that I'm happy to help review or implement): I believe the fundamental goal here is to enable pruning through nested expressions, and the propagation-based approach could be a better long-term solution. My concern with the preimage approach is that it requires introducing and maintaining an ever-growing set of reverse-transformation rules. Even with additional rules, there will likely still be cases that cannot be handled. If this becomes a supported pattern, I worry that the long-term maintenance burden could be significant. In contrast, the propagation approach seems both more general and easier to reason about. The key intuition is that it follows a forward-evaluation model, similar to normal expression evaluation, whereas the preimage approach attempts to reverse complex expressions back into a simpler form. In many cases, the latter is inherently more difficult and may require expression-specific logic.
Which issue does this PR close?
Rationale for this change
This is a draft follow-up to the timestamp precision narrowing discussion in cast predicate simplification.
The previous cast unwrap path could only rewrite predicates by moving the original comparison operator from `CAST(expr AS target_type) OP literal` to `expr OP casted_literal`. That shape is not correct for many-to-one casts such as timestamp precision narrowing, where the source-domain preimage of one target timestamp value is a range rather than a singleton.
For example, `CAST(ts_ns AS Timestamp(ms)) > TimestampMillisecond(1000)` should not become `ts_ns > TimestampNanosecond(1000000000)`. The correct source-domain boundary is based on the timestamp bucket preimage and becomes `ts_ns >= TimestampNanosecond(1001000000)`.
What changes are included in this PR?
`CastPredicatePreimage` abstraction in `datafusion-expr-common`:
- `Exact(ScalarValue)` for singleton source-domain preimages.
- `Range(Interval)` for half-open source-domain intervals.
Are these changes tested?
Yes. This PR adds/updates tests for:
'0123',
Validated locally with:
Are there any user-facing changes?
No user-facing API changes are intended. This is an optimizer correctness/refactoring change for cast predicate rewrites.
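For reviewers who want to check the timestamp example from the rationale by hand, here is a minimal standalone sketch. It is not the code in this PR: `HalfOpen`, `cast_ns_to_ms`, `preimage_of_ms`, and `gt_lower_bound` are hypothetical names, plain `i64` values stand in for `ScalarValue` and `Interval`, and the cast is modeled as floor division, which is an assumption that only matters for negative timestamps.

```rust
/// Hypothetical stand-in for a half-open source-domain interval [lo, hi).
/// The PR describes a `Range(Interval)` case; this is only an i64 model of it.
#[derive(Debug, PartialEq)]
struct HalfOpen {
    lo: i64,
    hi: i64,
}

const NS_PER_MS: i64 = 1_000_000;

/// Model of narrowing nanoseconds to milliseconds.
/// Assumption: floor division, so every millisecond bucket is [ms * 1e6, (ms + 1) * 1e6).
fn cast_ns_to_ms(ts_ns: i64) -> i64 {
    ts_ns.div_euclid(NS_PER_MS)
}

/// All nanosecond values that cast to `ms`, or None if the bounds overflow i64.
fn preimage_of_ms(ms: i64) -> Option<HalfOpen> {
    let lo = ms.checked_mul(NS_PER_MS)?;
    let hi = lo.checked_add(NS_PER_MS)?;
    Some(HalfOpen { lo, hi })
}

/// `CAST(ts_ns AS ms) > ms` holds exactly when ts_ns lies at or above the
/// exclusive upper end of the bucket for `ms`, i.e. `ts_ns >= hi`.
fn gt_lower_bound(ms: i64) -> Option<i64> {
    preimage_of_ms(ms).map(|bucket| bucket.hi)
}

fn main() {
    let bound = gt_lower_bound(1000).expect("bounds fit in i64");
    println!("rewritten predicate: ts_ns >= {}", bound);

    // Compare the original predicate with the preimage rewrite and the naive
    // operator-preserving rewrite around the bucket edges.
    for ts_ns in [999_999_999_i64, 1_000_000_000, 1_000_000_001, 1_000_999_999, 1_001_000_000] {
        let original = cast_ns_to_ms(ts_ns) > 1000;
        let rewritten = ts_ns >= bound;
        let naive = ts_ns > 1_000_000_000;
        println!(
            "ts_ns={} original={} rewritten={} naive={}",
            ts_ns, original, rewritten, naive
        );
    }
}
```

In this model the rewritten predicate agrees with the original at every probed value, while the naive `ts_ns > 1000000000` form disagrees for values strictly inside the bucket, such as `1000000001`. Returning `None` on overflow is a design choice of the sketch; it stands for "do not rewrite" and says nothing about how the PR handles that case.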