[X-2935] proto: preserve nested DynamicFilterPhysicalExpr through snapshot walk#62
Merged
Merged
Conversation
…pshot walk `serialize_physical_expr_with_converter` only honored the top-level `DynamicFilterPhysicalExpr` special case; any nested wrapper inside a `BinaryExpr` / `CastExpr` / etc. was folded into its current() snapshot by the unconditional `snapshot_physical_expr(value)` call right after. When a `FilterPushdown(Post)` pass re-pushes a SortExec's self-filter into a FileScan that already carries a static WHERE, the resulting predicate is `BinaryExpr(AND, static_WHERE, dyn_filter)`. The old code serialized this as `BinaryExpr(AND, static_WHERE, lit(true))` on the wire (since `current()` returns `lit(true)` before TopK runs), and the data-server side decoded a fully static predicate with no live link to TopK -- the IncrementalRowGroupPruner saw no `snapshot_generation` movement and never tightened pruning. Replace the unconditional snapshot walker with a `transform_up` that skips `DynamicFilterPhysicalExpr` nodes. Other dynamic exprs (`HashTableLookupExpr`, etc.) still get snapshotted as before; the DynamicFilter wrapper survives to the downcast chain, where the existing line-318 branch serializes it as `PhysicalDynamicFilterNode` at whatever nesting depth. Adds `dynamic_filter_nested_in_binary_expr_survives_proto_roundtrip` covering the `BinaryExpr(static_WHERE, AND, dyn_filter)` shape. Upstream apache#21929 (commit 077f08a, merged 2026-05-22) fixed this structurally by removing the top-level snapshot call and routing every PhysicalExpr through `try_to_proto` hooks. This patch is the minimal port until the atlas DF fork adopts that refactor.
There was a problem hiding this comment.
Pull request overview
This PR fixes distributed physical-expression protobuf serialization so that DynamicFilterPhysicalExpr wrappers are preserved even when nested inside other expressions (e.g., BinaryExpr(AND, static_where, dyn_filter)), rather than being snapshotted into their current() literal and losing the live update link required for pruning improvements downstream.
Changes:
- Update
serialize_physical_expr_with_converterto snapshot dynamic expressions via atransform_upwalk that skipsDynamicFilterPhysicalExprnodes, allowing them to reach the dedicatedPhysicalDynamicFilterNodeserialization path at any depth. - Add a regression test ensuring a nested
DynamicFilterPhysicalExprsurvives a proto roundtrip and thatupdate()bumps the tree-widesnapshot_generation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| datafusion/proto/src/physical_plan/to_proto.rs | Reworks snapshotting during serialization to preserve nested DynamicFilterPhysicalExpr while still snapshotting other dynamic exprs. |
| datafusion/proto/tests/cases/roundtrip_physical_plan.rs | Adds a regression test for the nested dynamic-filter-in-BinaryExpr roundtrip and generation bump behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
zhuqi-lucas
added a commit
to massive-com/datafusion-materialized-views
that referenced
this pull request
Jun 16, 2026
Picks up the nested DynamicFilterPhysicalExpr snapshot-walker fix (massive-com/arrow-datafusion#62) on branch-53.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
serialize_physical_expr_with_converteronly honored the top-levelDynamicFilterPhysicalExprspecial case; any nested wrapper inside aBinaryExpr/CastExpr/ etc. was folded into itscurrent()snapshot by the unconditionalsnapshot_physical_expr(value)call right after.When a
FilterPushdown(Post)pass re-pushes a SortExec self-filter into a FileScan that already carries a static WHERE, the resulting predicate isBinaryExpr(AND, static_WHERE, dyn_filter). The old code serialized this asBinaryExpr(AND, static_WHERE, lit(true))on the wire (sincecurrent()returnslit(true)before TopK runs), and the data-server side decoded a fully static predicate with no live link to TopK.IncrementalRowGroupPrunersaw nosnapshot_generationmovement and never tightened pruning.What this PR changes
Replace the unconditional snapshot walker with a
transform_upthat skipsDynamicFilterPhysicalExprnodes. Other dynamic exprs (HashTableLookupExpr, etc.) still get snapshotted as before; the DynamicFilter wrapper survives to the downcast chain, where the existing branch serializes it asPhysicalDynamicFilterNodeat whatever nesting depth.Adds a regression test
dynamic_filter_nested_in_binary_expr_survives_proto_roundtripcovering theBinaryExpr(static_WHERE, AND, dyn_filter)shape. Pins that the decoded right-side stays a liveDynamicFilterPhysicalExpr(not aLiteral) and that anupdate()on it bumps the tree-widesnapshot_generation.Affects
PhysicalExprtree that contains a nestedDynamicFilterPhysicalExpr— wire output now carriesPhysicalDynamicFilterNodeinstead of aLiteralsnapshot.Upstream reference
apache#21929 (commit
077f08a9a, merged 2026-05-22) fixed this structurally by removing the top-level snapshot call entirely and routing everyPhysicalExprthroughtry_to_protohooks. This patch is the minimal port until the atlas DF fork adopts that refactor.Validation
cargo test -p datafusion-proto --test proto_integration dynamic_filter→ 5/5 pass (4 existing + 1 new)[patch]block: the atlas-sidetest_dedup_post_filterpushdown_rerun_preserves_shared_inner_e2eflips from FAIL to PASS purely with this patch applied (no atlas-local workaround needed).