[X-2935] Cherry-pick missing dedup with_new_children block from upstream #21807 #61
Merged
Pull request overview
This PR fixes a correctness issue in DeduplicatingDeserializer::proto_to_physical_expr where cache hits previously returned the cached outer expression Arc unchanged, potentially discarding occurrence-specific children (e.g., after FilterPushdown rewrites column refs). The change aligns behavior with upstream by rewrapping cached expressions using with_new_children(...) populated from the current proto body.
Changes:
- Update the `DeduplicatingDeserializer::proto_to_physical_expr` cache-hit logic to rewrap cached expressions via `with_new_children(parsed.children())` while preserving shared inner state.
- Relax existing tests that assumed outer-`Arc` pointer equality, and add a regression test for "same Inner, distinct children" roundtrips.
- Expand assertions around dynamic filter deduplication invariants in roundtrip tests.
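Relaxing outer-`Arc` pointer equality means the tests must check something else. The sketch below shows one way to state the "same Inner, distinct children" invariants after decoding: the inner state is shared, and each occurrence keeps its own children. `Inner`, `Wrapper` and `check_dedup_invariants` are hypothetical stand-ins and not the types or helpers used in `roundtrip_physical_plan.rs`. The column strings such as `price@0` are also illustrative.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the shared state behind a dynamic filter.
#[derive(Debug)]
struct Inner {
    expression_id: u64,
}

/// Hypothetical outer wrapper: shared `Inner` plus occurrence-specific children.
#[derive(Debug)]
struct Wrapper {
    inner: Arc<Inner>,
    children: Vec<String>,
}

/// Checks "same Inner, distinct children" invariants on two decoded occurrences.
/// It deliberately never asserts outer-`Arc` pointer equality.
fn check_dedup_invariants(
    a: &Arc<Wrapper>,
    b: &Arc<Wrapper>,
    expected_a: &[&str],
    expected_b: &[&str],
) -> Result<(), String> {
    // Deduplication: both occurrences must share a single Inner allocation.
    if !Arc::ptr_eq(&a.inner, &b.inner) {
        return Err(format!(
            "Inner not deduplicated (expression_id {} vs {})",
            a.inner.expression_id, b.inner.expression_id
        ));
    }
    // No clobbering: each occurrence must keep the children from its own proto body.
    if a.children != expected_a {
        return Err(format!("first occurrence children: {:?}", a.children));
    }
    if b.children != expected_b {
        return Err(format!("second occurrence children: {:?}", b.children));
    }
    Ok(())
}
```

Passing the same wrapper twice reproduces the old cache-hit behavior. With that input, the second check fails.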
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| datafusion/proto/src/physical_plan/mod.rs | Rewrap cached physical expressions with occurrence-specific children on cache hits to avoid child clobbering. |
| datafusion/proto/tests/cases/roundtrip_physical_plan.rs | Adjust existing dynamic-filter dedup tests and add a regression test for distinct-children occurrences. |
Force-pushed from 722eae2 to 5ff5d36
xudong963 approved these changes on Jun 15, 2026
…e#21807

When this fork ported apache#21807 in commit 47f263d, the 'with_new_children on cache hit' block in DeduplicatingDeserializer was accidentally omitted. The simpler 'return Ok(Arc::clone(cached))' version we landed is correct only when every occurrence of a given Inner has the same outer wrapper shape.

In production this fails. FilterPushdown clones a SortExec's DynamicFilterPhysicalExpr and rewrites its children's column refs to match the downstream FileScan's file schema. Both outer wrappers share the same Inner Arc (same expression_id, same wire expr_id) but have DIFFERENT children. On decode, the simpler cache hit returns the first decode's outer wrapper, silently discarding the second occurrence's children. prune_by_statistics then resolves column refs against the wrong positions, and pruning becomes a no-op.

Observed end-to-end on ny2 staging 2026-06-15 after walker removal:

row_groups_pruned_statistics=0 total
bytes_scanned=91 MB
time_elapsed_processing=31 s
(was 1.99K pruned / 136 KB / ~100ms with the walker still active)

Cherry-picks the missing block: parse the proto body first, then on a cache hit return Arc::clone(cached).with_new_children(parsed.children()). This keeps the cached Inner (so TopK heap-max updates propagate) but installs the proto body's children on a fresh outer wrapper (so each occurrence keeps its own column refs).

Adds a regression test that fails without the fix and passes with it, with assertion messages pointing at the exact root cause.
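The buggy and fixed cache-hit paths described in the commit message can be compared in a toy deduplicating decoder. This is a minimal sketch under stated assumptions. `Inner`, `Wrapper`, `ProtoBody`, `DedupCache`, `decode_buggy`, `decode_fixed` and `insert_first` are all hypothetical. The `with_new_children` here only mimics the shape of DataFusion's method: the real one is a `PhysicalExpr` trait method that returns a `Result`. Nothing here is the actual `DeduplicatingDeserializer` code.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical shared state, keyed on the wire by `expr_id`.
#[derive(Debug)]
struct Inner {
    expr_id: u64,
}

/// Hypothetical outer wrapper: shared `Inner` plus occurrence-specific children.
#[derive(Debug)]
struct Wrapper {
    inner: Arc<Inner>,
    children: Vec<String>,
}

impl Wrapper {
    /// Loosely modeled on `with_new_children`: keep the same `Inner`, and put the
    /// given children on a fresh outer wrapper (DataFusion's version returns a `Result`).
    fn with_new_children(self: Arc<Self>, children: Vec<String>) -> Arc<Wrapper> {
        Arc::new(Wrapper { inner: Arc::clone(&self.inner), children })
    }
}

/// Hypothetical parsed proto body for one occurrence.
struct ProtoBody {
    expr_id: u64,
    children: Vec<String>,
}

#[derive(Default)]
struct DedupCache {
    cache: HashMap<u64, Arc<Wrapper>>,
}

impl DedupCache {
    /// Buggy variant: a cache hit returns the first occurrence's wrapper as-is.
    fn decode_buggy(&mut self, body: ProtoBody) -> Arc<Wrapper> {
        if let Some(cached) = self.cache.get(&body.expr_id) {
            return Arc::clone(cached); // `body.children` is silently dropped
        }
        self.insert_first(body)
    }

    /// Fixed variant: the body is parsed first, so a hit rewraps the cached
    /// Inner with this occurrence's children.
    fn decode_fixed(&mut self, body: ProtoBody) -> Arc<Wrapper> {
        if let Some(cached) = self.cache.get(&body.expr_id) {
            return Arc::clone(cached).with_new_children(body.children);
        }
        self.insert_first(body)
    }

    /// Cache miss: build a new Inner and remember the wrapper for later hits.
    fn insert_first(&mut self, body: ProtoBody) -> Arc<Wrapper> {
        let wrapper = Arc::new(Wrapper {
            inner: Arc::new(Inner { expr_id: body.expr_id }),
            children: body.children,
        });
        self.cache.insert(body.expr_id, Arc::clone(&wrapper));
        wrapper
    }
}
```

The design point is the order of operations. The body is decoded on every occurrence, even on a cache hit. The cache then supplies only the shared `Inner`, not the outer wrapper.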
Force-pushed from 5ff5d36 to 766a24c
zhuqi-lucas added a commit to massive-com/datafusion-materialized-views that referenced this pull request on Jun 15, 2026
Picks up the dedup with_new_children fix (massive-com/arrow-datafusion#61) on branch-53.
Summary
When this fork originally ported apache#21807 (commit 47f263d), the cache-hit `with_new_children` block in `DeduplicatingDeserializer::proto_to_physical_expr` was accidentally omitted. The simpler `return Ok(Arc::clone(cached))` we landed is correct only when every occurrence of a given Inner has the same outer wrapper shape, which doesn't hold in practice.

Why this is broken in production
FilterPushdown clones a SortExec's `DynamicFilterPhysicalExpr` and rewrites its children's column refs to match the downstream FileScan's file schema. Both outer wrappers share the same `Inner` `Arc` (same `expression_id`, same wire `expr_id`) but have DIFFERENT children. On decode, the simpler cache hit returns the first decode's outer wrapper as-is, silently discarding the second occurrence's children. `prune_by_statistics` then resolves column refs against the wrong positions, and pruning becomes a no-op.
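Here is why a discarded child breaks pruning. A bound column reference carries a position, so a reference left over from the other occurrence reads a different column's statistics in the file schema. The toy model below is a hypothetical sketch and far simpler than DataFusion's real statistics pruning. `Column`, `RowGroupStats`, `can_prune_gt` and the three-column file schema `id, ts, price` are all illustrative assumptions.

```rust
/// Hypothetical bound column reference: a name plus a position in some schema.
#[derive(Clone, Debug)]
struct Column {
    name: &'static str,
    index: usize,
}

/// Hypothetical (min, max) statistics for one row group, in file-schema order.
struct RowGroupStats {
    min_max: Vec<(i64, i64)>,
}

/// Returns true if the row group can be skipped for `col > threshold`.
/// Like a bound column, `col` is resolved purely by position, never by name.
fn can_prune_gt(stats: &RowGroupStats, col: &Column, threshold: i64) -> bool {
    match stats.min_max.get(col.index) {
        // Every value is <= threshold, so no row in this group can match.
        Some(&(_, max)) => max <= threshold,
        // Position out of range: keep the row group (no pruning).
        None => false,
    }
}
```

Suppose `price` is column 2 in the file schema but column 0 in the other schema. A stale `price@0` then reads the `id` statistics, and the row group is never skipped.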
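Observed end-to-end on ny2 staging 2026-06-15 after walker removal:

- `row_groups_pruned_statistics=0` total
- `bytes_scanned=91 MB`
- `time_elapsed_processing=31 s`

With the walker still active, the same metrics were 1.99K pruned / 136 KB / ~100ms.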
What this PR does
Cherry-picks the missing block from upstream `apache/datafusion@948cd09` (#21807). The proto body is parsed first, and on a cache hit the deserializer returns `Arc::clone(cached).with_new_children(parsed.children())`. This keeps the cached Inner (so TopK heap-max updates propagate end-to-end) but installs the proto body's children on a fresh outer wrapper (so each occurrence keeps its own column refs).
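Keeping the cached Inner is what lets a TopK bound reach every rewrapped occurrence. Both outer wrappers point at one shared allocation, so an update made through one wrapper is visible through the other. This sketch assumes the shared state is a `RwLock<Option<i64>>` threshold. That representation, like the names `Inner`, `Wrapper`, `update` and `current`, is hypothetical and not how `DynamicFilterPhysicalExpr` stores its state.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical shared state: the current TopK bound (heap max), if any.
#[derive(Debug, Default)]
struct Inner {
    threshold: RwLock<Option<i64>>,
}

/// Hypothetical outer wrapper: shared `Inner` plus occurrence-specific children.
#[derive(Debug)]
struct Wrapper {
    inner: Arc<Inner>,
    children: Vec<String>,
}

impl Wrapper {
    /// Fresh outer wrapper with new children, but the same `Inner` allocation.
    fn with_new_children(self: Arc<Self>, children: Vec<String>) -> Arc<Wrapper> {
        Arc::new(Wrapper { inner: Arc::clone(&self.inner), children })
    }

    /// Producer side (e.g. a TopK heap) tightens the bound.
    fn update(&self, bound: i64) {
        *self.inner.threshold.write().unwrap() = Some(bound);
    }

    /// Consumer side (e.g. a scan) reads whatever bound is current.
    fn current(&self) -> Option<i64> {
        *self.inner.threshold.read().unwrap()
    }
}
```

Building a brand-new `Inner` on each cache hit would also keep the children correct. It would, however, cut the scan-side occurrence off from later updates, which is why the fix rewraps rather than rebuilds.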
Tests
Without the fix, the new regression test fails with `left: 0, right: 1`, exactly matching the prod symptom.

Affects