Skip to content

feat(proto): plumb PhysicalProtoConverterExtension into try_encode_expr / try_decode_expr#22922

Open
zhuqi-lucas wants to merge 1 commit into
apache:mainfrom
zhuqi-lucas:qizhu/extension-codec-with-converter
Open

feat(proto): plumb PhysicalProtoConverterExtension into try_encode_expr / try_decode_expr#22922
zhuqi-lucas wants to merge 1 commit into
apache:mainfrom
zhuqi-lucas:qizhu/extension-codec-with-converter

Conversation

@zhuqi-lucas

@zhuqi-lucas zhuqi-lucas commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #22920.

Rationale for this change

Note: This PR is scoped to downstream-defined custom PhysicalExpr types. Custom file sources and plan extension types serialize through plan-level try_encode / try_decode, which already take proto_converter. The remaining gap is the expression-level fallback in serialize_physical_expr_with_converter / parse_physical_expr_with_converter, where codec.try_encode_expr / codec.try_decode_expr is invoked without the active converter.

#21807 introduced the DynamicFilterPhysicalExpr dedup pipeline so that identical references on the wire reconstruct to one shared Arc<Inner> on the decode side via expression_id cache keys. The pipeline works end-to-end for plans built from upstream types, and #22011 hooked it through SortExec / AggregateExec / HashJoinExec plan codecs.

In to_proto.rs::serialize_physical_expr_with_converter, the extension codec's try_encode_expr is reached only as a fallback after the built-in path (expr.try_to_proto(&ctx)?) and the ScalarFunctionExpr downcast. So try_encode_expr only fires for downstream-defined custom PhysicalExpr types. The same applies symmetrically to parse_physical_expr_with_converter and try_decode_expr on the decode side.

When such a custom-PhysicalExpr codec needs to serialize nested PhysicalExprNode fields inside its blob, the only helper available today is the free serialize_physical_expr (hardwired to DefaultPhysicalProtoConverter). So nested expressions inside a custom-PhysicalExpr blob get expr_id: None on the wire, and the DeduplicatingDeserializer's cache never hits for refs that travel through a custom-expr boundary — even when an outer DeduplicatingProtoConverter is in effect on the rest of the plan.

Concretely: two references to the same DynamicFilterPhysicalExpr (one inside a custom-expr's nested field, one outside) reconstruct as distinct Inner allocations after roundtrip, so heap-max updates from a SortExec fail to propagate to the wrapped reference.

What changes are included in this PR?

Matches the existing plan-level pattern. try_encode and try_decode already take a proto_converter parameter; this change extends the expr-level methods the same way:

fn try_decode_expr(
    &self,
    buf: &[u8],
    inputs: &[Arc<dyn PhysicalExpr>],
    proto_converter: &dyn PhysicalProtoConverterExtension,  // new
) -> Result<Arc<dyn PhysicalExpr>>;

fn try_encode_expr(
    &self,
    node: &Arc<dyn PhysicalExpr>,
    buf: &mut Vec<u8>,
    proto_converter: &dyn PhysicalProtoConverterExtension,  // new
) -> Result<()>;

The two internal call sites in serialize_physical_expr_with_converter and parse_physical_expr_with_converter are updated to thread the active converter through. Downstream codecs whose custom PhysicalExpr types embed nested PhysicalExprNode fields can now route those through proto_converter.physical_expr_to_proto / proto_converter.proto_to_physical_expr and pick up dedup automatically.

Are these changes tested?

Yes — a new extension_codec_expr_participates_in_deduplication proto-roundtrip test constructs a BinaryExpr whose left operand is a bare DynamicFilterPhysicalExpr and right operand is a custom WrapperExpr whose extension codec embeds the same dynamic filter inside its serialized blob. After a DeduplicatingProtoConverter roundtrip, an update() applied to the bare-side decoded filter is observed by current() on the wrapped-side decoded filter, proving both refs back the same Inner. Without the converter-aware extension codec the test fails because the two refs end up with distinct Inner allocations.

Are there any user-facing changes?

Yes — this is a breaking change for downstream codecs that override try_encode_expr / try_decode_expr (i.e. codecs that handle custom PhysicalExpr types). They need to add the new proto_converter parameter (and may name it _proto_converter if they don't carry nested expressions). Codecs that only override the plan-level try_encode / try_decode are unaffected.

Wire format is unchanged (PhysicalExprNode.expr_id is unchanged); the fix is purely on the Rust trait API.

Copilot AI review requested due to automatic review settings June 12, 2026 06:15
@github-actions github-actions Bot added the proto Related to proto crate label Jun 12, 2026
@zhuqi-lucas zhuqi-lucas requested review from adriangb and alamb June 12, 2026 06:15
@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-proto v54.0.0 (current)
       Built [  58.088s] (current)
     Parsing datafusion-proto v54.0.0 (current)
      Parsed [   0.020s] (current)
    Building datafusion-proto v54.0.0 (baseline)
       Built [  58.541s] (baseline)
     Parsing datafusion-proto v54.0.0 (baseline)
      Parsed [   0.019s] (baseline)
    Checking datafusion-proto v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.258s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure trait_method_parameter_count_changed: pub trait method parameter count changed ---

Description:
A trait method now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-item-signature
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/trait_method_parameter_count_changed.ron

Failed in:
  PhysicalExtensionCodec::try_decode_expr now takes 3 instead of 2 parameters, in file /home/runner/work/datafusion/datafusion/datafusion/proto/src/physical_plan/mod.rs:3952
  PhysicalExtensionCodec::try_encode_expr now takes 3 instead of 2 parameters, in file /home/runner/work/datafusion/datafusion/datafusion/proto/src/physical_plan/mod.rs:3968

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [ 118.899s] datafusion-proto

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 12, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends DataFusion’s physical-expression protobuf conversion pipeline so extension codecs can participate in expression deduplication (notably for DynamicFilterPhysicalExpr) by threading the active PhysicalProtoConverterExtension through PhysicalExtensionCodec::{try_encode_expr, try_decode_expr}. This closes the gap where expressions serialized inside extension-codec blobs could not benefit from a deduplicating converter on decode, breaking referential integrity across extension boundaries.

Changes:

  • Add a proto_converter: &dyn PhysicalProtoConverterExtension parameter to PhysicalExtensionCodec::try_encode_expr and try_decode_expr.
  • Plumb the converter through the internal extension-codec expression fallback paths in serialize_physical_expr_with_converter and parse_physical_expr_with_converter.
  • Add a regression test proving a nested extension expression can share the same decoded DynamicFilterPhysicalExpr::Inner via deduplication.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
datafusion/proto/src/physical_plan/mod.rs Extends PhysicalExtensionCodec expr-level API to accept proto_converter and documents intended usage.
datafusion/proto/src/physical_plan/to_proto.rs Forwards the active converter into codec.try_encode_expr from the extension-expression fallback path.
datafusion/proto/src/physical_plan/from_proto.rs Forwards the active converter into codec.try_decode_expr when decoding extension expressions.
datafusion/proto/tests/cases/roundtrip_physical_plan.rs Adds a proto-roundtrip test validating deduplication works across an extension-expr boundary; updates an existing codec impl for the new signature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion/proto/src/physical_plan/mod.rs
…pr / try_decode_expr

Closes apache#22920.

apache#21807 introduced the `DynamicFilterPhysicalExpr` dedup pipeline so that
identical references on the wire reconstruct to one shared `Arc<Inner>`
on the decode side via `expression_id` cache keys. The pipeline works
end-to-end for plans built from upstream types, and apache#22011 hooked it
through Sort/Aggregate/HashJoin plan codecs.

However, the pipeline stops at the `PhysicalExtensionCodec` boundary.
Downstream users with custom file sources or plan extension types
encode/decode their nested `PhysicalExprNode` fields via
`try_encode_expr` / `try_decode_expr`, which did not receive the active
`PhysicalProtoConverterExtension`. The only entry points available
inside such a codec were the free `serialize_physical_expr` /
`parse_physical_expr` helpers, hardwired to
`DefaultPhysicalProtoConverter`. So any expression inside an extension
codec's serialized blob got `expr_id = None` on the wire, and the
deduplicating deserializer's cache never hit for refs that crossed an
extension boundary -- even when an outer `DeduplicatingProtoConverter`
was in effect on the rest of the plan. Two references to the same
`DynamicFilterPhysicalExpr` (one inside an extension expr, one outside)
reconstructed as distinct `Inner` allocations, so heap-max updates from
a TopK SortExec failed to propagate to a FileScan predicate carried by
a custom file source.

This matches the existing plan-level pattern (`try_encode` and
`try_decode` already take a `proto_converter` parameter, added when the
plan codec was wired into the converter pipeline). This change extends
the expr-level methods the same way:

  fn try_decode_expr(
      &self,
      buf: &[u8],
      inputs: &[Arc<dyn PhysicalExpr>],
      proto_converter: &dyn PhysicalProtoConverterExtension,  // new
  ) -> Result<Arc<dyn PhysicalExpr>>;

  fn try_encode_expr(
      &self,
      node: &Arc<dyn PhysicalExpr>,
      buf: &mut Vec<u8>,
      proto_converter: &dyn PhysicalProtoConverterExtension,  // new
  ) -> Result<()>;

The two internal call sites in `serialize_physical_expr_with_converter`
and `parse_physical_expr_with_converter` are updated to thread the
active converter through. Downstream codecs can now route nested
expression encode/decode through `proto_converter.physical_expr_to_proto`
/ `proto_converter.proto_to_physical_expr`, picking up dedup
automatically.

This is a breaking change for downstream codecs that override
`try_encode_expr` / `try_decode_expr` -- they will need to add the
new parameter (and may pass `_proto_converter` if they don't carry
nested expressions). Codecs that only override `try_encode` /
`try_decode` (plan-level) are unaffected. Wire format is unchanged
(`PhysicalExprNode.expr_id` is unchanged); the fix is purely on the
Rust trait API.

Test: a new `extension_codec_expr_participates_in_deduplication`
proto-roundtrip test constructs a `BinaryExpr` whose left operand is a
bare `DynamicFilterPhysicalExpr` and right operand is a custom
`WrapperExpr` whose extension codec embeds the same dynamic filter
inside its serialized blob. After a `DeduplicatingProtoConverter`
roundtrip, an `update()` applied to the bare-side decoded filter is
observed by `current()` on the wrapped-side decoded filter, proving
both refs back the same `Inner`. Without the converter-aware extension
codec, this test fails because the two refs end up with distinct
`Inner` allocations.
@adriangb

Copy link
Copy Markdown
Contributor

I want to take a deep look at this. It points out a real problem but I worry that there are like 4 different traits now going around... and I want to make sure we pass the right one / the API shapes are what we need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Follow-up to #21807: thread PhysicalProtoConverterExtension into PhysicalExtensionCodec expression-level methods

3 participants