Migrate PhysicalExprAdapter to unified CastExpr and remove CastColumnExpr usage#21493

Open
kosiew wants to merge 5 commits into apache:main from kosiew:cast-03-20164

kosiew (Contributor) commented Apr 9, 2026

Which issue does this PR close?


Rationale for this change

The current adapter emits CastColumnExpr, duplicating functionality already provided by CastExpr. Maintaining two cast representations introduces unnecessary complexity, branching logic, and potential inconsistencies in behavior depending on where casts are constructed.

With recent improvements to CastExpr (field-aware casting), it is now capable of preserving logical field metadata, nullability, and type semantics. This enables the adapter to emit a single, unified cast representation.

This change simplifies the expression layer, reduces maintenance overhead, and ensures consistent casting behavior across the execution pipeline.


What changes are included in this PR?

  • Replace all usages of CastColumnExpr in schema_rewriter.rs with CastExpr.

  • Remove the create_cast_column_expr helper and inline its logic using CastExpr::new_with_target_field.

  • Add validation via validate_data_type_compatibility before constructing cast expressions.

  • Improve rewrite logic:

    • Avoid unnecessary rewrites when both index and field match.
    • Allow direct column substitution when fields match but index differs.
  • Ensure physical column resolution is based on column name rather than index.

  • Update tests to:

    • Assert usage of CastExpr instead of CastColumnExpr.
    • Validate inner column resolution and target field correctness.
    • Verify logical metadata and nullability propagation via return_field.
    • Improve robustness by checking expression structure instead of string equality.
  • Add helper assertions for validating cast expressions in tests.
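
The rewrite rules listed above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions: `DataType`, `Field`, `Expr`, `validate_compat`, and this `rewrite_column` signature are simplified, hypothetical stand-ins, not DataFusion's actual types or the code in `schema_rewriter.rs`. In particular, the compatibility rule shown (identical types or `Int32` to `Int64` widening) is invented for the example, and the real adapter may treat missing columns differently.

```rust
// Simplified stand-ins for illustration only; NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, index: usize },
    Cast { inner: Box<Expr>, target_field: Field },
}

// Hypothetical rule set: identical types and Int32 -> Int64 widening are allowed.
fn validate_compat(from: &DataType, to: &DataType) -> Result<(), String> {
    match (from, to) {
        (a, b) if a == b => Ok(()),
        (DataType::Int32, DataType::Int64) => Ok(()),
        _ => Err(format!("cannot cast {:?} to {:?}", from, to)),
    }
}

// Rewrites a logical column reference against a physical schema.
fn rewrite_column(
    name: &str,
    index: usize,
    logical: &Field,
    physical: &[Field],
) -> Result<Expr, String> {
    // Resolve the physical column by name, not by the possibly stale index.
    let (phys_idx, phys_field) = physical
        .iter()
        .enumerate()
        .find(|(_, f)| f.name == name)
        .ok_or_else(|| format!("column {} not found in physical schema", name))?;

    if phys_field == logical {
        if index == phys_idx {
            // Index and field both match: nothing to rewrite.
            return Ok(Expr::Column { name: name.to_string(), index });
        }
        // Fields match but the index differs: substitute the column directly.
        return Ok(Expr::Column { name: name.to_string(), index: phys_idx });
    }

    // Fields differ: validate the types first, then wrap in a single cast
    // that carries the logical target field (type and nullability).
    validate_compat(&phys_field.data_type, &logical.data_type)?;
    Ok(Expr::Cast {
        inner: Box::new(Expr::Column { name: name.to_string(), index: phys_idx }),
        target_field: logical.clone(),
    })
}
```

In this model, a column `a` requested with stale index 0 against a physical schema where `a` sits at index 1 resolves to index 1 before any cast is built, which is the stale-index scenario the PR's regression test targets.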


Are these changes tested?

Yes.

  • Existing adapter and schema evolution tests have been updated to use CastExpr.

  • New assertions validate:

    • Correct physical column resolution by name.
    • Proper wrapping of expressions in CastExpr when required.
    • Preservation of logical schema metadata and nullability.
    • Correct structure of rewritten expressions.
  • Regression coverage added for stale column index scenarios.
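
As a rough illustration of checking expression structure instead of string equality, the sketch below downcasts a trait object and asserts on its parts. Everything here is a hypothetical stand-in: the `PhysExpr` trait, the `Column` and `CastExpr` structs, and the helper names `assert_cast_expr` and `assert_inner_column` are invented for this example and are not the helpers added in the PR or DataFusion's real API.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for a physical expression trait.
trait PhysExpr {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical column reference: name plus index into the physical schema.
struct Column {
    name: String,
    index: usize,
}

// Hypothetical cast node: inner expression plus target type and nullability.
struct CastExpr {
    inner: Arc<dyn PhysExpr>,
    target_type: &'static str,
    target_nullable: bool,
}

impl PhysExpr for Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl PhysExpr for CastExpr {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Panics unless `expr` is a CastExpr; returns it for further checks.
fn assert_cast_expr(expr: &Arc<dyn PhysExpr>) -> &CastExpr {
    expr.as_any()
        .downcast_ref::<CastExpr>()
        .expect("expected a CastExpr")
}

// Panics unless the cast wraps a Column with the given name and index.
fn assert_inner_column(cast: &CastExpr, name: &str, index: usize) {
    let col = cast
        .inner
        .as_any()
        .downcast_ref::<Column>()
        .expect("expected a Column inside the CastExpr");
    assert_eq!(col.name, name);
    assert_eq!(col.index, index);
}
```

A test built this way fails with a message naming the unexpected node type or field, rather than on an incidental difference in how the expression is printed.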


Are there any user-facing changes?

No direct user-facing changes.

This is an internal refactor that unifies cast expression handling. However, it improves consistency and correctness of schema evolution and expression rewriting, which may indirectly benefit users.


LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 5 commits April 9, 2026 10:29
Update adapter rewriter to emit field-aware CastExpr instead of CastColumnExpr in schema_rewriter.rs. Rename helper from create_cast_column_expr to create_cast_expr. Adjust adapter tests to validate unified CastExpr behavior and maintain target_field() metadata. Modify one test to check nullability via return_field() rather than the old wrapper’s nullability approach.

Change create_cast_expr to accept physical DataType instead of FieldRef to align with the unified CastExpr adapter. Replace the outdated helper-only regression test with test_rewrite_resolves_physical_column_by_name_before_casting. This new test verifies that the name-based resolution still correctly addresses stale column indices prior to building the cast expression.

Inline cast helper in rewrite_column and streamline the column/field matching logic for clarity. Simplify the construction of results in resolve_physical_column. Reduce test duplication with local helpers for CastExpr and inner Column assertions. Utilize a helper adapter factory to reuse stale-index test setup. Simplify metadata/nullability test by asserting the return_field() contract, and replace complex string-based expectations with precise structural assertions.

Simplify the control flow in the rewrite_column fast path with straightforward if checks around fields_match. Replace make_stale_index_cast_adapter() with a reusable stale_index_cast_schemas() fixture, allowing tests to choose between schemas or an adapter as needed.

- Refactored the allocation of `physical_field` for better clarity.
- Enhanced the readability of the metadata assertions in tests for improved code understanding.

@kosiew kosiew marked this pull request as ready for review April 9, 2026 03:02
@kosiew kosiew requested a review from adriangb April 9, 2026 03:03