[Data] Support non-mutating add_transform_fns and use it for limit wrapping #61525

Open
weimingdiit wants to merge 2 commits into ray-project:master from weimingdiit:data/map-transformer-non-mutating-add-transform-fns

@weimingdiit commented Mar 5, 2026

Description

MapOperator._wrap_transformer_with_limit previously rebuilt a new MapTransformer manually to append the per-block limit transform without mutating the original transformer. That duplicated internal wiring (init_fn and output_block_size_option_override) and left a TODO to move the behavior into MapTransformer itself.

Related issues

#61524

Additional information

This change updates MapTransformer.add_transform_fns to support both modes:

  • Default (backward-compatible): modify in place.
  • New mode: modify_in_place=False returns a new transformer with the appended transforms, leaving the original untouched.

Implementation details:

  • add_transform_fns now computes combined transforms once.
  • In non-mutating mode, it uses a fast clone path via __new__ to avoid re-running __init__ and unnecessary recombination.
  • The cloned transformer preserves _init_fn and _output_block_size_option_override, and resets _udf_time_s.
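
The clone path described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: ToyTransformer is a hypothetical stand-in for MapTransformer, rows are modeled as plain integers, and returning None from the in-place mode is an assumption. Only the attribute names _init_fn, _output_block_size_option_override, and _udf_time_s come from the description.

```python
from typing import Callable, Iterable, List, Optional

TransformFn = Callable[[Iterable[int]], Iterable[int]]


class ToyTransformer:
    """Hypothetical stand-in for MapTransformer; not Ray's actual class."""

    def __init__(
        self,
        transform_fns: List[TransformFn],
        init_fn: Optional[Callable[[], None]] = None,
    ):
        self._transform_fns = list(transform_fns)
        self._init_fn = init_fn
        self._output_block_size_option_override = None
        self._udf_time_s = 0.0

    def add_transform_fns(
        self, fns: List[TransformFn], modify_in_place: bool = True
    ) -> Optional["ToyTransformer"]:
        # Compute the combined list once; both modes reuse it.
        combined = self._transform_fns + list(fns)
        if modify_in_place:
            self._transform_fns = combined
            return None  # assumption: the in-place mode returns nothing
        # Fast clone: allocate without running __init__, then copy state.
        clone = type(self).__new__(type(self))
        clone._transform_fns = combined
        clone._init_fn = self._init_fn
        clone._output_block_size_option_override = (
            self._output_block_size_option_override
        )
        clone._udf_time_s = 0.0  # timing is reset on the clone
        return clone

    def apply(self, rows: Iterable[int]) -> List[int]:
        for fn in self._transform_fns:
            rows = fn(rows)
        return list(rows)


def double(rows: Iterable[int]) -> Iterable[int]:
    return (r * 2 for r in rows)


def add_one(rows: Iterable[int]) -> Iterable[int]:
    return (r + 1 for r in rows)


base = ToyTransformer([double])
clone = base.add_transform_fns([add_one], modify_in_place=False)
```

In this sketch, base still applies only double after the call, while clone applies double followed by add_one.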

Then _wrap_transformer_with_limit is simplified to call: add_transform_fns([limit_transform_fn], modify_in_place=False)
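
As a rough illustration of that call site, the sketch below wraps a transformer with a per-block limit without touching the original. Everything here is hypothetical: TinyTransformer models only the append behavior, make_limit_fn and wrap_transformer_with_limit are illustrative helpers, and a block is modeled as an iterable of integers rather than a Ray Data block.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional

TransformFn = Callable[[Iterable[int]], Iterable[int]]


class TinyTransformer:
    """Hypothetical minimal stand-in; only the append behavior is modeled."""

    def __init__(self, fns: List[TransformFn]):
        self._fns = list(fns)

    def add_transform_fns(
        self, fns: List[TransformFn], modify_in_place: bool = True
    ) -> Optional["TinyTransformer"]:
        if modify_in_place:
            self._fns.extend(fns)
            return None
        return TinyTransformer(self._fns + list(fns))

    def apply(self, rows: Iterable[int]) -> List[int]:
        for fn in self._fns:
            rows = fn(rows)
        return list(rows)


def make_limit_fn(limit: int) -> TransformFn:
    """Build a transform that passes through at most `limit` rows."""

    def limit_transform_fn(rows: Iterable[int]) -> Iterable[int]:
        return islice(rows, limit)

    return limit_transform_fn


def wrap_transformer_with_limit(
    transformer: TinyTransformer, per_block_limit: int
) -> TinyTransformer:
    # One call replaces the manual rebuild; the input transformer is untouched.
    return transformer.add_transform_fns(
        [make_limit_fn(per_block_limit)], modify_in_place=False
    )


def square(rows: Iterable[int]) -> Iterable[int]:
    return (r * r for r in rows)


original = TinyTransformer([square])
limited = wrap_transformer_with_limit(original, 2)
```

Because the limit is appended to a new object, other holders of original keep seeing the unlimited pipeline.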

Results:

  • Removes the TODO.
  • Centralizes append logic in MapTransformer.
  • Avoids call-site manual reconstruction and mutation risk.
  • Keeps existing behavior unchanged for current in-place callers.

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
@gemini-code-assist bot left a comment

Code Review

This pull request introduces a non-mutating mode to MapTransformer.add_transform_fns, simplifying the implementation of _wrap_transformer_with_limit in MapOperator. No security vulnerabilities were found. It is suggested to improve the maintainability of the cloning logic in MapTransformer by using copy.copy() instead of manually copying attributes.
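
For comparison, the copy.copy() suggestion might look roughly like the sketch below. It is an illustration under assumptions, not the reviewer's exact proposal: ToyTransformer and with_added_fns are hypothetical names, and the attribute names are taken from the PR description. copy.copy() makes a shallow copy without calling __init__, so the clone must rebind the transform list (rather than mutate the shared one) and reset the timing field explicitly.

```python
import copy
from typing import Callable, Iterable, List, Optional

TransformFn = Callable[[Iterable[int]], Iterable[int]]


class ToyTransformer:
    """Hypothetical stand-in for MapTransformer; not Ray's actual class."""

    def __init__(
        self,
        transform_fns: List[TransformFn],
        init_fn: Optional[Callable[[], None]] = None,
    ):
        self._transform_fns = list(transform_fns)
        self._init_fn = init_fn
        self._output_block_size_option_override = None
        self._udf_time_s = 0.0

    def with_added_fns(self, fns: List[TransformFn]) -> "ToyTransformer":
        # Shallow copy: every attribute carries over, including ones added later.
        clone = copy.copy(self)
        # Rebind to a new list so the original's list is never mutated.
        clone._transform_fns = self._transform_fns + list(fns)
        clone._udf_time_s = 0.0
        return clone


def noop() -> None:
    pass


def identity(rows: Iterable[int]) -> Iterable[int]:
    return rows


base = ToyTransformer([identity], init_fn=noop)
base._udf_time_s = 1.5  # pretend some UDF time was already recorded
clone = base.with_added_fns([identity])
```

The trade-off is that a shallow copy picks up any attribute added to the class later, whereas the manual path has to be kept in sync by hand; in exchange, fields that must not carry over (such as the timing counter) have to be reset one by one.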

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
@weimingdiit marked this pull request as ready for review March 5, 2026 12:37
@weimingdiit requested a review from a team as a code owner March 5, 2026 12:37
@ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Mar 5, 2026
@weimingdiit (Author)

@slfan1989 @alexeykudinkin @TimothySeah Hi, could you help review this PR?

@weimingdiit changed the title from "[Data] Support non-mutating add_transform_fns and use it for limit wr…" to "[Data] Support non-mutating add_transform_fns and use it for limit wrapping" Mar 6, 2026