[Data] Support non-mutating add_transform_fns and use it for limit wrapping#61525
Open
weimingdiit wants to merge 2 commits intoray-project:masterfrom
Open
Conversation
…apping MapOperator._wrap_transformer_with_limit previously rebuilt a new MapTransformer manually to append the per-block limit transform without mutating the original transformer. That duplicated internal wiring (init_fn and output_block_size_option_override) and left a TODO to move the behavior into MapTransformer itself. This change updates MapTransformer.add_transform_fns to support both modes: - Default (backward-compatible): modify in place. - New mode: modify_in_place=False returns a new transformer with the appended transforms, leaving the original untouched. Implementation details: - add_transform_fns now computes combined transforms once. - In non-mutating mode, it uses a fast clone path via __new__ to avoid re-running __init__ and unnecessary recombination. - The cloned transformer preserves _init_fn and _output_block_size_option_override, and resets _udf_time_s. Then _wrap_transformer_with_limit is simplified to call: add_transform_fns([limit_transform_fn], modify_in_place=False) Results: - Removes the TODO. - Centralizes append logic in MapTransformer. - Avoids call-site manual reconstruction and mutation risk. - Keeps existing behavior unchanged for current in-place callers. Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces a non-mutating mode to MapTransformer.add_transform_fns, simplifying the implementation of _wrap_transformer_with_limit in MapOperator. No security vulnerabilities were found. It is suggested to improve the maintainability of the cloning logic in MapTransformer by using copy.copy() instead of manually copying attributes.
python/ray/data/_internal/execution/operators/map_transformer.py
Outdated
Show resolved
Hide resolved
Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Author
|
@slfan1989 @alexeykudinkin @TimothySeah Hi, could you help review this PR? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
MapOperator._wrap_transformer_with_limit previously rebuilt a new MapTransformer manually to append the per-block limit transform without mutating the original transformer. That duplicated internal wiring (init_fn and output_block_size_option_override) and left a TODO to move the behavior into MapTransformer itself.
Related issues
#61524
Additional information
This change updates MapTransformer.add_transform_fns to support both modes:
Implementation details:
Then _wrap_transformer_with_limit is simplified to call: add_transform_fns([limit_transform_fn], modify_in_place=False)
Results: