[SPARK-56244][PYTHON] Refine benchmark class layout in bench_eval_type.py#55040

Closed
Yicong-Huang wants to merge 4 commits into apache:master from Yicong-Huang:SPARK-56244/refine-bench-layout
Conversation

@Yicong-Huang
Contributor

What changes were proposed in this pull request?

Refines the benchmark class layout in bench_eval_type.py:

  1. Move scenarios and UDF definitions into EvalType mixin classes - each mixin now owns its _scenarios, _udfs, params, and param_names, so the Time/Peakmem benchmark classes need no body of their own (just pass).
  2. Extract MockProtocolWriter and MockDataFactory utility classes - consolidates 17 scattered module-level helper functions into two organized classes with @staticmethod methods.
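
As a rough illustration of the second item, grouping helpers as static methods on two utility classes might look like the sketch below. Only the class names MockProtocolWriter and MockDataFactory come from this PR; the method names (write_int, write_bytes, int_rows) and the length-prefixed framing are assumptions made for the example, not the actual contents of bench_eval_type.py.

```python
import io
import struct


class MockProtocolWriter:
    """Hypothetical helpers that write framed values to a binary stream."""

    @staticmethod
    def write_int(value: int, stream) -> None:
        # Assumed framing: 4-byte big-endian signed integer.
        stream.write(struct.pack("!i", value))

    @staticmethod
    def write_bytes(payload: bytes, stream) -> None:
        # Length prefix followed by the raw payload.
        MockProtocolWriter.write_int(len(payload), stream)
        stream.write(payload)


class MockDataFactory:
    """Hypothetical helpers that build input data for benchmarks."""

    @staticmethod
    def int_rows(num_rows: int) -> list:
        # A trivial stand-in for generated benchmark input.
        return list(range(num_rows))


buf = io.BytesIO()
MockProtocolWriter.write_bytes(b"abc", buf)
rows = MockDataFactory.int_rows(3)
```

Because the methods are static, callers use them as namespaced functions (MockProtocolWriter.write_int(...)) without creating instances, which keeps call sites close to the old module-level style.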

Shared scenarios between related eval types (e.g., ScalarArrow/ScalarArrowIter) use inheritance rather than cross-mixin references.

Why are the changes needed?

The current layout has significant repetition: every Time/Peakmem class duplicates _scenarios, _udfs, params, and param_names. Helper functions are scattered across the file with no clear organization.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Verified all 18 benchmark classes import correctly, have correct params/param_names, and pass smoke tests (setup + time_worker) for every eval type.

Was this patch authored or co-authored using generative AI tooling?

No

@zhengruifeng
Contributor

merged to master

2 participants