string_agg: introduce deferred copying for GroupsAccumulator with mixed eager/deferred paths#21469

Draft
kosiew wants to merge 13 commits into apache:main from kosiew:deferredcopying-02-21156


@kosiew kosiew commented Apr 8, 2026

Which issue does this PR close?


Rationale for this change

The current StringAggGroupsAccumulator eagerly copies string data during update_batch, which can duplicate memory unnecessarily and add CPU overhead, especially for large payloads and high-cardinality group sets.

This PR introduces a hybrid approach that selectively defers copying by retaining references to input batches and materializing results only during evaluate(). This aligns with the proposal in #21156 while balancing implementation complexity and performance.

Key motivations:

  • Reduce memory overhead by avoiding premature string materialization
  • Improve performance for large strings and many groups
  • Maintain efficiency for small payloads via an eager fast path

What changes are included in this PR?

1. Hybrid accumulation strategy

  • Introduces eager and deferred execution paths

  • Uses heuristics to decide when to switch:

    • DEFER_GROUP_THRESHOLD (number of groups)
    • DEFER_PAYLOAD_LEN_THRESHOLD (average string size)
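
The switch between the two paths can be sketched as a pure function of the group count and the average payload length of a batch. This is an illustration only, not the PR's code: the threshold values, the `Mode` enum, the `choose_mode` function, and the choice to require both thresholds at once are assumptions, since this page names the constants but does not show their values or how they are combined.

```rust
// Hypothetical values; the PR names these constants but this page does not
// show what they are set to.
const DEFER_GROUP_THRESHOLD: usize = 1024;
const DEFER_PAYLOAD_LEN_THRESHOLD: usize = 64;

/// Hypothetical marker for the path chosen for a batch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Eager,
    Deferred,
}

/// Pick a path for one incoming batch.
///
/// * `total_groups`: number of groups the accumulator has seen so far
/// * `payload_bytes`: total string bytes in the batch
/// * `non_null_rows`: number of non-null rows in the batch
///
/// Assumption: deferral requires BOTH thresholds to be met.
fn choose_mode(total_groups: usize, payload_bytes: usize, non_null_rows: usize) -> Mode {
    if non_null_rows == 0 {
        // Nothing to copy, so there is nothing to gain from deferring.
        return Mode::Eager;
    }
    let avg_len = payload_bytes / non_null_rows;
    if total_groups >= DEFER_GROUP_THRESHOLD && avg_len >= DEFER_PAYLOAD_LEN_THRESHOLD {
        Mode::Deferred
    } else {
        Mode::Eager
    }
}
```

Under these assumed values, a batch of 3-byte strings stays on the eager path regardless of group count, which matches the intent of the short-payload test described in this PR.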

2. Deferred storage model

  • Adds DeferredRows structure:

    • Stores input ArrayRefs (Arc-backed)
    • Tracks (group_idx, row_idx) pairs instead of copying strings

3. Lazy materialization

  • Strings are only concatenated during evaluate()
  • Deferred batches are replayed into output buffers
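
The deferred storage model and lazy materialization described in items 2 and 3 can be sketched with standard-library types. This is a simplified model under stated assumptions: `Batch` (an `Arc<Vec<Option<String>>>`) stands in for an Arrow `ArrayRef`, and the field layout and method signatures of `DeferredRows` are hypothetical rather than taken from the PR.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow string array (`ArrayRef` in the PR); `None` is a null row.
type Batch = Arc<Vec<Option<String>>>;

/// Hypothetical sketch of deferred storage: batches are retained by reference
/// and each non-null row is recorded as a (group_idx, row_idx) pair.
#[derive(Default)]
struct DeferredRows {
    /// Each retained batch together with the entries that point into it.
    batches: Vec<(Batch, Vec<(usize, usize)>)>,
}

impl DeferredRows {
    /// Record a batch without copying any string bytes.
    fn update_batch(&mut self, values: Batch, group_indices: &[usize]) {
        let entries: Vec<(usize, usize)> = group_indices
            .iter()
            .enumerate()
            .filter(|(row, _)| values[*row].is_some())
            .map(|(row, &group)| (group, row))
            .collect();
        if !entries.is_empty() {
            self.batches.push((values, entries));
        }
    }

    /// Replay retained batches in arrival order and build one delimiter-joined
    /// string per group. Groups with no non-null rows yield `None`.
    fn evaluate(&self, num_groups: usize, delimiter: &str) -> Vec<Option<String>> {
        let mut out: Vec<Option<String>> = vec![None; num_groups];
        for (values, entries) in &self.batches {
            for &(group, row) in entries {
                let value = values[row]
                    .as_deref()
                    .expect("entries only reference non-null rows");
                match out[group].as_mut() {
                    Some(acc) => {
                        acc.push_str(delimiter);
                        acc.push_str(value);
                    }
                    None => out[group] = Some(value.to_owned()),
                }
            }
        }
        out
    }
}
```

In this model the only per-row cost at update time is one `(usize, usize)` pair; string bytes are copied exactly once, when `evaluate` writes them into the output.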

4. Efficient batch handling

  • Introduces StringInputArray abstraction to unify:

    • Utf8
    • LargeUtf8
    • Utf8View
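
One way to picture the unifying abstraction is an enum with one variant per physical layout and a single dispatch point that forwards to a generic helper. The sketch below is an assumption-heavy model: `StrAccessor`, `OffsetStrings`, and `ViewStrings` are hypothetical stand-ins for Arrow's `ArrayAccessor` and its string array types, and the signatures shown for `StringInputArray` and `append_rows_typed` are not taken from the PR.

```rust
/// Minimal stand-in for Arrow's `ArrayAccessor`: random access to string values.
trait StrAccessor {
    fn value(&self, row: usize) -> &str;
}

/// Stand-in for an offset-based array. Utf8 uses 32-bit offsets,
/// LargeUtf8 uses 64-bit offsets.
struct OffsetStrings<O> {
    offsets: Vec<O>,
    data: String,
}

impl StrAccessor for OffsetStrings<i32> {
    fn value(&self, row: usize) -> &str {
        &self.data[self.offsets[row] as usize..self.offsets[row + 1] as usize]
    }
}

impl StrAccessor for OffsetStrings<i64> {
    fn value(&self, row: usize) -> &str {
        &self.data[self.offsets[row] as usize..self.offsets[row + 1] as usize]
    }
}

/// Stand-in for a view array, modeled here as one owned string per row.
struct ViewStrings {
    views: Vec<String>,
}

impl StrAccessor for ViewStrings {
    fn value(&self, row: usize) -> &str {
        &self.views[row]
    }
}

/// One variant per supported string layout.
enum StringInputArray {
    Utf8(OffsetStrings<i32>),
    LargeUtf8(OffsetStrings<i64>),
    Utf8View(ViewStrings),
}

/// Generic helper written once; static dispatch picks the concrete accessor.
fn append_rows_typed<A: StrAccessor>(array: &A, rows: &[usize], out: &mut String) {
    for &row in rows {
        out.push_str(array.value(row));
    }
}

impl StringInputArray {
    /// Single place where the layout is matched.
    fn append_rows(&self, rows: &[usize], out: &mut String) {
        match self {
            Self::Utf8(a) => append_rows_typed(a, rows, out),
            Self::LargeUtf8(a) => append_rows_typed(a, rows, out),
            Self::Utf8View(a) => append_rows_typed(a, rows, out),
        }
    }
}
```

Callers only see `append_rows`, so supporting another layout in this model means adding one variant and one match arm rather than touching every call site.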

5. Partial emit support

  • Handles EmitTo::First(n) correctly:

    • Retains only un-emitted group entries
    • Renumbers group indices after emission
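
The retain-and-renumber step after `EmitTo::First(n)` can be sketched as a single pass over the entry list. The function below is a minimal illustration that assumes entries are plain `(group_idx, row_idx)` pairs in a `Vec`; the name `retain_after_emit` appears in the commit messages, but this signature is hypothetical.

```rust
/// Drop entries that belong to the first `n` groups (already emitted) and
/// shift the remaining group indices down by `n`, in one pass.
fn retain_after_emit(entries: &mut Vec<(usize, usize)>, n: usize) {
    entries.retain_mut(|(group_idx, _row_idx)| {
        if *group_idx >= n {
            // Group survives: renumber so the first retained group becomes 0.
            *group_idx -= n;
            true
        } else {
            // Group was emitted: discard its entry.
            false
        }
    });
}
```

Row indices are left untouched because they still point into the same retained batch; only the group numbering changes.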

6. Memory accounting improvements

  • size() now includes:

    • Deferred batch memory
    • Entry bookkeeping
    • Capacity-based allocations
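
Capacity-based accounting means charging for what each container has allocated rather than for what it currently holds. The sketch below is a simplified model: the `DeferredState` struct and its fields are hypothetical, and `batch_bytes` stands in for whatever per-array memory figure Arrow reports for each retained batch.

```rust
use std::mem::size_of;

/// Hypothetical accumulator state used only to illustrate the accounting.
struct DeferredState {
    /// Bytes held by each retained input batch (stand-in for Arrow's figure).
    batch_bytes: Vec<usize>,
    /// (group_idx, row_idx) bookkeeping for deferred rows.
    entries: Vec<(usize, usize)>,
    /// Eagerly accumulated strings, one slot per group.
    eager: Vec<Option<String>>,
}

impl DeferredState {
    /// Estimate heap usage from capacities, not lengths, so that reserved
    /// but unused space is still counted.
    fn size(&self) -> usize {
        let batches: usize = self.batch_bytes.iter().sum();
        let entries = self.entries.capacity() * size_of::<(usize, usize)>();
        let eager_slots = self.eager.capacity() * size_of::<Option<String>>();
        let eager_heap: usize = self.eager.iter().flatten().map(|s| s.capacity()).sum();
        batches + entries + eager_slots + eager_heap
    }
}
```

Counting capacity rather than length keeps the estimate from under-reporting after a `Vec` has grown and then been partially drained.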

7. Refactoring and utilities

  • Extracted helpers:

    • append_batch_typed
    • append_batch_values_typed
    • append_rows_typed
  • Added clear_state() for accurate memory reset

8. Tests

  • Added coverage for:

    • Mixed eager + deferred execution
    • Threshold-based promotion behavior

Are these changes tested?

Yes.

New tests include:

  • groups_mixed_eager_and_deferred_batches

    • Validates correctness when transitioning between eager and deferred modes
  • groups_short_payloads_do_not_promote_to_deferred

    • Ensures small payloads remain on the eager path

These tests verify:

  • Correct aggregation output
  • Proper state retention across partial emits
  • Promotion logic correctness

Are there any user-facing changes?

No direct user-facing API changes.

However, users may observe:

  • Improved performance for large string aggregations
  • Reduced memory usage in high-cardinality scenarios

No breaking changes are introduced.


LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.


kosiew added 6 commits April 7, 2026 23:17

Optimize StringAggGroupsAccumulator to retain input and state
batches with metadata instead of building a Vec<Option<String>>
on every update. Assemble concatenated strings lazily in
evaluate() and state(). Adjust size() to reflect retained
arrays and metadata. Support EmitTo::First(n) by
emitting the required prefix and renumbering retained groups.
Include note for future mixed-batch compaction work.

Remove unnecessary &mut self from append_rows. Consolidate
repeated string-append loop into a typed private helper using
ArrayAccessor. Eliminate redundant runtime null checks in favor
of non-null entry invariant with debug_assert!. Simplify
retain_after_emit into a single filter-and-renumber pass. Trim
local ceremony in evaluate() and state() for clarity.

Consolidate string-like array routing through a single
StringInputArray abstraction to improve maintainability.

Rename the slot appender to append_group_value for
better readability of the lazy-assembly path.

Update append_rows_typed and append_batch_values_typed to
accept array references instead of values. Modify call sites
in StringInputArray to pass references, improving memory
efficiency and consistency across function calls.

Adjust string_agg to implement a hybrid accumulator, offering
eager updates for lightweight workloads and switching to
deferred row tracking for larger batches. This change
enhances performance while maintaining efficiency.
Included mixed-mode regression tests to cover various
batch scenarios and ensure correctness.

@github-actions github-actions bot added the functions (Changes to functions implementation) label Apr 8, 2026

kosiew commented Apr 8, 2026

Benchmark (#21437)

              Criterion Benchmark Summary (Statistically Significant Changes)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark                                                                                     ┃ Mean Change ┃  P-value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ aggregate_query_approx_percentile_cont_on_f32                                                 │      -4.93% │ 0.000000 │
│ aggregate_query_approx_percentile_cont_on_u64                                                 │      -4.68% │ 0.000000 │
│ aggregate_query_distinct_median                                                               │      -1.59% │ 0.000000 │
│ aggregate_query_group_by                                                                      │      -4.48% │ 0.000000 │
│ aggregate_query_group_by_u64 15 12                                                            │      -1.34% │ 0.000000 │
│ aggregate_query_group_by_u64_multiple_keys                                                    │      -3.66% │ 0.000000 │
│ aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expressions                       │      -5.07% │ 0.000000 │
│ (aggregate_query_group_by_wide_u64_and_f32_without_aggregate_expr)                            │             │          │
│ aggregate_query_group_by_wide_u64_and_string_without_aggregate_expressions                    │      -4.94% │ 0.000000 │
│ (aggregate_query_group_by_wide_u64_and_string_without_aggregate_e)                            │             │          │
│ aggregate_query_group_by_with_filter                                                          │      -3.93% │ 0.000000 │
│ aggregate_query_no_group_by_count_distinct_wide                                               │      -4.73% │ 0.000000 │
│ aggregate_query_no_group_by_min_max_f64                                                       │      -3.28% │ 0.000000 │
│ array_agg_query_group_by_few_groups                                                           │      -3.85% │ 0.000000 │
│ array_agg_query_group_by_many_groups                                                          │      -3.06% │ 0.000000 │
│ array_agg_query_group_by_mid_groups                                                           │      -5.56% │ 0.000000 │
│ array_agg_struct_query_group_by_mid_groups                                                    │      -3.55% │ 0.000000 │
│ first_last_ignore_nulls                                                                       │      -1.65% │ 0.000000 │
│ first_last_many_columns                                                                       │      -1.76% │ 0.000000 │
│ first_last_one_column                                                                         │      -4.00% │ 0.000000 │
│ string_agg_payloads/few_groups/large_1024b (large_1024b)                                      │      -4.48% │ 0.000000 │
│ string_agg_payloads/few_groups/medium_64b (medium_64b)                                        │      -2.78% │ 0.000000 │
│ string_agg_payloads/few_groups/small_3b (small_3b)                                            │      -4.02% │ 0.000000 │
│ string_agg_payloads/many_groups/large_1024b (large_1024b)                                     │      -2.12% │ 0.000000 │
│ string_agg_payloads/many_groups/medium_64b (medium_64b)                                       │      -5.19% │ 0.000000 │
│ string_agg_payloads/many_groups/small_3b (small_3b)                                           │      -4.37% │ 0.000000 │
│ string_agg_payloads/mid_groups/large_1024b (large_1024b)                                      │      -4.01% │ 0.000000 │
│ string_agg_payloads/mid_groups/medium_64b (medium_64b)                                        │     -12.10% │ 0.000000 │
└───────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┴──────────┘

Summary: 26 improvements, 0 regressions (p < 0.05)

@kosiew kosiew force-pushed the deferredcopying-02-21156 branch 2 times, most recently from 51ac58a to baa8054 Compare April 8, 2026 14:20
@github-actions github-actions bot added and removed the core (Core DataFusion crate) label Apr 8, 2026
@kosiew kosiew changed the title Hybrid eager/deferred accumulation for string_agg GroupsAccumulator to reduce copying and memory usage string_agg: introduce deferred copying for GroupsAccumulator with mixed eager/deferred paths Apr 9, 2026
kosiew added 6 commits April 9, 2026 14:37
Eliminate repeated match arms in string_agg.rs by introducing a local
dispatch macro. This enhances clarity and readability, allowing each
method to focus on intent while simplifying maintenance for future
changes. The refactor preserves existing static dispatch behavior,
ensuring that all targeted tests continue to pass.

Rework the string_agg accumulator to start in eager mode,
reducing unnecessary allocations. Restore an eager append
helper for the hot path and enhance promotion logic to use
lightweight size estimates from Arrow buffers. This allows
short payloads to remain on the eager path while enabling
deferred copying for larger batches.

Add regression tests to ensure short payloads do not
promote and mixed eager/deferred batches operate correctly.
@kosiew kosiew force-pushed the deferredcopying-02-21156 branch from 80352ce to ca5d685 Compare April 9, 2026 06:37
… todo

- Revised comment to indicate a future task for compacting mixed batches in the StringAggGroupsAccumulator implementation.