Shuffle memory pool accounting: peak RSS significantly exceeds configured memory limit #3821

## Description

When running the shuffle benchmark with a configured memory pool limit, the peak RSS (resident set size) of the process exceeds that limit by up to 1.78x. This suggests that a substantial share of allocations happens outside the memory pool accounting.

## Benchmark Data

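**Setup:** TPC-H SF100 lineitem (first 100M rows, 16 columns), hash partitioning (200 partitions), lz4 compression, single iteration per memory limit.

**Units:** The limit is passed in bytes; the "2 GB" row corresponds to `--memory-limit 2147483648`, i.e. 2 GiB, and the other rows are presumably binary as well. The RSS / Limit column is consistent with Peak RSS in decimal GB divided by a binary limit (for example, 3.8e9 / 2147483648 ≈ 1.77).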

| Memory Limit | Peak RSS | RSS / Limit | Write Time | Throughput |
|---|---|---|---|---|
| 2 GB | 3.8 GB | 1.78x | 43.4s | 2.30M rows/s |
| 4 GB | 6.7 GB | 1.56x | 45.1s | 2.22M rows/s |
| 8 GB | 11.0 GB | 1.28x | 46.4s | 2.15M rows/s |
| 16 GB | 13.1 GB | 0.76x | 51.9s | 1.93M rows/s |
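
The derived columns can be rechecked from the raw ones. The following is a minimal sketch, assuming the limits are binary (GiB) and Peak RSS is decimal GB; the function names are illustrative and not part of the benchmark.

```rust
/// Bytes per decimal GB and per binary GiB.
const GB: f64 = 1_000_000_000.0;
const GIB: f64 = 1_073_741_824.0;

/// Ratio of peak RSS (given in decimal GB) to a limit given in GiB.
/// Assumption: this unit mix is what the RSS / Limit column used.
fn rss_over_limit(peak_rss_gb: f64, limit_gib: f64) -> f64 {
    (peak_rss_gb * GB) / (limit_gib * GIB)
}

/// Throughput in millions of rows per second.
fn throughput_mrows(rows: f64, seconds: f64) -> f64 {
    rows / seconds / 1e6
}

fn main() {
    // (limit, peak RSS in GB, write time in seconds), copied from the table.
    let runs = [
        (2.0, 3.8, 43.4),
        (4.0, 6.7, 45.1),
        (8.0, 11.0, 46.4),
        (16.0, 13.1, 51.9),
    ];
    for (limit, rss, secs) in runs {
        println!(
            "limit {:>2}: RSS/limit = {:.2}x, throughput = {:.2}M rows/s",
            limit,
            rss_over_limit(rss, limit),
            throughput_mrows(100e6, secs)
        );
    }
}
```

Small differences in the last digit against the table are expected, since Peak RSS is only given to one decimal place.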

## Observations

1. **Peak RSS exceeds the memory limit by up to 1.78x.** At a 2 GB memory limit, the process uses 3.8 GB of RSS. Even if the pool was filled to its limit, over 40% of the resident memory is untracked by the memory pool.

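2. **Overhead outside the pool grows with the limit.** Subtracting the memory limit from peak RSS gives ~1.8 GB of overhead at 2 GB, ~2.7 GB at 4 GB, and ~3.0 GB at 8 GB. At 16 GB, peak RSS stays below the limit, so the pool presumably never fills. These figures treat the limit labels as decimal GB; if the limits are binary, as `--memory-limit 2147483648` in the reproduction suggests, the gaps are closer to ~1.7 GB, ~2.4 GB, and ~2.4 GB. Either way, the overhead is larger at 4 GB and 8 GB than at 2 GB, which suggests some internal structures scale with available memory rather than being fixed-size.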

3. **Higher memory limits are slower, not faster.** The 16 GB run (51.9s) takes about 20% longer than the 2 GB run (43.4s). One possible explanation is that smaller pools keep the working set more cache-friendly, while larger buffers cause more memory pressure at flush time. This is unverified, and each configuration was run only once.
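
The arithmetic behind these observations can be sketched as follows. This assumes the table values as given and computes the overhead under both readings of the limit (decimal GB and binary GiB); the helper names are illustrative.

```rust
/// Bytes per decimal GB and per binary GiB.
const GB: f64 = 1_000_000_000.0;
const GIB: f64 = 1_073_741_824.0;

/// Peak RSS minus the limit, in decimal GB.
/// `limit_unit` is the number of bytes in one unit of `limit` (GB or GIB).
fn overhead_gb(peak_rss_gb: f64, limit: f64, limit_unit: f64) -> f64 {
    (peak_rss_gb * GB - limit * limit_unit) / GB
}

/// Fraction of peak RSS that lies above the limit (a lower bound on the
/// untracked share, since the pool may not have been full).
fn untracked_share(peak_rss_gb: f64, limit: f64, limit_unit: f64) -> f64 {
    overhead_gb(peak_rss_gb, limit, limit_unit) / peak_rss_gb
}

/// How much longer `other_secs` is than `base_secs`, in percent.
fn slowdown_pct(base_secs: f64, other_secs: f64) -> f64 {
    (other_secs / base_secs - 1.0) * 100.0
}

fn main() {
    // (limit, peak RSS in GB), copied from the benchmark table.
    let runs = [(2.0, 3.8), (4.0, 6.7), (8.0, 11.0), (16.0, 13.1)];
    for (limit, rss) in runs {
        println!(
            "limit {:>2}: overhead {:+.2} GB (decimal limit), {:+.2} GB (binary limit)",
            limit,
            overhead_gb(rss, limit, GB),
            overhead_gb(rss, limit, GIB)
        );
    }
    println!(
        "untracked share at the smallest limit: at least {:.0}%",
        untracked_share(3.8, 2.0, GIB) * 100.0
    );
    println!("16 GB vs 2 GB write time: {:+.1}%", slowdown_pct(43.4, 51.9));
}
```

A negative overhead (the 16 GB row) means peak RSS never reached the limit.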

## Expected Behavior

The memory pool limit should more closely bound the total process memory usage. Untracked allocations (Arrow buffers, Parquet reader state, partition writer buffers, etc.) should either be accounted for in the pool or documented as expected overhead.
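
One way to measure how much heap memory bypasses the pool is a counting global allocator. The sketch below is a diagnostic idea only, not the project's accounting mechanism: `CountingAlloc`, `LIVE`, `PEAK`, and `untracked_bytes` are hypothetical names, and the pool reservation is a plain number here rather than a value read from the real memory pool. Heap bytes are also not the same as RSS, which additionally includes allocator caching, fragmentation, and memory-mapped regions.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Live heap bytes and their high-water mark, across the whole process.
static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper around the system allocator that counts bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Record the new live total and raise the peak if needed.
            let live = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Heap bytes not covered by the pool's reservation (zero if the pool
/// reserved more than the heap actually used).
fn untracked_bytes(heap_peak: usize, pool_reserved: usize) -> usize {
    heap_peak.saturating_sub(pool_reserved)
}

fn main() {
    // Stand-in for a buffer allocated without a pool reservation.
    let buf = vec![0u8; 8 << 20];
    std::hint::black_box(&buf);

    let heap_peak = PEAK.load(Ordering::Relaxed);
    // Assumption: the pool reported 0 bytes reserved for this toy example.
    println!("untracked heap bytes: {}", untracked_bytes(heap_peak, 0));
}
```

Comparing the allocator's peak against the pool's reported peak reservation at the end of a run would give a per-run estimate of untracked heap usage, which could then be attributed to specific components with a heap profiler.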

## Reproduction

```sh
cargo build --release --bin shuffle_bench --features shuffle-bench

/usr/bin/time -l ./native/target/release/shuffle_bench \
    --input /opt/tpch/sf100/lineitem \
    --codec lz4 \
    --partitions 200 \
    --hash-columns 0,3 \
    --memory-limit 2147483648 \
    --limit 100000000
```
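
To turn the `/usr/bin/time -l` output into the RSS / Limit ratio, the peak RSS line can be parsed as below. This is a sketch under two assumptions: `-l` is the BSD/macOS flag, and the reported "maximum resident set size" is in bytes, as on macOS. GNU time uses `-v` and a different format, so it would need a different parser. The sample string is illustrative, not captured from a real run.

```rust
/// Extract the peak RSS value from BSD/macOS `time -l` style output,
/// where the relevant line looks like `<number>  maximum resident set size`.
fn parse_max_rss(time_output: &str) -> Option<u64> {
    time_output
        .lines()
        .find(|line| line.contains("maximum resident set size"))
        .and_then(|line| line.split_whitespace().next())
        .and_then(|token| token.parse().ok())
}

fn main() {
    // Illustrative line only; the number mirrors the 3.8 GB table entry.
    let sample = "          3800000000  maximum resident set size\n";
    // Same value as --memory-limit in the reproduction command.
    let limit_bytes: u64 = 2_147_483_648;

    match parse_max_rss(sample) {
        Some(rss) => println!(
            "peak RSS = {} bytes, RSS / limit = {:.2}x",
            rss,
            rss as f64 / limit_bytes as f64
        ),
        None => println!("no peak RSS line found"),
    }
}
```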
