Closed
21 commits
ae0741b
feat: add standalone shuffle benchmark binary for profiling
andygrove Mar 21, 2026
9b5b305
feat: add --limit option to shuffle benchmark (default 1M rows)
andygrove Mar 21, 2026
e1ab490
perf: apply limit during parquet read to avoid scanning all files
andygrove Mar 21, 2026
b7682f4
feat: move shuffle_bench binary into shuffle crate
andygrove Mar 23, 2026
ca36cbd
chore: add comment explaining parquet/rand deps in shuffle crate
andygrove Mar 23, 2026
7225afd
Merge remote-tracking branch 'apache/main' into shuffle-bench-binary
andygrove Mar 26, 2026
6e8bed2
perf: add max_buffered_batches config and stream shuffle bench from p…
andygrove Mar 26, 2026
16ce30f
merge apache/main, remove max_buffered_batches changes
andygrove Mar 27, 2026
2ef57e7
cargo fmt
andygrove Mar 27, 2026
9136e10
prettier
andygrove Mar 27, 2026
7e16819
machete
andygrove Mar 27, 2026
e7a3661
feat: add immediate-mode shuffle partitioner
andygrove Mar 28, 2026
4e6c026
perf: optimize immediate shuffle with single-take-then-slice
andygrove Mar 28, 2026
97b0fe0
feat: make immediate shuffle mode the default
andygrove Mar 28, 2026
7ccda15
refactor: rename Default shuffle mode to Buffered
andygrove Mar 28, 2026
fb426cf
perf: replace per-partition temp files with in-memory buffers
andygrove Mar 28, 2026
705130d
feat: add memory accounting and spilling to immediate shuffle
andygrove Mar 28, 2026
862d01e
refactor: encapsulate buffer access and extract shared index writer
andygrove Mar 28, 2026
e7d38da
fix: use per-partition take instead of full-batch reorder
andygrove Mar 28, 2026
86bae92
revert: default shuffle mode back to buffered
andygrove Mar 28, 2026
3aade30
perf: single take() + zero-copy slice() instead of per-partition take()
andygrove Mar 29, 2026
13 changes: 13 additions & 0 deletions common/src/main/scala/org/apache/comet/CometConf.scala
@@ -534,6 +534,19 @@ object CometConf extends ShimCometConf {
      .checkValue(v => v > 0, "Write buffer size must be positive")
      .createWithDefault(1)

  val COMET_NATIVE_SHUFFLE_MODE: ConfigEntry[String] =
    conf(s"$COMET_EXEC_CONFIG_PREFIX.shuffle.nativeMode")
      .category(CATEGORY_SHUFFLE)
      .doc(
        "Selects which native shuffle implementation to use for multi-partition shuffles. " +
          "'buffered' buffers input batches and tracks per-partition row indices, writing all " +
          "partitions at the end with memory-pressure-driven spilling. " +
          "'immediate' repartitions each incoming batch using take and writes per-partition " +
          "data directly to individual files, avoiding in-memory buffering of input batches.")
      .stringConf
      .checkValues(Set("buffered", "immediate"))
      .createWithDefault("buffered")

  val COMET_SHUFFLE_PREFER_DICTIONARY_RATIO: ConfigEntry[Double] = conf(
    "spark.comet.shuffle.preferDictionary.ratio")
    .category(CATEGORY_SHUFFLE)
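The 'immediate' mode described in the doc string repartitions each incoming batch with take, and the later commit titles mention a single take() followed by zero-copy slice() per partition. The following is a minimal Rust sketch of that general idea, not the code from this PR: a plain `Vec<u64>` stands in for an Arrow batch, and `partition_of` (modulo on the key) and `take_then_slice` are hypothetical names chosen for illustration, not Comet or Arrow APIs.

```rust
/// Hypothetical partitioner: a simple modulo stands in for the real
/// hash partitioning, which this sketch does not attempt to reproduce.
fn partition_of(key: u64, num_partitions: usize) -> usize {
    (key % num_partitions as u64) as usize
}

/// Reorders `keys` so that rows of the same partition are contiguous.
/// Returns the reordered rows and `num_partitions + 1` offsets, where
/// partition `p` occupies `taken[offsets[p]..offsets[p + 1]]`.
fn take_then_slice(keys: &[u64], num_partitions: usize) -> (Vec<u64>, Vec<usize>) {
    // Pass 1: count the rows that fall into each partition.
    let mut offsets = vec![0usize; num_partitions + 1];
    for &k in keys {
        offsets[partition_of(k, num_partitions) + 1] += 1;
    }
    // Prefix sum turns the counts into start offsets.
    for p in 0..num_partitions {
        offsets[p + 1] += offsets[p];
    }

    // Pass 2: build one index array grouped by partition (stable order).
    let mut cursor = offsets.clone(); // cursor[p] = next write slot for p
    let mut indices = vec![0usize; keys.len()];
    for (row, &k) in keys.iter().enumerate() {
        let p = partition_of(k, num_partitions);
        indices[cursor[p]] = row;
        cursor[p] += 1;
    }

    // The single "take": gather all rows once, in partition order.
    let taken: Vec<u64> = indices.iter().map(|&i| keys[i]).collect();
    (taken, offsets)
}

fn main() {
    let keys = [10u64, 3, 7, 4, 9, 6];
    let (taken, offsets) = take_then_slice(&keys, 3);
    for p in 0..3 {
        // Each partition is a borrowed sub-slice; no rows are copied here.
        let slice = &taken[offsets[p]..offsets[p + 1]];
        println!("partition {p}: {slice:?}");
    }
}
```

The point of the design is that the gather cost is paid once per batch rather than once per partition; each partition's data is then just a view into the reordered buffer. Whether this matches the PR's actual kernel usage is an assumption based only on the commit titles.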