Automatically drop caches when running benchmarks #189
base: main
Conversation
… misiug/jinjaDockerCompose
….sh. Argument -g or --gpu-ids allows passing in a comma-delimited set of GPU ids. This way more than one developer can run a multi-GPU cluster on the same 8-GPU server. For example, one dev can set -g 0,1,2,3 and another -g 4,5,6,7
…lox-testing into misiug/tempjinja
…lox-testing into misiug/tempjinja
# make cudf.exchange=true if we are running multiple workers
sed -i "s+cudf.exchange=false+cudf.exchange=true+g" ${worker_config}/config_native.properties
if [[ -n ${MEMORY_PERCENT} ]]; then
Memory percent is something that IBM usually sets to 50% in their multi-worker runs. If nothing else, we should be experimenting with this limit (the default is 0).
paul-aiyedun left a comment
Changes overall look good to me. However, I think cache dropping should be integrated into the benchmarking test suite.
| echo "Dropping cache" | ||
| if [[ -z ${SKIP_DROP_CACHE} ]]; then | ||
| dropcache; |
Can we have cache dropping be driven by the performance test suite (i.e. logic in presto/testing/performance_benchmarks/common_fixtures.py), similar to what we do for profiling? This should allow for an easier extension for cold runs.
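Along those lines, a minimal sketch of what a fixture-driven approach might look like. The fixture name, the --skip-drop-cache option (which would also need to be registered via pytest_addoption in a conftest.py), and the sudo-based drop are all assumptions for illustration, not the repo's actual API:

import subprocess

import pytest

@pytest.fixture(autouse=True)
def drop_caches(request):
    # Hypothetical fixture for common_fixtures.py: drop OS page caches before
    # each benchmark so every run starts cold, unless --skip-drop-cache is set.
    if not request.config.getoption("--skip-drop-cache", default=False):
        # Flush dirty pages, then drop page cache, dentries, and inodes
        # (writing to /proc/sys/vm/drop_caches requires root).
        subprocess.run(["sync"], check=True)
        subprocess.run(
            ["sudo", "sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"],
            check=True,
        )
    yield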
KVIKIO_ARRAY=(8)
DRIVERS_ARRAY=(2)
WORKERS_ARRAY=(1)
SCHEMA_ARRAY=(sf1k_64mb)
I think we should remove this default value.
exit 1
fi
;;
-s|--schema)
--schemas
done
}

export PRESTO_DATA_DIR=/raid/ocs_benchmark_data/tpch/experimental
This should probably be set by a parameter if not set already.
for memory in "${MEMORY_ARRAY[@]}"; do
    echo "Running combo: num_workers = $workers, kvikio_threads = $kvikio, num_drivers = $drivers, schema = $schema, memory = $memory"
    ./start_native_gpu_presto.sh -w $workers --kvikio-threads $kvikio --num-drivers $drivers --memory-percent $memory
    ./run_benchmark.sh -b tpch -s ${schema} --tag "${schema}_${workers}w_${drivers}d_${kvikio}k_${memory}m_dropcache"
Consider ${schema}_${workers}wk_${drivers}dr_${kvikio}kv_${memory}mp. The dropcache suffix should no longer be needed after extending the benchmark script to drop caches by default.
export NUM_WORKERS=1
export KVIKIO_THREADS=8
export VCPU_PER_WORKER=""
export MEMORY_PERCENT=""
I think config generation/updates should be separate from building/deploying the server, but we can revisit this later.
mbrobbel left a comment
Maybe we should/can also use the following endpoints?
GET /v1/operation/server/clearCache?type=memory: clears the memory cache on the worker node. Example:
curl -X GET "http://localhost:7777/v1/operation/server/clearCache?type=memory"
Cleared memory cache
GET /v1/operation/server/clearCache?type=ssd: clears the ssd cache on the worker node. Example:
curl -X GET "http://localhost:7777/v1/operation/server/clearCache?type=ssd"
Cleared ssd cache
GET /v1/operation/server/writeSsd: writes data from the memory cache to the ssd cache on the worker node. Example:
curl -X GET "http://localhost:7777/v1/operation/server/writeSsd"
Succeeded write ssd cache
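For reference, a rough sketch of how a harness could hit these endpoints from Python, assuming the requests library and a single worker at localhost:7777 as in the curl examples above; the WORKERS list and function name are hypothetical:

import requests

# Hypothetical list of worker base URLs; port 7777 matches the examples above.
WORKERS = ["http://localhost:7777"]

def clear_worker_caches(cache_type):
    # Clear the given cache type ("memory" or "ssd") on every worker node.
    for worker in WORKERS:
        resp = requests.get(
            f"{worker}/v1/operation/server/clearCache",
            params={"type": cache_type},
        )
        resp.raise_for_status()
        print(resp.text)  # e.g. "Cleared memory cache"

clear_worker_caches("memory")
clear_worker_caches("ssd")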
Altered run_benchmark.sh to automatically drop caches when running benchmarks (can be turned off via an option).
Also added a script to run benchmarks with multiple configurations, which is useful for quickly comparing benchmark settings.