Add script for posting nightly benchmark results #203
Conversation
This script operates on the parsed output of the benchmark runner. The
expected directory structure is:

../velox-testing-benchmark-viz/2026/01/28/01/
Is this the correct directory structure to target? Or is there an earlier, rawer version we should use?
That is the current format of the runner on that particular cluster (it will generate a commit under the date directory). Not all of our benchmarking is consistent with such a structure.
Standardizing on one directory structure is something we need to do, and hopefully we can get there and then layer this on top.
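Until that standardization lands, here is a minimal sketch of how the script might locate the most recent run directory under this date-based layout. The root path and helper name are illustrative, not part of the PR:

```python
from pathlib import Path

def latest_run_dir(root: Path) -> Path:
    """Walk the YYYY/MM/DD/NN layout and return the most recent run directory.

    Assumes every level is a zero-padded numeric directory, so a plain
    lexicographic sort matches chronological order.
    """
    current = root
    for _ in range(4):  # year / month / day / run-number
        subdirs = sorted(d for d in current.iterdir() if d.is_dir())
        if not subdirs:
            raise FileNotFoundError(f"no run directories under {current}")
        current = subdirs[-1]
    return current

# e.g. latest_run_dir(Path("../velox-testing-benchmark-viz"))
# -> ../velox-testing-benchmark-viz/2026/01/28/01
```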
│   └── slurm-4575179.out
└── result_dir
    ├── benchmark_cold.json
    ├── benchmark_full.json
Right now, I'm only using benchmark_full.json.
I think that's fine. IIRC the others are just subsets of the full data for organization's sake. Currently that benchmark_full.json is generated from a branch misiug/cluster, but if that's the format we want it should not be hard to get that upstream.
Whatever output we want to generate should be in that same directory, although we can alter the format to whatever would best fit with the database API.
--sku-name PDX-H100 \
--storage-configuration-name pdx-lustre-sf-100 \
--cache-state warm
These values are currently required.
The --sku-name and --storage-configuration-name need to match a name in the database, so I think it's OK to require them.
--cache-state warm is probably incorrect, and doesn't need to be provided here. The DB also has a field for is_os_page_cache_cleared at the query log level. I'd recommend using that, but we need some way to know whether the OS page cache is actually cleared.
There is a PR up to make it so that the run_benchmark.sh script will automatically drop caches before running benchmarks (unless told otherwise): #189. With that we should be able to assume the first iteration of every benchmark is always "lukewarm" and any subsequent iterations will be "warm". For true "cold" runs we would need to run each query as a separate run, or add a "cold" option to the benchmark script.
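Assuming #189 lands with that behavior, a minimal sketch of how the poster could label each iteration. The helper and its signature are hypothetical; execution_number is taken as 1-based, matching the sample document's extra_info:

```python
def cache_state_for_iteration(execution_number: int, caches_dropped: bool) -> str:
    """Map an iteration index to a cache-state label.

    Assumes the OS page cache is dropped once before the whole benchmark
    (per #189), not before each query, so only the first iteration sees a
    cleared cache.
    """
    if not caches_dropped:
        return "warm"  # nothing was cleared, every iteration runs warm
    return "lukewarm" if execution_number == 1 else "warm"
```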
data = json.loads(file_path.read_text())

keys = list(data)
if keys != ["tpch"]:
The result_dir/benchmark_full.json file looks like:
❯ head result_dir/benchmark_full.json
{
  "tpch": {
    "raw_times_ms": {
      "Q1": [
        3982,
        1509,
        1497,
        1315,
        1498
      ],
Are there any other values we should be prepared for here?
That should be all for now. We usually just use these base values and then calculate the statistics from them. The exception would be NULL values if any errors occurred.
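Given that shape, a minimal sketch of flattening raw_times_ms into query-log entries, treating nulls as failures. The field names and the 0-based per-query execution_order are inferred from the sample document later in this PR; the helper itself is hypothetical:

```python
import json
from pathlib import Path

def parse_query_logs(result_dir: Path) -> list[dict]:
    """Flatten benchmark_full.json's raw_times_ms into query-log entries.

    A null runtime is treated as a failed execution, matching the note
    above that errors show up as NULL values.
    """
    data = json.loads((result_dir / "benchmark_full.json").read_text())
    logs = []
    for query_name, times in data["tpch"]["raw_times_ms"].items():
        for i, runtime_ms in enumerate(times, start=1):
            logs.append({
                "query_name": query_name.lstrip("Q"),  # "Q1" -> "1"
                "execution_order": i - 1,
                "runtime_ms": float(runtime_ms) if runtime_ms is not None else None,
                "status": "success" if runtime_ms is not None else "failure",
                "extra_info": {"execution_number": i},
            })
    return logs
```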
# Generate or use provided identifier hash
if identifier_hash is None:
    identifier_hash = generate_identifier_hash(benchmark_metadata.timestamp, engine)

# Use placeholders for version info if not provided
if version is None:
    version = "unknown"
if commit_hash is None:
    commit_hash = "unknown"
We need to sort out these version fields.
The identifier_hash is extremely important, since it helps uniquely identify the engine / software being run. I'd recommend using something like the sha256 digest of the container image, if we have one available. Do we?
commit_hash is the commit of the "primary" software, so presumably velox-cudf? Can we get that? Likewise for version.
Assuming we store all our benchmarked images in the GitLab registry, we should have both a Manifest digest and a Configuration digest we can use.
For presto-gpu we are probably going to need to include a unique identifier for both the presto and velox version (as they can be mixed/matched). If we only use the velox-cudf hash then we could theoretically have images built with the same hash that could vary substantially from different presto commits. For our images, we are planning on tagging them with both hashes. Would we be able to fit both hashes into the entry here (something like <presto_hash>-<velox_hash>)?
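If both digests are available, a minimal sketch of folding them into one fixed-width identifier. The `<presto_hash>-<velox_hash>` scheme is the one floated above; the 16-hex-char width simply matches the sample document's identifier_hash and is an assumption:

```python
import hashlib

def identifier_hash_from_images(presto_hash: str, velox_hash: str) -> str:
    """Combine the presto and velox image/commit digests into one identifier.

    The two inputs are joined with the proposed '-' separator, then run
    through sha256 so the result is fixed-width regardless of input length.
    """
    combined = f"{presto_hash}-{velox_hash}"
    return hashlib.sha256(combined.encode()).hexdigest()[:16]

# identifier_hash_from_images("a1b2c3d", "e4f5a6b") -> 16-char hex digest
```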
{
    "query_name": query_name,
    "execution_order": execution_order,
    "runtime_ms": 0.0,
I don't love putting 0.0 here (someone will forget to filter to only successes and include 0 in an aggregation). Check if that's allowed to be None in the API / DB.
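For illustration, the aggregation hazard with 0.0 versus an explicit None:

```python
# With 0.0 standing in for a failure, the mean is silently wrong:
runtimes = [3982.0, 1509.0, 0.0]
print(sum(runtimes) / len(runtimes))  # 1830.33... - failure dragged the mean down

# With None, the failure must be handled explicitly before aggregating:
runtimes = [3982.0, 1509.0, None]
ok = [t for t in runtimes if t is not None]
print(sum(ok) / len(ok))              # 2745.5 - mean over successes only
```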
misiugodfrey left a comment:
Thanks for putting this together, Tom.
I think most of this should work as-is. The only issue is that we haven't standardized some of this expected output in upstream main yet (the output this is based on is from an experimental set of branches). This should be easy now that we have an upstream schema to store in.
I think we just need to standardize our output and then sync this on top.
Expects coordinator.config and worker.config files.
"""
coordinator_config = parse_config_file(configs_dir / "coordinator.config")
In our presto environments these files are going to be in a configs directory under <configs>/etc_worker/config_native.properties and <configs>/etc_coordinator/config_native.properties
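A minimal sketch of a loader that tries the presto layout described above and falls back to the flat coordinator.config / worker.config files the script currently expects. parse_config_file here is a stand-in for the script's own helper, assuming simple key=value properties files:

```python
from pathlib import Path

def parse_config_file(path: Path) -> dict[str, str]:
    """Parse a key=value properties file, skipping blanks and comments."""
    config = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config

def load_engine_config(configs_dir: Path) -> dict[str, dict[str, str]]:
    """Load coordinator/worker configs, preferring the presto layout."""
    presto_coord = configs_dir / "etc_coordinator" / "config_native.properties"
    presto_worker = configs_dir / "etc_worker" / "config_native.properties"
    if presto_coord.exists() and presto_worker.exists():
        coord, worker = presto_coord, presto_worker
    else:
        # Flat layout the script currently expects
        coord = configs_dir / "coordinator.config"
        worker = configs_dir / "worker.config"
    return {
        "coordinator": parse_config_file(coord),
        "worker": parse_config_file(worker),
    }
```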
This adds a script for posting the results of a benchmark run to a database. Each run would result in a single document being posted.
Here's a sample of the document we'd post:
{ "sku_name": "pdx-h100", "storage_configuration_name": "test", "benchmark_definition_name": "tpch-100", "cache_state": "warm", "query_engine": { "engine_name": "velox", "identifier_hash": "121e1e62b5426195", "version": "unknown", "commit_hash": "unknown" }, "run_at": "2026-01-28T18:38:02+00:00", "node_count": 1, "query_logs": [ { "query_name": "1", "execution_order": 0, "runtime_ms": 3982.0, "status": "success", "extra_info": { "execution_number": 1 } }, { "query_name": "1", "execution_order": 1, "runtime_ms": 1509.0, "status": "success", "extra_info": { "execution_number": 2 } // repeated for each execution of each query } ], "concurrency_streams": 1, "engine_config": { "coordinator": { "coordinator": "true", "node-scheduler.include-coordinator": "false", "http-server.http.port": "9200", "discovery-server.enabled": "true", "discovery.uri": "http://gpu-h100-0017:9200" // ... }, "worker": { "coordinator": "false", "http-server.http.port": "9000", "discovery.uri": "http://gpu-h100-0017:9200", "presto.version": "testversion", "system-memory-gb": "240", "query-memory-gb": "228", // ... } }, "extra_info": { "kind": "single-node", "gpu_count": 1, "gpu_name": "H100", "num_drivers": 2, "worker_image": "presto-native-worker-gpu", "execution_number": 1 }, "is_official": false }