Conversation

@TomAugspurger

This adds a script for posting the results of a benchmark run to a database. Each run would result in

  1. A new "QueryEngine" configuration, which identifies the exact software versions used to run the benchmark
  2. A new "BenchmarkRun", which identifies the hardware, query engine, storage configuration, benchmark definition, query engine configuration, and more
  3. Many new "QueryLogs", one per individual query iteration executed

Here's a sample of the document we'd post:

{
  "sku_name": "pdx-h100",
  "storage_configuration_name": "test",
  "benchmark_definition_name": "tpch-100",
  "cache_state": "warm",
  "query_engine": {
    "engine_name": "velox",
    "identifier_hash": "121e1e62b5426195",
    "version": "unknown",
    "commit_hash": "unknown"
  },
  "run_at": "2026-01-28T18:38:02+00:00",
  "node_count": 1,
  "query_logs": [
    {
      "query_name": "1",
      "execution_order": 0,
      "runtime_ms": 3982.0,
      "status": "success",
      "extra_info": {
        "execution_number": 1
      }
    },
    {
      "query_name": "1",
      "execution_order": 1,
      "runtime_ms": 1509.0,
      "status": "success",
      "extra_info": {
        "execution_number": 2
      }
      // repeated for each execution of each query
    }
  ],
  "concurrency_streams": 1,
  "engine_config": {
    "coordinator": {
      "coordinator": "true",
      "node-scheduler.include-coordinator": "false",
      "http-server.http.port": "9200",
      "discovery-server.enabled": "true",
      "discovery.uri": "http://gpu-h100-0017:9200"
      // ...
    },
    "worker": {
      "coordinator": "false",
      "http-server.http.port": "9000",
      "discovery.uri": "http://gpu-h100-0017:9200",
      "presto.version": "testversion",
      "system-memory-gb": "240",
      "query-memory-gb": "228",
      // ...
    }
  },
  "extra_info": {
    "kind": "single-node",
    "gpu_count": 1,
    "gpu_name": "H100",
    "num_drivers": 2,
    "worker_image": "presto-native-worker-gpu",
    "execution_number": 1
  },
  "is_official": false
}
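
For context, the posting step itself is roughly this shape (the endpoint and auth details below are placeholders rather than the real API; BENCHMARK_DB_TOKEN is a hypothetical env var):

import json
import os
import urllib.request

def post_benchmark_run(document, api_url):
    # Placeholder sketch: the real endpoint, auth scheme, and any payload
    # wrapping may differ from what the script actually does.
    request = urllib.request.Request(
        api_url,
        data=json.dumps(document).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ["BENCHMARK_DB_TOKEN"],
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status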

This script operates on the parsed output of the benchmark runner. The
expected directory structure is:

../velox-testing-benchmark-viz/2026/01/28/01/
Author

Is this the correct directory structure to target? Or is there an earlier, raw-er version we should use?

Contributor

That is the current format of the runner on that particular cluster (it will generate a commit under the date directory). Not all of our benchmarking is consistent with such a structure.

Standardizing on one directory structure is something we need to do, and hopefully we can get there and then layer this on top.

│ └── slurm-4575179.out
└── result_dir
├── benchmark_cold.json
├── benchmark_full.json
Author

Right now, I'm only using benchmark_full.json.

Contributor

I think that's fine. IIRC the others are just subsets of the full data for organization's sake. Currently that benchmark_full.json is generated from a branch misiug/cluster, but if that's the format we want it should not be hard to get that upstream.

Whatever output we want to generate should be in that same directory, although we can alter the format to whatever would best fit with the database API.

Comment on lines +30 to +32
--sku-name PDX-H100 \
--storage-configuration-name pdx-lustre-sf-100 \
--cache-state warm
Author

These values are currently required.

The --sku-name and --storage-configuration-name need to match a name in the database, so I think it's OK to require them.

--cache-state warm is probably incorrect, and it shouldn't need to be provided here. The DB also has an is_os_page_cache_cleared field at the query-log level. I'd recommend using that, but we need some way of knowing whether the OS page cache was actually cleared.

Contributor

There is a PR up (#189) to have the run_benchmark.sh script automatically drop caches before running benchmarks (unless told otherwise). With that, we should be able to assume the first iteration of every benchmark is always "lukewarm" and any subsequent iterations are "warm". For true "cold" runs we would need to run each query as a separate run, or add a "cold" option to the benchmark script.
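
A rough sketch of how that could translate into per-iteration labeling (names below are illustrative only):

def cache_state_for_iteration(execution_number, caches_dropped):
    # Assumes caches are dropped once before the run (per #189): the first
    # iteration then sees a cold OS page cache ("lukewarm"); any later
    # iteration of the same query is "warm". Without the drop, everything
    # is treated as "warm".
    if not caches_dropped:
        return "warm"
    return "lukewarm" if execution_number == 1 else "warm"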

data = json.loads(file_path.read_text())

keys = list(data)
if keys != ["tpch"]:
Author

The results_dir/benchmark_full.json file looks like

❯ head result_dir/benchmark_full.json 
{
  "tpch": {
    "raw_times_ms": {
      "Q1": [
        3982,
        1509,
        1497,
        1315,
        1498
      ],

Are there any other values we should be prepared for here?

Contributor

That should be all for now. We usually just use these base values and then calculate the statistics from them. The exception would be NULL values if any errors occurred.
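
For what it's worth, handling those NULLs when flattening raw_times_ms into query logs could look roughly like this (just a sketch against the sample above; the "failure" status and per-query execution_order are guesses, not confirmed field values):

def build_query_logs(raw_times_ms):
    # Flatten {"Q1": [3982, 1509, ...], ...} into one log entry per iteration.
    # A NULL/None runtime means that iteration errored, so mark it failed
    # rather than recording a runtime of 0.
    logs = []
    for query_name, times in raw_times_ms.items():
        for i, runtime_ms in enumerate(times, start=1):
            logs.append({
                "query_name": query_name.lstrip("Q"),
                "execution_order": i - 1,
                "runtime_ms": float(runtime_ms) if runtime_ms is not None else None,
                "status": "success" if runtime_ms is not None else "failure",
                "extra_info": {"execution_number": i},
            })
    return logs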

Comment on lines +275 to +283
# Generate or use provided identifier hash
if identifier_hash is None:
    identifier_hash = generate_identifier_hash(benchmark_metadata.timestamp, engine)

# Use placeholders for version info if not provided
if version is None:
    version = "unknown"
if commit_hash is None:
    commit_hash = "unknown"
Author

We need to sort out these version fields.

The identifier_hash is extremely important, since it helps uniquely identify the engine / software being run. I'd recommend using something like the sha256 digest of the container image, if we have one available. Do we?

commit_hash is the commit of the "primary" software, so presumably velox-cudf? Can we get that? Likewise for version.

Contributor

Assuming we store all our benchmarked images in the GitLab registry, we should have both a Manifest digest and a Configuration digest we can use.

For presto-gpu we are probably going to need to include a unique identifier for both the presto and velox version (as they can be mixed/matched). If we only use the velox-cudf hash then we could theoretically have images built with the same hash that could vary substantially from different presto commits. For our images, we are planning on tagging them with both hashes. Would we be able to fit both hashes into the entry here (something like <presto_hash>-<velox_hash>)?
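
A sketch of what that combined identifier could look like (the helper below is purely illustrative, not the script's actual API): either store the literal "<presto_hash>-<velox_hash>" string directly, or hash it down to a fixed length:

import hashlib

def combined_identifier_hash(presto_commit, velox_commit):
    # Illustrative only: join the two commits, then hash so the identifier
    # stays a short fixed-length string like "121e1e62b5426195".
    combined = f"{presto_commit}-{velox_commit}"
    return hashlib.sha256(combined.encode()).hexdigest()[:16]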

{
"query_name": query_name,
"execution_order": execution_order,
"runtime_ms": 0.0,
Author

I don't love putting 0.0 here (someone will forget to filter to only successes and include 0 in an aggregation). Check if that's allowed to be None in the API / DB.
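
If the API does allow it, I'd rather emit something like this for failed iterations (the status value is a guess; the point is the null runtime):

{
    "query_name": query_name,
    "execution_order": execution_order,
    "runtime_ms": None,   # null rather than 0.0, assuming the column is nullable
    "status": "failure",
}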

Contributor @misiugodfrey left a comment

Thanks for putting this together, Tom.

I think most of this should work as-is. The only issue is that we haven't standardized some of this expected output in upstream main yet (the output this is based on is from an experimental set of branches). This should be easy now that we have an upstream schema to store in.

I think we just need to standardize our output and then sync this on top.


Expects coordinator.config and worker.config files.
"""
coordinator_config = parse_config_file(configs_dir / "coordinator.config")
Contributor

In our presto environments these files are going to be in a configs directory, under <configs>/etc_worker/config_native.properties and <configs>/etc_coordinator/config_native.properties.
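
So the lookup would probably end up looking something like this (paths per above, reusing the script's existing parse_config_file; just a sketch of the layout, not a final implementation):

# Sketch only: resolve the presto-native config layout described above.
coordinator_config = parse_config_file(
    configs_dir / "etc_coordinator" / "config_native.properties"
)
worker_config = parse_config_file(
    configs_dir / "etc_worker" / "config_native.properties"
)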
