Commit 090ab51

Merge branch 'medium-bench'
2 parents 3a1d8e5 + 03d39a6, commit 090ab51

5 files changed

Lines changed: 72057 additions & 43 deletions

benches/README.md

Lines changed: 47 additions & 11 deletions
@@ -40,7 +40,7 @@ cargo bench json
 The argument to `cargo bench` is a substring match against the full benchmark
 names of the form `{size}_{format}/{source}`.
 
-- **size**: `small` (see below)
+- **size**: `small` or `medium` (see below)
 - **format**: A full format name as given to xt's `-f` or `-t` (e.g. `json`)
 - **source**: `buffer` (non-streaming) or `reader` (streaming)
 
@@ -50,22 +50,58 @@ benchmark run, including charts and comparisons with any previous run.
 
 ## Test Inputs
 
-The small input, `k8s-job.json`, is a simple Kubernetes `Job` that runs the
+Each benchmark loads test data into an in-memory buffer by translating a
+"default" version of the input with xt. This approach limits the size of the xt
+repository and ensures that disk I/O performance doesn't influence the results.
+However, it allows changes to xt's output formatting (e.g. whitespace, quoting)
+to influence the results. I expect such changes to be rare, at least compared
+to other changes whose impact is worth benchmarking.
+
+### Small
+
+The small input, `k8s-job.yaml`, is a simple Kubernetes `Job` that runs the
 Docker `hello-world` image. Translation time is usually a few microseconds for
 even the slowest input formats, so each benchmark runs in just a few seconds.
 This provides relatively fast feedback as you work.
 
+### Medium
+
+The medium input, `k8s-kyverno.yaml`, is a full set of Kubernetes manifests for
+deploying [Kyverno][kyverno] v1.16.2, generated from version 3.6.2 of the
+official chart using Helm v4.1.0 on `darwin/arm64`:
+
+```sh
+helm template kyverno kyverno/kyverno \
+  --version 3.6.2 \
+  --set admissionController.replicas=1 \
+  --set backgroundController.replicas=1 \
+  --set reportsController.replicas=1 \
+  --set cleanupController.replicas=1 \
+  --set webhooksCleanup.image.pullPolicy=IfNotPresent
+```
+
+To ensure TOML compatibility:
+
+1. The above `--set` options were chosen to eliminate all `null` values.
+2. The benchmark harness processes the raw Helm output by turning the stream of
+   YAML documents into a single object, with a single `manifests` field
+   containing an array of the documents. It does this by creating a small
+   MessagePack "header" to set up the object structure and type-length marker
+   for an array, then translating the YAML documents with xt. It then
+   translates the complete object to the final format for benchmarking.
+
+The strategy for generating the medium input is intended to be reproducible and
+auditable. The size of the input was chosen to balance space requirements for
+an xt repository checkout with the desire to avoid non-human-readable encodings.
+
+### Large (removed)
+
 The benchmarks previously included a 20 - 30 MB large input based on a sample of
 GitHub events, which was included in the xt repository (and remains in its
 history) as a Zstandard compressed archive of MessagePack data. Based on the
 reveal of the xz-utils backdoor that was obfuscated in part as compressed test
-data, **I have chosen to temporarily eliminate the large benchmarks** until they
-are reimplemented to rely exclusively on human-readable inputs, ideally without
-bloating the size of xt repository checkouts.
+data, **I have chosen to eliminate the large benchmarks** until they are
+reimplemented to rely exclusively on human-readable inputs, ideally without
+bloating the size of xt repository checkouts too much.
 
-Each benchmark loads test data into an in-memory buffer by translating a
-"default" version of the input with xt. This approach reduces the size of the xt
-repository and ensures that disk I/O performance does not influence the
-benchmark results. However, it allows changes to xt's output formatting
-(whitespace, quoting, etc.) to influence the results. I expect such changes to
-be rare, at least compared to other changes whose impact is worth benchmarking.
+[kyverno]: https://kyverno.io/
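
The README describes the `cargo bench` argument as a substring match against names of the form `{size}_{format}/{source}`. The Rust sketch below enumerates the medium benchmark names that this commit's `criterion.rs` defines (with `medium_toml` limited to the `buffer` source) and applies a plain substring filter, to show which benchmarks a given argument would select. The helpers `medium_benchmark_ids` and `matching` are hypothetical, and the filter models only the README's description, not Criterion's own argument handling.

```rust
/// Hypothetical helper: lists the medium benchmark IDs following the
/// `{size}_{format}/{source}` scheme. In this commit, `medium_toml` only
/// declares the `buffer` source; the other formats declare `buffer` and `reader`.
fn medium_benchmark_ids() -> Vec<String> {
    let formats: [(&str, &[&str]); 4] = [
        ("json", &["buffer", "reader"]),
        ("yaml", &["buffer", "reader"]),
        ("toml", &["buffer"]),
        ("msgpack", &["buffer", "reader"]),
    ];

    let mut ids = Vec::new();
    for (format, sources) in formats {
        for source in sources {
            ids.push(format!("medium_{format}/{source}"));
        }
    }
    ids
}

/// Hypothetical helper: keeps the IDs containing `filter` as a plain substring,
/// mirroring how the README describes the `cargo bench` argument.
fn matching<'a>(ids: &'a [String], filter: &str) -> Vec<&'a str> {
    ids.iter()
        .filter(|id| id.contains(filter))
        .map(|id| id.as_str())
        .collect()
}

fn main() {
    let ids = medium_benchmark_ids();
    for filter in ["json", "medium_toml", "reader"] {
        println!("{filter}: {:?}", matching(&ids, filter));
    }
}
```

Because `toml` has no `reader` variant, a `reader` filter selects only three of the seven medium benchmarks.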

benches/criterion.rs

Lines changed: 72 additions & 7 deletions
@@ -1,17 +1,24 @@
 use std::hint::black_box;
+use std::time::Duration;
 
 use criterion::{Criterion, criterion_group, criterion_main};
 
 use xt::Format;
 
-criterion_main!(small);
+criterion_main!(small, medium);
 
 criterion_group! {
     name = small;
     config = Criterion::default();
     targets = small_json, small_yaml, small_toml, small_msgpack
 }
 
+criterion_group! {
+    name = medium;
+    config = Criterion::default().measurement_time(Duration::from_secs(20));
+    targets = medium_json, medium_yaml, medium_toml, medium_msgpack
+}
+
 macro_rules! xt_benchmark {
     (
         name = $name:ident;
@@ -73,13 +80,71 @@ xt_benchmark! {
 }
 
 fn load_small_data(format: Format) -> Vec<u8> {
-    // The Kubernetes Job expands to a few hundred bytes regardless of format.
-    load_test_data(include_bytes!("k8s-job.json"), format, 512)
+    let input: &[u8] = include_bytes!("k8s-job.yaml");
+
+    let mut output = Vec::with_capacity(512);
+    xt::translate_slice(input, Some(Format::Yaml), format, &mut output)
+        .expect("k8s-job.yaml should be valid YAML");
+
+    output
 }
 
-fn load_test_data(input: &[u8], format: Format, capacity: usize) -> Vec<u8> {
-    let mut output = Vec::with_capacity(capacity);
-    xt::translate_slice(input, Some(Format::Json), format, &mut output)
-        .expect("failed to translate test data");
+xt_benchmark! {
+    name = medium_json;
+    sources = buffer, reader;
+    loader = load_medium_data;
+    translation = Format::Json => Format::Msgpack;
+}
+
+xt_benchmark! {
+    name = medium_yaml;
+    sources = buffer, reader;
+    loader = load_medium_data;
+    translation = Format::Yaml => Format::Json;
+}
+
+xt_benchmark! {
+    name = medium_toml;
+    sources = buffer;
+    loader = load_medium_data;
+    translation = Format::Toml => Format::Json;
+}
+
+xt_benchmark! {
+    name = medium_msgpack;
+    sources = buffer, reader;
+    loader = load_medium_data;
+    translation = Format::Msgpack => Format::Json;
+}
+
+fn load_medium_data(format: Format) -> Vec<u8> {
+    // These manifests were generated using a `helm template` command that should be reproducible
+    // given the correct version of the original chart.
+    let input: &[u8] = include_bytes!("k8s-kyverno.yaml");
+
+    // For TOML compatibility, we need to take this stream of Kubernetes manifests and put them
+    // into a single object. Since MessagePack doesn't use characters or indentation for structure,
+    // it's (surprisingly) the easiest way I can think to do this.
+    //
+    // See https://github.com/msgpack/msgpack/blob/master/spec.md for a description of the bytes.
+    let mut packed = Vec::new();
+
+    packed.push(0x81); // Map of 1 element; key and value follow.
+
+    packed.push(0xa9); // String of 9 characters.
+    packed.extend(b"manifests");
+
+    packed.push(0xdc); // Array; 16-bit size to follow.
+    packed.extend(79u16.to_be_bytes()); // `xt k8s-kyverno.yaml | jq -s length`
+
+    // The 79 elements of the array.
+    xt::translate_slice(input, Some(Format::Yaml), Format::Msgpack, &mut packed)
+        .expect("k8s-kyverno.yaml should be valid YAML");
+
+    // Now, translate that {"manifests": [...]} object to the final output format.
+    let mut output = Vec::new();
+    xt::translate_slice(&packed, Some(Format::Msgpack), format, &mut output)
+        .expect("packed object should be valid");
+
     output
 }
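
`load_medium_data` hardcodes an `array 16` marker (`0xdc`) followed by a big-endian count of 79. As a minimal sketch, assuming only the MessagePack spec linked in the code comments, a hypothetical `manifests_header` helper could choose the smallest array header for any document count: `fixarray` (`0x90 | n`) up to 15 elements, `array 16` up to 65,535, and `array 32` beyond that. For 79 documents it yields the same bytes the commit writes by hand. The count itself would still have to come from elsewhere, such as the `jq -s length` command noted in the code.

```rust
/// Hypothetical helper (not part of xt): builds the MessagePack bytes that open a
/// `{"manifests": [...]}` map whose array will hold `count` elements.
fn manifests_header(count: usize) -> Vec<u8> {
    const KEY: &[u8] = b"manifests";

    let mut header = Vec::with_capacity(1 + 1 + KEY.len() + 5);
    header.push(0x81); // fixmap with one key/value pair (0x80 | 1)
    header.push(0xa0 | KEY.len() as u8); // fixstr (0xa0 | length), valid for keys up to 31 bytes
    header.extend_from_slice(KEY);

    if count <= 15 {
        header.push(0x90 | count as u8); // fixarray: length packed into the marker byte
    } else if let Ok(n) = u16::try_from(count) {
        header.push(0xdc); // array 16: big-endian u16 length follows
        header.extend_from_slice(&n.to_be_bytes());
    } else {
        let n = u32::try_from(count).expect("MessagePack arrays hold at most 2^32 - 1 elements");
        header.push(0xdd); // array 32: big-endian u32 length follows
        header.extend_from_slice(&n.to_be_bytes());
    }
    header
}

fn main() {
    // 79 is the document count noted in `load_medium_data`.
    println!("{:02x?}", manifests_header(79));
}
```

The elements themselves would then be appended by translating the YAML stream to MessagePack, as the committed loader does.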

benches/k8s-job.json

Lines changed: 0 additions & 25 deletions
This file was deleted.

benches/k8s-job.yaml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: hello-world
+spec:
+  template:
+    metadata:
+      labels:
+        job: hello-world
+    spec:
+      restartPolicy: Never
+      containers:
+        - name: hello-world
+          image: docker.io/library/hello-world:latest
