Description
BenchGen currently generates executable benchmarks that manipulate data structures but do not produce any observable output in normal (non-debug) mode. As a consequence:
- There is no easy way to validate that two compiled versions of the same benchmark behave equivalently.
- Differential testing across compilers, optimization levels, or languages is harder than necessary.
- Benchmark executions are effectively “opaque”: they exercise computation, but produce no result.
- Compiler optimizations might remove large chunks of code, provided that the compiler can conclude that it's dead code.
In contrast, tools like CSmith address this by computing a deterministic hash (e.g., MD5) over program state at the end of execution and printing it, enabling correctness checks without affecting control flow.
Motivation
BenchGen is explicitly positioned as a complement to fuzzers like CSmith, focusing on performance evaluation rather than bug finding. However, the lack of observable output limits its usefulness in several scenarios:
- Compiler validation: ensuring that aggressive optimizations do not silently miscompile benchmarks.
- Cross-language comparisons: checking semantic equivalence between C, C++, Julia, Go, etc.
- Regression testing: detecting unintended semantic changes in BenchGen itself or in its language backends.
Currently, debug mode prints a trace of operations (NEW, INSERT, CONTAINS, REMOVE, etc.), but this is not suitable for optimized builds or performance experiments.
Proposed Solution
Introduce a deterministic global execution hash, inspired by CSmith, that is:
- Updated during program execution.
- Printed once at the end of
main.
- Independent of external libraries (important for multi-language support).
High-level idea
- BenchGen generates a global variable (e.g.,
uint64_t exec_hash).
- Certain behavioral operations (notably
contains) update this variable deterministically.
- At the end of
main, the final hash value is printed.
For example, the current C code generated for contains looks like:
for (int i = 0; i < array2->size; i++) {
if (array2->data[i] == 62) {
return array2;
}
}
This could be extended to something like:
for (int i = 0; i < array2->size; i++) {
if (array2->data[i] == 62) {
// Update global execution hash using array2->data[i]
exec_hash = hash_update(exec_hash, array2->data[i]);
return array2;
}
}
At the end of execution:
int main(int argc, char** argv) {
// Initialize exec_hash
// ... generated benchmark code ...
printf("%llu\n", exec_hash);
return 0;
}
Hash function
-
The hash does not need to be cryptographic (MD5 is only a conceptual reference).
-
It should be:
- Deterministic
- Simple
- Easy to reimplement across languages (C, C++, Julia, Go, Rust, etc.)
-
A lightweight rolling hash (e.g., xor + multiplication) would likely suffice.
The key requirement is semantic observability, not security. Below we have an example:
#include <stdint.h>
#include <stdio.h>
/* Global execution hash */
static uint64_t exec_hash;
/*
* Simple, deterministic hash update.
* - No library calls
* - Easy to reimplement in other languages
* - Cheap enough to not distort performance measurements
*/
static inline void hash_update_u64(uint64_t value) {
/* Large odd constant (same across languages) */
const uint64_t MUL = 0x9e3779b97f4a7c15ULL; // 2^64 / golden ratio
exec_hash ^= value + MUL + (exec_hash << 6) + (exec_hash >> 2);
}
If we use this example, then contains would generate code like:
for (int i = 0; i < array2->size; i++) {
if (array2->data[i] == 62) {
/* Record the observed value in the global execution hash */
hash_update_u64((uint64_t)array2->data[i]);
return array2;
}
}
Why contains?
contains is a natural candidate because:
- It observes data without mutating it.
- It already branches on concrete values.
- It exists uniformly across supported data structures and languages.
That said, the mechanism could later be generalized to other operations if desired.
Benefits
- Enables semantic equivalence checking across compilers and languages.
- Improves BenchGen’s suitability for compiler correctness experiments.
- Preserves existing performance characteristics (hash updates are cheap and local).
- Aligns BenchGen more closely with established practices in compiler testing tools like CSmith.
Open questions
- Which behavior blocks should update the hash (only
contains, or others)?
- Should this be optional (e.g., enabled via a flag)?
- Preferred default hash width (32-bit vs 64-bit)?
Description
BenchGen currently generates executable benchmarks that manipulate data structures but do not produce any observable output in normal (non-debug) mode. As a consequence:
In contrast, tools like CSmith address this by computing a deterministic hash (e.g., MD5) over program state at the end of execution and printing it, enabling correctness checks without affecting control flow.
Motivation
BenchGen is explicitly positioned as a complement to fuzzers like CSmith, focusing on performance evaluation rather than bug finding. However, the lack of observable output limits its usefulness in several scenarios:
Currently, debug mode prints a trace of operations (NEW, INSERT, CONTAINS, REMOVE, etc.), but this is not suitable for optimized builds or performance experiments.
Proposed Solution
Introduce a deterministic global execution hash, inspired by CSmith, that is:
main.High-level idea
uint64_t exec_hash).contains) update this variable deterministically.main, the final hash value is printed.For example, the current C code generated for
containslooks like:This could be extended to something like:
At the end of execution:
Hash function
The hash does not need to be cryptographic (MD5 is only a conceptual reference).
It should be:
A lightweight rolling hash (e.g., xor + multiplication) would likely suffice.
The key requirement is semantic observability, not security. Below we have an example:
If we use this example, then
containswould generate code like:Why
contains?containsis a natural candidate because:That said, the mechanism could later be generalized to other operations if desired.
Benefits
Open questions
contains, or others)?