Add deterministic program output to BenchGen benchmarks (CSmith-style global hash)

### Description

BenchGen currently generates executable benchmarks that manipulate data structures but **do not produce any observable output** in normal (non-debug) mode. As a consequence:

* There is no easy way to validate that two compiled versions of the same benchmark behave equivalently.
* Differential testing across compilers, optimization levels, or languages is harder than necessary.
* Benchmark executions are effectively “opaque”: they exercise computation, but produce no result.
* Compiler optimizations might remove large chunks of code, provided that the compiler can conclude that it's dead code.

In contrast, tools like [CSmith](https://github.com/csmith-project/csmith) address this by computing a deterministic hash (e.g., MD5) over program state at the end of execution and printing it, enabling correctness checks without affecting control flow.

### Motivation

BenchGen is explicitly positioned as a complement to fuzzers like CSmith, focusing on *performance evaluation* rather than bug finding. However, the lack of observable output limits its usefulness in several scenarios:

* **Compiler validation**: ensuring that aggressive optimizations do not silently miscompile benchmarks.
* **Cross-language comparisons**: checking semantic equivalence between C, C++, Julia, Go, etc.
* **Regression testing**: detecting unintended semantic changes in BenchGen itself or in its language backends.

Currently, debug mode prints a trace of operations (NEW, INSERT, CONTAINS, REMOVE, etc.), but this is not suitable for optimized builds or performance experiments.

### Proposed Solution

Introduce a **deterministic global execution hash**, inspired by CSmith, that is:

* Updated during program execution.
* Printed once at the end of `main`.
* Independent of external libraries (important for multi-language support).

#### High-level idea

1. BenchGen generates a **global variable** (e.g., `uint64_t exec_hash`).
2. Certain behavioral operations (notably `contains`) update this variable deterministically.
3. At the end of `main`, the final hash value is printed.

For example, the current C code generated for `contains` looks like:

```c
for (int i = 0; i < array2->size; i++) {
  if (array2->data[i] == 62) {
    return array2;
  }
}
```

This could be extended to something like:

```c
for (int i = 0; i < array2->size; i++) {
  if (array2->data[i] == 62) {
    // Update global execution hash using array2->data[i]
    exec_hash = hash_update(exec_hash, array2->data[i]);
    return array2;
  }
}
```

At the end of execution:

```c
int main(int argc, char** argv) {
  // Initialize exec_hash
  // ... generated benchmark code ...
  printf("%llu\n", exec_hash);
  return 0;
}
```

### Hash function

* The hash **does not need to be cryptographic** (MD5 is only a conceptual reference).
* It should be:

  * Deterministic
  * Simple
  * Easy to reimplement across languages (C, C++, Julia, Go, Rust, etc.)
* A lightweight rolling hash (e.g., xor + multiplication) would likely suffice.

The key requirement is **semantic observability**, not security. Below we have an example:

```c
#include <stdint.h>
#include <stdio.h>

/* Global execution hash */
static uint64_t exec_hash;

/*
 * Simple, deterministic hash update.
 * - No library calls
 * - Easy to reimplement in other languages
 * - Cheap enough to not distort performance measurements
 */
static inline void hash_update_u64(uint64_t value) {
    /* Large odd constant (same across languages) */
    const uint64_t MUL = 0x9e3779b97f4a7c15ULL;  // 2^64 / golden ratio
    exec_hash ^= value + MUL + (exec_hash << 6) + (exec_hash >> 2);
}
```

If we use this example, then `contains` would generate code like:

```c
for (int i = 0; i < array2->size; i++) {
  if (array2->data[i] == 62) {
    /* Record the observed value in the global execution hash */
    hash_update_u64((uint64_t)array2->data[i]);
    return array2;
  }
}
```

### Why `contains`?

`contains` is a natural candidate because:

* It observes data without mutating it.
* It already branches on concrete values.
* It exists uniformly across supported data structures and languages.

That said, the mechanism could later be generalized to other operations if desired.

### Benefits

* Enables **semantic equivalence checking** across compilers and languages.
* Improves BenchGen’s suitability for **compiler correctness experiments**.
* Preserves existing performance characteristics (hash updates are cheap and local).
* Aligns BenchGen more closely with established practices in compiler testing tools like CSmith.

### Open questions

* Which behavior blocks should update the hash (only `contains`, or others)?
* Should this be optional (e.g., enabled via a flag)?
* Preferred default hash width (32-bit vs 64-bit)?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add deterministic program output to BenchGen benchmarks (CSmith-style global hash) #51

Description

Motivation

Proposed Solution

High-level idea

Hash function

Why `contains`?

Benefits

Open questions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add deterministic program output to BenchGen benchmarks (CSmith-style global hash) #51

Description

Description

Motivation

Proposed Solution

High-level idea

Hash function

Why contains?

Benefits

Open questions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Why `contains`?