16 changes: 16 additions & 0 deletions benches/engine_control/README.md
@@ -46,6 +46,22 @@
This replaces the in-firmware histogram+mean approach whose mean
divisor (reader `count`) diverged from the numerator (ISR event sum)
when the sweep truncated early, invalidating the published deltas.

## Silicon-anchor protocol

Renode is the CI workhorse; **silicon captures are manual**, periodic,
and recorded directly into the repo as immutable evidence. See
[`silicon/README.md`](silicon/README.md) for the procedure, board
notes, and the `capture.sh` wrapper. Per-board configs live under
`silicon/boards/`; recorded captures land under `silicon/runs/<dated>/`
with a manifest, the firmware ELF, and the tagged events CSV.

The first supported board is the NUCLEO-G474RE (STM32G474, Cortex-M4F
+ FPU, 170 MHz) — closest production-shape silicon to the
`stm32f4_disco` Renode target. The ratio `silicon_median /
renode_median` per RPM step is what the anchor establishes; once
consistent across multiple captures it can be cited as the
Renode-silicon multiplier.

## Building

145 changes: 145 additions & 0 deletions benches/engine_control/silicon/README.md
@@ -0,0 +1,145 @@
# Silicon-anchor protocol — engine_control

CI runs Renode (deterministic, parallel-safe). **Silicon runs are
manual**, periodic, and hand-driven on a single shared board.
This directory contains the protocol for taking a silicon capture,
recording it as immutable evidence in the repo, and citing it as
the anchor for Renode-headlined published numbers.

## Why

Renode is per-translated-block instruction-cost simulation, not
microarchitectural simulation: no cache, no memory contention, no
pipeline modeling. The cross-Renode A/B (1.16.0 vs nightly = 0.0%
drift) ruled out simulator-version drift but did NOT rule out
Renode being systematically off vs real silicon by a fixed
multiplier. The silicon anchor settles that.

The relationship `silicon_cycles / renode_cycles = R` is what the
silicon anchor establishes. Once `R` is consistent across
multiple silicon captures over time, it can be cited as the
Renode-silicon multiplier for that bench/board combination.
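One way to make "consistent" checkable is a relative-spread test over the `R` values from successive captures. This is a minimal sketch under stated assumptions: the function name `is_consistent` and the 2% tolerance are illustrative choices, not thresholds defined by this protocol.

```python
from statistics import mean


def is_consistent(ratios, tolerance=0.02):
    """True if every R lies within `tolerance` (relative) of the mean R.

    `tolerance` is a hypothetical threshold; pick one that matches the
    precision you intend to claim when citing the multiplier.
    """
    if len(ratios) < 2:
        return False  # a single capture cannot demonstrate consistency
    centre = mean(ratios)
    return all(abs(r - centre) <= tolerance * centre for r in ratios)


# Made-up R values from three captures, for illustration only.
stable = is_consistent([1.20, 1.21, 1.19])
unstable = is_consistent([1.00, 1.30])
```

A check like this only says the captures agree with each other; it says nothing about why `R` differs from 1.0.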

## Recorded-run-in-git protocol

Every silicon run lives in `silicon/runs/<YYYY-MM-DD>-<board>-<gale-sha>-<variant>/`
and contains:

- `output.csv` — the raw UART capture (firmware-emitted)
- `events.csv` — same data, tagged through `tag_events.py`
- `manifest.txt` — board, MCU, clock, rustc/cargo versions, gale
commit SHA, ELF sha256, capture timestamp
- `firmware.elf` — the exact binary that produced the capture
- `firmware.elf.sha256` — checksum file
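The checksum file can be produced with the standard library alone. The sketch below assumes the `sha256sum`-style layout (`<hex digest>  <file name>`); the actual layout written by `capture.sh` is not specified here, and `write_sha256` is a hypothetical helper name.

```python
import hashlib
import tempfile
from pathlib import Path


def write_sha256(path):
    """Hash `path` in chunks and write `<name>.sha256` next to it.

    The sidecar layout follows the sha256sum convention, which is an
    assumption about what the protocol expects.
    """
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    hexdigest = digest.hexdigest()
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(f"{hexdigest}  {path.name}\n")
    return hexdigest


# Demonstration on a stand-in file; a real run would hash the built ELF.
with tempfile.TemporaryDirectory() as tmp:
    elf = Path(tmp) / "firmware.elf"
    elf.write_bytes(b"abc")
    elf_digest = write_sha256(elf)
    sidecar_text = (Path(tmp) / "firmware.elf.sha256").read_text()
```

With this layout, `sha256sum -c firmware.elf.sha256` run inside the run directory can re-verify the binary later.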

These directories are **immutable** once committed. To re-run the
same capture, create a new dated directory; never overwrite an
existing one. This makes any silicon citation in a blog post or
report point to a stable git URL.
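The naming scheme and the never-overwrite rule can be enforced at directory-creation time. This is a sketch, assuming a hypothetical helper `new_run_dir`; the SHA `abc1234` and the date are placeholders.

```python
import tempfile
from datetime import date
from pathlib import Path


def new_run_dir(runs_root, board, gale_sha, variant, day=None):
    """Create <YYYY-MM-DD>-<board>-<gale-sha>-<variant>/ under runs_root.

    Raises FileExistsError rather than reusing an existing directory,
    which is what keeps committed runs immutable.
    """
    day = day or date.today()
    run_dir = Path(runs_root) / f"{day.isoformat()}-{board}-{gale_sha}-{variant}"
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


# Demonstration in a throwaway directory with placeholder values.
with tempfile.TemporaryDirectory() as root:
    first = new_run_dir(root, "nucleo_g474re", "abc1234", "baseline",
                        day=date(2025, 1, 2))
    first_name = first.name
    try:
        new_run_dir(root, "nucleo_g474re", "abc1234", "baseline",
                    day=date(2025, 1, 2))
        refused = False
    except FileExistsError:
        refused = True
```

A second capture on the same day with the same board, SHA, and variant would collide under this scheme; how to disambiguate that case is left open here.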

The CSV files are small (~50–500 KB per run; a long sweep is ~7,750
rows). At one capture per board per major bench-relevant commit,
repo growth is modest.

## Boards

| Board | Status | Anchors |
|---|---|---|
| `nucleo_g474re` (STM32G474RE, Cortex-M4F, 170 MHz) | scaffold ready | the existing Renode `stm32f4_disco` Cortex-M numbers |
| `esp32c3_devkit_rust1` (ESP32-C3, RV32IMC, 160 MHz) | not started | the *future* RISC-V Renode lane (separate work) |

## Capture procedure (NUCLEO-G474RE)

Hardware:
- Board: STMicroelectronics NUCLEO-G474RE
- Connection: USB to host (ST-Link integrated, virtual COM port at 115200 8N1)
- Programming: `west flash` via OpenOCD or pyOCD (ST-Link backend)

Host setup (one-time):
- Zephyr SDK with `arm-zephyr-eabi` toolchain
- OpenOCD or pyOCD installed (`brew install open-ocd` on macOS, or `apt install openocd`)
- Python with `pyserial` for the capture script: `pip3 install pyserial`

To take a baseline capture (stock Zephyr):

```sh
cd "$GALE_ROOT"
bash benches/engine_control/silicon/capture.sh \
    --board nucleo_g474re \
    --variant baseline \
    --sweep long
```

To take a gale capture:

```sh
bash benches/engine_control/silicon/capture.sh \
    --board nucleo_g474re \
    --variant gale \
    --sweep long
```

Both invocations:

1. Build the firmware locally (no Bazel; `west build -b <board>`).
2. Compute the firmware ELF sha256.
3. Flash via `west flash`.
4. Open the board's USB CDC serial port and read until `=== END ===`
(default timeout: 30 minutes for `--sweep long`).
5. Generate `manifest.txt` from the build environment + capture
metadata.
6. Tag the raw output through `tag_events.py` (run-id auto-derived
from the date + board).
7. Write everything into a new `silicon/runs/<dir>/`.
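Step 4 can be exercised without hardware by running the same read-until-sentinel logic against an in-memory byte stream. This is a minimal sketch: `read_until_sentinel` is a hypothetical name, the `rpm,cycles` rows are made up, and a real serial port would time out and retry where this stream simply ends.

```python
import io


def read_until_sentinel(stream, sentinel="=== END ===",
                        max_bytes=64 * 1024 * 1024):
    """Collect decoded lines until the sentinel line or the byte budget.

    Returns (lines, sentinel_seen). `stream` only needs a readline()
    method that returns bytes.
    """
    lines, consumed = [], 0
    while consumed < max_bytes:
        raw = stream.readline()
        if not raw:
            break  # stream exhausted; a serial port would retry until its deadline
        consumed += len(raw)
        text = raw.decode("utf-8", errors="replace")
        lines.append(text)
        if text.rstrip("\r\n") == sentinel:
            return lines, True
    return lines, False


# Fake UART output with a trailing line that must not be captured.
fake_uart = io.BytesIO(
    b"rpm,cycles\r\n1000,122\r\n=== END ===\r\nignored\r\n")
captured, seen = read_until_sentinel(fake_uart)
```

Comparing after stripping only `\r\n` means a sentinel with extra leading or trailing spaces would not match, which keeps the end marker strict.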

The capture script does not commit. After both variants are
captured and you've eyeballed `output.csv` for sanity, commit:

```sh
git add benches/engine_control/silicon/runs/<YYYY-MM-DD>-nucleo_g474re-*-{baseline,gale}/
git commit -m "silicon: NUCLEO-G474RE anchor at gale@<short-sha>"
```

## Comparing silicon vs Renode

Once `silicon/runs/<dated-dir>-{baseline,gale}/` exist, run:

```sh
python3 benches/engine_control/analyze.py \
    --baseline benches/engine_control/silicon/runs/<dir-baseline>/events.csv \
    --gale benches/engine_control/silicon/runs/<dir-gale>/events.csv \
    --runs 1 \
    > /tmp/silicon-comparison.md
```

The analyzer renders the same baseline-vs-gale tables as for
Renode, but the metadata in the report header carries through the
silicon-run identifiers. Compare side-by-side with the Renode CI
output for the same gale SHA — the **ratio** `silicon_median /
renode_median` per RPM step is the calibration data.
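Until the analyzer grows that comparison, the per-step ratio can be computed by hand. The sketch below assumes each capture has already been reduced to `(rpm, cycles)` pairs; that tuple layout and the name `per_step_ratio` are illustrative and do not reflect the real `events.csv` schema.

```python
from collections import defaultdict
from statistics import median


def per_step_ratio(silicon, renode):
    """Return {rpm: silicon_median / renode_median} for shared RPM steps.

    Both arguments are iterables of (rpm, cycles) pairs, a hypothetical
    stand-in for rows read from the tagged events CSV.
    """
    def medians(samples):
        by_rpm = defaultdict(list)
        for rpm, cycles in samples:
            by_rpm[rpm].append(cycles)
        return {rpm: median(vals) for rpm, vals in by_rpm.items()}

    sil, ren = medians(silicon), medians(renode)
    shared = sorted(sil.keys() & ren.keys())
    # Skip steps whose Renode median is zero to avoid dividing by zero.
    return {rpm: sil[rpm] / ren[rpm] for rpm in shared if ren[rpm]}


# Made-up samples, for illustration only.
silicon_samples = [(1000, 120), (1000, 124), (1000, 122),
                   (2000, 130), (2000, 134)]
renode_samples = [(1000, 100), (1000, 100), (2000, 110), (2000, 110)]
ratios = per_step_ratio(silicon_samples, renode_samples)
```

Steps present in only one capture are dropped rather than reported as missing, so a truncated sweep shows up as fewer keys in the result.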

If you want a single-call Renode-vs-silicon side-by-side rendering,
that's a planned analyzer extension (`--silicon-anchor <events.csv>`)
to be added once the first capture exists to test against.

## Anchor cadence

- One silicon capture per board per major bench-relevant gale
commit (e.g., when overhead compensation lands, when synth
pipeline changes, when a primitive's hot-path is rewritten).
- Each Renode-headlined publication cites the most recent matching
anchor by stable git URL.
- Three to four anchor points per board per year is enough to
claim the Renode-silicon relationship is stable.

## Don't

- Don't overwrite an existing `runs/<dated-dir>/` — start a new one.
- Don't combine pre-overhead-compensation and post-overhead-
compensation captures in the same comparison table; they're
different measurements (see `../SCOPE.md`).
- Don't claim WCET from silicon captures. Worst-case-observed is
not WCET. Same rule as the synthetic bench (see `../SCOPE.md`).
- Don't run silicon captures from a working tree that isn't
reproducible (uncommitted changes). The manifest records the
working-tree state, not just HEAD, so a dirty tree ends up in the
evidence.
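A pre-capture guard for the last rule can be built on `git status --porcelain`, which prints one line per modified or untracked path and nothing for a clean tree. This is a sketch; `is_dirty` and `working_tree_is_dirty` are hypothetical helpers, not part of `capture.sh`.

```python
import subprocess


def is_dirty(porcelain_output):
    """True if `git status --porcelain` output lists any change."""
    return any(line.strip() for line in porcelain_output.splitlines())


def working_tree_is_dirty(repo="."):
    """Run git in `repo` and report whether the tree has local changes.

    Untracked files count as dirty here, which is stricter than
    checking tracked files only.
    """
    result = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        check=True, capture_output=True, text=True)
    return is_dirty(result.stdout)
```

A capture wrapper could call `working_tree_is_dirty()` before building and refuse to continue when it returns `True`.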
72 changes: 72 additions & 0 deletions benches/engine_control/silicon/boards/nucleo_g474re/README.md
@@ -0,0 +1,72 @@
# NUCLEO-G474RE — silicon-anchor board notes

## Hardware

- **Board:** STMicroelectronics NUCLEO-G474RE
- **MCU:** STM32G474RET6 (Cortex-M4F + FPU + DSP, 170 MHz)
- **Memory:** 512 KB Flash, 128 KB RAM
- **Cycle counter:** DWT_CYCCNT (same as Cortex-M4F on `stm32f4_disco`)
- **Programmer:** integrated ST-Link/V3E over USB; exposes virtual
COM port for stdout
- **Upstream Zephyr support:** `nucleo_g474re` (already in the tree)

## Why this board for the anchor

Cortex-M4F + FPU at 170 MHz is the closest production-shape silicon
to the simulated `stm32f4_disco` (also Cortex-M4F + FPU at 168 MHz).
The architectural variables held constant between the synthetic and
silicon measurements are:

- ARMv7E-M instruction set (Thumb-2)
- DWT_CYCCNT cycle counter (same width, same definition)
- 3-stage in-order pipeline
- Single-cycle MUL, hardware DIV, single-precision FPU

What differs:

- Real memory-system effects (the Cortex-M4 core has no D-cache, but
flash wait states and prefetch-buffer behavior are visible)
- Real bus arbitration (negligible on this bench: no DMA, no
peripheral activity)
- Clock: 170 MHz vs 168 MHz (1.2%, accountable directly)

So the cycle ratio `silicon / renode` for `algo` and `handoff`
should be near 1.0 in steady state. Anything materially off is
information about Renode's cycle model, not about the silicon.
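The clock difference matters only when cycles are converted to time. The arithmetic below is a sketch with a made-up cycle count; the two frequencies are the ones stated above.

```python
SILICON_HZ = 170_000_000  # NUCLEO-G474RE
RENODE_HZ = 168_000_000   # simulated stm32f4_disco


def cycles_to_ns(cycles, hz):
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles * 1e9 / hz


cycles = 1_000  # made-up measurement, identical on both targets
silicon_ns = cycles_to_ns(cycles, SILICON_HZ)
renode_ns = cycles_to_ns(cycles, RENODE_HZ)
clock_ratio = SILICON_HZ / RENODE_HZ
```

For an identical cycle count the silicon time is shorter by the clock ratio, so compare cycles when calibrating `R` and apply the clock only when reporting nanoseconds.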

## Connection

USB cable from NUCLEO USB connector (CN1) to host. The ST-Link
virtual COM port appears as:

- macOS: `/dev/cu.usbmodem*`
- Linux: `/dev/ttyACM0`
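Port discovery can be scripted with the standard library. This sketch generalizes the Linux path to `/dev/ttyACM*`, which is an assumption (the board may enumerate at an index other than 0), and the function names are hypothetical.

```python
import glob
import sys


def candidate_patterns(platform=sys.platform):
    """Glob patterns for the ST-Link virtual COM port on this platform."""
    if platform == "darwin":
        return ["/dev/cu.usbmodem*"]
    if platform.startswith("linux"):
        return ["/dev/ttyACM*"]  # assumption: index may differ from 0
    return []  # other platforms are not covered by these notes


def find_ports(platform=sys.platform):
    """Return matching device paths, sorted for stable output."""
    ports = []
    for pattern in candidate_patterns(platform):
        ports.extend(sorted(glob.glob(pattern)))
    return ports
```

If more than one device matches, pass the intended path explicitly rather than taking the first entry, since other USB serial adapters use the same names.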

Zephyr's default for this board uses LPUART1 for stdout, exposed
through ST-Link.

## Programming

`west flash` from a build directory works out of the box:

```sh
west flash -d /tmp/eng-nucleo-baseline
```

Default backend is OpenOCD. To force pyOCD:

```sh
west flash -d /tmp/eng-nucleo-baseline --runner pyocd
```

## Clock / cycle counter notes

On the G4 family, `k_cycle_get_32()` returns `DWT->CYCCNT`
directly, same as on F4. `sys_clock_hw_cycles_per_sec()` returns
the frequency the cycle counter ticks at. Verify at runtime that it
matches 170 MHz (read the boot banner) before relying on absolute
ns conversions.

## Known issues

None yet — populate as captures happen.
15 changes: 15 additions & 0 deletions benches/engine_control/silicon/boards/nucleo_g474re/prj.conf
@@ -0,0 +1,15 @@
# NUCLEO-G474RE — engine_control bench overlay
#
# Empty for now: Zephyr's nucleo_g474re defaults give us:
# - 170 MHz HCLK (PLL'd up)
# - LPUART1 console at 115200 8N1 via ST-Link VCP
# - DWT_CYCCNT enabled (Cortex-M4 default in Zephyr)
#
# Add overlay options here only if a future capture exposes a
# default that biases the measurement (e.g. interrupt priority of
# a peripheral we don't use; tickless idle behavior; etc.).
#
# Anything board-specific that *must* be on for the silicon
# measurement to be valid goes here. Anything project-wide
# (gale module enable, sweep size) stays in the main prj.conf
# overlay or the CMake invocation.
91 changes: 91 additions & 0 deletions benches/engine_control/silicon/capture.py
@@ -0,0 +1,91 @@
#!/usr/bin/env python3
"""Cross-platform UART capture for the silicon-anchor protocol.

Reads lines from a serial port until either a sentinel line
(default '=== END ===') appears, the byte budget is exhausted, or
the wall-clock timeout fires. Writes the raw stream to stdout (or
to a file with --out).

Designed to be invoked by capture.sh; keep this script's
dependencies minimal: stdlib + pyserial.

Usage:
    capture.py --port /dev/cu.usbmodem11403 --baud 115200 \\
        --sentinel '=== END ===' --timeout 1800 \\
        --out output.csv
"""
from __future__ import annotations

import argparse
import sys
import time

try:
    import serial  # type: ignore
except ImportError:
    sys.stderr.write(
        "ERROR: pyserial not installed. Run: pip3 install pyserial\n")
    sys.exit(2)


def main() -> int:
    p = argparse.ArgumentParser()
    p.add_argument("--port", required=True,
                   help="serial device path (e.g. /dev/cu.usbmodem11403)")
    p.add_argument("--baud", type=int, default=115200,
                   help="baud rate (default 115200)")
    p.add_argument("--sentinel", default="=== END ===",
                   help="line marking end-of-capture")
    p.add_argument("--timeout", type=int, default=1800,
                   help="wall-clock timeout in seconds (default 1800)")
    p.add_argument("--out", default="-",
                   help="output path or '-' for stdout (default '-')")
    p.add_argument("--max-bytes", type=int, default=64 * 1024 * 1024,
                   help="byte-budget ceiling (default 64 MiB)")
    args = p.parse_args()

    try:
        # serial timeout = 1s so we wake periodically to check the
        # wall-clock budget even if the firmware is silent.
        ser = serial.Serial(args.port, args.baud, timeout=1)
    except serial.SerialException as e:
        sys.stderr.write(f"ERROR opening {args.port}: {e}\n")
        return 3

    # Open the output only after the port is open, so a failed port open
    # does not create or truncate the output file. newline="" keeps the
    # firmware's line endings unchanged on every platform.
    out = sys.stdout if args.out == "-" else open(
        args.out, "w", encoding="utf-8", newline="")
    deadline = time.monotonic() + args.timeout
    bytes_written = 0
    sentinel_seen = False

    try:
        while time.monotonic() < deadline and bytes_written < args.max_bytes:
            line_bytes = ser.readline()
            if not line_bytes:
                continue  # serial timeout, loop back to check deadline
            # errors="replace" never raises, so no fallback decode is needed.
            line = line_bytes.decode("utf-8", errors="replace")
            out.write(line)
            out.flush()
            bytes_written += len(line_bytes)
            if line.rstrip("\r\n") == args.sentinel:
                sentinel_seen = True
                break
    finally:
        ser.close()
        if out is not sys.stdout:
            out.close()

    if not sentinel_seen:
        sys.stderr.write(
            f"WARN: sentinel '{args.sentinel}' not seen "
            f"(timeout={args.timeout}s, bytes={bytes_written})\n")
        return 1
    sys.stderr.write(
        f"OK: sentinel seen at {bytes_written} bytes\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())