add macos bench

CppCXY · CppCXY · commit 9a3ddd737a8b · 2026-03-30T19:36:52.000+08:00
diff --git a/README.md b/README.md
@@ -127,41 +127,18 @@ The project is continuously validated against the official Lua 5.5 test suite. T
 ./run_benchmarks.sh
 ```
 
-The benchmark scripts cover arithmetic, control flow, coroutines, functions, iterators, metatables, strings, tables, and more. The README snapshot below comes from `run_benchmarks.ps1` on Windows with a Ryzen 7 5800X, comparing luars against native Lua 5.5 on the same machine.
+The benchmark scripts cover arithmetic, control flow, coroutines, functions, iterators, metatables, strings, tables, and more. The Windows snapshot linked below was produced by `run_benchmarks.ps1` on a Ryzen 7 5800X, comparing luars against native Lua 5.5 on the same machine.
 
-On this project, Linux results are typically about 10% lower than the Windows snapshot below, while macOS tends to perform better on most workloads.
+For this project, Linux results are typically about 10% lower than the Windows snapshot, while macOS results tend to be higher than Windows on most workloads.
 
 ### Benchmark Snapshot
 
-The chart below is a script-level summary from the current Windows run of `run_benchmarks.ps1`. Values are shown as `luars / native Lua * 100`, so `100` means parity with native Lua, `120` means luars is about 20% faster, and `80` means it is about 20% slower.
+Per-platform benchmark snapshots are kept in dedicated documents:
 
-```mermaid
-xychart-beta
-    title "luars vs native Lua 5.5 on Windows (Ryzen 7 5800X)"
-    x-axis [arith, control, locals, funcs, closures, multiret, tables, tablelib, iters, math, meta, oop, coroutines, errors]
-    y-axis "Relative throughput (%)" 0 --> 160
-    bar [111, 87, 132, 92, 80, 78, 92, 118, 93, 98, 78, 92, 152, 103]
-    line [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
-```
+- [windows.md](docs/benchmarks/windows.md): Windows snapshot on a Ryzen 7 5800X
+- [macos.md](docs/benchmarks/macos.md): macOS snapshot on an Apple M4, contributed by @Bruce
 
-| Script | Relative throughput |
-|--------|---------------------|
-| `bench_arithmetic.lua` | 111% |
-| `bench_control_flow.lua` | 87% |
-| `bench_locals.lua` | 132% |
-| `bench_functions.lua` | 92% |
-| `bench_closures.lua` | 80% |
-| `bench_multiret.lua` | 78% |
-| `bench_tables.lua` | 92% |
-| `bench_table_lib.lua` | 118% |
-| `bench_iterators.lua` | 93% |
-| `bench_math.lua` | 98% |
-| `bench_metatables.lua` | 78% |
-| `bench_oop.lua` | 92% |
-| `bench_coroutines.lua` | 152% |
-| `bench_errors.lua` | 103% |
-
-String-heavy microbenchmarks are intentionally left out of the chart because several subtests complete too quickly on Windows timer resolution, which can produce distorted summary ratios. For full raw output, run `run_benchmarks.ps1` directly and inspect the per-subtest numbers.
+If you want the raw terminal output instead of the summarized charts, run `run_benchmarks.ps1` on Windows or `./run_benchmarks.sh` on macOS/Linux and inspect the per-subtest numbers directly.
 
 ## Cargo Features
 
diff --git a/docs/benchmarks/macos.md b/docs/benchmarks/macos.md
@@ -0,0 +1,50 @@
+# macOS Benchmark Snapshot
+
+This page summarizes the current macOS benchmark snapshot.
+
+Environment:
+- Script runner: `run_benchmarks.sh`
+- Platform: macOS (Apple M4)
+- Baseline: native Lua 5.5 on the same machine
+
+Method:
+- Values are shown as `luars / native Lua * 100`
+- Each script summary is the geometric mean of the per-subtest throughput ratios parsed from the raw `run_benchmarks.sh` capture in [2.txt](2.txt)
+- `100` means parity with native Lua
+- `120` means luars is about 20% faster
+- `80` means luars is about 20% slower
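+
+Written out, the per-script summary is the following, where $r_i$ is the luars / native Lua throughput ratio of subtest $i$ and $n$ is the number of subtests in the script (these symbol names are illustrative and do not come from the benchmark scripts):
+
+$$
+\text{summary} = 100 \cdot \left( \prod_{i=1}^{n} r_i \right)^{1/n}
+$$
+
+For example, a hypothetical script with two subtests at ratios 1.5 and 0.9 would summarize to $100 \cdot \sqrt{1.35} \approx 116$, whereas an arithmetic mean of the same ratios would report 120.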
+
+```mermaid
+xychart-beta
+    title "luars vs native Lua 5.5 on macOS (Apple M4)"
+    x-axis [arith, control, locals, funcs, closures, multiret, tables, tablelib, iters, math, meta, oop, coroutines, errors]
+    y-axis "Relative throughput (%)" 0 --> 320
+    bar [136, 143, 143, 119, 99, 105, 141, 301, 123, 140, 121, 109, 113, 120]
+    line [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
+```
+
+| Script | Relative throughput |
+|--------|---------------------|
+| `bench_arithmetic.lua` | 136% |
+| `bench_control_flow.lua` | 143% |
+| `bench_locals.lua` | 143% |
+| `bench_functions.lua` | 119% |
+| `bench_closures.lua` | 99% |
+| `bench_multiret.lua` | 105% |
+| `bench_tables.lua` | 141% |
+| `bench_table_lib.lua` | 301% |
+| `bench_iterators.lua` | 123% |
+| `bench_math.lua` | 140% |
+| `bench_metatables.lua` | 121% |
+| `bench_oop.lua` | 109% |
+| `bench_coroutines.lua` | 113% |
+| `bench_errors.lua` | 120% |
+
+Highlights:
+- The strongest macOS win in this run is `bench_table_lib.lua` (301%), mainly because the luars versions of `table.insert`, `table.remove`, `table.sort`, and `table.move` all outperform native Lua by a large margin in the raw run.
+- `bench_control_flow.lua`, `bench_locals.lua`, `bench_tables.lua`, and `bench_math.lua` also show broad wins across most subtests.
+- `bench_closures.lua` (99%) is effectively at parity in this run.
+
+String microbenchmarks from the same raw capture:
+- `bench_strings.lua`: about 99%
+- `bench_string_lib.lua`: about 120%
diff --git a/docs/benchmarks/windows.md b/docs/benchmarks/windows.md
@@ -0,0 +1,46 @@
+# Windows Benchmark Snapshot
+
+This page contains the current Windows benchmark snapshot that used to live in the main README.
+
+Environment:
+- Script runner: `run_benchmarks.ps1`
+- Platform: Windows
+- CPU: Ryzen 7 5800X
+- Baseline: native Lua 5.5 on the same machine
+
+Method:
+- Values are shown as `luars / native Lua * 100`
+- `100` means parity with native Lua
+- `120` means luars is about 20% faster
+- `80` means luars is about 20% slower
+
+```mermaid
+xychart-beta
+    title "luars vs native Lua 5.5 on Windows (Ryzen 7 5800X)"
+    x-axis [arith, control, locals, funcs, closures, multiret, tables, tablelib, iters, math, meta, oop, coroutines, errors]
+    y-axis "Relative throughput (%)" 0 --> 160
+    bar [111, 87, 132, 92, 80, 78, 92, 118, 93, 98, 78, 92, 152, 103]
+    line [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
+```
+
+| Script | Relative throughput |
+|--------|---------------------|
+| `bench_arithmetic.lua` | 111% |
+| `bench_control_flow.lua` | 87% |
+| `bench_locals.lua` | 132% |
+| `bench_functions.lua` | 92% |
+| `bench_closures.lua` | 80% |
+| `bench_multiret.lua` | 78% |
+| `bench_tables.lua` | 92% |
+| `bench_table_lib.lua` | 118% |
+| `bench_iterators.lua` | 93% |
+| `bench_math.lua` | 98% |
+| `bench_metatables.lua` | 78% |
+| `bench_oop.lua` | 92% |
+| `bench_coroutines.lua` | 152% |
+| `bench_errors.lua` | 103% |
+
+Notes:
+- This is a script-level summary from the current Windows run of `run_benchmarks.ps1`.
+- String-heavy microbenchmarks are intentionally left out of the chart because several subtests complete too quickly on Windows timer resolution, which can distort summary ratios.
+- For full raw output, run `run_benchmarks.ps1` directly and inspect the per-subtest numbers.