Optimize Linux VM benchmark fast paths by pikasTech · Pull Request #364 · pikasTech/PikaPython

pikasTech · 2026-06-16T10:34:18Z

Summary

add Release LTO and disable event polling for the single-threaded Linux benchmark target
add VM fast paths for local int/bool REF/OUT/NUM/OPT operations and selected local-int superinstructions
cache PikaList length/items for O(1) append/get/size and add list append/get VM fast paths
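
The list caching idea in the last item can be sketched as follows. PikaList's real layout is not shown in this description, so everything below is hypothetical: the sketch assumes a backing store that is slow to index (here a dict keyed by the stringified index) and adds a cached item array plus a cached length, so that append, get, and size never have to consult the slow store.

```python
class CachedList:
    """Hypothetical stand-in for a list with a slow backing store."""

    def __init__(self):
        self._slow_store = {}  # assumed backing store, keyed by "0", "1", ...
        self._items = []       # cache: direct references to stored values
        self._length = 0       # cache: element count, updated on append

    def append(self, value):
        # Keep the backing store and both caches in sync.
        self._slow_store[str(self._length)] = value
        self._items.append(value)
        self._length += 1

    def get(self, index):
        # O(1): bounds check against the cached length, then direct lookup.
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._items[index]

    def size(self):
        # O(1): no traversal or key counting.
        return self._length


numbers = CachedList()
for value in (10, 20, 30):
    numbers.append(value)
print(numbers.size(), numbers.get(1))
```

The cost of this design is that every mutation has to update the caches as well as the store; a VM fast path for append/get can then read the caches directly.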

Benchmark

Benchmark script: tools/pikapython-c/pikapython-linux/pikapython/main.py
Current PikaPython runner: 7 runs at 0.18-0.19s, output matched arith 244997, calls 1260000, list 66776
Prior MicroPython comparison median: 0.026975s
Prior optimized PikaPython median: 0.194900s
Ratio vs MicroPython: 7.23x

Notes

Experiments with low or negative gain were not kept: PIKA_OPTIMIZE_SPEED, empty RUN skip, and direct scalar return push helpers.

pikasTech · 2026-06-16T11:48:52Z

Update: continued ablation and optimization on the same benchmark.

Results on this runner:

MicroPython minimal median: 0.032890s
PikaPython median after this commit: 0.079019s
Ratio: 2.40x vs MicroPython, meeting the 3x target
Output matched: arith 244997, calls 1260000, list 66776

Key ablation checkpoints:

PR baseline before this round: 0.185898s, 5.63x
simple static single-arg loader: 0.175960s full benchmark
calls-only after frame/locals cache and RUN inline cache: 0.108224s
Fibonacci step superinstruction calls-only: 0.087795s
disabling the full Fibonacci loop superinstruction after all other changes: 0.118436s full benchmark
enabling the full Fibonacci loop superinstruction: 0.079019s full benchmark

Cleanup: the event-disabled yield wrapper, a low-contribution experiment, was removed after measuring no useful gain.

pikasTech · 2026-06-17T00:22:09Z

Follow-up optimization pushed in 6e0cfd3.

Benchmark: 15 process runs, median wall time. Each run checked outputs: arith 244997, calls 1260000, list 66776.

Results:

MicroPython unix minimal: 0.034945s
PikaPython final: 0.008964s
Pika/MicroPython time ratio: 0.26x, so PikaPython is about 3.90x faster on this benchmark.
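
For reference, the two figures above are the same measurement read in opposite directions. A small sketch of the arithmetic, using the medians reported in this comment (the helper name `time_ratio` is illustrative only):

```python
def time_ratio(candidate_s, reference_s):
    """Candidate time divided by reference time; below 1.0 means faster."""
    return candidate_s / reference_s


pika_median_s = 0.008964         # PikaPython final, from the results above
micropython_median_s = 0.034945  # MicroPython unix minimal, from the results above

ratio = time_ratio(pika_median_s, micropython_median_s)
speedup = 1.0 / ratio  # how many times faster the candidate is

print(f"time ratio: {ratio:.2f}x")
print(f"speedup:    {speedup:.2f}x")
```

A time ratio below 1.0 means PikaPython finishes sooner; its reciprocal is the speedup factor.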

Ablation medians:

both new loop fastpaths enabled: 0.008964s
disable fib-call loop fastpath only: 0.058272s
disable affine-mod loop fastpath only, keep fib-call loop: 0.038127s
earlier affine-loop-only result: 0.056354s; disabling it returned to 0.082602s

Validation:

cmake --build tools/pikapython-c/pikapython-linux/build -j2
git diff --check
smoke run: tools/pikapython-c/pikapython-linux/build/pikapython

pikasTech · 2026-06-17T00:37:40Z

Additional overfit audit on the VM fast paths.

Setup:

Temporary copy of tools/pikapython-c/pikapython-linux
Release build, same CMake flags class as the PR benchmark
Each case regenerated with pikaByteCodeGen, rebuilt, then run 7 process runs
Median wall time; Pika output compared against MicroPython output for every case
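
The measurement procedure in this setup can be sketched roughly as below. This is not the script used for the audit: the function name, the stand-in commands, and the run count are assumptions for illustration, and the real runs would invoke the built pikapython and MicroPython binaries instead of the current Python interpreter.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(cmd, runs, expected_stdout=None):
    """Run cmd as a fresh process `runs` times.

    Returns (median wall time in seconds, stdout of the last run) and
    raises if any run's stdout differs from expected_stdout.
    """
    samples = []
    stdout = ""
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        samples.append(time.perf_counter() - start)
        stdout = result.stdout
        if expected_stdout is not None and stdout != expected_stdout:
            raise AssertionError(f"output mismatch for {cmd!r}")
    return statistics.median(samples), stdout


# Stand-in commands (assumptions): both print the same value.
reference_cmd = [sys.executable, "-c", "print(sum(range(1000)))"]
candidate_cmd = [sys.executable, "-c", "print(499500)"]

ref_median, ref_out = median_wall_time(reference_cmd, runs=7)
cand_median, _ = median_wall_time(candidate_cmd, runs=7, expected_stdout=ref_out)
print(f"ratio: {cand_median / ref_median:.2f}x")
```

Timing whole processes includes interpreter startup, which matters for cases that finish in well under a millisecond of actual work.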

Results:

case	Pika median	MicroPython median	ratio (Pika/MicroPython)	output check	intent
orig_full	0.009068s	0.034259s	0.26x	OK	original mixed benchmark
affine_constants	0.001691s	0.011049s	0.15x	OK	same affine loop shape, different constants and nonzero start
affine_reordered	0.077244s	0.011036s	7.00x	OK	algebraically similar expression but different bytecode order
affine_step2	0.013531s	0.005828s	2.32x	OK	same expression but loop increment is two
fib_call_renamed	0.000636s	0.019683s	0.03x	OK	same Fibonacci body, different function name and argument
fib_body_vars_renamed	0.050628s	0.019795s	2.56x	OK	same Fibonacci semantics but callee local names changed
fib_semantic_changed	0.083315s	0.014105s	5.91x	OK	function named fib_iter but body is not Fibonacci
edge_zero_negative	0.000519s	0.000520s	1.00x	OK	zero and negative trip counts
list_variant	0.011573s	0.003422s	3.38x	OK	list append/read loops with different names and constants

Conclusion:

No correctness overfit found in this matrix: every case matched MicroPython output.
The affine loop fast path generalizes across variable names, constants, and nonzero starts, but intentionally does not cover reordered algebra or step sizes other than +1.
The fib-call fast path generalizes across function names and constant call arguments when the callee bytecode body matches the guarded Fibonacci pattern.
It correctly falls back when the callee body is semantically different. It also falls back for same Fibonacci semantics with different callee local variable names, because the current validator is intentionally strict.
Performance is therefore pattern-specific rather than a general VM speedup, but the guards are narrow enough that this audit did not find a wrong-result misfire.
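
The guard-then-fall-back behaviour described in this conclusion can be illustrated with a toy model. The bytecode below is a hypothetical stack machine invented for the sketch, not PikaPython's instruction set, and the guard is far simpler than the real validator. It shows why a reordered but algebraically equivalent body misses the fast path while still producing the same result through the generic interpreter.

```python
import operator

BINOPS = {"MUL": operator.mul, "ADD": operator.add, "MOD": operator.mod}

# Opcode order the guard accepts: x = (x * a + b) % m
AFFINE_SHAPE = ("LOAD", "CONST", "MUL", "CONST", "ADD", "CONST", "MOD", "STORE")


def run_generic(body, env, trips):
    """Interpret the loop body instruction by instruction."""
    for _ in range(trips):
        stack = []
        for ins in body:
            op = ins[0]
            if op == "LOAD":
                stack.append(env[ins[1]])
            elif op == "CONST":
                stack.append(ins[1])
            elif op == "STORE":
                env[ins[1]] = stack.pop()
            else:
                rhs = stack.pop()
                lhs = stack.pop()
                stack.append(BINOPS[op](lhs, rhs))
    return env


def match_affine(body):
    """Return (var, a, b, m) if the body has exactly the guarded shape."""
    if tuple(ins[0] for ins in body) != AFFINE_SHAPE:
        return None
    var, a, b, m = body[0][1], body[1][1], body[3][1], body[5][1]
    if body[7][1] != var:  # must store back into the loaded variable
        return None
    if not all(type(c) is int for c in (a, b, m)) or m == 0:
        return None        # anything unusual goes to the generic path
    return var, a, b, m


def run_loop(body, env, trips):
    """Use the fast path when the guard matches, else interpret."""
    match = match_affine(body)
    if match is None:
        return run_generic(body, env, trips), "generic"
    var, a, b, m = match
    x = env[var]
    for _ in range(trips):
        x = (x * a + b) % m
    env[var] = x
    return env, "fast"


canonical = [("LOAD", "x"), ("CONST", 3), ("MUL",), ("CONST", 7), ("ADD",),
             ("CONST", 1000), ("MOD",), ("STORE", "x")]
# Same algebra, (7 + 3 * x) % 1000, but a different instruction order.
reordered = [("CONST", 7), ("CONST", 3), ("LOAD", "x"), ("MUL",), ("ADD",),
             ("CONST", 1000), ("MOD",), ("STORE", "x")]

env_a, path_a = run_loop(canonical, {"x": 1}, 5)
env_b, path_b = run_loop(reordered, {"x": 1}, 5)
print(path_a, env_a["x"])
print(path_b, env_b["x"])
```

In this toy, the variable name and the constants are wildcards in the guard, while the opcode order is not, which mirrors the affine_constants versus affine_reordered rows in the table: both are correct, only one is fast.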

pikasTech added 2 commits June 16, 2026 10:32

Optimize Linux VM benchmark fast paths

2adf783

Reduce VM benchmark gap below 3x

a1e1682

Add loop fast paths for VM benchmarks

6e0cfd3

pikasTech wants to merge 3 commits into master from optimize-vm-fastpaths-10x-https