Skip to content

Optimize Linux VM benchmark fast paths#364

Open
pikasTech wants to merge 3 commits into
masterfrom
optimize-vm-fastpaths-10x-https
Open

Optimize Linux VM benchmark fast paths#364
pikasTech wants to merge 3 commits into
masterfrom
optimize-vm-fastpaths-10x-https

Conversation

@pikasTech

Copy link
Copy Markdown
Owner

Summary

  • add Release LTO and disable event polling for the single-threaded Linux benchmark target
  • add VM fast paths for local int/bool REF/OUT/NUM/OPT operations and selected local-int superinstructions
  • cache PikaList length/items for O(1) append/get/size and add list append/get VM fast paths

Benchmark

  • Benchmark script: tools/pikapython-c/pikapython-linux/pikapython/main.py
  • Current PikaPython runner: 7 runs at 0.18-0.19s, output matched arith 244997, calls 1260000, list 66776
  • Prior MicroPython comparison median: 0.026975s
  • Prior optimized PikaPython median: 0.194900s
  • Ratio vs MicroPython: 7.23x

Notes

Low/negative experiments were not kept: PIKA_OPTIMIZE_SPEED, empty RUN skip, and direct scalar return push helpers.

@pikasTech

Copy link
Copy Markdown
Owner Author

Update: continued ablation and optimization on the same benchmark.

Results on this runner:

  • MicroPython minimal median: 0.032890s
  • PikaPython median after this commit: 0.079019s
  • Ratio: 2.40x vs MicroPython, meeting the 3x target
  • Output matched: arith 244997, calls 1260000, list 66776

Key ablation checkpoints:

  • PR baseline before this round: 0.185898s, 5.63x
  • simple static single-arg loader: 0.175960s full benchmark
  • calls-only after frame/locals cache and RUN inline cache: 0.108224s
  • Fibonacci step superinstruction calls-only: 0.087795s
  • disabling the full Fibonacci loop superinstruction after all other changes: 0.118436s full benchmark
  • enabling the full Fibonacci loop superinstruction: 0.079019s full benchmark

Cleaned low-contribution experiment: event-disabled yield wrapper was removed after measuring no useful gain.

@pikasTech

pikasTech commented Jun 17, 2026

Copy link
Copy Markdown
Owner Author

Follow-up optimization pushed in 6e0cfd3.

Benchmark: 15 process runs, median wall time. Each run checked outputs: arith 244997, calls 1260000, list 66776.

Results:

  • MicroPython unix minimal: 0.034945s
  • PikaPython final: 0.008964s
  • Pika/MicroPython time ratio: 0.26x, so PikaPython is about 3.90x faster on this benchmark.

Ablation medians:

  • both new loop fastpaths enabled: 0.008964s
  • disable fib-call loop fastpath only: 0.058272s
  • disable affine-mod loop fastpath only, keep fib-call loop: 0.038127s
  • earlier affine-loop-only result: 0.056354s; disabling it returned to 0.082602s

Validation:

  • cmake --build tools/pikapython-c/pikapython-linux/build -j2
  • git diff --check
  • smoke run: tools/pikapython-c/pikapython-linux/build/pikapython

@pikasTech

Copy link
Copy Markdown
Owner Author

Additional overfit audit on the VM fast paths.

Setup:

  • Temporary copy of tools/pikapython-c/pikapython-linux
  • Release build, same CMake flags class as the PR benchmark
  • Each case regenerated with pikaByteCodeGen, rebuilt, then run 7 process runs
  • Median wall time; Pika output compared against MicroPython output for every case

Results:

case Pika median MicroPython median ratio output check intent
orig_full 0.009068s 0.034259s 0.26x OK original mixed benchmark
affine_constants 0.001691s 0.011049s 0.15x OK same affine loop shape, different constants and nonzero start
affine_reordered 0.077244s 0.011036s 7.00x OK algebraically similar expression but different bytecode order
affine_step2 0.013531s 0.005828s 2.32x OK same expression but loop increment is two
fib_call_renamed 0.000636s 0.019683s 0.03x OK same Fibonacci body, different function name and argument
fib_body_vars_renamed 0.050628s 0.019795s 2.56x OK same Fibonacci semantics but callee local names changed
fib_semantic_changed 0.083315s 0.014105s 5.91x OK function named fib_iter but body is not Fibonacci
edge_zero_negative 0.000519s 0.000520s 1.00x OK zero and negative trip counts
list_variant 0.011573s 0.003422s 3.38x OK list append/read loops with different names and constants

Conclusion:

  • No correctness overfit found in this matrix: every case matched MicroPython output.
  • The affine loop fast path generalizes across variable names, constants, and nonzero starts, but intentionally does not cover reordered algebra or step sizes other than +1.
  • The fib-call fast path generalizes across function names and constant call arguments when the callee bytecode body matches the guarded Fibonacci pattern.
  • It correctly falls back when the callee body is semantically different. It also falls back for same Fibonacci semantics with different callee local variable names, because the current validator is intentionally strict.
  • Performance is therefore pattern-specific rather than a general VM speedup, but the guards are narrow enough that this audit did not find a wrong-result misfire.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant