Skip to content

Benchmark: freqresp! optimization strategies#1063

Draft
baggepinnen wants to merge 3 commits into
masterfrom
freqresp-benchmarks
Draft

Benchmark: freqresp! optimization strategies#1063
baggepinnen wants to merge 3 commits into
masterfrom
freqresp-benchmarks

Conversation

@baggepinnen

Copy link
Copy Markdown
Member

Standalone benchmark (under benchmark_freqresp/) comparing strategies for maximizing freqresp! performance on the target use case: a 48-state SISO continuous state-space system over 150 frequencies. Nothing in the package source is modified — all candidate implementations live in benchmark_freqresp/variants.jl, with ControlSystemsBase.freqresp! as the correctness oracle.

Companion to #1062 (full write-up and discussion there).

Strategies benchmarked

  • Baseline / serial — existing Hessenberg ldiv2! loop
  • ThreadingThreads.@spawn, Polyester.@batch, OhMyThreads over frequency chunks
  • SIMD@turbo and @tturbo ldiv2! (real/imag split, since @turbo lacks Complex support)
  • Modal / pole-residue — eigendecompose once, then a pure broadcast+reduction (the Reactant/XLA/GPU-friendly form)

Key findings (24-core machine, loop-only µs)

Strategy µs speedup
serial Hessenberg (current) ~545
@turbo ldiv2! ~370 1.5×
threaded (best, ~8 threads) ~90
threaded + @turbo ~78
modal, serial ~42 13×
modal, threaded ~10 54×
  • The per-frequency loop dominates over the one-time factorization (~565 µs vs ~85 µs).
  • @turbo is a solid ~1.5× free win (needs a manual real/imag split — bare @turbo won't compile on complex).
  • Threading scales sub-linearly; ~8 threads beats 24 for this size.
  • @tturbo is slower than @turbo — wrong axis to thread.
  • Algorithm beats micro-optimization: the modal reformulation is 13–54×, but trades numerical robustness (depends on cond(V); degrades for non-normal/defective A), which is why the package defaults to the Hessenberg solve.
  • Reactant.jl is not worth it at this scale — the modal form already runs in ~10 µs on CPU, below XLA dispatch / GPU launch latency.

Run

cd benchmark_freqresp
julia --project=. -e 'using Pkg; Pkg.develop(path="../lib/ControlSystemsBase"); Pkg.add(["BenchmarkTools","LoopVectorization","Polyester","OhMyThreads"]); Pkg.instantiate()'
JULIA_EXCLUSIVE=1 julia -t auto --project=. benchmark.jl

Draft — this is an exploratory benchmark, not a proposed package change. Opening for visibility/discussion of which strategy (likely the real/imag-split @turbo ldiv2!, and an opt-in modal fast path) to upstream.

🤖 Generated with Claude Code

baggepinnen and others added 3 commits June 9, 2026 05:57
Standalone benchmark comparing strategies for freqresp! on a 48-state SISO
state-space system over 150 frequencies:
- baseline / serial Hessenberg ldiv2!
- threaded outer loop (Threads.@Spawn, Polyester.@Batch, OhMyThreads)
- @turbo / @tturbo ldiv2! (real/imag split, since @turbo lacks complex support)
- modal / pole-residue reformulation (the Reactant/XLA/GPU-friendly form)

Findings: the per-frequency loop dominates over the factorization; @turbo
gives ~1.5x, threading ~6-7x, and the modal reformulation 13-54x (at the cost
of robustness, depending on cond(V)). @tturbo and Reactant are not worth it at
this problem size. See benchmark_freqresp/README.md for full results.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified empirically: @turbo on the back-substitution loop trips
LoopVectorization check_args and silently falls back to a scalar loop.
The inner j-loop is a sequential recurrence (scan) and the independent
column axis is length 1 for SISO, so no vectorization is possible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add modal_accuracy.jl demonstrating, against a BigFloat reference, that the
modal/pole-residue form's accuracy is governed by cond(V): it fails for
defective A (repeated poles -> relerr 1.0) and near-defective/non-normal A
(clustered eigenvalues -> relerr ~1e3), while the Hessenberg solve stays at
machine precision. Evaluating near a pole is not a modal-specific weakness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JuliaControlBot

Copy link
Copy Markdown

This is an automated message.
Plots were compared to references. 4/11 images have changed, see differences below.
After pulling this PR, please update the reference images by creating a PR to ControlExamplePlots.jl here.

Difference Reference Image New Image
✔️ 0.0 Reference New
✔️ 0.0 Reference New
✔️ 0.0 Reference New
✔️ 0.0 Reference New

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.64%. Comparing base (856eb2b) to head (76e9832).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1063       +/-   ##
===========================================
+ Coverage   19.68%   91.64%   +71.95%     
===========================================
  Files          42       42               
  Lines        5598     5660       +62     
===========================================
+ Hits         1102     5187     +4085     
+ Misses       4496      473     -4023     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants