Skip to content

Performance redesign: type-stable, zero-allocation SLH core #3

@oameye

Description

@oameye

We need to redesign the internals of QuantumInputOutput.jl. The current implementation works but leaves a lot of performance on the table due to type instabilities, unnecessary allocations, and duplicated code paths. Since systems are small (1-4 ports) and the ODE inner loop evaluates our closures thousands of times, every allocation and type-instability compounds.

Problem 1: Two parallel SLH types with duplicated logic

We currently have SLH (symbolic) and SLHqo (numeric/time-dependent). Both represent the same mathematical object — an (S, L, H) triple — and the composition rules (cascade, concatenate, feedback) are mathematically identical. Yet we maintain two complete implementations of , , and feedback, each ~100 lines, because the symbolic path needs SQA._adjoint and simplify while the numeric path needs adjoint and handles Function types.

This duplication is a maintenance burden and a source of subtle divergence bugs.

Solution: A single parametric SLH{N, ST, LT, HT} type. The symbolic-vs-numeric difference reduces to four lines of dispatch:

_adj(x::SQA.QNumber) = SQA._adjoint(x)
_adj(x) = adjoint(x)

_post(x::BasicSymbolic) = simplify(x)
_post(x) = x

One implementation of cascade, concatenate, and feedback uses _adj and _post instead of hardcoding the symbolic or numeric variant. Julia's dispatch handles the rest.

Problem 2: Heap-allocated scattering matrices

The scattering matrix is currently a Matrix (heap-allocated). In concatenation it's even Matrix{Any}:

S_t = Matrix{Any}(undef, lS1+lS2, lS1+lS2)  # SLH.jl line 262

Matrix{Any} means every element access boxes/unboxes, killing type inference for everything downstream. Even the non-Any paths use heap-allocated Matrix and Vector for objects that are 1x1 to 4x4 in practice.

Why this matters: Scattering matrices are accessed during cascade and feedback — operations that happen at model construction time. More importantly, the Lindblad vector is also a plain Vector, and its elements get composed into closures that run at every ODE timestep. Heap allocation for a 2-element vector is pure overhead.

Solution: Use StaticArrays. The port count N becomes a type parameter:

struct SLH{N, ST, LT, HT}
    scattering::SMatrix{N, N, ST}
    lindblad::SVector{N, LT}
    hamiltonian::HT
end

All scattering and Lindblad storage lives on the stack. N is 1-4 in all real use cases (confirmed by scanning every test and example) — well within StaticArrays' sweet spot. Concatenation produces SLH{N1+N2} at compile time. Cascade enforces matching N at compile time (no more runtime @assert).

Convenience constructors preserve the current API:

SLH(1, γ * a, H)                           # scalar → SLH{1}
SLH([-r η; η r], [0, 0], 0)                 # matrix/vector → SLH{2}

Problem 3: Time-dependent closures are type-unstable

The current SLHqo cascade creates closures like:

return t -> begin
    L1_vec = [f(t) for f in L1f]   # allocates a new Vector every call
    L2_vec = [f(t) for f in L2f]   # allocates another Vector every call
    (L2_vec + S2 * L1_vec)[i]      # allocates S2*L1_vec, then sum, then indexes
end

These closures run at every ODE timestep. Each call allocates 3+ vectors that immediately become garbage. Worse, because each closure is a unique anonymous type, storing multiple of them in a Vector{Function} is type-erased — the compiler can't specialize on the element type.

Solution: Two changes:

  1. Composition uses _add/_mul helpers that build composed closures without intermediate vector allocation:
_add(f::Function, g::Function) = t -> f(t) + g(t)
_add(f::Function, x) = t -> f(t) + x
_add(x, f::Function) = t -> x + f(t)
_add(x, y) = x + y

_mul(s, f::Function) = t -> s * f(t)
_mul(f::Function, g::Function) = t -> f(t) * g(t)
_mul(x, y) = x * y
  1. FunctionWrappers.jl gives all time-dependent elements a single concrete type:
const TimeDep{T} = FunctionWrapper{T, Tuple{Float64}}

When any Lindblad element is a Function, all elements are wrapped uniformly as TimeDep{Operator}. This makes SVector{N, TimeDep{Operator}} fully concrete — the compiler sees one type, not N different anonymous closure types. For fully static systems (symbolic or numeric without time-dependence), no wrapping happens.

Problem 4: Pulse functions return opaque closures

Every coupling_input, coupling_output, etc. (currently u_to_gu, v_to_gv, ...) computes a LinearInterpolation then wraps it:

gu_int = LinearInterpolation(gu_t, T; extrapolation = _extrapolate)
return t -> gu_int(t)  # why?

LinearInterpolation is already callable. The wrapper creates an opaque closure type that prevents the compiler from specializing downstream code (e.g., coupling_matrix receiving a concrete LinearInterpolation vs. a generic Function).

Solution: Return the LinearInterpolation directly. All downstream code calls g(t) — works unchanged, but now Julia knows the concrete type.

Problem 5: coupling_matrix (currently interaction_picture_A) allocates every ODE step

function A(t)
    g = [gf(t) for gf in gfs]           # allocates
    M = zeros(ComplexF64, n, n)          # allocates
    # ... fill M ...
    return 0.5 * M                       # allocates again
end

This function is the RHS of the mode evolution ODE. It's called at every timestep. Three allocations per call, thousands of calls.

Solution: Pre-allocate buffers in the closure:

function coupling_matrix(gs)
    gfs = _as_time_function.(gs)
    n = length(gfs)
    M = zeros(ComplexF64, n, n)
    g = Vector{ComplexF64}(undef, n)
    function A(t)
        for k in 1:n; g[k] = gfs[k](t); end
        fill!(M, zero(ComplexF64))
        for i in 1:n, j in (i+1):n
            val = 0.5 * g[i] * conj(g[j])
            M[i, j] = val
            M[j, i] = -conj(val)
        end
        return M
    end
    return A
end

Zero allocations per call.

Problem 6: solve_mode_evolution (currently interaction_picture_M) uses out-of-place ODE

f_M(u, p, t) = A(t) * u   # allocates a new matrix every step

Solution: In-place form, and return the ODE solution directly (it's already callable as sol(t)):

function solve_mode_evolution(A::Function, T; alg = Tsit5(), kwargs...)
    T0, Tend = T[1], T[end]
    n = size(A(T0), 1)
    M0 = Matrix{ComplexF64}(I, n, n)
    function f_M!(du, u, p, t)
        mul!(du, A(t), u)
    end
    prob = ODEProblem(f_M!, M0, (T0, Tend))
    sol = solve(prob, alg; saveat = T, kwargs...)
    return sol
end

Problem 7: Non-const module globals

_tol_div = 1e-10
_extrapolate = ExtrapolationType.Extension
_ϵu = 1e-10
_ϵv = 1e-10

Non-const globals in Julia are type-unstable. The compiler inserts type checks at every access because the value could change type at any time.

Solution: Add const.

Problem 8: translate (currently translate_qo) redundantly normalizes time parameters

_normalize_time_parameter(time_parameter) is called in every translate method, including recursive calls from QAdd → its arguments. For an expression with 20 terms, it runs 20 times doing the same work.

Solution: Normalize once at the public entry point, then pass the result through internal _translate methods:

function translate(op, b::Basis; parameter=Dict(), time_parameter=Dict(), kwargs...)
    tp = _normalize_time_parameter(time_parameter)
    return _translate(op, b, parameter, tp, kwargs...)
end

Also type the internal dict (Dict{KT, Function} instead of Dict{Any, Any}).

Problem 9: Correlation matrix allocates intermediate vectors

g1 = [expect(Ls_ls_dag[it+i-1], ρ_bar_τ[i]) for i = 1:length(τ_)]  # new vector per outer iteration
g1_m[it, it:end] = g1

Solution: Write directly into the output matrix:

for i in eachindex(ρ_bar)
    val = expect(Ls_dag, ρ_bar[i])
    g1_m[it, it+i-1] = val
    g1_m[it+i-1, it] = conj(val)
end

Problem 10: Cryptic API names

The current function names (u_to_gu, v_to_gv, u_eff, interaction_picture_A, etc.) are shorthand from specific papers. They mean nothing without reading Kiilerich & Mølmer. A package API should be self-documenting.

Solution: Rename everything to describe what it computes, not what variable letter it uses.

Full rename table

Current New Why
Types
SLH SLH Well-established in quantum optics
SLHqo removed Unified into SLH
(new) Gaussian Pulse shape descriptor — enables dispatch instead of _Gauss suffix functions
Accessors
get_scattering(G) scattering(G) Julia convention: no get_ prefix (length, size, basis, not get_length)
get_lindblad(G) lindblad(G) same
get_hamiltonian(G) hamiltonian(G) same
Composition
/ cascade / series "cascade" is jargon; series is universally understood
/ concatenate / parallel "concatenate" suggests appending strings; parallel matches the physics
feedback feedback already clear
Pulse coupling
u_to_gu(u, T) coupling_input(mode, T) describes what you compute, not variable letters
v_to_gv(v, T) coupling_output(mode, T) same
u_to_gu_Gauss(τ, σ; δ) coupling_input(g::Gaussian) dispatch on pulse type instead of name suffix
v_to_gv_Gauss(τ, σ; δ) coupling_output(g::Gaussian) same
uv_to_gin(u, v, T) coupling_delay_in(u, v, T) explicit about delay cavity context
uv_to_gout(u, v, T) coupling_delay_out(u, v, T) same
Effective modes
u_eff(modes, T, i) effective_input_mode(modes, T, i) says what it returns
v_eff(modes, T, i) effective_output_mode(modes, T, i) same
Interaction picture
interaction_picture_A(gs) coupling_matrix(gs) describes what it builds; the interaction picture context is implicit
interaction_picture_M(A, T) solve_mode_evolution(A, T) describes what it does, not the textbook symbol
interaction_picture_M_2modes_equal(u, T) solve_mode_evolution_symmetric(u, T) explicit about the u=v symmetry assumption
Translation
translate_qo(op, b) translate(op, b) shorter; the _qo suffix is redundant since the basis argument already implies QuantumOptics
substitute_operators(op, dict) substitute_operators(op, dict) already clear
Correlations
two_time_corr_matrix(T, ρt, ...) correlation_matrix(T, ρt, ...) shorter; "two-time" is implied by returning a matrix over a time grid

Gaussian pulse type

Instead of separate _Gauss functions, introduce a lightweight pulse descriptor:

struct Gaussian{T}
    τ::T    # center time
    σ::T    # width
    δ::T    # detuning
end
Gaussian(τ, σ; δ=0) = Gaussian(τ, σ, δ)

coupling_input(g::Gaussian)   # analytical formula
coupling_output(g::Gaussian)  # analytical formula
coupling_input(mode, T)       # numerical from samples/function
coupling_output(mode, T)      # numerical from samples/function

This is extensible — adding Exponential or SechSquared pulse shapes later just means adding new types and methods, not new function names.

File organization

Split the current files by concern:

src/
├── QuantumInputOutput.jl    # module, exports, imports
├── types.jl                 # SLH, Gaussian, constructors, _adj, _post, _add, _mul
├── compose.jl               # ▷, ⊞, feedback (one implementation each)
├── translate.jl             # translate, _translate internal dispatch
├── pulses.jl                # coupling_input, coupling_output, coupling_delay_in, coupling_delay_out
├── modes.jl                 # effective_input_mode, effective_output_mode
├── interaction_picture.jl   # coupling_matrix, solve_mode_evolution, solve_mode_evolution_symmetric
├── correlations.jl          # correlation_matrix
└── utils.jl                 # substitute_operators, numeric_average, expect, constants

The current pulses.jl mixes coupling computation with ODE-based effective modes — these are separate concerns. The current utils.jl mixes virtual cavity code with operator substitution.

New dependencies

  • StaticArraysSMatrix/SVector for scattering and Lindblad storage
  • FunctionWrappersFunctionWrapper for concrete time-dependent element types

Both lightweight, widely used, zero transitive dependencies.

Breaking changes

What changes Why Migration
SLHqo removed Unified into SLH SLHqo(...)SLH(...), same arguments
SLH.scattering is SMatrix Stack allocation, compile-time dimension safety Matrix(G.scattering) if plain matrix needed
SLH.lindblad is SVector Stack allocation, concrete element types Indexing/iteration unchanged
All function renames Self-documenting API See rename table above
coupling_input etc. return LinearInterpolation Removes opaque closure wrapper Already callable as f(t), transparent
solve_mode_evolution returns ODE solution Removes redundant wrapper Already callable as sol(t), transparent
cascadeseries Clearer terminology Find-and-replace
concatenateparallel Clearer terminology Find-and-replace
get_* → bare name Julia convention Drop get_ prefix

Exports

export SLH, Gaussian,
    scattering, lindblad, hamiltonian,
    , series, , parallel, feedback,
    translate,
    coupling_input, coupling_output,
    coupling_delay_in, coupling_delay_out,
    effective_input_mode, effective_output_mode,
    coupling_matrix, solve_mode_evolution, solve_mode_evolution_symmetric,
    correlation_matrix,
    substitute_operators

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions