[WIP] Parallel MCMC via DEER algorithm #487

Draft · xukai92 wants to merge 7 commits into main from kx/pscan
Conversation

@xukai92 (Member) commented Jan 21, 2026
Parallel MCMC Implementation (DEER Algorithm)

Implements parallel MCMC sampling based on:

Parallelizing MCMC Across the Sequence Length
https://arxiv.org/abs/2508.18413

Status: Phases 1-5 Complete (420 tests passing)

Completed Phases

Phase 1: Core Infrastructure ✅

  • Parallel scan with O(log T) complexity
  • Affine transforms (Matrix, Diagonal, Block 2×2)
  • Jacobian computation utilities (FD, Hutchinson estimator)
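The parallel scan solves the affine recurrence s_t = J_t s_{t-1} + u_t by composing the affine maps with an associative operator. The sketch below (Python/NumPy for illustration only; the PR itself is Julia, and the helper names `compose` / `affine_scan` are hypothetical) shows the operator and a sequential reference scan checked against the direct recurrence:

```python
import numpy as np

def compose(a, b):
    # Associative operator on affine maps: apply `a` first, then `b`.
    # (J_b, u_b) ∘ (J_a, u_a) = (J_b @ J_a, J_b @ u_a + u_b)
    Ja, ua = a
    Jb, ub = b
    return (Jb @ Ja, Jb @ ua + ub)

def affine_scan(Js, us, s0):
    """Solve s_t = J_t s_{t-1} + u_t for t = 1..T via prefix composition.

    Written sequentially here; a parallel implementation combines pairs
    tree-style (Blelloch scan), giving O(log T) parallel depth.
    """
    acc = (np.eye(len(s0)), np.zeros_like(s0))
    out = []
    for J, u in zip(Js, us):
        acc = compose(acc, (J, u))   # prefix of affine maps up to step t
        Jp, up = acc
        out.append(Jp @ s0 + up)     # apply the accumulated map to s0
    return np.array(out)

# Sanity check against the direct sequential recurrence
rng = np.random.default_rng(0)
T, D = 8, 3
Js = rng.normal(size=(T, D, D)) * 0.3
us = rng.normal(size=(T, D))
s0 = rng.normal(size=D)

s_seq, s = [], s0
for t in range(T):
    s = Js[t] @ s + us[t]
    s_seq.append(s)

assert np.allclose(affine_scan(Js, us, s0), np.array(s_seq))
```

Because `compose` is associative, the T prefix compositions can be evaluated as a balanced tree, which is where the O(log T) depth comes from.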

Phase 2: DEER Algorithm Core ✅

  • Newton iteration framework
  • Full DEER (complete Jacobians)
  • Quasi-DEER (diagonal approximation)
  • Stochastic Quasi-DEER (Hutchinson estimator)

Phase 3: MALA Integration ✅

  • Parallel MALA sampler
  • Soft gating for differentiability

Phase 4: HMC Integration ✅

  • Approach A: Parallelize across HMC steps
  • Approach B: Parallelize leapfrog integration
  • Block Quasi-DEER for 2×2 leapfrog Jacobians

Phase 5: AdvancedHMC.jl Integration ✅

  • ParallelHMCSampler / ParallelHMC
  • ParallelMALASampler / ParallelMALA
  • parallel_sample() with LogDensityProblems interface
  • ParallelSamplerState with trajectory and convergence info

Remaining Phases

  • Phase 6: Advanced features (sliding window, GPU support)
  • Phase 7: Testing and validation

Usage Example

```julia
using AdvancedHMC.Parallel

# Define target distribution
logp = SimpleLogDensity(2, x -> -0.5 * sum(x.^2), x -> -x)

# Create parallel sampler
sampler = ParallelHMC(0.1, 10; method=QuasiDEER())

# Sample
state = parallel_sample(logp, sampler, 1000)
samples = get_samples(state, 100)  # discard burn-in
```

Test Summary

| Component | Tests |
|-----------|-------|
| Parallel Scan | 141 |
| Jacobian Utilities | 57 |
| DEER Algorithm | 67 |
| MALA | 31 |
| HMC | 49 |
| AbstractMCMC | 75 |
| **Total** | **420** |

xukai92 and others added 5 commits January 20, 2026 17:01
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the foundation for parallelizing MCMC across sequence length
based on Zoltowski et al. "Parallelizing MCMC Across the Sequence Length".

Core components:
- Parallel scan solving linear recurrences s_t = J_t * s_{t-1} + u_t
- Three affine transform types for different DEER variants:
  - MatrixAffineTransform (full Jacobian for DEER)
  - DiagonalAffineTransform (diagonal approx for Quasi-DEER)
  - Block2x2AffineTransform (2x2 blocks for leapfrog/Block Quasi-DEER)
- Associative composition operator enabling O(log T) parallel time
- Method types: FullDEER, QuasiDEER, StochasticQuasiDEER, BlockQuasiDEER

Files added:
- src/parallel/Parallel.jl - Module definition
- src/parallel/types.jl - Type definitions
- src/parallel/scan.jl - Parallel scan implementation
- test/parallel/test_scan.jl - 141 tests
- docs/parallel_mcmc_implementation_plan.md - Implementation tracker
- docs/parallel_mcmc_implementation_note.md - Algorithm reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement utilities for computing Jacobians and their approximations,
completing Phase 1 (Core Infrastructure) of the parallel MCMC implementation.

New utilities in src/parallel/jacobian.jl:
- jacobian_fd: Full Jacobian via finite differences
- jacobian_diagonal_full: Extract Jacobian diagonal
- jvp_fd / vjp_fd: Jacobian-vector and vector-Jacobian products
- rademacher_vector: Generate ±1 random vectors
- hutchinson_diagonal: Stochastic diagonal estimator using Hutchinson's method
- hessian_diagonal: Hessian diagonal for leapfrog integration
- Batch versions for all operations (batch_jacobians, batch_hutchinson_diagonals, etc.)

The implementation is AD-agnostic with finite difference fallbacks.
Users can plug in ForwardDiff/Zygote/Enzyme-based implementations.
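Hutchinson's method estimates diag(J) from Jacobian-vector products alone: for probe vectors v with i.i.d. Rademacher (±1) entries, E[v ⊙ Jv] = diag(J), since the off-diagonal terms J_ij v_i v_j have zero mean. A small sketch of that identity (Python/NumPy for illustration; the function name mirrors but is not the PR's Julia API):

```python
import numpy as np

def hutchinson_diagonal(jvp, dim, n_samples=64, rng=None):
    """Estimate diag(J) using only Jacobian-vector products.

    E[v ⊙ (J v)] = diag(J) when v has i.i.d. Rademacher entries,
    because off-diagonal contributions J_ij v_i v_j (i != j) average out.
    """
    rng = rng or np.random.default_rng()
    acc = np.zeros(dim)
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        acc += v * jvp(v)                      # elementwise v ⊙ (J v)
    return acc / n_samples

# Check against a known Jacobian
J = np.array([[2.0, 0.1], [0.3, -1.0]])
est = hutchinson_diagonal(lambda v: J @ v, 2, n_samples=10000,
                          rng=np.random.default_rng(1))
assert np.allclose(est, np.diag(J), atol=0.05)
```

The estimator's variance per entry is the sum of squared off-diagonal entries in that row, so more probes are needed when the Jacobian is far from diagonal.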

Tests: 57 new tests for Jacobian utilities (all passing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the main DEER (Doubly Efficient Estimation via Recursion)
algorithm that solves MCMC chains in parallel via Newton's method.

Core algorithm in src/parallel/deer.jl:
- deer(): Main entry point with method dispatch
- DEERResult: Result type with trajectory, convergence info, residual history
- sequential_mcmc(): Reference implementation for testing

Three DEER variants implemented:
- FullDEER: Full D×D Jacobian matrices, O(T*D²) memory
- QuasiDEER: Diagonal Jacobian approximation, O(T*D) memory
- StochasticQuasiDEER: Hutchinson estimator for diagonal, only needs JVPs

The algorithm:
1. Initialize trajectory (forward pass or zeros)
2. Newton iteration loop:
   - Evaluate f_t at all timesteps (parallelizable)
   - Compute Jacobians (method-dependent)
   - Compute offsets u_t = f_t(s_{t-1}) - J_t * s_{t-1}
   - Solve linear system via parallel scan
   - Check convergence (max absolute difference)
3. Return DEERResult with trajectory and diagnostics
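The loop above can be sketched as follows (Python/NumPy for illustration; all names are hypothetical rather than the PR's API, and the linear solve is written sequentially where the real implementation substitutes the parallel scan):

```python
import numpy as np

def deer(f, jac, s0, T, max_iters=50, tol=1e-10):
    """Solve s_t = f(s_{t-1}, t), t = 1..T, by Newton iteration.

    Each iteration linearizes f around the current trajectory guess and
    solves the resulting affine recurrence s_t = J_t s_{t-1} + u_t
    exactly; DEER's point is that this solve is a parallel scan.
    """
    s = np.tile(s0, (T, 1))                      # initial trajectory guess
    for _ in range(max_iters):
        prev = np.vstack([s0, s[:-1]])           # s_{t-1} for each t
        fs = np.array([f(prev[t], t) for t in range(T)])
        Js = np.array([jac(prev[t], t) for t in range(T)])
        us = fs - np.einsum("tij,tj->ti", Js, prev)  # u_t = f_t - J_t s_{t-1}
        # Solve s_t = J_t s_{t-1} + u_t (sequential stand-in for the scan)
        new = np.empty_like(s)
        cur = s0
        for t in range(T):
            cur = Js[t] @ cur + us[t]
            new[t] = cur
        if np.max(np.abs(new - s)) < tol:        # convergence criterion
            return new
        s = new
    return s

# The fixed point matches a plain sequential rollout
A = np.array([[0.5, 0.1], [0.0, 0.4]])
f = lambda s, t: np.tanh(A @ s) + 0.1
jac = lambda s, t: (1 - np.tanh(A @ s) ** 2)[:, None] * A
s0 = np.zeros(2)
s_par = deer(f, jac, s0, 20)
s_seq, cur = [], s0
for t in range(20):
    cur = f(cur, t)
    s_seq.append(cur)
assert np.allclose(s_par, np.array(s_seq), atol=1e-8)
```

Note the finite-convergence property: after k Newton iterations the first k states are exact, so the loop never needs more than T iterations, and typically far fewer.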

Tests: 67 new tests covering linear/nonlinear systems, all variants,
convergence tracking, edge cases, and numerical stability.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement parallelized Metropolis-Adjusted Langevin Algorithm using DEER.

MALA components in src/parallel/mala.jl:
- MALARandomInputs: Pre-sampled (ξ, u) pairs for proposals and accept-reject
- MALAConfig: Configuration struct with step size, log density, gradient
- mala_proposal(): Langevin proposal x̃ = x + ε∇log p(x) + √(2ε)ξ
- mala_log_acceptance_ratio(): MH ratio with forward/backward densities
- soft_gate(): Differentiable accept-reject using sigmoid + straight-through
- mala_transition(): Complete MALA step combining proposal and accept-reject
- parallel_mala(): Run full MALA chain in parallel via DEER
- sequential_mala(): Reference implementation for testing

The stop-gradient trick enables computing Jacobians through the
non-differentiable accept-reject step by using a soft sigmoid gate
in the backward pass while keeping hard decisions in the forward pass.
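A minimal illustration of the gate described above (Python/NumPy; the function name, signature, and temperature parameter are hypothetical, not the PR's API): the forward value is the exact hard Metropolis decision, while the gradient surrogate is taken from a tempered sigmoid of the log acceptance ratio.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_gate(log_alpha, u, x_prop, x_prev, temp=0.1):
    """Accept-reject as a gate g ∈ {0, 1} with a smooth surrogate.

    Forward value: hard decision g = 1[log u < log_alpha], so samples
    are unchanged.  Backward surrogate: sigmoid((log_alpha - log u)/temp)
    replaces the zero-almost-everywhere derivative of the indicator
    (straight-through estimator).
    """
    hard = float(np.log(u) < log_alpha)
    soft = sigmoid((log_alpha - np.log(u)) / temp)
    # Surrogate sensitivity of the gated state w.r.t. log_alpha:
    grad_log_alpha = soft * (1 - soft) / temp * (x_prop - x_prev)
    out = hard * x_prop + (1 - hard) * x_prev
    return out, grad_log_alpha

# Forward pass keeps the exact MH decision
x_prev, x_prop = np.array([0.0]), np.array([1.0])
out_acc, _ = soft_gate(log_alpha=0.0, u=0.5, x_prop=x_prop, x_prev=x_prev)
out_rej, _ = soft_gate(log_alpha=-2.0, u=0.5, x_prop=x_prop, x_prev=x_prev)
assert np.allclose(out_acc, x_prop) and np.allclose(out_rej, x_prev)
```

In an AD framework the same effect is achieved with `hard + (soft - stop_gradient(soft))`, which keeps the hard value in the forward pass and the sigmoid's derivative in the backward pass.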

Tests: 31 new tests covering proposal mechanics, sequential/parallel
equivalence, various target distributions, acceptance behavior, and
sample statistics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions bot left a comment

Remaining comments that could not be posted as review comments (GitHub rate limit):

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶

- Good when L is large
- Use **block quasi-DEER** for this (Section 2.4)
- Convergence in ~L/2 iterations typically


[JuliaFormatter] reported by reviewdog 🐶

s_prev = (t == 1) ? s0 : s[t-1, :]


[JuliaFormatter] reported by reviewdog 🐶

1. Initialize window at the start of the sequence
2. Apply DEER update only within the window
3. Shift window forward to the first non-converged timestep
4. Repeat until the entire sequence converges


[JuliaFormatter] reported by reviewdog 🐶

- Run fewer Newton iterations for faster but approximate samples
- The paper shows early-stopped parallel MALA can achieve comparable MMD to fully converged samples


[JuliaFormatter] reported by reviewdog 🐶

- Smaller step sizes ε → faster DEER convergence
- Standard tuning (e.g., 80% acceptance rate) works well


[JuliaFormatter] reported by reviewdog 🐶

- Larger step sizes → more iterations to converge
- Block quasi-DEER: ~2× fewer iterations than diagonal quasi-DEER
- Parallelizing leapfrog is most beneficial when L (number of leapfrog steps) is large


[JuliaFormatter] reported by reviewdog 🐶

- Traditional: Parallelize across independent chains (batch size B)
- New: Also parallelize across chain length T


[JuliaFormatter] reported by reviewdog 🐶

- **ForwardDiff.jl** for full Jacobians (small D)
- **Zygote.jl** or **Enzyme.jl** for JVPs in stochastic quasi-DEER


[JuliaFormatter] reported by reviewdog 🐶

- Use `CUDA.jl` with `@cuda` kernels
- Or `KernelAbstractions.jl` for portable GPU code
- Ensure f is GPU-compatible (no scalar indexing, etc.)


[JuliaFormatter] reported by reviewdog 🐶

- Implement custom CUDA kernel for efficiency
- Or use `Transducers.jl` with GPU support


[JuliaFormatter] reported by reviewdog 🐶

1. **New sampler type**: `ParallelHMC` or `DEER_HMC`
2. **Modify `Leapfrog` integrator**: Add parallel mode
3. **New `sample` method**: Returns full chain in one call


[JuliaFormatter] reported by reviewdog 🐶

window_size::Union{Int, Nothing} # for sliding window


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
model,
sampler::ParallelHMC,
n_samples::Int;
initial_θ = nothing


[JuliaFormatter] reported by reviewdog 🐶

| Scenario | Recommended Method |
|----------|-------------------|
| Low-D, need fastest convergence | DEER (full Jacobian) |
| High-D, memory-constrained | Stochastic quasi-DEER |
| HMC with many leapfrog steps | Block quasi-DEER on leapfrog |
| Very long chains | Add sliding window |
| Approximate samples OK | Early stopping |


[JuliaFormatter] reported by reviewdog 🐶

- **Paper**: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- **Code**: https://github.com/lindermanlab/parallel-mcmc
- **DEER theory**: Lim et al. [7], Danieli et al. [6]
- **Quasi-DEER**: Gonzalez et al. [8]
- **Stochastic diagonal estimation**: Hutchinson [22], Bekas et al. [20]


[JuliaFormatter] reported by reviewdog 🐶

> Foundation components needed by all DEER variants


[JuliaFormatter] reported by reviewdog 🐶

- [x] **1.1 Parallel Scan Implementation**
  - [x] Associative operator for affine transforms: `(J₂, u₂) ⊕ (J₁, u₁) = (J₂*J₁, J₂*u₁ + u₂)`
  - [x] Full matrix version (for DEER) - `MatrixAffineTransform`
  - [x] Diagonal version (for Quasi-DEER) - `DiagonalAffineTransform`
  - [x] Block-diagonal version (for Block Quasi-DEER / leapfrog) - `Block2x2AffineTransform`
  - [x] CPU implementation using `Base.accumulate`
  - [ ] GPU implementation (CUDA.jl) - deferred to Phase 6
- [x] **1.2 Jacobian Computation Utilities**
  - [x] Full Jacobian computation (finite differences, AD-agnostic interface)
  - [x] Diagonal extraction from Jacobian
  - [x] Stochastic diagonal estimation (Hutchinson's method with Rademacher vectors)
  - [x] JVP (Jacobian-vector product) wrapper
  - [x] VJP (Vector-Jacobian product) wrapper
  - [x] Batch versions for computing at multiple points
  - [x] Hessian diagonal utilities (for leapfrog)
- [x] **1.3 Core Types and Interfaces**
  - [x] `AbstractParallelMethod` with subtypes: `FullDEER`, `QuasiDEER`, `StochasticQuasiDEER`, `BlockQuasiDEER`
  - [x] `ParallelMCMCState` to hold trajectory and convergence info
  - [x] `ParallelMCMCSettings` configuration struct

---


[JuliaFormatter] reported by reviewdog 🐶

> The main Newton iteration loop

- [x] **2.1 Newton Iteration Framework**
  - [x] Main loop structure with convergence checking
  - [x] Parallel evaluation of f_t at all timesteps
  - [x] Parallel computation of Jacobians (method-dependent)
  - [x] Compute inputs u_t = f_t(s_{t-1}) - J_t * s_{t-1}
  - [x] Solve linear system via parallel scan
  - [x] Convergence criterion (max absolute difference)
  - [x] DEERResult type with trajectory, convergence info, residual history
- [x] **2.2 Full DEER Implementation**
  - [x] Store and manipulate T × D × D Jacobian matrices
  - [x] Full matrix parallel scan


[JuliaFormatter] reported by reviewdog 🐶

- [x] **2.3 Quasi-DEER Implementation**
  - [x] Diagonal Jacobian storage (T × D)
  - [x] Elementwise parallel scan
- [x] **2.4 Stochastic Quasi-DEER Implementation**
  - [x] Hutchinson estimator with configurable number of samples
  - [x] Rademacher vector generation (from Phase 1.2)
  - [x] JVP-based diagonal estimation


[JuliaFormatter] reported by reviewdog 🐶

> Parallelized Metropolis-Adjusted Langevin Algorithm

- [x] **3.1 MALA Transition Function**
  - [x] Proposal: x̃ = x + ε∇log p(x) + √(2ε)ξ
  - [x] Acceptance ratio computation (forward and backward proposal densities)
  - [x] Stop-gradient trick for differentiable accept-reject
  - [x] Soft gating with sigmoid and straight-through estimator


[JuliaFormatter] reported by reviewdog 🐶

- [x] **3.2 Parallel MALA Sampler**
  - [x] MALARandomInputs type for pre-sampled (ξ, u) pairs
  - [x] sample_mala_inputs() for batch sampling
  - [x] parallel_mala() integrating with DEER framework
  - [x] sequential_mala() for reference/testing
  - [x] Convenience API with automatic input sampling
- [ ] **3.3 MALA-specific Optimizations** (deferred)
  - [ ] Preconditioning with Hessian eigendecomposition (optional)


[JuliaFormatter] reported by reviewdog 🐶

> Two approaches for parallelizing HMC


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **4.1 Approach A: Parallelize Across HMC Steps**
  - [ ] Treat full HMC step as transition function f_t
  - [ ] Sequential leapfrog within each step
  - [ ] Parallel Newton across T HMC samples
  - [ ] Good when T >> L (many samples, few leapfrog steps)
- [ ] **4.2 Approach B: Parallelize Leapfrog Integration**
  - [ ] State is s = [x, v] (position + momentum)
  - [ ] Apply DEER to L leapfrog steps within each HMC step
  - [ ] Block Quasi-DEER with 2×2 block structure per dimension
  - [ ] Good when L is large
- [ ] **4.3 Block Quasi-DEER for Leapfrog**
  - [ ] Block Jacobian structure:
    ```
    J = [ I_D         ε*I_D            ]
        [ ε*diag(H)   I_D + ε²*diag(H) ]
    ```
  - [ ] Efficient 2×2 block scan per dimension
  - [ ] Hessian diagonal computation
- [ ] **4.4 Accept-Reject for HMC**
  - [ ] Stop-gradient trick (same as MALA)
  - [ ] Momentum refresh handling

---
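Because the leapfrog Jacobian above is 2×2 per dimension (the pair (x_d, v_d) only couples to itself through the Hessian diagonal), the block scan composes small 2×2 matrices elementwise rather than D×D matrices. A sketch of building and composing these blocks (Python/NumPy for illustration; the function name is hypothetical, and the matrix mirrors the block structure stated above):

```python
import numpy as np

def leapfrog_block_jacobian(h_diag, eps):
    """Per-dimension 2x2 Jacobian blocks of one leapfrog step.

    For dimension d with Hessian diagonal entry H_d:
        J_d = [ 1          eps            ]
              [ eps*H_d    1 + eps^2*H_d  ]
    i.e. the 2x2 slice of J = [[I, ε*I], [ε*diag(H), I + ε²*diag(H)]].
    """
    D = len(h_diag)
    J = np.zeros((D, 2, 2))
    J[:, 0, 0] = 1.0
    J[:, 0, 1] = eps
    J[:, 1, 0] = eps * h_diag
    J[:, 1, 1] = 1.0 + eps**2 * h_diag
    return J

# Composing two steps is a batched 2x2 matrix product per dimension --
# exactly what the block scan accumulates, at O(D) cost per composition
# instead of O(D^3) for a full-matrix scan.
h = np.array([-1.0, -2.5, 0.3])
J1 = leapfrog_block_jacobian(h, 0.1)
J2 = leapfrog_block_jacobian(h, 0.1)
J12 = np.einsum("dij,djk->dik", J2, J1)
assert J12.shape == (3, 2, 2)
assert np.allclose(J12[0], J2[0] @ J1[0])
```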


[JuliaFormatter] reported by reviewdog 🐶

> Integrate with existing library architecture


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **5.1 New Sampler Types**
  - [ ] `ParallelHMC <: AbstractMCMCSampler`
  - [ ] `ParallelMALA <: AbstractMCMCSampler`
- [ ] **5.2 AbstractMCMC Interface**
  - [ ] Implement `AbstractMCMC.step` (or batch variant)
  - [ ] Implement `AbstractMCMC.sample` returning full chain
  - [ ] Handle RNG properly for reproducibility
- [ ] **5.3 Integration with Existing Components**
  - [ ] Use existing `Hamiltonian` type
  - [ ] Use existing `Metric` types
  - [ ] Use existing `PhasePoint` structure
  - [ ] Compatibility with existing gradient computation


[JuliaFormatter] reported by reviewdog 🐶

> Optimizations and extensions

- [ ] **6.1 Sliding Window**
  - [ ] For memory-constrained high-dimensional problems
  - [ ] Window initialization and shifting logic
  - [ ] Convergence tracking per window
- [ ] **6.2 Early Stopping**
  - [ ] Return approximate samples before full convergence
  - [ ] Quality-vs-time tradeoff parameter
- [ ] **6.3 GPU Support**
  - [ ] CUDA.jl parallel scan kernel
  - [ ] GPU-compatible transition functions
  - [ ] Batch Jacobian computation on GPU


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **6.4 Diagnostics and Monitoring**
  - [ ] Iteration count tracking
  - [ ] Convergence history
  - [ ] Per-timestep residual monitoring


[JuliaFormatter] reported by reviewdog 🐶

> Ensure correctness and performance

- [ ] **7.1 Unit Tests**
  - [ ] Parallel scan correctness (compare to sequential)
  - [ ] Jacobian computation accuracy
  - [ ] Hutchinson estimator convergence
- [ ] **7.2 Integration Tests**
  - [ ] DEER converges to sequential MALA trace
  - [ ] DEER converges to sequential HMC trace
  - [ ] Quasi-DEER matches DEER (eventually)


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling


[JuliaFormatter] reported by reviewdog 🐶

- ForwardDiff.jl - Jacobian computation
- LinearAlgebra - Matrix operations


[JuliaFormatter] reported by reviewdog 🐶

- CUDA.jl - GPU acceleration
- Zygote.jl / Enzyme.jl - Alternative AD for JVPs
- ChainRulesCore.jl - Custom rrules for stop-gradient


[JuliaFormatter] reported by reviewdog 🐶

1. **Start with Phase 1.1** - Parallel scan is the foundation
2. **Then Phase 1.2-1.3** - Jacobian utilities and types
3. **Phase 2** - Core DEER loop (test with simple linear system first)
4. **Phase 3** - MALA (simpler than HMC, good first MCMC target)
5. **Phase 4** - HMC integration
6. **Phase 5** - AdvancedHMC.jl integration
7. **Phase 6-7** - Advanced features and thorough testing


[JuliaFormatter] reported by reviewdog 🐶

| Date | Phase | Item | Status | Notes |
|------|-------|------|--------|-------|
| 2026-01-20 | - | Initial plan created | ✅ | Based on implementation note |
| 2026-01-20 | 1.1 | Parallel scan implementation | ✅ | All 3 variants (matrix, diagonal, block 2x2) |
| 2026-01-20 | 1.3 | Core types and interfaces | ✅ | Types for methods, settings, state |
| 2026-01-20 | 7.1 | Parallel scan unit tests | ✅ | 141 tests passing |
| 2026-01-20 | 1.2 | Jacobian computation utilities | ✅ | FD-based, Hutchinson estimator, JVP/VJP |
| 2026-01-20 | 7.1 | Jacobian utility tests | ✅ | 57 tests passing |
| 2026-01-20 | 1 | **Phase 1 Complete** | ✅ | All core infrastructure done |
| 2026-01-20 | 2 | DEER algorithm core | ✅ | Newton iteration, Full/Quasi/Stochastic DEER |
| 2026-01-20 | 7.1 | DEER algorithm tests | ✅ | 67 tests passing |
| 2026-01-20 | 2 | **Phase 2 Complete** | ✅ | Core DEER algorithm working |
| 2026-01-20 | 3 | MALA transition function | ✅ | Proposal, acceptance, soft gating |
| 2026-01-20 | 3 | Parallel MALA sampler | ✅ | Integrated with DEER framework |
| 2026-01-20 | 7.1 | MALA tests | ✅ | 31 tests passing |
| 2026-01-20 | 3 | **Phase 3 Complete** | ✅ | Parallel MALA working |
---


[JuliaFormatter] reported by reviewdog 🐶

1. **AD Backend Choice**: ForwardDiff for full Jacobians, but what for JVPs in stochastic quasi-DEER? Zygote vs Enzyme?
- **Decision**: Use whichever works best; start with ForwardDiff
2. **GPU Priority**: Should GPU support be Phase 1 or deferred to Phase 6?
- **Decision**: Deferred to Phase 6; focus on CPU correctness first
3. **Integration Approach**: Should we modify existing HMC/Leapfrog types or create entirely new parallel variants?
- **Decision**: Be compatible with existing abstractions and API design as much as possible
4. **Testing Strategy**: What target distributions should we use for validation? Gaussian (analytical), banana, Neal's funnel?


[JuliaFormatter] reported by reviewdog 🐶

5. **API Design**: Return full chain always, or support incremental parallel blocks?


[JuliaFormatter] reported by reviewdog 🐶

- Paper: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- Reference implementation: https://github.com/lindermanlab/parallel-mcmc (JAX)
- DEER theory: Lim et al., Danieli et al.
- Hutchinson estimator: Hutchinson (1990), Bekas et al.


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method;
jacobian_fn=jacobian_fn, jvp_fn=jvp_fn, rng=rng


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::FullDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::QuasiDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::StochasticQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;
jvp_fn=jvp_fn, rng=rng, n_samples=method.n_samples


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,



[JuliaFormatter] reported by reviewdog 🐶

J_diag[t, :] = hutchinson_diagonal(f_t, s_prev, jvp_fn; rng=rng, n_samples=n_samples)


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector,
T_len::Int,
ω,
settings::ParallelMCMCSettings;
kwargs...


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

function sequential_mcmc(
f,
s0::AbstractVector{T},
T_len::Int,
ω,
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_jacobian_diagonals(f, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function rademacher_vector(rng::AbstractRNG, n::Int, ::Type{T}=Float64) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f,
x::AbstractVector{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

f,
xs::AbstractMatrix{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

function hessian_diagonal(grad_log_p, x::AbstractVector{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_hessian_diagonals(grad_log_p, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int,
::Type{T}=Float64,


[JuliaFormatter] reported by reviewdog 🐶

sample_mala_inputs(D::Int, T_len::Int, T::Type=Float64) =


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
∇logp_x::AbstractVector{T},
ε::T,
ξ::AbstractVector{T},


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
::AbstractVector{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
ω::MALARandomInputs{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

logp,
∇logp,
ε::T,
s0::AbstractVector{T},
T_len::Int;
rng::AbstractRNG=default_rng(),


[JuliaFormatter] reported by reviewdog 🐶

function make_block_transforms(H_diag::Matrix{T}, ε::T, u_x::Matrix{T}, u_v::Matrix{T}) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(12345)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=200)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=FullDEER(), tol=1e-10, max_iters=100)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(999)


[JuliaFormatter] reported by reviewdog 🐶

logp, ∇logp, ε, s0, T_len;


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=500)


[JuliaFormatter] reported by reviewdog 🐶

t_block = Block2x2AffineTransform(rand(D), rand(D), rand(D), rand(D), rand(D), rand(D))

## Common Commands

### Testing
Tests use ReTest.jl. Test files are in `test/` and cover: metric, hamiltonian, integrator, trajectory, adaptation, sampler, abstractmcmc, mcmcchains, constructors.

### Code Formatting
```bash
julia -e 'using JuliaFormatter; format(".")'
```
Uses Blue style (see `.JuliaFormatter.toml`).

Comment on lines +52 to +59
- `src/metric.jl` - Metric types: `UnitEuclideanMetric`, `DiagEuclideanMetric`, `DenseEuclideanMetric`
- `src/hamiltonian.jl` - `Hamiltonian`, `PhasePoint`, `DualValue` (caches value+gradient)
- `src/integrator.jl` - `Leapfrog`, `JitteredLeapfrog`, `TemperedLeapfrog`
- `src/trajectory.jl` - Termination criteria (`ClassicNoUTurn`, `GeneralisedNoUTurn`, `FixedNSteps`) and trajectory samplers (`EndPointTS`, `SliceTS`, `MultinomialTS`)
- `src/adaptation/` - Step size (`NesterovDualAveraging`), mass matrix (`WelfordVar`, `WelfordCov`, `NutpieVar`), composite (`StanHMCAdaptor`, `NaiveHMCAdaptor`)
- `src/constructors.jl` - Convenience constructors: `NUTS`, `HMC`, `HMCDA`
- `src/abstractmcmc.jl` - AbstractMCMC.jl interface implementation
- `src/riemannian/` - Riemannian manifold support with `GeneralizedLeapfrog`

Comment on lines +64 to +68
- `AdvancedHMCOrdinaryDiffEqExt` - DiffEqIntegrator support
- `AdvancedHMCMCMCChainsExt` - MCMCChains.jl integration
- `AdvancedHMCCUDAExt` - CUDA array support
- `AdvancedHMCComponentArraysExt` - ComponentArrays support
- `AdvancedHMCADTypesExt` - ADTypes.jl integration

```julia
function hmc_step(x_prev, ω_momentum, ω_leapfrog, u_accept, ε, L, ∇logp)
    # Sample momentum
    v = ω_momentum # ~ N(0, I)

    # Leapfrog integration (sequential within this step)
    x, v = leapfrog(x_prev, v, ε, L, ∇logp)

    # Accept-reject (with stop-gradient trick as in MALA)
    # ...
```
Comment on lines +227 to +229
- Parallelize across T HMC samples
- Each parallel iteration evaluates L leapfrog steps sequentially
- Good when T >> L

Run HMC steps sequentially, but parallelize the inner leapfrog loop:

**Leapfrog step** (Equation 8):

@github-actions

AdvancedHMC.jl documentation for PR #487 is available at:
https://TuringLang.github.io/AdvancedHMC.jl/previews/PR487/

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 0% with 566 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.75%. Comparing base (6bc0c74) to head (c041fe9).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/parallel/deer.jl | 0.00% | 151 Missing ⚠️ |
| src/parallel/hmc.jl | 0.00% | 109 Missing ⚠️ |
| src/parallel/scan.jl | 0.00% | 89 Missing ⚠️ |
| src/parallel/abstractmcmc.jl | 0.00% | 71 Missing ⚠️ |
| src/parallel/jacobian.jl | 0.00% | 67 Missing ⚠️ |
| src/parallel/mala.jl | 0.00% | 62 Missing ⚠️ |
| src/parallel/types.jl | 0.00% | 17 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (6bc0c74) and HEAD (c041fe9). Click for more details.

HEAD has 3 fewer uploads than BASE:

| Flag | BASE (6bc0c74) | HEAD (c041fe9) |
|------|----------------|----------------|
|      | 9 | 6 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #487       +/-   ##
===========================================
- Coverage   77.71%   53.75%   -23.96%     
===========================================
  Files          21       28        +7     
  Lines        1270     1836      +566     
===========================================
  Hits          987      987               
- Misses        283      849      +566     

☔ View full report in Codecov by Sentry.

Implements parallel HMC using DEER algorithm with two approaches:

Approach A - Parallelize across HMC steps:
- parallel_hmc() runs T HMC steps in parallel
- hmc_transition_soft() with soft MH gating for differentiability

Approach B - Parallelize leapfrog integration:
- parallel_leapfrog() parallelizes L leapfrog steps
- leapfrog_transition() as DEER transition function

Block Quasi-DEER for leapfrog:
- BlockQuasiDEER type exploiting 2x2 block Jacobian structure
- _deer_iteration_block() for efficient block-diagonal Newton
- parallel_scan_block() using Block2x2AffineTransform

49 new tests (345 total parallel tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@github-actions github-actions bot left a comment


Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
model,
sampler::ParallelHMC,
n_samples::Int;
initial_θ = nothing


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

| Scenario | Recommended Method |
|----------|-------------------|
| Low-D, need fastest convergence | DEER (full Jacobian) |
| High-D, memory-constrained | Stochastic quasi-DEER |
| HMC with many leapfrog steps | Block quasi-DEER on leapfrog |
| Very long chains | Add sliding window |
| Approximate samples OK | Early stopping |


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

- **Paper**: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- **Code**: https://github.com/lindermanlab/parallel-mcmc
- **DEER theory**: Lim et al. [7], Danieli et al. [6]
- **Quasi-DEER**: Gonzalez et al. [8]
- **Stochastic diagonal estimation**: Hutchinson [22], Bekas et al. [20]


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Foundation components needed by all DEER variants


[JuliaFormatter] reported by reviewdog 🐶

- [x] **1.1 Parallel Scan Implementation**
- [x] Associative operator for affine transforms: `(J₂, u₂) ⊕ (J₁, u₁) = (J₂*J₁, J₂*u₁ + u₂)`
- [x] Full matrix version (for DEER) - `MatrixAffineTransform`
- [x] Diagonal version (for Quasi-DEER) - `DiagonalAffineTransform`
- [x] Block-diagonal version (for Block Quasi-DEER / leapfrog) - `Block2x2AffineTransform`
- [x] CPU implementation using `Base.accumulate`
- [ ] GPU implementation (CUDA.jl) - deferred to Phase 6
- [x] **1.2 Jacobian Computation Utilities**
- [x] Full Jacobian computation (finite differences, AD-agnostic interface)
- [x] Diagonal extraction from Jacobian
- [x] Stochastic diagonal estimation (Hutchinson's method with Rademacher vectors)
- [x] JVP (Jacobian-vector product) wrapper
- [x] VJP (Vector-Jacobian product) wrapper
- [x] Batch versions for computing at multiple points
- [x] Hessian diagonal utilities (for leapfrog)
- [x] **1.3 Core Types and Interfaces**
- [x] `AbstractParallelMethod` with subtypes: `FullDEER`, `QuasiDEER`, `StochasticQuasiDEER`, `BlockQuasiDEER`
- [x] `ParallelMCMCState` to hold trajectory and convergence info
- [x] `ParallelMCMCSettings` configuration struct
---
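The associative operator above can be sketched directly: compose the per-step affine maps, take an inclusive prefix scan, and read off every chain state at once. This is a minimal illustration (the `Affine`, `prefix_scan`, and `solve_affine_recursion` names are illustrative, not the PR's actual API); `accumulate` is sequential on CPU, but because the operator is associative the same scan has O(log T) parallel depth on suitable hardware.

```julia
# An affine map s ↦ J*s + u; hypothetical sketch, not the PR's API.
struct Affine{M,V}
    J::M
    u::V
end

# (J₂, u₂) ⊕ (J₁, u₁) = (J₂*J₁, J₂*u₁ + u₂): apply the t-1 map first, then t.
compose(a2::Affine, a1::Affine) = Affine(a2.J * a1.J, a2.J * a1.u + a2.u)

# Inclusive prefix scan: P_t = T_t ∘ T_{t-1} ∘ … ∘ T_1.
prefix_scan(ts::Vector{<:Affine}) = accumulate((acc, t) -> compose(t, acc), ts)

# Solve the recursion s_t = J_t*s_{t-1} + u_t for all t given s0.
solve_affine_recursion(ts, s0) = [p.J * s0 + p.u for p in prefix_scan(ts)]
```

The scan result agrees with unrolling the recursion step by step, which is the property the DEER inner solve relies on.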


[JuliaFormatter] reported by reviewdog 🐶

> The main Newton iteration loop
- [x] **2.1 Newton Iteration Framework**
- [x] Main loop structure with convergence checking
- [x] Parallel evaluation of f_t at all timesteps
- [x] Parallel computation of Jacobians (method-dependent)
- [x] Compute inputs u_t = f_t(s_{t-1}) - J_t * s_{t-1}
- [x] Solve linear system via parallel scan
- [x] Convergence criterion (max absolute difference)
- [x] DEERResult type with trajectory, convergence info, residual history
- [x] **2.2 Full DEER Implementation**
- [x] Store and manipulate T × D × D Jacobian matrices
- [x] Full matrix parallel scan
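The Newton loop above linearizes the transition at the current trajectory guess, forms `u_t = f_t(s_{t-1}) - J_t * s_{t-1}`, and solves the resulting affine recursion exactly. A minimal full-DEER sketch under those assumptions (`fd_jacobian` and `deer_iterate` are hypothetical names; the inner solve is written as a sequential loop for clarity, where the real implementation uses the parallel scan):

```julia
# Finite-difference Jacobian of f at x (illustrative helper).
function fd_jacobian(f, x; δ = 1e-6)
    D = length(x); J = zeros(D, D); fx = f(x)
    for j in 1:D
        e = zeros(D); e[j] = δ
        J[:, j] = (f(x .+ e) .- fx) ./ δ
    end
    return J
end

# One Newton iteration per pass: linearize around the current guess,
# then solve the affine recursion s_t = J_t*s_{t-1} + u_t.
function deer_iterate(f, s0, guess::Matrix; tol = 1e-10, max_iters = 50)
    T_len, D = size(guess)
    traj = copy(guess)
    for iter in 1:max_iters
        prev = vcat(s0', traj[1:end-1, :])   # s_{t-1} at the current guess
        new_traj = similar(traj)
        s = copy(s0)
        for t in 1:T_len                     # scan-able affine recursion
            sp = prev[t, :]
            J = fd_jacobian(f, sp)           # J_t = ∂f/∂s at s_{t-1}
            u = f(sp) .- J * sp              # u_t = f(s_{t-1}) - J_t*s_{t-1}
            s = J * s .+ u
            new_traj[t, :] = s
        end
        maximum(abs.(new_traj .- traj)) < tol && return new_traj, iter
        traj = new_traj
    end
    return traj, max_iters
end
```

For an affine `f` the linearization is exact, so the loop reaches the sequential trajectory after a single Newton pass, matching the convergence criterion listed above.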


[JuliaFormatter] reported by reviewdog 🐶

- [x] **2.3 Quasi-DEER Implementation**
- [x] Diagonal Jacobian storage (T × D)
- [x] Elementwise parallel scan
- [x] **2.4 Stochastic Quasi-DEER Implementation**
- [x] Hutchinson estimator with configurable number of samples
- [x] Rademacher vector generation (from Phase 1.2)
- [x] JVP-based diagonal estimation
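Hutchinson's estimator recovers `diag(J)` from JVPs alone: with Rademacher probes `v`, `E[v .* (J*v)] = diag(J)` since `E[v vᵀ] = I`. A minimal sketch (illustrative names, assuming a caller-supplied `jvp` closure rather than the PR's actual wrapper):

```julia
using Random

# ±1 entries with equal probability (Rademacher probe).
rademacher(rng, n) = 2.0 .* rand(rng, Bool, n) .- 1.0

# Estimate diag(J) from n_samples probes; each term v_i * (Jv)_i has
# expectation J_ii because the off-diagonal contributions average out.
function hutchinson_diag(jvp, n::Int; rng = Random.default_rng(), n_samples = 100)
    est = zeros(n)
    for _ in 1:n_samples
        v = rademacher(rng, n)
        est .+= v .* jvp(v)
    end
    return est ./ n_samples
end
```

For a diagonal `J` the estimator is exact for any number of samples (each `v_i^2 = 1`); for dense `J` the variance shrinks as `1/n_samples`.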


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Parallelized Metropolis-Adjusted Langevin Algorithm
- [x] **3.1 MALA Transition Function**
- [x] Proposal: x̃ = x + ε∇log p(x) + √(2ε)ξ
- [x] Acceptance ratio computation (forward and backward proposal densities)
- [x] Stop-gradient trick for differentiable accept-reject
- [x] Soft gating with sigmoid and straight-through estimator
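A single MALA transition with pre-sampled randomness (ξ for the proposal noise, u for the accept decision) can be sketched as below. The `mala_step` name is hypothetical, not the PR's `parallel_mala` API, and this shows the hard accept-reject rather than the soft-gated variant:

```julia
# One MALA transition: x̃ = x + ε∇logp(x) + √(2ε)ξ, then MH accept/reject
# using a pre-sampled uniform u (hypothetical sketch).
function mala_step(x, logp, ∇logp, ε, ξ, u)
    x̃ = x .+ ε .* ∇logp(x) .+ sqrt(2ε) .* ξ
    # log q(a | b) for the Langevin proposal, up to a shared constant
    logq(a, b) = -sum(abs2, a .- b .- ε .* ∇logp(b)) / (4ε)
    logα = logp(x̃) - logp(x) + logq(x, x̃) - logq(x̃, x)
    return log(u) < logα ? x̃ : x
end
```

Pre-sampling (ξ, u) is what lets the whole chain be treated as a deterministic fixed-point problem in the DEER framework: given the inputs, every transition is a deterministic function of the previous state.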


[JuliaFormatter] reported by reviewdog 🐶

- [x] **3.2 Parallel MALA Sampler**
- [x] MALARandomInputs type for pre-sampled (ξ, u) pairs
- [x] sample_mala_inputs() for batch sampling
- [x] parallel_mala() integrating with DEER framework
- [x] sequential_mala() for reference/testing
- [x] Convenience API with automatic input sampling
- [ ] **3.3 MALA-specific Optimizations** (deferred)
- [ ] Preconditioning with Hessian eigendecomposition (optional)


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Two approaches for parallelizing HMC


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **4.1 Approach A: Parallelize Across HMC Steps**
- [ ] Treat full HMC step as transition function f_t
- [ ] Sequential leapfrog within each step
- [ ] Parallel Newton across T HMC samples
- [ ] Good when T >> L (many samples, few leapfrog steps)
- [ ] **4.2 Approach B: Parallelize Leapfrog Integration**
- [ ] State is s = [x, v] (position + momentum)
- [ ] Apply DEER to L leapfrog steps within each HMC step
- [ ] Block Quasi-DEER with 2×2 block structure per dimension
- [ ] Good when L is large
- [ ] **4.3 Block Quasi-DEER for Leapfrog**
- [ ] Block Jacobian structure:
```
J = [ I_D          ε*I_D            ]
    [ ε*diag(H)    I_D + ε²*diag(H) ]
```
- [ ] Efficient 2×2 block scan per dimension
- [ ] Hessian diagonal computation
- [ ] **4.4 Accept-Reject for HMC**
- [ ] Stop-gradient trick (same as MALA)
- [ ] Momentum refresh handling
---
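The 2×2 block Jacobian in 4.3 admits an elementwise composition: each dimension carries an independent 2×2 affine map on its (position, momentum) pair, so composing two transforms costs O(D) instead of O(D³). A sketch with illustrative field names (not necessarily the PR's `Block2x2AffineTransform` layout):

```julia
# Per-dimension 2×2 affine maps (x, v) ↦ [a b; c e]*(x, v) + (ux, uv),
# stored elementwise across D dimensions. Hypothetical sketch.
struct Block2x2{V}
    a::V; b::V; c::V; e::V   # elementwise 2×2 Jacobian blocks
    ux::V; uv::V             # elementwise affine offsets
end

# t2 ∘ t1 as D independent 2×2 matrix products plus offset updates.
function compose(t2::Block2x2, t1::Block2x2)
    a  = t2.a .* t1.a .+ t2.b .* t1.c
    b  = t2.a .* t1.b .+ t2.b .* t1.e
    c  = t2.c .* t1.a .+ t2.e .* t1.c
    e  = t2.c .* t1.b .+ t2.e .* t1.e
    ux = t2.a .* t1.ux .+ t2.b .* t1.uv .+ t2.ux
    uv = t2.c .* t1.ux .+ t2.e .* t1.uv .+ t2.uv
    return Block2x2(a, b, c, e, ux, uv)
end
```

Since this `compose` is associative, it plugs into the same prefix scan used by the other DEER variants, which is why the leapfrog inner loop parallelizes at per-dimension cost.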


[JuliaFormatter] reported by reviewdog 🐶

> Integrate with existing library architecture


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **5.1 New Sampler Types**
- [ ] `ParallelHMC <: AbstractMCMCSampler`
- [ ] `ParallelMALA <: AbstractMCMCSampler`
- [ ] **5.2 AbstractMCMC Interface**
- [ ] Implement `AbstractMCMC.step` (or batch variant)
- [ ] Implement `AbstractMCMC.sample` returning full chain
- [ ] Handle RNG properly for reproducibility
- [ ] **5.3 Integration with Existing Components**
- [ ] Use existing `Hamiltonian` type
- [ ] Use existing `Metric` types
- [ ] Use existing `PhasePoint` structure
- [ ] Compatibility with existing gradient computation


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Optimizations and extensions
- [ ] **6.1 Sliding Window**
- [ ] For memory-constrained high-dimensional problems
- [ ] Window initialization and shifting logic
- [ ] Convergence tracking per window
- [ ] **6.2 Early Stopping**
- [ ] Return approximate samples before full convergence
- [ ] Quality-vs-time tradeoff parameter
- [ ] **6.3 GPU Support**
- [ ] CUDA.jl parallel scan kernel
- [ ] GPU-compatible transition functions
- [ ] Batch Jacobian computation on GPU


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **6.4 Diagnostics and Monitoring**
- [ ] Iteration count tracking
- [ ] Convergence history
- [ ] Per-timestep residual monitoring


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Ensure correctness and performance
- [ ] **7.1 Unit Tests**
- [ ] Parallel scan correctness (compare to sequential)
- [ ] Jacobian computation accuracy
- [ ] Hutchinson estimator convergence
- [ ] **7.2 Integration Tests**
- [ ] DEER converges to sequential MALA trace
- [ ] DEER converges to sequential HMC trace
- [ ] Quasi-DEER matches DEER (eventually)


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **7.3 Statistical Validation**
- [ ] Samples match target distribution (known distributions)
- [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
- [ ] Wall-clock time vs sequential
- [ ] Scaling with chain length T
- [ ] Memory usage profiling


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

- ForwardDiff.jl - Jacobian computation
- LinearAlgebra - Matrix operations


[JuliaFormatter] reported by reviewdog 🐶

- CUDA.jl - GPU acceleration
- Zygote.jl / Enzyme.jl - Alternative AD for JVPs
- ChainRulesCore.jl - Custom rrules for stop-gradient


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

1. **Start with Phase 1.1** - Parallel scan is the foundation
2. **Then Phase 1.2-1.3** - Jacobian utilities and types
3. **Phase 2** - Core DEER loop (test with simple linear system first)
4. **Phase 3** - MALA (simpler than HMC, good first MCMC target)
5. **Phase 4** - HMC integration
6. **Phase 5** - AdvancedHMC.jl integration
7. **Phase 6-7** - Advanced features and thorough testing


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

| Date | Phase | Item | Status | Notes |
|------|-------|------|--------|-------|
| 2026-01-20 | - | Initial plan created | ✅ | Based on implementation note |
| 2026-01-20 | 1.1 | Parallel scan implementation | ✅ | All 3 variants (matrix, diagonal, block 2x2) |
| 2026-01-20 | 1.3 | Core types and interfaces | ✅ | Types for methods, settings, state |
| 2026-01-20 | 7.1 | Parallel scan unit tests | ✅ | 141 tests passing |
| 2026-01-20 | 1.2 | Jacobian computation utilities | ✅ | FD-based, Hutchinson estimator, JVP/VJP |
| 2026-01-20 | 7.1 | Jacobian utility tests | ✅ | 57 tests passing |
| 2026-01-20 | 1 | **Phase 1 Complete** | ✅ | All core infrastructure done |
| 2026-01-20 | 2 | DEER algorithm core | ✅ | Newton iteration, Full/Quasi/Stochastic DEER |
| 2026-01-20 | 7.1 | DEER algorithm tests | ✅ | 67 tests passing |
| 2026-01-20 | 2 | **Phase 2 Complete** | ✅ | Core DEER algorithm working |
| 2026-01-20 | 3 | MALA transition function | ✅ | Proposal, acceptance, soft gating |
| 2026-01-20 | 3 | Parallel MALA sampler | ✅ | Integrated with DEER framework |
| 2026-01-20 | 7.1 | MALA tests | ✅ | 31 tests passing |
| 2026-01-20 | 3 | **Phase 3 Complete** | ✅ | Parallel MALA working |
---


[JuliaFormatter] reported by reviewdog 🐶

1. **AD Backend Choice**: ForwardDiff for full Jacobians, but what for JVPs in stochastic quasi-DEER? Zygote vs Enzyme?
- **Decision**: Use whichever works best; start with ForwardDiff
2. **GPU Priority**: Should GPU support be Phase 1 or deferred to Phase 6?
- **Decision**: Deferred to Phase 6; focus on CPU correctness first
3. **Integration Approach**: Should we modify existing HMC/Leapfrog types or create entirely new parallel variants?
- **Decision**: Be compatible with existing abstractions and API design as much as possible
4. **Testing Strategy**: What target distributions should we use for validation? Gaussian (analytical), banana, Neal's funnel?


[JuliaFormatter] reported by reviewdog 🐶

5. **API Design**: Return full chain always, or support incremental parallel blocks?


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

- Paper: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- Reference implementation: https://github.com/lindermanlab/parallel-mcmc (JAX)
- DEER theory: Lim et al., Danieli et al.
- Hutchinson estimator: Hutchinson (1990), Bekas et al.


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method;
jacobian_fn=jacobian_fn, jvp_fn=jvp_fn, rng=rng


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::FullDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::QuasiDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::StochasticQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;
jvp_fn=jvp_fn, rng=rng, n_samples=method.n_samples


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

J_diag[t, :] = hutchinson_diagonal(f_t, s_prev, jvp_fn; rng=rng, n_samples=n_samples)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::BlockQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;


[JuliaFormatter] reported by reviewdog 🐶

M⁻¹=method.M⁻¹


[JuliaFormatter] reported by reviewdog 🐶

r_prev = s_prev[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

f_r = f_vals[t, (D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

transforms = [Block2x2AffineTransform(
J_a[t, :], J_b[t, :], J_c[t, :], J_e[t, :],
u_x[t, :], u_v[t, :]
) for t in 1:T_len]


[JuliaFormatter] reported by reviewdog 🐶

r0 = s0[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

trajectory_new[:, (D+1):end] = trajectory_r


[JuliaFormatter] reported by reviewdog 🐶

prefix[t] = compose(transforms[t], prefix[t-1])


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector,
T_len::Int,
ω,
settings::ParallelMCMCSettings;
kwargs...


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

function sequential_mcmc(
f,
s0::AbstractVector{T},
T_len::Int,
ω,
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int;
M⁻¹::AbstractVector=ones(D),


[JuliaFormatter] reported by reviewdog 🐶

config::HMCConfig{T},
s0::AbstractVector{T},
T_len::Int,
ω::Vector{<:HMCRandomInputs{T}},


[JuliaFormatter] reported by reviewdog 🐶

θ_new = hmc_transition(θ_curr, ω[t], config.logp, config.∇logp, config.ε, config.L; M⁻¹=M⁻¹)


[JuliaFormatter] reported by reviewdog 🐶

θ, ω_t, config.logp, config.∇logp, config.ε, config.L;
M⁻¹=M⁻¹, τ=T(0.01) # Small τ for near-hard gating


[JuliaFormatter] reported by reviewdog 🐶

r = s[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

function hessian_diagonal_fd(
∇logp,
θ::AbstractVector{T};
δ::T=T(1e-5),
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_jacobian_diagonals(f, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function rademacher_vector(rng::AbstractRNG, n::Int, ::Type{T}=Float64) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f,
x::AbstractVector{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

f,
xs::AbstractMatrix{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

function hessian_diagonal(grad_log_p, x::AbstractVector{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_hessian_diagonals(grad_log_p, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int,
::Type{T}=Float64,


[JuliaFormatter] reported by reviewdog 🐶

sample_mala_inputs(D::Int, T_len::Int, T::Type=Float64) =


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
∇logp_x::AbstractVector{T},
ε::T,
ξ::AbstractVector{T},


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
::AbstractVector{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
ω::MALARandomInputs{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

logp,
∇logp,
ε::T,
s0::AbstractVector{T},
T_len::Int;
rng::AbstractRNG=default_rng(),


[JuliaFormatter] reported by reviewdog 🐶

function make_block_transforms(H_diag::Matrix{T}, ε::T, u_x::Matrix{T}, u_v::Matrix{T}) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(12345)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_hmc(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-8, max_iters=200)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_hmc(config, s0, T_len, ω; method=FullDEER(), tol=1e-8, max_iters=100)


[JuliaFormatter] reported by reviewdog 🐶

logp, ∇logp, ε, L, s0, T_len;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

r_new = s_new[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

r_par = s_final[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

trajectory, acceptance_rate = sequential_hmc(logp, ∇logp, ε, L, s0, T_len; rng=rng, M⁻¹=M⁻¹)


[JuliaFormatter] reported by reviewdog 🐶

samples = trajectory[(burn_in+1):end, :]


[JuliaFormatter] reported by reviewdog 🐶

r_par = s_final[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

@test s_final[(D+1):end] ≈ r_seq atol = 1e-5


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=200)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=FullDEER(), tol=1e-10, max_iters=100)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(999)


[JuliaFormatter] reported by reviewdog 🐶

logp, ∇logp, ε, s0, T_len;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=500)


[JuliaFormatter] reported by reviewdog 🐶

t_block = Block2x2AffineTransform(rand(D), rand(D), rand(D), rand(D), rand(D), rand(D))

Comment on lines +243 to +245
- Good when L is large
- Use **block quasi-DEER** for this (Section 2.4)
- Convergence in ~L/2 iterations typically

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- Good when L is large
- Use **block quasi-DEER** for this (Section 2.4)
- Convergence in ~L/2 iterations typically


Note: The update for coordinate d does **not** depend on `x_{t-1,d}`, which affects the Jacobian structure.

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
---
* * *

T: number of samples
method: :deer, :quasi_deer, :stochastic_quasi_deer, :block_quasi_deer
"""


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

"""

D = length(s0)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change


# Pre-sample all random inputs
ω = sample_random_inputs(T) # Shape: (T, ...)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

Comment on lines +420 to +421
- **ForwardDiff.jl** for full Jacobians (small D)
- **Zygote.jl** or **Enzyme.jl** for JVPs in stochastic quasi-DEER

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- **ForwardDiff.jl** for full Jacobians (small D)
- **Zygote.jl** or **Enzyme.jl** for JVPs in stochastic quasi-DEER

Comment on lines +439 to +441
- Use `CUDA.jl` with `@cuda` kernels
- Or `KernelAbstractions.jl` for portable GPU code
- Ensure f is GPU-compatible (no scalar indexing, etc.)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- Use `CUDA.jl` with `@cuda` kernels
- Or `KernelAbstractions.jl` for portable GPU code
- Ensure f is GPU-compatible (no scalar indexing, etc.)

Comment on lines +444 to +445
- Implement custom CUDA kernel for efficiency
- Or use `Transducers.jl` with GPU support

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- Implement custom CUDA kernel for efficiency
- Or use `Transducers.jl` with GPU support

Comment on lines +451 to +453
1. **New sampler type**: `ParallelHMC` or `DEER_HMC`
2. **Modify `Leapfrog` integrator**: Add parallel mode
3. **New `sample` method**: Returns full chain in one call

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
1. **New sampler type**: `ParallelHMC` or `DEER_HMC`
2. **Modify `Leapfrog` integrator**: Add parallel mode
3. **New `sample` method**: Returns full chain in one call

method::Symbol # :deer, :quasi_deer, :stochastic_quasi_deer
tol::Float64
max_iters::Int
window_size::Union{Int, Nothing} # for sliding window

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
window_size::Union{Int, Nothing} # for sliding window
window_size::Union{Int,Nothing} # for sliding window

Implements parallel sampler types for AbstractMCMC-compatible interface:
- ParallelHMCSampler / ParallelHMC
- ParallelMALASampler / ParallelMALA
- ParallelSamplerState with trajectory and convergence info
- parallel_sample() for batch sampling with LogDensityProblems
- SimpleLogDensity wrapper for easy testing

75 new tests (420 total parallel tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@github-actions github-actions bot left a comment


Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶




[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

| Date | Phase | Item | Status | Notes |
|------|-------|------|--------|-------|
| 2026-01-20 | - | Initial plan created | ✅ | Based on implementation note |
| 2026-01-20 | 1.1 | Parallel scan implementation | ✅ | All 3 variants (matrix, diagonal, block 2x2) |
| 2026-01-20 | 1.3 | Core types and interfaces | ✅ | Types for methods, settings, state |
| 2026-01-20 | 7.1 | Parallel scan unit tests | ✅ | 141 tests passing |
| 2026-01-20 | 1.2 | Jacobian computation utilities | ✅ | FD-based, Hutchinson estimator, JVP/VJP |
| 2026-01-20 | 7.1 | Jacobian utility tests | ✅ | 57 tests passing |
| 2026-01-20 | 1 | **Phase 1 Complete** | ✅ | All core infrastructure done |
| 2026-01-20 | 2 | DEER algorithm core | ✅ | Newton iteration, Full/Quasi/Stochastic DEER |
| 2026-01-20 | 7.1 | DEER algorithm tests | ✅ | 67 tests passing |
| 2026-01-20 | 2 | **Phase 2 Complete** | ✅ | Core DEER algorithm working |
| 2026-01-20 | 3 | MALA transition function | ✅ | Proposal, acceptance, soft gating |
| 2026-01-20 | 3 | Parallel MALA sampler | ✅ | Integrated with DEER framework |
| 2026-01-20 | 7.1 | MALA tests | ✅ | 31 tests passing |
| 2026-01-20 | 3 | **Phase 3 Complete** | ✅ | Parallel MALA working |
| 2026-01-29 | 4.1 | Approach A: Parallelize HMC steps | ✅ | parallel_hmc(), soft MH gating |
| 2026-01-29 | 4.2 | Approach B: Parallelize leapfrog | ✅ | parallel_leapfrog(), leapfrog_transition() |
| 2026-01-29 | 4.3 | Block Quasi-DEER for leapfrog | ✅ | 2×2 block structure, hessian_diagonal_fd |
| 2026-01-29 | 7.1 | HMC tests | ✅ | 49 tests passing |
| 2026-01-29 | 4 | **Phase 4 Complete** | ✅ | Parallel HMC working (345 total tests) |
| 2026-01-30 | 5.1 | New sampler types | ✅ | ParallelHMCSampler, ParallelMALASampler |
| 2026-01-30 | 5.2 | AbstractMCMC interface | ✅ | parallel_sample(), ParallelSamplerState |
| 2026-01-30 | 5.3 | Integration with components | ✅ | Metric types, LogDensityProblems |
| 2026-01-30 | 7.1 | AbstractMCMC tests | ✅ | 75 tests passing |
| 2026-01-30 | 5 | **Phase 5 Complete** | ✅ | AdvancedHMC.jl integration (420 total tests) |
---






[JuliaFormatter] reported by reviewdog 🐶




[JuliaFormatter] reported by reviewdog 🐶

const IS_SUBMODULE = parentmodule(@__MODULE__) !== Main &&
nameof(parentmodule(@__MODULE__)) === :AdvancedHMC


[JuliaFormatter] reported by reviewdog 🐶

import ..AdvancedHMC: AbstractMetric, DiagEuclideanMetric, UnitEuclideanMetric, DenseEuclideanMetric


[JuliaFormatter] reported by reviewdog 🐶

T<:AbstractFloat,
M<:AbstractParallelMethod,
I<:Union{Symbol,AbstractMetric},


[JuliaFormatter] reported by reviewdog 🐶

T<:AbstractFloat,
M<:AbstractParallelMethod,
I<:Union{Symbol,AbstractMetric},


[JuliaFormatter] reported by reviewdog 🐶

∇logp = function(x)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, N, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

∇logp = function(x)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, N, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

return state.trajectory[(burn_in+1):end, :]


[JuliaFormatter] reported by reviewdog 🐶

stat = (
iteration=i,
converged=state.converged,
)


[JuliaFormatter] reported by reviewdog 🐶

LogDensityProblems.capabilities(::Type{<:SimpleLogDensity}) = LogDensityProblems.LogDensityOrder{1}()


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method;
jacobian_fn=jacobian_fn, jvp_fn=jvp_fn, rng=rng


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::FullDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::QuasiDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::StochasticQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;
jvp_fn=jvp_fn, rng=rng, n_samples=method.n_samples


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

J_diag[t, :] = hutchinson_diagonal(f_t, s_prev, jvp_fn; rng=rng, n_samples=n_samples)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::BlockQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;


[JuliaFormatter] reported by reviewdog 🐶

M⁻¹=method.M⁻¹


[JuliaFormatter] reported by reviewdog 🐶

r_prev = s_prev[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

f_r = f_vals[t, (D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

transforms = [Block2x2AffineTransform(
J_a[t, :], J_b[t, :], J_c[t, :], J_e[t, :],
u_x[t, :], u_v[t, :]
) for t in 1:T_len]


[JuliaFormatter] reported by reviewdog 🐶

r0 = s0[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

trajectory_new[:, (D+1):end] = trajectory_r


[JuliaFormatter] reported by reviewdog 🐶

prefix[t] = compose(transforms[t], prefix[t-1])


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector,
T_len::Int,
ω,
settings::ParallelMCMCSettings;
kwargs...


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

function sequential_mcmc(
f,
s0::AbstractVector{T},
T_len::Int,
ω,
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int;
M⁻¹::AbstractVector=ones(D),


[JuliaFormatter] reported by reviewdog 🐶

config::HMCConfig{T},
s0::AbstractVector{T},
T_len::Int,
ω::Vector{<:HMCRandomInputs{T}},


[JuliaFormatter] reported by reviewdog 🐶

θ_new = hmc_transition(θ_curr, ω[t], config.logp, config.∇logp, config.ε, config.L; M⁻¹=M⁻¹)


[JuliaFormatter] reported by reviewdog 🐶

θ, ω_t, config.logp, config.∇logp, config.ε, config.L;
M⁻¹=M⁻¹, τ=T(0.01) # Small τ for near-hard gating


[JuliaFormatter] reported by reviewdog 🐶

```julia
r = s[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function hessian_diagonal_fd(
    ∇logp,
    θ::AbstractVector{T};
    δ::T=T(1e-5),
) where {T}
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
function batch_jacobian_diagonals(f, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function rademacher_vector(rng::AbstractRNG, n::Int, ::Type{T}=Float64) where {T}
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
f,
x::AbstractVector{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
f,
xs::AbstractMatrix{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
function hessian_diagonal(grad_log_p, x::AbstractVector{T}, diag_fn=jacobian_diagonal_full) where {T}
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function batch_hessian_diagonals(grad_log_p, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
rng::AbstractRNG,
D::Int,
T_len::Int,
::Type{T}=Float64,
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
sample_mala_inputs(D::Int, T_len::Int, T::Type=Float64) =
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
x::AbstractVector{T},
∇logp_x::AbstractVector{T},
ε::T,
ξ::AbstractVector{T},
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
x::AbstractVector{T},
::AbstractVector{T},
logp,
∇logp,
ε::T,
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
x::AbstractVector{T},
ω::MALARandomInputs{T},
logp,
∇logp,
ε::T,
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
logp,
∇logp,
ε::T,
s0::AbstractVector{T},
T_len::Int;
rng::AbstractRNG=default_rng(),
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function make_block_transforms(H_diag::Matrix{T}, ε::T, u_x::Matrix{T}, u_v::Matrix{T}) where {T}
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
@test LogDensityProblems.capabilities(typeof(ld)) == LogDensityProblems.LogDensityOrder{1}()
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
sampler2 = ParallelHMCSampler(0.05, 20;
    method=FullDEER(),
    metric=:unit,
    tol=1e-8,
    max_iters=500
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
sampler2 = ParallelMALASampler(0.05;
    method=FullDEER(),
    tol=1e-10
)
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
0.85 # acceptance_rate
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
state = ParallelSamplerState(
    trajectory, true, 3, 1e-10, 0.9
)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
@test samples_after_burn == trajectory[(burn_in+1):end, :]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
state = ParallelSamplerState(
    trajectory, true, 2, 1e-9, 0.95
)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
state_full = parallel_sample(MersenneTwister(101112), ld, sampler_full, N; initial_params=zeros(D))
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
f, s0, T_len, ω;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
f, s0, T_len, ω;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
rng=MersenneTwister(12345)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_hmc(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-8, max_iters=200)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_hmc(config, s0, T_len, ω; method=FullDEER(), tol=1e-8, max_iters=100)
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
logp, ∇logp, ε, L, s0, T_len;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
r_new = s_new[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
r_par = s_final[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
trajectory, acceptance_rate = sequential_hmc(logp, ∇logp, ε, L, s0, T_len; rng=rng, M⁻¹=M⁻¹)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
samples = trajectory[(burn_in+1):end, :]
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
r_par = s_final[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
@test s_final[(D+1):end] ≈ r_seq atol = 1e-5
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=200)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_mala(config, s0, T_len, ω; method=FullDEER(), tol=1e-10, max_iters=100)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
config, s0, T_len, ω;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
rng=MersenneTwister(999)
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
logp, ∇logp, ε, s0, T_len;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=500)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
t_block = Block2x2AffineTransform(rand(D), rand(D), rand(D), rand(D), rand(D), rand(D))
```
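As a sketch of what a `Block2x2AffineTransform` like the one above represents: each coordinate carries a 2×2 Jacobian block acting on a position/momentum pair, plus an offset, and two such transforms compose blockwise, which keeps the parallel-scan element cheap for leapfrog. The names below are illustrative, not the PR's actual types.

```julia
# Hedged sketch (illustrative names): per-coordinate 2×2 block affine maps on
# state (x, v). Coordinate i has Jacobian block [a[i] b[i]; c[i] e[i]] and
# offsets (ux[i], uv[i]).
struct Block2x2{T}
    a::Vector{T}; b::Vector{T}; c::Vector{T}; e::Vector{T}
    ux::Vector{T}; uv::Vector{T}
end

apply(g::Block2x2, x, v) = (g.a .* x .+ g.b .* v .+ g.ux,
                            g.c .* x .+ g.e .* v .+ g.uv)

# (g2 ∘ g1): per-coordinate 2×2 matrix product for the Jacobian part;
# g2's Jacobian applied to g1's offset, plus g2's offset, for the affine part.
compose(g2::Block2x2, g1::Block2x2) = Block2x2(
    g2.a .* g1.a .+ g2.b .* g1.c,
    g2.a .* g1.b .+ g2.b .* g1.e,
    g2.c .* g1.a .+ g2.e .* g1.c,
    g2.c .* g1.b .+ g2.e .* g1.e,
    g2.a .* g1.ux .+ g2.b .* g1.uv .+ g2.ux,
    g2.c .* g1.ux .+ g2.e .* g1.uv .+ g2.uv,
)

g1 = Block2x2([1.0], [0.5], [0.2], [1.0], [0.1], [0.0])
g2 = Block2x2([1.0], [0.3], [0.0], [1.0], [0.0], [0.2])
x, v = [1.0], [2.0]
```

Because this composition is again associative, chains of leapfrog-step linearizations can be combined with the same parallel scan used for diagonal transforms.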

Comment on lines +465 to +469

```julia
rng::AbstractRNG,
model,
sampler::ParallelHMC,
n_samples::Int;
initial_θ = nothing
```

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
-rng::AbstractRNG,
-model,
-sampler::ParallelHMC,
-n_samples::Int;
-initial_θ = nothing
+rng::AbstractRNG, model, sampler::ParallelHMC, n_samples::Int; initial_θ=nothing
```

end
```

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
----
+* * *
```

Comment on lines +479 to +485

[JuliaFormatter] reported by reviewdog 🐶

Suggested change (column alignment only; the table content is unchanged):

| Scenario | Recommended Method |
|:--------------------------------- |:------------------------------- |
| Low-D, need fastest convergence | DEER (full Jacobian) |
| High-D, memory-constrained | Stochastic quasi-DEER |
| HMC with many leapfrog steps | Block quasi-DEER on leapfrog |
| Very long chains | Add sliding window |
| Approximate samples OK | Early stopping |
| MALA with known Hessian structure | Quasi-DEER with preconditioning |

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
----
+* * *
```

Comment on lines +492 to +496

- **Paper**: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- **Code**: https://github.com/lindermanlab/parallel-mcmc
- **DEER theory**: Lim et al. [7], Danieli et al. [6]
- **Quasi-DEER**: Gonzalez et al. [8]
- **Stochastic diagonal estimation**: Hutchinson [22], Bekas et al. [20]

[JuliaFormatter] reported by reviewdog 🐶 (suggested change is whitespace-only; the list content is unchanged)

Comment on lines +183 to +190

- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential

- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling

> Ensure correctness and performance

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

- [ ] **7.1 Unit Tests**
  - [ ] Parallel scan correctness (compare to sequential)
  - [ ] Jacobian computation accuracy
  - [ ] Hutchinson estimator convergence
- [ ] **7.2 Integration Tests**
  - [ ] DEER converges to sequential MALA trace
  - [ ] DEER converges to sequential HMC trace
  - [ ] Quasi-DEER matches DEER (eventually)
- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling

* * *

│ └── test_abstractmcmc.jl # ✅ 75 tests passing
```

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
----
+* * *
```

Comment on lines +222 to +223

- ForwardDiff.jl - Jacobian computation
- LinearAlgebra - Matrix operations

[JuliaFormatter] reported by reviewdog 🐶 (suggested change is whitespace-only; the list content is unchanged)

Comment on lines +226 to +228

- CUDA.jl - GPU acceleration
- Zygote.jl / Enzyme.jl - Alternative AD for JVPs
- ChainRulesCore.jl - Custom rrules for stop-gradient

[JuliaFormatter] reported by reviewdog 🐶 (suggested change is whitespace-only; the list content is unchanged)
