[WIP] Parallel MCMC via DEER algorithm #487

Draft · xukai92 wants to merge 7 commits into main from kx/pscan
Conversation

@xukai92 (Member) commented Jan 21, 2026
Parallel MCMC Implementation (DEER Algorithm)

Implements parallel MCMC sampling based on:

Parallelizing MCMC Across the Sequence Length
https://arxiv.org/abs/2508.18413

Status: Phases 1-5 Complete (420 tests passing)

Completed Phases

Phase 1: Core Infrastructure ✅

  • Parallel scan with O(log T) complexity
  • Affine transforms (Matrix, Diagonal, Block 2×2)
  • Jacobian computation utilities (FD, Hutchinson estimator)
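The parallel scan solves the affine recurrence s_t = J_t s_{t-1} + u_t by composing the affine maps with an associative operator. The sketch below (Python/NumPy for illustration only; the PR itself is Julia, and the helper names `compose` / `affine_scan` are hypothetical) shows the operator and a sequential reference scan checked against the direct recurrence:

```python
import numpy as np

def compose(a, b):
    # Associative operator on affine maps: apply `a` first, then `b`.
    # (J_b, u_b) ∘ (J_a, u_a) = (J_b @ J_a, J_b @ u_a + u_b)
    Ja, ua = a
    Jb, ub = b
    return (Jb @ Ja, Jb @ ua + ub)

def affine_scan(Js, us, s0):
    """Solve s_t = J_t s_{t-1} + u_t for t = 1..T via prefix composition.

    Written sequentially here; a parallel implementation combines pairs
    tree-style (Blelloch scan), giving O(log T) parallel depth.
    """
    acc = (np.eye(len(s0)), np.zeros_like(s0))
    out = []
    for J, u in zip(Js, us):
        acc = compose(acc, (J, u))   # prefix of affine maps up to step t
        Jp, up = acc
        out.append(Jp @ s0 + up)     # apply the accumulated map to s0
    return np.array(out)

# Sanity check against the direct sequential recurrence
rng = np.random.default_rng(0)
T, D = 8, 3
Js = rng.normal(size=(T, D, D)) * 0.3
us = rng.normal(size=(T, D))
s0 = rng.normal(size=D)

s_seq, s = [], s0
for t in range(T):
    s = Js[t] @ s + us[t]
    s_seq.append(s)

assert np.allclose(affine_scan(Js, us, s0), np.array(s_seq))
```

Because `compose` is associative, the T prefix compositions can be evaluated as a balanced tree, which is where the O(log T) depth comes from.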

Phase 2: DEER Algorithm Core ✅

  • Newton iteration framework
  • Full DEER (complete Jacobians)
  • Quasi-DEER (diagonal approximation)
  • Stochastic Quasi-DEER (Hutchinson estimator)

Phase 3: MALA Integration ✅

  • Parallel MALA sampler
  • Soft gating for differentiability

Phase 4: HMC Integration ✅

  • Approach A: Parallelize across HMC steps
  • Approach B: Parallelize leapfrog integration
  • Block Quasi-DEER for 2×2 leapfrog Jacobians

Phase 5: AdvancedHMC.jl Integration ✅

  • ParallelHMCSampler / ParallelHMC
  • ParallelMALASampler / ParallelMALA
  • parallel_sample() with LogDensityProblems interface
  • ParallelSamplerState with trajectory and convergence info

Remaining Phases

  • Phase 6: Advanced features (sliding window, GPU support)
  • Phase 7: Testing and validation

Usage Example

```julia
using AdvancedHMC.Parallel

# Define target distribution
logp = SimpleLogDensity(2, x -> -0.5 * sum(x.^2), x -> -x)

# Create parallel sampler
sampler = ParallelHMC(0.1, 10; method=QuasiDEER())

# Sample
state = parallel_sample(logp, sampler, 1000)
samples = get_samples(state, 100)  # discard burn-in
```

Test Summary

| Component | Tests |
|-----------|-------|
| Parallel Scan | 141 |
| Jacobian Utilities | 57 |
| DEER Algorithm | 67 |
| MALA | 31 |
| HMC | 49 |
| AbstractMCMC | 75 |
| **Total** | **420** |

xukai92 and others added 5 commits January 20, 2026 17:01
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the foundation for parallelizing MCMC across sequence length
based on Zoltowski et al. "Parallelizing MCMC Across the Sequence Length".

Core components:
- Parallel scan solving linear recurrences s_t = J_t * s_{t-1} + u_t
- Three affine transform types for different DEER variants:
  - MatrixAffineTransform (full Jacobian for DEER)
  - DiagonalAffineTransform (diagonal approx for Quasi-DEER)
  - Block2x2AffineTransform (2x2 blocks for leapfrog/Block Quasi-DEER)
- Associative composition operator enabling O(log T) parallel time
- Method types: FullDEER, QuasiDEER, StochasticQuasiDEER, BlockQuasiDEER

Files added:
- src/parallel/Parallel.jl - Module definition
- src/parallel/types.jl - Type definitions
- src/parallel/scan.jl - Parallel scan implementation
- test/parallel/test_scan.jl - 141 tests
- docs/parallel_mcmc_implementation_plan.md - Implementation tracker
- docs/parallel_mcmc_implementation_note.md - Algorithm reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement utilities for computing Jacobians and their approximations,
completing Phase 1 (Core Infrastructure) of the parallel MCMC implementation.

New utilities in src/parallel/jacobian.jl:
- jacobian_fd: Full Jacobian via finite differences
- jacobian_diagonal_full: Extract Jacobian diagonal
- jvp_fd / vjp_fd: Jacobian-vector and vector-Jacobian products
- rademacher_vector: Generate ±1 random vectors
- hutchinson_diagonal: Stochastic diagonal estimator using Hutchinson's method
- hessian_diagonal: Hessian diagonal for leapfrog integration
- Batch versions for all operations (batch_jacobians, batch_hutchinson_diagonals, etc.)

The implementation is AD-agnostic with finite difference fallbacks.
Users can plug in ForwardDiff/Zygote/Enzyme-based implementations.
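Hutchinson's method estimates diag(J) from Jacobian-vector products alone: for probe vectors v with i.i.d. Rademacher (±1) entries, E[v ⊙ Jv] = diag(J), since the off-diagonal terms J_ij v_i v_j have zero mean. A small sketch of that identity (Python/NumPy for illustration; the function name mirrors but is not the PR's Julia API):

```python
import numpy as np

def hutchinson_diagonal(jvp, dim, n_samples=64, rng=None):
    """Estimate diag(J) using only Jacobian-vector products.

    E[v ⊙ (J v)] = diag(J) when v has i.i.d. Rademacher entries,
    because off-diagonal contributions J_ij v_i v_j (i != j) average out.
    """
    rng = rng or np.random.default_rng()
    acc = np.zeros(dim)
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        acc += v * jvp(v)                      # elementwise v ⊙ (J v)
    return acc / n_samples

# Check against a known Jacobian
J = np.array([[2.0, 0.1], [0.3, -1.0]])
est = hutchinson_diagonal(lambda v: J @ v, 2, n_samples=10000,
                          rng=np.random.default_rng(1))
assert np.allclose(est, np.diag(J), atol=0.05)
```

The estimator's variance per entry is the sum of squared off-diagonal entries in that row, so more probes are needed when the Jacobian is far from diagonal.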

Tests: 57 new tests for Jacobian utilities (all passing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the main DEER (Doubly Efficient Estimation via Recursion)
algorithm that solves MCMC chains in parallel via Newton's method.

Core algorithm in src/parallel/deer.jl:
- deer(): Main entry point with method dispatch
- DEERResult: Result type with trajectory, convergence info, residual history
- sequential_mcmc(): Reference implementation for testing

Three DEER variants implemented:
- FullDEER: Full D×D Jacobian matrices, O(T*D²) memory
- QuasiDEER: Diagonal Jacobian approximation, O(T*D) memory
- StochasticQuasiDEER: Hutchinson estimator for diagonal, only needs JVPs

The algorithm:
1. Initialize trajectory (forward pass or zeros)
2. Newton iteration loop:
   - Evaluate f_t at all timesteps (parallelizable)
   - Compute Jacobians (method-dependent)
   - Compute offsets u_t = f_t(s_{t-1}) - J_t * s_{t-1}
   - Solve linear system via parallel scan
   - Check convergence (max absolute difference)
3. Return DEERResult with trajectory and diagnostics
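The loop above can be sketched as follows (Python/NumPy for illustration; all names are hypothetical rather than the PR's API, and the linear solve is written sequentially where the real implementation substitutes the parallel scan):

```python
import numpy as np

def deer(f, jac, s0, T, max_iters=50, tol=1e-10):
    """Solve s_t = f(s_{t-1}, t), t = 1..T, by Newton iteration.

    Each iteration linearizes f around the current trajectory guess and
    solves the resulting affine recurrence s_t = J_t s_{t-1} + u_t
    exactly; DEER's point is that this solve is a parallel scan.
    """
    s = np.tile(s0, (T, 1))                      # initial trajectory guess
    for _ in range(max_iters):
        prev = np.vstack([s0, s[:-1]])           # s_{t-1} for each t
        fs = np.array([f(prev[t], t) for t in range(T)])
        Js = np.array([jac(prev[t], t) for t in range(T)])
        us = fs - np.einsum("tij,tj->ti", Js, prev)  # u_t = f_t - J_t s_{t-1}
        # Solve s_t = J_t s_{t-1} + u_t (sequential stand-in for the scan)
        new = np.empty_like(s)
        cur = s0
        for t in range(T):
            cur = Js[t] @ cur + us[t]
            new[t] = cur
        if np.max(np.abs(new - s)) < tol:        # convergence criterion
            return new
        s = new
    return s

# The fixed point matches a plain sequential rollout
A = np.array([[0.5, 0.1], [0.0, 0.4]])
f = lambda s, t: np.tanh(A @ s) + 0.1
jac = lambda s, t: (1 - np.tanh(A @ s) ** 2)[:, None] * A
s0 = np.zeros(2)
s_par = deer(f, jac, s0, 20)
s_seq, cur = [], s0
for t in range(20):
    cur = f(cur, t)
    s_seq.append(cur)
assert np.allclose(s_par, np.array(s_seq), atol=1e-8)
```

Note the finite-convergence property: after k Newton iterations the first k states are exact, so the loop never needs more than T iterations, and typically far fewer.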

Tests: 67 new tests covering linear/nonlinear systems, all variants,
convergence tracking, edge cases, and numerical stability.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement parallelized Metropolis-Adjusted Langevin Algorithm using DEER.

MALA components in src/parallel/mala.jl:
- MALARandomInputs: Pre-sampled (ξ, u) pairs for proposals and accept-reject
- MALAConfig: Configuration struct with step size, log density, gradient
- mala_proposal(): Langevin proposal x̃ = x + ε∇log p(x) + √(2ε)ξ
- mala_log_acceptance_ratio(): MH ratio with forward/backward densities
- soft_gate(): Differentiable accept-reject using sigmoid + straight-through
- mala_transition(): Complete MALA step combining proposal and accept-reject
- parallel_mala(): Run full MALA chain in parallel via DEER
- sequential_mala(): Reference implementation for testing

The stop-gradient trick enables computing Jacobians through the
non-differentiable accept-reject step by using a soft sigmoid gate
in the backward pass while keeping hard decisions in the forward pass.
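A minimal illustration of the gate described above (Python/NumPy; the function name, signature, and temperature parameter are hypothetical, not the PR's API): the forward value is the exact hard Metropolis decision, while the gradient surrogate is taken from a tempered sigmoid of the log acceptance ratio.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_gate(log_alpha, u, x_prop, x_prev, temp=0.1):
    """Accept-reject as a gate g ∈ {0, 1} with a smooth surrogate.

    Forward value: hard decision g = 1[log u < log_alpha], so samples
    are unchanged.  Backward surrogate: sigmoid((log_alpha - log u)/temp)
    replaces the zero-almost-everywhere derivative of the indicator
    (straight-through estimator).
    """
    hard = float(np.log(u) < log_alpha)
    soft = sigmoid((log_alpha - np.log(u)) / temp)
    # Surrogate sensitivity of the gated state w.r.t. log_alpha:
    grad_log_alpha = soft * (1 - soft) / temp * (x_prop - x_prev)
    out = hard * x_prop + (1 - hard) * x_prev
    return out, grad_log_alpha

# Forward pass keeps the exact MH decision
x_prev, x_prop = np.array([0.0]), np.array([1.0])
out_acc, _ = soft_gate(log_alpha=0.0, u=0.5, x_prop=x_prop, x_prev=x_prev)
out_rej, _ = soft_gate(log_alpha=-2.0, u=0.5, x_prop=x_prop, x_prev=x_prev)
assert np.allclose(out_acc, x_prop) and np.allclose(out_rej, x_prev)
```

In an AD framework the same effect is achieved with `hard + (soft - stop_gradient(soft))`, which keeps the hard value in the forward pass and the sigmoid's derivative in the backward pass.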

Tests: 31 new tests covering proposal mechanics, sequential/parallel
equivalence, various target distributions, acceptance behavior, and
sample statistics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions bot left a comment

Remaining comments that could not be posted as review comments (GitHub rate limit):

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶

- Good when L is large
- Use **block quasi-DEER** for this (Section 2.4)
- Convergence in ~L/2 iterations typically


[JuliaFormatter] reported by reviewdog 🐶

s_prev = (t == 1) ? s0 : s[t-1, :]


[JuliaFormatter] reported by reviewdog 🐶

1. Initialize window at the start of the sequence
2. Apply DEER update only within the window
3. Shift window forward to the first non-converged timestep
4. Repeat until the entire sequence converges


[JuliaFormatter] reported by reviewdog 🐶

- Run fewer Newton iterations for faster but approximate samples
- The paper shows early-stopped parallel MALA can achieve comparable MMD to fully converged samples


[JuliaFormatter] reported by reviewdog 🐶

- Smaller step sizes ε → faster DEER convergence
- Standard tuning (e.g., 80% acceptance rate) works well


[JuliaFormatter] reported by reviewdog 🐶

- Larger step sizes → more iterations to converge
- Block quasi-DEER: ~2× fewer iterations than diagonal quasi-DEER
- Parallelizing leapfrog is most beneficial when L (number of leapfrog steps) is large


[JuliaFormatter] reported by reviewdog 🐶

- Traditional: Parallelize across independent chains (batch size B)
- New: Also parallelize across chain length T


[JuliaFormatter] reported by reviewdog 🐶

- **ForwardDiff.jl** for full Jacobians (small D)
- **Zygote.jl** or **Enzyme.jl** for JVPs in stochastic quasi-DEER


[JuliaFormatter] reported by reviewdog 🐶

- Use `CUDA.jl` with `@cuda` kernels
- Or `KernelAbstractions.jl` for portable GPU code
- Ensure f is GPU-compatible (no scalar indexing, etc.)


[JuliaFormatter] reported by reviewdog 🐶

- Implement custom CUDA kernel for efficiency
- Or use `Transducers.jl` with GPU support


[JuliaFormatter] reported by reviewdog 🐶

1. **New sampler type**: `ParallelHMC` or `DEER_HMC`
2. **Modify `Leapfrog` integrator**: Add parallel mode
3. **New `sample` method**: Returns full chain in one call


[JuliaFormatter] reported by reviewdog 🐶

window_size::Union{Int, Nothing} # for sliding window


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
model,
sampler::ParallelHMC,
n_samples::Int;
initial_θ = nothing


[JuliaFormatter] reported by reviewdog 🐶

| Scenario | Recommended Method |
|----------|-------------------|
| Low-D, need fastest convergence | DEER (full Jacobian) |
| High-D, memory-constrained | Stochastic quasi-DEER |
| HMC with many leapfrog steps | Block quasi-DEER on leapfrog |
| Very long chains | Add sliding window |
| Approximate samples OK | Early stopping |


[JuliaFormatter] reported by reviewdog 🐶

- **Paper**: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- **Code**: https://github.com/lindermanlab/parallel-mcmc
- **DEER theory**: Lim et al. [7], Danieli et al. [6]
- **Quasi-DEER**: Gonzalez et al. [8]
- **Stochastic diagonal estimation**: Hutchinson [22], Bekas et al. [20]


[JuliaFormatter] reported by reviewdog 🐶

> Foundation components needed by all DEER variants


[JuliaFormatter] reported by reviewdog 🐶

- [x] **1.1 Parallel Scan Implementation**
  - [x] Associative operator for affine transforms: `(J₂, u₂) ⊕ (J₁, u₁) = (J₂*J₁, J₂*u₁ + u₂)`
  - [x] Full matrix version (for DEER) - `MatrixAffineTransform`
  - [x] Diagonal version (for Quasi-DEER) - `DiagonalAffineTransform`
  - [x] Block-diagonal version (for Block Quasi-DEER / leapfrog) - `Block2x2AffineTransform`
  - [x] CPU implementation using `Base.accumulate`
  - [ ] GPU implementation (CUDA.jl) - deferred to Phase 6
- [x] **1.2 Jacobian Computation Utilities**
  - [x] Full Jacobian computation (finite differences, AD-agnostic interface)
  - [x] Diagonal extraction from Jacobian
  - [x] Stochastic diagonal estimation (Hutchinson's method with Rademacher vectors)
  - [x] JVP (Jacobian-vector product) wrapper
  - [x] VJP (Vector-Jacobian product) wrapper
  - [x] Batch versions for computing at multiple points
  - [x] Hessian diagonal utilities (for leapfrog)
- [x] **1.3 Core Types and Interfaces**
  - [x] `AbstractParallelMethod` with subtypes: `FullDEER`, `QuasiDEER`, `StochasticQuasiDEER`, `BlockQuasiDEER`
  - [x] `ParallelMCMCState` to hold trajectory and convergence info
  - [x] `ParallelMCMCSettings` configuration struct

---


[JuliaFormatter] reported by reviewdog 🐶

> The main Newton iteration loop

- [x] **2.1 Newton Iteration Framework**
  - [x] Main loop structure with convergence checking
  - [x] Parallel evaluation of f_t at all timesteps
  - [x] Parallel computation of Jacobians (method-dependent)
  - [x] Compute inputs u_t = f_t(s_{t-1}) - J_t * s_{t-1}
  - [x] Solve linear system via parallel scan
  - [x] Convergence criterion (max absolute difference)
  - [x] DEERResult type with trajectory, convergence info, residual history
- [x] **2.2 Full DEER Implementation**
  - [x] Store and manipulate T × D × D Jacobian matrices
  - [x] Full matrix parallel scan


[JuliaFormatter] reported by reviewdog 🐶

- [x] **2.3 Quasi-DEER Implementation**
  - [x] Diagonal Jacobian storage (T × D)
  - [x] Elementwise parallel scan
- [x] **2.4 Stochastic Quasi-DEER Implementation**
  - [x] Hutchinson estimator with configurable number of samples
  - [x] Rademacher vector generation (from Phase 1.2)
  - [x] JVP-based diagonal estimation


[JuliaFormatter] reported by reviewdog 🐶

> Parallelized Metropolis-Adjusted Langevin Algorithm

- [x] **3.1 MALA Transition Function**
  - [x] Proposal: x̃ = x + ε∇log p(x) + √(2ε)ξ
  - [x] Acceptance ratio computation (forward and backward proposal densities)
  - [x] Stop-gradient trick for differentiable accept-reject
  - [x] Soft gating with sigmoid and straight-through estimator


[JuliaFormatter] reported by reviewdog 🐶

- [x] **3.2 Parallel MALA Sampler**
  - [x] MALARandomInputs type for pre-sampled (ξ, u) pairs
  - [x] sample_mala_inputs() for batch sampling
  - [x] parallel_mala() integrating with DEER framework
  - [x] sequential_mala() for reference/testing
  - [x] Convenience API with automatic input sampling
- [ ] **3.3 MALA-specific Optimizations** (deferred)
  - [ ] Preconditioning with Hessian eigendecomposition (optional)


[JuliaFormatter] reported by reviewdog 🐶

> Two approaches for parallelizing HMC


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **4.1 Approach A: Parallelize Across HMC Steps**
  - [ ] Treat full HMC step as transition function f_t
  - [ ] Sequential leapfrog within each step
  - [ ] Parallel Newton across T HMC samples
  - [ ] Good when T >> L (many samples, few leapfrog steps)
- [ ] **4.2 Approach B: Parallelize Leapfrog Integration**
  - [ ] State is s = [x, v] (position + momentum)
  - [ ] Apply DEER to L leapfrog steps within each HMC step
  - [ ] Block Quasi-DEER with 2×2 block structure per dimension
  - [ ] Good when L is large
- [ ] **4.3 Block Quasi-DEER for Leapfrog**
  - [ ] Block Jacobian structure:
    ```
    J = [ I_D         ε*I_D            ]
        [ ε*diag(H)   I_D + ε²*diag(H) ]
    ```
  - [ ] Efficient 2×2 block scan per dimension
  - [ ] Hessian diagonal computation
- [ ] **4.4 Accept-Reject for HMC**
  - [ ] Stop-gradient trick (same as MALA)
  - [ ] Momentum refresh handling

---
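Because the leapfrog Jacobian above is 2×2 per dimension (the pair (x_d, v_d) only couples to itself through the Hessian diagonal), the block scan composes small 2×2 matrices elementwise rather than D×D matrices. A sketch of building and composing these blocks (Python/NumPy for illustration; the function name is hypothetical, and the matrix mirrors the block structure stated above):

```python
import numpy as np

def leapfrog_block_jacobian(h_diag, eps):
    """Per-dimension 2x2 Jacobian blocks of one leapfrog step.

    For dimension d with Hessian diagonal entry H_d:
        J_d = [ 1          eps            ]
              [ eps*H_d    1 + eps^2*H_d  ]
    i.e. the 2x2 slice of J = [[I, ε*I], [ε*diag(H), I + ε²*diag(H)]].
    """
    D = len(h_diag)
    J = np.zeros((D, 2, 2))
    J[:, 0, 0] = 1.0
    J[:, 0, 1] = eps
    J[:, 1, 0] = eps * h_diag
    J[:, 1, 1] = 1.0 + eps**2 * h_diag
    return J

# Composing two steps is a batched 2x2 matrix product per dimension --
# exactly what the block scan accumulates, at O(D) cost per composition
# instead of O(D^3) for a full-matrix scan.
h = np.array([-1.0, -2.5, 0.3])
J1 = leapfrog_block_jacobian(h, 0.1)
J2 = leapfrog_block_jacobian(h, 0.1)
J12 = np.einsum("dij,djk->dik", J2, J1)
assert J12.shape == (3, 2, 2)
assert np.allclose(J12[0], J2[0] @ J1[0])
```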


[JuliaFormatter] reported by reviewdog 🐶

> Integrate with existing library architecture


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **5.1 New Sampler Types**
  - [ ] `ParallelHMC <: AbstractMCMCSampler`
  - [ ] `ParallelMALA <: AbstractMCMCSampler`
- [ ] **5.2 AbstractMCMC Interface**
  - [ ] Implement `AbstractMCMC.step` (or batch variant)
  - [ ] Implement `AbstractMCMC.sample` returning full chain
  - [ ] Handle RNG properly for reproducibility
- [ ] **5.3 Integration with Existing Components**
  - [ ] Use existing `Hamiltonian` type
  - [ ] Use existing `Metric` types
  - [ ] Use existing `PhasePoint` structure
  - [ ] Compatibility with existing gradient computation


[JuliaFormatter] reported by reviewdog 🐶

> Optimizations and extensions

- [ ] **6.1 Sliding Window**
  - [ ] For memory-constrained high-dimensional problems
  - [ ] Window initialization and shifting logic
  - [ ] Convergence tracking per window
- [ ] **6.2 Early Stopping**
  - [ ] Return approximate samples before full convergence
  - [ ] Quality-vs-time tradeoff parameter
- [ ] **6.3 GPU Support**
  - [ ] CUDA.jl parallel scan kernel
  - [ ] GPU-compatible transition functions
  - [ ] Batch Jacobian computation on GPU


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **6.4 Diagnostics and Monitoring**
  - [ ] Iteration count tracking
  - [ ] Convergence history
  - [ ] Per-timestep residual monitoring


[JuliaFormatter] reported by reviewdog 🐶

> Ensure correctness and performance

- [ ] **7.1 Unit Tests**
  - [ ] Parallel scan correctness (compare to sequential)
  - [ ] Jacobian computation accuracy
  - [ ] Hutchinson estimator convergence
- [ ] **7.2 Integration Tests**
  - [ ] DEER converges to sequential MALA trace
  - [ ] DEER converges to sequential HMC trace
  - [ ] Quasi-DEER matches DEER (eventually)


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling


[JuliaFormatter] reported by reviewdog 🐶

- ForwardDiff.jl - Jacobian computation
- LinearAlgebra - Matrix operations


[JuliaFormatter] reported by reviewdog 🐶

- CUDA.jl - GPU acceleration
- Zygote.jl / Enzyme.jl - Alternative AD for JVPs
- ChainRulesCore.jl - Custom rrules for stop-gradient


[JuliaFormatter] reported by reviewdog 🐶

1. **Start with Phase 1.1** - Parallel scan is the foundation
2. **Then Phase 1.2-1.3** - Jacobian utilities and types
3. **Phase 2** - Core DEER loop (test with simple linear system first)
4. **Phase 3** - MALA (simpler than HMC, good first MCMC target)
5. **Phase 4** - HMC integration
6. **Phase 5** - AdvancedHMC.jl integration
7. **Phase 6-7** - Advanced features and thorough testing


[JuliaFormatter] reported by reviewdog 🐶

| Date | Phase | Item | Status | Notes |
|------|-------|------|--------|-------|
| 2026-01-20 | - | Initial plan created | ✅ | Based on implementation note |
| 2026-01-20 | 1.1 | Parallel scan implementation | ✅ | All 3 variants (matrix, diagonal, block 2x2) |
| 2026-01-20 | 1.3 | Core types and interfaces | ✅ | Types for methods, settings, state |
| 2026-01-20 | 7.1 | Parallel scan unit tests | ✅ | 141 tests passing |
| 2026-01-20 | 1.2 | Jacobian computation utilities | ✅ | FD-based, Hutchinson estimator, JVP/VJP |
| 2026-01-20 | 7.1 | Jacobian utility tests | ✅ | 57 tests passing |
| 2026-01-20 | 1 | **Phase 1 Complete** | ✅ | All core infrastructure done |
| 2026-01-20 | 2 | DEER algorithm core | ✅ | Newton iteration, Full/Quasi/Stochastic DEER |
| 2026-01-20 | 7.1 | DEER algorithm tests | ✅ | 67 tests passing |
| 2026-01-20 | 2 | **Phase 2 Complete** | ✅ | Core DEER algorithm working |
| 2026-01-20 | 3 | MALA transition function | ✅ | Proposal, acceptance, soft gating |
| 2026-01-20 | 3 | Parallel MALA sampler | ✅ | Integrated with DEER framework |
| 2026-01-20 | 7.1 | MALA tests | ✅ | 31 tests passing |
| 2026-01-20 | 3 | **Phase 3 Complete** | ✅ | Parallel MALA working |
---


[JuliaFormatter] reported by reviewdog 🐶

1. **AD Backend Choice**: ForwardDiff for full Jacobians, but what for JVPs in stochastic quasi-DEER? Zygote vs Enzyme?
- **Decision**: Use whichever works best; start with ForwardDiff
2. **GPU Priority**: Should GPU support be Phase 1 or deferred to Phase 6?
- **Decision**: Deferred to Phase 6; focus on CPU correctness first
3. **Integration Approach**: Should we modify existing HMC/Leapfrog types or create entirely new parallel variants?
- **Decision**: Be compatible with existing abstractions and API design as much as possible
4. **Testing Strategy**: What target distributions should we use for validation? Gaussian (analytical), banana, Neal's funnel?


[JuliaFormatter] reported by reviewdog 🐶

5. **API Design**: Return full chain always, or support incremental parallel blocks?


[JuliaFormatter] reported by reviewdog 🐶

- Paper: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- Reference implementation: https://github.com/lindermanlab/parallel-mcmc (JAX)
- DEER theory: Lim et al., Danieli et al.
- Hutchinson estimator: Hutchinson (1990), Bekas et al.


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method;
jacobian_fn=jacobian_fn, jvp_fn=jvp_fn, rng=rng


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::FullDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::QuasiDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::StochasticQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;
jvp_fn=jvp_fn, rng=rng, n_samples=method.n_samples


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,



[JuliaFormatter] reported by reviewdog 🐶

J_diag[t, :] = hutchinson_diagonal(f_t, s_prev, jvp_fn; rng=rng, n_samples=n_samples)


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector,
T_len::Int,
ω,
settings::ParallelMCMCSettings;
kwargs...


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

function sequential_mcmc(
f,
s0::AbstractVector{T},
T_len::Int,
ω,
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_jacobian_diagonals(f, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function rademacher_vector(rng::AbstractRNG, n::Int, ::Type{T}=Float64) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f,
x::AbstractVector{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

f,
xs::AbstractMatrix{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

function hessian_diagonal(grad_log_p, x::AbstractVector{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_hessian_diagonals(grad_log_p, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int,
::Type{T}=Float64,


[JuliaFormatter] reported by reviewdog 🐶

sample_mala_inputs(D::Int, T_len::Int, T::Type=Float64) =


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
∇logp_x::AbstractVector{T},
ε::T,
ξ::AbstractVector{T},


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
::AbstractVector{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
ω::MALARandomInputs{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

logp,
∇logp,
ε::T,
s0::AbstractVector{T},
T_len::Int;
rng::AbstractRNG=default_rng(),


[JuliaFormatter] reported by reviewdog 🐶

function make_block_transforms(H_diag::Matrix{T}, ε::T, u_x::Matrix{T}, u_v::Matrix{T}) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(12345)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=200)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=FullDEER(), tol=1e-10, max_iters=100)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(999)


[JuliaFormatter] reported by reviewdog 🐶

logp, ∇logp, ε, s0, T_len;


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=500)


[JuliaFormatter] reported by reviewdog 🐶

t_block = Block2x2AffineTransform(rand(D), rand(D), rand(D), rand(D), rand(D), rand(D))

## Common Commands

### Testing
Tests use ReTest.jl. Test files are in `test/` and cover: metric, hamiltonian, integrator, trajectory, adaptation, sampler, abstractmcmc, mcmcchains, constructors.

### Code Formatting
```bash
julia -e 'using JuliaFormatter; format(".")'
```
Uses Blue style (see `.JuliaFormatter.toml`).

Comment on lines +52 to +59
- `src/metric.jl` - Metric types: `UnitEuclideanMetric`, `DiagEuclideanMetric`, `DenseEuclideanMetric`
- `src/hamiltonian.jl` - `Hamiltonian`, `PhasePoint`, `DualValue` (caches value+gradient)
- `src/integrator.jl` - `Leapfrog`, `JitteredLeapfrog`, `TemperedLeapfrog`
- `src/trajectory.jl` - Termination criteria (`ClassicNoUTurn`, `GeneralisedNoUTurn`, `FixedNSteps`) and trajectory samplers (`EndPointTS`, `SliceTS`, `MultinomialTS`)
- `src/adaptation/` - Step size (`NesterovDualAveraging`), mass matrix (`WelfordVar`, `WelfordCov`, `NutpieVar`), composite (`StanHMCAdaptor`, `NaiveHMCAdaptor`)
- `src/constructors.jl` - Convenience constructors: `NUTS`, `HMC`, `HMCDA`
- `src/abstractmcmc.jl` - AbstractMCMC.jl interface implementation
- `src/riemannian/` - Riemannian manifold support with `GeneralizedLeapfrog`

Comment on lines +64 to +68
- `AdvancedHMCOrdinaryDiffEqExt` - DiffEqIntegrator support
- `AdvancedHMCMCMCChainsExt` - MCMCChains.jl integration
- `AdvancedHMCCUDAExt` - CUDA array support
- `AdvancedHMCComponentArraysExt` - ComponentArrays support
- `AdvancedHMCADTypesExt` - ADTypes.jl integration

```julia
function hmc_step(x_prev, ω_momentum, ω_leapfrog, u_accept, ε, L, ∇logp)
    # Sample momentum
    v = ω_momentum # ~ N(0, I)

    # Leapfrog integration (sequential within this step)
    x, v = leapfrog(x_prev, v, ε, L, ∇logp)

    # Accept-reject (with stop-gradient trick as in MALA)
    # ...
```
Comment on lines +227 to +229
- Parallelize across T HMC samples
- Each parallel iteration evaluates L leapfrog steps sequentially
- Good when T >> L

Run HMC steps sequentially, but parallelize the inner leapfrog loop:

**Leapfrog step** (Equation 8):

@github-actions

AdvancedHMC.jl documentation for PR #487 is available at:
https://TuringLang.github.io/AdvancedHMC.jl/previews/PR487/

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 0% with 566 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.75%. Comparing base (6bc0c74) to head (c041fe9).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/parallel/deer.jl | 0.00% | 151 Missing ⚠️ |
| src/parallel/hmc.jl | 0.00% | 109 Missing ⚠️ |
| src/parallel/scan.jl | 0.00% | 89 Missing ⚠️ |
| src/parallel/abstractmcmc.jl | 0.00% | 71 Missing ⚠️ |
| src/parallel/jacobian.jl | 0.00% | 67 Missing ⚠️ |
| src/parallel/mala.jl | 0.00% | 62 Missing ⚠️ |
| src/parallel/types.jl | 0.00% | 17 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (6bc0c74) and HEAD (c041fe9). Click for more details.

HEAD has 3 fewer uploads than BASE:

| Flag | BASE (6bc0c74) | HEAD (c041fe9) |
|------|----------------|----------------|
|      | 9 | 6 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #487       +/-   ##
===========================================
- Coverage   77.71%   53.75%   -23.96%     
===========================================
  Files          21       28        +7     
  Lines        1270     1836      +566     
===========================================
  Hits          987      987               
- Misses        283      849      +566     

☔ View full report in Codecov by Sentry.

Implements parallel HMC using DEER algorithm with two approaches:

Approach A - Parallelize across HMC steps:
- parallel_hmc() runs T HMC steps in parallel
- hmc_transition_soft() with soft MH gating for differentiability

Approach B - Parallelize leapfrog integration:
- parallel_leapfrog() parallelizes L leapfrog steps
- leapfrog_transition() as DEER transition function

Block Quasi-DEER for leapfrog:
- BlockQuasiDEER type exploiting 2x2 block Jacobian structure
- _deer_iteration_block() for efficient block-diagonal Newton
- parallel_scan_block() using Block2x2AffineTransform

49 new tests (345 total parallel tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@github-actions github-actions bot left a comment


Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
model,
sampler::ParallelHMC,
n_samples::Int;
initial_θ = nothing


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

| Scenario | Recommended Method |
|----------|-------------------|
| Low-D, need fastest convergence | DEER (full Jacobian) |
| High-D, memory-constrained | Stochastic quasi-DEER |
| HMC with many leapfrog steps | Block quasi-DEER on leapfrog |
| Very long chains | Add sliding window |
| Approximate samples OK | Early stopping |


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

- **Paper**: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- **Code**: https://github.com/lindermanlab/parallel-mcmc
- **DEER theory**: Lim et al. [7], Danieli et al. [6]
- **Quasi-DEER**: Gonzalez et al. [8]
- **Stochastic diagonal estimation**: Hutchinson [22], Bekas et al. [20]


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Foundation components needed by all DEER variants


[JuliaFormatter] reported by reviewdog 🐶

- [x] **1.1 Parallel Scan Implementation**
- [x] Associative operator for affine transforms: `(J₂, u₂) ⊕ (J₁, u₁) = (J₂*J₁, J₂*u₁ + u₂)`
- [x] Full matrix version (for DEER) - `MatrixAffineTransform`
- [x] Diagonal version (for Quasi-DEER) - `DiagonalAffineTransform`
- [x] Block-diagonal version (for Block Quasi-DEER / leapfrog) - `Block2x2AffineTransform`
- [x] CPU implementation using `Base.accumulate`
- [ ] GPU implementation (CUDA.jl) - deferred to Phase 6
- [x] **1.2 Jacobian Computation Utilities**
- [x] Full Jacobian computation (finite differences, AD-agnostic interface)
- [x] Diagonal extraction from Jacobian
- [x] Stochastic diagonal estimation (Hutchinson's method with Rademacher vectors)
- [x] JVP (Jacobian-vector product) wrapper
- [x] VJP (Vector-Jacobian product) wrapper
- [x] Batch versions for computing at multiple points
- [x] Hessian diagonal utilities (for leapfrog)
- [x] **1.3 Core Types and Interfaces**
- [x] `AbstractParallelMethod` with subtypes: `FullDEER`, `QuasiDEER`, `StochasticQuasiDEER`, `BlockQuasiDEER`
- [x] `ParallelMCMCState` to hold trajectory and convergence info
- [x] `ParallelMCMCSettings` configuration struct
---
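The associative operator above can be sketched directly: compose the per-step affine maps, take an inclusive prefix scan, and read off every chain state at once. This is a minimal illustration (the `Affine`, `prefix_scan`, and `solve_affine_recursion` names are illustrative, not the PR's actual API); `accumulate` is sequential on CPU, but because the operator is associative the same scan has O(log T) parallel depth on suitable hardware.

```julia
# An affine map s ↦ J*s + u; hypothetical sketch, not the PR's API.
struct Affine{M,V}
    J::M
    u::V
end

# (J₂, u₂) ⊕ (J₁, u₁) = (J₂*J₁, J₂*u₁ + u₂): apply the t-1 map first, then t.
compose(a2::Affine, a1::Affine) = Affine(a2.J * a1.J, a2.J * a1.u + a2.u)

# Inclusive prefix scan: P_t = T_t ∘ T_{t-1} ∘ … ∘ T_1.
prefix_scan(ts::Vector{<:Affine}) = accumulate((acc, t) -> compose(t, acc), ts)

# Solve the recursion s_t = J_t*s_{t-1} + u_t for all t given s0.
solve_affine_recursion(ts, s0) = [p.J * s0 + p.u for p in prefix_scan(ts)]
```

The scan result agrees with unrolling the recursion step by step, which is the property the DEER inner solve relies on.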


[JuliaFormatter] reported by reviewdog 🐶

> The main Newton iteration loop
- [x] **2.1 Newton Iteration Framework**
- [x] Main loop structure with convergence checking
- [x] Parallel evaluation of f_t at all timesteps
- [x] Parallel computation of Jacobians (method-dependent)
- [x] Compute inputs u_t = f_t(s_{t-1}) - J_t * s_{t-1}
- [x] Solve linear system via parallel scan
- [x] Convergence criterion (max absolute difference)
- [x] DEERResult type with trajectory, convergence info, residual history
- [x] **2.2 Full DEER Implementation**
- [x] Store and manipulate T × D × D Jacobian matrices
- [x] Full matrix parallel scan
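The Newton loop above linearizes the transition at the current trajectory guess, forms `u_t = f_t(s_{t-1}) - J_t * s_{t-1}`, and solves the resulting affine recursion exactly. A minimal full-DEER sketch under those assumptions (`fd_jacobian` and `deer_iterate` are hypothetical names; the inner solve is written as a sequential loop for clarity, where the real implementation uses the parallel scan):

```julia
# Finite-difference Jacobian of f at x (illustrative helper).
function fd_jacobian(f, x; δ = 1e-6)
    D = length(x); J = zeros(D, D); fx = f(x)
    for j in 1:D
        e = zeros(D); e[j] = δ
        J[:, j] = (f(x .+ e) .- fx) ./ δ
    end
    return J
end

# One Newton iteration per pass: linearize around the current guess,
# then solve the affine recursion s_t = J_t*s_{t-1} + u_t.
function deer_iterate(f, s0, guess::Matrix; tol = 1e-10, max_iters = 50)
    T_len, D = size(guess)
    traj = copy(guess)
    for iter in 1:max_iters
        prev = vcat(s0', traj[1:end-1, :])   # s_{t-1} at the current guess
        new_traj = similar(traj)
        s = copy(s0)
        for t in 1:T_len                     # scan-able affine recursion
            sp = prev[t, :]
            J = fd_jacobian(f, sp)           # J_t = ∂f/∂s at s_{t-1}
            u = f(sp) .- J * sp              # u_t = f(s_{t-1}) - J_t*s_{t-1}
            s = J * s .+ u
            new_traj[t, :] = s
        end
        maximum(abs.(new_traj .- traj)) < tol && return new_traj, iter
        traj = new_traj
    end
    return traj, max_iters
end
```

For an affine `f` the linearization is exact, so the loop reaches the sequential trajectory after a single Newton pass, matching the convergence criterion listed above.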


[JuliaFormatter] reported by reviewdog 🐶

- [x] **2.3 Quasi-DEER Implementation**
- [x] Diagonal Jacobian storage (T × D)
- [x] Elementwise parallel scan
- [x] **2.4 Stochastic Quasi-DEER Implementation**
- [x] Hutchinson estimator with configurable number of samples
- [x] Rademacher vector generation (from Phase 1.2)
- [x] JVP-based diagonal estimation
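Hutchinson's estimator recovers `diag(J)` from JVPs alone: with Rademacher probes `v`, `E[v .* (J*v)] = diag(J)` since `E[v vᵀ] = I`. A minimal sketch (illustrative names, assuming a caller-supplied `jvp` closure rather than the PR's actual wrapper):

```julia
using Random

# ±1 entries with equal probability (Rademacher probe).
rademacher(rng, n) = 2.0 .* rand(rng, Bool, n) .- 1.0

# Estimate diag(J) from n_samples probes; each term v_i * (Jv)_i has
# expectation J_ii because the off-diagonal contributions average out.
function hutchinson_diag(jvp, n::Int; rng = Random.default_rng(), n_samples = 100)
    est = zeros(n)
    for _ in 1:n_samples
        v = rademacher(rng, n)
        est .+= v .* jvp(v)
    end
    return est ./ n_samples
end
```

For a diagonal `J` the estimator is exact for any number of samples (each `v_i^2 = 1`); for dense `J` the variance shrinks as `1/n_samples`.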


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Parallelized Metropolis-Adjusted Langevin Algorithm
- [x] **3.1 MALA Transition Function**
- [x] Proposal: x̃ = x + ε∇log p(x) + √(2ε)ξ
- [x] Acceptance ratio computation (forward and backward proposal densities)
- [x] Stop-gradient trick for differentiable accept-reject
- [x] Soft gating with sigmoid and straight-through estimator
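A single MALA transition with pre-sampled randomness (ξ for the proposal noise, u for the accept decision) can be sketched as below. The `mala_step` name is hypothetical, not the PR's `parallel_mala` API, and this shows the hard accept-reject rather than the soft-gated variant:

```julia
# One MALA transition: x̃ = x + ε∇logp(x) + √(2ε)ξ, then MH accept/reject
# using a pre-sampled uniform u (hypothetical sketch).
function mala_step(x, logp, ∇logp, ε, ξ, u)
    x̃ = x .+ ε .* ∇logp(x) .+ sqrt(2ε) .* ξ
    # log q(a | b) for the Langevin proposal, up to a shared constant
    logq(a, b) = -sum(abs2, a .- b .- ε .* ∇logp(b)) / (4ε)
    logα = logp(x̃) - logp(x) + logq(x, x̃) - logq(x̃, x)
    return log(u) < logα ? x̃ : x
end
```

Pre-sampling (ξ, u) is what lets the whole chain be treated as a deterministic fixed-point problem in the DEER framework: given the inputs, every transition is a deterministic function of the previous state.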


[JuliaFormatter] reported by reviewdog 🐶

- [x] **3.2 Parallel MALA Sampler**
- [x] MALARandomInputs type for pre-sampled (ξ, u) pairs
- [x] sample_mala_inputs() for batch sampling
- [x] parallel_mala() integrating with DEER framework
- [x] sequential_mala() for reference/testing
- [x] Convenience API with automatic input sampling
- [ ] **3.3 MALA-specific Optimizations** (deferred)
- [ ] Preconditioning with Hessian eigendecomposition (optional)


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Two approaches for parallelizing HMC


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **4.1 Approach A: Parallelize Across HMC Steps**
- [ ] Treat full HMC step as transition function f_t
- [ ] Sequential leapfrog within each step
- [ ] Parallel Newton across T HMC samples
- [ ] Good when T >> L (many samples, few leapfrog steps)
- [ ] **4.2 Approach B: Parallelize Leapfrog Integration**
- [ ] State is s = [x, v] (position + momentum)
- [ ] Apply DEER to L leapfrog steps within each HMC step
- [ ] Block Quasi-DEER with 2×2 block structure per dimension
- [ ] Good when L is large
- [ ] **4.3 Block Quasi-DEER for Leapfrog**
- [ ] Block Jacobian structure:
```
J = [ I_D          ε*I_D            ]
    [ ε*diag(H)    I_D + ε²*diag(H) ]
```
- [ ] Efficient 2×2 block scan per dimension
- [ ] Hessian diagonal computation
- [ ] **4.4 Accept-Reject for HMC**
- [ ] Stop-gradient trick (same as MALA)
- [ ] Momentum refresh handling
---
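The 2×2 block Jacobian in 4.3 admits an elementwise composition: each dimension carries an independent 2×2 affine map on its (position, momentum) pair, so composing two transforms costs O(D) instead of O(D³). A sketch with illustrative field names (not necessarily the PR's `Block2x2AffineTransform` layout):

```julia
# Per-dimension 2×2 affine maps (x, v) ↦ [a b; c e]*(x, v) + (ux, uv),
# stored elementwise across D dimensions. Hypothetical sketch.
struct Block2x2{V}
    a::V; b::V; c::V; e::V   # elementwise 2×2 Jacobian blocks
    ux::V; uv::V             # elementwise affine offsets
end

# t2 ∘ t1 as D independent 2×2 matrix products plus offset updates.
function compose(t2::Block2x2, t1::Block2x2)
    a  = t2.a .* t1.a .+ t2.b .* t1.c
    b  = t2.a .* t1.b .+ t2.b .* t1.e
    c  = t2.c .* t1.a .+ t2.e .* t1.c
    e  = t2.c .* t1.b .+ t2.e .* t1.e
    ux = t2.a .* t1.ux .+ t2.b .* t1.uv .+ t2.ux
    uv = t2.c .* t1.ux .+ t2.e .* t1.uv .+ t2.uv
    return Block2x2(a, b, c, e, ux, uv)
end
```

Since this `compose` is associative, it plugs into the same prefix scan used by the other DEER variants, which is why the leapfrog inner loop parallelizes at per-dimension cost.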


[JuliaFormatter] reported by reviewdog 🐶

> Integrate with existing library architecture


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **5.1 New Sampler Types**
- [ ] `ParallelHMC <: AbstractMCMCSampler`
- [ ] `ParallelMALA <: AbstractMCMCSampler`
- [ ] **5.2 AbstractMCMC Interface**
- [ ] Implement `AbstractMCMC.step` (or batch variant)
- [ ] Implement `AbstractMCMC.sample` returning full chain
- [ ] Handle RNG properly for reproducibility
- [ ] **5.3 Integration with Existing Components**
- [ ] Use existing `Hamiltonian` type
- [ ] Use existing `Metric` types
- [ ] Use existing `PhasePoint` structure
- [ ] Compatibility with existing gradient computation


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Optimizations and extensions
- [ ] **6.1 Sliding Window**
- [ ] For memory-constrained high-dimensional problems
- [ ] Window initialization and shifting logic
- [ ] Convergence tracking per window
- [ ] **6.2 Early Stopping**
- [ ] Return approximate samples before full convergence
- [ ] Quality-vs-time tradeoff parameter
- [ ] **6.3 GPU Support**
- [ ] CUDA.jl parallel scan kernel
- [ ] GPU-compatible transition functions
- [ ] Batch Jacobian computation on GPU


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **6.4 Diagnostics and Monitoring**
- [ ] Iteration count tracking
- [ ] Convergence history
- [ ] Per-timestep residual monitoring


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

> Ensure correctness and performance
- [ ] **7.1 Unit Tests**
- [ ] Parallel scan correctness (compare to sequential)
- [ ] Jacobian computation accuracy
- [ ] Hutchinson estimator convergence
- [ ] **7.2 Integration Tests**
- [ ] DEER converges to sequential MALA trace
- [ ] DEER converges to sequential HMC trace
- [ ] Quasi-DEER matches DEER (eventually)


[JuliaFormatter] reported by reviewdog 🐶

- [ ] **7.3 Statistical Validation**
- [ ] Samples match target distribution (known distributions)
- [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
- [ ] Wall-clock time vs sequential
- [ ] Scaling with chain length T
- [ ] Memory usage profiling


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

- ForwardDiff.jl - Jacobian computation
- LinearAlgebra - Matrix operations


[JuliaFormatter] reported by reviewdog 🐶

- CUDA.jl - GPU acceleration
- Zygote.jl / Enzyme.jl - Alternative AD for JVPs
- ChainRulesCore.jl - Custom rrules for stop-gradient


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

1. **Start with Phase 1.1** - Parallel scan is the foundation
2. **Then Phase 1.2-1.3** - Jacobian utilities and types
3. **Phase 2** - Core DEER loop (test with simple linear system first)
4. **Phase 3** - MALA (simpler than HMC, good first MCMC target)
5. **Phase 4** - HMC integration
6. **Phase 5** - AdvancedHMC.jl integration
7. **Phase 6-7** - Advanced features and thorough testing


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

| Date | Phase | Item | Status | Notes |
|------|-------|------|--------|-------|
| 2026-01-20 | - | Initial plan created | ✅ | Based on implementation note |
| 2026-01-20 | 1.1 | Parallel scan implementation | ✅ | All 3 variants (matrix, diagonal, block 2x2) |
| 2026-01-20 | 1.3 | Core types and interfaces | ✅ | Types for methods, settings, state |
| 2026-01-20 | 7.1 | Parallel scan unit tests | ✅ | 141 tests passing |
| 2026-01-20 | 1.2 | Jacobian computation utilities | ✅ | FD-based, Hutchinson estimator, JVP/VJP |
| 2026-01-20 | 7.1 | Jacobian utility tests | ✅ | 57 tests passing |
| 2026-01-20 | 1 | **Phase 1 Complete** | ✅ | All core infrastructure done |
| 2026-01-20 | 2 | DEER algorithm core | ✅ | Newton iteration, Full/Quasi/Stochastic DEER |
| 2026-01-20 | 7.1 | DEER algorithm tests | ✅ | 67 tests passing |
| 2026-01-20 | 2 | **Phase 2 Complete** | ✅ | Core DEER algorithm working |
| 2026-01-20 | 3 | MALA transition function | ✅ | Proposal, acceptance, soft gating |
| 2026-01-20 | 3 | Parallel MALA sampler | ✅ | Integrated with DEER framework |
| 2026-01-20 | 7.1 | MALA tests | ✅ | 31 tests passing |
| 2026-01-20 | 3 | **Phase 3 Complete** | ✅ | Parallel MALA working |
---


[JuliaFormatter] reported by reviewdog 🐶

1. **AD Backend Choice**: ForwardDiff for full Jacobians, but what for JVPs in stochastic quasi-DEER? Zygote vs Enzyme?
- **Decision**: Use whichever works best; start with ForwardDiff
2. **GPU Priority**: Should GPU support be Phase 1 or deferred to Phase 6?
- **Decision**: Deferred to Phase 6; focus on CPU correctness first
3. **Integration Approach**: Should we modify existing HMC/Leapfrog types or create entirely new parallel variants?
- **Decision**: Be compatible with existing abstractions and API design as much as possible
4. **Testing Strategy**: What target distributions should we use for validation? Gaussian (analytical), banana, Neal's funnel?


[JuliaFormatter] reported by reviewdog 🐶

5. **API Design**: Return full chain always, or support incremental parallel blocks?


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

- Paper: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- Reference implementation: https://github.com/lindermanlab/parallel-mcmc (JAX)
- DEER theory: Lim et al., Danieli et al.
- Hutchinson estimator: Hutchinson (1990), Bekas et al.


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method;
jacobian_fn=jacobian_fn, jvp_fn=jvp_fn, rng=rng


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::FullDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::QuasiDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::StochasticQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;
jvp_fn=jvp_fn, rng=rng, n_samples=method.n_samples


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

J_diag[t, :] = hutchinson_diagonal(f_t, s_prev, jvp_fn; rng=rng, n_samples=n_samples)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::BlockQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;


[JuliaFormatter] reported by reviewdog 🐶

M⁻¹=method.M⁻¹


[JuliaFormatter] reported by reviewdog 🐶

r_prev = s_prev[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

f_r = f_vals[t, (D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

transforms = [Block2x2AffineTransform(
J_a[t, :], J_b[t, :], J_c[t, :], J_e[t, :],
u_x[t, :], u_v[t, :]
) for t in 1:T_len]


[JuliaFormatter] reported by reviewdog 🐶

r0 = s0[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

trajectory_new[:, (D+1):end] = trajectory_r


[JuliaFormatter] reported by reviewdog 🐶

prefix[t] = compose(transforms[t], prefix[t-1])


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector,
T_len::Int,
ω,
settings::ParallelMCMCSettings;
kwargs...


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

function sequential_mcmc(
f,
s0::AbstractVector{T},
T_len::Int,
ω,
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int;
M⁻¹::AbstractVector=ones(D),


[JuliaFormatter] reported by reviewdog 🐶

config::HMCConfig{T},
s0::AbstractVector{T},
T_len::Int,
ω::Vector{<:HMCRandomInputs{T}},


[JuliaFormatter] reported by reviewdog 🐶

θ_new = hmc_transition(θ_curr, ω[t], config.logp, config.∇logp, config.ε, config.L; M⁻¹=M⁻¹)


[JuliaFormatter] reported by reviewdog 🐶

θ, ω_t, config.logp, config.∇logp, config.ε, config.L;
M⁻¹=M⁻¹, τ=T(0.01) # Small τ for near-hard gating


[JuliaFormatter] reported by reviewdog 🐶

r = s[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

function hessian_diagonal_fd(
∇logp,
θ::AbstractVector{T};
δ::T=T(1e-5),
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_jacobian_diagonals(f, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function rademacher_vector(rng::AbstractRNG, n::Int, ::Type{T}=Float64) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f,
x::AbstractVector{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

f,
xs::AbstractMatrix{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,


[JuliaFormatter] reported by reviewdog 🐶

function hessian_diagonal(grad_log_p, x::AbstractVector{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

function batch_hessian_diagonals(grad_log_p, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int,
::Type{T}=Float64,


[JuliaFormatter] reported by reviewdog 🐶

sample_mala_inputs(D::Int, T_len::Int, T::Type=Float64) =


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
∇logp_x::AbstractVector{T},
ε::T,
ξ::AbstractVector{T},


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
::AbstractVector{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

x::AbstractVector{T},
ω::MALARandomInputs{T},
logp,
∇logp,
ε::T,


[JuliaFormatter] reported by reviewdog 🐶

logp,
∇logp,
ε::T,
s0::AbstractVector{T},
T_len::Int;
rng::AbstractRNG=default_rng(),


[JuliaFormatter] reported by reviewdog 🐶

function make_block_transforms(H_diag::Matrix{T}, ε::T, u_x::Matrix{T}, u_v::Matrix{T}) where {T}


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(12345)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_hmc(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-8, max_iters=200)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_hmc(config, s0, T_len, ω; method=FullDEER(), tol=1e-8, max_iters=100)


[JuliaFormatter] reported by reviewdog 🐶

logp, ∇logp, ε, L, s0, T_len;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

r_new = s_new[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

r_par = s_final[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

trajectory, acceptance_rate = sequential_hmc(logp, ∇logp, ε, L, s0, T_len; rng=rng, M⁻¹=M⁻¹)


[JuliaFormatter] reported by reviewdog 🐶

samples = trajectory[(burn_in+1):end, :]


[JuliaFormatter] reported by reviewdog 🐶

r_par = s_final[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

@test s_final[(D+1):end] ≈ r_seq atol = 1e-5


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=200)


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=FullDEER(), tol=1e-10, max_iters=100)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶

rng=MersenneTwister(999)


[JuliaFormatter] reported by reviewdog 🐶

logp, ∇logp, ε, s0, T_len;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=500)


[JuliaFormatter] reported by reviewdog 🐶

t_block = Block2x2AffineTransform(rand(D), rand(D), rand(D), rand(D), rand(D), rand(D))

Comment on lines +243 to +245
- Good when L is large
- Use **block quasi-DEER** for this (Section 2.4)
- Convergence in ~L/2 iterations typically

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- Good when L is large
- Use **block quasi-DEER** for this (Section 2.4)
- Convergence in ~L/2 iterations typically


Note: The update for coordinate d does **not** depend on `x_{t-1,d}`, which affects the Jacobian structure.

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
---
* * *

T: number of samples
method: :deer, :quasi_deer, :stochastic_quasi_deer, :block_quasi_deer
"""


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

"""

D = length(s0)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change


# Pre-sample all random inputs
ω = sample_random_inputs(T) # Shape: (T, ...)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

Comment on lines +420 to +421
- **ForwardDiff.jl** for full Jacobians (small D)
- **Zygote.jl** or **Enzyme.jl** for JVPs in stochastic quasi-DEER

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- **ForwardDiff.jl** for full Jacobians (small D)
- **Zygote.jl** or **Enzyme.jl** for JVPs in stochastic quasi-DEER

Comment on lines +439 to +441
- Use `CUDA.jl` with `@cuda` kernels
- Or `KernelAbstractions.jl` for portable GPU code
- Ensure f is GPU-compatible (no scalar indexing, etc.)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- Use `CUDA.jl` with `@cuda` kernels
- Or `KernelAbstractions.jl` for portable GPU code
- Ensure f is GPU-compatible (no scalar indexing, etc.)

Comment on lines +444 to +445
- Implement custom CUDA kernel for efficiency
- Or use `Transducers.jl` with GPU support

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
- Implement custom CUDA kernel for efficiency
- Or use `Transducers.jl` with GPU support

Comment on lines +451 to +453
1. **New sampler type**: `ParallelHMC` or `DEER_HMC`
2. **Modify `Leapfrog` integrator**: Add parallel mode
3. **New `sample` method**: Returns full chain in one call

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
1. **New sampler type**: `ParallelHMC` or `DEER_HMC`
2. **Modify `Leapfrog` integrator**: Add parallel mode
3. **New `sample` method**: Returns full chain in one call

method::Symbol # :deer, :quasi_deer, :stochastic_quasi_deer
tol::Float64
max_iters::Int
window_size::Union{Int, Nothing} # for sliding window

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
window_size::Union{Int, Nothing} # for sliding window
window_size::Union{Int,Nothing} # for sliding window

Implements parallel sampler types for AbstractMCMC-compatible interface:
- ParallelHMCSampler / ParallelHMC
- ParallelMALASampler / ParallelMALA
- ParallelSamplerState with trajectory and convergence info
- parallel_sample() for batch sampling with LogDensityProblems
- SimpleLogDensity wrapper for easy testing

75 new tests (420 total parallel tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@github-actions github-actions bot left a comment


Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶




[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

| Date | Phase | Item | Status | Notes |
|------|-------|------|--------|-------|
| 2026-01-20 | - | Initial plan created | ✅ | Based on implementation note |
| 2026-01-20 | 1.1 | Parallel scan implementation | ✅ | All 3 variants (matrix, diagonal, block 2x2) |
| 2026-01-20 | 1.3 | Core types and interfaces | ✅ | Types for methods, settings, state |
| 2026-01-20 | 7.1 | Parallel scan unit tests | ✅ | 141 tests passing |
| 2026-01-20 | 1.2 | Jacobian computation utilities | ✅ | FD-based, Hutchinson estimator, JVP/VJP |
| 2026-01-20 | 7.1 | Jacobian utility tests | ✅ | 57 tests passing |
| 2026-01-20 | 1 | **Phase 1 Complete** | ✅ | All core infrastructure done |
| 2026-01-20 | 2 | DEER algorithm core | ✅ | Newton iteration, Full/Quasi/Stochastic DEER |
| 2026-01-20 | 7.1 | DEER algorithm tests | ✅ | 67 tests passing |
| 2026-01-20 | 2 | **Phase 2 Complete** | ✅ | Core DEER algorithm working |
| 2026-01-20 | 3 | MALA transition function | ✅ | Proposal, acceptance, soft gating |
| 2026-01-20 | 3 | Parallel MALA sampler | ✅ | Integrated with DEER framework |
| 2026-01-20 | 7.1 | MALA tests | ✅ | 31 tests passing |
| 2026-01-20 | 3 | **Phase 3 Complete** | ✅ | Parallel MALA working |
| 2026-01-29 | 4.1 | Approach A: Parallelize HMC steps | ✅ | parallel_hmc(), soft MH gating |
| 2026-01-29 | 4.2 | Approach B: Parallelize leapfrog | ✅ | parallel_leapfrog(), leapfrog_transition() |
| 2026-01-29 | 4.3 | Block Quasi-DEER for leapfrog | ✅ | 2×2 block structure, hessian_diagonal_fd |
| 2026-01-29 | 7.1 | HMC tests | ✅ | 49 tests passing |
| 2026-01-29 | 4 | **Phase 4 Complete** | ✅ | Parallel HMC working (345 total tests) |
| 2026-01-30 | 5.1 | New sampler types | ✅ | ParallelHMCSampler, ParallelMALASampler |
| 2026-01-30 | 5.2 | AbstractMCMC interface | ✅ | parallel_sample(), ParallelSamplerState |
| 2026-01-30 | 5.3 | Integration with components | ✅ | Metric types, LogDensityProblems |
| 2026-01-30 | 7.1 | AbstractMCMC tests | ✅ | 75 tests passing |
| 2026-01-30 | 5 | **Phase 5 Complete** | ✅ | AdvancedHMC.jl integration (420 total tests) |
---






[JuliaFormatter] reported by reviewdog 🐶




[JuliaFormatter] reported by reviewdog 🐶

const IS_SUBMODULE = parentmodule(@__MODULE__) !== Main &&
nameof(parentmodule(@__MODULE__)) === :AdvancedHMC


[JuliaFormatter] reported by reviewdog 🐶

import ..AdvancedHMC: AbstractMetric, DiagEuclideanMetric, UnitEuclideanMetric, DenseEuclideanMetric


[JuliaFormatter] reported by reviewdog 🐶

T<:AbstractFloat,
M<:AbstractParallelMethod,
I<:Union{Symbol,AbstractMetric},


[JuliaFormatter] reported by reviewdog 🐶

T<:AbstractFloat,
M<:AbstractParallelMethod,
I<:Union{Symbol,AbstractMetric},


[JuliaFormatter] reported by reviewdog 🐶

∇logp = function(x)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, N, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

∇logp = function(x)


[JuliaFormatter] reported by reviewdog 🐶

config, s0, N, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

return state.trajectory[(burn_in+1):end, :]


[JuliaFormatter] reported by reviewdog 🐶

stat = (
iteration=i,
converged=state.converged,
)


[JuliaFormatter] reported by reviewdog 🐶

LogDensityProblems.capabilities(::Type{<:SimpleLogDensity}) = LogDensityProblems.LogDensityOrder{1}()


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method;
jacobian_fn=jacobian_fn, jvp_fn=jvp_fn, rng=rng


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::FullDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

function _deer_iteration(
f, s0, trajectory, ω, method::QuasiDEER;
jacobian_fn, jvp_fn, rng
)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::StochasticQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;
jvp_fn=jvp_fn, rng=rng, n_samples=method.n_samples


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector{T},
trajectory::AbstractMatrix{T},
ω;
jacobian_fn=jacobian_fd,


[JuliaFormatter] reported by reviewdog 🐶

J_diag[t, :] = hutchinson_diagonal(f_t, s_prev, jvp_fn; rng=rng, n_samples=n_samples)


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω, method::BlockQuasiDEER;
jacobian_fn, jvp_fn, rng


[JuliaFormatter] reported by reviewdog 🐶

f, s0, trajectory, ω;


[JuliaFormatter] reported by reviewdog 🐶

M⁻¹=method.M⁻¹


[JuliaFormatter] reported by reviewdog 🐶

r_prev = s_prev[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

f_r = f_vals[t, (D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

transforms = [Block2x2AffineTransform(
J_a[t, :], J_b[t, :], J_c[t, :], J_e[t, :],
u_x[t, :], u_v[t, :]
) for t in 1:T_len]


[JuliaFormatter] reported by reviewdog 🐶

r0 = s0[(D+1):end]


[JuliaFormatter] reported by reviewdog 🐶

trajectory_new[:, (D+1):end] = trajectory_r


[JuliaFormatter] reported by reviewdog 🐶

prefix[t] = compose(transforms[t], prefix[t-1])


[JuliaFormatter] reported by reviewdog 🐶

f,
s0::AbstractVector,
T_len::Int,
ω,
settings::ParallelMCMCSettings;
kwargs...


[JuliaFormatter] reported by reviewdog 🐶

f, s0, T_len, ω;


[JuliaFormatter] reported by reviewdog 🐶


[JuliaFormatter] reported by reviewdog 🐶

function sequential_mcmc(
f,
s0::AbstractVector{T},
T_len::Int,
ω,
) where {T}


[JuliaFormatter] reported by reviewdog 🐶

rng::AbstractRNG,
D::Int,
T_len::Int;
M⁻¹::AbstractVector=ones(D),


[JuliaFormatter] reported by reviewdog 🐶

config::HMCConfig{T},
s0::AbstractVector{T},
T_len::Int,
ω::Vector{<:HMCRandomInputs{T}},


[JuliaFormatter] reported by reviewdog 🐶

θ_new = hmc_transition(θ_curr, ω[t], config.logp, config.∇logp, config.ε, config.L; M⁻¹=M⁻¹)


[JuliaFormatter] reported by reviewdog 🐶

θ, ω_t, config.logp, config.∇logp, config.ε, config.L;
M⁻¹=M⁻¹, τ=T(0.01) # Small τ for near-hard gating


[JuliaFormatter] reported by reviewdog 🐶

```julia
r = s[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function hessian_diagonal_fd(
    ∇logp,
    θ::AbstractVector{T};
    δ::T=T(1e-5),
) where {T}
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
function batch_jacobian_diagonals(f, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function rademacher_vector(rng::AbstractRNG, n::Int, ::Type{T}=Float64) where {T}
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
f,
x::AbstractVector{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
f,
xs::AbstractMatrix{T},
jvp_fn;
rng::AbstractRNG=default_rng(),
n_samples::Int=1,
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
function hessian_diagonal(grad_log_p, x::AbstractVector{T}, diag_fn=jacobian_diagonal_full) where {T}
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function batch_hessian_diagonals(grad_log_p, xs::AbstractMatrix{T}, diag_fn=jacobian_diagonal_full) where {T}
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
rng::AbstractRNG,
D::Int,
T_len::Int,
::Type{T}=Float64,
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
sample_mala_inputs(D::Int, T_len::Int, T::Type=Float64) =
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
x::AbstractVector{T},
∇logp_x::AbstractVector{T},
ε::T,
ξ::AbstractVector{T},
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
x::AbstractVector{T},
::AbstractVector{T},
logp,
∇logp,
ε::T,
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
x::AbstractVector{T},
ω::MALARandomInputs{T},
logp,
∇logp,
ε::T,
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
logp,
∇logp,
ε::T,
s0::AbstractVector{T},
T_len::Int;
rng::AbstractRNG=default_rng(),
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
function make_block_transforms(H_diag::Matrix{T}, ε::T, u_x::Matrix{T}, u_v::Matrix{T}) where {T}
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
@test LogDensityProblems.capabilities(typeof(ld)) == LogDensityProblems.LogDensityOrder{1}()
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
sampler2 = ParallelHMCSampler(0.05, 20;
    method=FullDEER(),
    metric=:unit,
    tol=1e-8,
    max_iters=500
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
sampler2 = ParallelMALASampler(0.05;
    method=FullDEER(),
    tol=1e-10
)
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
0.85 # acceptance_rate
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
state = ParallelSamplerState(
    trajectory, true, 3, 1e-10, 0.9
)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
@test samples_after_burn == trajectory[(burn_in+1):end, :]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
state = ParallelSamplerState(
    trajectory, true, 2, 1e-9, 0.95
)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
state_full = parallel_sample(MersenneTwister(101112), ld, sampler_full, N; initial_params=zeros(D))
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
f, s0, T_len, ω;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
f, s0, T_len, ω;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
rng=MersenneTwister(12345)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_hmc(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-8, max_iters=200)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_hmc(config, s0, T_len, ω; method=FullDEER(), tol=1e-8, max_iters=100)
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
logp, ∇logp, ε, L, s0, T_len;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
r_new = s_new[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
r_par = s_final[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
trajectory, acceptance_rate = sequential_hmc(logp, ∇logp, ε, L, s0, T_len; rng=rng, M⁻¹=M⁻¹)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
samples = trajectory[(burn_in+1):end, :]
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
r_par = s_final[(D+1):end]
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
@test s_final[(D+1):end] ≈ r_seq atol = 1e-5
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=200)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_mala(config, s0, T_len, ω; method=FullDEER(), tol=1e-10, max_iters=100)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
config, s0, T_len, ω;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
rng=MersenneTwister(999)
```


[JuliaFormatter] reported by reviewdog 🐶

```julia
logp, ∇logp, ε, s0, T_len;
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
result = parallel_mala(config, s0, T_len, ω; method=QuasiDEER(), tol=1e-10, max_iters=500)
```

[JuliaFormatter] reported by reviewdog 🐶

```julia
t_block = Block2x2AffineTransform(rand(D), rand(D), rand(D), rand(D), rand(D), rand(D))
```
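As a sketch of what a `Block2x2AffineTransform` like the one above represents: each coordinate carries a 2×2 Jacobian block acting on a position/momentum pair, plus an offset, and two such transforms compose blockwise, which keeps the parallel-scan element cheap for leapfrog. The names below are illustrative, not the PR's actual types.

```julia
# Hedged sketch (illustrative names): per-coordinate 2×2 block affine maps on
# state (x, v). Coordinate i has Jacobian block [a[i] b[i]; c[i] e[i]] and
# offsets (ux[i], uv[i]).
struct Block2x2{T}
    a::Vector{T}; b::Vector{T}; c::Vector{T}; e::Vector{T}
    ux::Vector{T}; uv::Vector{T}
end

apply(g::Block2x2, x, v) = (g.a .* x .+ g.b .* v .+ g.ux,
                            g.c .* x .+ g.e .* v .+ g.uv)

# (g2 ∘ g1): per-coordinate 2×2 matrix product for the Jacobian part;
# g2's Jacobian applied to g1's offset, plus g2's offset, for the affine part.
compose(g2::Block2x2, g1::Block2x2) = Block2x2(
    g2.a .* g1.a .+ g2.b .* g1.c,
    g2.a .* g1.b .+ g2.b .* g1.e,
    g2.c .* g1.a .+ g2.e .* g1.c,
    g2.c .* g1.b .+ g2.e .* g1.e,
    g2.a .* g1.ux .+ g2.b .* g1.uv .+ g2.ux,
    g2.c .* g1.ux .+ g2.e .* g1.uv .+ g2.uv,
)

g1 = Block2x2([1.0], [0.5], [0.2], [1.0], [0.1], [0.0])
g2 = Block2x2([1.0], [0.3], [0.0], [1.0], [0.0], [0.2])
x, v = [1.0], [2.0]
```

Because this composition is again associative, chains of leapfrog-step linearizations can be combined with the same parallel scan used for diagonal transforms.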

Comment on lines +465 to +469

```julia
rng::AbstractRNG,
model,
sampler::ParallelHMC,
n_samples::Int;
initial_θ = nothing
```

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
-rng::AbstractRNG,
-model,
-sampler::ParallelHMC,
-n_samples::Int;
-initial_θ = nothing
+rng::AbstractRNG, model, sampler::ParallelHMC, n_samples::Int; initial_θ=nothing
```

end
```

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
----
+* * *
```

Comment on lines +479 to +485

[JuliaFormatter] reported by reviewdog 🐶

Suggested change (column alignment only; the table content is unchanged):

| Scenario | Recommended Method |
|:--------------------------------- |:------------------------------- |
| Low-D, need fastest convergence | DEER (full Jacobian) |
| High-D, memory-constrained | Stochastic quasi-DEER |
| HMC with many leapfrog steps | Block quasi-DEER on leapfrog |
| Very long chains | Add sliding window |
| Approximate samples OK | Early stopping |
| MALA with known Hessian structure | Quasi-DEER with preconditioning |

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
----
+* * *
```

Comment on lines +492 to +496

- **Paper**: Zoltowski et al., "Parallelizing MCMC Across the Sequence Length", NeurIPS 2025
- **Code**: https://github.com/lindermanlab/parallel-mcmc
- **DEER theory**: Lim et al. [7], Danieli et al. [6]
- **Quasi-DEER**: Gonzalez et al. [8]
- **Stochastic diagonal estimation**: Hutchinson [22], Bekas et al. [20]

[JuliaFormatter] reported by reviewdog 🐶 (suggested change is whitespace-only; the list content is unchanged)

Comment on lines +183 to +190

- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential

- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling

> Ensure correctness and performance

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

- [ ] **7.1 Unit Tests**
  - [ ] Parallel scan correctness (compare to sequential)
  - [ ] Jacobian computation accuracy
  - [ ] Hutchinson estimator convergence
- [ ] **7.2 Integration Tests**
  - [ ] DEER converges to sequential MALA trace
  - [ ] DEER converges to sequential HMC trace
  - [ ] Quasi-DEER matches DEER (eventually)
- [ ] **7.3 Statistical Validation**
  - [ ] Samples match target distribution (known distributions)
  - [ ] Compare sample quality metrics (ESS, R-hat) to sequential
- [ ] **7.4 Performance Benchmarks**
  - [ ] Wall-clock time vs sequential
  - [ ] Scaling with chain length T
  - [ ] Memory usage profiling

* * *

│ └── test_abstractmcmc.jl # ✅ 75 tests passing
```

---

[JuliaFormatter] reported by reviewdog 🐶

Suggested change:

```diff
----
+* * *
```

Comment on lines +222 to +223

- ForwardDiff.jl - Jacobian computation
- LinearAlgebra - Matrix operations

[JuliaFormatter] reported by reviewdog 🐶 (suggested change is whitespace-only; the list content is unchanged)

Comment on lines +226 to +228

- CUDA.jl - GPU acceleration
- Zygote.jl / Enzyme.jl - Alternative AD for JVPs
- ChainRulesCore.jl - Custom rrules for stop-gradient

[JuliaFormatter] reported by reviewdog 🐶 (suggested change is whitespace-only; the list content is unchanged)
