I'm looking into the possibility of using DynamicHMC.jl for lattice model simulations. For such problems, the number of degrees of freedom ("parameters") should typically be as large as possible; O(10^6) is not uncommon.
Now, because of this large number of degrees of freedom, I was thinking of taking advantage of GPU processing to accelerate the log-density and gradient calculations to be fed to the HMC sampler. However, before I do any deep dive into the code base of DynamicHMC, I'd like to know if you think this is feasible...
As a starting point for discussion, here is some exploratory testing, which is where I got stuck:
using LogDensityProblems, DynamicHMC, DynamicHMC.Diagnostics
using Parameters, Statistics, Random
using Zygote, CuArrays
#Both value and gradient in same calculation:
function value_and_gradient(f, x...)
    value, back = Zygote.pullback(f, x...)
    grad = back(1)[1]
    return value, grad
end
#### Define the problem ####
struct LogNormalTest
    n::Int
end
(ℓ::LogNormalTest)(x) = -sum(x.^2)
LogDensityProblems.capabilities(::LogNormalTest) = LogDensityProblems.LogDensityOrder{1}()
LogDensityProblems.dimension(ℓ::LogNormalTest) = ℓ.n
LogDensityProblems.logdensity(ℓ::LogNormalTest, x) = ℓ(x)
LogDensityProblems.logdensity_and_gradient(ℓ::LogNormalTest, x) = value_and_gradient(ℓ, x)
#### Testing the problem ####
n = 100 #Should, ideally, be much larger...
ℓ = LogNormalTest(n)
x0 = randn(n) |> cu #Make a CuArray
LogDensityProblems.dimension(ℓ) #Works
LogDensityProblems.logdensity(ℓ, x0) #Works
LogDensityProblems.logdensity_and_gradient(ℓ, x0) #Works
#### HMC ####
# rng = Random.GLOBAL_RNG #Works with this, but only on the CPU..?
rng = CURAND.RNG() #Fails with this... But more is probably needed?
results = mcmc_with_warmup(rng, ℓ, 10) #Errors with "KernelError: passing and using non-bitstype argument"
(I'm on DynamicHMC v2.1.0, CuArrays v1.4.7, Zygote v0.4.1)
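One possible direction, sketched under assumptions and not tested against these package versions: leave the sampler state and the RNG on the CPU, and move each position to the GPU only inside the log-density call. The wrapper type `DeviceLogDensity` and its `to_device` field are hypothetical names introduced for illustration; they are not part of DynamicHMC or LogDensityProblems. With `to_device = identity` the sketch runs entirely on the CPU; passing `cu` is the assumed way to push the evaluation onto the GPU.

```julia
using LogDensityProblems, Zygote

# Hypothetical wrapper (not a DynamicHMC API): the sampler only ever sees
# plain CPU vectors; `to_device` moves a position to the accelerator.
struct DeviceLogDensity{F,D}
    f::F          # log-density, written with array operations only
    n::Int        # number of degrees of freedom
    to_device::D  # assumed: `cu` from CuArrays, or `identity` to stay on the CPU
end

LogDensityProblems.capabilities(::DeviceLogDensity) =
    LogDensityProblems.LogDensityOrder{1}()
LogDensityProblems.dimension(ℓ::DeviceLogDensity) = ℓ.n

# Value only: evaluate on the device, hand a Float64 scalar back to the caller.
LogDensityProblems.logdensity(ℓ::DeviceLogDensity, x) =
    Float64(ℓ.f(ℓ.to_device(x)))

function LogDensityProblems.logdensity_and_gradient(ℓ::DeviceLogDensity, x)
    xd = ℓ.to_device(x)                      # host -> device copy
    value, back = Zygote.pullback(ℓ.f, xd)   # forward pass on the device
    grad = back(one(value))[1]               # gradient still lives on the device
    # Copy back to the host and convert to the Float64 vector the caller passed in.
    return Float64(value), Vector{Float64}(Array(grad))
end

# CPU-only usage; swapping `identity` for `cu` is the assumed GPU variant.
ℓ_dev = DeviceLogDensity(x -> -sum(x .^ 2), 100, identity)
value, grad = LogDensityProblems.logdensity_and_gradient(ℓ_dev, randn(100))
```

With this layout the sampler would be called with `Random.GLOBAL_RNG` as in the commented-out line above, which may avoid the `KernelError`. The cost is one host-to-device and one device-to-host copy per gradient evaluation, so whether it pays off at O(10^6) degrees of freedom depends on how expensive the log-density is relative to those transfers. Note also that `cu` converts to `Float32` by default, so the log-density would be evaluated in single precision.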