Parallelization is a useful feature of multi-core processors that improves computation time by running multiple independent processes simultaneously.

Due to how NIMBLE builds models and algorithms and automatically generates C++ code, parallelizating NIMBLE code requires some care. Naive approaches such as building a single model using standard parallelization packages in R to parallelize an MCMC are likely to fail, or worse, behave incorrectly without failing.

That said, parallelizing NIMBLE pipelines is possible and can be very convenient if done properly. The key consideration is to ensure that all NIMBLE execution, including model building, is conducted inside the parallelized code. This ensures that all models and algorithms are independent objects that don’t interfere with each other.

Note that we have work in progress to make parallelizing NIMBLE code easier and more powerful.

Parallelization in practice

In this example, we’ll use basic functionality from the parallel package to run MCMC in parallel chains.


First we create a cluster, specifying the number of cores we want the cluster to operate across.

this_cluster <- makeCluster(4)

(Side note: do not use makeForkCluster when parallelizing NIMBLE code; it will not work. makeCluster sets up a PSOCK cluster, which will work. If you really want to use forking, if you are able to set up separate directories for NIMBLE’s generated code for each worker, that should allow one to use forked processes. One option is to use unixtools::set.tempdir() and another is to use the dirName argument to compileNimble().)

We’ll set up a situation where we run four independent MCMC simulations. As part of this, we’ll also calculate WAIC, a model selection tool, from the combined chains. Note that calculating WAIC in this way only allows calculation of the default conditional WAIC and requires that all parent nodes of data nodes be monitored, as discussed further in help(calculateWAIC).

The key is to create a function that includes all the modeling steps and run that function in parallel. Note that myCode could have been created outside the parallelized code, but all the other steps need to be done inside the parallelized code.



# Simulate some data
myData <- rgamma(1000, shape = 0.4, rate = 0.8)

## nimble version 1.2.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit https://R-nimble.org.
## Note for advanced users who have written their own MCMC samplers:
##   As of version 0.13.0, NIMBLE's protocol for handling posterior
##   predictive nodes has changed in a way that could affect user-defined
##   samplers in some situations. Please see Section 15.5.1 of the User Manual.
## Attaching package: 'nimble'
## The following object is masked from 'package:stats':
##     simulate
myCode <- nimbleCode({
  a ~ dunif(0, 100)
  b ~ dnorm(0, 100)
  for (i in 1:length_y) {
    y[i] ~ dgamma(shape = a, rate = b)

# Create a function with all the needed code
run_MCMC_allcode <- function(seed, data, code, useWAIC = TRUE) {
  myModel <- nimbleModel(code = code,
                          data = list(y = data),
                          constants = list(length_y = 1000),
                          inits = list(a = 0.5, b = 0.5))
  CmyModel <- compileNimble(myModel)

    monitors <- myModel$getParents(myModel$getNodeNames(dataOnly = TRUE), stochOnly = TRUE)
  ## One may also wish to add additional monitors
  myMCMC <- buildMCMC(CmyModel, monitors = monitors)
  CmyMCMC <- compileNimble(myMCMC)
  results <- runMCMC(CmyMCMC, niter = 10000, setSeed = seed)

Now, we execute the desired code using parLapply, which is equivalent to lapply with each process running in parallel.

chain_output <- parLapply(cl = this_cluster, X = 1:4, 
                          fun = run_MCMC_allcode, 
                          data = myData, code = myCode,
                          useWAIC = useWAIC)

# It's good practice to close the cluster when you're done with it.

Alternatively, we could use foreach with the doParallel backend (via registerDoParallel, which uses makeCluster by default under the hood) to invoke run_MCMC_allcode in parallel.

We ran four independent MCMCs.

par(mfrow = c(2,2))
for (i in 1:4) {
  this_output <- chain_output[[i]]
  plot(this_output[,"b"], type = "l", ylab = 'b')

Calculating WAIC

We can calculate WAIC using NIMBLE’s off-line WAIC function, passing in the full matrix of samples from all the chains.

if(useWAIC) {
  ## We'll need a model object with which to do the WAIC calculation on the MCMC samples.
  myModel <- nimbleModel(code = myCode,
                         data = list(y = myData),
                         constants = list(length_y = 1000))
  CmyModel <- compileNimble(myModel)         # calculateWAIC needs compiled model to exist
  samples <- do.call(rbind, chain_output)    # single matrix of samples
  waic <- calculateWAIC(samples, myModel)
## Defining model
## Building model
## Setting data and initial values
## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.
## Checking model sizes and dimensions
##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## Calculating WAIC.
## nimbleList object of type waicNimbleList
## Field "WAIC":
## [1] -314.8691
## Field "lppd":
## [1] 159.4573
## Field "pWAIC":
## [1] 2.02273

Setting up initial values with parallelization

In the example above, we provided a different random number seed to use for each chain. However we didn’t start the chain from different (dispersed) starting values. That’s often a good idea as part of assessing convergence. To do so, you could generate random initial values inside the function that would then differ between the chains.

Alternatively, you could generate your initial values outside of the parallelization (either randomly or via some other approach) and pass in the initial values as an argument to run_MCMC_allcode. Here’s an example:

per_chain_info <- list(
  list(seed = 1,
       inits = list(a = 0.5, b = 0.5)),
  list(seed = 2,
       inits = list(a = 0.3, b = 0.3)),
  list(seed = 3,
       inits = list(a = 0.2, b = 0.6)),
  list(seed = 4,
       inits = list(a = 0.6, b = 0.2)))

run_MCMC_allcode <- function(info, data) {
  ## Remaining code as shown above, except modify the `nimbleModel`
  ## call to use `info$seed` and `info$inits`.
  ## ....
chain_output <- parLapply(cl = this_cluster, X = per_chain_info, 
                          fun = run_MCMC_allcode, 
                          data = myData)

# It's good practice to close the cluster when you're done with it.

Using user-defined distributions, functions, or samplers with parallelization

Note that NIMBLE generally expects user-defined distributions, functions, and samplers to be defined in the global environment. As discussed above, for parallelization, it is important to do all the model and MCMC building and compilation inside a function. Therefore, in order to achieve this but still have the user-defined elements in the global environment we recommend using assign to copy them into the global environment from within your function.

After defining your distribution, function, or sampler inside your overall function, here is how you can do that:

## for a user-defined function
assign('dfoo', dfoo, envir = .GlobalEnv)
assign('rfoo', rfoo, envir = .GlobalEnv)
## for a user-defined distribution
assign('myfun', myfun, envir = .GlobalEnv)
## for a user-defined sampler
assign('sampler_mySampler', sampler_mySampler, envir = .GlobalEnv)