Chapter 7 MCMC
NIMBLE provides a variety of paths to creating and executing an MCMC algorithm, which differ greatly in their simplicity of use, and also in the options available and customizability.
The most direct approach to invoking the MCMC engine is using the nimbleMCMC
function (Section 7.1). This one-line call creates and executes an MCMC, and provides a wide range of options for controlling the MCMC: specifying monitors, burn-in, and thinning, running multiple MCMC chains with different initial values, and returning posterior samples, summary statistics, and/or a WAIC value. However, this approach is restricted to using NIMBLE’s default MCMC algorithm; further customization of, for example, the specific samplers employed, is not possible.
The lengthier and more customizable approach to invoking the MCMC engine on a particular NIMBLE model object involves the following steps:
(Optional) Create and customize an MCMC configuration for a particular model:
Use
configureMCMC
to create an MCMC configuration (see Section 7.2). The configuration contains a list of samplers with the node(s) they will sample.(Optional) Customize the MCMC configuration:
- Add, remove, or re-order the list of samplers (Section 7.11 and
help(samplers)
in R for details), including adding your own samplers (Section 15.5); - Change the tuning parameters or adaptive properties of individual samplers;
- Change the variables to monitor (record for output) and thinning intervals for MCMC samples.
- Add, remove, or re-order the list of samplers (Section 7.11 and
Use
buildMCMC
to build the MCMC object and its samplers either from the model (using default MCMC configuration) or from a customized MCMC configuration (Section 7.3).Compile the MCMC object (and the model), unless one is debugging and wishes to run the uncompiled MCMC.
Run the MCMC and extract the samples (Sections 7.5, 7.6 and 7.7).
Optionally, calculate the WAIC (Section 7.8).
Prior to version 0.8.0, NIMBLE provided two additional functions, MCMCsuite
and compareMCMCs
, to facilitate comparison of multiple MCMC algorithms, either internal or external to NIMBLE. Those capabilities have been redesigned and moved into a separate package called compareMCMCs
.
End-to-end examples of MCMC in NIMBLE can be found in Sections 2.5-2.6 and Section 7.12.
7.1 One-line invocation of MCMC: nimbleMCMC
The most direct approach to executing an MCMC algorithm in NIMBLE is using nimbleMCMC
. This single function can be used to create an underlying model and associated MCMC algorithm, compile both of these, execute the MCMC, and return samples, summary statistics, and a WAIC value. This approach circumvents the longer (and more flexible) approach using nimbleModel
, configureMCMC
, buildMCMC
, compileNimble
, and runMCMC
, which is described subsequently.
The nimbleMCMC
function provides control over the:
- number of MCMC iterations in each chain;
- number of MCMC chains to execute;
- number of burn-in samples to discard from each chain;
- thinning interval on which samples should be recorded;
- model variables to monitor and return posterior samples;
- initial values, or a function for generating initial values for each chain;
- setting the random number seed;
- returning posterior samples as a matrix or a
coda
mcmc
object; - returning posterior summary statistics; and
- returning a WAIC value calculated using post-burn-in samples from all chains.
This entry point for using nimbleMCMC
is the code
, constants
, data
, and inits
arguments that are used for building a NIMBLE model (see Chapters 5 and 6). However, when using nimbleMCMC
, the inits
argument can also specify a list of lists of initial values that will be used for each MCMC chain, or a function that generates a list of initial values, which will be generated at the onset of each chain. As an alternative entry point, a NIMBLE model
object can also be supplied to nimbleMCMC
, in which case this model will be used to build the MCMC algorithm.
Based on its arguments, nimbleMCMC
optionally returns any combination of
- Posterior samples,
- Posterior summary statistics, and
- WAIC value.
The above are calculated and returned for each MCMC chain, using the post-burn-in and thinned samples. Additionally, posterior summary statistics are calculated for all chains combined when multiple chains are run.
Several example usages of nimbleMCMC
are shown below:
code <- nimbleCode({
mu ~ dnorm(0, sd = 1000)
sigma ~ dunif(0, 1000)
for(i in 1:10)
x[i] ~ dnorm(mu, sd = sigma)
})
data <- list(x = c(2, 5, 3, 4, 1, 0, 1, 3, 5, 3))
initsFunction <- function() list(mu = rnorm(1,0,1), sigma = runif(1,0,10))
# execute one MCMC chain, monitoring the "mu" and "sigma" variables,
# with thinning interval 10. fix the random number seed for reproducible
# results. by default, only returns posterior samples.
mcmc.out <- nimbleMCMC(code = code, data = data, inits = initsFunction,
monitors = c("mu", "sigma"), thin = 10,
niter = 20000, nchains = 1, setSeed = TRUE)
# note that the inits argument to nimbleModel must be a list of
# initial values, whereas nimbleMCMC can accept inits as a function
# for generating new initial values for each chain.
initsList <- initsFunction()
Rmodel <- nimbleModel(code, data = data, inits = initsList)
# using the existing Rmodel object, execute three MCMC chains with
# specified burn-in. return samples, summary statistics, and WAIC.
mcmc.out <- nimbleMCMC(model = Rmodel,
niter = 20000, nchains = 3, nburnin = 2000,
summary = TRUE, WAIC = TRUE)
# run ten chains, generating random initial values for each
# chain using the inits function specified above.
# only return summary statistics from each chain; not all the samples.
mcmc.out <- nimbleMCMC(model = Rmodel, nchains = 10, inits = initsFunction,
samples = FALSE, summary = TRUE)
See help(nimbleMCMC)
for further details.
7.2 The MCMC configuration
The MCMC configuration contains information needed for building an MCMC. When no customization is needed, one can jump directly to the buildMCMC
step below. An MCMC configuration is an object of class MCMCconf
, which includes:
- The model on which the MCMC will operate
- The model nodes which will be sampled (updated) by the MCMC
- The samplers and their internal configurations, called control parameters
- Two sets of variables that will be monitored (recorded) during execution of the MCMC and thinning intervals for how often each set will be recorded. Two sets are allowed because it can be useful to monitor different variables at different intervals
7.2.1 Default MCMC configuration
Assuming we have a model named Rmodel
, the following will generate a default MCMC configuration:
The default configuration will contain a single sampler for each node in the model, and the default ordering follows the topological ordering of the model.
7.2.1.1 Default assignment of sampler algorithms
The default sampler assigned to a stochastic node is determined by the following, in order of precedence:
- If the node has no data nodes in its entire downstream dependency network, a
posterior_predictive
sampler is assigned. This sampler updates the node in question and all downstream stochastic nodes by simulating new values from each node’s conditional distribution. As of version 0.13.0, the operation of theposterior_predictive
sampler has changed in order to improve MCMC mixing. See Section 7.2.1.2. - If the node has a conjugate relationship between its prior distribution and the distributions of its stochastic dependents, a
conjugate
(`Gibbs’) sampler is assigned. - If the node follows a multinomial distribution, then a
RW_multinomial
sampler is assigned. This is a discrete random-walk sampler in the space of multinomial outcomes. - If a node follows a Dirichlet distribution, then a
RW_dirichlet
sampler is assigned. This is a random walk sampler in the space of the simplex defined by the Dirichlet. - If a node follows an LKJ correlation distribution, then a
RW_block_lkj_corr_cholesky
sampler is assigned. This is a block random walk sampler in a transformed space where the transformation uses the signed stickbreaking approach described in Section 10.12 of Stan Development Team (2021b). - If the node follows any other multivariate distribution, then a
RW_block
sampler is assigned for all elements. This is a Metropolis-Hastings adaptive random-walk sampler with a multivariate normal proposal (Roberts and Sahu 1997). As of NIMBLE version 1.3.0, one can instead use thebarker
sampler if one setsnimbleOptions(MCMCuseBarkerAsDefaultMV = TRUE)
. - If the node is binary-valued (strictly taking values 0 or 1), then a
binary
sampler is assigned. This sampler calculates the conditional probability for both possible node values and draws the new node value from the conditional distribution, in effect making a Gibbs sampler. - If the node is otherwise discrete-valued, then a
slice
sampler is assigned (R. M. Neal 2003). - If none of the above criteria are satisfied, then a
RW
sampler is assigned. This is a Metropolis-Hastings adaptive random-walk sampler with a univariate normal proposal distribution.
Details of each sampler and its control parameters can be found by invoking help(samplers)
.
7.2.1.2 Sampling posterior predictive nodes
A posterior predictive node is a node that is not itself data and has no data nodes in its entire downstream (descendant) dependency network. Such nodes play no role in inference for model parameters but have often been included in BUGS models to accomplish posterior predictive checks and similar calculations.
As of version 0.13.0, NIMBLE’s handling of posterior predictive nodes in MCMC sampling has changed in order to improve MCMC mixing. Samplers for nodes that are not posterior predictive nodes no longer condition on the values of the posterior predictive nodes. This produces a valid MCMC over the posterior distribution marginalizing over the posterior predictive nodes. This MCMC will generally mix better than an MCMC that conditions on the values of posterior predictive nodes, by reducing the dimensionality of the parameter space and removing the dependence between the sampled nodes and the posterior predictive nodes. At the end of each MCMC iteration, the posterior predictive nodes are sampled by posterior_predictive
sampler(s) based on their conditional distribution(s).
7.2.1.3 Options to control default sampler assignments
Very basic control of default sampler assignments is provided via two arguments to configureMCMC
. The useConjugacy
argument controls whether conjugate samplers are assigned when possible, and the multivariateNodesAsScalars
argument controls whether scalar elements of multivariate nodes are sampled individually. See help(configureMCMC)
for usage details.
7.2.1.4 Default monitors
The default MCMC configuration includes monitors on all top-level stochastic nodes of the model. Only variables that are monitored will have their samples saved for use outside of the MCMC. MCMC configurations include two sets of monitors, each with different thinning intervals. By default, the second set of monitors (monitors2
) is empty.
7.2.1.5 Automated parameter blocking
The default configuration may be replaced by one generated from an automated parameter blocking algorithm. This algorithm determines groupings of model nodes that, when jointly sampled with a RW_block
sampler, increase overall MCMC efficiency. Overall efficiency is defined as the effective sample size of the slowest-mixing node divided by computation time. This is done by:
Note that this using autoBlock = TRUE
compiles and runs MCMCs, progressively exploring different sampler assignments, so it takes some time and generates some output. It is most useful for determining effective blocking strategies that can be re-used for later runs. The additional control argument autoIt
may also be provided to indicate the number of MCMC samples to be used in each trial of the automated blocking procedure (default 20,000).
7.2.2 Customizing the MCMC configuration
The MCMC configuration may be customized in a variety of ways, either through additional named arguments to configureMCMC
or by calling methods of an existing MCMCconf
object.
7.2.2.1 Controlling which nodes to sample
One can create an MCMC configuration with default samplers on just a particular set of nodes using the nodes
argument to configureMCMC
. The value for the nodes
argument may be a character vector containing node and/or variable names. In the case of a variable name, a default sampler will be added for all stochastic nodes in the variable. The order of samplers will match the order of nodes
. Any deterministic nodes will be ignored.
If a data node is included in nodes
, it will be assigned a sampler. This is the only way in which a default sampler may be placed on a data node and will result in overwriting data values in the node.
7.2.2.2 Creating an empty configuration
If you plan to customize the choice of all samplers, it can be useful to obtain a configuration with no sampler assignments at all. This can be done by any of nodes = NULL
, nodes = character()
, or nodes = list()
.
7.2.2.3 Overriding the default sampler control list values
The default values of control list elements for all sampling algorithms may be overridden through use of the control
argument to configureMCMC
, which should be a named list.
Named elements in the control
argument will be used for all default samplers and any subsequent sampler added via addSampler
(see below). For example, the following will create the default MCMC configuration, except all RW
samplers will have their initial scale
set to 3, and none of the samplers (RW
, or otherwise) will be adaptive.
When adding samplers to a configuration using addSampler
, the default control list can also be over-ridden.
7.2.2.4 Adding samplers to the configuration: addSampler
Additional samplers may be added to a configuration using the addSampler
method of the MCMCconf
object. The addSampler
method has two modes of operation, determined by the default
argument.
When default = TRUE
, addSampler
will assign NIMBLE’s default sampling algorithm for each node specified following the same protocol as configureMCMC
. This may include conjugate samplers, multivariate samplers, or otherwise, also using additional arguments to guide the selection process (for example, useConjugacy
and multivariateNodesAsScalars
). In this mode of operation, the type
argument is not used.
When default = FALSE
, or when this argument is omitted, addSampler
uses the type
argument to specify the precise sampler to assign. Instances of this particular sampler are assigned to all nodes specified. The type
argument may be provided as a character string or a nimbleFunction object. Valid character strings are indicated in help(samplers)
(do not include "sampler_"
). Added samplers can be labeled with a name
argument, which is used in output of printSamplers
. Writing a new sampler as a nimbleFunction is covered in Section 15.5.
Regardless of the mode of operation, nodes are specified using either the target
or the nodes
argument. The target
argument does not undergo expansion to constituent nodes (unless default = TRUE
), and thus only a single sampler is added. The nodes
argument is always expanded to the underlying nodes, and separate samplers are added for each node. Eithe argument is provided as a character vector. Newly added samplers will be appended to the end of current sampler list. Adding a sampler for a node will not remove existing samplers operating on that node.
The hierarchy of precedence for control list elements for added samplers is:
- The
control
list argument provided toaddSampler
; - The original
control
list argument provided toconfigureMCMC
; - The default values, as defined in the sampling algorithm
setup
function.
See help(addSampler)
for more details.
7.2.2.5 Printing, re-ordering, modifying and removing samplers: printSamplers, removeSamplers, setSamplers, and getSamplerDefinition
The current, ordered, list of all samplers in the MCMC configuration may be printed by calling the printSamplers
method. When you want to see only samplers acting on specific model nodes or variables, provide those names as an argument to printSamplers
. The printSamplers
method accepts arguments controlling the level of detail displayed as discussed in its R help information.
# Print all samplers
mcmcConf$printSamplers()
# Print all samplers operating on node "a[1]",
# or any of the "beta[]" variables
mcmcConf$printSamplers(c("a[1]", "beta"))
# Print all conjugate and slice samplers
mcmcConf$printSamplers(type = c("conjugate", "slice"))
# Print all RW samplers operating on "x"
mcmcConf$printSamplers("x", type = "RW")
# Print the first 100 samplers
mcmcConf$printSamplers(1:100)
# Print all samplers in their order of execution
mcmcConf$printSamplers(executionOrder = TRUE)
Samplers may be removed from the configuration object using removeSamplers
, which accepts a character vector of node or variable names, or a numeric vector of indices.
# Remove all samplers acting on "x" or any component of it
mcmcConf$removeSamplers("x")
# Remove all samplers acting on "alpha[1]" and "beta[1]"
mcmcConf$removeSamplers(c("alpha[1]", "beta[1]"))
# Remove the first five samplers
mcmcConf$removeSamplers(1:5)
# Providing no argument removes all samplers
mcmcConf$removeSamplers()
Samplers to retain may be specified reordered using setSamplers
, which also accepts a character vector of node or variable names, or a numeric vector of indices.
# Set the list of samplers to those acting on any components of the
# model variables "x", "y", or "z".
mcmcConf$setSamplers(c("x", "y", "z"))
# Set the list of samplers to only those acting on model nodes
# "alpha[1]", "alpha[2]", ..., "alpha[10]"
mcmcConf$setSamplers("alpha[1:10]")
# Truncate the current list of samplers to the first 10 and the 100th
mcmcConf$setSamplers(ind = c(1:10, 100))
The nimbleFunction definition underlying a particular sampler may be viewed using the getSamplerDefinition
method, using the sampler index as an argument. A node name argument may also be supplied, in which case the definition of the first sampler acting on that node is returned. In all cases, getSamplerDefinition
only returns the definition of the first sampler specified either by index or node name.
7.2.2.6 Customizing individual sampler configurations: getSamplers, setSamplers, setName, setSamplerFunction, setTarget, and setControl
Each sampler in an MCMCconf
object is represented by a sampler configuration as a samplerConf
object. Each samplerConf
is a reference class object containing the following (required) fields: name
(a character string), samplerFunction
(a valid nimbleFunction sampler), target
(the model node to be sampled), and control
(list of control arguments). The MCMCconf
method getSamplers
allows access to the samplerConf
objects. These can be modified and then passed as an argument to setSamplers
to over-write the current list of samplers in the MCMC configuration object. However, no checking of the validity of this modified list is performed; if the list of samplerConf objects is corrupted to be invalid, incorrect behavior will result at the time of calling buildMCMC
. The fields of a samplerConf
object can be modified using the access functions setName(name)
, setSamplerFunction(fun)
, setTarget(target, model)
, and setControl(control)
.
Here are some examples:
# retrieve samplerConf list
samplerConfList <- mcmcConf$getSamplers()
# change the name of the first sampler
samplerConfList[[1]]$setName("newNameForThisSampler")
# change the sampler function of the second sampler,
# assuming existance of a nimbleFunction 'anotherSamplerNF',
# which represents a valid nimbleFunction sampler.
samplerConfList[[2]]$setSamplerFunction(anotherSamplerNF)
# change the 'adaptive' element of the control list of the third sampler
control <- samplerConfList[[3]]$control
control$adaptive <- FALSE
samplerConfList[[3]]$setControl(control)
# change the target node of the fourth sampler
samplerConfList[[4]]$setTarget("y", model) # model argument required
# use this modified list of samplerConf objects in the MCMC configuration
mcmcConf$setSamplers(samplerConfList)
7.2.2.7 Customizing the sampler execution order
The ordering of sampler execution can be controlled as well. This allows for sampler functions to execute multiple times within a single MCMC iteration, or the execution of different sampler functions to be interleaved with one another.
The sampler execution order is set using the function setSamplerExecutionOrder
, and the current ordering of execution is retrieved using getSamplerExecutionOrder
. For example, assuming the MCMC configuration object mcmcConf
contains five samplers:
# first sampler to execute twice, in succession:
mcmcConf$setSamplerExecutionOrder(c(1, 1, 2, 3, 4, 5))
# first sampler to execute multiple times, interleaved:
mcmcConf$setSamplerExecutionOrder(c(1, 2, 1, 3, 1, 4, 1, 5))
# fourth sampler to execute 10 times, only
mcmcConf$setSamplerExecutionOrder(rep(4, 10))
# omitting the argument to setSamplerExecutionOrder()
# resets the ordering to each sampler executing once, sequentially
mcmcConf$setSamplerExecutionOrder()
# retrieve the current ordering of sampler execution
ordering <- mcmcConf$getSamplerExecutionOrder()
# print the sampler functions in the order of execution
mcmcConf$printSamplers(executionOrder = TRUE)
7.2.2.8 Monitors and thinning intervals: printMonitors, getMonitors, setMonitors, addMonitors, resetMonitors and setThin
An MCMC configuration object contains two independent sets of variables to monitor, each with their own thinning interval: thin
corresponding to monitors
, and thin2
corresponding to monitors2
. Monitors operate at the variable level. Only entire model variables may be monitored. Specifying a monitor on a node, e.g., x[1]
, will result in the entire variable x
being monitored.
The variables specified in monitors
and monitors2
will be recorded (with thinning interval thin
) into objects called mvSamples
and mvSamples2
, contained within the MCMC object. These are both modelValues objects; modelValues are NIMBLE data structures used to store multiple sets of values of model variables18. These can be accessed as the member data mvSamples
and mvSamples2
of the MCMC object, and they can be converted to matrices using as.matrix
or lists using as.list
(see Section 7.7).
Monitors may be added to the MCMC configuration either in the original call to configureMCMC
or using the addMonitors
method:
# Using an argument to configureMCMC
mcmcConf <- configureMCMC(Rmodel, monitors = c("alpha", "beta"),
monitors2 = "x")
# Calling a member method of the mcmcconf object
# This results in the same monitors as above
mcmcConf$addMonitors("alpha", "beta")
mcmcConf$addMonitors2("x")
A new set of monitor variables can be added to the MCMC configuration, overwriting the current monitors, using the setMonitors
method:
Similarly, either thinning interval may be set at either step:
# Using an argument to configureMCMC
mcmcConf <- configureMCMC(Rmodel, thin = 1, thin2 = 100)
# Calling a member method of the mcmcConf object
# This results in the same thinning intervals as above
mcmcConf$setThin(1)
mcmcConf$setThin2(100)
The current lists of monitors and thinning intervals may be displayed using the printMonitors
method. Both sets of monitors (monitors
and monitors2
) may be reset to empty character vectors by calling the resetMonitors
method. The methods getMonitors
and getMonitors2
return the currently specified monitors
and monitors2
as character vectors.
7.2.2.9 Monitoring model log-probabilities
To record model log-probabilities from an MCMC, one can add monitors for logProb variables (which begin with the prefix logProb_
) that correspond to variables with (any) stochastic nodes. For example, to record and extract log-probabilities for the variables alpha
, sigma_mu
, and Y
:
mcmcConf <- configureMCMC(Rmodel, enableWAIC = TRUE)
mcmcConf$addMonitors("logProb_alpha", "logProb_sigma_mu", "logProb_Y")
Rmcmc <- buildMCMC(mcmcConf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)
Cmcmc$run(10000)
samples <- as.matrix(Cmcmc$mvSamples)
The samples
matrix will contain both MCMC samples and model log-probabilities.
7.2.2.10 Using numeric samples to define a prior distribution
A set of numeric samples, perhaps generated from another MCMC algorithm, can be used to define the prior distribution of model nodes. This is accomplished using the prior_samples
MCMC sampler. When assigning the prior_samples
sampler to a target
node, you must also provide a vector of numeric values (when target
is a scalar node), or a matrix of values in the multidimenional case. A new value (or rows of values) is selected either sequentially (or randomly) from this numeric vector/matrix, and assigned into the target
node on each MCMC iteration. See help(prior_samples)
for more details:
7.3 Building and compiling the MCMC
Once the MCMC configuration object has been created, and customized to one’s liking, it may be used to build an MCMC function:
buildMCMC
is a nimbleFunction. The returned object Rmcmc
is an instance of the nimbleFunction specific to configuration mcmcConf
(and of course its associated model).
Note that if you would like to be able to calculate the WAIC of the model, you should usually set enableWAIC = TRUE
as an argument toconfigureMCMC
(or to buildMCMC
if not using configureMCMC
), or set nimbleOptions(MCMCenableWAIC = TRUE)
, which will enable WAIC calculations for all subsequently built MCMC functions. For more information on WAIC calculations, including situations in which you can calculate WAIC without having set enableWAIC = TRUE
see Section 7.8 or help(waic)
in R.
When no customization is needed, one can skip configureMCMC
and simply provide a model object to buildMCMC
. The following two MCMC functions will be identical:
mcmcConf <- configureMCMC(Rmodel) # default MCMC configuration
Rmcmc1 <- buildMCMC(mcmcConf)
Rmcmc2 <- buildMCMC(Rmodel) # uses the default configuration for Rmodel
For speed of execution, we usually want to compile the MCMC function to C++ (as is the case for other NIMBLE functions). To do so, we use compileNimble
. If the model has already been compiled, it should be provided as the project
argument so the MCMC will be part of the same compiled project. A typical compilation call looks like:
Alternatively, if the model has not already been compiled, they can be compiled together in one line:
Note that if you compile the MCMC with another object (the model in this case), you’ll need to explicitly refer to the MCMC component of the resulting object to be able to run the MCMC:
7.4 Initializing MCMC
To see how to provide initial values, see help(runMCMC)
, help(nimbleMCMC)
, or help(nimbleModel)
.
It is often important to provide valid and reasonable initial values to an MCMC, either as fixed values or via an initialization function.
Not doing so can cause slow convergence or even failure of an MCMC algorithm (e.g., if the values the model is initialized with are not valid). When starting an MCMC, when NIMBLE encounters a missing parameter value, it simulates from the prior distribution. NIMBLE can be more sensitive to missing or bad starting values than some MCMC packages.
The following cases are particularly important to consider when initializing:
- In a model with flat priors (i.e., using
x~dflat()
orx~dhalfflat()
), NIMBLE cannot generate initial values from those priors. - In a model with diffuse priors, initializing from the prior can give unreasonable/extreme initial values.
- In a model with stochastic indices (e.g.,
x[idx[i]]
withidx[i]
unknown), those indices should have (valid) initial values. - In a model with constraints (via
dconstraint
), the model should be initialized such that the constraints are satisfied. - In a model with censoring (via
dinterval
), the initial value(s) oft
indinterval
for the censored node(s) should be consistent with the censoring that is specified.
That said, some MCMC algorithms (such a conjugate samplers) don’t use the initial values, so flat or diffuse priors on nodes assigned such samplers will not be a problem with regard to initialization.
Many user questions and problems end up being related to missing or bad initial values. Some suggestions for diagnosing initialization problems include:
- Run
model$initializeInfo()
to find uninitialized nodes. - Run
model$calculate()
to see if the full model log probability density can be calculated (values of-Inf
,NA
,NaN
indicate there are missing or invalid initial values). - To debug the specific node(s) whose initial value(s) are causing problems, run
model$calculate(nodes)
on specificnodes
to find their log probability density values, looking look for values of-Inf
,NA
, orNaN
. - For the node(s) with invalid log probability density, inspect the values of the node(s) and parameter(s) of the distribution(s) assigned to those node(s). You can inspect values of a variable in the model using natural R syntax,
model$variable_name
. - With a compiled or uncompiled model, you can run the model initialization (but no MCMC iterations) with
mcmc$run(niter = 0)
. Then inspect the values of model variables to see the results of initialization.
7.5 User-friendly execution of MCMC algorithms: runMCMC
Once an MCMC algorithm has been created using buildMCMC
, the function runMCMC
can be used to run multiple chains and extract posterior samples, summary statistics and/or a WAIC value. This is a simpler approach to executing an MCMC algorithm, than the process of executing and extracting samples as described in Sections 7.6 and 7.7.
runMCMC
also provides several user-friendly options such as burn-in, thinning, running multiple chains, and different initial values for each chain. However, using runMCMC
does not support several lower-level options, such as timing the individual samplers internal to the MCMC, continuing an existing MCMC run (picking up where it left off), or modifying the sampler execution ordering.
runMCMC
takes arguments that will control the following aspects of the MCMC:
- Number of iterations in each chain;
- Number of chains;
- Number of burn-in samples to discard from each chain;
- Thinning interval for recording samples;
- Initial values, or a function for generating initial values for each chain;
- Setting the random number seed;
- Returning the posterior samples as a
coda
mcmc
object; - Returning summary statistics calculated from each chains; and
- Returning a WAIC value calculated using (post-burn-in) samples from all chains.
The following examples demonstrate some uses of runMCMC
, and assume the existence of Cmcmc
, a compiled MCMC algorithm.
# run a single chain, and return a matrix of samples
mcmc.out <- runMCMC(Cmcmc)
# run three chains of 10000 samples, discard initial burn-in of 1000,
# record samples thereafter using a thinning interval of 10,
# and return of list of sample matrices
mcmc.out <- runMCMC(Cmcmc, niter=10000, nburnin=1000, thin=10, nchains=3)
# run three chains, returning posterior samples, summary statistics,
# and the WAIC value
mcmc.out <- runMCMC(Cmcmc, nchains = 3, summary = TRUE, WAIC = TRUE)
# run two chains, and specify the initial values for each
initsList <- list(list(mu = 1, sigma = 1),
list(mu = 2, sigma = 10))
mcmc.out <- runMCMC(Cmcmc, nchains = 2, inits = initsList)
# run ten chains of 100,000 iterations each, using a function to
# generate initial values and a fixed random number seed for each chain.
# only return summary statistics from each chain; not all the samples.
initsFunction <- function()
list(mu = rnorm(1,0,1), sigma = runif(1,0,100))
mcmc.out <- runMCMC(Cmcmc, niter = 100000, nchains = 10,
inits = initsFunction, setSeed = TRUE,
samples = FALSE, summary = TRUE)
See help(runMCMC)
for further details.
7.6 Running the MCMC
The MCMC algorithm (either the compiled or uncompiled version) can be executed using the member method mcmc$run
(see help(buildMCMC)
in R). The run
method has one required argument, niter
, the number of iterations to be run.
The run
method has optional arguments nburnin
, thin
and thin2
. These can be used to specify the number of pre-thinning burn-in samples to discard, and the post-burnin thinning intervals for recording samples (corresponding to monitors
and monitors2
). If either thin
and thin2
are provided, they will override the thinning intervals that were specified in the original MCMC configuration object.
7.6.1 Rerunning versus restarting an MCMC
The run
method has an optional reset
argument. When reset = TRUE
(the default value), the following occurs prior to running the MCMC:
- All model nodes are checked and filled or updated as needed, in valid (topological) order. If a stochastic node is missing a value, it is populated using a call to
simulate
and its log probability value is calculated. The values of deterministic nodes are calculated from their parent nodes. If any right-hand-side-only nodes (e.g., explanatory variables) are missing a value, an error results. - All MCMC sampler functions are reset to their initial state: the initial values of any sampler control parameters (e.g.,
scale
,sliceWidth
, orpropCov
) are reset to their initial values, as were specified by the original MCMC configuration. - The internal modelValues objects
mvSamples
andmvSamples2
are each resized to the appropriate length for holding the requested number of samples (niter/thin
, andniter/thin2
, respectively).
This means that one can begin a new run of an existing MCMC without having to rebuild or recompile the model or the MCMC. This can be helpful if one wants to use the same model and MCMC configuration, but with different initial values, different values of data nodes (but which nodes are data nodes must be the same), changes to covariate values(or other non-data, non-parameter values) in the model, or a different number of MCMC iterations, thinning interval or burn-in.
In contrast, when mcmc$run(niter, reset = FALSE)
is called, the MCMC picks up from where it left off, continuing the previous chain and expanding the output as needed. No values in the model are checked or altered, and sampler functions are not reset to their initial states. (Similarly, if using WAIC, one can use resetWAIC = FALSE
so that the WAIC calculation is based on all the samples from the expanded set of samples.)
The run
method also has an optional resetMV
argument. This argument is only considered when'reset
is set to FALSE
. When mcmc$run(niter, reset = FALSE, resetMV = TRUE)
is called the internal modelValues objects mvSamples
and mvSamples2
are each resized to the appropriate length for holding the requested number of samples (niter/thin
, and niter/thin2
, respectively) and the MCMC carries on from where it left off. In other words, the previously obtained samples are deleted (e.g. to reduce memory usage) prior to continuing the MCMC. The default value of resetMV
is FALSE
.
7.6.2 Measuring sampler computation times: getTimes
If you want to obtain the computation time spent in each sampler, you can set time=TRUE
as a run-time argument and then use the method getTimes()
obtain the times. For example,
will return a vector of the total time spent in each sampler, measured in seconds.
7.6.3 Assessing the adaption process of RW and RW_block samplers
If you’d like to see the evolution (over the iterations) of the acceptance proportion and proposal scale information, you can use some internal methods provided by NIMBLE, after setting two options to make the history accessible. Here we assume that cMCMC
is the compiled MCMC object and idx
is the numeric index of the sampler function you want to access from amongst the list of sampler functions that are part of the MCMC.
## set options to make history accessible
nimbleOptions(buildInterfacesForCompiledNestedNimbleFunctions = TRUE)
nimbleOptions(MCMCsaveHistory = TRUE)
## Next, set up and run your MCMC.
## Now access the history information:
Cmcmc$samplerFunctions[[idx]]$getScaleHistory()
Cmcmc$samplerFunctions[[idx]]$getAcceptanceHistory()
Cmcmc$samplerFunctions[[idx]]$getPropCovHistory() ## only for RW_block
Note that modifying elements of the control list may greatly
affect the performance of the RW_block
sampler. In particular, the sampler
can take a long time to find a good proposal covariance when the
elements being sampled are not on the same scale. We recommend
providing an informed value for ‘propCov’ in this case (possibly
simply a diagonal matrix that approximates the relative scales),
as well as possibly providing a value of ‘scale’ that errs on the
side of being too small. You may also consider decreasing
‘adaptFactorExponent’ and/or ‘adaptInterval’, as doing so has
greatly improved performance in some cases.
7.7 Extracting MCMC samples
After executing the MCMC, the output samples can be extracted as follows:
These modelValues objects can be converted into matrices using as.matrix
or lists using as.list
:
samplesMatrix <- as.matrix(mvSamples)
samplesList <- as.list(mvSamples)
samplesMatrix2 <- as.matrix(mvSamples2)
samplesList2 <- as.list(mvSamples2)
The column names of the matrices will be the node names of nodes in the monitored variables. Then, for example, the mean of the samples for node x[2]
could be calculated as:
The list version will contain an element for each variable that will be the size and shape of the variable with an additional index for MCMC iteration. By default MCMC iteration will be the first index, but including iterationAsLastIndex = TRUE
will make it the last index.
Obtaining samples as matrices or lists is most common, but see Section 14.1 for more about programming with modelValues objects, especially if you want to write nimbleFunctions to use the samples.
7.8 Calculating WAIC
The WAIC (Watanabe 2010) can be calculated from the posterior samples produced during the MCMC algorithm. Users have two options to calculate WAIC. The main approach requires enabling WAIC when setting up the MCMC and allows access to a variety of ways to calculate WAIC. This approach does not require that any specific monitors be set19 because the WAIC is calculated in an online manner, accumulating the necessary quantities to calculate WAIC as the MCMC is run.
The second approach allows one to calculate the WAIC after an MCMC has been run using an MCMC object or matrix (or dataframe) containing the posterior samples, but it requires the user to have correctly specified the monitored variables and only provides the default way to calculate WAIC. An advantage of the second approach is that one can specify additional burnin beyond that specified for the MCMC.
Here we first discuss the main approach and then close the section by showing the second approach. Specific details of the syntax are provided in help(waic)
.
To enable WAIC for the first approach, the argument enableWAIC = TRUE
must be supplied to configureMCMC
(or to buildMCMC
if not using configureWAIC
), or the MCMCenableWAIC
NIMBLE option must have been set to TRUE
.
The WAIC (as well as the pWAIC and lppd values) is extracted by the member method mcmc$getWAIC
(see help(waic)
in R for more details) or is available as the WAIC
element of the runMCMC
or nimbleMCMC
output lists. One can use the member method mcmc$getWAICdetails
for additional quantities related to WAIC that are discussed below.
Note that there is not a unique value of WAIC for a model. WAIC is calculated from Equations 5, 12, and 13 in Gelman, Hwang, and Vehtari (2014) (i.e., using WAIC2), as discussed in detail in Hug and Paciorek (2021). Therefore, NIMBLE provides user control over how WAIC is calculated in two ways.
First, by default NIMBLE provides the conditional WAIC, namely the version of WAIC where all parameters directly involved in the likelihood are treated as \(\theta\) for the purposes of Equation 5 from Gelman, Hwang, and Vehtari (2014). Users can request the marginal WAIC (see Ariyo et al. (2020)), namely the version of WAIC where latent variables are integrated over (i.e., using a marginal likelihood). This is done by providing the waicControl
list with a marginalizeNodes
element to configureMCMC
or buildMCMC
(when providing a model as the argument to buildMCMC
). See help(waic)
for more details.
Second, WAIC relies on a partition of the observations, i.e., ‘pointwise’ prediction. By default, in NIMBLE the sum over log pointwise predictive density values treats each data node as contributing a single value to the sum. When a data node is multivariate, that data node contributes a single value to the sum based on the joint density of the elements in the node. If one wants to group data nodes such that the joint density within each group is used, one can provide the waicControl
list with a dataGroups
element to configureMCMC
or buildMCMC
(when providing a model as the argument to buildMCMC
). See help(waic)
for more details.
Note that based on a limited set of simulation experiments in Hug and Paciorek (2021), our tentative recommendation is that users only use marginal WAIC if also using grouping.
Marginal WAIC requires using Monte Carlo simulation at each iteration of the MCMC to average over draws for the latent variables. To assess the stability of the marginal WAIC to the number of Monte Carlo iterations, one can examine how the WAIC changes with increasing iterations (up to the full number of iterations specified via the nItsMarginal
element of the waicControl
list) based on the WAIC_partialMC
, lppd_partialMC
, and pWAIC_partialMC
elements of the detailed WAIC output.
For comparing WAIC between two models, Vehtari, Gelman, and Gabry (2017) discuss using the per-observation (more generally, per-data group) contributions to the overall WAIC values to get an approximate standard error for the difference in WAIC between the models. These element-wise values are available as the WAIC_elements
, lppd_elements
, and pWAIC_elements
components of the detailed WAIC output.
The second overall approach to WAIC available in NIMBLE allows one to calculate WAIC post hoc after MCMC sampling using an MCMC object or matrix (or dataframe) containing posterior samples. Simply set up the MCMC and run it without enabling WAIC (but making sure to include in the monitors all stochastic parent nodes of the data nodes) and then use the function calculateWAIC
, as shown in this example:
samples <- runMCMC(Cmcmc, niter = 10000, nburnin = 1000)
## Using calculateWAIC with an MCMC object
calculateWAIC(Cmcmc)
## Using calculateWAIC with a matrix of samples
calculateWAIC(samples, model)
## Specifying additional burnin, so only last 5000 (1000+4000) iterations used
calculateWAIC(Cmcmc, burnin = 4000)
This approach only provides conditional WAIC without any grouping of data nodes (though one can achieve grouping by grouping data nodes into multivariate nodes).
7.9 k-fold cross-validation
The runCrossValidate
function in NIMBLE performs k-fold cross-validation on a nimbleModel
fit via MCMC. More information can be found by calling help(runCrossValidate)
.
7.10 Variable selection using Reversible Jump MCMC
A common method for Bayesian variable selection in regression-style problems is to define a space of different models and have MCMC sample over the models as well as the parameters for each model. The models differ in which explanatory variables are included. Often this idea is implemented in the BUGS language by writing the largest possible model and including indicator variables to turn regression parameters off and on (see model code below for an example of how indicator variables can be used). Foramally, that approach doesn’t sample between different models, instead embedding the models in one large model. However, that approach can result in poor mixing and require compromises in choices of prior distributions because a given regression parameter will sample from its prior when the corresponding indicator is 0. It is also computationally wasteful, because sampling effort will be spent on a coefficient even if it has no effect because the corresponding indicator is 0.
A different view of the problem is that the different combinations of coefficients represent models of different dimensions. Reversible Jump MCMC (Green 1995) (RJMCMC) is a general framework for MCMC simulation in which the dimension of the parameter space can vary between iterates of the Markov chain. The reversible jump sampler can be viewed as an extension of the Metropolis-Hastings algorithm onto more general state spaces. NIMBLE provides an implementation of RJMCMC for variable selection that requires the user to write the largest model of interest and supports use of indicator variables but formally uses RJMCMC sampling for better mixing and efficiency. When a coefficient is not in the model (or its indicator is 0), it will not be sampled, and it will therefore not follow its prior in that case.
In technical detail, given two models \(M_1\) and \(M_2\) of possibly different dimensions, the core idea of RJMCMC is to remove the difference in the dimensions of models \(M_1\) and \(M_2\) by supplementing the corresponding parameters \(\boldsymbol{\theta_1}\) and \(\boldsymbol{\theta_2}\) with auxiliary variables \(\boldsymbol{u}_{1 \rightarrow 2}\) and \(\boldsymbol{u}_{2 \rightarrow 1}\) such that \((\boldsymbol{\theta_1}, \boldsymbol{u}_{1 \rightarrow 2})\) and \((\boldsymbol{\theta_2}, \boldsymbol{u}_{2 \rightarrow 1})\) are in bijection, \((\boldsymbol{\theta_2}, \boldsymbol{u}_{2 \rightarrow 1}) = \Psi(\boldsymbol{\theta_1}, \boldsymbol{u}_{1 \rightarrow 2})\). The corresponding Metropolis-Hastings acceptance probability is generalized accounting for the proposal density for the auxiliary variables.
NIMBLE implements RJMCMC for variable selection using a univariate normal distribution centered on 0 (or some fixed value) as the proposal density for parameters being added to the model. Two ways to set up models for RJMCMC are supported, which differ by whether the inclusion probabilities for each parameter are assumed known or must be written in the model:
- If the inclusion probabilities are assumed known, then RJMCMC may be used with regular model code, i.e. model code written without heed to variable selection.
- If the inclusion probability is a parameter in the model, perhaps with its own prior, then RJMCMC requires that indicator variables be written in the model code. The indicator variables will be sampled using RJMCMC, but otherwise they are like any other nodes in the model.
The steps to set up variable selection using RJMCMC are:
- Write the model with indicator variables if needed.
- Configure the MCMC as usual with
configureMCMC
. - Modify the resulting MCMC configuration object with
configureRJ
.
The configureRJ
function modifies the MCMC configuration to (1) assign samplers that turn on and off variable in the model and (2) modify the existing samplers for the regression coefficients to use ‘toggled’ versions. The toggled versions invoke the samplers only when corresponding variable is currently in the model. In the case where indicator variables are included in the model, sampling to turn on and off variables is done using RJ_indicator
samplers. In the case where indicator variables are not included, sampling is done using RJ_fixed_prior
samplers.
In the following we provide an example for the two different model specifications.
7.10.1 Using indicator variables
Here we consider a normal linear regression in which two covariates x1
and x2
are candidates to be included in the model. The two corresponding coefficients are beta1
and beta2
. The indicator variables are z1
and z2
, and their inclusion probability is psi
. As described below, one can also use vectors for a set of coefficients and corresponding indicator variables.
## Linear regression with intercept and two covariates
code <- nimbleCode({
beta0 ~ dnorm(0, sd = 100)
beta1 ~ dnorm(0, sd = 100)
beta2 ~ dnorm(0, sd = 100)
sigma ~ dunif(0, 100)
z1 ~ dbern(psi) ## indicator variable associated with beta1
z2 ~ dbern(psi) ## indicator variable associated with beta2
psi ~ dbeta(1, 1) ## hyperprior on inclusion probability
for(i in 1:N) {
Ypred[i] <- beta0 + beta1 * z1 * x1[i] + beta2 * z2 * x2[i]
Y[i] ~ dnorm(Ypred[i], sd = sigma)
}
})
## simulate some data
set.seed(1)
N <- 100
x1 <- runif(N, -1, 1)
x2 <- runif(N, -1, 1) ## this covariate is not included
Y <- rnorm(N, 1.5 + 2 * x1, sd = 1)
## build the model and configure default MCMC
RJexampleModel <- nimbleModel(code, constants = list(N = N),
data = list(Y = Y, x1 = x1, x2 = x2),
inits = list(beta0 = 0, beta1 = 0, beta2 = 0,
sigma = sd(Y), z2 = 1, z1 = 1, psi = 0.5))
RJexampleConf <- configureMCMC(RJexampleModel)
## [1] conjugate_dnorm_dnorm_additive sampler: beta0
## [2] conjugate_dnorm_dnorm_linear sampler: beta1
## [3] conjugate_dnorm_dnorm_linear sampler: beta2
## [4] RW sampler: sigma
## [5] conjugate_dbeta_dbern_identity sampler: psi
## [6] binary sampler: z1
## [7] binary sampler: z2
At this point we can modify the MCMC configuration object, RJexampleConf
, to use reversible jump samplers for selection on beta1
and beta2
.
configureRJ(conf = RJexampleConf,
targetNodes = c("beta1", "beta2"),
indicatorNodes = c('z1', 'z2'),
control = list(mean = c(0, 0), scale = 2))
The targetNodes
argument gives the coefficients (nodes) for which we want to do variable selection. The indicatorNodes
gives the corresponding indicator nodes, ordered to match targetNodes
. The control
list gives the means and scale (standard deviation) for normal reversible jump proposals for targetNodes
. The means will typically be 0 (by default mean = 0
and scale = 1
), but they could be anything. An optional control
element called fixedValue
can be provided in the non-indicator setting; this gives the value taken by nodes in targetNodes
when they are out of the model (by default, fixedValue
is 0).
All control
elements can be scalars or vectors. A scalar values will be used for all targetNodes
. A vector value must be of equal length as targetNodes
and will be used in order.
To use RJMCMC on a vector of coefficients with a corresponding vector of indicator variables, simply provide the variable names for targetNodes
and indicatorNodes
. For example, targetNodes = "beta"
is equivalent to targetNodes = c("beta[1]", "beta[2]")
and indicatorNodes = "z"
is equivalent to indicatorNodes = c("z[1]", "z[2]")
. Expansion of variable names into a vector of node names will occur as described in Section 13.3.1.1. When using this method, both arguments must be provided as variable names to be expanded.
Next we can see the result of configureRJ
by printing the modified list of samplers:
## [1] conjugate_dnorm_dnorm_additive sampler: beta0
## [2] RW sampler: sigma
## [3] conjugate_dbeta_dbern_identity sampler: psi
## [4] RJ_indicator sampler: z1, mean: 0, scale: 2, targetNode: beta1
## [5] RJ_toggled sampler: beta1, samplerType: conjugate_dnorm_dnorm_linear
## [6] RJ_indicator sampler: z2, mean: 0, scale: 2, targetNode: beta2
## [7] RJ_toggled sampler: beta2, samplerType: conjugate_dnorm_dnorm_linear
An RJ_indicator
sampler was assigned to each of z[1]
and z[2]
in place of the binary
sampler, while the samplers for beta[1]
and beta[2]
have been changed to RJ_toggled
samplers. The latter samplers contain the original samplers, in this case conjugate_dnorm_dnorm
samplers, and use them only when the corresponding indicator variable is equal to \(1\) (i.e., when the coefficient is in the model).
Notice that the order of the samplers has changed, since configureRJ
calls removeSampler
for nodes in targetNodes
and indicatorNodes
, and subsequently addSampler
, which appends the sampler to the end of current sampler list. Order can be modified by using setSamplers
.
Also note that configureRJ
modifies the MCMC configuration (first argument) and returns NULL
.
7.10.2 Without indicator variables
We consider the same regression setting, but without the use of indicator variables and with fixed probabilities of including each coefficient in the model.
## Linear regression with intercept and two covariates
code <- nimbleCode({
beta0 ~ dnorm(0, sd = 100)
beta1 ~ dnorm(0, sd = 100)
beta2 ~ dnorm(0, sd = 100)
sigma ~ dunif(0, 100)
for(i in 1:N) {
Ypred[i] <- beta0 + beta1 * x1[i] + beta2 * x2[i]
Y[i] ~ dnorm(Ypred[i], sd = sigma)
}
})
## build the model
RJexampleModel2 <- nimbleModel(code, constants = list(N = N),
data = list(Y = Y, x1 = x1, x2 = x2),
inits = list(beta0 = 0, beta1 = 0, beta2 = 0,
sigma = sd(Y)))
RJexampleConf2 <- configureMCMC(RJexampleModel2)
## [1] conjugate_dnorm_dnorm_additive sampler: beta0
## [2] conjugate_dnorm_dnorm_linear sampler: beta1
## [3] conjugate_dnorm_dnorm_linear sampler: beta2
## [4] RW sampler: sigma
In this case, since there are no indicator variables, we need to pass to configureRJ
the prior inclusion probabilities for each node in targetNodes
, by specifying either one common value or a vector of values for the argument priorProb
. This case does not allow for a stochastic prior.
configureRJ(conf = RJexampleConf2,
targetNodes = c("beta1", "beta2"),
priorProb = 0.5,
control = list(mean = 0, scale = 2, fixedValue = c(1.5, 0)))
## [1] conjugate_dnorm_dnorm_additive sampler: beta0
## [2] RW sampler: sigma
## [3] RJ_fixed_prior sampler: beta1, priorProb: 0.5, mean: 0, scale: 2, fixedValue: 1.5
## [4] RJ_toggled sampler: beta1, samplerType: conjugate_dnorm_dnorm_linear, fixedValue: 1.5
## [5] RJ_fixed_prior sampler: beta2, priorProb: 0.5, mean: 0, scale: 2, fixedValue: 0
## [6] RJ_toggled sampler: beta2, samplerType: conjugate_dnorm_dnorm_linear, fixedValue: 0
Since there are no indicator variables, the RJ_fixed_prior
sampler is assigned directly to each of beta[1]
and beta[2]
along with the RJ_toggled
sampler. The former sets a coefficient to its fixedValue
when it is out of the model. The latter invokes the regular sampler for the coefficient only if it is in the model at a given iteration.
If fixedValue
is given when using indicatorNodes
the values provided in fixedValue
are ignored. However the same behavior can be obtained in this situation, using a different model specification. For example, the model in 7.10.1 can be modified to have beta1
equal to \(1.5\) when not in the model as follows:
7.11 Samplers provided with NIMBLE
NIMBLE provides a variety of other commonly-used samplers, some for general-purpose use and some for specialized use with particular distributions or situations. Some that are worth particular consideration are:
- the Hamiltonian Monte Carlo (HMC) sampler in the
nimbleHMC
package, - the slice sampler for sampling scalar parameters in place of the default
RW
(Metropolis) sampler, - the Barker sampler for sampling multiple parameters in place of the default
RW_block
(block Metropolis) sampler, - the noncentered sampler (
noncentered
) for jointly sampling random effects with their hyperparameters, - the Pólya-gamma sampler (
polyagamma
) for sampling regression coefficients in logistic regression specifications, and - the reversible-jump sampler for selection of coefficients in regression specifications.
Most documentation of MCMC samplers provided with NIMBLE can be found by invoking help(samplers)
in R.
In the remainder of this section, we provide additional explanation of conjugate samplers, the HMC sampler in the nimbleHMC
package, and how complete customization can be achieved by making a sampler use an arbitrary log-likelihood function, such as to build a particle MCMC algorithm.
7.11.1 Conjugate (‘Gibbs’) samplers
By default, configureMCMC()
and buildMCMC()
will assign conjugate samplers to all nodes satisfying a conjugate relationship, unless the option useConjugacy = FALSE
is specified.
The current release of NIMBLE supports conjugate sampling of the relationships listed in Table 7.120.
Prior Distribution | Sampling (Dependent Node) Distribution | Parameter |
---|---|---|
Beta | Bernoulli | prob |
Binomial | prob |
|
Negative Binomial | prob |
|
Dirichlet | Multinomial | prob |
Categorical | prob |
|
Flat | Normal | mean |
Lognormal | meanlog |
|
Gamma | Poisson | lambda |
Normal | tau |
|
Lognormal | taulog |
|
Gamma | rate |
|
Inverse Gamma | scale |
|
Exponential | rate |
|
Double Exponential | rate |
|
Weibull | lambda |
|
Halfflat | Normal | sd |
Lognormal | sdlog |
|
Inverse Gamma | Normal | var |
Lognormal | varlog |
|
Gamma | scale |
|
Inverse Gamma | rate |
|
Exponential | scale |
|
Double Exponential | scale |
|
Normal | Normal | mean |
Lognormal | meanlog |
|
Multivariate Normal | Multivariate Normal | mean |
Wishart | Multivariate Normal | prec |
Inverse Wishart | Multivariate Normal | cov |
Conjugate sampler functions may (optionally) dynamically check that their own posterior likelihood calculations are correct. If incorrect, a warning is issued. However, this functionality will roughly double the run-time required for conjugate sampling. By default, this option is disabled in NIMBLE. This option may be enabled with
nimbleOptions(verifyConjugatePosteriors = TRUE)
.
If one wants information about conjugate node relationships for other purposes, they can be obtained using the checkConjugacy
method on a model. This returns a named list describing all conjugate relationships. The checkConjugacy
method can also accept a character vector argument specifying a subset of node names to check for conjugacy.
7.11.2 Hamiltonian Monte Carlo (HMC)
As of version 1.0.0, NIMBLE provides support for automatic differentiation (AD), as described in Chapter 16. AD is needed for Hamiltonian Monte Carlo (HMC) sampling. The HMC algorithm is provided through the nimbleHMC
package, which requires version 1.0.0 (or higher) of the nimble
package.
HMC is an MCMC sampling algorithm that works by a physical analogy of frictionless motion on a surface that is the negative log posterior density. HMC often achieves good mixing, although at high computational cost. HMC sampling can operate on any continuous-valued model dimensions, including multiple dimensions simultaneously. HMC sampling is more flexibly applicable using nimbleHMC
than in other implementations, since HMC can be applied to some parts of a model while other samplers are used on other (potentially overlapping) parts of the same model. Indeed, R. M. Neal (2011) discusses this mixed application of HMC with other sampling methods.
The nimbleHMC
package provides two distinct HMC samplers, which implement two versions of No-U-Turn (NUTS) HMC sampling. HMC-NUTS sampling is a self-tuning version of standard HMC, which self-tunes the step-size and the number of integration steps used in the numeric integrator’s exploration of parameter space. The NUTS_classic
sampler in nimbleHMC
implements the original (“classic”) HMC-NUTS algorithm developed in Hoffman and Gelman (2014), which was the seminal version of HMC-NUTS. The NUTS
sampler in nimbleHMC
is a modern version of HMC-NUTS sampling with improved adaptation routines and convergence criteria, which matches the HMC sampler available in version 2.32.2 of Stan (Stan Development Team 2023).
The samplers provided in nimbleHMC
can be used in nimble
’s general MCMC system, and may be used in combination with other samplers provided with nimble
operating on different (or overlapping) parts of the model. For convenience, nimbleHMC
provides several different ways to set up and run HMC which allow different degrees of control:
nimbleHMC
is analogous tonimbleMCMC
: It accpets modelcode
,data
,inits
andconstants
arguments, and performs all steps from building the model to running one or more MCMC chains. Use thenimbleHMC
function if you want one call to applyNUTS
sampling to all continuous-valued model dimensions, andnimble
’s default samplers for all discrete-valued dimensions.buildHMC
is analogous tobuildMCMC
: It accepts a model object as an argument, and builds an MCMC algorithm which appliesNUTS
sampling to all continuous-valued model dimensions, andnimble
’s default samplers for all discrete-valued dimensions. UsebuildHMC
if you have created a model object, and want to automatically build the default HMC algorithm for your model.configureHMC
is analogous toconfigureMCMC
: It accepts a model object as an argument, and sets up a default MCMC configuration containing oneNUTS
sampler operating on all continuous-valued model dimensions, andnimble
’s default samplers for all discrete-valued dimensions. You can also specify whichnodes
should have samplers assigned. You can then modify the MCMC configuration (useaddSampler
and other methods) to further customize it before building and running the MCMC. UseconfigureHMC
if you are familiar with MCMC configuration objects, and may want to further customize your MCMC using HMC or other samplers or monitors before building the MCMC.addHMC
manages use ofaddSampler
to specifically add an HMC sampler for a chosen set of nodes to an exisiting MCMC configuration. It optionally removes other samplers operating on the specified nodes. UseaddHMC
to add an HMC sampler operating one or more target nodes to an exisiting MCMC configuration object.- For the most fine-grained control, you can use the
addSampler
method speciying the the argumenttype="NUTS"
ortype="NUTS_classic"
to add HMC sampler(s) to an MCMC configuration object.
As of version 1.1.0, we now support the following new functionality for AD in models:
- Dynamic indexing is supported, although of course one cannot take derivatives with respect to the actual index values, since they are discrete. This means the index values cannot be sampled by HMC.
- One can take derivatives with respect to nodes assigned CAR priors, so HMC sampling can be performed on variables assigned a CAR prior(i.e.,
dcar_normal
anddcar_proper
). - One can take derivatives with respect to the parameters of
dcat
,dconstraint
, anddinterval
. Hence HMC sampling is supported for parameters of nodes that havedcat
,dconstraint
, anddinterval
dependencies. (However, the nodes that follow one of these distributions are discrete, so derivatives with respect to those nodes themselves are not supported, and HMC cannot sample those nodes.) - Truncated distributions can be used, but one cannot take derivatives with respect to their parameters or the nodes themselves. This means that HMC sampling cannot be performed on nodes that have truncated nodes as dependencies.
More information is available from the help pages, e.g. help(nimbleHMC)
, help(buildHMC)
, help(configureHMC)
, or help(addHMC)
.
7.11.2.1 HMC example
Here we will configure and build an MCMC to use the NUTS
HMC sampler on a GLMM example. In this case we’ll use HMC sampling for the whole model, i.e., all parameters and latent states.
We’ll use a Poisson Generalized Linear Mixed Model (GLMM) as a simple example model for HMC (and also for Laplace approximation in 16.1). There will be 10 groups (indexed by i
) of 5 observations (indexed by j
) each. Each observation has a covariate, X
, and each group has a random effect ran_eff
. Here is the model code:
model_code <- nimbleCode({
# priors
intercept ~ dnorm(0, sd = 100)
beta ~ dnorm(0, sd = 100)
sigma ~ dunif(0, 10)
# random effects and data
for(i in 1:10) {
# random effects
ran_eff[i] ~ dnorm(0, sd = sigma)
for(j in 1:5) {
# data
y[i,j] ~ dpois(exp(intercept + beta*X[i,j] + ran_eff[i]))
}
}
})
We’ll simulate some values for X
.
In order to allow an algorithm to use AD for a specific model, that model must be created with buildDerivs = TRUE
. (Note that if you use the one-step method nimbleHMC
described above, then you do not need to build the model yourself and so you do not need to provide this buildDerivs
option.)
inits <- list(intercept = 0, beta = 0.2, sigma = 0.5)
model <- nimbleModel(model_code, constants = list(X = X), inits = inits,
calculate = FALSE, buildDerivs = TRUE) # Here is the argument needed for AD.
To finish setting up the example we need to put data values into the model. We could have provided data in the call to nimbleModel
, but instead we will simulate them using the model itself. Specifically, we will set parameter values, simulate data values, and then set those as the data to use.
## [1] NA
model$simulate(model$getDependencies('ran_eff'))
model$calculate() # Now the model is fully initialized: all nodes have valid values.
## [1] -80.74344
Finally, we will make a compiled version of the model.
Now we will build an HMC algorithm for the parameters and random effects in the model.
## ===== Monitors =====
## thin = 1: beta, intercept, sigma
## ===== Samplers =====
## NUTS sampler (1)
## - intercept, beta, sigma, ran_eff[1], ran_eff[2], ran_eff[3], ran_eff[4], ran_eff[5], ran_eff[6], ran_eff[7], ran_eff[8], ran_eff[9], ran_eff[10]
CHMC <- compileNimble(HMC, project = model)
# The compiled HMC will use the compiled model.
samples <- runMCMC(CHMC, niter = 1000, nburnin = 500) # short run for illustration
## [Note] NUTS sampler (nodes: intercept, beta, sigma, ran_eff[1], ran_eff[2], ran_eff[3], ran_eff[4], ran_eff[5], ran_eff[6], r...) is using 500 warmup iterations.
## Since `warmupMode` is 'default' and `nburnin` > 0,
## the number of warmup iterations is equal to `nburnin`.
## The burnin samples will be discarded, and all samples returned will be post-warmup.
We will not investigate results in detail, but here is a summary to see that reasonable results were generated.
##
## Iterations = 1:500
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 500
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta 0.1790 0.1438 0.006432 0.008174
## intercept -0.2524 0.4463 0.019960 0.056677
## sigma 0.7912 0.4091 0.018296 0.051351
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta -0.1299 0.08311 0.1873 0.280133 0.4415
## intercept -1.6013 -0.42914 -0.1681 -0.002989 0.3520
## sigma 0.2935 0.53669 0.6886 0.943581 2.0846
7.11.2.2 Automatic parameter transformation
In the example above, sigma
is a parameter that must be non-negative. HMC works only on unconstrained parameter spaces. NIMBLE’s HMC functionality automatically transforms any constrained parameters to be in an unconstrained parameter space, runs the MCMC, and back-transforms to the original space before providing the MCMC samples. The details of the parameter transformation system can be found in Section 16.8.
7.11.3 Particle filter samplers
The nimbleSMC
package provides samplers that perform particle MCMC, primarily intended for sampling top-level parameters in state-space or hidden Markov models of time-series data. This sampler is described in Section 8.1.2.
7.11.4 Customized log-likelihood evaluations: RW_llFunction sampler
Sometimes it is useful to control the log-likelihood calculations used for an MCMC updater instead of simply using the model. For example, one could use a sampler with a log-likelihood that analytically (or numerically) integrates over latent model nodes. Or one could use a sampler with a log-likelihood that comes from a stochastic approximation such as a particle filter (see below), allowing composition of a particle MCMC (PMCMC) algorithm (Andrieu, Doucet, and Holenstein 2010). The RW_llFunction
sampler handles this by using a Metropolis-Hastings algorithm with a normal proposal distribution and a user-provided log-likelihood function. To allow compiled execution, the log-likelihood function must be provided as a specialized instance of a nimbleFunction. The log-likelihood function may use the same model as the MCMC as a setup argument (as does the example below), but if so the state of the model should be unchanged during execution of the function (or you must understand the implications otherwise).
The RW_llFunction
sampler can be customized using the control
list argument to set the initial proposal distribution scale and the adaptive properties for the Metropolis-Hastings sampling. In addition, the control
list argument must contain a named llFunction
element. This is the specialized nimbleFunction that calculates the log-likelihood; it must accept no arguments and return a scalar double number. The return value must be the total log-likelihood of all stochastic dependents of the target
nodes – and, if includesTarget = TRUE
, of the target node(s) themselves – or whatever surrogate is being used for the total log-likelihood. This is a required control
list element with no default. See help(samplers)
for details.
Here is a complete example:
code <- nimbleCode({
p ~ dunif(0, 1)
y ~ dbin(p, n)
})
Rmodel <- nimbleModel(code, data = list(y=3), inits = list(p=0.5, n=10))
llFun <- nimbleFunction(
setup = function(model) { },
run = function() {
y <- model$y
p <- model$p
n <- model$n
ll <- lfactorial(n) - lfactorial(y) - lfactorial(n-y) +
y * log(p) + (n-y) * log(1-p)
returnType(double())
return(ll[1])
}
)
RllFun <- llFun(Rmodel)
mcmcConf <- configureMCMC(Rmodel, nodes = NULL)
mcmcConf$addSampler(target = "p", type = "RW_llFunction",
control = list(llFunction = RllFun, includesTarget = FALSE))
Rmcmc <- buildMCMC(mcmcConf)
Note that we need to return ll[1]
and not just ll
because there are no scalar variables in compiled models. Hence y
and other variables, and therefore ll
, are of dimension 1 (i.e., vectors / one-dimensional arrays), so we need to specify the first element in order to have the return type be a scalar.
7.12 Detailed MCMC example: litters
Here is a detailed example of specifying, building, compiling, and running two MCMC algorithms. We use the litters
example from the BUGS examples.
###############################
##### model configuration #####
###############################
# define our model using BUGS syntax
litters_code <- nimbleCode({
for (i in 1:G) {
a[i] ~ dgamma(1, .001)
b[i] ~ dgamma(1, .001)
for (j in 1:N) {
r[i,j] ~ dbin(p[i,j], n[i,j])
p[i,j] ~ dbeta(a[i], b[i])
}
mu[i] <- a[i] / (a[i] + b[i])
theta[i] <- 1 / (a[i] + b[i])
}
})
# list of fixed constants
constants <- list(G = 2,
N = 16,
n = matrix(c(13, 12, 12, 11, 9, 10, 9, 9, 8, 11, 8, 10, 13,
10, 12, 9, 10, 9, 10, 5, 9, 9, 13, 7, 5, 10, 7, 6,
10, 10, 10, 7), nrow = 2))
# list specifying model data
data <- list(r = matrix(c(13, 12, 12, 11, 9, 10, 9, 9, 8, 10, 8, 9, 12, 9,
11, 8, 9, 8, 9, 4, 8, 7, 11, 4, 4, 5 , 5, 3, 7, 3,
7, 0), nrow = 2))
# list specifying initial values
inits <- list(a = c(1, 1),
b = c(1, 1),
p = matrix(0.5, nrow = 2, ncol = 16),
mu = c(.5, .5),
theta = c(.5, .5))
# build the R model object
Rmodel <- nimbleModel(litters_code,
constants = constants,
data = data,
inits = inits)
###########################################
##### MCMC configuration and building #####
###########################################
# generate the default MCMC configuration;
# only wish to monitor the derived quantity "mu"
mcmcConf <- configureMCMC(Rmodel, monitors = "mu")
# check the samplers assigned by default MCMC configuration
mcmcConf$printSamplers()
# double-check our monitors, and thinning interval
mcmcConf$printMonitors()
# build the executable R MCMC function
mcmc <- buildMCMC(mcmcConf)
# let's try another MCMC, as well,
# this time using the crossLevel sampler for top-level nodes
# generate an empty MCMC configuration
# we need a new copy of the model to avoid compilation errors
Rmodel2 <- Rmodel$newModel()
mcmcConf_CL <- configureMCMC(Rmodel2, nodes = NULL, monitors = "mu")
# add two crossLevel samplers
mcmcConf_CL$addSampler(target = c("a[1]", "b[1]"), type = "crossLevel")
mcmcConf_CL$addSampler(target = c("a[2]", "b[2]"), type = "crossLevel")
# let's check the samplers
mcmcConf_CL$printSamplers()
# build this second executable R MCMC function
mcmc_CL <- buildMCMC(mcmcConf_CL)
###################################
##### compile to C++, and run #####
###################################
# compile the two copies of the model
Cmodel <- compileNimble(Rmodel)
Cmodel2 <- compileNimble(Rmodel2)
# compile both MCMC algorithms, in the same
# project as the R model object
# NOTE: at this time, we recommend compiling ALL
# executable MCMC functions together
Cmcmc <- compileNimble(mcmc, project = Rmodel)
Cmcmc_CL <- compileNimble(mcmc_CL, project = Rmodel2)
# run the default MCMC function,
# and example the mean of mu[1]
Cmcmc$run(1000)
cSamplesMatrix <- as.matrix(Cmcmc$mvSamples) # alternative: as.list
mean(cSamplesMatrix[, "mu[1]"])
# run the crossLevel MCMC function,
# and examine the mean of mu[1]
Cmcmc_CL$run(1000)
cSamplesMatrix_CL <- as.matrix(Cmcmc_CL$mvSamples)
mean(cSamplesMatrix_CL[, "mu[1]"])
###################################
#### run multiple MCMC chains #####
###################################
# run 3 chains of the crossLevel MCMC
samplesList <- runMCMC(Cmcmc_CL, niter=1000, nchains=3)
lapply(samplesList, dim)
7.13 Comparing different MCMCs with MCMCsuite and compareMCMCs
Please see the compareMCMCs
package for the features previously provided by MCMCsuite
and compareMCMCs
in NIMBLE (until version 0.8.0). The compareMCMCs
package provides tools to automatically run MCMC in nimble (including multiple sampler configurations), WinBUGS, OpenBUGS, JAGS, Stan, or any other engine for which you provide a simple common interface. The package makes it easy to manage comparison metrics and generate html pages with comparison figures.
7.14 Running MCMC chains in parallel
It is possible to run multiple chains in parallel using standard R parallelization packages such as parallel
, foreach
, and future
. However, you must create separate copies of all model and MCMC objects using nimbleModel
, buildMCMC
, compileNimble
, etc. This is because NIMBLE uses Reference Classes and R6 classes, so copying such objects simply creates a new variable name that refers to the original object.
Thus, in your parallel loop or lapply-style statement, you should run nimbleModel
and all subsequent calls to create and compile the model and MCMC algorithm within the parallelized block of code, once for each MCMC chain being run in parallel.
For a worked example, please see the parallelization example on our webpage.