Chapter 8 Sequential Monte Carlo, Particle MCMC, Iterated Filtering, and MCEM
The NIMBLE algorithm library is growing and currently includes a suite of sequential Monte Carlo (particle filtering) algorithms, particle MCMC for combining particle filters with MCMC, iterated filtering version 2 and Monte Carlo expectation maximization (MCEM) for maximum likelihood estimation, and k-fold cross-validation.
8.1 Particle filters / sequential Monte Carlo and iterated filtering
As of Version 0.10.0 of NIMBLE, all of NIMBLE’s sequential Monte Carlo/particle filtering functionality lives in the nimbleSMC
package. Please load this package before trying to use these algorithms.
8.1.1 Filtering algorithms
NIMBLE includes algorithms for four different types of sequential Monte Carlo (also known as particle filters), which can be used to sample from the latent states and approximate the log likelihood of a state-space model. These include the bootstrap filter, the auxiliary particle filter, the Liu-West filter, and the ensemble Kalman filter. The iterated filtering version 2 (IF2) is a related method for maximum-likelihood estimation. Each of these is built with the eponymous functions buildBootstrapFilter
, buildAuxiliaryFilter
, buildLiuWestFilter
, buildEnsembleKF
, and buildIteratedFilter2
. Each method requires setup arguments model
and nodes
; the latter should be a character vector specifying latent model nodes. In addition, each method can be customized using a control
list argument. Details on the control options and specifics of the algorithms can be found in the help pages for the functions.
Once built, each filter can be run by specifying the number of particles. Each filter has a modelValues object named mvEWSamples
that is populated with equally-weighted samples from the posterior distribution of the latent states (and in the case of the Liu-West filter, the posterior distribution of the top level parameters as well) as the filter is run. The bootstrap, auxiliary, and Liu-West filters, as well as the IF2 method, also have another modelValues object, mvWSamples
. This has unequally-weighted samples from the posterior distribution of the latent states, along with weights for each particle. In addition, the bootstrap and auxiliary particle filters return estimates of the log-likelihood of the given state-space model.
We first create a linear state-space model to use as an example for our particle filter algorithms.
# Building a simple linear state-space model.
# x is latent space, y is observed data
<- nimbleCode({
timeModelCode 1] ~ dnorm(mu_0, 1)
x[1] ~ dnorm(x[1], 1)
y[for(i in 2:t){
~ dnorm(x[i-1] * a + b, 1)
x[i] ~ dnorm(x[i] * c, 1)
y[i]
}
~ dunif(0, 1)
a ~ dnorm(0, 1)
b ~ dnorm(1,1)
c ~ dnorm(0, 1)
mu_0
})
# simulate some data
<- 25; mu_0 <- 1
t <- rnorm(1 ,mu_0, 1)
x <- rnorm(1, x, 1)
y <- 0.5; b <- 1; c <- 1
a for(i in 2:t){
<- rnorm(1, x[i-1] * a + b, 1)
x[i] <- rnorm(1, x[i] * c, 1)
y[i]
}
# build the model
<- nimbleModel(timeModelCode, constants = list(t = t),
rTimeModel <- list(y = y), check = FALSE )
data
# Set parameter values and compile the model
$a <- 0.5
rTimeModel$b <- 1
rTimeModel$c <- 1
rTimeModel$mu_0 <- 1
rTimeModel
<- compileNimble(rTimeModel) cTimeModel
8.1.1.1 Bootstrap filter
Here is an example of building and running the bootstrap filter.
# Build bootstrap filter
<- buildBootstrapFilter(rTimeModel, "x",
rBootF control = list(thresh = 0.8, saveAll = TRUE,
smoothing = FALSE))
# Compile filter
<- compileNimble(rBootF,project = rTimeModel)
cBootF # Set number of particles
<- 5000
parNum # Run bootstrap filter, which returns estimate of model log-likelihood
<- cBootF$run(parNum)
bootLLEst # The bootstrap filter can also return an estimate of the effective
# sample size (ESS) at each time point
<- cBootF$returnESS() bootESS
8.1.1.2 Auxiliary particle filter
Next, we provide an example of building and running the auxiliary particle filter. Note that a filter cannot be built on a model that already has a filter specialized to it, so we create a new copy of our state space model first.
# Copy our state-space model for use with the auxiliary filter
<- rTimeModel$newModel(replicate = TRUE)
auxTimeModel compileNimble(auxTimeModel)
# Build auxiliary filter
<- buildAuxiliaryFilter(auxTimeModel, "x",
rAuxF control = list(thresh = 0.5, saveAll = TRUE))
# Compile filter
<- compileNimble(rAuxF,project = auxTimeModel)
cAuxF # Run auxiliary filter, which returns estimate of model log-likelihood
<- cAuxF$run(parNum)
auxLLEst # The auxiliary filter can also return an estimate of the effective
# sample size (ESS) at each time point
<- cAuxF$returnESS() auxESS
8.1.1.3 Liu and West filter
Now we give an example of building and running the Liu and West filter, which can sample from the posterior distribution of top-level parameters as well as latent states. Note that the Liu-West filter ofen performs poorly and is provided primarily for didactic purposes. The Liu and West filter accepts an additional params
argument, specifying the top-level parameters to be sampled.
# Copy model
<- rTimeModel$newModel(replicate = TRUE)
LWTimeModel compileNimble(LWTimeModel)
# Build Liu-West filter, also
# specifying which top level parameters to estimate
<- buildLiuWestFilter(LWTimeModel, "x", params = c("a", "b", "c"),
rLWF control = list(saveAll = FALSE))
## Warning in buildLiuWestFilter(LWTimeModel, "x", params = c("a", "b", "c"), :
## The Liu-West filter ofen performs poorly and is provided primarily for didactic
## purposes.
# Compile filter
<- compileNimble(rLWF,project = LWTimeModel)
cLWF # Run Liu-West filter
$run(parNum) cLWF
8.1.1.4 Ensemble Kalman filter
Next we give an example of building and running the ensemble Kalman filter, which can sample from the posterior distribution of latent states.
# Copy model
<- rTimeModel$newModel(replicate = TRUE)
ENKFTimeModel compileNimble(ENKFTimeModel)
# Build and compile ensemble Kalman filter
<- buildEnsembleKF(ENKFTimeModel, "x",
rENKF control = list(saveAll = FALSE))
<- compileNimble(rENKF,project = ENKFTimeModel)
cENKF # Run ensemble Kalman filter
$run(parNum) cENKF
Once each filter has been run, we can extract samples from the posterior distribution of our latent states as follows:
# Equally-weighted samples (available from all filters)
<- as.matrix(cBootF$mvEWSamples) # alternative: as.list
bootEWSamp <- as.matrix(cAuxF$mvEWSamples)
auxEWSamp <- as.matrix(cLWF$mvEWSamples)
LWFEWSamp <- as.matrix(cENKF$mvEWSamples)
ENKFEWSamp
# Unequally-weighted samples, along with weights (available
# from bootstrap, auxiliary, and Liu and West filters)
<- as.matrix(cBootF$mvWSamples, "x")
bootWSamp <- as.matrix(cBootF$mvWSamples, "wts")
bootWts <- as.matrix(xAuxF$mvWSamples, "x")
auxWSamp <- as.matrix(cAuxF$mvWSamples, "wts")
auxWts
# Liu and West filter also returns samples
# from posterior distribution of top-level parameters:
<- as.matrix(cLWF$mvEWSamples, "a") aEWSamp
8.1.1.5 Iterated filtering 2 (IF2)
The IF2 method accomplishes maximum likelihood estimation using a scheme wherein both latent states and parameters are represented by particles that are weighted and resampled during the iterations. Iterations include perturbations to the parameter particles following a schedule of decreasing magnitude to yield convergence to the MLE.
Here we apply IF2 to Nile River flow data, specifying a changepoint in the year the Aswan Dam was constructed, as the dam altered river flows.
library(FKF)
<- nimbleCode({
flowCode for(t in 1:n)
~ dnorm(x[t], sd = sigmaMeasurements)
y[t] 1] ~ dnorm(x0, sd = sigmaInnovations)
x[for(t in 2:n)
~ dnorm((t-1==28)*meanShift1899 + x[t-1], sd = sigmaInnovations)
x[t] ~ dnorm(0, sd = 100)
logSigmaInnovations ~ dnorm(0, sd = 100)
logSigmaMeasurements <- exp(logSigmaInnovations)
sigmaInnovations <- exp(logSigmaMeasurements)
sigmaMeasurements ~ dnorm(1120, var = 100)
x0 ~ dnorm(0, sd = 100)
meanShift1899
})
<- nimbleModel(flowCode, data = list(y = Nile),
flowModel constants = list(n = length(Nile)),
inits = list(logSigmaInnovations = log(sd(Nile)),
logSigmaMeasurements = log(sd(Nile)),
meanShift1899 = -100))
Note that the prior distributions for the parameters are not used by IF2, except possibly to obtain boundaries of valid parameter values (not the case here).
Now we build the filter, specifying user-controlled standard deviations (in this case the same as the perturbation sigma
values) for use in generating the initial particles for the parameters via the control
list.
<- buildIteratedFilter2(model = flowModel,
filter nodes = 'x',
params = c('logSigmaInnovations',
'logSigmaMeasurements',
'meanShift1899'),
baselineNode = 'x0',
control = list(sigma = c(0.1, 0.1, 5),
initParamSigma = c(0.1, 0.1, 5)))
<- compileNimble(flowModel)
cFlowModel <- compileNimble(filter, project = flowModel) cFilter
We now run the algorithm with 1000 particles for 100 iterations with the schedule parameter equal to 0.2.
In addition to the estimates, we can extract the values of the log-likelihood, the estimates and the standard deviation of the parameter particles as they evolve over the iterations, in order to assess convergence.
set.seed(1)
<- cFilter$run(m = 1000, niter = 100, alpha = 0.2)
est
$estimates[95:100,] ## Last 5 iterations of parameter values cFilter
## [,1] [,2] [,3]
## [1,] 0.013306153 4.822613 -269.7837
## [2,] -0.013961938 4.857470 -271.4965
## [3,] 0.002183997 4.850475 -271.1919
## [4,] 0.015598044 4.851961 -272.3889
## [5,] -0.005571944 4.842719 -272.6390
## [6,] 0.042982317 4.837347 -271.1800
$logLik[90:100] ## Last 5 iterations of log likelihood values cFilter
## [1] -627.0497 -626.9570 -626.9856 -626.9954 -626.7448 -626.7521 -626.8472
## [8] -626.7398 -626.9869 -627.1354 -626.6648
Comparing to use of the the Kalman Filter from the FKF package, we see the log-likelihood is fairly similar:
<- matrix(0, ncol = length(Nile))
dtpred 28] <- est[3]
dtpred[<- matrix(0)
ct <- Tt <- matrix(1)
Zt
<- fkf(HHt = matrix(exp(2*est[1])),
fkfResult GGt = matrix(exp(2*est[2])),
yt = rbind(Nile),
a0 = 1120,
P0 = matrix(100),
dt = dtpred, ct = ct, Zt = Zt, Tt = Tt)
$logLik fkfResult
## [1] -626.5521
8.1.2 Particle MCMC (PMCMC)
Particle MCMC (PMCMC) is a method that uses MCMC for top-level model
parameters and uses a particle filter to approximate the time-series
likelihood for use in determining MCMC acceptance probabilities
(Andrieu, Doucet, and Holenstein 2010). NIMBLE implements PMCMC by
providing random-walk Metropolis-Hastings samplers for model
parameters that make use of particle filters in this way. These
samplers can use NIMBLE’s bootstrap filter or auxiliary particle
filter, or they can use a user-defined filter. Whichever filter is
specified will be used to obtain estimates of the likelihood of the
state-space model (marginalizing over the latent states), which is
used for calculation of the Metropolis-Hastings acceptance
probability. The RW_PF
sampler uses a univariate normal
proposal distribution, and can be used to sample scalar top-level
parameters. The RW_PF_block
sampler uses a multivariate normal
proposal distribution and can be used to jointly sample vectors of
top-level parameters. The PMCMC samplers can be specified with a call
to addSampler
with type = "RW_PF"
or type = "RW_PF_block"
,
a syntax similar to the other MCMC samplers listed
in Section 7.11.
The RW_PF
sampler and RW_PF_block
sampler can be
customized using the control
list argument to set the adaptive
properties of the sampler and options for the particle filter
algorithm to be used. In addition, providing
pfOptimizeNparticles=TRUE
in the control
list will use an
experimental algorithm to estimate the optimal number of particles to
use in the particle filter. See help(samplers)
for details. The
MCMC configuration for the timeModel
in the previous section will
serve as an example for the use of our PMCMC sampler. Here we use the
identity matrix as our proposal covariance matrix.
<- configureMCMC(rTimeModel, nodes = NULL) # empty MCMC configuration
timeConf
# Add random walk PMCMC sampler with particle number optimization.
$addSampler(target = c("a", "b", "c", "mu_0"), type = "RW_PF_block",
timeConfcontrol = list(propCov= diag(4), adaptScaleOnly = FALSE,
latents = "x", pfOptimizeNparticles = TRUE))
The type = "RW_PF"
and type = "RW_PF*block"
samplers
default to using a bootstrap filter. The adapatation control
parameters adaptive
, adaptInterval
, and
adaptScaleOnly
work in the same way as for an RW
and RW*block
samplers.
However, it is not clear if the same approach to adaptation works well
for PMCMC, so one should consider turning off adaptation and using
a well-chosen proposal covariance.
It is also possible that more efficient results
can be obtained by using a custom filtering algorithm. Choice
of filtering algorithm can be controlled by the pfType
control
list entry. The pfType
entry can be set either to
'bootstrap'
(the default), 'auxiliary'
, or the name of a
user-defined nimbleFunction that returns a likelihood approximation.
Any user-defined filtering nimbleFunction named in the pfType
control list entry must satsify the following:
The nimbleFunction must be the result of a call to
nimbleFunction()
.The
nimbleFunction
must have setup code that accepts the following (and only the following) arguments:model
, the NIMBLE model object that the MCMC algorithm is defined on.latents
, a character vector specifying the latent model nodes over which the particle filter will stochastically integrate over to estimate the log-likelihood function.control
, anR
list
object. Note that thecontrol
list can be used to pass in any additional information or arguments that the custom filter may require.
The
nimbleFunction
must have arun
function that accepts a single integer arugment (the number of particles to use), and returns a scalar double (the log-likelihood estimate).The
nimbleFunction
must define, in setup code, amodelValues
object namedmvEWSamples
that is used to contain equally weighted samples of the latent states (that is, thelatents
argument to the setup function). Each time therun()
method of thenimbleFunction
is called with number of particlesm
, themvEWSamples
modelValues
object should be resized to be of sizem
via a call toresize(mvEWSamples, m)
.
8.2 Monte Carlo Expectation Maximization (MCEM)
Suppose we have a model with missing data (or a layer of latent variables that can be treated as missing data), and we would like to maximize the marginal likelihood of the model, integrating over the missing data. A brute-force method for doing this is MCEM. This is an EM algorithm in which the missing data are simulated via Monte Carlo (often MCMC, when the full conditional distributions cannot be directly sampled from) at each iteration. MCEM can be slow, and there are other methods for maximizing marginal likelihoods that can be implemented in NIMBLE. The reason we started with MCEM is to explore the flexibility of NIMBLE and illustrate the ability to combine R and NIMBLE to run an algorithm, with R managing the highest-level processing of the algorithm and calling nimbleFunctions for computations.
NIMBLE provides an ascent-based MCEM algorithm, created using buildMCEM
, that automatically determines when the algorithm has converged by examining
the size of the changes in the likelihood between each iteration. Additionally, the MCEM algorithm can provide an estimate of the asymptotic covariance matrix of the parameters. An example of calculating the asymptotic covariance can be found in Section 8.2.1.
We will revisit the pump example to illustrate the use of NIMBLE’s MCEM algorithm.
<- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
pump data = pumpData, inits = pumpInits, check = FALSE)
compileNimble(pump)
# build an MCEM algorithm with ascent-based convergence criterion
<- buildMCEM(model = pump,
pumpMCEM latentNodes = "theta", burnIn = 300,
mcmcControl = list(adaptInterval = 100),
boxConstraints = list( list( c("alpha", "beta"),
limits = c(0, Inf) ) ),
buffer = 1e-6)
The first argument buildMCEM
, model
, is a NIMBLE model, which can be
either the uncompiled or compiled version. At the moment, the model provided cannot be part of another MCMC sampler.
Initial values for the parameters are taken to be the values in the model at the time buildMCEM
is called, unless the values in the compiled model are changed before running the MCEM.
The ascent-based MCEM algorithm has a number of control options:
The latentNodes
argument should indicate the nodes that will be
integrated over (sampled via MCMC), rather than
maximized. These
nodes must be stochastic, not deterministic! latentNodes
will
be expanded as described in Section 13.3.1.1. I.e.,
either latentNodes = "x"
or latentNodes = c("x[1]", "x[2]")
will treat x[1]
and x[2]
as latent nodes if
x
is a vector of two values. All other non-data nodes will be
maximized over. Note that latentNodes
can include discrete nodes,
but the nodes to be maximized cannot.
The burnIn
argument indicates the number of samples from the MCMC for the E-step that should be discarded when computing the expected likelihood in the M-step. Note that burnIn
can be set to values lower than in standard MCMC computations, as each iteration will start where the last left off.
The mcmcControl
argument will be passed to configureMCMC
to define the MCMC to be used.
The MCEM algorithm automatically detects box constraints for the nodes that will
be optimized, using NIMBLE’s getBounds
function. It is also possible for a user to manually specify constraints via the boxConstraints
argument.
Each constraint given should be a list
in which the first element is the names of the nodes or variables
that the constraint will be applied to and the second element is a
vector of length two, in which the first value is the lower limit and
the second is the upper limit. Values of Inf
and -Inf
are allowed. If a node is not listed, its constraints will be automatically determined by NIMBLE. These constraint arguments are passed as the lower
and upper
arguments to R’s optim
function, using method = "L-BFGS-B"
. Note that NIMBLE will give a warning if a user-provided constraint is more extreme than the constraint determined by NIMBLE.
The value of the buffer
argument shrinks the
boxConstraints
by this amount. This can help protect against
non-finite values occurring when a parameter is on the boundary.
In addition, the MCEM has some extra control options that can be used to further tune the convergence criterion. See help(buildMCEM)
for more information.
The buildMCEM
function returns a list with two elements. The first element is a function called run
, which will use the MCEM algorithm to estimate the MLEs. The second function is called estimateCov
, and is described in Section 8.2.1. The run
function can be run as follows. There is only one run-time argument, initM
, which is the number of MCMC iterations to use when the algorithm is initialized.
<- pumpMCEM$run(initM = 1000) pumpMLE
## Iteration Number: 1.
## Current number of MCMC iterations: 1000.
## Parameter Estimates:
## alpha beta
## 0.8219803 1.1492317
## Convergence Criterion: 1.001.
## Iteration Number: 2.
## Current number of MCMC iterations: 1000.
## Parameter Estimates:
## alpha beta
## 0.818586 1.237039
## Convergence Criterion: 0.02393763.
## Iteration Number: 3.
## Current number of MCMC iterations: 1000.
## Parameter Estimates:
## alpha beta
## 0.8274042 1.2673636
## Convergence Criterion: 0.002330748.
## Iteration Number: 4.
## Current number of MCMC iterations: 3845.
## Parameter Estimates:
## alpha beta
## 0.8270797 1.2730042
## Convergence Criterion: 0.0002820885.
pumpMLE
## alpha beta
## 0.8270797 1.2730042
Direct maximization after analytically integrating over the latent nodes (possible for this model but often not feasible) gives estimates of \(\hat\alpha=0.823\) and \(\hat\beta = 1.261\), so the MCEM seems to do pretty well, though tightening the convergence criteria may be warranted in actual usage.
8.2.1 Estimating the asymptotic covariance From MCEM
The second element of the list returned by a call to buildMCEM
is a function called estimateCov
, which estimates the asymptotic covariance of the parameters at their MLE values. If the run
function has been called previously, the estimateCov
function will automatically use the MLE values produced by the run
function to estimate the covariance. Alternatively, a user can supply their own MLE values using the MLEs
argument, which allows the covariance to be estimated without having called the run
function. More details about the estimateCov
function can be found by calling help(buildMCEM)
. Below is an example of using the estimateCov
function.
<- pumpMCEM$estimateCov()
pumpCov pumpCov
## alpha beta
## alpha 0.1283888 0.2180972
## beta 0.2180972 0.6377524
# Alternatively, you can manually specify the MLE values as a named vector.
<- pumpMCEM$estimateCov(MLEs = c(alpha = 0.823, beta = 1.261)) pumpCov