# Chapter 8 Sequential Monte Carlo and MCEM

The NIMBLE algorithm library is growing and as of version 0.6-7 includes a suite of Sequential Monte Carlo algorithms as well as a more robust MCEM and k-fold cross-validation.

## 8.1 Particle Filters / Sequential Monte Carlo

### 8.1.1 Filtering Algorithms

NIMBLE includes algorithms for four different types of sequential Monte Carlo (also known as particle filters), which can be used to sample from the latent states and approximate the log likelihood of a state space model. The particle filters currently implemented in NIMBLE are the bootstrap filter, the auxiliary particle filter, the Liu-West filter, and the ensemble Kalman filter, which can be built, respectively, with calls to `buildBootstrapFilter`

, `buildAuxiliaryFilter`

, `buildLiuWestFilter`

, and `buildEnsembleKF`

. Each particle filter requires setup arguments `model`

and `nodes`

; the latter should be a character vector specifying latent model nodes. In addition, each particle filter can be customized using a `control`

list argument. Details on the control options and specifics of the filtering algorithms can be found in the help pages for the functions.

Once built, each filter can be run by specifying the number of particles. Each filter has a modelValues object named `mvEWSamples`

that is populated with equally-weighted samples from the posterior distribution of the latent states (and in the case of the Liu-West filter, the posterior distribution of the top level parameters as well) as the filter is run. The bootstrap, auxiliary, and Liu-West filters also have another modelValues object, `mvWSamples`

, which has unequally-weighted samples from the posterior distribution of the latent states, along with weights for each particle. In addition, the bootstrap and auxiliary particle filters return estimates of the log-likelihood of the given state space model.

We first create a linear state-space model to use as an example for our particle filter algorithms.

```
# Building a simple linear state-space model.
# x is latent space, y is observed data
timeModelCode <- nimbleCode({
x[1] ~ dnorm(mu_0, 1)
y[1] ~ dnorm(x[1], 1)
for(i in 2:t){
x[i] ~ dnorm(x[i-1] * a + b, 1)
y[i] ~ dnorm(x[i] * c, 1)
}
a ~ dunif(0, 1)
b ~ dnorm(0, 1)
c ~ dnorm(1,1)
mu_0 ~ dnorm(0, 1)
})
# simulate some data
t <- 25; mu_0 <- 1
x <- rnorm(1 ,mu_0, 1)
y <- rnorm(1, x, 1)
a <- 0.5; b <- 1; c <- 1
for(i in 2:t){
x[i] <- rnorm(1, x[i-1] * a + b, 1)
y[i] <- rnorm(1, x[i] * c, 1)
}
# build the model
rTimeModel <- nimbleModel(timeModelCode, constants = list(t = t),
data <- list(y = y), check = FALSE )
# Set parameter values and compile the model
rTimeModel$a <- 0.5
rTimeModel$b <- 1
rTimeModel$c <- 1
rTimeModel$mu_0 <- 1
cTimeModel <- compileNimble(rTimeModel)
```

Here is an example of building and running the bootstrap filter.

```
# Build bootstrap filter
rBootF <- buildBootstrapFilter(rTimeModel, "x",
control = list(thresh = 0.8, saveAll = TRUE,
smoothing = FALSE))
# Compile filter
cBootF <- compileNimble(rBootF,project = rTimeModel)
# Set number of particles
parNum <- 5000
# Run bootstrap filter, which returns estimate of model log-likelihood
bootLLEst <- cBootF$run(parNum)
# The bootstrap filter can also return an estimate of the effective
# sample size (ESS) at each time point
bootESS <- cBootF$returnESS()
```

Next, we provide an example of building and running the auxiliary particle filter. Note that a filter cannot be built on a model that already has a filter specialized to it, so we create a new copy of our state space model first.

```
# Copy our state-space model for use with the auxiliary filter
auxTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(auxTimeModel)
# Build auxiliary filter
rAuxF <- buildAuxiliaryFilter(auxTimeModel, "x",
control = list(thresh = 0.5, saveAll = TRUE))
# Compile filter
cAuxF <- compileNimble(rAuxF,project = auxTimeModel)
# Run auxiliary filter, which returns estimate of model log-likelihood
auxLLEst <- cAuxF$run(parNum)
# The auxiliary filter can also return an estimate of the effective
# sample size (ESS) at each time point
auxESS <- cAuxF$returnESS()
```

Now we give an example of building and running the Liu and West filter, which can sample from the posterior distribution of top-level parameters as well as latent states. The Liu and West filter accepts an additional `params`

argument, specifying the top-level parameters to be sampled.

```
# Copy model
LWTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(LWTimeModel)
# Build Liu-West filter, also
# specifying which top level parameters to estimate
rLWF <- buildLiuWestFilter(LWTimeModel, "x", params = c("a", "b", "c"),
control = list(saveAll = FALSE))
# Compile filter
cLWF <- compileNimble(rLWF,project = LWTimeModel)
# Run Liu-West filter
cLWF$run(parNum)
```

Finally, we give an example of building and running the ensemble Kalman filter, which can sample from the posterior distribution of latent states.

```
# Copy model
ENKFTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(ENKFTimeModel)
# Build and compile ensemble Kalman filter
rENKF <- buildEnsembleKF(ENKFTimeModel, "x",
control = list(saveAll = FALSE))
cENKF <- compileNimble(rENKF,project = ENKFTimeModel)
# Run ensemble Kalman filter
cENKF$run(parNum)
```

Once each filter has been run, we can extract samples from the posterior distribution of our latent states as follows:

```
# Equally-weighted samples (available from all filters)
bootEWSamp <- as.matrix(cBootF$mvEWSamples)
auxEWSamp <- as.matrix(cAuxF$mvEWSamples)
LWFEWSamp <- as.matrix(cLWF$mvEWSamples)
ENKFEWSamp <- as.matrix(cENKF$mvEWSamples)
# Unequally-weighted samples, along with weights (available
# from bootstrap, auxiliary, and Liu and West filters)
bootWSamp <- as.matrix(cBootF$mvWSamples, "x")
bootWts <- as.matrix(cBootF$mvWSamples, "wts")
auxWSamp <- as.matrix(xAuxF$mvWSamples, "x")
auxWts <- as.matrix(cAuxF$mvWSamples, "wts")
# Liu and West filter also returns samples
# from posterior distribution of top-level parameters:
aEWSamp <- as.matrix(cLWF$mvEWSamples, "a")
```

### 8.1.2 Particle MCMC (PMCMC)

Particle MCMC (PMCMC) is a method that uses MCMC for top-level model parameters and uses a particle filter to approximate the time-series likelihood for use in determining MCMC acceptance probabilities (Andrieu, Doucet, and Holenstein 2010). NIMBLE implements PMCMC by providing random-walk Metropolis-Hastings samplers for model parameters that make use of particle filters in this way. These samplers can use NIMBLE’s bootstrap filter or auxiliary particle filter, or they can use a user-defined filter. Whichever filter is specified will be used to obtain estimates of the likelihood of the state-space model (marginalizing over the latent states), which is used for calculation of the Metropolis-Hastings acceptance probability. The `RW_PF`

sampler uses a univariate normal proposal distribution, and can be used to sample scalar top-level parameters. The `RW_PF_block`

sampler uses a multivariate normal proposal distribution and can be used to jointly sample vectors of top-level parameters. The PMCMC samplers can be specified with a call to `addSampler`

with `type = "RW_PF"`

or `type = "RW_PF_block"`

, a syntax similar to the other MCMC samplers listed in Section 7.9.

The `RW_PF`

sampler and `RW_PF_block`

sampler can be customized using the `control`

list argument to set the adaptive properties of the sampler and options for the particle filter algorithm to be used. In addition, providing `pfOptimizeNparticles=TRUE`

in the `control`

list will use an experimental algorithm to estimate the optimal number of particles to use in the particle filter. See `help(samplers)`

for details. The MCMC configuration for the `timeModel`

in the previous section will serve as an example for the use of our PMCMC sampler. Here we use the identity matrix as our proposal covariance matrix.

```
timeConf <- configureMCMC(rTimeModel) # default MCMC configuration
# Add random walk PMCMC sampler with particle number optimization.
timeConf$addSampler(target = c("a", "b", "c", "mu_0"), type = "RW_PF_block",
control <- list(propCov= diag(4), adaptScaleOnly = FALSE,
latents = "x", pfOptimizeNparticles = TRUE))
```

The `type = "RW_PF"`

and `type = "RW_PF_block"`

samplers default to using a bootstrap filter. However, more efficient results can often be obtained by using a custom filtering algorithm. Choice of filtering algorithm can be controlled by the `pfType`

control list entry. The `pfType`

entry can be set either to `'bootstrap'`

(the default), `'auxiliary'`

, or the name of a user-defined nimbleFunction that returns a likelihood approximation.

Any user-defined filtering nimbleFunction named in the `pfType`

control list entry must satsify the following:

- The nimbleFunction must be the result of a call to
`nimbleFunction()`

. The

`nimbleFunction`

must have setup code that accepts the following (and only the following) arguments:`model`

, the NIMBLE model object that the MCMC algorithm is defined on.`latents`

, a character vector specifying the latent model nodes over which the particle filter will stochastically integrate over to estimate the log-likelihood function.`control`

, an`R`

`list`

object. Note that the`control`

list can be used to pass in any additional information or arguments that the custom filter may require.

- The
`nimbleFunction`

must have a`run`

function that accepts a single integer arugment (the number of particles to use), and returns a scalar double (the log-likelihood estimate). The

`nimbleFunction`

must define, in setup code, a`modelValues`

object named`mvEWSamples`

that is used to contain equally weighted samples of the latent states (that is, the`latents`

argument to the setup function). Each time the`run()`

method of the`nimbleFunction`

is called with number of particles`m`

, the`mvEWSamples`

`modelValues`

object should be resized to be of size`m`

via a call to`resize(mvEWSamples, m)`

.

## 8.2 Monte Carlo Expectation Maximization (MCEM)

Suppose we have a model with missing data (or a layer of latent variables that can be treated as missing data), and we would like to maximize the marginal likelihood of the model, integrating over the missing data. A brute-force method for doing this is MCEM. This is an EM algorithm in which the missing data are simulated via Monte Carlo (often MCMC, when the full conditional distributions cannot be directly sampled from) at each iteration. MCEM can be slow, and there are other methods for maximizing marginal likelihoods that can be implemented in NIMBLE. The reason we started with MCEM is to explore the flexibility of NIMBLE and illustrate the ability to combine R and NIMBLE to run an algorithm, with R managing the highest-level processing of the algorithm and calling nimbleFunctions for computations.

NIMBLE provides an ascent-based MCEM algorithm, created using `buildMCEM`

, that automatically determines when the algorithm has converged by examining the size of the changes in the likelihood between each iteration. Additionally, the MCEM algorithm can provide an estimate of the asymptotic covariance matrix of the parameters. An example of calculating the asymptotic covariance can be found in Section 8.2.1.

We will revisit the *pump* example to illustrate the use of NIMBLE’s MCEM algorithm.

```
pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
data = pumpData, inits = pumpInits, check = FALSE)
compileNimble(pump)
# build an MCEM algorithm with ascent-based convergence criterion
pumpMCEM <- buildMCEM(model = pump,
latentNodes = "theta", burnIn = 300,
mcmcControl = list(adaptInterval = 100),
boxConstraints = list( list( c("alpha", "beta"),
limits = c(0, Inf) ) ),
buffer = 1e-6)
```

The first argument `buildMCEM`

, `model`

, is a NIMBLE model, which can be either the uncompiled or compiled version. At the moment, the model provided cannot be part of another MCMC sampler. The ascent-based MCEM algorithm has a number of control options:

The `latentNodes`

argument should indicate the nodes that will be integrated over (sampled via MCMC), rather than maximized. These nodes must be stochastic, not deterministic! `latentNodes`

will be expanded as described in Section 13.3.1.1. I.e., either `latentNodes = "x"`

or `latentNodes = c("x[1]", "x[2]")`

will treat `x[1]`

and `x[2]`

as latent nodes if `x`

is a vector of two values. All other non-data nodes will be maximized over. Note that `latentNodes`

can include discrete nodes, but the nodes to be maximized cannot.

The `burnIn`

argument indicates the number of samples from the MCMC for the E-step that should be discarded when computing the expected likelihood in the M-step. Note that `burnIn`

can be set to values lower than in standard MCMC computations, as each iteration will start where the last left off.

The `mcmcControl`

argument will be passed to `configureMCMC`

to define the MCMC to be used.

The MCEM algorithm automatically detects box constraints for the nodes that will be optimized, using NIMBLE’s `getBounds`

function. It is also possible for a user to manually specify constraints via the `boxConstraints`

argument. Each constraint given should be a list in which the first element is the names of the nodes or variables that the constraint will be applied to and the second element is a vector of length two, in which the first value is the lower limit and the second is the upper limit. Values of `Inf`

and `-Inf`

are allowed. If a node is not listed, its constraints will be automatically determined by NIMBLE. These constraint arguments are passed as the `lower`

and `upper`

arguments to R’s `optim`

function, using `method = "L-BFGS-B"`

. Note that NIMBLE will give a warning if a user-provided constraint is more extreme than the constraint determined by NIMBLE.

The value of the `buffer`

argument shrinks the `boxConstraints`

by this amount. This can help protect against non-finite values occurring when a parameter is on the boundary.

In addition, the MCEM has some extra control options that can be used to further tune the convergence criterion. See `help(buildMCEM)`

for more information.

The `buildMCEM`

function returns a list with two elements. The first element is a function called `run`

, which will use the MCEM algorithm to estimate the MLEs. The second function is called `estimateCov`

, and is described in Section 8.2.1. The `run`

function can be run as follows. There is only one run-time argument, `initM`

, which is the number of MCMC iterations to use when the algorithm is initialized.

`pumpMLE <- pumpMCEM$run(initM = 1000)`

```
## Iteration Number: 1.
## Current number of MCMC iterations: 1000.
## Parameter Estimates:
## alpha beta
## 0.8156929 1.1437769
## Convergence Criterion: 1.001.
## Iteration Number: 2.
## Current number of MCMC iterations: 1000.
## Parameter Estimates:
## alpha beta
## 0.8310559 1.2637092
## Convergence Criterion: 0.02684197.
## Monte Carlo error too big: increasing MCMC sample size.
## Monte Carlo error too big: increasing MCMC sample size.
## Iteration Number: 3.
## Current number of MCMC iterations: 1875.
## Parameter Estimates:
## alpha beta
## 0.8211969 1.2482232
## Convergence Criterion: 0.001014128.
## Monte Carlo error too big: increasing MCMC sample size.
## Monte Carlo error too big: increasing MCMC sample size.
## Monte Carlo error too big: increasing MCMC sample size.
## Iteration Number: 4.
## Current number of MCMC iterations: 5618.
## Parameter Estimates:
## alpha beta
## 0.8219203 1.2552947
## Convergence Criterion: 0.0002462065.
```

`pumpMLE`

```
## alpha beta
## 0.8219203 1.2552947
```

Direct maximization after analytically integrating over the latent nodes (possible for this model but often not feasible) gives estimates of \(\hat\alpha=0.823\) and \(\hat\beta = 1.261\), so the MCEM seems to do pretty well, though tightening the convergence criteria may be warranted in actual usage.

### 8.2.1 Estimating the Asymptotic Covariance From MCEM

The second element of the list returned by a call to `buildMCEM`

is a function called `estimateCov`

, which estimates the asymptotic covariance of the parameters at their MLE values. If the `run`

function has been called previously, the `estimateCov`

function will automatically use the MLE values produced by the `run`

function to estimate the covariance. Alternatively, a user can supply their own MLE values using the `MLEs`

argument, which allows the covariance to be estimated without having called the `run`

function. More details about the `estimateCov`

function can be found by calling `help(buildMCEM)`

. Below is an example of using the `estimateCov`

function.

```
pumpCov <- pumpMCEM$estimateCov()
pumpCov
```

```
## alpha beta
## alpha 0.1252970 0.2111132
## beta 0.2111132 0.6160782
```

```
# Alternatively, you can manually specify the MLE values as a named vector.
pumpCov <- pumpMCEM$estimateCov(MLEs = c(alpha = 0.823, beta = 1.261))
```

### References

Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov Chain Monte Carlo Methods.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 72 (3): 269–342. doi:10.1111/j.1467-9868.2009.00736.x.