# Chapter 8 Particle Filters, PMCMC, MCEM, Laplace approximation and quadrature

The NIMBLE algorithm library includes a suite of sequential Monte Carlo (particle filtering) algorithms (particle filters, PMCMC, iterated particle filter, and ensemble Kalman filter), Monte Carlo expectation maximization (MCEM) for maximum likelihood estimation, Laplace approximation and adaptive Gauss-Hermite quadrature, and k-fold cross-validation.

## 8.1 Particle filters / sequential Monte Carlo and iterated filtering

As of Version 0.10.0 of NIMBLE, all of NIMBLE’s sequential Monte Carlo/particle filtering functionality lives in the `nimbleSMC`

package, described in Michaud et al. (2021). Please load this package before trying to use these algorithms.

### 8.1.1 Filtering algorithms

NIMBLE includes algorithms for four different types of sequential Monte Carlo (also known as particle filters), which can be used to sample from the latent states and approximate the log likelihood of a state-space model. These include the bootstrap filter, the auxiliary particle filter, the Liu-West filter, and the ensemble Kalman filter. The iterated filtering version 2 (IF2) is a related method for maximum-likelihood estimation. Each of these is built with the eponymous functions `buildBootstrapFilter`

, `buildAuxiliaryFilter`

, `buildLiuWestFilter`

, `buildEnsembleKF`

, and `buildIteratedFilter2`

. Each method requires setup arguments `model`

and `nodes`

; the latter should be a character vector specifying latent model nodes. In addition, each method can be customized using a `control`

list argument. Details on the control options and specifics of the algorithms can be found in the help pages for the functions.

Once built, each filter can be run by specifying the number of particles. Each filter has a modelValues object named `mvEWSamples`

that is populated with equally-weighted samples from the posterior distribution of the latent states (and in the case of the Liu-West filter, the posterior distribution of the top level parameters as well) as the filter is run. The bootstrap, auxiliary, and Liu-West filters, as well as the IF2 method, also have another modelValues object, `mvWSamples`

. This has unequally-weighted samples from the posterior distribution of the latent states, along with weights for each particle. In addition, the bootstrap and auxiliary particle filters return estimates of the log-likelihood of the given state-space model.

We first create a linear state-space model to use as an example for our particle filter algorithms.

```
# Building a simple linear state-space model.
# x is latent space, y is observed data
timeModelCode <- nimbleCode({
x[1] ~ dnorm(mu_0, 1)
y[1] ~ dnorm(x[1], 1)
for(i in 2:t){
x[i] ~ dnorm(x[i-1] * a + b, 1)
y[i] ~ dnorm(x[i] * c, 1)
}
a ~ dunif(0, 1)
b ~ dnorm(0, 1)
c ~ dnorm(1,1)
mu_0 ~ dnorm(0, 1)
})
# simulate some data
t <- 25; mu_0 <- 1
x <- rnorm(1 ,mu_0, 1)
y <- rnorm(1, x, 1)
a <- 0.5; b <- 1; c <- 1
for(i in 2:t){
x[i] <- rnorm(1, x[i-1] * a + b, 1)
y[i] <- rnorm(1, x[i] * c, 1)
}
# build the model
rTimeModel <- nimbleModel(timeModelCode, constants = list(t = t),
data <- list(y = y), check = FALSE )
# Set parameter values and compile the model
rTimeModel$a <- 0.5
rTimeModel$b <- 1
rTimeModel$c <- 1
rTimeModel$mu_0 <- 1
cTimeModel <- compileNimble(rTimeModel)
```

#### 8.1.1.1 Bootstrap filter

Here is an example of building and running the bootstrap filter.

```
# Build bootstrap filter
rBootF <- buildBootstrapFilter(rTimeModel, "x",
control = list(thresh = 0.8, saveAll = TRUE,
smoothing = FALSE))
# Compile filter
cBootF <- compileNimble(rBootF,project = rTimeModel)
# Set number of particles
parNum <- 5000
# Run bootstrap filter, which returns estimate of model log-likelihood
bootLLEst <- cBootF$run(parNum)
# The bootstrap filter can also return an estimate of the effective
# sample size (ESS) at each time point
bootESS <- cBootF$returnESS()
```

#### 8.1.1.2 Auxiliary particle filter

Next, we provide an example of building and running the auxiliary particle filter. Note that a filter cannot be built on a model that already has a filter specialized to it, so we create a new copy of our state space model first.

```
# Copy our state-space model for use with the auxiliary filter
auxTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(auxTimeModel)
# Build auxiliary filter
rAuxF <- buildAuxiliaryFilter(auxTimeModel, "x",
control = list(thresh = 0.5, saveAll = TRUE))
# Compile filter
cAuxF <- compileNimble(rAuxF,project = auxTimeModel)
# Run auxiliary filter, which returns estimate of model log-likelihood
auxLLEst <- cAuxF$run(parNum)
# The auxiliary filter can also return an estimate of the effective
# sample size (ESS) at each time point
auxESS <- cAuxF$returnESS()
```

#### 8.1.1.3 Liu and West filter

Now we give an example of building and running the Liu and West filter, which can sample from the posterior distribution of top-level parameters as well as latent states. Note that the Liu-West filter ofen performs poorly and is provided primarily for didactic purposes. The Liu and West filter accepts an additional `params`

argument, specifying the top-level parameters to be sampled.

```
# Copy model
LWTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(LWTimeModel)
# Build Liu-West filter, also
# specifying which top level parameters to estimate
rLWF <- buildLiuWestFilter(LWTimeModel, "x", params = c("a", "b", "c"),
control = list(saveAll = FALSE))
```

```
## Warning in buildLiuWestFilter(LWTimeModel, "x", params = c("a", "b", "c"), :
## The Liu-West filter ofen performs poorly and is provided primarily for didactic
## purposes.
```

#### 8.1.1.4 Ensemble Kalman filter

Next we give an example of building and running the ensemble Kalman filter, which can sample from the posterior distribution of latent states.

```
# Copy model
ENKFTimeModel <- rTimeModel$newModel(replicate = TRUE)
compileNimble(ENKFTimeModel)
# Build and compile ensemble Kalman filter
rENKF <- buildEnsembleKF(ENKFTimeModel, "x",
control = list(saveAll = FALSE))
cENKF <- compileNimble(rENKF,project = ENKFTimeModel)
# Run ensemble Kalman filter
cENKF$run(parNum)
```

Once each filter has been run, we can extract samples from the posterior distribution of our latent states as follows:

```
# Equally-weighted samples (available from all filters)
bootEWSamp <- as.matrix(cBootF$mvEWSamples) # alternative: as.list
auxEWSamp <- as.matrix(cAuxF$mvEWSamples)
LWFEWSamp <- as.matrix(cLWF$mvEWSamples)
ENKFEWSamp <- as.matrix(cENKF$mvEWSamples)
# Unequally-weighted samples, along with weights (available
# from bootstrap, auxiliary, and Liu and West filters)
bootWSamp <- as.matrix(cBootF$mvWSamples, "x")
bootWts <- as.matrix(cBootF$mvWSamples, "wts")
auxWSamp <- as.matrix(xAuxF$mvWSamples, "x")
auxWts <- as.matrix(cAuxF$mvWSamples, "wts")
# Liu and West filter also returns samples
# from posterior distribution of top-level parameters:
aEWSamp <- as.matrix(cLWF$mvEWSamples, "a")
```

#### 8.1.1.5 Iterated filtering 2 (IF2)

The IF2 method (Ionides et al. 2015) accomplishes maximum likelihood estimation using a scheme wherein both latent states and parameters are represented by particles that are weighted and resampled during the iterations. Iterations include perturbations to the parameter particles following a schedule of decreasing magnitude to yield convergence to the MLE.

Here we apply IF2 to Nile River flow data, specifying a changepoint in the year the Aswan Dam was constructed, as the dam altered river flows.

```
library(FKF)
flowCode <- nimbleCode({
for(t in 1:n)
y[t] ~ dnorm(x[t], sd = sigmaMeasurements)
x[1] ~ dnorm(x0, sd = sigmaInnovations)
for(t in 2:n)
x[t] ~ dnorm((t-1==28)*meanShift1899 + x[t-1], sd = sigmaInnovations)
logSigmaInnovations ~ dnorm(0, sd = 100)
logSigmaMeasurements ~ dnorm(0, sd = 100)
sigmaInnovations <- exp(logSigmaInnovations)
sigmaMeasurements <- exp(logSigmaMeasurements)
x0 ~ dnorm(1120, var = 100)
meanShift1899 ~ dnorm(0, sd = 100)
})
flowModel <- nimbleModel(flowCode, data = list(y = Nile),
constants = list(n = length(Nile)),
inits = list(logSigmaInnovations = log(sd(Nile)),
logSigmaMeasurements = log(sd(Nile)),
meanShift1899 = -100))
```

Note that the prior distributions for the parameters are not used by IF2, except possibly to obtain boundaries of valid parameter values (not the case here).

Now we build the filter, specifying user-controlled standard deviations (in this case the same as the perturbation `sigma`

values) for use in generating the initial particles for the parameters via the `control`

list.

```
filter <- buildIteratedFilter2(model = flowModel,
nodes = 'x',
params = c('logSigmaInnovations',
'logSigmaMeasurements',
'meanShift1899'),
baselineNode = 'x0',
control = list(sigma = c(0.1, 0.1, 5),
initParamSigma = c(0.1, 0.1, 5)))
cFlowModel <- compileNimble(flowModel)
cFilter <- compileNimble(filter, project = flowModel)
```

We now run the algorithm with 1000 particles for 100 iterations with the schedule parameter equal to 0.2.

In addition to the estimates, we can extract the values of the log-likelihood, the estimates and the standard deviation of the parameter particles as they evolve over the iterations, in order to assess convergence.

```
set.seed(1)
est <- cFilter$run(m = 1000, niter = 100, alpha = 0.2)
cFilter$estimates[95:100,] ## Last 5 iterations of parameter values
```

```
## [,1] [,2] [,3]
## [1,] 0.013306153 4.822613 -269.7837
## [2,] -0.013961938 4.857470 -271.4965
## [3,] 0.002183997 4.850475 -271.1919
## [4,] 0.015598044 4.851961 -272.3889
## [5,] -0.005571944 4.842719 -272.6390
## [6,] 0.042982317 4.837347 -271.1800
```

```
## [1] -627.0497 -626.9570 -626.9856 -626.9954 -626.7448 -626.7521 -626.8472
## [8] -626.7398 -626.9869 -627.1354 -626.6648
```

Comparing to use of the the Kalman Filter from the FKF package, we see the log-likelihood is fairly similar:

```
dtpred <- matrix(0, ncol = length(Nile))
dtpred[28] <- est[3]
ct <- matrix(0)
Zt <- Tt <- matrix(1)
fkfResult <- fkf(HHt = matrix(exp(2*est[1])),
GGt = matrix(exp(2*est[2])),
yt = rbind(Nile),
a0 = 1120,
P0 = matrix(100),
dt = dtpred, ct = ct, Zt = Zt, Tt = Tt)
fkfResult$logLik
```

`## [1] -626.5521`

### 8.1.2 Particle MCMC (PMCMC)

Particle MCMC (PMCMC) is a method that uses MCMC for top-level model
parameters and uses a particle filter to approximate the time-series
likelihood for use in determining MCMC acceptance probabilities
(Andrieu, Doucet, and Holenstein 2010). NIMBLE implements PMCMC by
providing random-walk Metropolis-Hastings samplers for model
parameters that make use of particle filters in this way. These
samplers can use NIMBLE’s bootstrap filter or auxiliary particle
filter, or they can use a user-defined filter. Whichever filter is
specified will be used to obtain estimates of the likelihood of the
state-space model (marginalizing over the latent states), which is
used for calculation of the Metropolis-Hastings acceptance
probability. The `RW_PF`

sampler uses a univariate normal
proposal distribution, and can be used to sample scalar top-level
parameters. The `RW_PF_block`

sampler uses a multivariate normal
proposal distribution and can be used to jointly sample vectors of
top-level parameters. The PMCMC samplers can be specified with a call
to `addSampler`

with `type = "RW_PF"`

or `type = "RW_PF_block"`

,
a syntax similar to the other MCMC samplers listed
in Section 7.11.

The `RW_PF`

sampler and `RW_PF_block`

sampler can be
customized using the `control`

list argument to set the adaptive
properties of the sampler and options for the particle filter
algorithm to be used. In addition, providing
`pfOptimizeNparticles=TRUE`

in the `control`

list will use an
experimental algorithm to estimate the optimal number of particles to
use in the particle filter. See `help(samplers)`

for details. The
MCMC configuration for the `timeModel`

in the previous section will
serve as an example for the use of our PMCMC sampler. Here we use the
identity matrix as our proposal covariance matrix.

```
timeConf <- configureMCMC(rTimeModel, nodes = NULL) # empty MCMC configuration
# Add random walk PMCMC sampler with particle number optimization.
timeConf$addSampler(target = c("a", "b", "c", "mu_0"), type = "RW_PF_block",
control = list(propCov= diag(4), adaptScaleOnly = FALSE,
latents = "x", pfOptimizeNparticles = TRUE))
```

The `type = "RW_PF"`

and `type = "RW_PF*block"`

samplers
default to using a bootstrap filter. The adapatation control
parameters `adaptive`

, `adaptInterval`

, and
`adaptScaleOnly`

work in the same way as for an `RW`

and `RW*block`

samplers.
However, it is not clear if the same approach to adaptation works well
for PMCMC, so one should consider turning off adaptation and using
a well-chosen proposal covariance.

It is also possible that more efficient results
can be obtained by using a custom filtering algorithm. Choice
of filtering algorithm can be controlled by the `pfType`

control
list entry. The `pfType`

entry can be set either to
`'bootstrap'`

(the default), `'auxiliary'`

, or the name of a
user-defined nimbleFunction that returns a likelihood approximation.

Any user-defined filtering nimbleFunction named in the `pfType`

control list entry must satsify the following:

The nimbleFunction must be the result of a call to

`nimbleFunction()`

.The

`nimbleFunction`

must have setup code that accepts the following (and only the following) arguments:`model`

, the NIMBLE model object that the MCMC algorithm is defined on.`latents`

, a character vector specifying the latent model nodes over which the particle filter will stochastically integrate over to estimate the log-likelihood function.`control`

, an`R`

`list`

object. Note that the`control`

list can be used to pass in any additional information or arguments that the custom filter may require.

The

`nimbleFunction`

must have a`run`

function that accepts a single integer arugment (the number of particles to use), and returns a scalar double (the log-likelihood estimate).The

`nimbleFunction`

must define, in setup code, a`modelValues`

object named`mvEWSamples`

that is used to contain equally weighted samples of the latent states (that is, the`latents`

argument to the setup function). Each time the`run()`

method of the`nimbleFunction`

is called with number of particles`m`

, the`mvEWSamples`

`modelValues`

object should be resized to be of size`m`

via a call to`resize(mvEWSamples, m)`

.

## 8.2 Monte Carlo Expectation Maximization (MCEM)

Suppose we have a model with missing data – or latent variables or random effects, which can be viewed as missing data – and we would like to maximize the marginal likelihood of the model, integrating over the missing data. A brute-force method for doing this is MCEM. This is an EM algorithm in which the missing data are simulated via Monte Carlo (often MCMC, when the full conditional distributions cannot be directly sampled from) at each iteration. MCEM can be slow but is also a general workhorse method that is applicable to a wide range of problems.

`buildMCEM`

provides an (optionally) ascent-based MCEM algorithm based on
Caffo, Jank, and Jones (2005). `buildMCEM`

was re-written for nimble version 1.2.0 and is
not compatible with previous versions. For the E (expectation) step,
`buildMCEM`

uses an MCMC for the latent states given the data and current
parameters. For the M (maximization) step, `buildMCEM`

uses `optim`

.

The ascent-based feature uses an estimate of the standard error of the move in
one iteration to determine if that move is clearly uphill or is swamped by
noise due to using a Monte Carlo (MCMC) sample. In the latter case, the MCMC
sample size is increased until there is a clear uphill move. In a similar
manner, convergence is determined based on whether the uphill move is
confidently less than a tolerance level taken to indicate that the algorithm
is at the MLE. The associated tuning parameters, along with details on control
over each step, are described at `help(buildMCEM)`

.

The maximization part of each MCEM iteration will use gradients from automatic
differentiation if they are supported for the model and `buildDerivs=TRUE`

when `nimbleModel`

was called to create the model. If any parameters have
constraints on valid values (e.g. greater than 0), by default a parameter
transformation will be set up (see `help(parameterTransform)`

) and
optimization will be done in the transformed (unconstrained) space.

Additionally, the MCEM algorithm can provide an estimate of the asymptotic covariance matrix of the parameters. See 8.2.1.

We will revisit the *pump* example to illustrate the use of
NIMBLE’s MCEM algorithm.

```
pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
data = pumpData, inits = pumpInits, check = FALSE,
buildDerivs=TRUE)
Cpump <- compileNimble(pump)
# build an MCEM algorithm with ascent-based convergence criterion
pumpMCEM <- buildMCEM(model = pump,
latentNodes = "theta")
# The latentNodes="theta" would be determined by default but is shown for clarity.
CpumpMCEM <- compileNimble(pumpMCEM, project=pump)
```

The first argument to `buildMCEM`

, `model`

, is a NIMBLE model. At the moment, the
model provided cannot be part of another MCMC sampler.

```
## [Note] Iteration Number: 1.
## [Note] Current number of MCMC iterations: 1000.
## [Note] Parameter Estimates:
## 0.818983
## 1.14746
## [Note] Convergence Criterion: 0.684673.
## [Note] Iteration Number: 2.
## [Note] Current number of MCMC iterations: 1000.
## [Note] Parameter Estimates:
## 0.818793
## 1.21849
## [Note] Convergence Criterion: 0.0209188.
## [Note] Iteration Number: 3.
## [Note] Current number of MCMC iterations: 1000.
## [Note] Parameter Estimates:
## 0.819104
## 1.24505
## [Note] Convergence Criterion: 0.00392522.
## [Note] Iteration Number: 4.
## [Note] Current number of MCMC iterations: 1000.
## [Note] Parameter Estimates:
## 0.838402
## 1.29963
## [Note] Convergence Criterion: 0.00657192.
## [Note] Iteration Number: 5.
## [Note] Current number of MCMC iterations: 1000.
## [Note] Parameter Estimates:
## 0.821287
## 1.26525
## [Note] Convergence Criterion: 0.00361366.
## [Note] Iteration Number: 6.
## [Note] Current number of MCMC iterations: 1000.
## [Note] Parameter Estimates:
## 0.835258
## 1.29028
## [Note] Convergence Criterion: 0.00249731.
## Monte Carlo error too big: increasing MCMC sample size.
## [Note] Iteration Number: 7.
## [Note] Current number of MCMC iterations: 1167.
## [Note] Parameter Estimates:
## 0.829192
## 1.28844
## [Note] Convergence Criterion: 0.000914589.
```

`## [1] 0.8291919 1.2884366`

For this model, we can check the result by analytically integrating over the latent nodes (possible for this model but often not feasible), which gives maximum likelihood estimates of \(\hat\alpha=0.823\) and \(\hat\beta = 1.261\). Thus the MCEM seems to do pretty well, though tightening the convergence criteria may be warranted in actual usage.

For more complicated models, it is worth exploring the tuning parameters (see
`help(buildMCEM)`

). MCEM inherits from the EM algorithm the difficulty that the
convergence path can be slow, depending on the model. This can depend on the
parameterization, such as whether random effects are centered or uncentered, so
it can also be worth exploring different ways to write a model.

### 8.2.1 Estimating the asymptotic covariance From MCEM

The `vcov`

method of an MCEM nimbleFunction calculates a Monte Carlo estimate of
the asymptotic covariance of the parameters based on the Hessian matrix at the
MLE, using the method of Louis (1982). Arguments to this method allow control over
whether to generate a new MCMC sample for this purpose and other details.

Continuing the above example, here is the covariance matrix:

```
## [,1] [,2]
## [1,] 0.1295558 0.2263726
## [2,] 0.2263726 0.6758945
```

## 8.3 Laplace approximation and adaptive Gauss-Hermite quadrature

Many hierarchical models include continuous random effects that must be integrated over to obtain the (marginal) likelihood of the parameters given the data. Laplace approximation and adaptive Gauss-Hermite quadrature (AGHQ) are often accurate and fast approximations for doing so. Laplace is simply AGHQ with a single quadrature point (the conditional mode of the random effects).

NIMBLE provides these algorithms via `buildLaplace`

and `buildAGHQuad`

(the
former simply calls the latter), which take advantage of the automatic
differentiation features introduced in version 1.0.0. Laplace approximation is
introduced in section 16.2, while details for both can be found
by `help(buildLaplace)`

.