# Building Particle Filters and Particle MCMC in NIMBLE

This example shows how to construct and conduct inference on a state space model using particle filtering algorithms. `nimble` currently implements versions of the bootstrap filter, the auxiliary particle filter, the ensemble Kalman filter, and the Liu and West filter. Particle MCMC samplers are also available and can be specified for both univariate and multivariate parameters.

# Model Creation

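Assume $x_i$ is the latent state and $y_i$ is the observation at time $i$ for $i = 1, \ldots, t$. We define our state space model as

$$x_i \sim N(a \cdot x_{i-1} + b,\ \sigma_{PN}), \qquad y_i \sim t_5(x_i,\ \sigma_{OE}), \qquad i = 2, \ldots, t,$$

with initial states

$$x_1 \sim N\left(\frac{b}{1-a},\ \frac{\sigma_{PN}}{\sqrt{1-a^2}}\right), \qquad y_1 \sim t_5(x_1,\ \sigma_{OE}),$$

and prior distributions

$$a \sim \text{Unif}(-0.9999,\ 0.9999), \quad b \sim N(0,\ 1000), \quad \sigma_{PN} \sim \text{Unif}(0.0001,\ 1), \quad \sigma_{OE} \sim \text{Unif}(0.0001,\ 1),$$

where $N(\mu, \sigma)$ denotes a normal distribution with mean $\mu$ and standard deviation $\sigma$, and $t_\nu(\mu, \sigma)$ is a shifted, scaled $t$-distribution with center parameter $\mu$, scale parameter $\sigma$, and $\nu$ degrees of freedom. In the code, $\sigma_{PN}$ and $\sigma_{OE}$ are named `sigPN` and `sigOE`.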
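A short derivation helps explain where the initial-state distribution comes from. Assuming $|a| < 1$, the recursion $x_i = a x_{i-1} + b + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma_{PN})$ has a stationary normal distribution; requiring the mean $\mu$ and variance $v$ of $x_i$ to equal those of $x_{i-1}$ gives:

$$
\begin{aligned}
\mu &= a\mu + b &&\Longrightarrow\quad \mu = \frac{b}{1-a},\\
v &= a^2 v + \sigma_{PN}^2 &&\Longrightarrow\quad \sqrt{v} = \frac{\sigma_{PN}}{\sqrt{1-a^2}}.
\end{aligned}
$$

These match the mean and standard deviation used for $x_1$, so the latent chain appears to start in its stationary distribution; the prior bound $|a| < 0.9999$ keeps $\sqrt{1-a^2}$ real and positive.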

We specify and build our state space model below, using $t = 10$ time points:

```r
## load the nimble library and set seed
library('nimble')
set.seed(1)

## define the model
stateSpaceCode <- nimbleCode({
  a ~ dunif(-0.9999, 0.9999)
  b ~ dnorm(0, sd = 1000)
  sigPN ~ dunif(1e-04, 1)
  sigOE ~ dunif(1e-04, 1)
  x[1] ~ dnorm(b/(1 - a), sd = sigPN/sqrt((1-a*a)))
  y[1] ~ dt(mu = x[1], sigma = sigOE, df = 5)
  for (i in 2:t) {
    x[i] ~ dnorm(a * x[i - 1] + b, sd = sigPN)
    y[i] ~ dt(mu = x[i], sigma = sigOE, df = 5)
  }
})

## define data, constants, and initial values
data <- list(
  y = c(0.213, 1.025, 0.314, 0.521, 0.895, 1.74, 0.078, 0.474, 0.656, 0.802)
)
constants <- list(
  t = 10
)
inits <- list(
  a = 0,
  b = .5,
  sigPN = .1,
  sigOE = .05
)

## build the model
stateSpaceModel <- nimbleModel(stateSpaceCode, data = data, constants = constants,
                               inits = inits, check = FALSE)
```

```
## defining model...
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply
## reflect missing values in model variables) ...
##
## checking model sizes and dimensions...
## note that missing values (NAs) or non-finite values were found in model
## variables: x, lifted_a_times_x_oBi_minus_1_cB_plus_b. This is not an error,
## but some or all variables may need to be initialized for certain algorithms
## to operate properly.
##
## model building finished.
```
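To make the generative process behind the BUGS code concrete, here is a minimal base-R sketch (not NIMBLE code) that simulates one data set from the model for fixed parameter values. The function name `simulateStateSpace` is hypothetical, and the shifted, scaled $t$ draw is written as center plus scale times a standard `rt()` draw.

```r
## Hypothetical base-R sketch: simulate latent states x and observations y
## from the state space model above, with a, b, sigPN and sigOE held fixed.
simulateStateSpace <- function(t, a, b, sigPN, sigOE, df = 5) {
  stopifnot(abs(a) < 1, t >= 1)
  x <- numeric(t)
  ## x[1] comes from the stationary distribution of the AR(1) process
  x[1] <- rnorm(1, mean = b / (1 - a), sd = sigPN / sqrt(1 - a^2))
  if (t > 1) {
    for (i in 2:t) {
      x[i] <- rnorm(1, mean = a * x[i - 1] + b, sd = sigPN)
    }
  }
  ## shifted, scaled t observation: center + scale * standard t draw
  y <- x + sigOE * rt(t, df = df)
  list(x = x, y = y)
}

set.seed(1)
sim <- simulateStateSpace(t = 10, a = 0, b = 0.5, sigPN = 0.1, sigOE = 0.05)
round(sim$y, 3)
```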

# Construct and run a bootstrap filter

We next construct a bootstrap filter to conduct inference on the latent states of our state space model. Note that the bootstrap filter, like the auxiliary particle filter and the ensemble Kalman filter, treats the top-level parameters `a`, `b`, `sigPN`, and `sigOE` as fixed. Therefore, the bootstrap filter below will proceed as though `a = 0`, `b = .5`, `sigPN = .1`, and `sigOE = .05`, which are the initial values that were assigned to the top-level parameters.

The bootstrap filter takes as arguments the model and the name of the latent state variable within the model. It can also take a control list to fine-tune the algorithm's configuration.

```r
## build bootstrap filter and compile model and filter
bootstrapFilter <- buildBootstrapFilter(stateSpaceModel, nodes = 'x')
compiledList <- compileNimble(stateSpaceModel, bootstrapFilter)
```

```
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compiler details.
## compilation finished.
```

```r
## run compiled filter with 10,000 particles.
## note that the bootstrap filter returns an estimate of the log-likelihood of the model.
compiledList$bootstrapFilter$run(10000)
```

```
## [1] -28.13009
```
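To show what that log-likelihood estimate is, here is a minimal base-R sketch of a bootstrap particle filter for this model with the parameters fixed at their initial values. It is not NIMBLE's implementation: the function name `bootstrapLogLik` is hypothetical, and the sketch resamples multinomially at every time step, whereas NIMBLE's filter may resample adaptively and use a different scheme. If the sketch is right, its estimate should land close to the value above, up to Monte Carlo error.

```r
## Hypothetical base-R sketch of a bootstrap filter: particles are proposed
## from the state transition and weighted by the observation density.
bootstrapLogLik <- function(y, m, a, b, sigPN, sigOE, df = 5) {
  ## log density of a shifted, scaled t: standard t density / scale
  logDScaledT <- function(obs, center, scale) {
    dt((obs - center) / scale, df = df, log = TRUE) - log(scale)
  }
  logLik <- 0
  ## initialize particles from the distribution of x[1]
  x <- rnorm(m, mean = b / (1 - a), sd = sigPN / sqrt(1 - a^2))
  for (i in seq_along(y)) {
    ## propagate each particle through the AR(1) transition
    if (i > 1) x <- rnorm(m, mean = a * x + b, sd = sigPN)
    ## log weights from the observation model, stabilized by their max
    logW <- logDScaledT(y[i], center = x, scale = sigOE)
    maxLogW <- max(logW)
    w <- exp(logW - maxLogW)
    ## log of the average weight estimates log p(y[i] | y[1:(i-1)])
    logLik <- logLik + maxLogW + log(mean(w))
    ## multinomial resampling back to equally weighted particles
    x <- x[sample.int(m, size = m, replace = TRUE, prob = w)]
  }
  list(logLik = logLik, x = x)
}

y <- c(0.213, 1.025, 0.314, 0.521, 0.895, 1.74, 0.078, 0.474, 0.656, 0.802)
set.seed(1)
out <- bootstrapLogLik(y, m = 10000, a = 0, b = 0.5, sigPN = 0.1, sigOE = 0.05)
out$logLik
```

The returned `x` holds equally weighted particles approximating the filtering distribution of `x[10]`.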

Particle filtering algorithms in `nimble` store weighted samples of the filtering distribution of the latent states in the `mvSamples` modelValues object. Equally weighted samples are stored in the `mvEWSamples` object. By default, `nimble` only stores samples from the final time point.
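Equally weighted samples can be obtained from weighted ones by resampling. As an illustration only, the base-R sketch below uses systematic resampling, one common scheme; it does not assume this is the scheme NIMBLE uses to fill `mvEWSamples`, and `systematicResample` is a hypothetical name.

```r
## Hypothetical base-R sketch: systematic resampling returns particle indices
## so that particles[idx] is an equally weighted sample.
systematicResample <- function(w) {
  m <- length(w)
  w <- w / sum(w)                  # normalize the weights
  u <- (runif(1) + 0:(m - 1)) / m  # m evenly spaced points with one random offset
  cw <- cumsum(w)
  cw[m] <- 1                       # guard against floating-point round-off
  findInterval(u, cw) + 1          # index of the first cumulative weight above u
}

set.seed(2)
particles <- rnorm(1000)
w <- dnorm(particles, mean = 1)    # illustrative weights
idx <- systematicResample(w)
ewSamples <- particles[idx]
```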

```r
## extract equally weighted posterior samples of x[10] and create a histogram
posteriorSamples <- as.matrix(compiledList$bootstrapFilter$mvEWSamples)
hist(posteriorSamples)
```

The auxiliary particle filter and ensemble Kalman filter can be constructed and run in the same manner as the bootstrap filter.

# Conduct inference on top-level parameters using particle MCMC

Particle MCMC can be used to conduct inference on the posterior distribution of both the latent states and any top-level parameters of interest in a state space model. The particle marginal Metropolis-Hastings sampler can be specified to jointly sample the `a`, `b`, `sigPN`, and `sigOE` top-level parameters within `nimble`'s MCMC framework as follows:

```r
## create MCMC specification for the state space model
stateSpaceMCMCconf <- configureMCMC(stateSpaceModel, nodes = NULL)

## add a block pMCMC sampler for a, b, sigPN, and sigOE
stateSpaceMCMCconf$addSampler(target = c('a', 'b', 'sigPN', 'sigOE'),
                              type = 'RW_PF_block', control = list(latents = 'x'))

## build and compile pMCMC sampler
stateSpaceMCMC <- buildMCMC(stateSpaceMCMCconf)
compiledList <- compileNimble(stateSpaceModel, stateSpaceMCMC, resetFunctions = TRUE)
```

```
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compiler details.
## compilation finished.
```

```r
## run compiled sampler for 5000 iterations
compiledList$stateSpaceMCMC$run(5000)
```

```
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
## NULL
```

```r
## create trace plots for each parameter
library('coda')
par(mfrow = c(2, 2))
posteriorSamps <- as.mcmc(as.matrix(compiledList$stateSpaceMCMC$mvSamples))
traceplot(posteriorSamps[, 'a'], ylab = 'a')
traceplot(posteriorSamps[, 'b'], ylab = 'b')
traceplot(posteriorSamps[, 'sigPN'], ylab = 'sigPN')
traceplot(posteriorSamps[, 'sigOE'], ylab = 'sigOE')
```

The `RW_PF_block` sampler above uses a multivariate normal proposal distribution to sample vectors of top-level parameters. To sample a scalar top-level parameter, use the `RW_PF` sampler instead.
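The idea behind a block particle marginal Metropolis-Hastings sampler can be sketched in base R: propose all four parameters jointly from a multivariate normal random walk, estimate the likelihood of the proposal with a bootstrap filter, and accept or reject using that estimate in place of the exact likelihood. This is a simplified, hypothetical sketch (`pfLogLik`, `logPrior`, and `pmmh` are illustrative names); unlike `RW_PF_block`, its proposal covariance is fixed rather than adapted. It keeps the current parameter's stored likelihood estimate instead of recomputing it, which is the standard PMMH construction.

```r
## Hypothetical base-R sketch of block PMMH; theta = c(a, b, sigPN, sigOE).
## bootstrap-filter estimate of log p(y | theta)
pfLogLik <- function(y, theta, m, df = 5) {
  a <- theta[1]; b <- theta[2]; sigPN <- theta[3]; sigOE <- theta[4]
  x <- rnorm(m, mean = b / (1 - a), sd = sigPN / sqrt(1 - a^2))
  ll <- 0
  for (i in seq_along(y)) {
    if (i > 1) x <- rnorm(m, mean = a * x + b, sd = sigPN)
    logW <- dt((y[i] - x) / sigOE, df = df, log = TRUE) - log(sigOE)
    mx <- max(logW)
    w <- exp(logW - mx)
    ll <- ll + mx + log(mean(w))
    x <- x[sample.int(m, size = m, replace = TRUE, prob = w)]
  }
  ll
}

## log prior matching the BUGS code up to a constant; -Inf outside the support
logPrior <- function(theta) {
  if (abs(theta[1]) >= 0.9999) return(-Inf)
  if (theta[3] <= 1e-4 || theta[3] >= 1) return(-Inf)
  if (theta[4] <= 1e-4 || theta[4] >= 1) return(-Inf)
  dnorm(theta[2], mean = 0, sd = 1000, log = TRUE)  # uniform priors are constant
}

pmmh <- function(y, init, propCov, nIter, m) {
  L <- t(chol(propCov))  # L %*% z has covariance propCov when z ~ N(0, I)
  theta <- init
  ll <- pfLogLik(y, theta, m)
  out <- matrix(NA_real_, nrow = nIter, ncol = 4,
                dimnames = list(NULL, c("a", "b", "sigPN", "sigOE")))
  for (it in seq_len(nIter)) {
    prop <- theta + drop(L %*% rnorm(4))  # symmetric random-walk proposal
    lpProp <- logPrior(prop)
    if (is.finite(lpProp)) {
      llProp <- pfLogLik(y, prop, m)
      logRatio <- (llProp + lpProp) - (ll + logPrior(theta))
      if (log(runif(1)) < logRatio) {
        theta <- prop
        ll <- llProp  # carry the accepted estimate forward
      }
    }
    out[it, ] <- theta
  }
  out
}

y <- c(0.213, 1.025, 0.314, 0.521, 0.895, 1.74, 0.078, 0.474, 0.656, 0.802)
set.seed(1)
samps <- pmmh(y, init = c(0, 0.5, 0.1, 0.05),
              propCov = diag(c(0.05, 0.05, 0.02, 0.02)^2),
              nIter = 500, m = 200)
colMeans(samps)
```

The particle count and proposal scales here are chosen for speed, not tuned; a real analysis would need many more iterations and some tuning or adaptation of the proposal.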

# Version 0.6-3 released.

Version 0.6-3 is a very minor release primarily intended to address some CRAN packaging issues that do not affect users. We also fixed a bug involving MCEM functionality and a bug that prevented use of the sd() and var() functions in BUGS code.

For most users, there is probably no need to upgrade from version 0.6-2.

# Version 0.6-2 released!

Version 0.6-2 is a minor release with a variety of useful functionality for users.

Changes as of Version 0.6-2 include:

- user-defined distributions can be used in BUGS code without needing to call the registerDistributions() function (unless one wants to specify alternative parameterizations, the distribution's range, or that the distribution is discrete),
- users can now specify the use of conjugate (Gibbs) samplers for nodes in a model,
- NIMBLE will now check the run code of nimbleFunctions for functions (in particular R functions) that are not part of the DSL and will not compile,
- added getBound() functionality to find the lower and upper bounds of a node either from R or in DSL code,
- added functionality to get distributional information about a node in a model or information about a distribution based on the name of the density function; these may be useful in setup code for algorithms,
- multinomial and categorical distributions now allow ‘probs’ arguments that do not sum to one (these will be internally normalized) and
- a variety of bug fixes.
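As an illustration of using a user-defined distribution directly in BUGS code, here is a sketch modeled on the kind of exponential-distribution example used in the NIMBLE manual. The names `dmyexp`, `rmyexp`, and `expModel` are illustrative; the density takes `x`, its parameters, and a `log` flag, and the simulation function takes `n` first.

```r
library(nimble)

## density function: returns a scalar, on the log scale if log = 1
dmyexp <- nimbleFunction(
  run = function(x = double(0), rate = double(0, default = 1),
                 log = integer(0, default = 0)) {
    returnType(double(0))
    logProb <- log(rate) - x * rate
    if (log) return(logProb)
    else return(exp(logProb))
  })

## simulation function: one draw at a time via the inverse CDF
rmyexp <- nimbleFunction(
  run = function(n = integer(0), rate = double(0, default = 1)) {
    returnType(double(0))
    if (n != 1) print("rmyexp only allows n = 1; using n = 1.")
    dev <- runif(1, 0, 1)
    return(-log(1 - dev) / rate)
  })

## used in BUGS code with no registerDistributions() call
expCode <- nimbleCode({
  r ~ dunif(0, 10)
  y ~ dmyexp(rate = r)
})
expModel <- nimbleModel(expCode, data = list(y = 2), inits = list(r = 0.5))
expModel$calculate('y')
```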

Please see the NEWS file in the installed package for more details.

# NIMBLE package for hierarchical modeling (MCMC and more) faster and more flexible in version 0.6-1

NIMBLE version 0.6-1 has been released on CRAN and at r-nimble.org.

NIMBLE is a system that allows you to:

- Write general hierarchical statistical models in BUGS code and create a corresponding model object to use in R.
- Build Markov chain Monte Carlo (MCMC), particle filters, Monte Carlo Expectation Maximization (MCEM), or write generic algorithms that can be applied to any model.
- Compile models and algorithms via problem-specific generated C++ that NIMBLE interfaces to R for you.

Most people associate BUGS with MCMC, but NIMBLE is about much more than that. It implements and extends the BUGS language as a flexible system for model declaration and lets you do what you want with the resulting models. Some of the cool things you can do with NIMBLE include:

- Extend BUGS with functions and distributions you write in R as nimbleFunctions, which will be automatically turned into C++ and compiled into your model.
- Program with models written in BUGS code: get and set values of variables, control model calculations, simulate new values, use different data sets in the same model, and more.
- Write your own MCMC samplers as nimbleFunctions and use them in combination with NIMBLE’s samplers.
- Write functions that use MCMC as one step of a larger algorithm.
- Use standard particle filter methods or write your own.
- Combine particle filters with MCMC as Particle MCMC methods.
- Write other kinds of model-generic algorithms as nimbleFunctions.
- Compile a subset of R’s math syntax to C++ automatically, without writing any C++ yourself.

Some early versions of NIMBLE were not on CRAN because NIMBLE’s system for on-the-fly compilation via generating and compiling C++ from R required some extra work for CRAN packaging, but now it’s there. Compared to earlier versions, the new version is faster and more flexible in a lot of ways. Building and compiling models and algorithms could sometimes get bogged down for large models, so we streamlined those steps quite a lot. We’ve generally increased the efficiency of C++ generated by the NIMBLE compiler. We’ve added functionality to what can be compiled to C++ from nimbleFunctions. And we’ve added a bunch of better error-trapping and informative messages, although there is still a good way to go on that. Give us a holler on the nimble-users list if you run into questions.

# Version 0.5-1 of NIMBLE released!

Version 0.5-1 is officially a minor release, but it actually has quite a bit in it, in particular the addition/improvement of a number of our algorithms. In addition there are some more improvements in our speed in building and compiling models and algorithms.

Changes as of Version 0.5-1 include:

- the addition of a variety of sequential Monte Carlo (aka particle filtering) algorithms, including particle MCMC samplers for use within an MCMC,
- a greatly improved MCEM algorithm with an automated convergence and stopping criterion,
- new syntax for declaring multivariate variables in the NIMBLE DSL, namely numeric(), integer(), matrix(), and array(), with declare() now deprecated,
- addition of the multivariate-t distribution for use in BUGS and DSL code,
- a new binary MCMC sampler for discrete 0/1 nodes,
- addition of functionality to our random walk sampler to allow sampling on the log scale and use of reflection,
- more flexible use of forwardsolve(), backsolve(), and solve(), including use in BUGS code, and
- a variety of other items.

Please see the NEWS file in the source package.

# NIMBLE: A new way to do MCMC (and more) from BUGS code in R

Yesterday we released version 0.5 of NIMBLE on our web site, r-nimble.org. (We’ll get it onto CRAN soon, but it has some special needs to work out.) NIMBLE tries to fill a gap in what R programmers and analysts can do with general hierarchical models. Packages like WinBUGS, OpenBUGS, JAGS and Stan provide a language for writing a model flexibly, and then they provide one flavor of MCMC. These have been workhorses of the Bayesian revolution, but they don’t provide much control over how the MCMC works (what samplers are used) or let one do anything else with the model (though Stan provides some additional fitting methods).

The idea of NIMBLE has been to provide a layer of programmability for algorithms that use models written in BUGS. We adopted BUGS as a model declaration language because there is so much BUGS code out there and so many books that use BUGS for teaching Bayesian statistics. Our implementation processes BUGS code in R and creates a model object that you can program with. For MCMC, we provide a default set of samplers, but these choices can be modified. It is easy to write your own sampler and add it to the MCMC. And it is easy to add new distributions and functions for use in BUGS code, something that hasn't been possible (in any easy way) before. These features can allow big gains in MCMC efficiency.

MCMCs are heavily computational, so NIMBLE includes a compiler that generates C++ specific to a model and algorithm (MCMC samplers or otherwise), compiles it, loads it into R and gives you an interface to it. To be able to compile an algorithm, you need to write it as a nimbleFunction rather than a regular R function. nimbleFunctions can interact with model objects, and they can use a subset of R for math and flow-control. Among other things, the NIMBLE compiler automatically generates code for the Eigen C++ linear algebra library and manages all the necessary interfaces.
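For a feel of what a nimbleFunction looks like, here is a small sketch of a run-only nimbleFunction (no setup code, no model) computing a numerically stable log-sum-exp, an operation particle filters use when averaging weights. The name `nfLogSumExp` is illustrative. Uncompiled, it runs as ordinary R, which is handy for debugging; the compile step is shown commented out because it needs a working C++ compiler.

```r
library(nimble)

## run-only nimbleFunction: double(1) declares a numeric vector argument,
## returnType(double(0)) declares a scalar return value
nfLogSumExp <- nimbleFunction(
  run = function(logW = double(1)) {
    returnType(double(0))
    maxLogW <- max(logW)
    ## subtract the max before exponentiating to avoid underflow
    return(maxLogW + log(sum(exp(logW - maxLogW))))
  })

## uncompiled: executes as R code
nfLogSumExp(log(c(1, 2, 3)))

## compiled: compileNimble() generates, compiles, and loads C++
## cLogSumExp <- compileNimble(nfLogSumExp)
## cLogSumExp(log(c(1, 2, 3)))
```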

Actually, NIMBLE is not specific to MCMC or to Bayesian methods. You can write other algorithms to use whatever model you write in BUGS code. Here’s one simple example: in the past if you wanted to do a simulation study for a model written in BUGS code, you had to re-write the model in R just to simulate from it. With NIMBLE you can simulate from the model as written in BUGS and have complete control over what parts of the model you use. You can also query the model about how nodes are related so that you can make an algorithm adapt to what it finds in a model. We have a set of sequential Monte Carlo (particle filter) methods in development that we’ll release soon. But the idea is that NIMBLE provides a platform for others to develop and disseminate model-generic algorithms.
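Here is a brief sketch of simulating from a model as written in BUGS and querying its graph, using the same toy model that appears later in this post. The variable names `simCode` and `simModel` and the parameter values are illustrative; `simulate`, `getDependencies`, `setData`, and `isData` are model methods.

```r
library(nimble)

simCode <- nimbleCode({
  mu ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 100)
  for (i in 1:10) y[i] ~ dnorm(mu, sd = sigma)
})
simModel <- nimbleModel(simCode, inits = list(mu = 2, sigma = 5))

## simulate a data set directly from the BUGS model, no rewrite in R needed
set.seed(3)
simModel$simulate('y')
simModel$y

## ask the model which nodes depend on mu
simModel$getDependencies('mu')

## use the simulated values as data for a subsequent fit
simModel$setData(list(y = simModel$y))
```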

NIMBLE also extends BUGS in a bunch of ways that I won’t go into here. And it has one major limitation right now: it doesn’t handle models with stochastic indices, like latent class membership models.

Here is a toy example of what it looks like to set up and run an MCMC using NIMBLE.

```r
library(nimble)
myBUGScode <- nimbleCode({
  mu ~ dnorm(0, sd = 100) ## uninformative prior
  sigma ~ dunif(0, 100)
  for(i in 1:10) y[i] ~ dnorm(mu, sd = sigma)
})
myModel <- nimbleModel(myBUGScode)

myData <- rnorm(10, mean = 2, sd = 5)
myModel$setData(list(y = myData))
myModel$setInits(list(mu = 0, sigma = 1))
myMCMC <- buildMCMC(myModel)
compiled <- compileNimble(myModel, myMCMC)
compiled$myMCMC$run(10000)

samples <- as.matrix(compiled$myMCMC$mvSamples)
plot(density(samples[,'mu']))
plot(density(samples[,'sigma']))
```

# NIMBLE paper in Journal of Computational and Graphical Statistics

Our paper giving an overview on the rationale and design of NIMBLE has appeared online in accepted manuscript form at the Journal of Computational and Graphical Statistics. You can get it here.

# Version 0.5 released!

We’ve just released the next major version of NIMBLE.

Changes include

- more efficient computations for conjugate sampling,
- additional automated checking of BUGS syntax to improve NIMBLE’s warning/error messages,
- new API functionality to allow the use of syntax such as model$calculate(), etc. (syntax such as calculate(model) still works),
- new API functionality for MCMC sampler specification,
- improvements in speed and memory use in building models,
- addition of forwardsolve, backsolve, and solve to the NIMBLE DSL, and
- a variety of other items.

More details in the NEWS file that accompanies the package.

We anticipate being on CRAN in coming weeks and a next release soon that will include a full suite of sequential Monte Carlo (i.e., particle filtering) algorithms.

# Version 0.4-1 released!

We’ve just released version 0.4-1, a minor release that fixes some logistical issues and adds a bit of functionality to our MCMC engine.

Changes as of Version 0.4-1 include:

- added an elliptical slice sampler to the MCMC engine,
- fixed bug preventing use of nimbleFunctions in packages depending on NIMBLE, and
- reduced C++ compiler warnings on Windows during use of compileNimble.

# We have a post-doc opening.

We have a 1-year opening for a post-doc interested in developing statistical methods in NIMBLE.

Here is the official, approved job advertisement:

POSTDOCTORAL SCHOLAR POSITION AVAILABLE IN COMPUTATIONAL STATISTICS – UNIVERSITY OF CALIFORNIA, BERKELEY

The Departments of Statistics and of Environmental Science, Policy, and Management have an opening for a Postdoctoral Scholar – Employee to develop and apply statistical algorithms as part of the NIMBLE software development team. NIMBLE is an NSF-funded framework for programming computational methods, such as Markov chain Monte Carlo, sequential Monte Carlo, and numerical integration and approximation, for general hierarchical models. More information is available at R-nimble.org. The postdoc will be supervised by co-PIs Perry de Valpine and Chris Paciorek. We seek a candidate who will build out NIMBLE's algorithm library, including using it as a platform for methodological and applied research. The successful candidate will be expected to author peer-reviewed publications and contribute to software development.

BASIC QUALIFICATIONS

Candidates must have completed all degree requirements except the dissertation or be enrolled in an accredited PhD or equivalent degree in a statistical field such as Statistics or Computer Science or a field of statistical application at the time of application.

ADDITIONAL QUALIFICATIONS

Candidates must have a PhD or equivalent degree in a statistical field such as Statistics or Computer Science or in a field of statistical application such as biology, ecology, environmental science, political science, psychology, education, public health or related field by appointment start date.

PREFERRED QUALIFICATIONS

Demonstrated experience programming complex scientific computing applications using R and/or C++, Python or others. Demonstrated experience advancing computational statistical methodology by appointment start date.

APPOINTMENT

The position is available to start immediately, but we seek the best candidate even if they cannot start until a later date. The initial appointment is for one year, with renewal based on performance and funding. This is a full-time appointment.

SALARY AND BENEFITS

Salary will be commensurate with qualifications and experience. Generous benefits are included (http://vspa.berkeley.edu/postdocs).

TO APPLY

Visit: https://aprecruit.berkeley.edu/apply/JPF00860

Interested individuals should include a 1-2 page cover letter describing their research experience and publications along with a current CV and the names and contact information of three references. Letters of reference may be requested for finalists. It is optional to include a statement addressing past and/or potential contributions to diversity through research, teaching, and/or service.

This position will remain open until filled.

Questions regarding this recruitment can be directed to Maria P. Aranas, aranas4@berkeley.edu.

All letters will be treated as confidential per University of California policy and California state law. Please refer potential referees, including when letters are provided via a third party (i.e., a dossier service or career center), to the UC Berkeley Statement of Confidentiality (http://apo.berkeley.edu/evalltr.html) prior to submitting their letters.

The University of California is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age or protected veteran status. For the complete University of California nondiscrimination and affirmative action policy see: http://policy.ucop.edu/doc/4000376/NondiscrimAffirmAct

The Department is interested in candidates who will contribute to diversity and equal opportunity in higher education through their research or teaching.

The University of California, Berkeley has an excellent benefits package as well as a number of policies and programs in place to support employees as they balance work and family.