# Chapter 15 Writing nimbleFunctions to interact with models

## 15.1 Overview

When you write an R function, you say what the input arguments are, you provide the code for execution, and in that code you give the value to be returned^{26}. Using the `function`

keyword in R triggers the operation of creating an object that is the function.

Creating nimbleFunctions is similar, but there are two kinds of code and two steps of execution:

*Setup*code is provided as a regular R function, but the programmer does not control what it returns. Typically the inputs to*setup*code are objects like a model, a vector of nodes, a modelValues object or a modelValues configuration, or another nimbleFunction. The setup code, as its name implies, sets up information for run-time code. It is executed in R, so it can use any aspect of R.*Run*code is provided in the NIMBLE language, which was introduced in Chapter 11. This is similar to a narrow subset of R, but it is important to remember that it is different – defined by what can be compiled – and much more limited.*Run*code can use the objects created by the*setup*code. In addition, some information on variable types must be provided for input arguments, the return value, and in some circumstances for local variables. There are two kinds of*run*code:- There is always a primary function, given as the argument
`run`

^{27}. - There can optionally be other functions, or ‘methods’ in the language of object-oriented programming, that share the same objects created by the
*setup*function.

- There is always a primary function, given as the argument

Here is a small example to fix ideas:

```
logProbCalcPlus <- nimbleFunction(
setup = function(model, node) {
dependentNodes <- model$getDependencies(node)
valueToAdd <- 1
},
run = function(P = double(0)) {
model[[node]] <<- P + valueToAdd
return(model$calculate(dependentNodes))
returnType(double(0))
})
code <- nimbleCode({
a ~ dnorm(0, 1)
b ~ dnorm(a, 1)
})
testModel <- nimbleModel(code, check = FALSE)
logProbCalcPlusA <- logProbCalcPlus(testModel, "a")
testModel$b <- 1.5
logProbCalcPlusA$run(0.25)
```

`## [1] -2.650377`

`dnorm(1.25,0,1,TRUE)+dnorm(1.5,1.25,1,TRUE) # direct validation`

`## [1] -2.650377`

`testModel$a # "a" was set to 0.5 + valueToAdd`

`## [1] 1.25`

The call to the R function called `nimbleFunction`

returns a function, similarly to defining a function in R. That function, `logProbCalcPlus`

, takes arguments for its `setup`

function, executes it, and returns an object, `logProbCalcPlusA`

, that has a *run* member function (method) accessed by `$run`

. In this case, the `setup`

function obtains the stochastic dependencies of the `node`

using the `getDependencies`

member function of the model (see Section 13.1.3) and stores them in `dependentNodes`

. In this way, `logProbCalcPlus`

can adapt to any model. It also creates a variable, `valueToAdd`

, that can be used by the nimbleFunction.

The object `logProbCalcPlusA`

, returned by `logProbCalcPlus`

, is permanently bound to the results of the processed `setup`

function. In this case, `logProbCalcPlusA$run`

takes a scalar input value, `P`

, assigns `P + valueToAdd`

to the given node in the model, and returns the sum of the log probabilities of that node and its stochastic dependencies^{28}. We say `logProbCalcPlusA`

is an ‘instance’ of `logProbCalcPlus`

that is ‘specialized’ or ‘bound’ to `a`

and `testModel`

. Usually, the `setup`

code will be where information about the model structure is determined, and then the `run`

code can use that information without repeatedly, redundantly recomputing it. A nimbleFunction can be called repeatedly (one can think of it as a generator), each time returning a specialized nimbleFunction.

Readers familiar with object-oriented programming may find it useful to think in terms of class definitions and objects. `nimbleFunction`

creates a class definition. Each specialized nimbleFunction is one object in the class. The setup arguments are used to define member data in the object.

## 15.2 Using and compiling nimbleFunctions

To compile the nimbleFunction, together with its model, we use `compileNimble`

:

```
CnfDemo <- compileNimble(testModel, logProbCalcPlusA)
CtestModel <- CnfDemo$testModel
ClogProbCalcPlusA <- CnfDemo$logProbCalcPlusA
```

These have been initialized with the values from their uncompiled versions and can be used in the same way:

`CtestModel$a # values were initialized from testModel`

`## [1] 1.25`

`CtestModel$b`

`## [1] 1.5`

```
lpA <- ClogProbCalcPlusA$run(1.5)
lpA
```

`## [1] -5.462877`

```
# verify the answer:
dnorm(CtestModel$b, CtestModel$a, 1, log = TRUE) +
dnorm(CtestModel$a, 0, 1, log = TRUE)
```

`## [1] -5.462877`

`CtestModel$a # a was modified in the compiled model`

`## [1] 2.5`

`testModel$a # the uncompiled model was not modified`

`## [1] 1.25`

## 15.3 Writing setup code

### 15.3.1 Useful tools for setup functions

The setup function is typically used to determine information on nodes in a model, set up modelValues or nimbleList objects, set up (nested) nimbleFunctions or nimbleFunctionLists, and set up any persistent numeric objects. For example, the setup code of an MCMC nimbleFunction creates the nimbleFunctionList of sampler nimbleFunctions. The values of numeric objects created in setup code can be modified by run code and will persist across calls.

Some of the useful tools and objects to create in setup functions include:

**vectors of node names, often from a model**Often these are obtained from the`getNodeNames`

,`getDependencies`

, and other methods of a model, described in Sections 13.1-13.2.**modelValues objects**These are discussed in Sections 14.1 and 15.4.4.**nimbleList objects**New instances of`nimbleList`

s can be created from a nimbleList definition in either setup or run code. See Section 14.2 for more information.**specializations of other nimbleFunctions**A useful NIMBLE programming technique is to have one nimbleFunction contain other nimbleFunctions, which it can use in its run-time code (Section 15.4.7).**lists of other nimbleFunctions**In addition to containing single other nimbleFunctions, a nimbleFunction can contain a list of other nimbleFunctions (Section 15.4.8).

If one wants a nimbleFunction that does get specialized but has empty setup code, use `setup = function() {}`

or `setup = TRUE`

.

### 15.3.2 Accessing and modifying numeric values from setup

While models and nodes created during setup cannot be modified^{29}, numeric values and modelValues can be, as illustrated by extending the example from above.

`logProbCalcPlusA$valueToAdd # in the uncompiled version`

`## [1] 1`

```
logProbCalcPlusA$valueToAdd <- 2
ClogProbCalcPlusA$valueToAdd # or in the compiled version
```

`## [1] 1`

```
ClogProbCalcPlusA$valueToAdd <- 3
ClogProbCalcPlusA$run(1.5)
```

`## [1] -16.46288`

`CtestModel$a # a == 1.5 + 3`

`## [1] 4.5`

### 15.3.3 Determining numeric types in nimbleFunctions

For numeric variables from the `setup`

function that appear in the `run`

function or other member functions (or are declared in `setupOutputs`

), the type is determined from the values created by the setup code. The types created by setup code must be consistent across all specializations of the nimbleFunction. For example if `X`

is created as a matrix (two-dimensional double) in one specialization but as a vector (one-dimensional double) in another, there will be a problem during compilation. The sizes may differ in each specialization.

Treatment of vectors of length one presents special challenges because they could be treated as scalars or vectors. Currently they are treated as scalars. If you want a vector, ensure that the length is greater than one in the setup code and then use `setSize`

in the run-time code.

### 15.3.4 Control of setup outputs

Sometimes setup code may create variables that are not used in run code. By default, NIMBLE inspects run code and omits variables from setup that do not appear in run code from compilation. However, sometimes a programmer may want to force a numeric or character variable to be included in compilation, even if it is not used directly in run code. As shown below, such variables can be directly accessed in one nimbleFunction from another, which provides a way of using nimbleFunctions as general data structures. To force NIMBLE to include variables during compilation, for example `X`

and `Y`

, simply include

`setupOutputs(X, Y)`

anywhere in the setup code.

## 15.4 Writing run code

In Chapter 11 we described the functionality of the NIMBLE language that could be used in run code without setup code (typically in cases where no models or modelValues are needed). Next we explain the additional features that allow use of models and modelValues in the run code.

### 15.4.1 Driving models: *calculate*, *calculateDiff*, *simulate*, *getLogProb*

These four functions are the primary ways to operate a model. Their syntax was explained in Section 13.3. Except for `getLogProb`

, it is usually important for the `nodes`

vector to be sorted in topological order. Model member functions such as `getDependencies`

and `expandNodeNames`

will always return topoligically sorted node names.

Most R-like indexing of a node vector is allowed within the argument to `calculate`

, `calculateDiff`

, `simulate`

, and `getLogProb`

. For example, all of the following are allowed:

```
myModel$calculate(nodes)
myModel$calculate(nodes[i])
myModel$calculate(nodes[1:3])
myModel$calculate(nodes[c(1,3)])
myModel$calculate(nodes[2:i])
myModel$calculate(nodes[ values(model, nodes) + 0.1 < x ])
```

Note that one cannot create new vectors of nodes in run code. They can only be indexed within a call to `calculate`

, `calculateDiff`

, `simulate`

or `getLogProb`

.

### 15.4.2 Getting and setting variable and node values

#### 15.4.2.1 Using indexing with nodes

Here is an example that illustrates getting and setting of nodes, subsets of nodes, or variables. Note the following:

- In
`model[[v]]`

,`v`

can only be a single node or variable name, not a vector of multiple nodes nor an element of such a vector (`model[[ nodes[i] ]]`

does not work). The node itself may be a vector, matrix or array node. - In fact,
`v`

can be a node-name-like character string, even if it is not actually a node in the model. See example 4 in the code below. - One can also use
`model$varName`

, with the caveat that`varName`

must be a variable name. This usage would only make sense for a nimbleFunction written for models known to have a specific variable. (Note that if`a`

is a scalar node in`model`

, then`model[['a']]`

will be a scalar but`model$a`

will be a vector of length -) - one should use the
`<<-`

global assignment operator to assign into model nodes.

Note that NIMBLE does not allow variables to change dimensions. Model nodes are the same, and indeed are more restricted because they can’t change sizes. In addition, NIMBLE distinguishes between scalars and vectors of length 1. These rules, and ways to handle them correctly, are illustrated in the following code as well as in Section 11.3.

```
code <- nimbleCode({
z ~ dnorm(0, sd = sigma)
sigma ~ dunif(0, 10)
y[1:n] ~ dmnorm(zeroes[1:n], cov = C[1:5, 1:5])
})
n <- 5
m <- nimbleModel(code, constants = list(n = n, zeroes = rep(0, n),
C = diag(n)))
cm <- compileNimble(m)
nfGen <- nimbleFunction(
setup = function(model) {
# node1 and node2 would typically be setup arguments, so they could
# have different values for different models. We are assigning values
# here so the example is clearer.
node1 <- 'sigma' # a scalar node
node2 <- 'y[1:5]' # a vector node
notReallyANode <- 'y[2:4]' # y[2:4] allowed even though not a node!
},
run = function(vals = double(1)) {
tmp0 <- model[[node1]] # 1. tmp0 will be a scalar
tmp1 <- model[[node2]] # 2. tmp1 will be a vector
tmp2 <- model[[node2]][1] # 3. tmp2 will be a scalar
tmp3 <- model[[notReallyANode]] # 4. tmp3 will be a vector
tmp4 <- model$y[3:4] # 5. hard-coded access to a model variable
# 6. node1 is scalar so can be assigned a scalar:
model[[node1]] <<- runif(1)
model[[node2]][1] <<- runif(1)
# 7. an element of node2 can be assigned a scalar
model[[node2]] <<- runif(length(model[[node2]]))
# 8. a vector can be assigned to the vector node2
model[[node2]][1:3] <<- vals[1:3]
# elements of node2 can be indexed as needed
returnType(double(1))
out <- model[[node2]] # we can return a vector
return(out)
}
)
Rnf <- nfGen(m)
Cnf <- compileNimble(Rnf)
Cnf$run(rnorm(10))
```

`## [1] -1.1339167 0.4213353 -0.9245563 0.5280435 0.5232784`

Use of `[[ ]]`

allows one to programmatically access a node based on a character variable containing the node name; this character variable would generally be set in setup code. In contrast, use of `$`

hard codes the variable name and would not generally be suitable for nimbleFunctions intended for use with arbitrary models.

#### 15.4.2.2 Getting and setting more than one model node or variable at a time using `values`

Sometimes it is useful to set a collection of nodes or variables at one time. For example, one might want a nimbleFunction that will serve as the objective function for an optimizer. The input to the nimbleFunction would be a vector, which should be used to fill a collection of nodes in the model before calculating their log probabilities. This can be done using `values`

:

```
# get values from a set of model nodes into a vector
P <- values(model, nodes)
# or put values from a vector into a set of model nodes
values(model, nodes) <- P
```

where the first line would assign the collection of values from `nodes`

into `P`

, and the second would do the inverse. In both cases, values from nodes with two or more dimensions are flattened into a vector in column-wise order.

`values(model, nodes)`

may be used as a vector in other expressions, e.g.,

`Y <- A %*% values(model, nodes) + b`

One can also index elements of nodes in the argument to values, in the same manner as discussed for `calculate`

and related functions in Section 15.4.1.

Note again the potential for confusion between scalars and vectors of length 1. `values`

returns a vector and expects a vector when used on the left-hand side of an assignment. If only a single value is being assigned, it must be a vector of length 1, not a scalar. This can be achieved by wrapping a scalar in `c()`

when necessary. For example:

```
# c(rnorm(1)) creates vector of length one:
values(model, nodes[1]) <- c(rnorm(1))
# won't compile because rnorm(1) is a scalar
# values(model, nodes[1]) <- rnorm(1)
out <- values(model, nodes[1]) # out is a vector
out2 <- values(model, nodes[1])[1] # out2 is a scalar
```

### 15.4.3 Getting parameter values and node bounds

Sections 13.2.3-13.2.4 describe how to get the parameter values for a node and the range (bounds) of possible values for the node using `getParam`

and `getBound`

. Both of these can be used in run code.

### 15.4.4 Using modelValues objects

The `modelValues`

structure was introduced in Section 14.1. Inside nimbleFunctions, modelValues are designed to easily save values from a model object during the running of a nimbleFunction. A modelValues object used in run code must always exist in the setup code, either by passing it in as a setup argument or creating it in the setup code.

To illustrate this, we will create a nimbleFunction for computing importance weights for importance sampling. This function will use two modelValues objects. `propModelValues`

will contain a set of values simulated from the importance sampling distribution and a field `propLL`

for their log probabilities (densities). `savedWeights`

will contain the difference in log probability (density) between the model and the `propLL`

value provided for each set of values.

```
# Accepting modelValues as a setup argument
swConf <- modelValuesConf(vars = "w",
types = "double",
sizes = 1)
setup = function(propModelValues, model, savedWeightsConf){
# Building a modelValues in the setup function
savedWeights <- modelValues(conf = savedWeightsConf)
# List of nodes to be used in run function
modelNodes <- model$getNodeNames(stochOnly = TRUE,
includeData = FALSE)
}
```

The simplest way to pass values back and forth between models and modelValues inside of a nimbleFunction is with `copy`

, which has the synonym `nimCopy`

. See `help(nimCopy)`

for argument details.

Alternatively, the values may be accessed via indexing of individual rows, using the notation `mv[var, i]`

, where `mv`

is a modelValues object, `var`

is a variable name (not a node name), and `i`

is a row number. Likewise, the `getsize`

and `resize`

functions can be used as discussed in Section 14.1. However the function `as.matrix`

does not work in run code.

Here is a run function to use these modelValues:

```
run = function(){
# gets the number of rows of propSamples
m <- getsize(propModelValues)
# resized savedWeights to have the proper rows
resize(savedWeights, m)
for(i in 1:m){
# Copying from propSamples to model.
# Node names of propSamples and model must match!
nimCopy(from = propModelValues, to = model, row = i,
nodes = modelNodes, logProb = FALSE)
# calculates the log likelihood of the model
targLL <- model$calculate()
# retreaves the saved log likelihood from the proposed model
propLL <- propModelValues["propLL",i][1]
# saves the importance weight for the i-th sample
savedWeights["w", i][1] <<- exp(targLL - propLL)
}
# does not return anything
}
```

Once the nimbleFunction is built, the modelValues object can be accessed using `$`

, which is shown in more detail below. In fact, since modelValues, like most NIMBLE objects, are reference class objects, one can get a reference to them before the function is executed and then use that reference afterwards.

```
# simple model and modelValues for example use with code above
targetModelCode <- nimbleCode({
x ~ dnorm(0,1)
for(i in 1:4)
y[i] ~ dnorm(0,1)
})
# code for proposal model
propModelCode <- nimbleCode({
x ~ dnorm(0,2)
for(i in 1:4)
y[i] ~ dnorm(0,2)
})
# creating the models
targetModel = nimbleModel(targetModelCode, check = FALSE)
propModel = nimbleModel(propModelCode, check = FALSE)
cTargetModel = compileNimble(targetModel)
cPropModel = compileNimble(propModel)
sampleMVConf = modelValuesConf(vars = c("x", "y", "propLL"),
types = c("double", "double", "double"),
sizes = list(x = 1, y = 4, propLL = 1) )
sampleMV <- modelValues(sampleMVConf)
# nimbleFunction for generating proposal sample
PropSamp_Gen <- nimbleFunction(
setup = function(mv, propModel){
nodeNames <- propModel$getNodeNames()
},
run = function(m = integer() ){
resize(mv, m)
for(i in 1:m){
propModel$simulate()
nimCopy(from = propModel, to = mv, nodes = nodeNames, row = i)
mv["propLL", i][1] <<- propModel$calculate()
}
}
)
# nimbleFunction for calculating importance weights
# uses setup and run functions as defined in previous code chunk
impWeights_Gen <- nimbleFunction(setup = setup,
run = run)
# making instances of nimbleFunctions
# note that both functions share the same modelValues object
RPropSamp <- PropSamp_Gen(sampleMV, propModel)
RImpWeights <- impWeights_Gen(sampleMV, targetModel, swConf)
# compiling
CPropSamp <- compileNimble(RPropSamp, project = propModel)
CImpWeights <- compileNimble(RImpWeights, project = targetModel)
# generating and saving proposal sample of size 10
CPropSamp$run(10)
```

`## NULL`

```
# calculating the importance weights and saving to mv
CImpWeights$run()
```

`## NULL`

```
# retrieving the modelValues objects
# extracted objects are C-based modelValues objects
savedPropSamp_1 = CImpWeights$propModelValues
savedPropSamp_2 = CPropSamp$mv
# Subtle note: savedPropSamp_1 and savedPropSamp_2
# both provide interface to the same compiled modelValues objects!
# This is because they were both built from sampleMV.
savedPropSamp_1["x",1]
```

`## [1] 0.5615461`

`savedPropSamp_2["x",1]`

`## [1] 0.5615461`

```
savedPropSamp_1["x",1] <- 0 # example of directly setting a value
savedPropSamp_2["x",1]
```

`## [1] 0`

```
# viewing the saved importance weights
savedWeights <- CImpWeights$savedWeights
unlist(savedWeights[["w"]])
```

```
## [1] 0.3159913 0.3114850 1.3655582 0.3969088 41.3934032 0.2732388
## [7] 0.7972150 1.5456142 0.4199902 1.8547728
```

```
# viewing first 3 rows -- note that savedPropSsamp_1["x", 1] was altered
as.matrix(savedPropSamp_1)[1:3, ]
```

```
## propLL[1] x[1] y[1] y[2] y[3] y[4]
## [1,] -4.023479 0.0000000 -0.4634227 -0.4790402 0.1299793 0.6206335
## [2,] -3.994752 -0.2773491 -0.4730179 -0.7142721 -0.5309174 -0.2005022
## [3,] -6.950687 -0.8243472 0.4887699 -1.4541917 -0.1090178 -1.0216946
```

Importance sampling could also be written using simple vectors for the weights, but we illustrated putting them in a modelValues object along with model variables.

### 15.4.5 Using model variables and modelValues in expressions

Each way of accessing a variable, node, or modelValues can be used amidst mathematical expressions, including with indexing, or passed to another nimbleFunction as an argument. For example, the following two statements would be valid:

`model[["x[2:8, ]"]][2:4, 1:3] %*% Z`

if Z is a vector or matrix, and

`C[6:10] <- mv[v, i][1:5, k] + B`

if B is a vector or matrix.

The NIMBLE language allows scalars, but models defined from BUGS code are never created as purely scalar nodes. Instead, a single node such as defined by `z ~ dnorm(0, 1)`

is implemented as a vector of length 1, similar to R. When using `z`

via `model$z`

or `model[["z"]]`

, NIMBLE will try to do the right thing by treating this as a scalar. In the event of problems^{30}, a more explicit way to access `z`

is `model$z[1]`

or `model[["z"]][1]`

.

### 15.4.6 Including other methods in a nimbleFunction

Other methods can be included with the `methods`

argument to `nimbleFunction`

. These methods can use the objects created in setup code in just the same ways as the run function. In fact, the run function is just a default main method name. Any method can then call another method.

```
methodsDemo <- nimbleFunction(
setup = function() {sharedValue <- 1},
run = function(x = double(1)) {
print("sharedValues = ", sharedValue, "\n")
increment()
print("sharedValues = ", sharedValue, "\n")
A <- times(5)
return(A * x)
returnType(double(1))
},
methods = list(
increment = function() {
sharedValue <<- sharedValue + 1
},
times = function(factor = double()) {
return(factor * sharedValue)
returnType(double())
}))
methodsDemo1 <- methodsDemo()
methodsDemo1$run(1:10)
```

```
## sharedValues = 1
##
## sharedValues = 2
```

`## [1] 10 20 30 40 50 60 70 80 90 100`

```
methodsDemo1$sharedValue <- 1
CmethodsDemo1 <- compileNimble(methodsDemo1)
CmethodsDemo1$run(1:10)
```

```
## sharedValues = 1
##
## sharedValues = 2
```

`## [1] 10 20 30 40 50 60 70 80 90 100`

### 15.4.7 Using other nimbleFunctions

One nimbleFunction can use another nimbleFunction that was passed to it as a setup argument or was created in the setup function. This can be an effective way to program. When a nimbleFunction needs to access a setup variable or method of another nimbleFunction, use `$`

.

```
usePreviousDemo <- nimbleFunction(
setup = function(initialSharedValue) {
myMethodsDemo <- methodsDemo()
},
run = function(x = double(1)) {
myMethodsDemo$sharedValue <<- initialSharedValue
print(myMethodsDemo$sharedValue)
A <- myMethodsDemo$run(x[1:5])
print(A)
B <- myMethodsDemo$times(10)
return(B)
returnType(double())
})
usePreviousDemo1 <- usePreviousDemo(2)
usePreviousDemo1$run(1:10)
```

```
## 2
## sharedValues = 2
##
## sharedValues = 3
##
## 15 30 45 60 75
```

`## [1] 30`

```
CusePreviousDemo1 <- compileNimble(usePreviousDemo1)
CusePreviousDemo1$run(1:10)
```

```
## 2
## sharedValues = 2
##
## sharedValues = 3
##
## 15
## 30
## 45
## 60
## 75
```

`## [1] 30`

### 15.4.8 Virtual nimbleFunctions and nimbleFunctionLists

Often it is useful for one nimbleFunction to have a list of other nimbleFunctions, all of whose methods have the same arguments and return types. For example, NIMBLE’s MCMC engine contains a list of samplers that are each nimbleFunctions.

To make such a list, NIMBLE provides a way to declare the arguments and return types of methods: virtual nimbleFunctions created by `nimbleFunctionVirtual`

. Other nimbleFunctions can inherit from virtual nimbleFunctions, which in R is called ‘containing’ them. Readers familiar with object oriented programming will recognize this as a simple class inheritance system. In Version 0.6.12 it is limited to simple, single-level inheritance.

Here is how it works:

```
baseClass <- nimbleFunctionVirtual(
run = function(x = double(1)) {returnType(double())},
methods = list(
foo = function() {returnType(double())}
))
derived1 <- nimbleFunction(
contains = baseClass,
setup = function(){},
run = function(x = double(1)) {
print("run 1")
return(sum(x))
returnType(double())
},
methods = list(
foo = function() {
print("foo 1")
return(rnorm(1, 0, 1))
returnType(double())
}))
derived2 <- nimbleFunction(
contains = baseClass,
setup = function(){},
run = function(x = double(1)) {
print("run 2")
return(prod(x))
returnType(double())
},
methods = list(
foo = function() {
print("foo 2")
return(runif(1, 100, 200))
returnType(double())
}))
useThem <- nimbleFunction(
setup = function() {
nfl <- nimbleFunctionList(baseClass)
nfl[[1]] <- derived1()
nfl[[2]] <- derived2()
},
run = function(x = double(1)) {
for(i in seq_along(nfl)) {
print( nfl[[i]]$run(x) )
print( nfl[[i]]$foo() )
}
}
)
useThem1 <- useThem()
set.seed(1)
useThem1$run(1:5)
```

```
## run 1
## 15
## foo 1
## -0.6264538
## run 2
## 120
## foo 2
## 157.2853
```

```
CuseThem1 <- compileNimble(useThem1)
set.seed(1)
CuseThem1$run(1:5)
```

```
## run 1
## 15
## foo 1
## -0.626454
## run 2
## 120
## foo 2
## 157.285
```

`## NULL`

One can also use `seq_along`

with nimbleFunctionLists (and only with nimbleFunctionLists). As in R, `seq_along(myFunList)`

is equivalent to `1:length(myFunList)`

if the length of `myFunList`

is greater than zero. It is an empty sequence if the length is zero.

Virtual nimbleFunctions cannot define setup values to be inherited.

### 15.4.9 Character objects

NIMBLE provides limited uses of character objects in run code. Character vectors created in setup code will be available in run code, but the only thing you can really do with them is include them in a `print`

or `stop`

statement.

Note that character vectors of model node and variable names are processed during compilation. For example, in `model[[node]]`

, `node`

may be a character object, and the NIMBLE compiler processes this differently than `print("The node name was ", node)`

. In the former, the NIMBLE compiler sets up a C++ pointer directly to the `node`

in the `model`

, so that the character content of `node`

is never needed in C++. In the latter, `node`

is used as a C++ string and therefore is needed in C++.

### 15.4.10 User-defined data structures

Before the introduction of nimbleLists in Version 0.6-4, NIMBLE did not explicitly have user-defined data structures. An alternative way to create a data structure in NIMBLE is to use nimbleFunctions to achieve a similar effect. To do so, one can define setup code with whatever variables are wanted and ensure they are compiled using `setupOutputs`

. Here is an example:

```
dataNF <- nimbleFunction(
setup = function() {
X <- 1
Y <- as.numeric(c(1, 2))
Z <- matrix(as.numeric(1:4), nrow = 2)
setupOutputs(X, Y, Z)
})
useDataNF <- nimbleFunction(
setup = function(myDataNF) {},
run = function(newX = double(), newY = double(1), newZ = double(2)) {
myDataNF$X <<- newX
myDataNF$Y <<- newY
myDataNF$Z <<- newZ
})
myDataNF <- dataNF()
myUseDataNF <- useDataNF(myDataNF)
myUseDataNF$run(as.numeric(100), as.numeric(100:110),
matrix(as.numeric(101:120), nrow = 2))
myDataNF$X
```

`## [1] 100`

`myDataNF$Y`

`## [1] 100 101 102 103 104 105 106 107 108 109 110`

`myDataNF$Z`

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 101 103 105 107 109 111 113 115 117 119
## [2,] 102 104 106 108 110 112 114 116 118 120
```

`myUseDataNF$myDataNF$X`

`## [1] 100`

```
nimbleOptions(buildInterfacesForCompiledNestedNimbleFunctions = TRUE)
CmyUseDataNF <- compileNimble(myUseDataNF)
CmyUseDataNF$run(-100, -(100:110), matrix(-(101:120), nrow = 2))
```

`## NULL`

```
CmyDataNF <- CmyUseDataNF$myDataNF
CmyDataNF$X
```

`## [1] -100`

`CmyDataNF$Y`

`## [1] -100 -101 -102 -103 -104 -105 -106 -107 -108 -109 -110`

`CmyDataNF$Z`

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] -101 -103 -105 -107 -109 -111 -113 -115 -117 -119
## [2,] -102 -104 -106 -108 -110 -112 -114 -116 -118 -120
```

You’ll notice that:

- After execution of the compiled function, access to the
`X`

,`Y`

, and`Z`

is the same as for the uncompiled case. This occurs because`CmyUseDataNF`

is an interface to the compiled version of`myUseDataNF`

, and it provides access to member objects and functions. In this case, one member object is`myDataNF`

, which is an interface to the compiled version of`myUseDataNF$myDataNF`

, which in turn provides access to`X`

,`Y`

, and`Z`

. To reduce memory use, NIMBLE defaults to*not*providing full interfaces to nested nimbleFunctions like`myUseDataNF$myDataNF`

. In this example we made it provide full interfaces by setting the`buildInterfacesForCompiledNestedNimbleFunctions`

option via`nimbleOptions`

to TRUE. If we had left that option FALSE (its default value), we could still get to the values of interest using`valueInCompiledNimbleFunction(CmyDataNF, 'X')`

- We need to take care that at the time of compilation, the
`X`

,`Y`

and`Z`

values contain doubles via`as.numeric`

so that they are not compiled as integer objects. - The
`myDataNF`

could be created in the setup code. We just provided it as a setup argument to illustrate that option.

## 15.5 Example: writing user-defined samplers to extend NIMBLE’s MCMC engine

One important use of nimbleFunctions is to write additional samplers that can be used in NIMBLE’s MCMC engine. This allows a user to write a custom sampler for one or more nodes in a model, as well as for programmers to provide general samplers for use in addition to the library of samplers provided with NIMBLE.

The following code illustrates how a NIMBLE developer would implement and use a Metropolis-Hastings random walk sampler with fixed proposal standard deviation.

```
my_RW <- nimbleFunction(
contains = sampler_BASE,
setup = function(model, mvSaved, target, control) {
# proposal standard deviation
scale <- if(!is.null(control$scale)) control$scale else 1
calcNodes <- model$getDependencies(target)
},
run = function() {
# initial model logProb
model_lp_initial <- getLogProb(model, calcNodes)
# generate proposal
proposal <- rnorm(1, model[[target]], scale)
# store proposal into model
model[[target]] <<- proposal
# proposal model logProb
model_lp_proposed <- calculate(model, calcNodes)
# log-Metropolis-Hastings ratio
log_MH_ratio <- model_lp_proposed - model_lp_initial
# Metropolis-Hastings step: determine whether or
# not to accept the newly proposed value
u <- runif(1, 0, 1)
if(u < exp(log_MH_ratio)) jump <- TRUE
else jump <- FALSE
# keep the model and mvSaved objects consistent
if(jump) copy(from = model, to = mvSaved, row = 1,
nodes = calcNodes, logProb = TRUE)
else copy(from = mvSaved, to = model, row = 1,
nodes = calcNodes, logProb = TRUE)
},
methods = list( reset = function () {} )
)
```

The name of this sampler function, for the purposes of using it in an

MCMC algorithm, is `my_RW`

. Thus, this sampler can be added to an exisiting MCMC configuration object `conf`

using:

```
mcmcConf$addSampler(target = 'x', type = 'my_RW',
control = list(scale = 0.1))
```

To be used within the MCMC engine, sampler functions definitions must adhere exactly to the following:

- The nimbleFunction must include the contains statement
`contains = sampler_BASE`

. - The
`setup`

function must have the four arguments`model, mvSaved, target, control`

, in that order. - The
`run`

function must accept no arguments, and have no return value. Further, after execution it must leave the`mvSaved`

modelValues object as an up-to-date copy of the values and logProb values in the model object. - The nimbleFunction must have a member method called
`reset`

, which takes no arguments and has no return value.

The purpose of the `setup`

function is generally two-fold. First, to extract control parameters from the `control`

list; in the example, the proposal standard deviation `scale`

. It is good practice to specify default values for any control parameters that are not provided in the `control`

argument, as done in the example. Second, to generate any sets of nodes needed in the `run`

function. In many sampling algorithms, as here, `calcNodes`

is used to represent the target node(s) and dependencies up to the first layer of stochastic nodes, as this is precisely what is required for calculating the Metropolis-Hastings acceptance probability. These probability calculations are done using `model$calculate(calcNodes)`

.

In the `run`

function, the `mvSaved`

modelValues object is kept up-to-date with the current state of the model, depending on whether the proposed changed was accepted. This is done using the `copy`

function, to copy values between the model and `mvSaved`

objects.

## 15.6 Copying nimbleFunctions (and NIMBLE models)

NIMBLE relies heavily on R’s reference class system. When models, modelValues, and nimbleFunctions with setup code are created, NIMBLE generates a new, customized reference class definition for each. As a result, objects of these types are passed by reference and hence modified in place by most NIMBLE operations. This is necessary to avoid a great deal of copying and returning and having to reassign large objects, both in processing models and nimbleFunctions and in running algorithms.

One cannot generally copy NIMBLE models or nimbleFunctions (specializations or generators) in a safe fashion, because of the references to other objects embedded within NIMBLE objects. However, the model member function `newModel`

will create a new copy of the model from the same model definition (Section 6.1.3). This new model can then be used with newly instantiated nimbleFunctions.

The reliable way to create new copies of nimbleFunctions is to re-run the R function called `nimbleFunction`

and record the result in a new object. For example, say you have a `nimbleFunction`

called `foo`

and 1000 instances of `foo`

are compiled as part of an algorithm related to a model called `model1`

. If you then need to use `foo`

in an algorithm for another model, `model2`

, doing so may work without any problems. However, there are cases where the NIMBLE compiler will tell you during compilation that the second set of `foo`

instances cannot be built from the previous compiled version. A solution is to re-define `foo`

from the beginning – i.e. call `nimbleFunction`

again – and then proceed with building and compiling the algorithm for `model2`

.

## 15.7 Debugging nimbleFunctions

One of the main reasons that NIMBLE provides an R (uncompiled) version of each nimbleFunction is for debugging. One can call `debug`

on nimbleFunction methods (in particular the main *run* method, e.g., `debug(mynf$run`

) and then step through the code in R using R’s debugger. One can also insert `browser`

calls into run code and then run the nimbleFunction from R.

In contrast, directly debugging a compiled nimbleFunction is difficult, although those familiar with running R through a debugger and accessing the underlying C code may be able to operate similarly with NIMBLE code. We often resort to using `print`

statements for debugging compiled code. Expert users fluent in C++ may also try setting `nimbleOptions(pauseAfterWritingFiles = TRUE)`

and adding debugging code into the generated C++ files.

## 15.8 Timing nimbleFunctions with *run.time*

If your nimbleFunctions are correct but slow to run, you can use benchmarking tools to look for bottlenecks and to compare different implementations. If your functions are very long-running (say 1ms or more), then standard R benchmarking tools may suffice, e.g. the `microbenchmark`

package

```
library(microbenchmark)
microbenchmark(myCompiledFunVersion1(1.234),
myCompiledFunVersion2(1.234)) # Beware R <--> C++ overhead!
```

If your nimbleFunctions are very fast, say under 1ms, then `microbenchmark`

will be inaccurate due to R-to-C++ conversion overhead (that won’t happen in your actual functions). To get timing information in C++, NIMBLE provides a `run.time`

function that avoids the R-to-C++ overhead.

```
myMicrobenchmark <- compileNimble(nimbleFunction(
run = function(iters = integer(0)){
time1 <- run.time({
for (t in 1:iters) myCompiledFunVersion1(1.234)
})
time2 <- run.time({
for (t in 1:iters) myCompiledFunVersion2(1.234)
})
return(c(time1, time2))
returnType(double(1))
}))
print(myMicroBenchmark(100000))
```

## 15.9 Reducing memory usage

NIMBLE can create a lot of objects in its processing, and some of them use R features such as reference classes that are heavy in memory usage. We have noticed that building large models can use lots of memory. To help alleviate this, we provide two options, which can be controlled via `nimbleOptions`

.

As noted above, the option `buildInterfacesForCompiledNestedNimbleFunctions`

defaults to FALSE, which means NIMBLE will not build full interfaces to compiled nimbleFunctions that ony appear within other nimbleFunctions. If you want access to all such nimbleFunctions, use the option `buildInterfacesForCompiledNestedNimbleFunctions = TRUE`

.

The option `clearNimbleFunctionsAfterCompiling`

is more drastic, and it is experimental, so ‘buyer beware’. This will clear much of the contents of an uncompiled nimbleFunction object after it has been compiled in an effort to free some memory. We expect to be able to keep making NIMBLE more efficient – faster execution and lower memory use – in the future.

Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov Chain Monte Carlo Methods.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 72 (3): 269–342. doi:10.1111/j.1467-9868.2009.00736.x.

Banerjee, S., B.P. Carlin, and A.E. Gelfand. 2015. *Hierarchical Modeling and Analysis for Spatial Data*. 2nd ed. Boca Raton: Chapman & Hall.

Blackwell, D, and J MacQueen. 1973. “Ferguson Distributions via Pólya Urn Schemes.” *The Annals of Statistics* 1: 353–55.

Escobar, M D. 1994. “Estimating normal means with a Dirichlet process prior.” *Journal of the American Statistical Association* 89: 268–77.

Escobar, M D, and M West. 1995. “Bayesian density estimation and inference using mixtures.” *Journal of the American Statistical Association* 90: 577–88.

Ferguson, T S. 1973. “A Bayesian Analysis of Some Nonparametric Problems.” *Annals of Statistics* 1: 209–30.

———. 1974. “Prior Distribution on the Spaces of Probability Measures.” *Annals of Statistics* 2: 615–29.

George, E.I., U.E. Makov, and A.F.M. Smith. 1993. “Conjugate Likelihood Distributions.” *Scandinavian Journal of Statistics* 20 (2): 147–56.

Ishwaran, H, and L F James. 2002. “Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information.” *Journal of Computational and Graphical Statistics* 11: 508–32.

Ishwaran, Hemant, and Lancelot F James. 2001. “Gibbs Sampling Methods for Stick-Breaking Priors.” *Journal of the American Statistical Association* 96 (453). Taylor & Francis: 161–73.

Lo, A Y. 1984. “On a Class of Bayesian Nonparametric Estimates I: Density Estimates.” *The Annals of Statistics* 12: 351–57.

Lunn, David, David Spiegelhalter, Andrew Thomas, and Nicky Best. 2009. “The BUGS Project: Evolution, Critique and Future Directions.” *Statistics in Medicine* 28 (25): 3049–67. doi:10.1002/sim.3680.

Neal, R. 2000. “Markov chain sampling methods for Dirichlet process mixture models.” *Journal of Computational and Graphical Statistics* 9: 249–65.

Neal, Radford M. 2003. “Slice Sampling.” *The Annals of Statistics* 31 (3): 705–41.

Paciorek, C.J. 2009. “Understanding Intrinsic Gaussian Markov Random Field Spatial Models, Including Intrinsic Conditional Autoregressive Models.” University of California, Berkeley. http://www.stat.berkeley.edu/~paciorek/research/techVignettes/techVignette5.pdf.

Roberts, G. O., and S. K. Sahu. 1997. “Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 59 (2): 291–317. doi:10.1111/1467-9868.00070.

Rue, H., and L. Held. 2005. *Gaussian Markov Random Fields: Theory and Applications*. Boca Raton: Chapman & Hall.

Sethuraman, J. 1994. “A Constructive Definition of Dirichlet Prior.” *Statistica Sinica* 2: 639–50.

Watanabe, Sumio. 2010. “Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory.” *Journal of Machine Learning Research* 11 (Dec): 3571–94.

Normally this is the value of the last evaluated code, or the argument to

`return`

.↩This can be omitted if you don’t need it.↩

Note the use of the global assignment operator to assign into the model. This is necessary for assigning into variables from the

`setup`

function, at least if you want to avoid warnings from R. These warnings come from R’s reference class system.↩Actually, they can be, but only for uncompiled nimbleFunctions.↩

Please tell us!↩