# Chapter 6 Building and using models

This chapter explains how to build and manipulate model objects starting from BUGS code.

## 6.1 Creating model objects

NIMBLE provides two functions for creating model objects: `nimbleModel`

and `readBUGSmodel`

. The first, `nimbleModel`

, is more general and was illustrated in Chapter 2. The second, `readBUGSmodel`

provides compatibility with BUGS file formats for models, variables, data, and initial values for MCMC.

In addition one can create new model objects from existing model objects.

### 6.1.1 Using *nimbleModel* to create a model

`nimbleModel`

processes BUGS code to determine all the nodes, variables, and their relationships in a model. Any constants must be provided at this step. Data and initial values can optionally be provided. BUGS code passed to `nimbleModel`

must go through `nimbleCode`

.

We look again at the pump example from the introduction:

```
pumpCode <- nimbleCode({
for (i in 1:N){
theta[i] ~ dgamma(alpha,beta);
lambda[i] <- theta[i]*t[i];
x[i] ~ dpois(lambda[i])
}
alpha ~ dexp(1.0);
beta ~ dgamma(0.1,1.0);
})
pumpConsts <- list(N = 10,
t = c(94.3, 15.7, 62.9, 126, 5.24,
31.4, 1.05, 1.05, 2.1, 10.5))
pumpData <- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))
pumpInits <- list(alpha = 1, beta = 1,
theta = rep(0.1, pumpConsts$N))
pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
data = pumpData, inits = pumpInits)
```

#### 6.1.1.1 Data and constants

NIMBLE makes a distinction between data and constants:

*Constants*can never be changed and must be provided when a model is defined. For example, a vector of known index values, such as for block indices, helps define the model graph itself and must be provided as constants. Variables used in the index ranges of for-loops must also be provided as constants.*Data*is a label for the role a node plays in the model. Nodes marked as data will by default be protected from any functions that would simulate over their values (see`simulate`

in Chapter 13), but it is possible to over-ride that default or to change their values by direct assignment. This allows an algorithm to be applied to many data sets in the same model without re-creating the model each time. It also allows simulation of data in a model.

WinBUGS, OpenBUGS and JAGS do not allow data values to be changed or different nodes to be labeled as data without starting from the beginning again. Hence they do not distinguish between constants and data.

For compatibility with BUGS and JAGS, NIMBLE allows both to be provided in the `constants`

argument to `nimbleModel`

, in which case NIMBLE handles values for stochastic nodes as data and everything else as constants.

Values for nodes that appear only on the right-hand side of BUGS declarations (e.g., covariates/predictors) can be provided as constants or as data or initial values. There is no real difference between providing as data or initial values and the values can be added after building a model via `setInits`

or `setData`

.

#### 6.1.1.2 Providing data and initial values to an existing model

Whereas constants must be provided during the call to `nimbleModel`

, data and initial values can be provided later via the model member functions `setData`

and `setInits`

. For example, if `pumpData`

is a named list of data values (as above), then `pump$setData(pumpData)`

sets the named variables to the values in the list.

`setData`

does two things: it sets the values of the data nodes, and it flags those nodes as containing data. nimbleFunction programmers can then use that information to control whether an algorithm should over-write data or not. For example, NIMBLE’s `simulate`

functions by default do not overwrite data values but can be told to do so. Values of data variables can be replaced, and the indication of which nodes should be treated as data can be reset by using the `resetData`

method, e.g. `pump$resetData()`

.

#### 6.1.1.3 Missing data values

Sometimes one needs a model variable to have a mix of data and non-data, often due to missing data values. In NIMBLE, when data values are provided, any nodes with `NA`

values will *not* be labeled as data. A node following a multivariate distribution must be either entirely observed or entirely missing.

Here’s an example of running an MCMC on the *pump* model, with two of the observations taken to be missing. Some of the steps in this example are documented more below. NIMBLE’s default MCMC configuration will treat the missing values as unknowns to be sampled, as can be seen in the MCMC output here.

```
pumpMiss <- pump$newModel()
pumpMiss$resetData()
pumpDataNew <- pumpData
pumpDataNew$x[c(1, 3)] <- NA
pumpMiss$setData(pumpDataNew)
pumpMissConf <- configureMCMC(pumpMiss)
pumpMissConf$addMonitors('x', 'alpha', 'beta', 'theta')
```

`## thin = 1: alpha, beta, x, theta`

```
pumpMissMCMC <- buildMCMC(pumpMissConf)
Cobj <- compileNimble(pumpMiss, pumpMissMCMC)
niter <- 10
set.seed(0)
Cobj$pumpMissMCMC$run(niter)
```

`## NULL`

```
samples <- as.matrix(Cobj$pumpMissMCMC$mvSamples)
samples[1:5, 13:17]
```

```
## x[1] x[2] x[3] x[4] x[5]
## [1,] 17 1 2 14 3
## [2,] 11 1 4 14 3
## [3,] 14 1 9 14 3
## [4,] 11 1 24 14 3
## [5,] 9 1 29 14 3
```

Missing values may also occur in explanatory/predictor variables. Values for such variables should be passed in via the `data`

argument to `nimbleModel`

, with `NA`

for the missing values. In some contexts, one would want to specify distributions for such explanatory variables, for example so that an MCMC would impute the missing values.

#### 6.1.1.4 Defining alternative models with the same code

Avoiding code duplication is a basic principle of good programming. In NIMBLE, one can use definition-time if-then-else statements to create different models from the same code. As a simple example, say we have a linear regression model and want to consider including or omitting `x[2]`

as an explanatory variable:

```
regressionCode <- nimbleCode({
intercept ~ dnorm(0, sd = 1000)
slope1 ~ dnorm(0, sd = 1000)
if(includeX2) {
slope2 ~ dnorm(0, sd = 1000)
for(i in 1:N)
predictedY[i] <- intercept + slope1 * x1[i] + slope2 * x2[i]
} else {
for(i in 1:N) predictedY[i] <- intercept + slope1 * x1[i]
}
sigmaY ~ dunif(0, 100)
for(i in 1:N) Y[i] ~ dnorm(predictedY[i], sigmaY)
})
includeX2 <- FALSE
modelWithoutX2 <- nimbleModel(regressionCode, constants = list(N = 30),
check=FALSE)
modelWithoutX2$getVarNames()
```

```
## [1] "intercept" "slope1"
## [3] "predictedY" "sigmaY"
## [5] "lifted_d1_over_sqrt_oPsigmaY_cP" "Y"
## [7] "x1"
```

```
includeX2 <- TRUE
modelWithX2 <- nimbleModel(regressionCode, constants = list(N = 30),
check = FALSE)
modelWithX2$getVarNames()
```

```
## [1] "intercept" "slope1"
## [3] "slope2" "predictedY"
## [5] "sigmaY" "lifted_d1_over_sqrt_oPsigmaY_cP"
## [7] "Y" "x1"
## [9] "x2"
```

Whereas the *constants* are a property of the *model definition* – since they may help determine the model structure itself – *data* nodes can be different in different copies of the model generated from the same *model definition*. The `setData`

and `setInits`

described above can be used for each copy of the model.

#### 6.1.1.5 Providing dimensions via *nimbleModel*

`nimbleModel`

can usually determine the dimensions of every variable from the declarations in the BUGS code. However, it is possible to use a multivariate object only with empty indices (e.g. `x[,]`

), in which case the dimensions must be provided as an argument to `nimbleModel`

.

Here’s an example with multivariate nodes. The first provides indices, so no `dimensions`

argument is needed, while the second omits the indices and provides a `dimensions`

argument instead.

```
code <- nimbleCode({
y[1:K] ~ dmulti(p[1:K], n)
p[1:K] ~ ddirch(alpha[1:K])
log(alpha[1:K]) ~ dmnorm(alpha0[1:K], R[1:K, 1:K])
})
K <- 5
model <- nimbleModel(code, constants = list(n = 3, K = K,
alpha0 = rep(0, K), R = diag(K)),
check = FALSE)
codeAlt <- nimbleCode({
y[] ~ dmulti(p[], n)
p[] ~ ddirch(alpha[])
log(alpha[]) ~ dmnorm(alpha0[], R[ , ])
})
model <- nimbleModel(codeAlt, constants = list(n = 3, K = K, alpha0 = rep(0, K),
R = diag(K)),
dimensions = list(y = K, p = K, alpha = K),
check = FALSE)
```

In that example, since `alpha0`

and `R`

are provided as constants, we don’t need to specify their dimensions.

### 6.1.2 Creating a model from standard BUGS and JAGS input files

Users with BUGS and JAGS experience may have files set up in standard formats for use in BUGS and JAGS. `readBUGSmodel`

can read in the model, data/constant values and initial values in those formats. It can also take information directly from R objects somewhat more flexibly than `nimbleModel`

, specifically allowing inputs set up similarly to those for BUGS and JAGS. In either case, after processing the inputs, it calls `nimbleModel`

. Note that unlike BUGS and JAGS, only a single set of initial values can be specified in creating a model. Please see `help(readBUGSmodel)`

for argument details.

As an example of using `readBUGSmodel`

, let’s create a model for the *pump* example from BUGS.

```
pumpDir <- system.file('classic-bugs', 'vol1', 'pump', package = 'nimble')
pumpModel <- readBUGSmodel('pump.bug', data = 'pump-data.R',
inits = 'pump-init.R', dir = pumpDir)
```

`## Detected x as data within 'constants'.`

Note that `readBUGSmodel`

allows one to include `var`

and `data`

blocks in the model file as in some of the BUGS examples (such as `inhaler`

). The `data`

block pre-computes constant and data values. Also note that if `data`

and `inits`

are provided as files, the files should contain R code that creates objects analogous to what would populate the list if a list were provided instead. Please see the JAGS manual examples or the `classic_bugs`

directory in the NIMBLE package for example syntax. NIMBLE by and large does not need the information given in a `var`

block but occasionally this is used to determine dimensionality, such as in the case of syntax like `xbar <- mean(x[])`

where `x`

is a variable that appears only on the right-hand side of BUGS expressions.

Note that NIMBLE does not handle formatting such as in some of the original BUGS examples in which data was indicated with syntax such as `data x in 'x.txt'`

.

### 6.1.3 Making multiple instances from the same model definition

Sometimes it is useful to have more than one copy of the same model. For example, an algorithm (i.e., nimbleFunction) such as an MCMC will be bound to a particular model before it is run. A user could build multiple algorithms to use the same model instance, or they may want each algorithm to have its own instance of the model.

There are two ways to create new instances of a model, shown in this example:

```
simpleCode <- nimbleCode({
for(i in 1:N) x[i] ~ dnorm(0, 1)
})
# Return the model definition only, not a built model
simpleModelDefinition <- nimbleModel(simpleCode, constants = list(N = 10),
returnDef = TRUE, check = FALSE)
# Make one instance of the model
simpleModelCopy1 <- simpleModelDefinition$newModel(check = FALSE)
# Make another instance from the same definition
simpleModelCopy2 <- simpleModelDefinition$newModel(check = FALSE)
# Ask simpleModelCopy2 for another copy of itself
simpleModelCopy3 <- simpleModelCopy2$newModel(check = FALSE)
```

Each copy of the model can have different nodes flagged as data and different values in any nodes. They cannot have different values of `N`

because that is a constant; it must be a constant because it helps define the model.

## 6.2 NIMBLE models are objects you can query and manipulate

NIMBLE models are objects that can be modified and manipulated from R. In this section we introduce some basic ways to use a model object. Chapter 13 covers more topics for writing algorithms that use models.

### 6.2.1 What are variables and nodes?

This section discusses some basic concepts and terminology to be able to speak about NIMBLE models clearly.

Suppose we have created a model from the following BUGS code.

```
mc <- nimbleCode({
a ~ dnorm(0, 0.001)
for(i in 1:5) {
y[i] ~ dnorm(a, sd = 0.1)
for(j in 1:3)
z[i,j] ~ dnorm(y[i], tau)
}
tau ~ dunif(0, 20)
y.squared[1:5] <- y[1:5]^2
})
model <- nimbleModel(mc, data = list(z = matrix(rnorm(15), nrow = 5)))
```

In NIMBLE terminology:

- The
*variables*of this model are`a`

,`y`

,`z`

, and`y.squared`

. - The
*nodes*of this model are`a`

,`y[1]`

\(,\ldots,\)`y[5]`

,`z[1,1]`

\(,\ldots,\)`z[5, 3]`

, and`y.squared[1:5]`

. In graph terminology, nodes are vertices in the model graph. - The
*node functions*of this model are`a ~ dnorm(0, 0.001)`

,`y[i] ~ dnorm(a, 0.1)`

,`z[i,j] ~ dnorm(y[i], sd = 0.1)`

, and`y.squared[1:5] <- y[1:5]^2`

. Each node’s calculations are handled by a node function. Sometimes the distinction between nodes and node functions is important, but when it is not important we may refer to both simply as*nodes*.

- The
*scalar elements*of this model include all the scalar nodes as well as the scalar elements`y.squared[1]`

\(,\ldots,\)`y.squared[5]`

of the multivariate node`y.squared[1:5]`

.

### 6.2.2 Determining the nodes and variables in a model

One can determine the variables in a model using `getVarNames`

and the nodes in a model using `getNodeNames`

. Optional arguments to `getNodeNames`

allow you to select only certain types of nodes, as discussed in Section 13.1.1 and in the R help for `getNodeNames`

.

`model$getVarNames()`

```
## [1] "a" "y"
## [3] "lifted_d1_over_sqrt_oPtau_cP" "z"
## [5] "tau" "y.squared"
```

`model$getNodeNames()`

```
## [1] "a" "tau"
## [3] "y[1]" "y[2]"
## [5] "y[3]" "y[4]"
## [7] "y[5]" "lifted_d1_over_sqrt_oPtau_cP"
## [9] "y.squared[1:5]" "z[1, 1]"
## [11] "z[1, 2]" "z[1, 3]"
## [13] "z[2, 1]" "z[2, 2]"
## [15] "z[2, 3]" "z[3, 1]"
## [17] "z[3, 2]" "z[3, 3]"
## [19] "z[4, 1]" "z[4, 2]"
## [21] "z[4, 3]" "z[5, 1]"
## [23] "z[5, 2]" "z[5, 3]"
```

Note that some of the nodes may be ‘lifted’ nodes introduced by NIMBLE (Section 13.1.2). In this case `lifted_d1_over_sqrt_oPtau_cP`

(this is a node for the standard deviation of the `z`

nodes using NIMBLE’s canonical parameterization of the normal distribution) is the only lifted node in the model.

To determine the dependencies of one or more nodes in the model, you can use `getDependencies`

as discussed in Section 13.1.3.

### 6.2.3 Accessing nodes

Model variables can be accessed and set just as in R using `$`

and `[[ ]]`

. For example

```
model$a <- 5
model$a
```

`## [1] 5`

`model[["a"]]`

`## [1] 5`

```
model$y[2:4] <- rnorm(3)
model$y
```

`## [1] NA -0.9261095 -0.1771040 0.4020118 NA`

```
model[["y"]][c(1, 5)] <- rnorm(2)
model$y
```

`## [1] -0.7317482 -0.9261095 -0.1771040 0.4020118 0.8303732`

`model$z[1,]`

`## [1] -0.3340008 1.2079084 0.5210227`

While nodes that are part of a variable can be accessed as above, each node also has its own name that can be used to access it directly. For example, `y[2]`

has the name ‘y[2]’ and can be accessed by that name as follows:

`model[["y[2]"]]`

`## [1] -0.9261095`

```
model[["y[2]"]] <- -5
model$y
```

`## [1] -0.7317482 -5.0000000 -0.1771040 0.4020118 0.8303732`

`model[["z[2, 3]"]]`

`## [1] -0.1587546`

`model[["z[2:4, 1:2]"]][1, 2]`

`## [1] -1.231323`

`model$z[2, 2]`

`## [1] -1.231323`

Notice that node names can include index blocks, such as `model[["z[2:4, 1:2]"]]`

, and these are not strictly required to correspond to actual nodes. Such blocks can be subsequently sub-indexed in the regular R manner, such as `model[["z[2:4, 1:2]"]][1, 2]`

.

### 6.2.4 How nodes are named

Every node has a name that is a character string including its indices, with a space after every comma. For example, `X[1, 2, 3]`

has the name ‘X[1, 2, 3]’. Nodes following multivariate distributions have names that include their index blocks. For example, a multivariate node for `X[6:10, 3]`

has the name ‘X[6:10, 3]’.

The definitive source for node names in a model is `getNodeNames`

, described previously.

In the event you need to ensure that a name is formatted correctly, you can use the `expandNodeNames`

method. For example, to get the spaces correctly inserted into ‘X[1,1:5]’:

```
multiVarCode <- nimbleCode({
X[1, 1:5] ~ dmnorm(mu[], cov[,])
X[6:10, 3] ~ dmnorm(mu[], cov[,])
})
multiVarModel <- nimbleModel(multiVarCode, dimensions =
list(mu = 5, cov = c(5,5)), calculate = FALSE)
multiVarModel$expandNodeNames("X[1,1:5]")
```

`## [1] "X[1, 1:5]"`

Alternatively, for those inclined to R’s less commonly used features, a nice trick is to use its `parse`

and `deparse`

functions.

`deparse(parse(text = "X[1,1:5]", keep.source = FALSE)[[1]])`

`## [1] "X[1, 1:5]"`

The `keep.source = FALSE`

makes `parse`

more efficient.

### 6.2.5 Why use node names?

Syntax like `model[["z[2, 3]"]]`

may seem strange at first, because the natural habit of an R user would be `model[["z"]][2,3]`

. To see its utility, consider the example of writing the nimbleFunction given in Section 2.8. By giving every scalar node a name, even if it is part of a multivariate variable, one can write functions in R or NIMBLE that access any single node by a name, regardless of the dimensionality of the variable in which it is embedded. This is particularly useful for NIMBLE, which resolves how to access a particular node during the compilation process.

### 6.2.6 Checking if a node holds data

Finally, you can query whether a node is flagged as data using the `isData`

method applied to one or more nodes or nodes within variables:

`model$isData('z[1]')`

`## [1] TRUE`

`model$isData(c('z[1]', 'z[2]', 'a'))`

`## [1] TRUE TRUE FALSE`

`model$isData('z')`

```
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE
```

`model$isData('z[1:3, 1]')`

`## [1] TRUE TRUE TRUE`