# Chapter 6 Building and using models

This chapter explains how to build and manipulate model objects starting from BUGS code.

## 6.1 Creating model objects

NIMBLE provides two functions for creating model objects:
`nimbleModel`

and `readBUGSmodel`

. The first, `nimbleModel`

,
is more general and was illustrated in Chapter 2. The
second, `readBUGSmodel`

provides compatibility with BUGS file
formats for models, variables, data, and initial values for MCMC.

In addition one can create new model objects from existing model objects.

### 6.1.1 Using *nimbleModel* to create a model

`nimbleModel`

processes BUGS code to determine all the nodes,
variables, and their relationships in a model. Any constants must be
provided at this step. Data and initial values can optionally be
provided. BUGS code passed to `nimbleModel`

must go through
`nimbleCode`

.

We look again at the pump example from the introduction:

```
<- nimbleCode({
pumpCode for (i in 1:N){
~ dgamma(alpha,beta);
theta[i] <- theta[i]*t[i];
lambda[i] ~ dpois(lambda[i])
x[i]
}~ dexp(1.0);
alpha ~ dgamma(0.1,1.0);
beta
})
<- list(N = 10,
pumpConsts t = c(94.3, 15.7, 62.9, 126, 5.24,
31.4, 1.05, 1.05, 2.1, 10.5))
<- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))
pumpData
<- list(alpha = 1, beta = 1,
pumpInits theta = rep(0.1, pumpConsts$N))
<- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
pump data = pumpData, inits = pumpInits)
```

#### 6.1.1.1 Data and constants

NIMBLE makes a distinction between data and constants:

*Constants*can never be changed and must be provided when a model is defined. For example, a vector of known index values, such as for block indices, helps define the model graph itself and must be provided as constants. Variables used in the index ranges of for-loops must also be provided as constants.*Data*is a label for the role a node plays in the model. Nodes marked as data will by default be protected from any functions that would simulate over their values (see`simulate`

in Chapter 13), but it is possible to over-ride that default or to change their values by direct assignment. This allows an algorithm to be applied to many data sets in the same model without re-creating the model each time. It also allows simulation of data in a model.

WinBUGS, OpenBUGS and JAGS do not allow data values to be changed or different nodes to be labeled as data without starting from the beginning again. Hence they do not distinguish between constants and data.

For compatibility with BUGS and JAGS, NIMBLE allows both to be
provided in the `constants`

argument to `nimbleModel`

, in
which case NIMBLE handles values for stochastic nodes as data and
everything else as constants.

Values for nodes that appear only on the right-hand side of BUGS
declarations (e.g., covariates/predictors) can be provided as constants or as data or initial values. There is no real difference between providing as data or initial values and the values can be added after building a model via `setInits`

or `setData`

. However if provided as data, calls to `setInits`

will not overwrite those values (though direct assignment of values will overwrite those values).

#### 6.1.1.2 Providing (or changing) data and initial values for an existing model

Whereas constants must be provided during the call to
`nimbleModel`

, data and initial values can be provided later via
the model member functions `setData`

and `setInits`

. For
example, if `pumpData`

is a named list of data values (as above),
then `pump$setData(pumpData)`

sets the named variables to the
values in the list.

`setData`

does two things: it sets the values of the (stochastic) data nodes,
and it flags those nodes as containing data. nimbleFunction
programmers can then use that information to control whether an
algorithm should over-write data or not. For example, NIMBLE’s
`simulate`

functions by default do not overwrite data values but
can be told to do so. Values of data variables can be replaced, and
the indication of which nodes should be treated as data can be reset
by using the `resetData`

method, e.g. `pump$resetData()`

.

Data nodes cannot be deterministic, and using `setData`

on
deterministic nodes (or passing values for deterministic nodes
via the `data`

argument to `nimbleModel`

) will not flag those
nodes as containing data. It will set the values of those nodes,
but that will presumably be overwritten as soon as the nodes
are deterministically calculated.

To change data values without any modification of which nodes are flagged as containing data, simply use R’s usual assignment syntax to assign values in a compiled (or, more rarely, an uncompiled) model, e.g.,

`$y <- c(1.5, 2.5, 1.7) cModel`

This can be useful for running an MCMC with a different dataset of the same size
(and of course the same pattern of missingness, if any) without
having to rebuild and recompile the MCMC, such as in a simulation study.
It is possible to change the data values in a compiled model using `setData`

,
but we don’t recommend doing this because `setData`

won’t modify which nodes are flagged
as containing data in the already-constructed MCMC, thereby potentially introducing confusion.

#### 6.1.1.3 Missing data values

Sometimes one needs a model variable to have a mix of data and
non-data, often due to missing data values. In NIMBLE, when data
values are provided, any nodes with `NA`

values will *not* be
labeled as data. A node following a multivariate distribution must be either entirely observed or entirely missing.

Here’s an example of running an MCMC on the *pump* model, with two
of the observations taken to be missing. Some of the steps in this
example are documented more below. NIMBLE’s default MCMC
configuration will treat the missing values as unknowns to be sampled,
as can be seen in the MCMC output here.

```
<- pump$newModel()
pumpMiss $resetData()
pumpMiss<- pumpData
pumpDataNew $x[c(1, 3)] <- NA
pumpDataNew$setData(pumpDataNew)
pumpMiss
<- configureMCMC(pumpMiss) pumpMissConf
```

```
## ===== Monitors =====
## thin = 1: alpha, beta
## ===== Samplers =====
## RW sampler (1)
## - alpha
## posterior_predictive sampler (2)
## - theta[] (2 elements)
## conjugate sampler (9)
## - beta
## - theta[] (8 elements)
## ===== Comments =====
```

`$addMonitors('x', 'alpha', 'beta', 'theta') pumpMissConf`

`## thin = 1: alpha, beta, theta, x`

```
<- buildMCMC(pumpMissConf)
pumpMissMCMC <- compileNimble(pumpMiss, pumpMissMCMC)
Cobj
<- 10
niter set.seed(0)
$pumpMissMCMC$run(niter)
Cobj<- as.matrix(Cobj$pumpMissMCMC$mvSamples)
samples
1:5, 13:17] samples[
```

```
## x[1] x[2] x[3] x[4] x[5]
## [1,] 12 1 0 14 3
## [2,] 7 1 35 14 3
## [3,] 19 1 4 14 3
## [4,] 61 1 36 14 3
## [5,] 70 1 61 14 3
```

Missing values may also occur in explanatory/predictor variables. Values for
such variables should be passed in via the `data`

argument to
`nimbleModel`

, with `NA`

for the missing values. In some
contexts, one would want to specify distributions for such explanatory
variables, for example so that an MCMC would impute the missing values.

#### 6.1.1.4 Defining alternative models with the same code

Avoiding code duplication is a basic principle of good programming. In
NIMBLE, one can use definition-time if-then-else statements to create
different models from the same code. As a simple example, say we have
a linear regression model and want to consider including or omitting
`x[2]`

as an explanatory variable:

```
<- nimbleCode({
regressionCode ~ dnorm(0, sd = 1000)
intercept ~ dnorm(0, sd = 1000)
slope1 if(includeX2) {
~ dnorm(0, sd = 1000)
slope2 for(i in 1:N)
<- intercept + slope1 * x1[i] + slope2 * x2[i]
predictedY[i] else {
} for(i in 1:N) predictedY[i] <- intercept + slope1 * x1[i]
}~ dunif(0, 100)
sigmaY for(i in 1:N) Y[i] ~ dnorm(predictedY[i], sigmaY)
})
<- FALSE
includeX2 <- nimbleModel(regressionCode, constants = list(N = 30),
modelWithoutX2 check=FALSE)
$getVarNames() modelWithoutX2
```

```
## [1] "intercept" "slope1"
## [3] "predictedY" "sigmaY"
## [5] "lifted_d1_over_sqrt_oPsigmaY_cP" "Y"
## [7] "x1"
```

```
<- TRUE
includeX2 <- nimbleModel(regressionCode, constants = list(N = 30),
modelWithX2 check = FALSE)
$getVarNames() modelWithX2
```

```
## [1] "intercept" "slope1"
## [3] "slope2" "predictedY"
## [5] "sigmaY" "lifted_d1_over_sqrt_oPsigmaY_cP"
## [7] "Y" "x1"
## [9] "x2"
```

Whereas the *constants* are a property of the *model definition*
– since they may help determine the model structure itself –
*data* nodes can be different in different copies of the model
generated from the same *model definition*. The `setData`

and
`setInits`

described above can be used for each copy of the model.

#### 6.1.1.5 Providing dimensions via *nimbleModel*

`nimbleModel`

can usually determine the dimensions of every
variable from the declarations in the BUGS code. However, it is
possible to use a multivariate object only with empty indices
(e.g. `x[,]`

), in which case the dimensions must be provided as an
argument to `nimbleModel`

.

Here’s an example with multivariate nodes. The first provides indices, so
no `dimensions`

argument is needed, while the second omits the
indices and provides a `dimensions`

argument instead.

```
<- nimbleCode({
code 1:K] ~ dmulti(p[1:K], n)
y[1:K] ~ ddirch(alpha[1:K])
p[log(alpha[1:K]) ~ dmnorm(alpha0[1:K], R[1:K, 1:K])
})
<- 5
K <- nimbleModel(code, constants = list(n = 3, K = K,
model alpha0 = rep(0, K), R = diag(K)),
check = FALSE)
<- nimbleCode({
codeAlt ~ dmulti(p[], n)
y[] ~ ddirch(alpha[])
p[] log(alpha[]) ~ dmnorm(alpha0[], R[ , ])
})
<- nimbleModel(codeAlt, constants = list(n = 3, K = K, alpha0 = rep(0, K),
model R = diag(K)),
dimensions = list(y = K, p = K, alpha = K),
check = FALSE)
```

In that example, since `alpha0`

and `R`

are provided as constants, we don’t need to specify their dimensions.

### 6.1.2 Creating a model from standard BUGS and JAGS input files

Users with BUGS and JAGS experience may have files set up in standard
formats for use in BUGS and JAGS. `readBUGSmodel`

can
read in the model, data/constant values and initial values in those
formats. It can also take information directly from R objects
somewhat more flexibly than `nimbleModel`

, specifically allowing
inputs set up similarly to those for BUGS and JAGS.
In either case, after processing the inputs, it calls `nimbleModel`

.
Note that unlike BUGS and JAGS, only a single set of initial values can be
specified in creating a model. Please see `help(readBUGSmodel)`

for argument details.

As an example of using `readBUGSmodel`

, let’s create a model for the *pump* example from BUGS.

```
<- system.file('classic-bugs', 'vol1', 'pump', package = 'nimble')
pumpDir <- readBUGSmodel('pump.bug', data = 'pump-data.R',
pumpModel inits = 'pump-init.R', dir = pumpDir)
```

Note that `readBUGSmodel`

allows one to include `var`

and `data`

blocks in the model file as in some of the BUGS examples (such as `inhaler`

). The `data`

block pre-computes constant and data values. Also note that if `data`

and `inits`

are provided
as files, the files should contain R code that creates objects
analogous to what would populate the list if a list were provided
instead. Please see the JAGS manual examples or the
`classic_bugs`

directory in the NIMBLE package for example
syntax.
NIMBLE by and large does not need the information given in a `var`

block but occasionally this is used to determine dimensionality, such
as in the case of syntax like `xbar <- mean(x[])`

where `x`

is a variable that appears only on the right-hand side of BUGS expressions.

Note that NIMBLE does not handle formatting such as in some of the
original BUGS examples in which data was indicated with syntax such as
`data x in 'x.txt'`

.

### 6.1.3 Making multiple instances from the same model definition

Sometimes it is useful to have more than one copy of the same model. For example, an algorithm (i.e., nimbleFunction) such as an MCMC will be bound to a particular model before it is run. A user could build multiple algorithms to use the same model instance, or they may want each algorithm to have its own instance of the model.

There are two ways to create new instances of a model, shown in this example:

```
<- nimbleCode({
simpleCode for(i in 1:N) x[i] ~ dnorm(0, 1)
})
# Return the model definition only, not a built model
<- nimbleModel(simpleCode, constants = list(N = 10),
simpleModelDefinition returnDef = TRUE, check = FALSE)
# Make one instance of the model
<- simpleModelDefinition$newModel(check = FALSE)
simpleModelCopy1 # Make another instance from the same definition
<- simpleModelDefinition$newModel(check = FALSE)
simpleModelCopy2 # Ask simpleModelCopy2 for another copy of itself
<- simpleModelCopy2$newModel(check = FALSE) simpleModelCopy3
```

Each copy of the model can have different nodes flagged as data and
different values in any nodes. They cannot have different values of
`N`

because that is a constant; it must be a constant because
it helps define the model.

## 6.2 NIMBLE models are objects you can query and manipulate

NIMBLE models are objects that can be modified and manipulated from R. In this section we introduce some basic ways to use a model object. Chapter 13 covers more topics for writing algorithms that use models.

### 6.2.1 What are variables and nodes?

This section discusses some basic concepts and terminology to be able to speak about NIMBLE models clearly.

Suppose we have created a model from the following BUGS code.

```
<- nimbleCode({
mc ~ dnorm(0, 0.001)
a for(i in 1:5) {
~ dnorm(a, sd = 0.1)
y[i] for(j in 1:3)
~ dnorm(y[i], tau)
z[i,j]
}~ dunif(0, 20)
tau 1:5] <- y[1:5]^2
y.squared[
})
<- nimbleModel(mc, data = list(z = matrix(rnorm(15), nrow = 5))) model
```

In NIMBLE terminology:

- The
*variables*of this model are`a`

,`y`

,`z`

, and`y.squared`

. - The
*nodes*of this model are`a`

,`y[1]`

\(,\ldots,\)`y[5]`

,`z[1,1]`

\(,\ldots,\)`z[5, 3]`

, and`y.squared[1:5]`

. In graph terminology, nodes are vertices in the model graph. - The
*node functions*of this model are`a ~ dnorm(0, 0.001)`

,`y[i] ~ dnorm(a, 0.1)`

,`z[i,j] ~ dnorm(y[i], sd = 0.1)`

, and`y.squared[1:5] <- y[1:5]^2`

. Each node’s calculations are handled by a node function. Sometimes the distinction between nodes and node functions is important, but when it is not important we may refer to both simply as*nodes*.

- The
*scalar elements*of this model include all the scalar nodes as well as the scalar elements`y.squared[1]`

\(,\ldots,\)`y.squared[5]`

of the multivariate node`y.squared[1:5]`

.

### 6.2.2 Determining the nodes and variables in a model

One can determine the variables in a model using `getVarNames`

and the nodes in a model using `getNodeNames`

. Optional arguments to `getNodeNames`

allow you to select only certain types of nodes, as discussed in Section 13.1.1 and in the R help for `getNodeNames`

.

`$getVarNames() model`

```
## [1] "a" "y"
## [3] "lifted_d1_over_sqrt_oPtau_cP" "z"
## [5] "tau" "y.squared"
```

`$getNodeNames() model`

```
## [1] "a" "tau"
## [3] "y[1]" "y[2]"
## [5] "y[3]" "y[4]"
## [7] "y[5]" "lifted_d1_over_sqrt_oPtau_cP"
## [9] "y.squared[1:5]" "z[1, 1]"
## [11] "z[1, 2]" "z[1, 3]"
## [13] "z[2, 1]" "z[2, 2]"
## [15] "z[2, 3]" "z[3, 1]"
## [17] "z[3, 2]" "z[3, 3]"
## [19] "z[4, 1]" "z[4, 2]"
## [21] "z[4, 3]" "z[5, 1]"
## [23] "z[5, 2]" "z[5, 3]"
```

Note that some of the nodes may be ‘lifted’ nodes introduced by
NIMBLE (Section 13.1.2). In this case `lifted_d1_over_sqrt_oPtau_cP`

(this is a node for the standard deviation of the `z`

nodes using NIMBLE’s canonical parameterization of the normal distribution) is the only lifted node in the model.

To determine the dependencies of one or more nodes in the model, you can use `getDependencies`

as discussed in Section 13.1.3.

### 6.2.3 Accessing nodes

Model variables can be accessed and set just as in R using `$`

and
`[[ ]]`

. For example

```
$a <- 5
model$a model
```

`## [1] 5`

`"a"]] model[[`

`## [1] 5`

```
$y[2:4] <- rnorm(3)
model$y model
```

`## [1] NA -0.7317482 0.8303732 -1.2080828 NA`

```
"y"]][c(1, 5)] <- rnorm(2)
model[[$y model
```

`## [1] -1.0479844 -0.7317482 0.8303732 -1.2080828 1.4411577`

`$z[1,] model`

`## [1] 2.0752450 0.2199248 -0.7660820`

While nodes that are part of a variable can be accessed as above, each
node also has its own name that can be used to access it directly. For
example, `y[2]`

has the name ‘y[2]’ and can be accessed by that
name as follows:

`"y[2]"]] model[[`

`## [1] -0.7317482`

```
"y[2]"]] <- -5
model[[$y model
```

`## [1] -1.0479844 -5.0000000 0.8303732 -1.2080828 1.4411577`

`"z[2, 3]"]] model[[`

`## [1] -0.4302118`

`"z[2:4, 1:2]"]][1, 2] model[[`

`## [1] -1.46725`

`$z[2, 2] model`

`## [1] -1.46725`

Notice that node names can include index blocks, such as
`model[["z[2:4, 1:2]"]]`

, and these are not strictly required to
correspond to actual nodes. Such blocks can be subsequently
sub-indexed in the regular R manner, such as `model[["z[2:4, 1:2]"]][1, 2]`

.

### 6.2.4 How nodes are named

Every node has a name that is a character string including its
indices, with a space after every comma. For example, `X[1, 2, 3]`

has the name ‘X[1, 2, 3]’. Nodes following multivariate
distributions have names that include their index blocks. For
example, a multivariate node for `X[6:10, 3]`

has the name ‘X[6:10, 3]’.

The definitive source for node names in a model is
`getNodeNames`

, described previously.

In the event you need to ensure that a name is formatted correctly,
you can use the `expandNodeNames`

method. For
example, to get the spaces correctly inserted into ‘X[1,1:5]’:

```
<- nimbleCode({
multiVarCode 1, 1:5] ~ dmnorm(mu[], cov[,])
X[6:10, 3] ~ dmnorm(mu[], cov[,])
X[
})
<- nimbleModel(multiVarCode, dimensions =
multiVarModel list(mu = 5, cov = c(5,5)), calculate = FALSE)
$expandNodeNames("X[1,1:5]") multiVarModel
```

`## [1] "X[1, 1:5]"`

Alternatively, for those inclined to R’s less commonly used features, a nice trick is
to use its `parse`

and `deparse`

functions.

`deparse(parse(text = "X[1,1:5]", keep.source = FALSE)[[1]])`

`## [1] "X[1, 1:5]"`

The `keep.source = FALSE`

makes `parse`

more efficient.

### 6.2.5 Why use node names?

Syntax like `model[["z[2, 3]"]]`

may seem strange at first, because
the natural habit of an R user would be `model[["z"]][2,3]`

. To see
its utility, consider the example of writing the nimbleFunction given in
Section 2.8. By giving every scalar node a name, even
if it is part of a multivariate variable, one can write functions in R
or NIMBLE that access any single node by a name, regardless of the
dimensionality of the variable in which it is embedded. This is particularly
useful for NIMBLE, which resolves how to access a particular node
during the compilation process.

### 6.2.6 Checking if a node holds data

Finally, you can query whether a node is flagged as data using the
`isData`

method applied to one or more nodes or nodes within variables:

`$isData('z[1]') model`

`## [1] TRUE`

`$isData(c('z[1]', 'z[2]', 'a')) model`

`## [1] TRUE TRUE FALSE`

`$isData('z') model`

`## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE`

`$isData('z[1:3, 1]') model`

`## [1] TRUE TRUE TRUE`

## 6.3 Using models in parallel

NIMBLE uses Reference Classes and R6 classes for models and algorithms. Objects of these classes are passed by reference and copies of such objects are simply new variable names that reference the same underlying object.

Thus to run an algorithm in parallel on a given model, one must create multiple copies of the model and algorithm, and compiled versions of these, by calling `nimbleModel`

, `buildMCMC`

, `compileNimble`

, etc. once for each copy. In other words all such calls should be within the parallelized block of code.

For a worked example in the context of MCMC, please see the parallelization example on our webpage.