# Chapter 13 Working with NIMBLE models

Here we describe how one can get information about NIMBLE models and
carry out operations on a model. While all of this functionality can
be used from R, its primary use occurs when writing nimbleFunctions (see Chapter 15).
Information about node types, distributions, and dimensions can be
used to determine algorithm behavior in *setup* code of nimbleFunctions. Information
about node or variable values or the parameter and bound values of a node would generally be used for algorithm
calculations in *run* code of nimbleFunctions. Similarly, carrying out numerical operations
on a model, including setting node or variable values, would generally be done in run code.

## 13.1 The variables and nodes in a NIMBLE model

Section 6.2 defines what we mean by variables and nodes in a NIMBLE model and discusses how to determine and access the nodes in a model and their dependency relationships. Here we’ll review and go into more detail on the topics of determining the nodes and node dependencies in a model.

### 13.1.1 Determining the nodes in a model

One can determine the variables in a model using `getVarNames`

and the nodes in a model using `getNodeNames`

, with optional arguments allowing you to select only certain types of nodes. We illustrate here with the pump model from Chapter 2.

`$getVarNames() pump`

```
## [1] "lifted_d1_over_beta" "theta" "lambda"
## [4] "x" "alpha" "beta"
```

`$getNodeNames() pump`

```
## [1] "alpha" "beta" "lifted_d1_over_beta"
## [4] "theta[1]" "theta[2]" "theta[3]"
## [7] "theta[4]" "theta[5]" "theta[6]"
## [10] "theta[7]" "theta[8]" "theta[9]"
## [13] "theta[10]" "lambda[1]" "lambda[2]"
## [16] "lambda[3]" "lambda[4]" "lambda[5]"
## [19] "lambda[6]" "lambda[7]" "lambda[8]"
## [22] "lambda[9]" "lambda[10]" "x[1]"
## [25] "x[2]" "x[3]" "x[4]"
## [28] "x[5]" "x[6]" "x[7]"
## [31] "x[8]" "x[9]" "x[10]"
```

`$getNodeNames(determOnly = TRUE) pump`

```
## [1] "lifted_d1_over_beta" "lambda[1]" "lambda[2]"
## [4] "lambda[3]" "lambda[4]" "lambda[5]"
## [7] "lambda[6]" "lambda[7]" "lambda[8]"
## [10] "lambda[9]" "lambda[10]"
```

`$getNodeNames(stochOnly = TRUE) pump`

```
## [1] "alpha" "beta" "theta[1]" "theta[2]" "theta[3]" "theta[4]"
## [7] "theta[5]" "theta[6]" "theta[7]" "theta[8]" "theta[9]" "theta[10]"
## [13] "x[1]" "x[2]" "x[3]" "x[4]" "x[5]" "x[6]"
## [19] "x[7]" "x[8]" "x[9]" "x[10]"
```

`$getNodeNames(dataOnly = TRUE) pump`

```
## [1] "x[1]" "x[2]" "x[3]" "x[4]" "x[5]" "x[6]" "x[7]" "x[8]" "x[9]"
## [10] "x[10]"
```

You can see one lifted node (see next section), `lifted_d1_over_beta`

, involved in a reparameterization to NIMBLE’s canonical parameterization of the gamma distribution for the `theta`

nodes.

We can determine the set of nodes contained in one or more nodes or variables using `expandNodeNames`

, illustrated here for an example with multivariate nodes. The
`returnScalarComponents`

argument also allows us to return all of the scalar elements of multivariate nodes.

```
<- nimbleCode({
multiVarCode2 1, 1:5] ~ dmnorm(mu[], cov[,])
X[6:10, 3] ~ dmnorm(mu[], cov[,])
X[for(i in 1:4)
~ dnorm(mn, 1)
Y[i]
})
<- nimbleModel(multiVarCode2,
multiVarModel2 dimensions = list(mu = 5, cov = c(5,5)),
calculate = FALSE)
$expandNodeNames("Y") multiVarModel2
```

`## [1] "Y[1]" "Y[2]" "Y[3]" "Y[4]"`

`$expandNodeNames(c("X", "Y"), returnScalarComponents = TRUE) multiVarModel2`

```
## [1] "X[1, 1]" "X[1, 2]" "X[1, 3]" "X[6, 3]" "X[7, 3]" "X[8, 3]"
## [7] "X[9, 3]" "X[10, 3]" "X[1, 4]" "X[1, 5]" "Y[1]" "Y[2]"
## [13] "Y[3]" "Y[4]"
```

As discussed in Section 6.2.6, you can determine whether a node is flagged as data using `isData`

.

### 13.1.2 Understanding lifted nodes

In some cases, NIMBLE introduces new nodes into the model that were
not specified in the BUGS code for the model, such as the
`lifted_d1_over_beta`

node in the introductory example. For
this reason, it is important that programs written to adapt to
different model structures use NIMBLE’s systems for querying the model
graph. For example, a call to `pump$getDependencies("beta")`

will
correctly include `lifted_d1_over_beta`

in the results. If one
skips this step and assumes the nodes are only those that appear in
the BUGS code, one may not get correct results.

It can be helpful to know the situations in which lifted nodes are generated. These include:

- When distribution parameters are expressions, NIMBLE creates a new deterministic node that contains the expression for a given parameter. The node is then a direct descendant of the new deterministic node. This is an optional feature, but it is currently enabled in all cases.
- As discussed in Section 5.2.6, the use of link functions causes new nodes to be introduced. This requires care if you need to initialize values in stochastic declarations with link functions.
- Use of alternative parameterizations of distributions, described in Section 5.2.4 causes new nodes to be introduced. For
example when a user provides the precision of a normal distribution
as
`tau`

, NIMBLE creates a new node`sd <- 1/sqrt(tau)`

and uses`sd`

as a parameter in the normal distribution. If many nodes use the same`tau`

, only one new`sd`

node will be created, so the computation`1/sqrt(tau)`

will not be repeated redundantly.

### 13.1.3 Determining dependencies in a model

Next we’ll see how to determine the node dependencies (or
‘descendants’ or child nodes) in a model. There are a variety of arguments to `getDependencies`

that
allow one to specify whether to include the node itself, whether to
include deterministic or stochastic or data dependents, etc. By
default `getDependencies`

returns descendants up to the next
stochastic node on all edges emanating from the node(s) specified as input. This is what would be
needed to calculate a Metropolis-Hastings acceptance probability in
MCMC, for example.

`$getDependencies("alpha") pump`

```
## [1] "alpha" "theta[1]" "theta[2]" "theta[3]" "theta[4]" "theta[5]"
## [7] "theta[6]" "theta[7]" "theta[8]" "theta[9]" "theta[10]"
```

`$getDependencies(c("alpha", "beta")) pump`

```
## [1] "alpha" "beta" "lifted_d1_over_beta"
## [4] "theta[1]" "theta[2]" "theta[3]"
## [7] "theta[4]" "theta[5]" "theta[6]"
## [10] "theta[7]" "theta[8]" "theta[9]"
## [13] "theta[10]"
```

`$getDependencies("theta[1:3]", self = FALSE) pump`

`## [1] "lambda[1]" "lambda[2]" "lambda[3]" "x[1]" "x[2]" "x[3]"`

`$getDependencies("theta[1:3]", stochOnly = TRUE, self = FALSE) pump`

`## [1] "x[1]" "x[2]" "x[3]"`

```
# get all dependencies, not just the direct descendants
$getDependencies("alpha", downstream = TRUE) pump
```

```
## [1] "alpha" "theta[1]" "theta[2]" "theta[3]" "theta[4]"
## [6] "theta[5]" "theta[6]" "theta[7]" "theta[8]" "theta[9]"
## [11] "theta[10]" "lambda[1]" "lambda[2]" "lambda[3]" "lambda[4]"
## [16] "lambda[5]" "lambda[6]" "lambda[7]" "lambda[8]" "lambda[9]"
## [21] "lambda[10]" "x[1]" "x[2]" "x[3]" "x[4]"
## [26] "x[5]" "x[6]" "x[7]" "x[8]" "x[9]"
## [31] "x[10]"
```

`$getDependencies("alpha", downstream = TRUE, dataOnly = TRUE) pump`

```
## [1] "x[1]" "x[2]" "x[3]" "x[4]" "x[5]" "x[6]" "x[7]" "x[8]" "x[9]"
## [10] "x[10]"
```

In addition, one can determine parent nodes using `getParents`

.

`$getParents("alpha") pump`

`## character(0)`

## 13.2 Accessing information about nodes and variables

### 13.2.1 Getting distributional information about a node

We briefly demonstrate some of the functionality for information about a node here,
but refer readers to the R help on `modelBaseClass`

for full details.

Here is an example model, with use of various functions to determine information about nodes or variables.

```
<- nimbleCode({
code for(i in 1:4)
~ dnorm(mu, sd = sigma)
y[i] ~ T(dnorm(0, 5), -20, 20)
mu ~ dunif(0, 10)
sigma
})<- nimbleModel(code, data = list(y = rnorm(4)),
m inits = list(mu = 0, sigma = 1))
$isEndNode('y') m
```

```
## y[1] y[2] y[3] y[4]
## TRUE TRUE TRUE TRUE
```

`$getDistribution('sigma') m`

```
## sigma
## "dunif"
```

`$isDiscrete(c('y', 'mu', 'sigma')) m`

```
## y[1] y[2] y[3] y[4] mu sigma
## FALSE FALSE FALSE FALSE FALSE FALSE
```

`$isDeterm('mu') m`

```
## mu
## FALSE
```

`$getDimension('mu') m`

```
## value
## 0
```

`$getDimension('mu', includeParams = TRUE) m`

```
## value mean sd tau var
## 0 0 0 0 0
```

Note that any variables provided to these functions are expanded into their constituent node names, so the length of results may not be the same length as the input vector of node and variable names. However the order of the results should be preserved relative to the order of the inputs, once the expansion is accounted for.

### 13.2.2 Getting information about a distribution

One can also get generic information about a distribution based on the
name of the distribution using the function `getDistributionInfo`

.
In particular, one can determine whether a distribution
was provided by the user (`isUserDefined`

), whether a distribution provides CDF and
quantile functions (`pqDefined`

), whether a distribution is a
discrete distribution (`isDiscrete`

), the parameter names (include
alternative parameterizations) for a distribution (`getParamNames`

),
and the dimension of the distribution and its parameters
(`getDimension`

). For more extensive information, please see the R
help for `getDistributionInfo`

.

### 13.2.3 Getting distribution parameter values for a node

The function `getParam`

provides access to values of the parameters
of a node’s distribution. The two arguments must
be the name of one (stochastic) node and the name of a parameter for the
distribution followed by that node. The parameter does not have to be
one of the parameters used when the node was declared. Alternative
parameterization values can also be obtained. See
Section 5.2.4.1 for available parameterizations.
(These can also be seen in `nimble:::distributionsInputList`

.)

Here is an example:

```
<- nimbleModel(
gammaModel nimbleCode({
~ dlnorm(0, 1)
a ~ dgamma(shape = 2, scale = a)
x data = list(x = 2.4), inits = list(a = 1.2))
}), $getParam('x', 'scale') gammaModel
```

`## [1] 1.2`

`$getParam('x', 'rate') gammaModel`

`## [1] 0.8333333`

`getParam`

is part of the NIMBLE language, so it can be used in run code of nimbleFunctions.

### 13.2.4 Getting distribution bounds for a node

The function `getBound`

provides access to the lower and upper
bounds of the distribution for a node. In most cases these bounds
will be fixed based on the distribution, but for the uniform distribution
the bounds are the parameters of the distribution, and when truncation is used (Section 5.2.7), the bounds will be determined
by the truncation. Like the functions described in the previous section, `getBound`

can be
used as global function taking a model as the first argument, or it
can be used as a model member function. The next two arguments must
be the name of one (stochastic) node and either `"lower"`

or `"upper"`

indicating whether the lower or upper bound is desired. For multivariate nodes
the bound is a scalar that is the bound for all elements of the node, as we
do not handle truncation for multivariate nodes.

Here is an example:

```
<- nimbleModel(
exampleModel nimbleCode({
~ T(dnorm(mu, sd = sig), a, Inf)
y ~ dunif(-1, b)
a ~ dgamma(1, 1)
b inits = list(a = -0.5, mu = 1, sig = 1, b = 4),
}), data = list(y = 4))
getBound(exampleModel, 'y', 'lower')
```

`## [1] -0.5`

`getBound(exampleModel, 'y', 'upper')`

`## [1] Inf`

```
$b <- 3
exampleModel$calculate(exampleModel$getDependencies('b')) exampleModel
```

`## [1] -4.386294`

`getBound(exampleModel, 'a', 'upper')`

`## [1] 3`

`$getBound('b','lower') exampleModel`

`## [1] 0`

`getBound`

is part of the NIMBLE language, so it can be used in run code of nimbleFunctions. In fact, we anticipate that
most use of `getBound`

will be for algorithms, such as for the reflection
version of the random walk MCMC sampler.

## 13.3 Carrying out model calculations

### 13.3.1 Core model operations: calculation and simulation

The four basic ways to operate a model are to calculate nodes, simulate into nodes, get the log probabilities (or probability densities) that have already been calculated, and compare the log probability of a new value to that of an old value. In more detail:

**calculate**For a stochastic node,`calculate`

determines the log probability value, stores it in the appropriate`logProb`

variable, and returns it. For a deterministic node,`calculate`

executes the deterministic calculation and returns 0.**simulate**For a stochastic node,`simulate`

generates a random draw. For deterministic nodes,`simulate`

is equivalent to`calculate`

without returning 0.`simulate`

always returns`NULL`

(or`void`

in C++).**getLogProb**`getLogProb`

simply returns the most recently calculated log probability value, or 0 for a deterministic node.**calculateDiff**`calculateDiff`

is identical to`calculate`

, but it returns the new log probability value minus the one that was previously stored. This is useful when one wants to change the value or values of node(s) in the model (e.g., by setting a value or`simulate`

) and then determine the change in the log probability, such as needed for a Metropolis-Hastings acceptance probability.

Each of these functions is accessed as a member function of a model
object, taking a vector of node names as an argument^{26}. If there is
more than one node name, `calculate`

and `getLogProb`

return the
sum of the log probabilities from each node, while `calculateDiff`

returns the sum of the new values minus the old values. Next we show
an example using `simulate`

.

#### 13.3.1.1 Example: simulating arbitrary collections of nodes

```
<- nimbleCode({
mc ~ dnorm(0, 0.001)
a for(i in 1:5) {
~ dnorm(a, 0.1)
y[i] for(j in 1:3)
~ dnorm(y[i], sd = 0.1)
z[i,j]
}1:5] <- y[1:5]^2
y.squared[
})
<- nimbleModel(mc, data = list(z = matrix(rnorm(15), nrow = 5)))
model
$a <- 1
model$y model
```

`## [1] NA NA NA NA NA`

```
$simulate("y[1:3]")
model# simulate(model, "y[1:3]")
$y model
```

`## [1] -4.3723097 1.7331821 0.6234022 NA NA`

```
$simulate("y")
model$y model
```

`## [1] 6.6051146 2.0859962 -0.9702564 -0.3898915 -0.5978887`

`$z model`

```
## [,1] [,2] [,3]
## [1,] -1.0314207 -0.9537793 -0.79453166
## [2,] -2.3710229 -0.3980044 -0.30881797
## [3,] -0.3245763 -0.3112171 0.36144477
## [4,] -0.9442988 0.7960927 1.39879110
## [5,] -0.7658900 0.9864283 -0.05607042
```

```
$simulate(c("y[1:3]", "z[1:5, 1:3]"))
model$y model
```

`## [1] 1.5882948 1.5578821 3.8961663 -0.3898915 -0.5978887`

`$z model`

```
## [,1] [,2] [,3]
## [1,] -1.0314207 -0.9537793 -0.79453166
## [2,] -2.3710229 -0.3980044 -0.30881797
## [3,] -0.3245763 -0.3112171 0.36144477
## [4,] -0.9442988 0.7960927 1.39879110
## [5,] -0.7658900 0.9864283 -0.05607042
```

```
$simulate(c("z[1:5, 1:3]"), includeData = TRUE)
model$z model
```

```
## [,1] [,2] [,3]
## [1,] 1.6203124 1.5516260 1.4942336
## [2,] 1.6213524 1.5516332 1.5761658
## [3,] 4.0065304 4.0713699 3.8007847
## [4,] -0.2254835 -0.4765649 -0.3632563
## [5,] -0.5756517 -0.6255796 -0.4584634
```

The example illustrates a number of features:

`simulate(model, nodes)`

is equivalent to`model$simulate(nodes)`

. You can use either, but the latter is encouraged and the former may be deprecated inthe future.- Inputs like
`"y[1:3]"`

are automatically expanded into`c("y[1]", "y[2]", "y[3]")`

. In fact, simply`"y"`

will be expanded into all nodes within`y`

. - An arbitrary number of nodes can be provided as a character vector.
- Simulations will be done in the order provided, so in practice
the nodes should often be obtained by functions such as
`getDependencies`

. These return nodes in topologically-sorted order, which means no node is manipulated before something it depends on. - The data nodes
`z`

were not simulated into until`includeData = TRUE`

was used.

Use of `calculate`

, `calculateDiff`

and `getLogProb`

are similar to `simulate`

,
except that they return a value (described above) and they have no `includeData`

argument.

### 13.3.2 Pre-defined nimbleFunctions for operating on model nodes: *simNodes*, *calcNodes*, and *getLogProbNodes*

`simNodes`

, `calcNodes`

and `getLogProbNodes`

are basic nimbleFunctions that simulate, calculate, or get the log probabilities
(densities), respectively, of the same set of nodes each time they
are called. Each of these
takes a model and a character string of node names
as inputs. If `nodes`

is left blank, then all the nodes of the model
are used.

For `simNodes`

, the nodes provided will be topologically sorted to
simulate in the correct order. For `calcNodes`

and `getLogProbNodes`

,
the nodes will be sorted and dependent nodes will be included. Recall that
the calculations must be up to date (from a `calculate`

call) for `getLogProbNodes`

to return the values you are probably looking for.

```
<- nimbleCode({
simpleModelCode for(i in 1:4){
~ dnorm(0,1)
x[i] ~ dnorm(x[i], 1) # y depends on x
y[i] ~ dnorm(y[i], 1) # z depends on y
z[i] # z conditionally independent of x
}
})
<- nimbleModel(simpleModelCode, check = FALSE)
simpleModel <- compileNimble(simpleModel)
cSimpleModel
# simulates all the x's and y's
<- simNodes(simpleModel, nodes = c('x', 'y') )
rSimXY
# calls calculate on x and its dependents (y, but not z)
<- calcNodes(simpleModel, nodes = 'x')
rCalcXDep
# calls getLogProb on x's and y's
<- getLogProbNodes(simpleModel,
rGetLogProbXDep nodes = 'x')
# compiling the functions
<- compileNimble(rSimXY, project = simpleModel)
cSimXY <- compileNimble(rCalcXDep, project = simpleModel)
cCalcXDep <- compileNimble(rGetLogProbXDep, project = simpleModel)
cGetLogProbXDep
$x cSimpleModel
```

`## [1] NA NA NA NA`

`$y cSimpleModel`

`## [1] NA NA NA NA`

```
# simulating x and y
$run()
cSimXY$x cSimpleModel
```

`## [1] -0.6589120 0.6605313 -0.0132557 -0.9314803`

`$y cSimpleModel`

`## [1] 0.5557771 -1.4278091 -0.5394082 -2.4728829`

`$run() cCalcXDep`

`## [1] -12.46535`

```
# gives correct answer because logProbs
# updated by 'calculate' after simulation
$run() cGetLogProbXDep
```

`## [1] -12.46535`

```
$run()
cSimXY
# gives old answer because logProbs
# not updated after 'simulate'
$run() cGetLogProbXDep
```

`## [1] -12.46535`

`$run() cCalcXDep`

`## [1] -9.277808`

### 13.3.3 Accessing log probabilities via *logProb* variables

For each variable that contains at least one stochastic node, NIMBLE generates a model variable with the prefix ‘logProb_’. In general users will not need to access these logProb variables directly but rather will use `getLogProb`

. However, knowing they exist can be useful, in part because these variables can be monitored in an MCMC.

When the stochastic node is scalar, the
`logProb`

variable will have the same size. For example:

`$logProb_y model`

`## [1] NA NA NA NA NA`

`$calculate("y") model`

`## [1] -11.02766`

`$logProb_y model`

`## [1] -2.087536 -2.085793 -2.489620 -2.166821 -2.197894`

Creation of `logProb`

variables for stochastic multivariate nodes
is trickier, because they can represent an arbitrary block of a larger
variable. In general NIMBLE records the logProb values using
the lowest possible indices. For example, if `x[5:10, 15:20]`

follows a Wishart distribution, its log probability (density) value
will be stored in `logProb_x[5, 15]`

. When possible, NIMBLE will
reduce the dimensions of the corresponding logProb variable. For
example, in

`for(i in 1:10) x[i,] ~ dmnorm(mu[], prec[,])`

`x`

may be \(10 \times 20\) (dimensions must be provided), but `logProb_x`

will be \(10 \times 1\). For the most part you do not need to worry about
how NIMBLE is storing the log probability values, because you can
always get them using `getLogProb`

.

Standard usage is

`model$calculate(nodes)`

but`calculate(model, nodes)`

is synonymous.↩︎