# Chapter 13 Working with NIMBLE models

Here we describe how one can get information about NIMBLE models and carry out operations on a model. While all of this functionality can be used from R, its primary use occurs when writing nimbleFunctions (see Chapter 15). Information about node types, distributions, and dimensions can be used to determine algorithm behavior in *setup* code of nimbleFunctions. Information about node or variable values or the parameter and bound values of a node would generally be used for algorithm calculations in *run* code of nimbleFunctions. Similarly, carrying out numerical operations on a model, including setting node or variable values, would generally be done in run code.

## 13.1 The variables and nodes in a NIMBLE model

Section 6.2 defines what we mean by variables and nodes in a NIMBLE model and discusses how to determine and access the nodes in a model and their dependency relationships. Here we’ll review and go into more detail on the topics of determining the nodes and node dependencies in a model.

### 13.1.1 Determining the nodes in a model

One can determine the variables in a model using `getVarNames`

and the nodes in a model using `getNodeNames`

, with optional arguments allowing you to select only certain types of nodes. We illustrate here with the pump model from Chapter 2.

`pump$getVarNames()`

```
## [1] "lifted_d1_over_beta" "theta" "lambda"
## [4] "x" "alpha" "beta"
```

`pump$getNodeNames()`

```
## [1] "alpha" "beta" "lifted_d1_over_beta"
## [4] "theta[1]" "theta[2]" "theta[3]"
## [7] "theta[4]" "theta[5]" "theta[6]"
## [10] "theta[7]" "theta[8]" "theta[9]"
## [13] "theta[10]" "lambda[1]" "lambda[2]"
## [16] "lambda[3]" "lambda[4]" "lambda[5]"
## [19] "lambda[6]" "lambda[7]" "lambda[8]"
## [22] "lambda[9]" "lambda[10]" "x[1]"
## [25] "x[2]" "x[3]" "x[4]"
## [28] "x[5]" "x[6]" "x[7]"
## [31] "x[8]" "x[9]" "x[10]"
```

`pump$getNodeNames(determOnly = TRUE)`

```
## [1] "lifted_d1_over_beta" "lambda[1]" "lambda[2]"
## [4] "lambda[3]" "lambda[4]" "lambda[5]"
## [7] "lambda[6]" "lambda[7]" "lambda[8]"
## [10] "lambda[9]" "lambda[10]"
```

`pump$getNodeNames(stochOnly = TRUE)`

```
## [1] "alpha" "beta" "theta[1]" "theta[2]" "theta[3]"
## [6] "theta[4]" "theta[5]" "theta[6]" "theta[7]" "theta[8]"
## [11] "theta[9]" "theta[10]" "x[1]" "x[2]" "x[3]"
## [16] "x[4]" "x[5]" "x[6]" "x[7]" "x[8]"
## [21] "x[9]" "x[10]"
```

`pump$getNodeNames(dataOnly = TRUE)`

```
## [1] "x[1]" "x[2]" "x[3]" "x[4]" "x[5]" "x[6]" "x[7]" "x[8]"
## [9] "x[9]" "x[10]"
```

You can see one lifted node (see next section), `lifted_d1_over_beta`

, involved in a reparameterization to NIMBLE’s canonical parameterization of the gamma distribution for the `theta`

nodes.

We can determine the set of nodes contained in one or more nodes or variables using `expandNodeNames`

, illustrated here for an example with multivariate nodes. The `returnScalarComponents`

argument also allows us to return all of the scalar elements of multivariate nodes.

```
multiVarCode2 <- nimbleCode({
X[1, 1:5] ~ dmnorm(mu[], cov[,])
X[6:10, 3] ~ dmnorm(mu[], cov[,])
for(i in 1:4)
Y[i] ~ dnorm(mn, 1)
})
multiVarModel2 <- nimbleModel(multiVarCode2,
dimensions = list(mu = 5, cov = c(5,5)),
calculate = FALSE)
multiVarModel2$expandNodeNames("Y")
```

`## [1] "Y[1]" "Y[2]" "Y[3]" "Y[4]"`

`multiVarModel2$expandNodeNames(c("X", "Y"), returnScalarComponents = TRUE)`

```
## [1] "X[1, 1]" "X[1, 2]" "X[1, 3]" "X[6, 3]" "X[7, 3]" "X[8, 3]"
## [7] "X[9, 3]" "X[10, 3]" "X[1, 4]" "X[1, 5]" "Y[1]" "Y[2]"
## [13] "Y[3]" "Y[4]"
```

As discussed in Section 6.2.6, you can determine whether a node is flagged as data using `isData`

.

### 13.1.2 Understanding lifted nodes

In some cases, NIMBLE introduces new nodes into the model that were not specified in the BUGS code for the model, such as the `lifted_d1_over_beta`

node in the introductory example. For this reason, it is important that programs written to adapt to different model structures use NIMBLE’s systems for querying the model graph. For example, a call to `pump$getDependencies("beta")`

will correctly include `lifted_d1_over_beta`

in the results. If one skips this step and assumes the nodes are only those that appear in the BUGS code, one may not get correct results.

It can be helpful to know the situations in which lifted nodes are generated. These include:

- When distribution parameters are expressions, NIMBLE creates a new deterministic node that contains the expression for a given parameter. The node is then a direct descendant of the new deterministic node. This is an optional feature, but it is currently enabled in all cases.
- As discussed in Section 5.2.6, the use of link functions causes new nodes to be introduced. This requires care if you need to initialize values in stochastic declarations with link functions.
- Use of alternative parameterizations of distributions, described in Section 5.2.4 causes new nodes to be introduced. For example when a user provides the precision of a normal distribution as
`tau`

, NIMBLE creates a new node`sd <- 1/sqrt(tau)`

and uses`sd`

as a parameter in the normal distribution. If many nodes use the same`tau`

, only one new`sd`

node will be created, so the computation`1/sqrt(tau)`

will not be repeated redundantly.

### 13.1.3 Determining dependencies in a model

Next we’ll see how to determine the node dependencies (or ‘descendants’) in a model. There are a variety of arguments to `getDependencies`

that allow one to specify whether to include the node itself, whether to include deterministic or stochastic or data dependents, etc. By default `getDependencies`

returns descendants up to the next stochastic node on all edges emanating from the node(s) specified as input. This is what would be needed to calculate a Metropolis-Hastings acceptance probability in MCMC, for example.

`pump$getDependencies("alpha")`

```
## [1] "alpha" "theta[1]" "theta[2]" "theta[3]" "theta[4]"
## [6] "theta[5]" "theta[6]" "theta[7]" "theta[8]" "theta[9]"
## [11] "theta[10]"
```

`pump$getDependencies(c("alpha", "beta"))`

```
## [1] "alpha" "beta" "lifted_d1_over_beta"
## [4] "theta[1]" "theta[2]" "theta[3]"
## [7] "theta[4]" "theta[5]" "theta[6]"
## [10] "theta[7]" "theta[8]" "theta[9]"
## [13] "theta[10]"
```

`pump$getDependencies("theta[1:3]", self = FALSE)`

`## [1] "lambda[1]" "lambda[2]" "lambda[3]" "x[1]" "x[2]" "x[3]"`

`pump$getDependencies("theta[1:3]", stochOnly = TRUE, self = FALSE)`

`## [1] "x[1]" "x[2]" "x[3]"`

```
# get all dependencies, not just the direct descendants
pump$getDependencies("alpha", downstream = TRUE)
```

```
## [1] "alpha" "theta[1]" "theta[2]" "theta[3]" "theta[4]"
## [6] "theta[5]" "theta[6]" "theta[7]" "theta[8]" "theta[9]"
## [11] "theta[10]" "lambda[1]" "lambda[2]" "lambda[3]" "lambda[4]"
## [16] "lambda[5]" "lambda[6]" "lambda[7]" "lambda[8]" "lambda[9]"
## [21] "lambda[10]" "x[1]" "x[2]" "x[3]" "x[4]"
## [26] "x[5]" "x[6]" "x[7]" "x[8]" "x[9]"
## [31] "x[10]"
```

`pump$getDependencies("alpha", downstream = TRUE, dataOnly = TRUE)`

```
## [1] "x[1]" "x[2]" "x[3]" "x[4]" "x[5]" "x[6]" "x[7]" "x[8]"
## [9] "x[9]" "x[10]"
```

## 13.2 Accessing information about nodes and variables

### 13.2.1 Getting distributional information about a node

We briefly demonstrate some of the functionality for information about a node here, but refer readers to the R help on `modelBaseClass`

for full details.

Here is an example model, with use of various functions to determine information about nodes or variables.

```
code <- nimbleCode({
for(i in 1:4)
y[i] ~ dnorm(mu, sd = sigma)
mu ~ T(dnorm(0, 5), -20, 20)
sigma ~ dunif(0, 10)
})
m <- nimbleModel(code, data = list(y = rnorm(4)),
inits = list(mu = 0, sigma = 1))
m$isEndNode('y')
```

```
## y[1] y[2] y[3] y[4]
## TRUE TRUE TRUE TRUE
```

`m$getDistribution('sigma')`

```
## sigma
## "dunif"
```

`m$isDiscrete(c('y', 'mu', 'sigma'))`

```
## y[1] y[2] y[3] y[4] mu sigma
## FALSE FALSE FALSE FALSE FALSE FALSE
```

`m$isDeterm('mu')`

```
## mu
## FALSE
```

`m$getDimension('mu')`

```
## value
## 0
```

`m$getDimension('mu', includeParams = TRUE)`

```
## value mean sd tau var
## 0 0 0 0 0
```

Note that any variables provided to these functions are expanded into their constituent node names, so the length of results may not be the same length as the input vector of node and variable names. However the order of the results should be preserved relative to the order of the inputs, once the expansion is accounted for.

### 13.2.2 Getting information about a distribution

One can also get generic information about a distribution based on the name of the distribution using the function `getDistributionInfo`

. In particular, one can determine whether a distribution was provided by the user (`isUserDefined`

), whether a distribution provides CDF and quantile functions (`pqDefined`

), whether a distribution is a discrete distribution (`isDiscrete`

), the parameter names (include alternative parameterizations) for a distribution (`getParamNames`

), and the dimension of the distribution and its parameters (`getDimension`

). For more extensive information, please see the R help for `getDistributionInfo`

.

### 13.2.3 Getting distribution parameter values for a node

The function `getParam`

provides access to values of the parameters of a node’s distribution. `getParam`

can be used as global function taking a model as the first argument, or it can be used as a model member function. The next two arguments must be the name of one (stochastic) node and the name of a parameter for the distribution followed by that node. The parameter does not have to be one of the parameters used when the node was declared. Alternative parameterization values can also be obtained. See Section 5.2.4.1 for available parameterizations. (These can also be seen in `nimble:::distributionsInputList`

.)

Here is an example:

```
gammaModel <- nimbleModel(
nimbleCode({
a ~ dlnorm(0, 1)
x ~ dgamma(shape = 2, scale = a)
}), data = list(x = 2.4), inits = list(a = 1.2))
getParam(gammaModel, 'x', 'scale')
```

`## [1] 1.2`

`getParam(gammaModel, 'x', 'rate')`

`## [1] 0.8333333`

`gammaModel$getParam('x','rate')`

`## [1] 0.8333333`

`getParam`

is part of the NIMBLE language, so it can be used in run code of nimbleFunctions.

### 13.2.4 Getting distribution bounds for a node

The function `getBound`

provides access to the lower and upper bounds of the distribution for a node. In most cases these bounds will be fixed based on the distribution, but for the uniform distribution the bounds are the parameters of the distribution, and when truncation is used (Section 5.2.7), the bounds will be determined by the truncation. Like the functions described in the previous section, `getBound`

can be used as global function taking a model as the first argument, or it can be used as a model member function. The next two arguments must be the name of one (stochastic) node and either `"lower"`

or `"upper"`

indicating whether the lower or upper bound is desired. For multivariate nodes the bound is a scalar that is the bound for all elements of the node, as we do not handle truncation for multivariate nodes.

Here is an example:

```
exampleModel <- nimbleModel(
nimbleCode({
y ~ T(dnorm(mu, sd = sig), a, Inf)
a ~ dunif(-1, b)
b ~ dgamma(1, 1)
}), inits = list(a = -0.5, mu = 1, sig = 1, b = 4),
data = list(y = 4))
getBound(exampleModel, 'y', 'lower')
```

`## [1] -0.5`

`getBound(exampleModel, 'y', 'upper')`

`## [1] Inf`

```
exampleModel$b <- 3
exampleModel$calculate(exampleModel$getDependencies('b'))
```

`## [1] -4.386294`

`getBound(exampleModel, 'a', 'upper')`

`## [1] 3`

`exampleModel$getBound('b','lower')`

`## [1] 0`

`getBound`

is part of the NIMBLE language, so it can be used in run code of nimbleFunctions. In fact, we anticipate that most use of `getBound`

will be for algorithms, such as for the reflection version of the random walk MCMC sampler.

## 13.3 Carrying out model calculations

### 13.3.1 Core model operations: calculation and simulation

The four basic ways to operate a model are to calculate nodes, simulate into nodes, get the log probabilities (or probability densities) that have already been calculated, and compare the log probability of a new value to that of an old value. In more detail:

**calculate**For a stochastic node,`calculate`

determines the log probability value, stores it in the appropriate`logProb`

variable, and returns it. For a deterministic node,`calculate`

executes the deterministic calculation and returns 0.**simulate**For a stochastic node,`simulate`

generates a random draw. For deterministic nodes,`simulate`

is equivalent to`calculate`

without returning 0.`simulate`

always returns`NULL`

(or`void`

in C++).**getLogProb**`getLogProb`

simply returns the most recently calculated log probability value, or 0 for a deterministic node.**calculateDiff**`calculateDiff`

is identical to`calculate`

, but it returns the new log probability value minus the one that was previously stored. This is useful when one wants to change the value or values of node(s) in the model (e.g., by setting a value or`simulate`

) and then determine the change in the log probability, such as needed for a Metropolis-Hastings acceptance probability.

Each of these functions is accessed as a member function of a model object, taking a vector of node names as an argument^{23}. If there is more than one node name, `calculate`

and `getLogProb`

return the sum of the log probabilities from each node, while `calculateDiff`

returns the sum of the new values minus the old values. Next we show an example using `simulate`

.

#### 13.3.1.1 Example: simulating arbitrary collections of nodes

```
mc <- nimbleCode({
a ~ dnorm(0, 0.001)
for(i in 1:5) {
y[i] ~ dnorm(a, 0.1)
for(j in 1:3)
z[i,j] ~ dnorm(y[i], sd = 0.1)
}
y.squared[1:5] <- y[1:5]^2
})
model <- nimbleModel(mc, data = list(z = matrix(rnorm(15), nrow = 5)))
model$a <- 1
model$y
```

`## [1] NA NA NA NA NA`

```
model$simulate("y[1:3]")
# simulate(model, "y[1:3]")
model$y
```

`## [1] -5.5477219 10.6631058 0.1735369 NA NA`

```
model$simulate("y")
model$y
```

`## [1] -0.4369177 1.4982502 3.9516343 1.9576271 -5.1858901`

`model$z`

```
## [,1] [,2] [,3]
## [1,] -1.26959907 1.8001123 2.2363228
## [2,] 2.34949332 1.0114402 0.3022651
## [3,] -1.41200541 -0.5637166 -1.0425066
## [4,] -0.01696149 0.2054208 -0.9835423
## [5,] -0.54431935 1.1654620 2.0057186
```

```
model$simulate(c("y[1:3]", "z[1:5, 1:3]"))
model$y
```

`## [1] 2.117981 2.424367 3.085683 1.957627 -5.185890`

`model$z`

```
## [,1] [,2] [,3]
## [1,] -1.26959907 1.8001123 2.2363228
## [2,] 2.34949332 1.0114402 0.3022651
## [3,] -1.41200541 -0.5637166 -1.0425066
## [4,] -0.01696149 0.2054208 -0.9835423
## [5,] -0.54431935 1.1654620 2.0057186
```

```
model$simulate(c("z[1:5, 1:3]"), includeData = TRUE)
model$z
```

```
## [,1] [,2] [,3]
## [1,] 2.014839 1.880879 2.085524
## [2,] 2.329938 2.347778 2.328989
## [3,] 3.045883 3.054561 3.165292
## [4,] 2.056270 1.878174 1.926745
## [5,] -5.149746 -5.046011 -5.191497
```

The example illustrates a number of features:

`simulate(model, nodes)`

is equivalent to`model$simulate(nodes)`

. You can use either, but the latter is encouraged and the former may be deprecated inthe future.- Inputs like
`"y[1:3]"`

are automatically expanded into`c("y[1]", "y[2]", "y[3]")`

. In fact, simply`"y"`

will be expanded into all nodes within`y`

. - An arbitrary number of nodes can be provided as a character vector.
- Simulations will be done in the order provided, so in practice the nodes should often be obtained by functions such as
`getDependencies`

. These return nodes in topologically-sorted order, which means no node is manipulated before something it depends on. - The data nodes
`z`

were not simulated into until`includeData = TRUE`

was used.

Use of `calculate`

, `calculateDiff`

and `getLogProb`

are similar to `simulate`

, except that they return a value (described above) and they have no `includeData`

argument.

### 13.3.2 Pre-defined nimbleFunctions for operating on model nodes: *simNodes*, *calcNodes*, and *getLogProbNodes*

`simNodes`

, `calcNodes`

and `getLogProbNodes`

are basic nimbleFunctions that simulate, calculate, or get the log probabilities (densities), respectively, of the same set of nodes each time they are called. Each of these takes a model and a character string of node names as inputs. If `nodes`

is left blank, then all the nodes of the model are used.

For `simNodes`

, the nodes provided will be topologically sorted to simulate in the correct order. For `calcNodes`

and `getLogProbNodes`

, the nodes will be sorted and dependent nodes will be included. Recall that the calculations must be up to date (from a `calculate`

call) for `getLogProbNodes`

to return the values you are probably looking for.

```
simpleModelCode <- nimbleCode({
for(i in 1:4){
x[i] ~ dnorm(0,1)
y[i] ~ dnorm(x[i], 1) # y depends on x
z[i] ~ dnorm(y[i], 1) # z depends on y
# z conditionally independent of x
}
})
simpleModel <- nimbleModel(simpleModelCode, check = FALSE)
cSimpleModel <- compileNimble(simpleModel)
# simulates all the x's and y's
rSimXY <- simNodes(simpleModel, nodes = c('x', 'y') )
# calls calculate on x and its dependents (y, but not z)
rCalcXDep <- calcNodes(simpleModel, nodes = 'x')
# calls getLogProb on x's and y's
rGetLogProbXDep <- getLogProbNodes(simpleModel,
nodes = 'x')
# compiling the functions
cSimXY <- compileNimble(rSimXY, project = simpleModel)
cCalcXDep <- compileNimble(rCalcXDep, project = simpleModel)
cGetLogProbXDep <- compileNimble(rGetLogProbXDep, project = simpleModel)
cSimpleModel$x
```

`## [1] NA NA NA NA`

`cSimpleModel$y`

`## [1] NA NA NA NA`

```
# simulating x and y
cSimXY$run()
```

`## NULL`

`cSimpleModel$x`

`## [1] -1.6988735 0.2318525 -0.1190907 1.7724929`

`cSimpleModel$y`

`## [1] -1.3554513 -0.3911972 -0.5586130 1.2671961`

`cCalcXDep$run()`

`## [1] -10.87675`

```
# gives correct answer because logProbs
# updated by 'calculate' after simulation
cGetLogProbXDep$run()
```

`## [1] -10.87675`

`cSimXY$run()`

`## NULL`

```
# gives old answer because logProbs
# not updated after 'simulate'
cGetLogProbXDep$run()
```

`## [1] -10.87675`

`cCalcXDep$run()`

`## [1] -8.568001`

### 13.3.3 Accessing log probabilities via *logProb* variables

For each variable that contains at least one stochastic node, NIMBLE generates a model variable with the prefix ‘logProb_’. In general users will not need to access these logProb variables directly but rather will use `getLogProb`

. However, knowing they exist can be useful, in part because these variables can be monitored in an MCMC.

When the stochastic node is scalar, the `logProb`

variable will have the same size. For example:

`model$logProb_y`

`## [1] NA NA NA NA NA`

`model$calculate("y")`

`## [1] -12.69171`

`model$logProb_y`

`## [1] -2.132725 -2.171672 -2.287735 -2.116084 -3.983493`

Creation of `logProb`

variables for stochastic multivariate nodes is trickier, because they can represent an arbitrary block of a larger variable. In general NIMBLE records the logProb values using the lowest possible indices. For example, if `x[5:10, 15:20]`

follows a Wishart distribution, its log probability (density) value will be stored in `logProb_x[5, 15]`

. When possible, NIMBLE will reduce the dimensions of the corresponding logProb variable. For example, in

`for(i in 1:10) x[i,] ~ dmnorm(mu[], prec[,])`

`x`

may be \(10 \times 20\) (dimensions must be provided), but `logProb_x`

will be \(10 \times 1\). For the most part you do not need to worry about how NIMBLE is storing the log probability values, because you can always get them using `getLogProb`

.

Standard usage is

`model$calculate(nodes)`

but`calculate(model, nodes)`

is synonymous.↩