Chapter 5 Writing models in NIMBLE’s dialect of BUGS

Models in NIMBLE are written using a variation on the BUGS language. From BUGS code, NIMBLE creates a model object. This chapter describes NIMBLE’s version of BUGS. The next chapter explains how to build and manipulate model objects.

5.1 Comparison to BUGS dialects supported by WinBUGS, OpenBUGS and JAGS

Many users will come to NIMBLE with some familiarity with WinBUGS, OpenBUGS, or JAGS, so we start by summarizing how NIMBLE is similar to and different from those before documenting NIMBLE’s version of BUGS more completely. In general, NIMBLE aims to be compatible with the original BUGS language and also JAGS’ version. However, at this point, there are some features not supported by NIMBLE, and there are some extensions that are planned but not implemented.

5.1.1 Supported features of BUGS and JAGS

  1. Stochastic and deterministic10 node declarations.
  2. Most univariate and multivariate distributions.
  3. Link functions.
  4. Most mathematical functions.
  5. ‘for’ loops for iterative declarations.
  6. Arrays of nodes up to 4 dimensions.
  7. Truncation and censoring as in JAGS using the T() notation and dinterval.

5.1.2 NIMBLE’s Extensions to BUGS and JAGS

NIMBLE extends the BUGS language in the following ways:

  1. User-defined functions and distributions – written as nimbleFunctions – can be used in model code. See Chapter 12.
  2. Multiple parameterizations for distributions, similar to those in R, can be used.
  3. Named parameters for distributions and functions, similar to R function calls, can be used.
  4. Linear algebra, including for vectorized calculations of simple algebra, can be used in deterministic declarations.
  5. Distribution parameters can be expressions, as in JAGS but not in WinBUGS. Caveat: parameters to multivariate distributions (e.g., dmnorm) cannot be expressions (but an expression can be defined in a separate deterministic expression and the resulting variable then used).
  6. Alternative models can be defined from the same model code by using if-then-else statements that are evaluated when the model is defined.
  7. More flexible indexing of vector nodes within larger variables is allowed. For example one can place a multivariate normal vector arbitrarily within a higher-dimensional object, not just in the last index.
  8. More general constraints can be declared using dconstraint, which extends the concept of JAGS’ dinterval.
  9. Link functions can be used in stochastic, as well as deterministic, declarations.11
  10. Data values can be reset, and which parts of a model are flagged as data can be changed, allowing one model to be used for different data sets without rebuilding the model each time.
  11. One can use stochastic/dynamic indexes, i.e., indexes can be other nodes or functions of other nodes. For a given dimension of a node being indexed, if the index is not constant, it must be a scalar value. So expressions such as mu[k[i], 3] or mu[k[i], 1:3] or mu[k[i], j[i]] are allowed, but not mu[k[i]:(k[i]+1)]12 Nested dynamic indexes such as mu[k[j[i]]] are also allowed.

5.1.3 Not-supported features of BUGS and JAGS

The following are not supported.

  1. The appearance of the same node on the left-hand side of both a <- and a \(\sim\) declaration (used in WinBUGS for data assignment for the value of a stochastic node).
  2. Multivariate nodes must appear with brackets, even if they are empty. E.g., x cannot be multivariate but x[] or x[2:5] can be.
  3. NIMBLE generally determines the dimensionality and sizes of variables from the BUGS code. However, when a variable appears with blank indices, such as in x.sum <- sum(x[]), and if the dimensions of the variable are not clearly defined in other declarations, NIMBLE currently requires that the dimensions of x be provided when the model object is created (via nimbleModel).
  4. Use of non-sequential indexes in the definition of a for loop. JAGS allows syntax such as for(i in expr) when expr evaluates to an integer vector. In NIMBLE one can use the work-around of defining a constant vector, say k, and using k[i] in the body of the for loop.
  5. Use of non-sequential indexes to subset variables. JAGS allows syntax such as sum(mu[expr]) when expr evaluates to an integer vector. In NIMBLE one can use the work-around of defining a nimbleFunction that takes mu and the index vector as arguments and does the calculation of interest.

5.2 Writing models

Here we introduce NIMBLE’s version of BUGS. The WinBUGS, OpenBUGS and JAGS manuals are also useful resources for writing BUGS models, including many examples.

5.2.1 Declaring stochastic and deterministic nodes

BUGS is a declarative language for graphical (or hierarchical) models. Most programming languages are imperative, which means a series of commands will be executed in the order they are written. A declarative language like BUGS is more like building a machine before using it. Each line declares that a component should be plugged into the machine, but it doesn’t matter in what order they are declared as long as all the right components are plugged in by the end of the code.

The machine in this case is a graphical model13. A node (sometimes called a vertex) holds one value, which may be a scalar or a vector. Edges define the relationships between nodes. A huge variety of statistical models can be thought of as graphs.

Here is the code to define and create a simple linear regression model with four observations.

library(nimble)
mc <- nimbleCode({
    intercept ~ dnorm(0, sd = 1000)
    slope ~ dnorm(0, sd = 1000)
    sigma ~ dunif(0, 100)
    for(i in 1:4) {
        predicted.y[i] <- intercept + slope * x[i]
        y[i] ~ dnorm(predicted.y[i], sd = sigma)
    }
})

model <- nimbleModel(mc, data = list(y = rnorm(4)))

library(igraph)

layout <- matrix(ncol = 2, byrow = TRUE,
   # These seem to be rescaled to fit in the plot area,
   # so I'll just use 0-100 as the scale
                 data = c(33, 100,
                          66, 100,
                          50, 0, # first three are parameters
                          15, 50, 35, 50, 55, 50, 75, 50, # x's
                          20, 75, 40, 75, 60, 75, 80, 75, # predicted.y's
                          25, 25, 45, 25, 65, 25, 85, 25) # y's
                 )

sizes <- c(45, 30, 30,
           rep(20, 4),
           rep(50, 4),
           rep(20, 4))

edge.color <- "black"
    # c(
    # rep("green", 8),
    # rep("red", 4),
    # rep("blue", 4),
    # rep("purple", 4))
stoch.color <- "deepskyblue2"
det.color <- "orchid3"
rhs.color <- "gray73"
fill.color <- c(
    rep(stoch.color, 3),
    rep(rhs.color, 4),
    rep(det.color, 4),
    rep(stoch.color, 4)
)


plot(model$graph, vertex.shape = "crectangle",
     vertex.size = sizes,
     vertex.size2 = 20,
     layout = layout,
     vertex.label.cex = 3.0,
     vertex.color = fill.color,
     edge.width = 3,
     asp = 0.5,
     edge.color = edge.color)
Graph of a linear regression model

Figure 5.1: Graph of a linear regression model

The graph representing the model is shown in Figure 5.1. Each observation, y[i], is a node whose edges say that it follows a normal distribution depending on a predicted value, predicted.y[i], and standard deviation, sigma, which are each nodes. Each predicted value is a node whose edges say how it is calculated from slope, intercept, and one value of an explanatory variable, x[i], which are each nodes.

This graph is created from the following BUGS code:

{
    intercept ~ dnorm(0, sd = 1000)
    slope ~ dnorm(0, sd = 1000)
    sigma ~ dunif(0, 100)
    for(i in 1:4) {
        predicted.y[i] <- intercept + slope * x[i]
        y[i] ~ dnorm(predicted.y[i], sd = sigma)
    }
}

In this code, stochastic relationships are declared with ‘\(\sim\)’ and deterministic relationships are declared with ‘<-’. For example, each y[i] follows a normal distribution with mean predicted.y[i] and standard deviation sigma. Each predicted.y[i] is the result of intercept + slope * x[i]. The for-loop yields the equivalent of writing four lines of code, each with a different value of i. It does not matter in what order the nodes are declared. Imagine that each line of code draws part of Figure 5.1, and all that matters is that the everything gets drawn in the end. Available distributions, default and alternative parameterizations, and functions are listed in Section 5.2.4.

An equivalent graph can be created by this BUGS code:

{
    intercept ~ dnorm(0, sd = 1000)
    slope ~ dnorm(0, sd = 1000)
    sigma ~ dunif(0, 100)
    for(i in 1:4) {
        y[i] ~ dnorm(intercept + slope * x[i], sd = sigma)
    }
}

In this case, the predicted.y[i] nodes in Figure 5.1 will be created automatically by NIMBLE and will have a different name, generated by NIMBLE.

5.2.2 More kinds of BUGS declarations

Here are some examples of valid lines of BUGS code. This code does not describe a sensible or complete model, and it includes some arbitrary indices (e.g. mvx[8:10, i]) to illustrate flexibility. Instead the purpose of each line is to illustrate a feature of NIMBLE’s version of BUGS.

{
    # 1. normal distribution with BUGS parameter order
    x ~ dnorm(a + b * c, tau) 
    # 2. normal distribution with a named parameter
    y ~ dnorm(a + b * c, sd = sigma) 
    # 3. For-loop and nested indexing
    for(i in 1:N) {
        for(j in 1:M[i]) {
            z[i,j] ~ dexp(r[ blockID[i] ]) 
        }
    }
    # 4. multivariate distribution with arbitrary indexing
    for(i in 1:3) 
        mvx[8:10, i] ~ dmnorm(mvMean[3:5], cov = mvCov[1:3, 1:3, i])
    # 5. User-provided distribution
    w ~ dMyDistribution(hello = x, world = y) 
    # 6. Simple deterministic node
    d1 <- a + b
    # 7. Vector deterministic node with matrix multiplication
    d2[] <- A[ , ] %*% mvMean[1:5] 
    # 8. Deterministic node with user-provided function
    d3 <- foo(x, hooray = y) 
}

When a variable appears only on the right-hand side, it can be provided via constants (in which case it can never be changed) or via data or inits, as discussed in Chapter 6.

Notes on the comment-numbered lines are:

  1. x follows a normal distribution with mean a + b*c and precision tau (default BUGS second parameter for dnorm).
  2. y follows a normal distribution with the same mean as x but a named standard deviation parameter instead of a precision parameter (sd = 1/sqrt(precision)).
  3. z[i, j] follows an exponential distribution with parameter r[ blockID[i] ]. This shows how for-loops can be used for indexing of variables containing multiple nodes. Variables that define for-loop indices (N and M) must also be provided as constants.
  4. The arbitrary block mvx[8:10, i] follows a multivariate normal distribution, with a named covariance matrix instead of BUGS’ default of a precision matrix. As in R, curly braces for for-loop contents are only needed if there is more than one line.
  5. w follows a user-defined distribution. See Chapter 12.
  6. d1 is a scalar deterministic node that, when calculated, will be set to a + b.
  7. d2 is a vector deterministic node using matrix multiplication in R’s syntax.
  8. d3 is a deterministic node using a user-provided function. See Chapter 12.

5.2.2.1 More about indexing

Examples of allowed indexing include:

  • x[i] # a single index
  • x[i:j] # a range of indices
  • x[i:j,k:l] # multiple single indices or ranges for higher-dimensional arrays
  • x[i:j, ] # blank indices indicating the full range
  • x[3*i+7] # computed indices
  • x[(3*i):(5*i+1)] # computed lower and upper ends of an index range
  • x[k[i]+1] # a dynamic (and computed) index
  • x[k[j[i]]] # nested dynamic indexes
  • x[k[i], 1:3] # nested indexing of rows or columns

NIMBLE does not allow multivariate nodes to be used without square brackets, which is an incompatibility with JAGS. Therefore a statement like xbar <- mean(x) in JAGS must be converted to xbar <- mean(x[]) (if x is a vector) or xbar <- mean(x[,]) (if x is a matrix) for NIMBLE14. Section 6.1.1.5 discusses how to provide NIMBLE with dimensions of x when needed.

One cannot provide a vector of indices that are not constant. For example, x[start[i]:end[i]] is allowed only if start and end are provided as constants. Also, one cannot provide a vector, or expression evaluating to a vector (apart from use of :) as an index. For example, x[inds] is not allowed. Often one can write a nimbleFunction to achieve the desired result, such as definining a nimbleFunction that takes x, start[i], and end[i] or takes x and inds as arguments and does the subsetting (possibly in combination with some other calculation). Note that x[L:U] and x[inds] are allowed in nimbleFunction code.

Generally NIMBLE supports R-like linear algebra expressions and attempts to follow the same rules as R about dimensions (although in some cases this is not possible). For example, x[1:3] %*% y[1:3] converts x[1:3] into a row vector and thus computes the inner product, which is returned as a \(1 \times 1\) matrix (use inprod to get it as a scalar, which it typically easier). Like in R, a scalar index will result in dropping a dimension unless the argument drop=FALSE is provided. For example, mymatrix[i, 1:3] will be a vector of length 3, but mymatrix[i, 1:3, drop=FALSE] will be a \(1 \times 3\) matrix. More about indexing and dimensions is discussed in Section 11.3.2.7.

5.2.3 Vectorized versus scalar declarations

Suppose you need nodes logY[i] that should be the log of the corresponding Y[i], say for i from 1 to 10. Conventionally this would be created with a for loop:

{
    for(i in 1:10) {
        logY[i] <- log(Y[i])
    }
}

Since NIMBLE supports R-like algebraic expressions, an alternative in NIMBLE’s dialect of BUGS is to use a vectorized declaration like this:

{
    logY[1:10] <- log(Y[1:10])
}

There is an important difference between the models that are created by the above two methods. The first creates 10 scalar nodes, logY[1] \(,\ldots,\) logY[10]. The second creates one vector node, logY[1:10]. If each logY[i] is used separately by an algorithm, it may be more efficient computationally if they are declared as scalars. If they are all used together, it will often make sense to declare them as a vector.

5.2.4 Available distributions

5.2.4.1 Distributions

NIMBLE supports most of the distributions allowed in BUGS and JAGS. Table 5.1 lists the distributions that are currently supported, with their default parameterizations, which match those of BUGS15. NIMBLE also allows one to use alternative parameterizations for a variety of distributions as described next. See Section 12.2 to learn how to write new distributions using nimbleFunctions.

Table 5.1: Distributions with their default order of parameters. The value of the random variable is given by \(x\).
Name Usage Density Lower Upper
Bernoulli dbern(prob = p) \(p^x (1 - p)^{1 -x}\) \(0\) \(1\)
\(0 < p < 1\)
Beta dbeta(shape1 = a, \(\frac{x^{a-1}(1-x)^{b-1}}{\beta(a,b)}\) \(0\) \(1\)
shape2 = b), \(a > 0\), \(b > 0\)
Binomial dbin(prob = p, size = n) \({n \choose x} p^x (1-p)^{n-x}\) \(0\) \(n\)
\(0 < p < 1\), \(n \in \mathbb{N}^*\)
CAR dcar_normal(adj, weights, see chapter 9 for details
(intrinsic) num, tau, c, zero_mean)
CAR dcar_proper(mu, C, adj, see chapter 9 for details
(proper) num, M, tau, gamma)
Categorical dcat(prob = p) \(\frac{p_x}{\sum_i p_i}\) \(1\) \(N\)
\(p \in (\mathbb{R}^+)^N\)
Chi-square dchisq(df = k), \(k > 0\) \(\frac{x^{\frac{k}{2}-1} \exp(-x/2)}{2^{\frac{k}{2}} \Gamma(\frac{k}{2})}\) \(0\)
Double ddexp(location = \(\mu\), \(\frac{\tau}{2} \exp(-\tau |x - \mu|)\)
exponential rate = \(\tau\)),\(\tau > 0\)
Dirichlet ddirch(alpha = \(\alpha\)) \(\Gamma(\sum_j \alpha_j) \prod_j \frac{ x_j^{\alpha_j - 1}}{ \Gamma(\alpha_j)}\) \(0\)
\(\alpha_j \geq 0\)
Exponential dexp(rate = \(\lambda\)), \(\lambda > 0\) \(\lambda \exp(-\lambda x)\) \(0\)
Flat dflat() \(\propto 1\) (improper)
Gamma dgamma(shape = r, rate = \(\lambda\)) \(\frac{ \lambda^r x^{r - 1} \exp(-\lambda x)}{ \Gamma(r)}\) \(0\)
\(\lambda > 0\), \(r > 0\)
Half flat dhalfflat() \(\propto 1\) (improper) \(0\)
Inverse dinvgamma(shape = r, scale = \(\lambda\)) \(\frac{ \lambda^r x^{-(r + 1)} \exp(-\lambda / x)}{ \Gamma(r)}\) \(0\)
gamma \(\lambda > 0\), \(r > 0\)
LKJ Correl’n dlkj_corr_cholesky(eta = \(\eta\), \(\prod_{i=2}^{p} x_{ii}^{p - i + 2\eta -2}\)
Cholesky size = p), \(\eta > 0\)
Logistic dlogis(location = \(\mu\), \(\frac{ \tau \exp\{(x - \mu) \tau\}}{\left[1 + \exp\{(x - \mu) \tau\}\right]^2}\)
rate = \(\tau\)),\(\tau > 0\)
Log-normal dlnorm(meanlog = \(\mu\), \(\left(\frac{\tau}{2\pi}\right)^{\frac{1}{2}} x^{-1} \exp \left\{-\tau (\log(x) - \mu)^2 / 2 \right\}\) \(0\)
taulog = \(\tau\)), \(\tau > 0\)
Multinomial dmulti(prob = p, size = n) \(n! \prod_j \frac{ p_j^{x_j}}{ x_j!}\)
\(\sum_j x_j = n\)
Multivariate dmnorm(mean = \(\mu\), prec = \(\Lambda\)) \((2\pi)^{-\frac{d}{2}}|\Lambda|^{\frac{1}{2}} \exp\{-\frac{(x-\mu)^T \Lambda (x-\mu)}{2}\}\)
normal \(\Lambda\) positive definite
Multivariate dmvt(mu = \(\mu\), prec = \(\Lambda\), \(\frac{\Gamma(\frac{\nu+d}{2})}{\Gamma(\frac{\nu}{2})(\nu\pi)^{d/2}}|\Lambda|^{1/2}(1+\frac{(x-\mu)^T\Lambda(x-\mu)}{\nu})^{-\frac{\nu+d}{2}}\)
Student t df = \(\nu\)), \(\Lambda\) positive def.
Negative dnegbin(prob = p, size = r) \({x + r -1 \choose x} p^r (1-p)^x\) \(0\)
binomial \(0 < p \leq 1\), \(r \geq 0\)
Normal dnorm(mean = \(\mu\), tau = \(\tau\)) \(\left(\frac{\tau}{2\pi}\right)^{\frac{1}{2}} \exp\{- \tau (x - \mu)^2 / 2\}\)
\(\tau > 0\)
Poisson dpois(lambda = \(\lambda\)) \(\frac{ \exp(-\lambda)\lambda^x}{ x!}\) \(0\)
\(\lambda > 0\)
Student t dt(mu = \(\mu\), tau = \(\tau\), df = k) \(\frac{\Gamma(\frac{k+1}{2})}{\Gamma(\frac{k}{2})}\left(\frac{\tau}{k\pi}\right)^{\frac{1}{2}}\left\{1+\frac{\tau(x-\mu)^2}{k} \right\}^{-\frac{(k+1)}{2}}\)
\(\tau > 0\), \(k > 0\)
Uniform dunif(min = a, max = b) \(\frac{ 1}{ b - a}\) \(a\) \(b\)
\(a < b\)
Weibull dweib(shape = v, lambda = \(\lambda\)) \(v \lambda x^{v - 1} \exp (- \lambda x^v)\) \(0\)
\(v > 0\), \(\lambda > 0\)
Wishart dwish(R = R, df = k) \(\frac{ |x|^{(k-p-1)/2} |R|^{k/2} \exp\{-\text{tr}(Rx)/2\}}{ 2^{pk/2} \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma((k+1-i)/2)}\)
R \(p \times p\) pos. def., \(k \geq p\)
Inverse dinvwish(S = S, df = k) \(\frac{ |x|^{-(k+p+1)/2} |S|^{k/2} \exp\{-\text{tr}(Sx^{-1})/2\}}{ 2^{pk/2} \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma((k+1-i)/2)}\)
Wishart S \(p \times p\) pos. def., \(k \geq p\)
5.2.4.1.1 Improper distributions

Note that dcar_normal, dflat and dhalfflat specify improper prior distributions and should only be used when the posterior distribution of the model is known to be proper. Also for these distributions, the density function returns the unnormalized density and the simulation function returns NaN so these distributions are not appropriate for algorithms that need to simulate from the prior or require proper (normalized) densities.

5.2.4.1.2 LKJ distribution for correlation matrices

NIMBLE provides the LKJ distribution via dlkj_corr_cholesky for correlation matrices, discussed in Section 24 of Stan Development Team (2021a) and based on Lewandowski, Kurowicka, and Joe (2009). This distribution has various advantages (both from a modeling perspective and an MCMC fitting perspective) over other prior distributions for correlation or covariance matrices such as the inverse Wishart distribution.

For computational efficiency, the distribution is on the Cholesky factor of the correlation matrix rather than the correlation matrix itself. As a result using this distribution in a model involves a bit of care, including efficiently creating a covariance matrix from the correlation matrix and a set of standard deviations.

Here is some example code illustrating how to use the distribution in a model.

code <- nimbleCode({
    Ustar[1:p,1:p] ~ dlkj_corr_cholesky(1.3, p)
    U[1:p,1:p] <- uppertri_mult_diag(Ustar[1:p, 1:p], sds[1:p])
    for(i in 1:n)  
        y[i, 1:p] ~ dmnorm(mu[1:p], cholesky = U[1:p, 1:p], prec_param = 0)
})

Note that we need to take the upper triangular Cholesky factor (Ustar) of the correlation matrix and convert it to the upper triangular Cholesky factor (U) of a covariance matrix, based on a vector of standard deviations (sds), and use that in the parameterization of dmnorm that directly (and therefore efficiently) uses the Cholesky of the covariance.

Other approaches are possible, but unless care is taken they are likely to be less computationally efficient than the template above.

The template above uses a user-defined nimbleFunction to efficiently combine the Cholesky of the correlation matrix with a vector of standard deviations to produce the Cholesky of the covariance matrix. That function is given here:

uppertri_mult_diag <- nimbleFunction(
    run = function(mat = double(2), vec = double(1)) {
        returnType(double(2))
        p <- length(vec)
        out <- matrix(nrow = p, ncol = p, init = FALSE)
        for(i in 1:p)
            out[ , i] <- mat[ , i] * vec[i]
        return(out)
    })

Note that NIMBLE ignores the lower triangle of the Cholesky factor of the correlation matrix; the values are taken to be zero. It also ignores the [1,1] element, which is taken to be one. Any initial values for these elements will be ignored and simply repeated in the MCMC samples.

Assuming the [1,1] element is one and the lower triangle is filled with zeroes in the MCMC samples, one can reconstruct a sample, here for the sth sample, of the correlation matrix as follows using the matrix of MCMC samples:

cols <- grep("^Ustar",colnames(samples))
Ustar_sample <- matrix(samples[s, cols], p, p)
corrMatrix_sample <- crossprod(Ustar_sample)  ## i.e., t(Ustar_sample) %*% Ustar_sample

5.2.4.2 Alternative parameterizations for distributions

NIMBLE allows one to specify distributions in model code using a variety of parameterizations, including the BUGS parameterizations. Available parameterizations are listed in Table 5.2. To understand how NIMBLE handles alternative parameterizations, it is useful to distinguish three cases, using the gamma distribution as an example:

  1. A canonical parameterization is used directly for computations16. For gamma, this is (shape, scale).
  2. The BUGS parameterization is the one defined in the original BUGS language. In general this is the parameterization for which conjugate MCMC samplers can be executed most efficiently. For dgamma, this is (shape, rate).
  3. An alternative parameterization is one that must be converted into the canonical parameterization. For dgamma, NIMBLE provides both (shape, rate) and (mean, sd) parameterization and creates nodes to calculate (shape, scale) from either (shape, rate) or (mean, sd). In the case of dgamma, the BUGS parameterization is also an alternative parameterization.

Since NIMBLE provides compatibility with existing BUGS and JAGS code, the order of parameters places the BUGS parameterization first. For example, the order of parameters for dgamma is dgamma(shape, rate, scale, mean, sd). Like R, if parameter names are not given, they are taken in order, so that (shape, rate) is the default. This happens to match R’s order of parameters, but it need not. If names are given, they can be given in any order. NIMBLE knows that rate is an alternative to scale and that (mean, sd) are an alternative to (shape, scale or rate).

Table 5.2: Distribution parameterizations allowed in NIMBLE. The first column indicates the supported parameterizations for distributions given in Table 5.1. The second column indicates the relationship to the canonical parameterization used in NIMBLE.
Parameterization NIMBLE re-parameterization
dbern(prob) dbin(size = 1, prob)
dbeta(shape1, shape2) canonical
dbeta(mean, sd) dbeta(shape1 = mean^2 * (1-mean) / sd^2 - mean,
shape2 = mean * (1 - mean)^2 / sd^2 + mean - 1)
dbin(prob, size) canonical
dcat(prob) canonical
dchisq(df) canonical
ddexp(location, scale) canonical
ddexp(location, rate) ddexp(location, scale = 1 / rate)
ddexp(location, var) ddexp(location, scale = sqrt(var / 2))
ddirch(alpha) canonical
dexp(rate) canonical
dexp(scale) dexp(rate = 1/scale)
dgamma(shape, scale) canonical
dgamma(shape, rate) dgamma(shape, scale = 1 / rate)
dgamma(mean, sd) dgamma(shape = mean^2/sd^2, scale = sd^2/mean)
dinvgamma(shape, rate) canonical
dinvgamma(shape, scale) dgamma(shape, rate = 1 / scale)
dlkj_corr_cholesky(eta) canonical
dlogis(location, scale) canonical
dlogis(location, rate) dlogis(location, scale = 1 / rate)
dlnorm(meanlog, sdlog) canonical
dlnorm(meanlog, taulog) dlnorm(meanlog, sdlog = 1 / sqrt(taulog))
dlnorm(meanlog, varlog) dlnorm(meanlog, sdlog = sqrt(varlog))
dmulti(prob, size) canonical
dmnorm(mean, cholesky, canonical (precision)
...prec_param=1)
dmnorm(mean, cholesky, canonical (covariance)
...prec_param=0)
dmnorm(mean, prec) dmnorm(mean, cholesky = chol(prec), prec_param=1)
dmnorm(mean, cov) dmnorm(mean, cholesky = chol(cov), prec_param=0)
dmvt(mu, cholesky, df, canonical (precision/inverse scale)
...prec_param=1)
dmvt(mu, cholesky, df, canonical (scale)
...prec_param=0)
dmvt(mu, prec, df) dmvt(mu, cholesky = chol(prec), df, prec_param=1)
dmvt(mu, scale, df) dmvt(mu, cholesky = chol(scale), df, prec_param=0)
dlkj_corr_cholesky( canonical
...shape, size)
dnegbin(prob, size) canonical
dnorm(mean, sd) canonical
dnorm(mean, tau) dnorm(mean, sd = 1 / sqrt(var))
dnorm(mean, var) dnorm(mean, sd = sqrt(var))
dpois(lambda) canonical
dt(mu, sigma, df) canonical
dt(mu, tau, df) dt(mu, sigma = 1 / sqrt(tau), df)
dt(mu, sigma2, df) dt(mu, sigma = sqrt(sigma2), df)
dunif(min, max) canonical
dweib(shape, scale) canonical
dweib(shape, rate) dweib(shape, scale = 1 / rate)
dweib(shape, lambda) dweib(shape, scale = lambda^(- 1 / shape))
dwish(cholesky, df, canonical (scale)
...scale_param=1)
dwish(cholesky, df, canonical (inverse scale)
...scale_param=0)
dwish(R, df) dwish(cholesky = chol(R), df, scale_param = 0)
dwish(S, df) dwish(cholesky = chol(S), df, scale_param = 1)
dinvwish(cholesky, df, canonical (scale)
...scale_param=1)
dinvwish(cholesky, df, canonical (inverse scale)
...scale_param=0)
dinvwish(R, df) dinvwish(cholesky = chol(R), df, scale_param = 0)
dinvwish(S, df) dinvwish(cholesky = chol(S), df, scale_param = 1)

Note that for multivariate normal, multivariate t, Wishart, and Inverse Wishart, the canonical parameterization uses the Cholesky decomposition of one of the precision/inverse scale or covariance/scale matrix. For example, for the multivariate normal, if prec_param=TRUE, the cholesky argument is treated as the Cholesky decomposition of a precision matrix. Otherwise it is treated as the Cholesky decomposition of a covariance matrix.

In addition, NIMBLE supports alternative distribution names, known as aliases, as in JAGS, as specified in Table 5.3.

Table 5.3: Distributions with alternative names (aliases)
Distribution Canonical name Alias
Binomial dbin dbinom
Chi-square dchisq dchisqr
Double exponential ddexp dlaplace
Dirichlet ddirch ddirich
Multinomial dmulti dmultinom
Negative binomial dnegbin dnbinom
Weibull dweib dweibull
Wishart dwish dwishart

We plan to, but do not currently, include the following distributions as part of core NIMBLE: beta-binomial, Dirichlet-multinomial, F, Pareto, or forms of the multivariate t other than the standard one provided.

5.2.5 Available BUGS language functions

Tables 5.4-5.5 show the available operators and functions. Support for more general R expressions is covered in Chapter 11 about programming with nimbleFunctions.

For the most part NIMBLE supports the functions used in BUGS and JAGS, with exceptions indicated in the table. Additional functions provided by NIMBLE are also listed. Note that we provide distribution functions for use in calculations, namely the ‘p’, ‘q’, and ‘d’ functions. See Section 11.2.4 for details on the syntax for using distribution functions as functions in deterministic calculations, as only some parameterizations are allowed and the names of some distributions differ from those used to define stochastic nodes in a model.

Table 5.4: Functions operating on scalars, many of which can operate on each element (component-wise) of vectors and matrices. Status column indicates if the function is currently provided in NIMBLE. Vector input column indicates if the function can take a vector as an argument (i.e., if the function is vectorized).
Usage Description Comments Status Vector input
x | y, x & y logical OR (\(|\)) and AND(&) yes yes
!x logical not yes yes
x > y, x >= y greater than (and or equal to) yes yes
x < y, x <= y less than (and or equal to) yes yes
x != y, x == y (not) equals yes yes
x + y, x - y, x * y component-wise operators mix of scalar and vector yes yes
x / y component-wise division vector \(x\) and scalar \(y\) yes yes
x^y, pow(x, y) power \(x^y\); vector \(x\),scalar \(y\) yes yes
x %% y modulo (remainder) yes no
min(x1, x2), min. (max.) of two scalars yes See pmin,
max(x1, x2) pmax
exp(x) exponential yes yes
log(x) natural logarithm yes yes
sqrt(x) square root yes yes
abs(x) absolute value yes yes
step(x) step function at 0 0 if \(x<0\), 1 if \(x>=0\) yes yes
equals(x, y) equality of two scalars 1 if \(x==y\), 0 if \(x != y\) yes yes
cube(x) third power \(x^3\) yes yes
sin(x), cos(x), trigonometric functions yes yes
tan(x)
asin(x), acos(x), inverse trigonometric functions yes yes
atan(x)
asinh(x), acosh(x), inv. hyperbolic trig. functions yes yes
atanh(x)
logit(x) logit \(\log(x/(1-x))\) yes yes
ilogit(x), expit(x) inverse logit \(\exp(x)/(1 + \exp(x))\) yes yes
probit(x) probit (Gaussian quantile) \(\Phi^{-1}(x)\) yes yes
iprobit(x), phi(x) inverse probit (Gaussian CDF) \(\Phi(x)\) yes yes
cloglog(x) complementary log log \(\log(-\log(1-x))\) yes yes
icloglog(x) inverse complementary log log \(1 - \exp(-\exp(x))\) yes yes
ceiling(x) ceiling function \(\lceil(x)\rceil\) yes yes
floor(x) floor function \(\lfloor(x)\rfloor\) yes yes
round(x) round to integer yes yes
trunc(x) truncation to integer yes yes
lgamma(x), loggam(x) log gamma function \(\log \Gamma(x)\) yes yes
besselK(x, nu, modified bessel function returns vector even if yes yes
...expon.scaled) of the second kind \(x\) a matrix/array
log1p(x) log of 1 + x \(\log(1+x)\) yes yes
lfactorial(x), log factorial \(\log x!\) yes yes
logfact(x)
qDIST(x, PARAMS) “q” distribution functions canonical parameterization yes yes
pDIST(x, PARAMS) “p” distribution functions canonical parameterization yes yes
rDIST(x, PARAMS) “r” distribution functions canonical parameterization yes yes
dDIST(x, PARAMS) “d” distribution functions canonical parameterization yes yes
sort(x) no
rank(x, s) no
ranked(x, s) no
order(x) no
Table 5.5: Functions operating on vectors and matrices. Status column indicates if the function is currently provided in NIMBLE.
Usage Description Comments Status
inverse(x) matrix inverse \(x\) symmetric, positive def. yes
chol(x) matrix Cholesky factorization \(x\) symmetric, positive def. yes
returns upper triang. matrix
t(x) matrix transpose \(x^\top\) yes
x%*%y matrix multiply \(xy\); \(x\), \(y\) conformant yes
inprod(x, y) dot product \(x^\top y\); \(x\) and \(y\) vectors yes
solve(x,y) solve system of equations \(x^{-1} y\); \(y\) matrix or vector yes
forwardsolve(x, y) solve lower-triangular system of equations \(x^{-1} y\); \(x\) lower-triangular yes
backsolve(x, y) solve upper-triangular system of equations \(x^{-1} y\); \(x\) upper-triangular yes
logdet(x) log matrix determinant \(\log|x|\) yes
asRow(x) convert vector to 1-row matrix sometimes automatic yes
asCol(x) convert vector to 1-column matrix sometimes automatic yes
sum(x) sum of elements of x yes
mean(x) mean of elements of x yes
sd(x) standard deviation of elements of x yes
prod(x) product of elements of x yes
min(x), max(x) min. (max.) of elements of x yes
pmin(x, y), pmax(x,y) vector of mins (maxs) of elements of yes
x and y
interp.lin(x, v1, v2) linear interpolation no
eigen(x)$values matrix eigenvalues \(x\) symmetric yes
eigen(x)$vectors matrix eigenvectors \(x\) symmetric yes
svd(x)$d matrix singular values yes
svd(x)$u matrix left singular vectors yes
svd(x)$v matrix right singular vectors yes

See Section 12.1 to learn how to use nimbleFunctions to write new functions for use in BUGS code.

5.2.7 Truncation, censoring, and constraints

NIMBLE provides three ways to declare boundaries on the value of a variable, each for different situations. We introduce these and comment on their relationships to related features of JAGS and BUGS. The three methods are:

5.2.7.1 Truncation

Either of the following forms,

x ~ dnorm(0, sd = 10) T(0, a)
x ~ T(dnorm(0, sd = 10), 0, a)

declares that x follows a normal distribution between 0 and a (inclusive of 0 and a). Either boundary may be omitted or may be another node, such as a in this example. The first form is compatible with JAGS, but in NIMBLE it can only be used when reading code from a text file. When writing model code in R, the second version must be used.

Truncation means the possible values of x are limited a priori, hence the probability density of x must be normalized17. In this example it would be the normal probability density divided by its integral from 0 to a. Like JAGS, NIMBLE also provides I as a synonym for T to accommodate older BUGS code, but T is preferred because it disambiguates multiple usages of I in BUGS.

5.2.7.2 Censoring

Censoring refers to the situation where one datum gives the lower or upper bound on an unobserved random variable. This is common in survival analysis, when for an individual still surviving at the end of a study, the age of death is not known and hence is ‘censored’ (right-censoring). NIMBLE adopts JAGS syntax for censoring, as follows:

censored[i] ~ dinterval(t[i], c[i])
t[i] ~ dweib(r, mu[i])

In the case of right-censoring, censored[i] should be given as data with a value of 1 if t[i] is right-censored (t[i] > c[i]) and 0 if it is observed. The data vector for t should have NA (indicating missing data) for any censored t[i] entries. (As a result, these nodes will be sampled in an MCMC.) The vector for c should give the censoring times corresponding to censored entries and a value above the observed times for uncensored entries (e.g., Inf). The values for c can be provided via constants, inits, or data. We recommend providing via constants for values that are fixed and via inits for values that are to be estimated or might be changed for some reason.

Left-censored observations would be specified by setting censored[i] to 0 and t[i] to NA.

Important: The value of t[i] corresponding to each censored data point should be given a valid initial value (via inits) that is consistent with censored[i]. Otherwise, NIMBLE will initialize from the prior, which may not produce a valid initial value.

The dinterval is not really a distribution but rather a trick: in the above example when censored[i] = 1 it gives a ‘probability’ of 1 if t[i] > c[i] and 0 otherwise. This means that t[i] <= c[i] is treated as impossible. More generally than simple right- or left-censoring, censored[i] ~ dinterval(t[i], c[i, ]) is defined such that for a vector of increasing cutpoints, c[i, ], t[i] is enforced to fall within the censored[i]-th cutpoint interval. This is done by setting data censored[i] as follows:

censored[i] = 0 if t[i] <= c[i, 1]

censored[i] = m if c[i, m] < t[i] <= c[i, m+1] for \(1 \leq m \leq M\)

censored[i] = M if c[i, M] < t[i]

(The i index is provided only for consistency with the previous example.) The most common uses of dinterval will be for left- and right-censored data, in which case c[i,] will be a single value (and typically given as simply c[i]), and for interval-censored data, in which case c[i,] will be a vector of two values.

Nodes following a dinterval distribution should normally be set as data with known values. Otherwise, the node may be simulated during initialization in some algorithms (e.g., MCMC) and thereby establish a permanent, perhaps unintended, constraint.

Censoring differs from truncation because censoring an observation involves bounds on a random variable that could have taken any value, while in truncation we know a priori that a datum could not have occurred outside the truncation range.

5.2.7.3 Constraints and ordering

NIMBLE provides a more general way to enforce constraints using dconstraint(cond). For example, we could specify that the sum of mu1 and mu2 must be positive like this:

mu1 ~ dnorm(0, 1) 
mu2 ~ dnorm(0, 1) 
constraint_data ~ dconstraint( mu1 + mu2 > 0 )

with constraint_data set (as data) to 1. This is equivalent to a half-normal distribution on the half-plane \(\mu_1 + \mu_2 > 0\). Nodes following dconstraint should be provided as data for the same reason of avoiding unintended initialization described above for dinterval.

Important: the model should be initialized so that the constraints are satisfied. Otherwise, NIMBLE will initialize from the prior, which may not produce a valid initial value and may cause algorithms (particularly) MCMC to fail.

Formally, dconstraint(cond) is a probability distribution on \(\left\{ 0, 1 \right\}\) such that \(P(1) = 1\) if cond is TRUE and \(P(0) = 1\) if cond is FALSE.

Of course, in many cases, parameterizing the model so that the constraints are automatically respected may be a better strategy than using dconstraint. One should be cautious about constraints that would make it hard for an MCMC or optimization to move through the parameter space (such as equality constraints that involve two or more parameters). For such restrictive constraints, general purpose algorithms that are not tailored to the constraints may fail or be inefficient. If constraints are used, it will generally be wise to ensure the model is initialized with values that satisfy them.

5.2.7.3.1 Ordering

To specify an ordering of parameters, such as \(\alpha_1 <= \alpha_2 <= \alpha_3\) one can use dconstraint as follows:

constraint_data ~ dconstraint( alpha1 <= alpha2 & alpha2 <= alpha3 )

Note that unlike in BUGS, one cannot specify prior ordering using syntax such as

alpha[1] ~ dnorm(0, 1) I(, alpha[2])
alpha[2] ~ dnorm(0, 1) I(alpha[1], alpha[3])
alpha[3] ~ dnorm(0, 1) I(alpha[2], )

as this does not represent a directed acyclic graph.

Also note that specifying prior ordering using T(,) can result in possibly unexpected results. For example:

alpha1 ~ dnorm(0, 1)
alpha2 ~ dnorm(0, 1) T(alpha1, )
alpha3 ~ dnorm(0, 1) T(alpha2, )

will enforce alpha1 \(\le\) alpha2 \(\le\) alpha3, but it does not treat the three parameters symmetrically. Instead it puts a marginal prior on alpha1 that is standard normal and then constrains alpha2 and alpha3 to follow truncated normal distributions. This is not equivalent to a symmetric prior on the three alphas that assigns zero probability density when values are not in order.

NIMBLE does not support the JAGS sort syntax.

5.2.8 Model macros

A NIMBLE macro is a succinct syntax that expands to create the NIMBLE model code for part or all of a model.

The nimbleMacros package provides an initial set of macros, currently focused on linear modeling and include:

  • LM: specify a complete generalized linear model (GLM) or generalized linear mixed model (GLMM), using standard R and lme4 model formula syntax,
  • LINPRED: specify a linear predictor using a standard R formula,
  • FORLOOP: specify a for loop by providing code with index ranges.

These macros are discussed in detail in the vignette for the nimbleMacros package and in the documentation for each macro, e.g., help(LINPRED).

Here is a brief example of using the LM macro to set up a GLMM.

We’ll consider the Chick example from the lme4 package. We’ll set up a NIMBLE model analogous to this mixed effects model specified with lme4::lmer:

lmer(weight ~ Time + (1|Chick), data = ChickWeight)

with a linear effect of time and random intercepts for each chick.

Here’s the specification of the model code in nimble, using the LM macro:

library(nimbleMacros)

code <- nimbleCode({
  LM(weight[1:N] ~ Time + (1|Chick))
})

chickConst <- list(Time = ChickWeight$Time, Chick = ChickWeight$Chick,
                    N = nrow(ChickWeight))
chickData <- list(weight = ChickWeight$weight)

mod <- nimbleModel(code, constants = chickConst, data = chickData)

mod$getCode()
## {
##     "# LM"
##     "  ## FORLOOP"
##     for (i_1 in 1:N) {
##         weight[i_1] ~ dnorm(mu_[i_1], sd = sd_residual)
##     }
##     "  ## ----"
##     "  ## LINPRED"
##     "    ### nimbleMacros::FORLOOP"
##     for (i_2 in 1:N) {
##         mu_[i_2] <- beta_Intercept + beta_Time * Time[i_2] + 
##             beta_Chick[Chick[i_2]]
##     }
##     "    ### ----"
##     "    ### nimbleMacros::LINPRED_PRIORS"
##     beta_Intercept ~ dnorm(0, sd = 1000)
##     beta_Time ~ dnorm(0, sd = 1000)
##     sd_Chick ~ dunif(0, 100)
##     "      #### nimbleMacros::FORLOOP"
##     for (i_3 in 1:50) {
##         beta_Chick[i_3] ~ dnorm(0, sd = sd_Chick)
##     }
##     "      #### ----"
##     "    ### ----"
##     "  ## ----"
##     sd_residual ~ dunif(0, 100)
##     "# ----"
## }

We can see list of the names of the model variables created by macros in the model using the getMacroParameters() function:

mod$getMacroParameters()
## $LM
## $LM[[1]]
## [1] "mu_"         "sd_residual"
## 
## 
## $FORLOOP
## $FORLOOP[[1]]
## NULL
## 
## 
## $LINPRED
## $LINPRED[[1]]
## [1] "beta_Intercept" "beta_Time"      "beta_Chick"    
## 
## 
## $`nimbleMacros::FORLOOP`
## $`nimbleMacros::FORLOOP`[[1]]
## NULL
## 
## $`nimbleMacros::FORLOOP`[[2]]
## NULL
## 
## 
## $`nimbleMacros::LINPRED_PRIORS`
## $`nimbleMacros::LINPRED_PRIORS`[[1]]
## [1] "beta_Intercept" "beta_Time"      "sd_Chick"       "beta_Chick"

That specified the entire model with a macro. One can also use macros to create the code for components of a model. Here’s a brief example.

Here’s a complete linear regression model (in this example omitting the random effects) of chick weight as a function of time making use of LINPRED and FORLOOP (similar to the code generated by LM):

code <- nimbleCode({
  mu[1:N] <- LINPRED(~Time[1:N])

  weight[1:N] ~ FORLOOP(dnorm(mu[1:N], sd = sigma))
  sigma ~ dunif(0, 100)
})
mod <- nimbleModel(code, constants = chickConst, data = chickData)
mod$getCode()
## {
##     "# LINPRED"
##     "  ## nimbleMacros::FORLOOP"
##     for (i_1 in 1:N) {
##         mu[i_1] <- beta_Intercept + beta_Time * Time[i_1]
##     }
##     "  ## ----"
##     "  ## nimbleMacros::LINPRED_PRIORS"
##     beta_Intercept ~ dnorm(0, sd = 1000)
##     beta_Time ~ dnorm(0, sd = 1000)
##     "  ## ----"
##     "# ----"
##     "# FORLOOP"
##     for (i_2 in 1:N) {
##         weight[i_2] ~ dnorm(mu[i_2], sd = sigma)
##     }
##     "# ----"
##     sigma ~ dunif(0, 100)
## }

As discussed in Section 12.4, NIMBLE users can also create new macros to use in models.


  1. NIMBLE calls non-stochastic nodes ‘deterministic’, whereas BUGS calls them ‘logical’. NIMBLE uses ‘logical’ in the way R does, to refer to boolean (TRUE/FALSE) variables.↩︎

  2. But beware of the possibility of needing to set values for ‘lifted’ nodes created by NIMBLE.↩︎

  3. In some cases, one can write a nimbleFunction to achieve the desired result, such as replacing sum(mu[start[i]:end[i]]) with a nimbleFunction that takes mu, start[i], and end[i] as arguments and does the needed summation.↩︎

  4. Technically, a directed acyclic graph↩︎

  5. In nimbleFunctions, as explained in later chapters, square brackets with blank indices are not necessary for multivariate objects.↩︎

  6. Note that the same distributions are available for writing nimbleFunctions, but in that case the default parameterizations and function names match R’s when possible. Please see Section 11.2.4 for how to use distributions in nimbleFunctions.↩︎

  7. Usually this is the parameterization in the Rmath header of R’s C implementation of distributions.↩︎

  8. NIMBLE uses the CDF and inverse CDF (quantile) functions of a distribution to do this; in some cases if one uses truncation to include only the extreme tail of a distribution, numerical difficulties can arise.↩︎