# Chapter 12 Creating user-defined BUGS distributions and functions

NIMBLE allows you to define your own functions and distributions as nimbleFunctions for use in BUGS code. As a result, NIMBLE frees you from being constrained to the functions and distributions discussed in Chapter 5. For example, instead of setting up a Dirichlet prior with multinomial data and needing to use MCMC, one could recognize that this results in a Dirichlet-multinomial distribution for the data and provide that as a user-defined distribution instead.

Since NIMBLE allows you to wrap calls to external compiled code or arbitrary R functions as nimbleFunctions, and since you can define model functions and distributions as nimbleFunctions, you can combine these features to build external compiled code or arbitrary R functions into a model. See Sections 11.6-11.7.

Note that NIMBLE generally expects user-defined distributions or functions to be defined in the global environment. If you define them in a function (which would generally be the case if you are using them in the context of parallelization), one approach would be to assign them to the global environment in your function:

## for a user-defined function
assign('myfun', myfun, envir = .GlobalEnv)
## for a user-defined distribution
assign('dfoo', dfoo, envir = .GlobalEnv)
assign('rfoo', rfoo, envir = .GlobalEnv)
## similarly for 'p' and 'q' functions if you define them

## 12.1 User-defined functions

To provide a new function for use in BUGS code, simply create a nimbleFunction that has no setup code as discussed in Chapter 11. Then use it in your BUGS code. That’s it.

Writing nimbleFunctions requires that you declare the dimensionality of arguments and the returned object (Section 11.4). Make sure that the dimensionality specified in your nimbleFunction matches how you use it in BUGS code. For example, if you define scalar parameters in your BUGS code you will want to define nimbleFunctions that take scalar arguments. Here is an example that returns twice its input argument:

timesTwo <- nimbleFunction(
run = function(x = double(0)) {
returnType(double(0))
return(2*x)
})

code <- nimbleCode({
for(i in 1:3) {
mu[i] ~ dnorm(0, 1)
mu_times_two[i] <- timesTwo(mu[i])
}
})

The x = double(0) argument and returnType(double(0)) establish that the input and output will both be zero-dimensional (scalar) numbers.

You can define nimbleFunctions that take inputs and outputs with more dimensions. Here is an example that takes a vector (1-dimensional) as input and returns a vector with twice the input values:

vectorTimesTwo  <- nimbleFunction(
run = function(x = double(1)) {
returnType(double(1))
return(2*x)
}
)
code <- nimbleCode({
for(i in 1:3) {
mu[i] ~ dnorm(0, 1)
}
mu_times_two[1:3] <- vectorTimesTwo(mu[1:3])
})

There is a subtle difference between the mu_times_two variables in the two examples. In the first example, there are individual nodes for each mu_times_two[i]. In the second example, there is a single multivariate node, mu_times_two[1:3]. Each implementation could be more efficient for different needs. For example, suppose an algorithm modifies the value of mu[2] and then updates nodes that depend on it. In the first example, mu_times_two[2] would be updated. In the second example mu_times_two[1:3] would be updated because it is a single, vector node.

At present in compiled use of a model, you cannot provide a scalar argument where the user-defined nimbleFunction expects a vector; unlike in R, scalars are not simply vectors of length 1.

## 12.2 User-defined distributions

To provide a user-defined distribution, you need to define density (‘d’) and (optionally) simulation (‘r’) nimbleFunctions, without setup code, for your distribution. In many cases you can then simply use your distribution in BUGS code as you would any distribution already provided by NIMBLE, while in a few special cases23 you need to explicitly register your distribution as described in Section 12.2.1.

You need to provide the simulation (‘r’) function if any algorithm used with a model that uses the distribution needs to simulate from the distribution. This is not the case for NIMBLE’s built-in MCMC sampler functions, and therefore the simulation function is not generally required for standard MCMC in NIMBLE. However, the ‘r’ function is necessary for initialization of nodes that are assigned the user-defined distribution if no initial value is provided, and for sampling posterior predictive nodes (those nodes with no downstream data dependencies) that are assigned the user-defined distribution.

You can optionally provide distribution (‘p’) and quantile (‘q’) functions, which will allow truncation to be applied to a user-defined distribution. You can also provide a list of alternative parameterizations, but only if you explicitly register the distribution.

Here is an extended example of providing a univariate exponential distribution (solely for illustration as this is already provided by NIMBLE) and a multivariate Dirichlet-multinomial distribution.

dmyexp <- nimbleFunction(
run = function(x = double(0), rate = double(0, default = 1),
log = integer(0, default = 0)) {
returnType(double(0))
logProb <- log(rate) - x*rate
if(log) return(logProb)
else return(exp(logProb))
})

rmyexp <- nimbleFunction(
run = function(n = integer(0), rate = double(0, default = 1)) {
returnType(double(0))
if(n != 1) print("rmyexp only allows n = 1; using n = 1.")
dev <- runif(1, 0, 1)
return(-log(1-dev) / rate)
})

pmyexp <- nimbleFunction(
run = function(q = double(0), rate = double(0, default = 1),
lower.tail = integer(0, default = 1),
log.p = integer(0, default = 0)) {
returnType(double(0))
if(!lower.tail) {
logp <- -rate * q
if(log.p) return(logp)
else return(exp(logp))
} else {
p <- 1 - exp(-rate * q)
if(!log.p) return(p)
else return(log(p))
}
})

qmyexp <- nimbleFunction(
run = function(p = double(0), rate = double(0, default = 1),
lower.tail = integer(0, default = 1),
log.p = integer(0, default = 0)) {
returnType(double(0))
if(log.p) p <- exp(p)
if(!lower.tail) p <- 1 - p
return(-log(1 - p) / rate)
})

ddirchmulti <- nimbleFunction(
run = function(x = double(1), alpha = double(1), size = double(0),
log = integer(0, default = 0)) {
returnType(double(0))
logProb <- lgamma(size+1) - sum(lgamma(x+1)) + lgamma(sum(alpha)) -
sum(lgamma(alpha)) + sum(lgamma(alpha + x)) -
lgamma(sum(alpha) + size)
if(log) return(logProb)
else return(exp(logProb))
})

rdirchmulti <- nimbleFunction(
run = function(n = integer(0), alpha = double(1), size = double(0)) {
returnType(double(1))
if(n != 1) print("rdirchmulti only allows n = 1; using n = 1.")
p <- rdirch(1, alpha)
return(rmulti(1, size = size, prob = p))
})

code <- nimbleCode({
y[1:K] ~ ddirchmulti(alpha[1:K], n)
for(i in 1:K) {
alpha[i] ~ dmyexp(1/3)
}
})

model <- nimbleModel(code, constants = list(K = 5, n = 10))

The distribution-related functions should take as input the parameters for a single parameterization, which will be the canonical parameterization that NIMBLE will use.

Here are more details on the requirements for distribution-related nimbleFunctions, which follow R’s conventions:

• Your distribution-related functions must have names that begin with ‘d’, ‘r’, ‘p’ and ‘q’. The name of the distribution must not be identical to any of the NIMBLE-provided distributions.
• All simulation (‘r’) functions must take n as their first argument. Note that you may simply have your function only handle n=1 and return an warning for other values of n.
• NIMBLE uses doubles for numerical calculations, so we suggest simply using doubles in general, even for integer-valued parameters or values of random variables. In fact, non-scalars must be declared as doubles.
• All density functions must have as their last argument log and implement return of the log probability density. NIMBLE algorithms typically use only log = 1, but we recommend you implement the log = 0 case for completeness.
• All distribution and quantile functions must have their last two arguments be (in order) lower.tail and log.p. These functions must work for lower.tail = 1 (i.e., TRUE) and log.p = 0 (i.e., FALSE), as these are the inputs we use when working with truncated distributions. It is your choice whether you implement the necessary calculations for other combinations of these inputs, but again we recommend doing so for completeness.
• Define the nimbleFunctions in R’s global environment. Don’t expect R’s standard scoping to work24.

### 12.2.1 Using registerDistributions for alternative parameterizations and providing other information

Behind the scenes, NIMBLE uses the function registerDistributions to set up new distributions for use in BUGS code. In some circumstances, you will need to call registerDistributions directly to provide information that NIMBLE can’t obtain automatically from the nimbleFunctions you write.

The cases in which you’ll need to explicitly call registerDistributions are when you want to do any of the following:

• provide alternative parameterizations,
• indicate a distribution is discrete, and
• provide the range of possible values for a distribution.

If you would like to allow for multiple parameterizations, you can do this via the Rdist element of the list provided to registerDistributions as illustrated below. If you provide CDF (‘p’) and inverse CDF (quantile, i.e. ‘q’) functions, be sure to specify pqAvail = TRUE when you call registerDistributions. Here’s an example of using registerDistributions to provide an alternative parameterization (scale instead of rate) and to provide the range for the user-defined exponential distribution. We can then use the alternative parameterization in our BUGS code.

registerDistributions(list(
dmyexp = list(
BUGSdist = "dmyexp(rate, scale)",
Rdist = "dmyexp(rate = 1/scale)",
altParams = c("scale = 1/rate", "mean = 1/rate"),
pqAvail = TRUE,
range = c(0, Inf)
)
))

code <- nimbleCode({
y[1:K] ~ ddirchmulti(alpha[1:K], n)
for(i in 1:K) {
alpha[i] ~ T(dmyexp(scale = 3), 0, 100)
}
})

model <- nimbleModel(code, constants = list(K = 5, n = 10),
inits = list(alpha = rep(1, 5)))

There are a few rules for how you specify the information about a distribution that you provide to registerDistributions:

• The function name in the BUGSdist entry in the list provided to registerDistributions will be the name you can use in BUGS code.
• The names of your nimbleFunctions must match the function name in the Rdist entry. If missing, the Rdist entry defaults to be the same as the BUGSdist entry.
• Your distribution-related functions must take as arguments the parameters in default order, starting as the second argument and in the order used in the parameterizations in the Rdist argument to registerDistributions or the BUGSdist argument if there are no alternative parameterizations.
• You must specify a types entry in the list provided to registerDistributions if the distribution is multivariate or if any parameter is non-scalar.

Further details on using registerDistributions can be found via R help on registerDistributions. NIMBLE uses the same list format as registerDistributions to define its distributions. This list can be found in the R/distributions_inputList.R file in the package source code directory or as the R list nimble:::distributionsInputList.

1. These include providing alternative parameterizations, specifying the range of the distribution, or specifying that the distribution is a discrete distribution.

2. NIMBLE can’t use R’s standard scoping because it doesn’t work for R reference classes, and nimbleFunctions are implemented as custom-generated reference classes.