# Chapter 12 Creating user-defined BUGS distributions and functions

NIMBLE allows you to define your own functions and distributions as *nimbleFunction*s for use in BUGS code. As a result, NIMBLE frees you from being constrained to the functions and distributions discussed in Chapter 5. For example, instead of setting up a Dirichlet prior with multinomial data and needing to use MCMC, one could recognize that this results in a Dirichlet-multinomial distribution for the data and provide that as a user-defined distribution instead.

Since NIMBLE allows you to wrap calls to external compiled code or arbitrary R functions as nimbleFunctions, and since you can define model functions and distributions as nimbleFunctions, you can combine these features to build external compiled code or arbitrary R functions into a model. See Sections 11.6-11.7.

## 12.1 User-defined functions

To provide a new function for use in BUGS code, simply create a nimbleFunction that has no *setup* code as discussed in Chapter 11. Then use it in your BUGS code. That’s it.

Writing nimbleFunctions requires that you declare the dimensionality of arguments and the returned object (Section 11.4). Make sure that the dimensionality specified in your nimbleFunction matches how you use it in BUGS code. For example, if you define scalar parameters in your BUGS code you will want to define nimbleFunctions that take scalar arguments. Here is an example that returns twice its input argument:

```
timesTwo <- nimbleFunction(
run = function(x = double(0)) {
returnType(double(0))
return(2*x)
})
code <- nimbleCode({
for(i in 1:3) {
mu[i] ~ dnorm(0, 1)
mu_times_two[i] <- timesTwo(mu[i])
}
})
```

The `x = double(0)`

argument and `returnType(double(0))`

establish that the input and output will both be zero-dimensional (scalar) numbers.

You can define nimbleFunctions that take inputs and outputs with more dimensions. Here is an example that takes a vector (1-dimensional) as input and returns a vector with twice the input values:

```
vectorTimesTwo <- nimbleFunction(
run = function(x = double(1)) {
returnType(double(1))
return(2*x)
}
)
code <- nimbleCode({
for(i in 1:3) {
mu[i] ~ dnorm(0, 1)
}
mu_times_two[1:3] <- vectorTimesTwo(mu[1:3])
})
```

There is a subtle difference between the `mu_times_two`

variables in the two examples. In the first example, there are individual nodes for each `mu_times_two[i]`

. In the second example, there is a single multivariate node, `mu_times_two[1:3]`

. Each implementation could be more efficient for different needs. For example, suppose an algorithm modifies the value of `mu[2]`

and then updates nodes that depend on it. In the first example, `mu_times_two[2]`

would be updated. In the second example `mu_times_two[1:3]`

would be updated because it is a single, vector node.

At present in compiled use of a model, you cannot provide a scalar argument where the user-defined nimbleFunction expects a vector; unlike in R, scalars are not simply vectors of length 1.

## 12.2 User-defined distributions

To provide a user-defined distribution, you need to define density (‘d’) and simulation (‘r’) `nimbleFunctions`

, without setup code, for your distribution. In some cases you can then simply use your distribution in BUGS code as you would any distribution already provided by NIMBLE, while in others you need to explicitly register your distribution as described in Section 12.2.1.

You can optionally provide distribution (‘p’) and quantile (‘q’) functions, which will allow truncation to be applied to a user-defined distribution. You can also provide a list of alternative parameterizations, but only if you explicitly register the distribution.

Here is an extended example of providing a univariate exponential distribution (solely for illustration as this is already provided by NIMBLE) and a multivariate Dirichlet-multinomial distribution.

```
dmyexp <- nimbleFunction(
run = function(x = double(0), rate = double(0, default = 1),
log = integer(0, default = 0)) {
returnType(double(0))
logProb <- log(rate) - x*rate
if(log) return(logProb)
else return(exp(logProb))
})
rmyexp <- nimbleFunction(
run = function(n = integer(0), rate = double(0, default = 1)) {
returnType(double(0))
if(n != 1) print("rmyexp only allows n = 1; using n = 1.")
dev <- runif(1, 0, 1)
return(-log(1-dev) / rate)
})
pmyexp <- nimbleFunction(
run = function(q = double(0), rate = double(0, default = 1),
lower.tail = integer(0, default = 1),
log.p = integer(0, default = 0)) {
returnType(double(0))
if(!lower.tail) {
logp <- -rate * q
if(log.p) return(logp)
else return(exp(logp))
} else {
p <- 1 - exp(-rate * q)
if(!log.p) return(p)
else return(log(p))
}
})
qmyexp <- nimbleFunction(
run = function(p = double(0), rate = double(0, default = 1),
lower.tail = integer(0, default = 1),
log.p = integer(0, default = 0)) {
returnType(double(0))
if(log.p) p <- exp(p)
if(!lower.tail) p <- 1 - p
return(-log(1 - p) / rate)
})
ddirchmulti <- nimbleFunction(
run = function(x = double(1), alpha = double(1), size = double(0),
log = integer(0, default = 0)) {
returnType(double(0))
logProb <- lgamma(size) - sum(lgamma(x)) + lgamma(sum(alpha)) -
sum(lgamma(alpha)) + sum(lgamma(alpha + x)) - lgamma(sum(alpha) +
size)
if(log) return(logProb)
else return(exp(logProb))
})
rdirchmulti <- nimbleFunction(
run = function(n = integer(0), alpha = double(1), size = double(0)) {
returnType(double(1))
if(n != 1) print("rdirchmulti only allows n = 1; using n = 1.")
p <- rdirch(1, alpha)
return(rmulti(1, size = size, prob = p))
})
code <- nimbleCode({
y[1:K] ~ ddirchmulti(alpha[1:K], n)
for(i in 1:K) {
alpha[i] ~ dmyexp(1/3)
}
})
model <- nimbleModel(code, constants = list(K = 5, n = 10))
```

```
## Registering the following user-provided distributions: ddirchmulti .
## NIMBLE has registered ddirchmulti as a distribution based on its use in BUGS code. Note that if you make changes to the nimbleFunctions for the distribution, you must call 'deregisterDistributions' before using the distribution in BUGS code for those changes to take effect.
## Registering the following user-provided distributions: dmyexp .
## NIMBLE has registered dmyexp as a distribution based on its use in BUGS code. Note that if you make changes to the nimbleFunctions for the distribution, you must call 'deregisterDistributions' before using the distribution in BUGS code for those changes to take effect.
```

The distribution-related functions should take as input the parameters for a single parameterization, which will be the canonical parameterization that NIMBLE will use.

Here are more details on the requirements for distribution-related nimbleFunctions, which follow R’s conventions:

- Your distribution-related functions must have names that begin with ‘d’, ‘r’, ‘p’ and ‘q’. The name of the distribution must not be identical to any of the NIMBLE-provided distributions.
- All simulation (‘r’) functions must take
`n`

as their first argument. Note that you may simply have your function only handle`n=1`

and return an warning for other values of`n`

. - NIMBLE uses doubles for numerical calculations, so we suggest simply using doubles in general, even for integer-valued parameters or values of random variables.
- All density functions must have as their last argument
`log`

and implement return of the log probability density. NIMBLE algorithms typically use only`log = 1`

, but we recommend you implement the`log = 0`

case for completeness. - All distribution and quantile functions must have their last two arguments be (in order)
`lower.tail`

and`log.p`

. These functions must work for`lower.tail = 1`

(i.e., TRUE) and`log.p = 0`

(i.e., FALSE), as these are the inputs we use when working with truncated distributions. It is your choice whether you implement the necessary calculations for other combinations of these inputs, but again we recommend doing so for completeness. - Define the nimbleFunctions in R’s global environment. Don’t expect R’s standard scoping to work
^{22}.

### 12.2.1 Using *registerDistributions* for alternative parameterizations and providing other information

Behind the scenes, NIMBLE uses the function `registerDistributions`

to set up new distributions for use in BUGS code. In some circumstances, you will need to call `registerDistributions`

directly to provide information that NIMBLE can’t obtain automatically from the nimbleFunctions you write.

The cases in which you’ll need to explicitly call `registerDistributions`

are when you want to do any of the following:

- provide alternative parameterizations,
- indicate a distribution is discrete, and
- provide the range of possible values for a distribution.

If you would like to allow for multiple parameterizations, you can do this via the `Rdist`

element of the list provided to `registerDistributions`

as illustrated below. If you provide CDF (‘p’) and inverse CDF (quantile, i.e. ‘q’) functions, be sure to specify `pqAvail = TRUE`

when you call `registerDistributions`

. Here’s an example of using `registerDistributions`

to provide an alternative parameterization (scale instead of rate) and to provide the range for the user-defined exponential distribution. We can then use the alternative parameterization in our BUGS code.

```
registerDistributions(list(
dmyexp = list(
BUGSdist = "dmyexp(rate, scale)",
Rdist = "dmyexp(rate = 1/scale)",
altParams = c("scale = 1/rate", "mean = 1/rate"),
pqAvail = TRUE,
range = c(0, Inf)
)
))
```

```
## Registering the following user-provided distributions: dmyexp .
## Overwriting the following user-supplied distributions: dmyexp .
```

```
code <- nimbleCode({
y[1:K] ~ ddirchmulti(alpha[1:K], n)
for(i in 1:K) {
alpha[i] ~ T(dmyexp(scale = 3), 0, 100)
}
})
model <- nimbleModel(code, constants = list(K = 5, n = 10),
inits = list(alpha = rep(1, 5)))
```

There are a few rules for how you specify the information about a distribution that you provide to `registerDistributions`

:

- The function name in the
`BUGSdist`

entry in the list provided to`registerDistributions`

will be the name you can use in BUGS code. - The names of your nimbleFunctions must match the function name in the
`Rdist`

entry. If missing, the`Rdist`

entry defaults to be the same as the`BUGSdist`

entry. - Your distribution-related functions must take as arguments the parameters in default order, starting as the second argument and in the order used in the parameterizations in the
`Rdist`

argument to`registerDistributions`

or the`BUGSdist`

argument if there are no alternative parameterizations. - You must specify a
`types`

entry in the list provided to`registerDistributions`

if the distribution is multivariate or if any parameter is non-scalar.

Further details on using `registerDistributions`

can be found via R help on `registerDistributions`

. NIMBLE uses the same list format as `registerDistributions`

to define its distributions. This list can be found in the `R/distributions_inputList.R`

file in the package source code directory or as the R list `nimble:::distributionsInputList`

.

NIMBLE can’t use R’s standard scoping because it doesn’t work for R reference classes, and nimbleFunctions are implemented as custom-generated reference classes.↩