Chapter 12 Creating user-defined BUGS distributions and functions
NIMBLE allows you to define your own functions and distributions as nimbleFunctions for use in BUGS code. As a result, NIMBLE frees you from being constrained to the functions and distributions discussed in Chapter 5. For example, instead of setting up a Dirichlet prior with multinomial data and needing to use MCMC, one could recognize that this results in a Dirichlet-multinomial distribution for the data and provide that as a user-defined distribution instead.
Since NIMBLE allows you to wrap calls to external compiled code or arbitrary R functions as nimbleFunctions, and since you can define model functions and distributions as nimbleFunctions, you can combine these features to build external compiled code or arbitrary R functions into a model. See Sections 11.6-11.7.
Note that NIMBLE generally expects user-defined distributions or functions to be defined in the global environment. If you define them in a function (which would generally be the case if you are using them in the context of parallelization), one approach would be to assign them to the global environment in your function:
## for a user-defined function
assign('myfun', myfun, envir = .GlobalEnv)
## for a user-defined distribution
assign('dfoo', dfoo, envir = .GlobalEnv)
assign('rfoo', rfoo, envir = .GlobalEnv)
## similarly for 'p' and 'q' functions if you define them
12.1 User-defined functions
To provide a new function for use in BUGS code, simply create a nimbleFunction that has no setup code as discussed in Chapter 11. Then use it in your BUGS code. That’s it.
Writing nimbleFunctions requires that you declare the dimensionality of arguments and the returned object (Section 11.4). Make sure that the dimensionality specified in your nimbleFunction matches how you use it in BUGS code. For example, if you define scalar parameters in your BUGS code you will want to define nimbleFunctions that take scalar arguments. Here is an example that returns twice its input argument:
<- nimbleFunction(
timesTwo run = function(x = double(0)) {
returnType(double(0))
return(2*x)
})
<- nimbleCode({
code for(i in 1:3) {
~ dnorm(0, 1)
mu[i] <- timesTwo(mu[i])
mu_times_two[i]
} })
The x = double(0)
argument and returnType(double(0))
establish that the input and output will both be zero-dimensional
(scalar) numbers.
You can define nimbleFunctions that take inputs and outputs with more dimensions. Here is an example that takes a vector (1-dimensional) as input and returns a vector with twice the input values:
<- nimbleFunction(
vectorTimesTwo run = function(x = double(1)) {
returnType(double(1))
return(2*x)
}
)<- nimbleCode({
code for(i in 1:3) {
~ dnorm(0, 1)
mu[i]
}1:3] <- vectorTimesTwo(mu[1:3])
mu_times_two[ })
There is a subtle difference between the mu_times_two
variables in the two examples. In the first example, there are individual nodes for each mu_times_two[i]
. In the second example, there is a single multivariate node, mu_times_two[1:3]
. Each implementation could be more efficient for different needs. For example, suppose an algorithm modifies the value of mu[2]
and then updates nodes that depend on it. In the first example, mu_times_two[2]
would be updated. In the second example mu_times_two[1:3]
would be updated because it is a single, vector node.
At present in compiled use of a model, you cannot provide a scalar argument where the user-defined nimbleFunction expects a vector; unlike in R, scalars are not simply vectors of length 1.
12.2 User-defined distributions
To provide a user-defined distribution, you need to define density
(‘d’) and (optionally)
simulation (‘r’) nimbleFunctions
, without setup
code, for your distribution. In many cases you can then
simply use your distribution in BUGS code as you would any
distribution already provided by NIMBLE, while in a few special cases24
you need to explicitly register your distribution as described in Section 12.2.1.
You need to provide the simulation (‘r’) function if any algorithm used with a model that uses the distribution needs to simulate from the distribution. This is not the case for NIMBLE’s built-in MCMC sampler functions, and therefore the simulation function is not generally required for standard MCMC in NIMBLE. However, the ‘r’ function is necessary for initialization of nodes that are assigned the user-defined distribution if no initial value is provided, and for sampling posterior predictive nodes (those nodes with no downstream data dependencies) that are assigned the user-defined distribution.
You can optionally provide distribution (‘p’) and quantile (‘q’) functions, which will allow truncation to be applied to a user-defined distribution. You can also provide a list of alternative parameterizations, but only if you explicitly register the distribution.
Here is an extended example of providing a univariate exponential distribution (solely for illustration as this is already provided by NIMBLE) and a multivariate Dirichlet-multinomial distribution.
<- nimbleFunction(
dmyexp run = function(x = double(0), rate = double(0, default = 1),
log = integer(0, default = 0)) {
returnType(double(0))
<- log(rate) - x*rate
logProb if(log) return(logProb)
else return(exp(logProb))
})
<- nimbleFunction(
rmyexp run = function(n = integer(0), rate = double(0, default = 1)) {
returnType(double(0))
if(n != 1) print("rmyexp only allows n = 1; using n = 1.")
<- runif(1, 0, 1)
dev return(-log(1-dev) / rate)
})
<- nimbleFunction(
pmyexp run = function(q = double(0), rate = double(0, default = 1),
lower.tail = integer(0, default = 1),
log.p = integer(0, default = 0)) {
returnType(double(0))
if(!lower.tail) {
<- -rate * q
logp if(log.p) return(logp)
else return(exp(logp))
else {
} <- 1 - exp(-rate * q)
p if(!log.p) return(p)
else return(log(p))
}
})
<- nimbleFunction(
qmyexp run = function(p = double(0), rate = double(0, default = 1),
lower.tail = integer(0, default = 1),
log.p = integer(0, default = 0)) {
returnType(double(0))
if(log.p) p <- exp(p)
if(!lower.tail) p <- 1 - p
return(-log(1 - p) / rate)
})
<- nimbleFunction(
ddirchmulti run = function(x = double(1), alpha = double(1), size = double(0),
log = integer(0, default = 0)) {
returnType(double(0))
<- lgamma(size+1) - sum(lgamma(x+1)) + lgamma(sum(alpha)) -
logProb sum(lgamma(alpha)) + sum(lgamma(alpha + x)) -
lgamma(sum(alpha) + size)
if(log) return(logProb)
else return(exp(logProb))
})
<- nimbleFunction(
rdirchmulti run = function(n = integer(0), alpha = double(1), size = double(0)) {
returnType(double(1))
if(n != 1) print("rdirchmulti only allows n = 1; using n = 1.")
<- rdirch(1, alpha)
p return(rmulti(1, size = size, prob = p))
})
<- nimbleCode({
code 1:K] ~ ddirchmulti(alpha[1:K], n)
y[for(i in 1:K) {
~ dmyexp(1/3)
alpha[i]
}
})
<- nimbleModel(code, constants = list(K = 5, n = 10)) model
The distribution-related functions should take as input the parameters for a single parameterization, which will be the canonical parameterization that NIMBLE will use.
Here are more details on the requirements for distribution-related nimbleFunctions, which follow R’s conventions:
- Your distribution-related functions must have names that begin with ‘d’, ‘r’, ‘p’ and ‘q’. The name of the distribution must not be identical to any of the NIMBLE-provided distributions.
- All simulation (‘r’) functions must take
n
as their first argument. Note that you may simply have your function only handlen=1
and return an warning for other values ofn
. - NIMBLE uses doubles for numerical calculations, so we suggest simply using doubles in general, even for integer-valued parameters or values of random variables. In fact, non-scalars must be declared as doubles.
- All density functions must have as their last argument
log
and implement return of the log probability density. NIMBLE algorithms typically use onlylog = 1
, but we recommend you implement thelog = 0
case for completeness. - All distribution and quantile functions must have their last two arguments be (in order)
lower.tail
andlog.p
. These functions must work forlower.tail = 1
(i.e., TRUE) andlog.p = 0
(i.e., FALSE), as these are the inputs we use when working with truncated distributions. It is your choice whether you implement the necessary calculations for other combinations of these inputs, but again we recommend doing so for completeness. - Define the nimbleFunctions in R’s global environment. Don’t expect R’s standard scoping to work25.
12.2.1 Using registerDistributions for alternative parameterizations and providing other information
Behind the scenes, NIMBLE uses the function
registerDistributions
to set up new distributions for use in BUGS
code. In some circumstances, you will need to call
registerDistributions
directly to provide information that NIMBLE
can’t obtain automatically from the nimbleFunctions you write.
The cases in which you’ll need to explicitly call
registerDistributions
are when you want to do any of the following:
- provide alternative parameterizations,
- indicate a distribution is discrete, and
- provide the range of possible values for a distribution.
If you would like to allow for
multiple parameterizations, you can do this via the Rdist
element
of the list provided to registerDistributions
as illustrated below. If
you provide CDF (‘p’) and inverse CDF (quantile, i.e. ‘q’)
functions, be sure to specify pqAvail = TRUE
when you call
registerDistributions
. Here’s an example of using registerDistributions
to provide an alternative parameterization (scale instead of rate) and to provide the range
for the user-defined exponential distribution. We can then use
the alternative parameterization in our BUGS code.
registerDistributions(list(
dmyexp = list(
BUGSdist = "dmyexp(rate, scale)",
Rdist = "dmyexp(rate = 1/scale)",
altParams = c("scale = 1/rate", "mean = 1/rate"),
pqAvail = TRUE,
range = c(0, Inf)
)
))
<- nimbleCode({
code 1:K] ~ ddirchmulti(alpha[1:K], n)
y[for(i in 1:K) {
~ T(dmyexp(scale = 3), 0, 100)
alpha[i]
}
})
<- nimbleModel(code, constants = list(K = 5, n = 10),
model inits = list(alpha = rep(1, 5)))
There are a few rules for how you specify the information about a
distribution that you provide to registerDistributions
:
- The function name in the
BUGSdist
entry in the list provided toregisterDistributions
will be the name you can use in BUGS code. - The names of your nimbleFunctions must match the function name in the
Rdist
entry. If missing, theRdist
entry defaults to be the same as theBUGSdist
entry. - Your distribution-related functions must take as arguments the
parameters in default order, starting as the second argument and in the order used in the parameterizations in the
Rdist
argument toregisterDistributions
or theBUGSdist
argument if there are no alternative parameterizations. - You must specify a
types
entry in the list provided toregisterDistributions
if the distribution is multivariate or if any parameter is non-scalar.
Further details on using registerDistributions
can be found via R help on registerDistributions
. NIMBLE uses the same list format as registerDistributions
to define its distributions. This list can be found in the R/distributions_inputList.R
file in the package source code directory or as the R list nimble:::distributionsInputList
.
These include providing alternative parameterizations, specifying the range of the distribution, or specifying that the distribution is a discrete distribution.↩︎
NIMBLE can’t use R’s standard scoping because it doesn’t work for R reference classes, and nimbleFunctions are implemented as custom-generated reference classes.↩︎