Four ways to set up linear predictors in a regression model

In NIMBLE one can specify linear predictors in a regression-type model in a variety of ways. We’ll illustrate with linear regression, but the ideas hold for other models with linear predictors.

Here’s a small simulated dataset we can use to illustrate coding of the model. (This is also used in our example of variable selection using reversible jump MCMC.)

set.seed(1)
p <- 15    # number of explanatory variables
n <- 100   # number of observations
X <- matrix(rnorm(p*n), nrow = n, ncol = p) # explanatory variables
true_betas <- c(c(0.1, 0.2, 0.3, 0.4, 0.5), rep(0, p-5)) # coefficients
sigma <- 1
y <- rnorm(n, X %*% true_betas, sigma)

Manual specification

With a small number of predictors (also know as covariates, independent variables, or explanatory variables), one can simply ‘manually’ add each covariate. Here we’ll just use the first two predictors.

library(nimble, warn.conflicts = FALSE)

## nimble version 1.3.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit https://R-nimble.org.
## 
## Note for advanced users who have written their own MCMC samplers:
##   As of version 0.13.0, NIMBLE's protocol for handling posterior
##   predictive nodes has changed in a way that could affect user-defined
##   samplers in some situations. Please see Section 15.5.1 of the User Manual.

code <- nimbleCode({
  beta0 ~ dnorm(0, sd = 100)
  beta1 ~ dnorm(0, sd = 100)
  beta2 ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 100)        # prior for variance components based on Gelman (2006)
  for(i in 1:n) {
    y[i] ~ dnorm(beta0 + beta1*x1[i] + beta2*x2[i], sd = sigma) # manual entry of linear predictors
  }
})

## extract data for two predictors and center for better MCMC performance
x1 <- X[,1] - mean(X[,1])
x2 <- X[,2] - mean(X[,2])

constants <- list(n = n, x1 = x1, x2 = x2)
data <- list(y = y)
inits <- list(beta0 = mean(y), beta1 = 0, beta2 = 0, sigma = 1)
model <- nimbleModel(code, constants = constants, data = data, inits = inits) # build model

## Defining model

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

mcmcConf <- configureMCMC(model) # assign default samplers to nodes

## ===== Monitors =====
## thin = 1: beta0, beta1, beta2, sigma
## ===== Samplers =====
## RW sampler (1)
##   - sigma
## conjugate sampler (3)
##   - beta0
##   - beta1
##   - beta2

mcmcConf$printSamplers() # look at default sampler assignments

## [1] conjugate_dnorm_dnorm_additive sampler: beta0
## [2] conjugate_dnorm_dnorm_linear sampler: beta1
## [3] conjugate_dnorm_dnorm_linear sampler: beta2
## [4] RW sampler: sigma

Here by default NIMBLE has assigned univariate scalar samplers to each beta coefficient.

Using `inprod`

Alternatively, we can use a vectorized representation with inprod (inner product).

code <- nimbleCode({
  beta0 ~ dnorm(0, sd = 100)
  for(k in 1:p)
    beta[k] ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 100)  # prior for variance components based on Gelman (2006)
  for(i in 1:n) {
    y[i] ~ dnorm(beta0 + inprod(beta[1:p], x[i, 1:p]), sd = sigma)
  }
})

X <- sweep(X, 2, colMeans(X))  # center for better MCMC performance

constants <- list(n = n, p = p, x = X)
data <- list(y = y)
inits <- list(beta0 = mean(y), beta = rep(0, p), sigma = 0.5)
model <- nimbleModel(code, constants = constants, data = data, inits = inits)

## Defining model

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

mcmcConf <- configureMCMC(model)

## ===== Monitors =====
## thin = 1: beta, beta0, sigma
## ===== Samplers =====
## RW sampler (1)
##   - sigma
## conjugate sampler (16)
##   - beta0
##   - beta[]  (15 elements)

mcmcConf$printSamplers() ## Look at sampler assignments in this case.

## [1]  conjugate_dnorm_dnorm_additive sampler: beta0
## [2]  conjugate_dnorm_dnorm_linear sampler: beta[1]
## [3]  conjugate_dnorm_dnorm_linear sampler: beta[2]
## [4]  conjugate_dnorm_dnorm_linear sampler: beta[3]
## [5]  conjugate_dnorm_dnorm_linear sampler: beta[4]
## [6]  conjugate_dnorm_dnorm_linear sampler: beta[5]
## [7]  conjugate_dnorm_dnorm_linear sampler: beta[6]
## [8]  conjugate_dnorm_dnorm_linear sampler: beta[7]
## [9]  conjugate_dnorm_dnorm_linear sampler: beta[8]
## [10] conjugate_dnorm_dnorm_linear sampler: beta[9]
## [11] conjugate_dnorm_dnorm_linear sampler: beta[10]
## [12] conjugate_dnorm_dnorm_linear sampler: beta[11]
## [13] conjugate_dnorm_dnorm_linear sampler: beta[12]
## [14] conjugate_dnorm_dnorm_linear sampler: beta[13]
## [15] conjugate_dnorm_dnorm_linear sampler: beta[14]
## [16] conjugate_dnorm_dnorm_linear sampler: beta[15]
## [17] RW sampler: sigma

Again, by default NIMBLE has assigned univariate scalar samplers to each beta coefficient, because each one is a scalar node in the model.

In some cases, it may make sense to sample the elements of beta jointly by modifying the MCMC configuration to use a block sampler. Alternatively if one places a multivariate prior on beta[1:p] the default sampler will be a block sampler, as shown next.

(Note that beta0 could be included in the beta vector if a corresponding column of ones is included in x.)

Using matrix algebra

Finally, we can use matrix algebra, but we need to be careful about matrix vs. scalar types, as the matrix-vector multiplication produces a (one-row, one-column) matrix. To use that as a scalar, we need to extract the [1,1] element.

NOTE: this approach now works cleanly in NIMBLE version 0.9.0.

code <- nimbleCode({
  beta0 ~ dnorm(0, sd = 100)
  beta[1:p] ~ dmnorm(zeros[1:p], omega[1:p, 1:p])
  sigma ~ dunif(0, 100)  # prior for variance components based on Gelman (2006)
  for(i in 1:n) {
    y[i] ~ dnorm(beta0 + (beta[1:p] %*% x[i, 1:p])[1,1], sd = sigma)
  }
})

constants <- list(n = n, p = p, x = X, zeros = rep(0, p), omega = 0.0001 * diag(p))
data <- list(y = y)
inits <- list(beta0 = mean(y), beta = rep(0, p), sigma = 0.5)
model <- nimbleModel(code, constants = constants, data = data, inits = inits)

## Defining model

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

mcmcConf <- configureMCMC(model)

## ===== Monitors =====
## thin = 1: beta, beta0, sigma
## ===== Samplers =====
## RW_block sampler (1)
##   - beta[1:15] 
## RW sampler (1)
##   - sigma
## conjugate sampler (1)
##   - beta0

mcmcConf$printSamplers()

## [1] conjugate_dnorm_dnorm_additive sampler: beta0
## [2] RW sampler: sigma
## [3] RW_block sampler: beta[1:15]

Here by specifying a joint prior for beta[1:p], NIMBLE assigns a block sampler.

In some cases it may be more efficient to compute all n values of the linear predictor in a single vectorized step. This time we need to be careful about matrix vs. vector types, converting the matrix-valued result of the matrix multiplication into a vector.

code <- nimbleCode({
     beta0 ~ dnorm(0, sd = 100)
     beta[1:p] ~ dmnorm(zeros[1:p], omega[1:p, 1:p])
     sigma ~ dunif(0, 100)  # prior for variance components based on Gelman (2006)
     linpred[1:n] <- (x[1:n, 1:p] %*% beta[1:p])[1:n,1]
     for(i in 1:n) {
           y[i] ~ dnorm(beta0 + linpred[i], sd = sigma)
       }
})

constants <- list(n = n, p = p, x = X, zeros = rep(0, p), omega = 0.0001 * diag(p))
data <- list(y = y)
inits <- list(beta0 = mean(y), beta = rep(0, p), sigma = 0.5)
model <- nimbleModel(code, constants = constants, data = data, inits = inits)

## Defining model

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

mcmcConf <- configureMCMC(model)

## ===== Monitors =====
## thin = 1: beta, beta0, sigma
## ===== Samplers =====
## RW_block sampler (1)
##   - beta[1:15] 
## RW sampler (1)
##   - sigma
## conjugate sampler (1)
##   - beta0

mcmcConf$printSamplers()

## [1] conjugate_dnorm_dnorm_additive sampler: beta0
## [2] RW sampler: sigma
## [3] RW_block sampler: beta[1:15]

Unfortunately at the moment, NIMBLE does not recognize conjugacy in this situation where the prior is dmnorm and the dependents are dnorm.

If n is very large (say 10s of thousands or more) one may be able to reduce the model building time and potentially the MCMC run time by setting up y[1:n] to follow a vectorized distribution by using a user-defined distribution. When done this way, there is a single vector node for y[1:n] in the NIMBLE model rather than n scalar nodes, one for each element y[i].

The distribution dnorm_vec below simply calculates the probability density of all y[i] elements together. The re-written model below is equivalent the models above.

dnorm_vec <- nimbleFunction( ## Define the distribution
    run = function(x = double(1), mean = double(1), sd = double(0), log = integer(0, default = 0)) {
        returnType(double(0))
        logProb <- sum(dnorm(x, mean, sd, log = TRUE))
        if(log) return(logProb)
        else return(exp(logProb)) 
    })

rnorm_vec <- nimbleFunction( ## Define a simulation function, optionally.
    run = function(n = integer(0), mean = double(1), sd = double(0)) {
        returnType(double(1))
        if(n != 1) print("rnorm_vec only allows n = 1; using n = 1.")
        smp <- rnorm(n, mean, sd)
        return(smp)
    })

code <- nimbleCode({
     beta0 ~ dnorm(0, sd = 100)
     beta[1:p] ~ dmnorm(zeros[1:p], omega[1:p, 1:p])
     sigma ~ dunif(0, 100)  # prior for variance components based on Gelman (2006)
     linpred[1:n] <- beta0 + x[1:n, 1:p] %*% beta[1:p]
     y[1:n] ~ dnorm_vec(linpred[1:n], sigma)
})

constants <- list(n = n, p = p, x = X, zeros = rep(0, p), omega = 0.0001 * diag(p))
data <- list(y = y)
inits <- list(beta0 = mean(y), beta = rep(0, p), sigma = 0.5)
model <- nimbleModel(code, constants = constants, data = data, inits = inits)

## Defining model

##   [Note] Registering 'dnorm_vec' as a distribution based on its use in BUGS code. If you make changes to the nimbleFunctions for the distribution, you must call 'deregisterDistributions' before using the distribution in BUGS code for those changes to take effect.

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

mcmcConf <- configureMCMC(model)

## ===== Monitors =====
## thin = 1: beta, beta0, sigma
## ===== Samplers =====
## RW_block sampler (1)
##   - beta[1:15] 
## RW sampler (2)
##   - beta0
##   - sigma

mcmcConf$printSamplers() ## Look at sampler assignments

## [1] RW sampler: beta0
## [2] RW sampler: sigma
## [3] RW_block sampler: beta[1:15]

model$getNodeNames()     ## Look at node names in the re-written model

## [1] "beta0"                                                    
## [2] "sigma"                                                    
## [3] "lifted_chol_oPomega_oB1to15_comma_1to15_cB_cP[1:15, 1:15]"
## [4] "beta[1:15]"                                               
## [5] "linpred[1:100]"                                           
## [6] "y[1:100]"

Four ways to set up linear predictors in a regression model

Manual specification

Using inprod

Using matrix algebra

Using `inprod`