Reversible Jump MCMC (RJMCMC) is a method for sampling across models with a different numbers of dimensions, which finds natural application in Bayesian variable selection problems. In this setting, RJMCMC requires a proposal distribution for the coefficient of a variable when the variable is proposed to be included in the model. NIMBLE uses a univariate normal proposal distribution.

There are two ways to use RJMCMC variable selection in your model. If you know the prior probability for inclusion of a variable in the model, you can use that directly in RJMCMC without modifying your model. If you need the prior probability for inclusion in the model to be a model node itself, such as if it will have a prior and be estimated, you will need to write the model with indicator variables. The latter is more flexible and is illustrated in the example below.

More information can be found in the user manual and via `help(configureRJ)`

.

For simplicity we consider a linear regression example with 15 explanatory variables, of which five have real effects. The other 10 have no effect, i.e.Â the corresponding coefficient is equal to zero.

`library(nimble, warn.conflicts = FALSE)`

```
## nimble version 0.9.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit http://R-nimble.org.
```

```
lmCode <- nimbleCode({
sigma ~ dunif(0, 20)
psi ~ dunif(0,1) ## prior on inclusion probability
for(i in 1:numVars) {
z[i] ~ dbern(psi) ## indicator variable for each coefficient
beta[i] ~ dnorm(0, sd = 100)
zbeta[i] <- z[i] * beta[i] ## indicator * beta
}
for(i in 1:n) {
pred.y[i] <- inprod(X[i, 1:numVars], zbeta[1:numVars])
y[i] ~ dnorm(pred.y[i], sd = sigma)
}
})
## Generate data and define model constants
set.seed(1)
numVars = 15
n = 100
## Simulate explanatory variables
X <- matrix(rnorm(n*numVars), nrow = n, ncol = numVars)
lmConstants <- list(numVars = numVars, n = n, X = X)
lmModel <- nimbleModel(lmCode, constants = lmConstants)
```

`## defining model...`

`## Adding X as data for building model.`

`## building model...`

`## setting data and initial values...`

```
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ...
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
```

We can simulate data using the `nimbleModel`

we just created by initializing the model parameters and calling the `simulate()`

method.

```
## Set the first give coefficients to be non-zero and the remaining
## ones to be zero.
true_betas <- c(c(0.1, 0.2, 0.3, 0.4, 0.5),
rep(0, 10))
lmModel$beta <- true_betas
lmModel$sigma <- 1
lmModel$z <- rep(1, 15)
lmModel$psi <- 0.5
this_will_be_NA <- lmModel$calculate() ## Update nodes that depend on values we just set..
set.seed(0) ## Make this example reproducible
lmModel$simulate('y') ## Simulate y
lmModel$calculate() ## Update calculations
```

`## [1] -226.7382`

```
lmModel$setData('y') ## Mark the simulated y values as data
lmData = list(y = lmModel$y) ## Record the simulated y values for use below
```

The RJCMCM sampler can be added to the MCMC configuration by calling the function `configureRJ()`

. In this example, `z`

are indicator variables associated with the coefficients `beta`

. We pass them to `configureRJ`

using the arguments `indicatorNodes`

and `targetNodes`

. The `control`

argument allows us to specify the mean and the scale of the normal proposal distribution.

```
lmConf <- configureMCMC(lmModel)
lmConf$addMonitors('z')
```

`## thin = 1: sigma, psi, beta, z`

```
configureRJ(lmConf,
targetNodes = 'beta',
indicatorNodes = 'z',
control = list(mean = 0, scale = .2))
```

Checking the assigned samplers, we can see that indicator parameters are assigned to an `RJ_indicator`

sampler whose `targetNode`

is the corresponding coefficient, while `beta`

parameters are assigned to a `RJ_toggled`

sampler, which is used only when its target parameter is included in the model.

```
## Check the assigned samplers
lmConf$printSamplers(c("z[1]", "beta[1]"))
```

```
## [3] RJ_indicator sampler: z[1], mean: 0, scale: 0.2, targetNode: beta[1]
## [4] RJ_toggled sampler: beta[1], samplerType: conjugate_dnorm_dnorm
```

```
mcmcRJ <- buildMCMC(lmConf)
cLmModel <- compileNimble(lmModel)
```

`## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.`

`## compilation finished.`

`CMCMCRJ <- compileNimble(mcmcRJ, project = lmModel)`

```
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
```

```
set.seed(100)
system.time(samplesRJ <- runMCMC(CMCMCRJ, niter = 100000, nburnin = 10000, thin = 10))
```

`## runMCMC's handling of nburnin changed in nimble version 0.6-11. Previously, nburnin samples were discarded *post-thinning*. Now nburnin samples are discarded *pre-thinning*. The number of samples returned will be floor((niter-nburnin)/thin).`

`## running chain 1...`

```
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
```

```
## user system elapsed
## 27.410 0.000 27.411
```

For each variable we can look at the sampled values of the indicator and corresponding coefficient.

`plot(samplesRJ[,'z[5]'], pch = '.', main = "z[5] traceplot")`

`plot(samplesRJ[,'beta[5]'], pch = '.', main = "beta[5] traceplot")`

`psi`

Here is the posterior density of inclusion probability.

`plot(density(samplesRJ[,'psi']), main = "Inclusion probability", xlab = "Probability", ylab = "Density")`

Here are the marginal probabilities of inclusion for each variable.

```
zNames <- lmModel$expandNodeNames('z')
zCols <- which(colnames(samplesRJ) %in% zNames)
posterior_inclusion_prob <- colMeans(samplesRJ[,zCols])
plot(1:length(true_betas), posterior_inclusion_prob,
xaxt = 'none',
xlab = "beta", ylab = "Inclusion probability",
main = "Inclusion probabilities for each beta")
axis(1, seq(0,15,1))
```

Finally we can look at the posterior probabilities for every model that was explored by the MCMC, coded by the vector of indicator variables `z`

.

```
library(data.table)
binary <- as.data.table((samplesRJ[, zCols] != 0)+ 0)
res <- binary[,.N, by=names(binary)]
res <-res[order(N, decreasing = T)]
res <- res[, prob := N/dim(samplesRJ)[1]]
res[1:5, ]
```

```
## z[1] z[2] z[3] z[4] z[5] z[6] z[7] z[8] z[9] z[10] z[11] z[12] z[13]
## 1: 0 0 1 1 1 0 0 0 0 0 0 0 0
## 2: 0 0 0 1 1 0 0 0 0 0 0 0 0
## 3: 0 0 1 0 1 0 0 0 0 0 0 0 0
## 4: 0 0 0 0 1 0 0 0 0 0 0 0 0
## 5: 0 0 0 0 0 0 0 0 0 0 0 0 0
## z[14] z[15] N prob
## 1: 0 0 3305 0.36722222
## 2: 0 0 3044 0.33822222
## 3: 0 0 1222 0.13577778
## 4: 0 0 1130 0.12555556
## 5: 0 0 174 0.01933333
```