Chapter 17 Example: maximum likelihood estimation using optim with gradients from nimDerivs
In this example, we will obtain maximum likelihood estimates for a Poisson GLM. It is similar to the GLMM above, but without the random effects. A likelihood can be maximized with methods that don't require gradients, but methods that use gradients are typically much faster.
To illustrate the parameter transformation system, we will give p a uniform prior, constrained to values between 0 and 1. The maximum likelihood estimation below will not use the prior probability, but the prior will nevertheless be interpreted for the constraints it places on valid parameter values. The need for parameter transformation here is somewhat artificial, in the interest of simplicity.
model_code <- nimbleCode({
  # priors (will be ignored for MLE)
  p ~ dunif(0, 1)
  log_p <- log(p)
  beta ~ dnorm(0, sd = 100)
  # data (this model has no random effects)
  for(i in 1:50) {
    y[i] ~ dpois(exp(log_p + beta*X[i]))
  }
})
set.seed(123)
X <- rnorm(50) # Simulate values for X
model <- nimbleModel(model_code, constants = list(X = X), calculate = FALSE,
                     buildDerivs = TRUE) # Create nimble model object
As in the first example above, we will simulate data using the model itself. One can instead provide data as an argument to nimbleModel.
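For instance, if the y values were already in hand, the call could have looked like this (a sketch; y here stands for a pre-existing data vector):
nimbleModel(model_code, constants = list(X = X), data = list(y = y),
            buildDerivs = TRUE)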
model$p <- 0.5 # Assign parameter values and simulate data (y).
model$beta <- 0.2
model$calculate() # We expect this to be NA because there are no y values yet.
## [1] NA
model$simulate(model$getDependencies(c('beta', 'p'), self = FALSE)) # Simulate y values.
model$calculate() # Now we expect a valid result.
## [1] -53.33015
model$setData('y') # Now the model has y marked as data, with values from simulation.
Cmodel <- compileNimble(model) # Make compiled version.
Now that the example model is set up, we'll write a nimbleFunction to provide the negative log likelihood and its gradient, using an unconstrained parameter space. By default, optim does minimization, so returning the negative log likelihood means that optim will find the maximum likelihood parameters.
logLikelihood_nf <- nimbleFunction(
  setup = function(model, paramNodes) {
    # Determine nodes for calculating the log likelihood for parameters given by
    # paramNodes, ignoring any priors.
    calcNodes <- model$getDependencies(paramNodes, self = FALSE)
    # Set up the additional arguments for nimDerivs involving model$calculate.
    derivsInfo <- makeModelDerivsInfo(model, paramNodes, calcNodes)
    updateNodes <- derivsInfo$updateNodes
    constantNodes <- derivsInfo$constantNodes
    # Create a parameter transformation between the original and unconstrained
    # parameter spaces.
    transformer <- parameterTransform(model, paramNodes)
  },
  methods = list(
    neg_logLikelihood_p = function(p = double(1)) {
      # Put values in the model and calculate the negative log likelihood.
      values(model, paramNodes) <<- p
      return(-model$calculate(calcNodes))
      returnType(double())
    },
    neg_logLikelihood = function(ptrans = double(1)) {
      # Objective function for optim, using the transformed parameter space.
      p <- transformer$inverseTransform(ptrans)
      return(neg_logLikelihood_p(p))
      returnType(double())
    },
    gr_neg_logLikelihood = function(ptrans = double(1)) {
      # Gradient of the negative log likelihood.
      p <- transformer$inverseTransform(ptrans)
      d <- derivs(neg_logLikelihood_p(p), wrt = 1:length(p), order = 1,
                  model = model, updateNodes = updateNodes,
                  constantNodes = constantNodes)
      return(d$jacobian[1, ])
      returnType(double(1))
    },
    transform = function(p = double(1)) {
      # Give the user access to the transformation ...
      return(transformer$transform(p))
      returnType(double(1))
    },
    inverse = function(ptrans = double(1)) { # ... and its inverse.
      return(transformer$inverseTransform(ptrans))
      returnType(double(1))
    }
  ),
  buildDerivs = 'neg_logLikelihood_p'
)
Sometimes you might want a call to transformer$transform or transformer$inverseTransform to be included in a derivative calculation. In this case, we don't.
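If you did want derivatives to flow through the transformation, a sketch of the change (hypothetical; not used in this example) would be to enable derivatives for the method that calls inverseTransform and differentiate it directly:
# Hypothetical variant: differentiate through the transformation itself.
# In the nimbleFunction above, one would use
#   buildDerivs = c('neg_logLikelihood_p', 'neg_logLikelihood')
# and compute the gradient as
#   d <- derivs(neg_logLikelihood(ptrans), wrt = 1:length(ptrans), order = 1,
#               model = model, updateNodes = updateNodes,
#               constantNodes = constantNodes)
# so that the chain rule through inverseTransform is handled by AD.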
Next we build and compile an instance of logLikelihood_nf, specialized (by the setup function) to our model and its parameters.
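Those steps look like the following (a sketch, reconstructed to be consistent with the compiled object Cll_glm used below):
ll_glm <- logLikelihood_nf(model, paramNodes = c('p', 'beta'))
Cll_glm <- compileNimble(ll_glm, project = model)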
Now let's see how the parameter transformation works for this model. beta already has an unconstrained range of valid values, so it has no transformation (i.e. it has an identity transformation). p is constrained between 0 and 1, so it will be logit-transformed.
Cll_glm$transform(c(0.2, 1)) # logit(0.2) for p; identity for beta
## [1] -1.386294  1.000000
# The inverse transformation goes back to the original parameters, within rounding:
Cll_glm$inverse(c(-1.386294, 1))
## [1] 0.2000001 1.0000000
Now we are ready to use the methods provided in Cll_glm in a call to R's optim to find the MLE.
MLE <- optim(c(0, 0),                           # initial values
             fn = Cll_glm$neg_logLikelihood,    # function to be minimized
             gr = Cll_glm$gr_neg_logLikelihood, # gradient of that function
             method = "BFGS")                   # optimization method to use
MLE$par # MLE in the transformed parameter space: logit(p) and beta
## [1] 0.08257249 0.25577134
-MLE$value # maximum log likelihood
## [1] -47.73243
Those outputs show the MLE in the parameters logit(p) and beta, followed by the maximum log likelihood value.
Now let's confirm that it's working correctly by comparing to results from glm.
glm_fit <- glm(model$y ~ X, family = poisson) # fit the same model with glm
coef(glm_fit)
## (Intercept)           X 
##  -0.6527680   0.2557751 
logLik(glm_fit)
## 'log Lik.' -47.73243 (df=2)
We see that the coefficient for X matches beta (within numerical tolerance), and the maximum log likelihoods match. To understand the intercept, we need to be careful about the parameter transformation used in nimble. The inverse transformation will give p, and the log of that should match the intercept estimated by glm.
Cll_glm$inverse(MLE$par) # back-transform to the original parameters (p, beta)
## [1] 0.5206314 0.2557713
log(0.5206314) # log(p) should match the glm intercept
## [1] -0.652713
It looks like it worked.