Chapter 17 Example: maximum likelihood estimation using optim with gradients from nimDerivs
In this example, we will obtain maximum likelihood estimates for a Poisson GLM. It is similar to the GLMM above, but without the random effects. A likelihood can be maximized with methods that don't require gradients, but methods that use gradients are typically much faster.
To illustrate the parameter transformation system, we will give p a uniform prior, constrained to values between 0 and 1. The maximum likelihood estimation below will not use the prior probability, but the prior will nevertheless be interpreted for the constraints it places on valid parameter values. The need for parameter transformation here is somewhat artificial, in the interest of simplicity.
model_code <- nimbleCode({
  # priors (will be ignored for MLE)
  p ~ dunif(0, 1)
  log_p <- log(p)
  beta ~ dnorm(0, sd = 100)
  # data (this model has no random effects)
  for(i in 1:50) {
    y[i] ~ dpois(exp(log_p + beta*X[i]))
  }
})
set.seed(123)
X <- rnorm(50) # Simulate values for X
model <- nimbleModel(model_code, constants = list(X = X), calculate = FALSE,
                     buildDerivs = TRUE) # Create nimble model object
As in the first example above, we will simulate data using the model itself. One can instead provide data as an argument to nimbleModel.
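For instance, if the y values were already in hand, the call could have looked like this (a sketch; y here stands for a pre-existing data vector):
nimbleModel(model_code, constants = list(X = X), data = list(y = y),
            buildDerivs = TRUE)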
model$p <- 0.5 # Assign parameter values and simulate data (y).
model$beta <- 0.2
model$calculate() # We expect this to be NA because there are no y values yet.
## [1] NA
model$simulate(model$getDependencies(c('beta', 'p'), self = FALSE)) # Simulate y values.
model$calculate() # Now we expect a valid result.
## [1] -53.33015
model$setData('y') # Now the model has y marked as data, with values from simulation.
Cmodel <- compileNimble(model) # Make compiled version.
Now that the example model is set up, we'll write a nimbleFunction to provide the negative log likelihood and its gradient, using an unconstrained parameter space. By default, optim does minimization, so returning the negative log likelihood means that optim will find the maximum likelihood parameters.
logLikelihood_nf <- nimbleFunction(
  setup = function(model, paramNodes) {
    # Determine nodes for calculating the log likelihood for parameters given by
    # paramNodes, ignoring any priors.
    calcNodes <- model$getDependencies(paramNodes, self = FALSE)
    # Set up the additional arguments for nimDerivs involving model$calculate.
    derivsInfo <- makeModelDerivsInfo(model, paramNodes, calcNodes)
    updateNodes <- derivsInfo$updateNodes
    constantNodes <- derivsInfo$constantNodes
    # Create a parameter transformation between the original and unconstrained
    # parameter spaces.
    transformer <- parameterTransform(model, paramNodes)
  },
  methods = list(
    neg_logLikelihood_p = function(p = double(1)) {
      # Put values in the model and calculate the negative log likelihood.
      values(model, paramNodes) <<- p
      return(-model$calculate(calcNodes))
      returnType(double())
    },
    neg_logLikelihood = function(ptrans = double(1)) {
      # Objective function for optim, using the transformed parameter space.
      p <- transformer$inverseTransform(ptrans)
      return(neg_logLikelihood_p(p))
      returnType(double())
    },
    gr_neg_logLikelihood = function(ptrans = double(1)) {
      # Gradient of the negative log likelihood.
      p <- transformer$inverseTransform(ptrans)
      d <- derivs(neg_logLikelihood_p(p), wrt = 1:length(p), order = 1,
                  model = model, updateNodes = updateNodes,
                  constantNodes = constantNodes)
      return(d$jacobian[1, ])
      returnType(double(1))
    },
    transform = function(p = double(1)) {
      # Give the user access to the transformation ...
      return(transformer$transform(p))
      returnType(double(1))
    },
    inverse = function(ptrans = double(1)) { # ... and its inverse.
      return(transformer$inverseTransform(ptrans))
      returnType(double(1))
    }
  ),
  buildDerivs = 'neg_logLikelihood_p'
)
Sometimes you might want a call to transformer$transform or transformer$inverseTransform to be included in a derivative calculation. In this case, we don't.
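If you did want derivatives to flow through the transformation, a sketch of the change (hypothetical; not used in this example) would be to enable derivatives for the method that calls inverseTransform and differentiate it directly:
# Hypothetical variant: differentiate through the transformation itself.
# In the nimbleFunction above, one would use
#   buildDerivs = c('neg_logLikelihood_p', 'neg_logLikelihood')
# and compute the gradient as
#   d <- derivs(neg_logLikelihood(ptrans), wrt = 1:length(ptrans), order = 1,
#               model = model, updateNodes = updateNodes,
#               constantNodes = constantNodes)
# so that the chain rule through inverseTransform is handled by AD.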
Next we build and compile an instance of logLikelihood_nf, specialized (by the setup function) to our model and its parameters.
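Those steps look like the following (a sketch, reconstructed to be consistent with the compiled object Cll_glm used below):
ll_glm <- logLikelihood_nf(model, paramNodes = c('p', 'beta'))
Cll_glm <- compileNimble(ll_glm, project = model)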
Now let's see how the parameter transformation works for this model. beta already has an unconstrained range of valid values, so it has no transformation (i.e. it has an identity transformation). p is constrained between 0 and 1, so it will be logit-transformed.
Cll_glm$transform(c(0.2, 1)) # logit(0.2) for p; identity for beta
## [1] -1.386294  1.000000
# The inverse transformation goes back to the original parameters, within rounding:
Cll_glm$inverse(c(-1.386294, 1))
## [1] 0.2000001 1.0000000
Now we are ready to use the methods provided in Cll_glm in a call to R's optim to find the MLE.
MLE <- optim(c(0, 0),                           # initial values
             fn = Cll_glm$neg_logLikelihood,    # function to be minimized
             gr = Cll_glm$gr_neg_logLikelihood, # gradient of that function
             method = "BFGS")                   # optimization method to use
MLE$par # MLE in the transformed parameter space: logit(p) and beta
## [1] 0.08257249 0.25577134
-MLE$value # maximum log likelihood
## [1] -47.73243
Those outputs show the MLE in the parameters logit(p) and beta, followed by the maximum log likelihood value.
Now let's confirm that it's working correctly by comparing to results from glm.
glm_fit <- glm(model$y ~ X, family = poisson) # fit the same model with glm
coef(glm_fit)
## (Intercept)           X 
##  -0.6527680   0.2557751 
logLik(glm_fit)
## 'log Lik.' -47.73243 (df=2)
We see that the coefficient for X matches beta (within numerical tolerance), and the maximum log likelihoods match. To understand the intercept, we need to be careful about the parameter transformation used in nimble. The inverse transformation will give p, and the log of that should match the intercept estimated by glm.
Cll_glm$inverse(MLE$par) # back-transform to the original parameters (p, beta)
## [1] 0.5206314 0.2557713
log(0.5206314) # log(p) should match the glm intercept
## [1] -0.652713
It looks like it worked.