Item Response Theory

Item Response Theory (IRT) refers to a family of models that attempt to explain the relationship between latent traits and their manifestation in observations of the individual. The basic assumption is that an individual’s observable behavior is related to this latent trait. Common contexts include educational testing and health status.

In a testing context, the standard setting considers \(I\) items and \(P\) persons for which responses \(y_{pi} \in \{0, 1\}\) are observed with values encoding incorrect and correct answers. The probability of a correct answer for each item/person \(\pi_{pi} = \text{Pr}(y_{pi} = 1)\) is modeled via logistic or probit regression, conditionally on a set of parameters encoding the characteristics of respondents and items. Typically a parameter \(\eta_p\) is defined for each respondent \(p = 1, \ldots, P\) representing the unobserved ability of the respondent. Parameters \(\{\eta_p\}_1^P\) are assumed to be drawn from a common distribution, taking the role of ‘random effects’.

Different models can be defined according to the number of parameters associated with each item. Here we consider two classic models for dichotomous item responses, namely the one-parameter logistic (1PL) model and the two-parameter logistic (2PL) model.

For this example, we use the Law School Admission Test (LSAT) data from the ltm package, which is a classic example in educational testing (Bock and Lieberman 1970). The dataset comprises \(1000\) responses to \(5\) questions.

LSATdata <- ltm::LSAT

1PL model

The one-parameter logistic model (1PL) assumes the presence of one parameter, \(\beta_i\), for each item \(i = 1, \ldots, I\) encoding the difficulty of the item. We choose a non-informative normal prior \(\mathcal{N}(0, 100)\) for the difficulty parameters, and a uniform distribution for abilities variance.

library(nimble, warn.conflicts = FALSE)
## nimble version 1.3.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit https://R-nimble.org.
## 
## Note for advanced users who have written their own MCMC samplers:
##   As of version 0.13.0, NIMBLE's protocol for handling posterior
##   predictive nodes has changed in a way that could affect user-defined
##   samplers in some situations. Please see Section 15.5.1 of the User Manual.
code1PL <- nimbleCode({  
  for(i in 1:I) {
    for(p in 1:P) {
      y[p,i] ~ dbern(pi[p,i])
      logit(pi[p, i]) <-  eta[p] - beta[i]
    }
    beta[i] ~ dnorm(0, var = 100)
  }  
  
  for(p in 1:P) {
    eta[p] ~ dnorm(0, sd = sd_eta)
  }

  sd_eta ~ dunif(0, 100)  # prior for variance components based on Gelman (2006)
})

constants <- list(I = ncol(LSATdata), P = nrow(LSATdata))
data <- list(y = LSATdata)

set.seed(1)
inits <- list(beta    = rnorm(constants$I, 0, 1),
              eta     = rnorm(constants$P, 0, 1), 
              sd_eta = 3)

monitors = c("beta", "eta", "sd_eta")

model1PL <- nimbleModel(code1PL, constants, data, inits)
## Defining model
## Building model
## Setting data and initial values
## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.
## Checking model sizes and dimensions
cModel1PL <- compileNimble(model1PL)
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.

Building and running an MCMC to fit the model

conf1PL <- configureMCMC(model1PL, monitors = monitors)
## ===== Monitors =====
## thin = 1: beta, eta, sd_eta
## ===== Samplers =====
## RW sampler (1006)
##   - beta[]  (5 elements)
##   - sd_eta
##   - eta[]  (1000 elements)
model1PLMCMC <- buildMCMC(conf1PL)
cModel1PLMCMC <- compileNimble(model1PLMCMC, project = model1PL)
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.

Run the model and save posterior samples.

system.time(samples1PL <- runMCMC(cModel1PLMCMC, niter = 20000, nburnin = 10000))
## running chain 1...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
##    user  system elapsed 
##  26.225   0.083  26.309

Checking the results

We can obtain a summary of the estimates of item difficulties.

betaCols <- grep("beta", colnames(samples1PL))
sd_etaCol <- grep("sd_eta", colnames(samples1PL))
samplesSummary(samples1PL[, c(betaCols, sd_etaCol)])
##               Mean     Median    St.Dev.  95%CI_low  95%CI_upp
## beta[1] -2.7426819 -2.7411894 0.13037380 -2.9963098 -2.4855966
## beta[2] -1.0011783 -1.0009559 0.07833695 -1.1523951 -0.8495846
## beta[3] -0.2433821 -0.2453595 0.07151934 -0.3847355 -0.1014021
## beta[4] -1.3106872 -1.3104853 0.08184534 -1.4721001 -1.1478006
## beta[5] -2.1100184 -2.1103943 0.10782355 -2.3244911 -1.8996280
## sd_eta   0.7672692  0.7669722 0.06783533  0.6371932  0.9018647

Let’s check posterior convergence. In this case, the variance component appears to be the slowest mixing, and we would probably want to run the MCMC for longer.

par(mfrow = c(1, 3), cex = 1.1)
for(i in 1:5)
  ts.plot(samples1PL[ , betaCols[i]], xlab = 'iteration', ylab = colnames(samples1PL)[ betaCols[i]])

ts.plot(samples1PL[ , sd_etaCol], xlab = 'iteration', ylab = colnames(samples1PL)[sd_etaCol])

Item characteristic curve

We can use the posterior samples to estimate the posterior Item Characteristic Curve (ICC) for an interval of ability values.

## plot of ICC 
abilityGrid <- seq(-8, 5, length = 1000)
iccVals <- matrix(0, 1000, constants$I)
for(i in seq_len(constants$I)) {
    tmp <- sapply(abilityGrid, function(x) expit(x - samples1PL[, betaCols[i]]))
    iccVals[,i] <- colMeans(tmp)
}

library(ggplot2)
rownames(iccVals) <- abilityGrid
df <- reshape2::melt(iccVals)
colnames(df) <- c("Ability", "Item", "Probability")
df$Item <- factor(df$Item, levels = 1:5, labels = paste0("Item ", 1:5))

ggplot(df, aes(x = Ability, y = Probability, group = Item)) +
  geom_line(aes(color = Item)) + theme_bw() +
  ggtitle("Item characteristic curves - 1PL model")

2PL model

A generalization of the 1PL model is the two-parameter logistic model, which considers an additional parameter, \(\lambda_i\), for each item \(i = 1, \ldots, I\), often referred as discrimination parameter, because items with a large \(\lambda_i\) are better at discriminating between subjects with different abilities. Typically the \(\lambda_i\)’s are assumed to be positive, so we choose independent lognormal distributions as priors.

Given the additional parameter, the metric (location and scale) of the individual parameters is only known up to a linear transformation. In the literature, several types of restrictions are proposed to anchor the metric, either acting on the item or ability parameters. A simple restriction is to assume that abilities are drawn from a \(\mathcal{N}(0,1)\) and freely estimate item parameters.

code2PL <- nimbleCode({
  for(i in 1:I) {
    for(p in 1:P) {
      y[p, i] ~ dbern(pi[p, i])
      logit(pi[p, i]) <-  lambda[i]*eta[p] + beta[i]
    }
  }  
  
  for(i in 1:I) {
    beta[i] ~ dnorm(0,  var = 100)  
    lambda[i] ~ dlnorm(meanlog = 0, sdlog = 6)
  }
   
  for(p in 1:P) {
    eta[p] ~ dnorm(0, 1)
  }
})

constants <- list(I = ncol(LSATdata), P = nrow(LSATdata))

data <- list(y = LSATdata)

set.seed(2)
inits <- list(beta   = rnorm(constants$I, 0, 3),
              eta    = rnorm(constants$P, 0, 3),
              lambda = rep(2, constants$I))

monitors = c("beta", "lambda", "eta")

model2PL <- nimbleModel(code2PL, constants, data, inits)
## Defining model
## Building model
## Setting data and initial values
## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.
## Checking model sizes and dimensions

Building and running an MCMC to fit the model

cModel2PL <- compileNimble(model2PL)
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
conf2PL <- configureMCMC(model2PL, monitors = monitors)
## ===== Monitors =====
## thin = 1: beta, eta, lambda
## ===== Samplers =====
## RW sampler (1010)
##   - beta[]  (5 elements)
##   - lambda[]  (5 elements)
##   - eta[]  (1000 elements)
model2PLMCMC <- buildMCMC(conf2PL)
cModel2PLMCMC <- compileNimble(model2PLMCMC, project = model2PL)
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
system.time(samples2PL <- runMCMC(cModel2PLMCMC, niter = 20000, nburnin = 10000))
## running chain 1...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
##    user  system elapsed 
##  34.785   0.076  34.862

Checking the results

Similarly to the previous case, we can obtain posterior summaries and check the convergence and the item characteristic curves. Once again we would want to run the MCMC for longer.

betaCols <- grep("beta", colnames(samples2PL))
lambdaCols <- grep("lambda", colnames(samples2PL))

samplesSummary(samples2PL[, c(betaCols, lambdaCols)])
##                Mean    Median    St.Dev.  95%CI_low 95%CI_upp
## beta[1]   2.7649371 2.7352719 0.22614207 2.40438805 3.2748143
## beta[2]   0.9853351 0.9813277 0.09174312 0.81542706 1.1738045
## beta[3]   0.2538296 0.2513119 0.08101397 0.09858091 0.4177354
## beta[4]   1.2798913 1.2760248 0.10112151 1.09499863 1.4976457
## beta[5]   2.0507451 2.0401487 0.14104010 1.80770396 2.3664747
## lambda[1] 0.7497019 0.7373295 0.31388119 0.09284423 1.4224005
## lambda[2] 0.6829034 0.6634760 0.20829358 0.33010676 1.1345390
## lambda[3] 0.9375830 0.8847020 0.31662639 0.48497486 1.7623309
## lambda[4] 0.6498310 0.6327764 0.19970123 0.30185887 1.0783772
## lambda[5] 0.6004614 0.6008976 0.25028211 0.10374343 1.1119834
par(mfrow = c(1, 5), cex = 1.1)
for(i in 1:5)
  ts.plot(samples2PL[ , betaCols[i]], xlab = 'iteration', ylab = colnames(samples2PL)[ betaCols[i]])

par(mfrow = c(1, 5), cex = 1.1)
for(i in 1:5)
  ts.plot(samples2PL[ , lambdaCols[i]], xlab = 'iteration', ylab = colnames(samples2PL)[ lambdaCols[i]])

Item characteristic curve

abilityGrid <- seq(-8, 12, length = 1000)
iccVals <- matrix(0, 1000, constants$I)
for(i in seq_len(constants$I)) {
    ## evaluate probability at each ability point
    tmp <- sapply(abilityGrid, function(x) expit(samples2PL[, lambdaCols[i]]*(x - samples2PL[, betaCols[i]])))
    iccVals[,i] <- apply(tmp, 2, mean)
}

rownames(iccVals) <- abilityGrid
df <- reshape2::melt(iccVals)
colnames(df) <- c("Ability", "Item", "Probability")
df$Item <- factor(df$Item, levels = 1:5, labels = paste0("Item ", 1:5))

ggplot(df, aes(x = Ability, y = Probability, group = Item)) +
  geom_line(aes(color = Item)) + theme_bw() +
  ggtitle("Item characteristic curves - 2PL model")

References

Bock, R. D., and M. Lieberman. 1970. “Fitting a Response Model for n Dichotomously Scored Items.” Psychometrika 35 (2): 179–97. https://doi.org/10.1007/BF02291262.