Computer Adaptive Testing (CAT) Theory

CAT

2026-05-05

9 min read

irufano

Ctrl+F
Contents
Computer Adaptive Testing (CAT) Theory image

Thumbnail Credit

What is CAT?

Computer Adaptive Testing is a method of administering tests where the difficulty of each question adapts in real-time based on the test-taker's ability. Unlike traditional fixed tests where everyone gets the same questions, CAT selects the most informative question for each individual at each step [1][2].

The core idea:

  • If you answer correctly → next question is harder
  • If you answer incorrectly → next question is easier
  • The test converges on your true ability with fewer questions than a fixed test

"CAT yields a high and equal degree of precision for all candidates, and requires fewer items than a conventional test to reach a given degree of precision." — Eggen & Verschoor (2006) [3]


Core Components

1. Item Response Theory (IRT)

IRT is the mathematical foundation of CAT. It models the probability of a correct response as a function of the person's ability and the item's properties [4].

The most common model is the 3-Parameter Logistic (3PL) model:

Where:

  • θ (theta) — the person's ability (typically ranges from -3 to +3)
  • a — discrimination parameter (how well the item differentiates abilities)
  • b — difficulty parameter (the ability level where P = 0.5)
  • c — guessing parameter (probability of correct answer by chance)

Simpler models [4]:

  • 1PL (Rasch) — only difficulty b, assumes equal discrimination
  • 2PL — difficulty b + discrimination a, no guessing
  • 3PL — full model with guessing c

2. Item Parameters

Each item in your item bank has parameters that describe its characteristics [4][5]:

Difficulty (b)

  • Ranges typically from -3 to +3
  • b = -2 → very easy item (most people get it right)
  • b = 0 → medium item (50% chance at average ability)
  • b = +2 → very hard item (only high ability people get it right)

Discrimination (a)

  • Ranges typically from 0 to 3
  • Higher a = better at separating high vs low ability test-takers
  • a < 0.5 → poor discrimination, item barely helps
  • a = 1.0 → good discrimination
  • a > 2.0 → excellent discrimination

Guessing (c)

  • Ranges from 0 to 1
  • For multiple choice with 4 options, c ≈ 0.25
  • Accounts for the fact that low-ability test-takers can guess correctly

3. Item Information Function (IIF)

The information an item provides tells us how useful it is for estimating ability at a given theta level [4]:

Key insight:

  • An item provides maximum information when its difficulty matches the person's ability (b ≈ θ)
  • Items that are too easy or too hard provide little information
  • Higher discrimination a = more information overall

This is why CAT selects items where difficulty ≈ current theta — it's the point of maximum information [5].


4. Theta Estimation

After each response, the person's ability estimate is updated. The main methods are [4][6]:

Maximum Likelihood Estimation (MLE)

Finds the theta that maximizes the likelihood of the observed response pattern:

Where u = 1 if correct, u = 0 if incorrect.

Problem: MLE is undefined if all answers are correct or all incorrect.

Bayesian Estimation (EAP/MAP)

Uses a prior distribution (usually normal) combined with the likelihood [6]:

  • EAP (Expected A Posteriori) — takes the mean of the posterior
  • MAP (Maximum A Posteriori) — takes the mode of the posterior

Bayesian methods work even with all-correct or all-incorrect patterns, making them preferred for CAT [3][6].

Simple Approximation

if correct:   theta = theta + 0.3
if incorrect: theta = theta - 0.3

This is a simplified approximation — not true IRT but easy to implement for prototyping.


5. Item Selection Methods

At each step, CAT must choose the next best item. Common methods [2][5]:

Maximum Information (most common)

Select the item that provides maximum information at current theta:

# Select item with highest information at current theta
I <- a^2 * P * (1-P)  # simplified 2PL information
next_item <- item with max(I)

Closest Difficulty (simplified)

Select item whose difficulty is closest to current theta:

items$diff <- abs(items$difficulty - theta)
selected <- items[which.min(items$diff), ]

Simple but effective approximation of maximum information.

Other Methods

MethodDescription
Random from eligible setAdds randomness to prevent item overexposure [2]
Maximum Fisher informationFull IRT-based calculation [4]
Kullback-Leibler informationInformation-theoretic approach [5]

6. Stopping Rules

CAT stops when one of these conditions is met [3][7]:

RuleDescription
Fixed lengthStop after N items (simplest)
Standard errorStop when SE(θ) < threshold (e.g. SE < 0.3)
Confidence intervalStop when CI is narrow enough
Time limitStop after time expires
Item bank exhaustedNo more unused items available

Standard error of theta:

The more items answered, the lower the SE and the more precise the estimate.

For example, Huda et al. (2024) used SE ≤ 0.01 as the stopping criterion in their CAT implementation for student assessment [7].


Full CAT Process Flow

1. START
   θ₀ = 0 (or prior mean)
   SE = ∞

2. ITEM SELECTION
   Find item i* = argmax I(θ_current)
   from unused items

3. PRESENT ITEM
   Show question to test-taker
   Record response u (1=correct, 0=incorrect)

4. THETA UPDATE
   Update θ using MLE or Bayesian method
   based on all responses so far

5. STOPPING CHECK
   If stopping rule met → go to 6
   Else → go back to 2

6. REPORT
   Final θ estimate
   SE(θ)
   Confidence interval

Why CAT is Better Than Fixed Tests

FeatureFixed TestCAT
Number of itemsSame for everyone (e.g. 50)Fewer needed (e.g. 20–25) [1]
PrecisionSame for all ability levelsHighest where it matters [3]
Test experienceMany items too easy or too hardMost items appropriately challenging [2]
Item exposureAll items used equallyRisk of overexposure [2]
SecurityAll items knownItems protected by adaptive selection [1]
Test timeLongerShorter (30–50% reduction) [1]

Interpreting Theta (θ)

Theta follows a standard normal distribution. Here's how to interpret the score [4]:

Theta RangeInterpretation
θ > 2.0Exceptionally high ability
1.0 < θ ≤ 2.0High ability
-1.0 < θ ≤ 1.0Average ability
-2.0 < θ ≤ -1.0Below average ability
θ ≤ -2.0Very low ability

Theta can also be converted to a more familiar scale:


Summary Table

ConceptSymbolWhat it does
Ability estimateθ (theta)Represents person's latent ability [4]
DifficultybItem's difficulty level [4]
DiscriminationaHow well item separates abilities [4]
GuessingcChance of correct answer without knowledge [4]
Item InformationI(θ)How useful an item is at a given theta [5]
Standard ErrorSE(θ)Precision of the theta estimate [3]
Theta estimationMLE/EAPUpdates ability after each response [6]
Item selectionargmax I(θ)Picks most informative next item [2]
Stopping ruleSE < 0.3 or N itemsDecides when test is precise enough [3]

Key Takeaway

The goal of CAT is always: estimate theta as accurately as possible using as few items as possible [1][3].

CAT achieves this by always asking the question that provides the most information about the person's ability at their current estimated level — making every question count.


References

[1] Davey, T. (2011). A guide to computer adaptive testing systems. Council of Chief State School Officers.

[2] van der Linden, W. J., & Glas, C. A. W. (2022). Computerized Adaptive Testing: Theory and Practice. Kluwer Academic Publishers.

[3] Eggen, T. J. H. M., & Verschoor, A. J. (2006). Overview and current management of computerized adaptive testing in licensing/certification examinations. PMC, NCBI. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5676016/

[4] Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Lawrence Erlbaum Associates / Routledge. https://doi.org/10.4324/9780203056615

[5] De Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. New York: The Guilford Press.

[6] Magis, D., & Barrada, J. R. (2017). Computerized Adaptive Testing with R: Recent Updates of the Package catR. Journal of Statistical Software, Code Snippets, 76(1), 1–18. https://doi.org/10.18637/jss.v076.c01

[7] Huda, A., Firdaus, F., Irfan, D., Hendriyani, Y., Almasri, A., & Sukmawati, M. (2024). Optimizing Educational Assessment: The Practicality of Computer Adaptive Testing (CAT) with an Item Response Theory (IRT) Approach. JOIV: International Journal on Informatics Visualization, 8(1), 473–480. https://doi.org/10.62527/joiv.8.1.2217

[8] Magis, D., & Raiche, G. (2012). Random Generation of Response Patterns under Computerized Adaptive Testing with the R Package catR. Journal of Statistical Software, 48(8), 1–31. https://doi.org/10.18637/jss.v048.i08