The statistical model used in Alva’s logic test is the Three-Parameter Logistic (3PL) model (Birnbaum, 1968). Within the framework of Item Response Theory, this model captures three characteristics in which tasks can vary: difficulty (𝛽), discrimination (𝛼), and guessing (𝛾).
The model specifies a statistical relationship between individuals’ logical ability (𝜃) and the probability of solving a given task correctly:
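In its standard form (Birnbaum, 1968), and assuming no additional scaling constant, the 3PL model can be written as follows. The indices are notation added here for illustration: i denotes a task, j denotes a person, and X_ij = 1 means that person j solved task i correctly.

```latex
P(X_{ij} = 1 \mid \theta_j, \alpha_i, \beta_i, \gamma_i)
  = \gamma_i + (1 - \gamma_i)\,
    \frac{1}{1 + \exp\bigl(-\alpha_i(\theta_j - \beta_i)\bigr)}
```

The guessing parameter 𝛾 sets the lower bound of the probability, the difficulty 𝛽 shifts the curve along the ability scale, and the discrimination 𝛼 controls its steepness. With 𝛼 = 1 and 𝛾 = 0 for all tasks, the expression reduces to the Rasch model.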
The 3PL model is shown as a probabilistic graphical model below, where circles represent latent, unobserved variables for persons and items, and the square represents the observed solution to a given task.
By taking three item characteristics into account, logical ability can be estimated with higher precision than in classical tests. In addition, the adaptive format improves the efficiency of the test session, since the tasks are chosen to match the estimated ability as closely as possible.
A common issue in machine learning is the trade-off between bias and variance in models. The 3PL model is complex because it has a large number of parameters relative to the number of observations, which increases the variance of the model and therefore the risk of overfitting the data. To control this, a Bayesian parameter estimation procedure is applied, with priors that regularize the model towards reasonable values (the Rasch model, in which all tasks share the same discrimination and there is no guessing). The method is implemented in the probabilistic programming framework PyMC3 (Salvatier, Wiecki & Fonnesbeck, 2016) and was inspired by Luo and Jiao (2018).
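The regularizing effect of such priors can be illustrated with a minimal NumPy sketch of an unnormalized log posterior. This is not the PyMC3 implementation used in the test. The prior families and their settings (a lognormal prior on 𝛼 centered at 1, a standard normal prior on 𝛽 and 𝜃, and a Beta(1, 9) prior on 𝛾) are assumptions chosen for illustration only, and the function names are hypothetical.

```python
import numpy as np

def log_prior(alpha, beta, gamma, theta):
    """Unnormalized log prior; all prior choices are illustrative assumptions."""
    alpha, beta, gamma, theta = (
        np.asarray(v, dtype=float) for v in (alpha, beta, gamma, theta)
    )
    # Lognormal(0, 0.5) on alpha: median 1, pulls discrimination towards 1
    lp = np.sum(-0.5 * (np.log(alpha) / 0.5) ** 2 - np.log(alpha))
    # Standard normal on difficulty and on ability
    lp += np.sum(-0.5 * beta ** 2) + np.sum(-0.5 * theta ** 2)
    # Beta(1, 9) on gamma: density is highest at 0, pulls guessing towards 0
    lp += np.sum(8.0 * np.log1p(-gamma))
    return float(lp)

def log_likelihood(responses, alpha, beta, gamma, theta):
    """3PL log likelihood; responses has shape (n_persons, n_items), entries 0/1."""
    x = np.asarray(responses, dtype=float)
    alpha, beta, gamma, theta = (
        np.asarray(v, dtype=float) for v in (alpha, beta, gamma, theta)
    )
    logistic = 1.0 / (1.0 + np.exp(-alpha * (theta[:, None] - beta)))
    p = gamma + (1.0 - gamma) * logistic
    return float(np.sum(x * np.log(p) + (1.0 - x) * np.log1p(-p)))

def log_posterior(responses, alpha, beta, gamma, theta):
    """Unnormalized log posterior: log likelihood plus log prior."""
    return (log_likelihood(responses, alpha, beta, gamma, theta)
            + log_prior(alpha, beta, gamma, theta))
```

Under these assumed priors, parameter values close to the Rasch model (𝛼 = 1, 𝛾 = 0) receive a higher prior density than extreme values, so the data must provide clear evidence before the estimates move far away from them.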
The algorithm for estimating logical ability is called Expected A Posteriori (de Ayala, 2008), and it is based on Bayes’ Theorem. The input is a prior distribution, which defines the properties of the ability scale, and the likelihood function, which is given by the formula above. The output is the expected ability, that is, the mean of the posterior distribution.
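The Expected A Posteriori computation can be sketched with a simple grid approximation. This is a minimal illustration, not the production implementation. The standard normal prior, the grid from -4 to 4, and the function names are assumptions, and the 3PL likelihood is used in its standard form without a scaling constant.

```python
import numpy as np

def p_correct(theta, alpha, beta, gamma):
    """Standard 3PL probability of a correct response."""
    return gamma + (1.0 - gamma) / (1.0 + np.exp(-alpha * (theta - beta)))

def eap_estimate(responses, alpha, beta, gamma, grid=None):
    """Return (posterior mean, posterior sd) of ability on a grid.

    responses: 0/1 outcomes for the administered tasks.
    alpha, beta, gamma: parameters of those tasks (same length as responses).
    """
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 161)  # assumed ability range
    x = np.asarray(responses, dtype=float)[:, None]
    alpha, beta, gamma = (
        np.asarray(v, dtype=float)[:, None] for v in (alpha, beta, gamma)
    )
    # Assumed prior: standard normal, left unnormalized
    prior = np.exp(-0.5 * grid ** 2)
    # p[i, q]: probability of solving task i at grid point q
    p = p_correct(grid[None, :], alpha, beta, gamma)
    # Likelihood of the observed response pattern at each grid point
    likelihood = np.prod(np.where(x == 1, p, 1.0 - p), axis=0)
    # Bayes' Theorem: posterior is proportional to prior times likelihood
    posterior = prior * likelihood
    posterior /= posterior.sum()
    mean = float(np.sum(grid * posterior))
    sd = float(np.sqrt(np.sum((grid - mean) ** 2 * posterior)))
    return mean, sd

# Usage with three hypothetical tasks: two solved, one failed
theta_hat, theta_sd = eap_estimate(
    responses=[1, 1, 0],
    alpha=[1.0, 1.2, 0.8],
    beta=[-0.5, 0.0, 0.5],
    gamma=[0.2, 0.2, 0.2],
)
```

The posterior standard deviation is returned alongside the mean because it quantifies how uncertain the ability estimate still is after the responses observed so far.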
The estimation of logical ability is updated after each task in the test. In parallel, the next task is selected based on the maximum information criterion (posterior weighted maximum information). In theory, this makes sure that the tasks that are administered provide the most information about the individual’s ability. In practice, it means that each individual gets tasks that are challenging, but not impossible, given their ability.
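Posterior weighted maximum information can be sketched as follows: the information each remaining task provides is averaged over the current posterior distribution of ability, and the task with the highest average is administered next. This is a minimal illustration under assumptions. It uses the standard 3PL item information function and a discretized posterior on a grid, and the function names and the example item bank are hypothetical.

```python
import numpy as np

def p_correct(theta, alpha, beta, gamma):
    """Standard 3PL probability of a correct response."""
    return gamma + (1.0 - gamma) / (1.0 + np.exp(-alpha * (theta - beta)))

def item_information(theta, alpha, beta, gamma):
    """Standard 3PL item information function."""
    p = p_correct(theta, alpha, beta, gamma)
    return alpha ** 2 * ((p - gamma) / (1.0 - gamma)) ** 2 * (1.0 - p) / p

def select_next_item(posterior, grid, alpha, beta, gamma, administered=()):
    """Return the index of the unused task with the highest
    posterior weighted information.

    posterior: probabilities over grid (summing to 1).
    administered: indices of tasks that have already been used.
    """
    posterior = np.asarray(posterior, dtype=float)
    grid = np.asarray(grid, dtype=float)
    alpha, beta, gamma = (
        np.asarray(v, dtype=float)[:, None] for v in (alpha, beta, gamma)
    )
    # info[i, q]: information of task i at grid point q
    info = item_information(grid[None, :], alpha, beta, gamma)
    # Average information of each task under the current posterior
    weighted = info @ posterior
    # Exclude tasks that were already administered
    for i in administered:
        weighted[i] = -np.inf
    return int(np.argmax(weighted))

# Usage: a posterior centered at 0 and a hypothetical bank of three tasks
grid = np.linspace(-4.0, 4.0, 161)
posterior = np.exp(-0.5 * grid ** 2)
posterior /= posterior.sum()
next_item = select_next_item(
    posterior, grid,
    alpha=[1.0, 1.0, 1.0],
    beta=[-3.0, 0.0, 3.0],
    gamma=[0.2, 0.2, 0.2],
)
```

With equal discrimination and guessing, the task whose difficulty lies closest to the bulk of the posterior carries the most information, which is why the selected tasks tend to be challenging but not impossible for the individual.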
Birnbaum, A. (1968). Some Latent Trait Models and Their Use in Inferring an Examinee’s Ability. In F. Lord & M. Novick (Eds.), Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, MA.
de Ayala, R. J. (2008). The Theory and Practice of Item Response Theory. Guilford Press, New York, US.
Luo, Y. & Jiao, H. (2018). Using the Stan Program for Bayesian Item Response Theory. Educational and Psychological Measurement, 78(3), 384-408. DOI:10.1177/0013164417693666
Salvatier, J., Wiecki, T. V. & Fonnesbeck, C. (2016). Probabilistic Programming in Python Using PyMC3. PeerJ Computer Science, 2, e55. DOI:10.7717/peerj-cs.55