A major difference between Item Response Theory (IRT) and Classical Test Theory (CTT) lies in how individual questions (or 'items') are treated. In CTT, question responses are simply summed to form a test score, which implicitly assumes that all questions in a test have the same characteristics. IRT, by contrast, acknowledges that questions vary in difficulty and in how much information they carry.

Take, for example, the two statements 'Most people I know like me' and 'I like to have a lot of people around me', both of which belong to the facet Sociability. Our analyses show that they differ in difficulty: it is 'harder' to agree with the second statement than with the first. Put another way, you need to be more Sociable to agree with the second statement than with the first. CTT would ignore this difference, while IRT models it explicitly, which improves the precision of the test.
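
To make this concrete, here is a minimal sketch of a two-parameter logistic (2PL) item response function. The slopes and difficulties below are hypothetical values chosen for illustration, not the estimates from our test, and `p_agree` is an illustrative helper rather than part of our codebase.

```python
import numpy as np

def p_agree(theta, a, b):
    """2PL item response function: probability of agreeing with an
    item, given trait level theta, slope a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical parameters: the second statement is 'harder' (higher b),
# so agreeing with it requires a higher level of Sociability.
items = {
    "Most people I know like me":               {"a": 1.2, "b": -0.5},
    "I like to have a lot of people around me": {"a": 1.2, "b": 0.8},
}

for theta in (-1.0, 0.0, 1.0):  # low, average, and high Sociability
    probs = [p_agree(theta, **params) for params in items.values()]
    print(f"theta={theta:+.1f}: "
          + ", ".join(f"P(agree)={p:.2f}" for p in probs))
```

At every trait level, the probability of agreement is lower for the second statement, which is exactly what the difficulty difference describes.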

In the model underlying our personality test, each question has two parameters: the location (or difficulty) and the slope. These parameters are estimated using state-of-the-art methods for Bayesian inference, namely Markov chain Monte Carlo sampling with the NUTS sampler as implemented in the Python package PyMC3. The process was greatly inspired by Luo and Jiao (2017), *Using the Stan Program for Bayesian Item Response Theory*.
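
As an illustration, here is a minimal PyMC3 sketch of such a two-parameter model for binary (agree/disagree) responses. The toy data, prior values, and variable names are assumptions made for the example; they are not our production code or our actual priors.

```python
import numpy as np
import pymc3 as pm

# Toy data: responses[p, i] = 1 if person p agreed with item i.
# In practice this would be the standardization sample.
n_persons, n_items = 100, 10
rng = np.random.default_rng(42)
responses = rng.integers(0, 2, size=(n_persons, n_items))

with pm.Model() as irt_2pl:
    # Latent trait level for each person, fixed to a standard
    # normal scale to identify the model.
    theta = pm.Normal("theta", mu=0.0, sigma=1.0, shape=n_persons)

    # Informative priors keep the item parameters in a plausible range:
    # locations near zero, slopes positive and moderate.
    b = pm.Normal("b", mu=0.0, sigma=1.0, shape=n_items)      # location
    a = pm.Lognormal("a", mu=0.0, sigma=0.5, shape=n_items)   # slope

    # 2PL response probability for every person-item pair.
    p = pm.math.sigmoid(a * (theta[:, None] - b))
    pm.Bernoulli("obs", p=p, observed=responses)

    # NUTS is PyMC3's default sampler for continuous parameters.
    trace = pm.sample(1000, tune=1000, target_accept=0.9, random_seed=42)
```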

The standardization sample consisted of 659 observations. Informative priors, like those sketched above, were used to adjust for bias in the sample and to regularize the question parameters toward reasonable values.