In classical test theory, the *reliability* of a test is a measure of the consistency of the scores resulting from the test. According to the Wikipedia article on reliability, 'A measure is said to have a high reliability if it produces similar results under consistent conditions'.

Reliability is expressed as a coefficient which ranges from zero to one. The closer the reliability coefficient is to 1.00, the more reliable the test and the less measurement error there is associated with test scores. No test has a reliability of 1.00, however.

The most common type of test reliability reported by test developers is *Internal Consistency*, which concerns the consistency of scores across the items within one test or test scale. This is often estimated using Cronbach's alpha.
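As a rough sketch, Cronbach's alpha can be computed directly from a respondents-by-items matrix of item scores; the response data below is made up purely for illustration:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                              # number of items
    item_variances = x.var(axis=0, ddof=1)      # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# hypothetical item-response matrix: 5 respondents, 3 items
responses = np.array([
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 4, 3],
])
alpha = cronbach_alpha(responses)
```

Because these made-up items track each other closely, alpha comes out high; with uncorrelated items it would approach zero.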

Another type of reliability is the temporal stability, or *test-retest reliability* of the test scores. A common way to investigate this is to administer the test of interest to a group of people twice and calculate the correlation between the scores.

The time between the sessions should be short enough for the underlying trait to remain unchanged, but long enough for participants not to remember their exact responses to the questions. A two-week interval is common practice.
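The procedure above amounts to a single Pearson correlation. A minimal sketch, with hypothetical scores from two sessions two weeks apart:

```python
import numpy as np

# hypothetical scores for six respondents, tested twice, two weeks apart
session_1 = np.array([12, 15, 9, 20, 17, 11])
session_2 = np.array([13, 14, 10, 19, 18, 12])

# the Pearson correlation between the two sessions is the
# test-retest reliability estimate
r_test_retest = np.corrcoef(session_1, session_2)[0, 1]
```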

Closely related to the concept of reliability is the *Standard Error of Measurement* (SEM; not to be confused with the standard error of the mean). This is a measure of the uncertainty in a test score. Higher reliability means lower uncertainty, and therefore a lower SEM.

SEM can be used to create a confidence interval around the observed score of a test. Given a respondent's theoretical true score on a test, the confidence interval tells us the range in which the observed score is likely to fall. Approximately 68% of the time, the observed score will lie within ±1.0 SEM of the true score; approximately 95% of the time, it will lie within ±1.96 SEM.
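For concreteness, here is the 95% interval worked through with made-up numbers (a true score of 100 and an SEM of 3 points):

```python
# hypothetical values: true score 100, SEM of 3 points
sem = 3.0
true_score = 100.0

# 95% of observed scores fall within ±1.96 SEM of the true score
lower = true_score - 1.96 * sem
upper = true_score + 1.96 * sem
# interval: roughly 94.1 to 105.9
```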

The standard formula for transforming reliability to SEM is simply:

SEM = sd(x) * sqrt(1 - r)

where sd(x) is the standard deviation of the test scores, sqrt is the square root, and r is the reliability coefficient of the test.
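The formula translates directly into code. A minimal sketch, using hypothetical test scores and a hypothetical reliability estimate of 0.91:

```python
import numpy as np

def standard_error_of_measurement(scores, reliability):
    """SEM = sd(x) * sqrt(1 - r), where r is the reliability coefficient."""
    return np.std(scores, ddof=1) * np.sqrt(1 - reliability)

# hypothetical total test scores and reliability estimate
scores = np.array([85, 90, 95, 100, 105, 110, 115])
sem = standard_error_of_measurement(scores, 0.91)
```

Note that with a reliability of 0.91, sqrt(1 - r) = 0.3, so the SEM is simply 30% of the score standard deviation.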