Classical Test Theory

Classical Test Theory (CTT) is a psychometric framework for designing and interpreting tests, questionnaires, and similar instruments that measure psychological attributes. Originating in the early 20th century, CTT provides a basis for understanding the reliability and validity of test scores. It posits that an observed test score is composed of a true score and an error term.

Key Concepts

  • True Score: The true score is the expected value of a test-taker's observed scores, that is, the average score they would obtain over an infinite number of independent administrations of the same test. It is treated as the error-free score on the test and cannot be observed directly.
  • Observed Score: The observed score (X) is the score a test-taker actually obtains. In CTT it is the sum of the true score (T) and an error term (E): X = T + E.
  • Error: The error term captures random influences on the observed score, such as mood, health, and other transient conditions. It can be positive or negative and is assumed to have a mean of zero.
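
The relationship between these three quantities can be illustrated with a small simulation. The sketch below is only an illustration: the function name, the true score of 100, the error standard deviation of 5, and the choice of normally distributed errors are all assumptions made for the example. CTT itself only requires that errors average to zero.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Return observed scores (true score + random error) for repeated administrations."""
    rng = random.Random(seed)
    # Assumption: errors are normally distributed with mean zero.
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_administrations)]


# Hypothetical values chosen only for illustration.
scores = simulate_observed_scores(true_score=100.0, error_sd=5.0, n_administrations=10_000)

# Averaging over many administrations cancels the random error,
# so the mean observed score approaches the true score.
mean_observed = statistics.fmean(scores)
print(round(mean_observed, 2))
```

Any single observed score may differ from the true score by several points, but the mean over many simulated administrations lands close to 100, which is the sense in which the true score is an expected value.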


Assumptions

  • Linearity: The observed score is an additive combination of the true score and the error score (observed score = true score + error).
  • Independence: Errors are uncorrelated with true scores.
  • Random Error: Errors are random rather than systematic, with a mean of zero.
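
Taken together, these assumptions imply that observed-score variance splits cleanly into a true-score part and an error part. The short derivation below is a sketch in conventional notation, with X for the observed score, T for the true score, and E for the error; the symbols are introduced here for illustration.

```latex
\begin{align}
X &= T + E \\
\operatorname{Var}(X) &= \operatorname{Var}(T) + \operatorname{Var}(E) + 2\operatorname{Cov}(T, E) \\
                      &= \operatorname{Var}(T) + \operatorname{Var}(E)
                         \qquad \text{since } \operatorname{Cov}(T, E) = 0
\end{align}
```

The covariance term vanishes only because errors are assumed to be uncorrelated with true scores. If that assumption fails, the decomposition no longer holds in this simple form.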


Applications

  • Psychology: CTT is widely used in psychology to develop and validate instruments for measuring attributes like intelligence, personality, and attitudes.
  • Education: In educational settings, CTT is applied in the development and validation of standardized tests and assessments.
  • Medical Testing: CTT provides a framework for developing diagnostic tests in medical research, particularly in mental health assessments.

Statistical Metrics

  • Reliability: Reliability refers to the consistency of test scores across repeated administrations, parallel forms, or the items within a test. In CTT it is defined as the proportion of observed-score variance that is attributable to true-score variance. It is estimated with coefficients such as Cronbach's alpha, which measures internal consistency.
  • Validity: Validity concerns how well the test measures the attribute it is designed to measure. Commonly distinguished types include content validity, criterion validity, and construct validity.
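
As an illustration of a reliability coefficient, the sketch below computes Cronbach's alpha from a respondents-by-items score table using the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical and exist only to make the example runnable.

```python
import statistics


def cronbach_alpha(item_scores):
    """Compute Cronbach's alpha.

    item_scores: list of respondents, each a list of k item scores.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance (n - 1 denominator) of each item across respondents.
    item_variances = [
        statistics.variance([row[i] for row in item_scores]) for i in range(k)
    ]
    # Sample variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents answering 3 items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is what internal consistency means here. The coefficient says nothing about validity: a highly consistent test can still measure the wrong attribute.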


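Limitations

  • Simplistic: The model is specified at the level of the total test score, so it does not model item-level differences in difficulty or discrimination.
  • Independence Assumption: The assumption may not hold in tests that induce stress or fatigue, where errors can become correlated.
  • Unidimensional: A single total score is usually interpreted as reflecting one latent trait, which may not hold when a test taps several attributes.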

See Also