Factor Analysis

Factor analysis is a statistical technique used to reduce dimensionality and to identify underlying relationships among observed variables. By analyzing the correlations or covariances among a large number of variables, it aims to uncover a smaller number of unobserved variables, known as factors, that account for the observed correlations.

History

Factor analysis was developed in the early 20th century in the field of psychology, initially by Charles Spearman, who used the technique to model intelligence as a single general factor. Later researchers refined the methods and broadened their applications, among them Thurstone (multiple-factor analysis), Kaiser (the varimax rotation and the eigenvalue-greater-than-one retention rule), and Cattell (the scree test).

Types

Exploratory Factor Analysis (EFA): EFA is used when the researcher has no firm hypothesis about the structure or the number of underlying factors. It leaves the model specification flexible, letting every observed variable relate to every factor, and is often used in the early stages of research.
Confirmatory Factor Analysis (CFA): CFA is used when the researcher has a hypothesized factor structure based on existing theory or previous studies. It tests how well the hypothesized model fits the data.
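
The distinction between the two types can be written in terms of the common factor model. The notation below is an assumption introduced for illustration: x for the vector of observed variables, f for the factors, Λ for the matrix of loadings (the weights relating variables to factors), Φ for the factor covariance matrix, and Ψ for the diagonal matrix of unique variances. The six-variable, two-factor CFA pattern is a hypothetical example.

```latex
% Common factor model and the covariance structure it implies
x = \mu + \Lambda f + \varepsilon,
\qquad
\operatorname{Cov}(f) = \Phi,
\qquad
\operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}

\Sigma = \Lambda \Phi \Lambda^{\top} + \Psi

% EFA: every entry of \Lambda is estimated freely.
% CFA: theory fixes selected loadings to zero, for example
\Lambda_{\text{CFA}} =
\begin{pmatrix}
\lambda_{11} & 0 \\
\lambda_{21} & 0 \\
\lambda_{31} & 0 \\
0 & \lambda_{42} \\
0 & \lambda_{52} \\
0 & \lambda_{62}
\end{pmatrix}
```

In this sketch, CFA asks whether a covariance matrix Σ built from the restricted Λ reproduces the sample covariances closely enough, whereas EFA estimates an unrestricted Λ and leaves the pattern to be interpreted afterwards.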

Key Concepts

Factors and Loadings: A factor is a latent variable that explains the variation and covariation among observed variables. A factor loading indicates the strength of the relationship between an observed variable and a factor; for standardized variables and uncorrelated factors, it equals the correlation between the two.
Eigenvalues and Scree Plots: An eigenvalue represents the amount of variance in the observed variables explained by a factor. A scree plot shows the eigenvalues in descending order, and the point where the curve levels off (the "elbow") helps determine how many factors to retain.
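
The concepts above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the data are simulated from a hypothetical two-factor model, the retention rule is the eigenvalue-greater-than-one criterion, and the loadings are a principal-component approximation (eigenvector scaled by the square root of its eigenvalue) rather than a maximum likelihood or principal-axis solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 6 observed variables driven by 2 latent factors.
n_samples = 5000
true_loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.6],
])
factors = rng.standard_normal((n_samples, 2))

# Unique (noise) standard deviations chosen so each variable has unit variance.
unique_sd = np.sqrt(1.0 - np.sum(true_loadings**2, axis=1))
noise = rng.standard_normal((n_samples, 6)) * unique_sd
X = factors @ true_loadings.T + noise

# Eigen-decomposition of the sample correlation matrix, largest eigenvalue first.
R = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Retain factors whose eigenvalue exceeds 1 (one common rule of thumb).
n_retained = int(np.sum(eigenvalues > 1.0))

# Approximate loadings: each retained eigenvector scaled by sqrt(eigenvalue).
loadings = eigenvectors[:, :n_retained] * np.sqrt(eigenvalues[:n_retained])

# Share of total variance attributed to each component.
explained = eigenvalues / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 3))
print("factors retained:", n_retained)
print("variance share:", np.round(explained, 3))
```

Plotting `eigenvalues` against their rank gives the scree plot. The signs of the columns of `loadings` are arbitrary, and unrotated loadings need not match `true_loadings` entry by entry, so only the magnitudes and overall pattern should be interpreted.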

Applications

Psychology: Identifying underlying dimensions of personality, intelligence, or attitudes.
Marketing: Customer segmentation and product positioning.
Finance: Portfolio construction based on underlying risk factors.
Health: Identifying groups of related symptoms or behaviors that indicate specific health conditions.

Software

SPSS
SAS
R packages such as psych and lavaan
Mplus

Limitations

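Sensitivity to Assumptions: The model assumes linear relationships between the observed variables and the factors, and maximum likelihood estimation additionally assumes multivariate normality. Results can be misleading when these assumptions do not hold.
Subjectivity: Interpreting and naming factors relies on the analyst's judgment, so different researchers may read the same solution differently.
Overfitting: Extracted factors may reflect sampling noise rather than meaningful structure, especially in small samples.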
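
The overfitting risk can be illustrated with a short simulation. The sketch below is an assumption-laden example, not a prescribed procedure: it generates uncorrelated noise with no factor structure at all and compares the largest eigenvalue of the sample correlation matrix for a small and a large sample. The function name and the sample sizes are hypothetical.

```python
import numpy as np


def largest_noise_eigenvalue(n_samples, n_variables, seed):
    """Largest eigenvalue of the correlation matrix of pure-noise data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_variables))
    R = np.corrcoef(X, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order, so the last is the largest.
    return float(np.linalg.eigvalsh(R)[-1])


# Ten unrelated variables, observed in a small and in a large sample.
small = largest_noise_eigenvalue(n_samples=30, n_variables=10, seed=0)
large = largest_noise_eigenvalue(n_samples=3000, n_variables=10, seed=0)

print(f"largest eigenvalue, n=30:   {small:.2f}")
print(f"largest eigenvalue, n=3000: {large:.2f}")
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the largest one is never below 1 even for pure noise, and in small samples it tends to sit well above 1. A rule that retains every factor with an eigenvalue above 1 would therefore keep factors from data that contain none, which is one reason to compare observed eigenvalues against those obtained from random data.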