My advisor told me to use principal components analysis to examine the structure of my items and compute scale scores, but I was taught not to use it because it is not a “true” factor analysis. Help!

Help, indeed. This issue has been a source of both confusion and contention for more than 75 years, and papers on the topic were still being published as recently as a few years ago. A thorough discussion of principal components analysis (PCA) and the closely related methods of exploratory factor analysis (EFA) would require pages of text and dozens of equations; here we will attempt to present a more succinct and admittedly colloquial description of the key issues at hand. We can begin by considering the nature of composites.

Say that you were interested in obtaining scores on negative affect (e.g., sadness, depression, anxiety) and you collected data from a sample of individuals who responded to 12 items assessing various types of mood and behavior (e.g., sometimes I feel lonely, I often have trouble sleeping, I feel nervous for no apparent reason, etc.). The simplest way to obtain a composite scale score would be to compute a mean of the 12 items for each person to represent their overall level of negative affect. This is often called an unweighted linear composite because all items contribute equally and additively to the scale score: that is, you simply add them all up and divide by 12. This approach is widely used in nearly all areas of social science research.
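To make the unweighted composite concrete, here is a minimal sketch. The three respondents and their 1-to-5 item responses are made up purely for illustration; nothing here comes from real data.

```python
# Hypothetical responses: 3 people x 12 items, each item scored 1 (never) to 5 (always).
responses = [
    [1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1],
    [4, 5, 4, 4, 5, 4, 2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4, 5, 4, 5, 5, 4, 5],
]


def unweighted_composite(items):
    """Mean of the item responses: every item gets the same weight, 1 / len(items)."""
    return sum(items) / len(items)


# One scale score per person
scale_scores = [unweighted_composite(person) for person in responses]
print(scale_scores)
```

Every item counts equally here, which is exactly what "unweighted" means: the implicit weight on each item is 1/12.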

However, now imagine that you could compute more than one composite from the set of 12 items. For example, you might not believe a single overall composite of negative affect exists, but that there is one composite that primarily reflects depression and another that primarily reflects anxiety. This is initially very strange to think about because you want to obtain different composites from the same 12 items. The key is to differentially weight the items for each composite you compute. You might use larger weights for the first six items and smaller weights for the second six items to obtain the first composite, and then use smaller weights for the first six items and larger weights for the second six items to obtain the second composite. Now instead of having a single overall composite of the 12 items assessing negative affect, you have one composite that you might choose to label depression and a second composite that you might choose to label anxiety, and both were based on differential weighting of the same 12 items. This is the core of PCA.
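The idea of two differently weighted composites from the same items can be sketched as follows. The responses and the weights (0.8 and 0.2) are hand-picked assumptions chosen only to illustrate the logic; in an actual PCA the weights are derived from the data rather than chosen by the analyst.

```python
# Hypothetical responses: 3 people x 12 items, each item scored 1 to 5.
responses = [
    [1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1],
    [4, 5, 4, 4, 5, 4, 2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4, 5, 4, 5, 5, 4, 5],
]

# Illustrative weights only: large on the first six items for one composite,
# large on the second six items for the other.
depression_weights = [0.8] * 6 + [0.2] * 6
anxiety_weights = [0.2] * 6 + [0.8] * 6


def weighted_composite(items, weights):
    """Weighted sum of the items: each item contributes in proportion to its weight."""
    return sum(w * x for w, x in zip(weights, items))


depression_scores = [weighted_composite(p, depression_weights) for p in responses]
anxiety_scores = [weighted_composite(p, anxiety_weights) for p in responses]

for dep, anx in zip(depression_scores, anxiety_scores):
    print(round(dep, 2), round(anx, 2))
```

The second person scores higher on the composite we chose to call depression and the third person scores higher on the one we chose to call anxiety, even though both composites use all 12 items.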

PCA in its modern form dates back to the 1930s, when Harold Hotelling proposed it as a data reduction method (closely related ideas appear in earlier work by Karl Pearson in 1901). His primary motivation was to take a larger amount of information and reduce it to a smaller amount of information by computing a set of weighted linear composites. The goal was for the composites to reflect most, though not all, of the original information. He accomplished this through the use of the eigenvalues and eigenvectors associated with the correlation matrix of the full set of items. Eigenvalues represent the variance associated with each composite, and eigenvectors represent the weights used to compute each composite. In our example, the first two eigenvalues would represent the variances of the depression and anxiety composites, and the eigenvectors or weights would tell us how much each item contributes to each composite. It is possible to compute as many composites as items (so we could compute 12 composites based on our 12 items) but this would accomplish nothing in terms of data reduction because we would simply be exchanging 12 items for 12 composites. Instead, we want to compute a much smaller number of composites than items that represent most but not all of the observed variance (so we might exchange 12 items for two or three composites). The cost of this reduction is some loss of information, but the gain is being able to work with a smaller number of composites relative to the original set of items.
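As a rough sketch of the eigenvalue and eigenvector logic, the following uses NumPy on simulated data. The data-generating setup (500 people, two correlated clusters of six items, the 0.7 coefficients) is an assumption made up for this illustration and does not correspond to any real scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate hypothetical data: 500 people, 12 items, two correlated clusters of 6 items.
n_people, n_items = 500, 12
dep = rng.normal(size=n_people)
anx = 0.5 * dep + np.sqrt(1 - 0.25) * rng.normal(size=n_people)
signal = np.column_stack([dep] * 6 + [anx] * 6)
items = 0.7 * signal + 0.7 * rng.normal(size=(n_people, n_items))

# Correlation matrix of the full set of items
R = np.corrcoef(items, rowvar=False)

# eigh is appropriate for symmetric matrices; it returns eigenvalues in ascending
# order, so we re-sort from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Component scores: standardized items weighted by the first two eigenvectors
Z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
scores = Z @ eigenvectors[:, :2]

print("sum of all eigenvalues:", eigenvalues.sum())
print("first two eigenvalues:", eigenvalues[:2])
print("variances of the two composites:", scores.var(axis=0, ddof=1))
print("proportion of variance retained:", eigenvalues[:2].sum() / n_items)
```

With a correlation matrix, the eigenvalues sum to the number of items, and the variance of each composite equals its eigenvalue. The proportion printed on the last line is the share of the total observed variance carried by the two retained composites; the remainder is the information given up in the reduction.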

There are many heuristics used to determine the “optimal” number of composites to extract from a set of items. Methods include the Kaiser-Guttman rule (retain components with eigenvalues greater than one), looking for the “bend” in a scree plot of eigenvalues, parallel analysis, and evaluating the incremental variance associated with each extracted component. There are also many methods of “rotation” that allow us to transform the item weights in particular ways to make the underlying components more interpretable (helping us “name” the components). For example, if the first six items assessed things like sadness and loneliness and had large weights on the first component but smaller weights on the second, we might choose to name the first component “depression”, and so on. Often, the end goal is to obtain conceptually meaningful weighted composite scores for later analyses.
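Two of these heuristics can be sketched in a few lines. The simulated data, the 200 random replications, and the 95th-percentile cutoff are all assumptions chosen for illustration; other cutoffs (such as the mean of the random eigenvalues) are also used in practice.

```python
import numpy as np

rng = np.random.default_rng(1)


def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix of `data`, largest first."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]


# Simulate hypothetical data: 500 people, 12 items, two correlated clusters of 6 items.
n_people, n_items = 500, 12
dep = rng.normal(size=n_people)
anx = 0.5 * dep + np.sqrt(1 - 0.25) * rng.normal(size=n_people)
signal = np.column_stack([dep] * 6 + [anx] * 6)
items = 0.7 * signal + 0.7 * rng.normal(size=(n_people, n_items))

observed = sorted_eigenvalues(items)

# Kaiser-Guttman rule: keep components whose eigenvalue exceeds 1.
n_kaiser = int((observed > 1.0).sum())

# Parallel analysis: compare each observed eigenvalue with the eigenvalues obtained
# from random, uncorrelated data of the same size.
n_sims = 200
random_eigs = np.array(
    [sorted_eigenvalues(rng.normal(size=items.shape)) for _ in range(n_sims)]
)
threshold = np.percentile(random_eigs, 95, axis=0)

# Retain components up to (not including) the first one that fails the comparison.
exceeds = observed > threshold
n_parallel = len(observed) if exceeds.all() else int(np.argmin(exceeds))

print("Kaiser-Guttman suggests:", n_kaiser)
print("Parallel analysis suggests:", n_parallel)
```

The two rules need not agree on real data; the Kaiser-Guttman rule in particular is often reported to retain too many components, which is one reason parallel analysis is commonly run alongside it.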

Although Hotelling developed PCA strictly as a method of data reduction and composite scoring (indeed, he never even discussed rotation because he was not interested in interpreting individual items), over time this method came to be associated with a broader class of models called exploratory factor analysis, or EFA. The goals of EFA are often very similar to those of PCA and might include scale development, understanding the psychometric structure underlying a set of items, obtaining scale scores for later analysis, or all three. There are many steps in EFA that overlap with those of PCA, including identifying the optimal number of factors to extract; how to rescale (or “rotate”) the factor loadings to enhance interpretation; how to “name” the factors based on what items are weighted more vs. less; and how to compute optimal scores. Given these similarities, there has long been contention about whether PCA is a formal member of the EFA family, or if PCA is not a “true” factor model but instead something distinctly different.

Contention on this point centers on a key defining feature of PCA: it assumes that all items are measured without error and all observed variance is available for potential factoring. When fewer composites are taken than the number of items, some residual variance in the items will be left over, but this is still considered “true” variance and not measurement error. In contrast, EFA explicitly assumes that the item responses may be, and indeed very likely are, characterized by measurement error. As such, whereas PCA expresses the components as a direct function of the items (that is, the items induce the components), EFA conceptually reverses this relation and instead expresses the items as a function of the underlying latent factors. The factors are “latent” in the sense that we believe them to exist but they are not directly observed, and our motivating goal is to infer their existence based on what we did observe: namely, the items.
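The reversal of direction can be written compactly. The notation below is our own shorthand and is an assumption, not something taken from a particular text: x_j is item j, c_k is component k with weights w_jk, eta_k is latent factor k with loadings lambda_jk, and epsilon_j is the part of item j not explained by the factors.

```latex
% PCA: each component is a weighted sum of the observed items
c_{k} = \sum_{j=1}^{p} w_{jk}\, x_{j}

% EFA: each item is a weighted sum of the latent factors plus a residual term
x_{j} = \sum_{k=1}^{m} \lambda_{jk}\, \eta_{k} + \varepsilon_{j}

% EFA implies this structure for the item covariance matrix, where \Phi holds the
% factor variances and covariances and \Psi is the diagonal matrix of residual variances
\Sigma = \Lambda \Phi \Lambda^{\top} + \Psi
```

In the first expression the items sit on the right-hand side and there is no residual term; in the second the items sit on the left-hand side and each one carries its own residual.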

Of critical importance is that, unlike PCA, EFA assumes that only part of the observed item variance is true score variance and the remaining part is explicitly defined as measurement error. Although this assumption allows the model to more accurately reflect what we believe to exist in the population (we nearly always recognize there is the potential for measurement error in our obtained items), this also creates a significant challenge in model estimation because the measurement error variances are additional parameters that must be estimated from the data. Whereas PCA can be computed directly from our observed sample data, EFA requires us to move to more advanced methods that allow us to obtain optimal estimates of population parameters via iterative estimation. There are many methods of estimation that can be used in EFA (e.g., unweighted least squares, generalized least squares, maximum likelihood), each of which has certain advantages and disadvantages. In general, maximum likelihood is often viewed as the “gold standard” method of estimation in most research applications.
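To give a feel for what "iterative estimation" involves, here is a minimal sketch of iterated principal axis factoring. Please note that this is a swapped-in technique chosen because it fits in a few lines; it is not maximum likelihood and not one of the estimators named above. The function name, the population loadings (0.8 and 0.6), and the convergence settings are all assumptions for illustration.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix R (illustrative)."""
    # Starting communalities: squared multiple correlation of each item with the rest
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        # Replace the 1s on the diagonal with the current communality estimates,
        # so that only the shared variance is factored.
        reduced = R.copy()
        np.fill_diagonal(reduced, communalities)
        eigenvalues, eigenvectors = np.linalg.eigh(reduced)
        top = np.argsort(eigenvalues)[::-1][:n_factors]
        loadings = eigenvectors[:, top] * np.sqrt(np.clip(eigenvalues[top], 0.0, None))
        updated = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(updated - communalities)) < tol
        communalities = updated
        if converged:
            break
    uniquenesses = 1.0 - communalities  # estimated residual (error) variances
    return loadings, uniquenesses


# Hypothetical population: 6 items load 0.8 on factor 1, 6 items load 0.6 on factor 2.
true_loadings = np.zeros((12, 2))
true_loadings[:6, 0] = 0.8
true_loadings[6:, 1] = 0.6
true_uniquenesses = 1.0 - (true_loadings ** 2).sum(axis=1)
R = true_loadings @ true_loadings.T + np.diag(true_uniquenesses)

efa_loadings, efa_uniquenesses = principal_axis_factoring(R, n_factors=2)

# For comparison: PCA "loadings" computed in one step from the unaltered matrix
eigenvalues, eigenvectors = np.linalg.eigh(R)
top = np.argsort(eigenvalues)[::-1][:2]
pca_loadings = eigenvectors[:, top] * np.sqrt(eigenvalues[top])

print("EFA loading, item 1:", abs(efa_loadings[0, 0]))
print("PCA loading, item 1:", abs(pca_loadings[0, 0]))
print("EFA residual variance, item 1:", efa_uniquenesses[0])
```

In this constructed example the iterative procedure recovers the loading of 0.8 and a residual variance of 0.36, whereas the one-step PCA loading comes out larger because the residual variance is treated as variance to be factored. The sign of a column of loadings is arbitrary, which is why absolute values are printed.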

We can think about four key issues that ultimately distinguish PCA from EFA:

  1. The theoretical model is formative in PCA and reflective in EFA. In other words, the composites are viewed as a function of the items in PCA, but the items are viewed as a function of the latent factors in EFA.
  2. PCA assumes all observed variance among a set of items is available for factoring, whereas EFA assumes only a subset of the observed variance among a set of items is available for factoring. This implies that PCA assumes no measurement error while EFA explicitly incorporates measurement error into the model.
  3. Although both PCA and EFA allow for the creation of weighted composites of items, in PCA these are direct linear combinations of items whereas in EFA these are model-implied estimates (or predicted values) of the underlying latent factors. As such, in PCA there is only a single method for computing composites, but in EFA there are many (e.g., regression, Bartlett, constrained covariance, etc.), all of which can differ slightly from one to the other.
  4. Finally, the confusion between PCA and EFA is exacerbated by the fact that in nearly all major software packages PCA is available as part of the “factor analysis” estimation procedures (e.g., in SAS PROC FACTOR a PCA is defined using “method=principal” but an EFA is defined using “method=ML”).
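The third point, that EFA offers more than one way to compute scores, can be illustrated with a sketch of the regression and Bartlett methods for uncorrelated factors. The loadings, sample size, and simulated data are assumptions, and for simplicity the known population loadings are plugged in where a real analysis would use estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical loadings: 12 items, 2 uncorrelated factors, small cross-loadings.
loadings = np.array([[0.7, 0.1]] * 6 + [[0.1, 0.7]] * 6)
uniquenesses = 1.0 - (loadings ** 2).sum(axis=1)  # residual variance of each item

# Simulate item responses from the factor model: items = factors * loadings + error
n_people = 2000
factors = rng.normal(size=(n_people, 2))
errors = rng.normal(size=(n_people, 12)) * np.sqrt(uniquenesses)
items = factors @ loadings.T + errors
Z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)

# Regression method: weights = inverse(model-implied correlation matrix) @ loadings
implied_R = loadings @ loadings.T + np.diag(uniquenesses)
regression_weights = np.linalg.solve(implied_R, loadings)
regression_scores = Z @ regression_weights

# Bartlett method: items are weighted by the inverse of their residual variances
psi_inv_lambda = loadings / uniquenesses[:, None]
bartlett_weights = psi_inv_lambda @ np.linalg.inv(loadings.T @ psi_inv_lambda)
bartlett_scores = Z @ bartlett_weights

# The two sets of scores order people almost identically but differ in scale.
r = np.corrcoef(regression_scores[:, 0], bartlett_scores[:, 0])[0, 1]
print("correlation between methods (factor 1):", r)
print("variance, regression scores:", regression_scores.var(axis=0, ddof=1))
print("variance, Bartlett scores:", bartlett_scores.var(axis=0, ddof=1))
```

Both sets of scores are estimates of the same latent factors, and in this setup they are very highly correlated. They differ mainly in scale: the regression scores are pulled toward the mean and so have the smaller variance, while the Bartlett scores have the larger variance.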

It is difficult to draw firm guidelines for when and if to use PCA in practice. It depends on the underlying theory, the characteristics of the sample, and the goals of the analysis. In most social science applications, particularly those focused on the measurement of psychological constructs, it is often best to use EFA because this better represents what we believe to hold in the population. However, if EFA is not possible due to estimation problems, or if there is an exceedingly large number of items under study, then PCA is a viable alternative. Interestingly, PCA has recently begun to make a comeback within psychology given increased interest in machine learning. It is not uncommon for PCA to be applied to 50 or 100 variables in order to distill them down to a smaller number of composites to be used in subsequent analysis.

Our general recommendation is to consider EFA estimated using ML as your first option, both for model fitting and score estimation. This is because, far more often than not, the EFA model better represents the mechanism we believe to have given rise to the observed data; namely, a process that combines both true underlying construct variation and random measurement error. However, if the EFA is not viable for some reason, then PCA is a perfectly defensible option as long as the omission of measurement error is clearly recognized. Finally, all of the above relates to the exploratory factor analysis model in which all items load on all underlying factors. In contrast, the confirmatory factor analysis (CFA) model allows for a priori tests of measurement structure based on theory. If there is a stronger underlying theoretical model under consideration, then CFA is often a better option. We discuss the CFA model in detail in our free three-day workshop, Introduction to Structural Equation Modeling.

Below are a few readings that might be of use.

Brown, T. A. (2015). Confirmatory factor analysis for applied research. Guilford Publications.

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286-299.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.

Widaman, K. F. (1993). Common factor analysis versus principal component analysis: Differential bias in representing model parameters? Multivariate Behavioral Research, 28, 263-311.

Widaman, K. F. (2018). On common factor and principal component representations of data: Implications for theory and for confirmatory replications. Structural Equation Modeling: A Multidisciplinary Journal, 25, 829-847.
