What is a suppressor variable, and how does this differ from confounding and mediation?
A constant source of confusion within the multiple regression model (and the general linear model more broadly) relates to the terms suppression and suppressor variable. Indeed, it is not uncommon to see suppression invoked anytime some unanticipated or inexplicable finding is obtained that must be explained away. This is particularly evident when a strongly hypothesized relation is not found: The model results would have supported our hypotheses had they not been obscured by an omitted suppressor variable. What we will see is that (1) this statement is not an entirely accurate use of the term suppression, and (2) suppressor variables can be quite common, easily understood, and wholly accounted for by substantive theory. So let's think about this a bit more closely, because it really is pretty cool.
To understand suppression, we first need to remind ourselves of a simple two-predictor multiple regression model. Although throughout this note we focus on two predictors and one outcome, all of the concepts easily generalize to multiple predictors, sets of predictors, and even multiple outcomes (e.g., as might be found in a path analysis or structural equation model). This simple two predictor model can be expressed as
y = b0 + b1x1 + b2x2 + r
where y is the outcome, b0 is the intercept, b1 and b2 are the regression coefficients relating x1 and x2 to y, respectively, and r is the residual. As always, each regression coefficient represents the unique relation between that predictor and the outcome when controlling for (or above and beyond) the effects of the other predictor. In many situations, if two predictors are correlated with one another, then the unique relation between one predictor and the outcome controlling for the other predictor is smaller than the relation for that same predictor when considered alone. (Spoiler alert: in suppression the opposite occurs, which is what makes it so incredibly weird.)
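To make this concrete, here is a minimal sketch that fits the two-predictor model by ordinary least squares (pure Python, via the normal equations; the toy data are made up and constructed to have zero residual, so the known coefficients are recovered exactly):

```python
# Fit y = b0 + b1*x1 + b2*x2 + r by ordinary least squares,
# solving the 2x2 normal equations on mean-centered data.

def ols_two_predictor(y, x1, x2):
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((a - m2) ** 2 for a in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (b - my) for a, b in zip(x1, y))
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    det = s11 * s22 - s12 ** 2          # nonzero unless x1, x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2:
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(ols_two_predictor(y, x1, x2))  # recovers approximately (1.0, 2.0, 3.0)
```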

The typical situation is best seen using a Venn diagram. Consider the bivariate relation between x1 and y where, for the moment, we ignore x2. In the Venn diagram the bivariate relation between these two variables is denoted as area a. This represents the relation between x1 and y ignoring the effect of x2.

Often, however, we are interested in the unique effects of two (or more) predictors (as well as the joint effect, but we ignore this for now). To obtain these, we bring both predictors into the model at the same time. Typically, x2 is correlated with both x1 and y. As seen in the diagram, in the presence of the second predictor, the effect of the first predictor, a, is usually smaller than it was before. This is quite natural in that the part of x1 that is shared with x2 is removed when assessing the unique relation between x1 and y, making a smaller. This is business as usual.
However, let’s make things a bit stranger. More than 80 years ago, a brilliant quantitative psychologist named Paul Horst found a situation in which the relation between x1 and y was actually larger in the presence of x2 than when assessed in the absence of x2 (Horst, 1941). Considering the above Venn diagrams, this makes absolutely no sense at all; it actually seems downright impossible. Yet it most definitely exists, and Horst somewhat unfortunately termed this situation suppression. It is unfortunate because x2 is not suppressing the relation between x1 and y (which many researchers assume). In actuality, x2 is suppressing irrelevant variance in x1 and, by doing so, enhances the relation between x1 and y. It might have been better to refer to x2 as an “enhancer” rather than a “suppressor”, but that historical ship has sailed so we are stuck with this terminology.
To understand suppression better, let's first think about the substantive application in which Horst observed this phenomenon. He was part of a research team evaluating pilots during World War II. The pilots completed paper-and-pencil assessments measuring three types of cognitive reasoning: mechanical, numerical, and spatial. These three measures were then used to predict a score representing piloting ability. However, these measures were not as strongly predictive of piloting ability as had been anticipated. A fourth predictor was then included that was a measure of general verbal ability. Verbal ability was correlated with each of the three reasoning measures (as would be expected), but was not correlated with piloting ability (as would also be expected). Consistent with expectations, when verbal ability was added to the regression model it did not uniquely predict piloting ability. However, quite unexpectedly, the effects of all three reasoning measures were markedly larger compared to the model in which verbal ability was omitted. Horst determined that verbal ability was removing irrelevant information from the three reasoning measures (that is, suppressing the part of the variance in reasoning that was related to verbal ability but unrelated to piloting ability), and this in turn enhanced the relations between what was left over in the reasoning measures and piloting ability, i.e., the unique relations.
It is helpful to consider a simple hypothetical example. We will define x1 to be the predictor variable of interest (say mechanical reasoning) and x2 the suppressor variable (say verbal ability). The simplest pattern of correlations consistent with traditional suppression is when x1 is correlated with y, x1 is correlated with x2, and x2 is not correlated with y. Of course, in any sample data there will rarely be an exactly zero correlation between the suppressor and the outcome, but it will often take some small, negligible value. Let's further say that the correlation between x1 and y is .25, between x1 and x2 is .70, and between x2 and y is zero. If we consider a model in which only x1 predicts y, the standardized regression coefficient for x1 is equal to .25 with a squared semi-partial correlation of .06 (that is, x1 uniquely accounts for 6% of the variance in y). However, if we add x2 as a second predictor, the standardized regression coefficient for x1 increases to .49 and the squared semi-partial correlation doubles to .12 (that is, x1 now uniquely accounts for 12% of the variance in y). The unique effect of x1 on y when controlling for x2 is markedly stronger than the bivariate relation of x1 with y, the hallmark of suppression.
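These numbers are easy to verify, because with standardized variables the coefficients are simple functions of the three correlations. A minimal sketch in Python (the formulas are the standard ones for two standardized predictors; the correlations are the hypothetical values above):

```python
# Standardized two-predictor regression results computed directly from the
# three correlations r_y1, r_y2, and r_12.

def std_betas(r_y1, r_y2, r_12):
    """Standardized partial regression coefficients for x1 and x2."""
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

def squared_semipartial(r_y1, r_y2, r_12):
    """Squared semipartial correlation: variance in y uniquely due to x1."""
    return (r_y1 - r_y2 * r_12) ** 2 / (1 - r_12 ** 2)

# Hypothetical suppression pattern from the text: r_y1 = .25, r_12 = .70, r_y2 = 0
b1, b2 = std_betas(0.25, 0.0, 0.70)
print(round(b1, 2))                                    # 0.49
print(round(squared_semipartial(0.25, 0.0, 0.70), 2))  # 0.12
# With x1 alone: beta = r_y1 = .25 and unique variance = .25**2, about .06
```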
What on Earth is going on? Of course, the presence or absence of the suppressor does not change the bivariate relation between the predictor and the outcome (the correlation between x1 and y is always .25). However, what the suppressor does change is the unique variability in the predictor that is available to be related to the outcome.

We can see this in a simple re-arrangement of the Venn diagram (the Venn diagram is an imperfect representation of the underlying mathematics, but it visually gives a sense of what is happening here). This shows that the suppressor, x2, correlates with the predictor, x1, as indicated by area b, but does not correlate with the outcome (reflected in the lack of overlap of the circles for x2 and y). Further, we can see that controlling for the suppressor removes (or suppresses) the part of the variance in x1 that is unrelated to y (area b). This, in turn, enhances the proportional relation between x1 and y (represented by area a). This is the core of suppression.
There have been dozens of papers written on suppression following Horst’s initial discovery, and we note several of these below. Many of these propose specific subtypes of suppression and describe under what unique conditions these might be encountered in practice. However, a concise general definition was given by Conger (1974) who wrote “A suppressor variable is defined to be a variable which increases the predictive validity of another variable (or set of variables) by its inclusion in a regression equation. This variable is a suppressor only for those variables whose regression weights are increased.” Importantly, this means that a variable is not inherently a suppressor in and of itself. Instead, a suppressor is defined by the impact it has on other variables in the model. That is, a variable might be a suppressor in one model but not in another.
This brings us to two initial points. First, there is nothing about suppression that is magical, mysterious, or misunderstood. The papers noted below explain in gory detail exactly what suppression is and under what conditions it exists, so be suspicious of a paper that says "Suppression is a long-misunderstood issue…". It is not. Initially confusing? Yes. Misunderstood? No.
Second, suppression is not some unavoidable artifact of measurement or estimation but instead can be fully accounted for by substantive theory. Horst's example is just one of many in the literature, nearly all of which make perfect sense within a given theoretical framework. As such, it is often beneficial to think about potential suppressors during the design phase of a study so that all relevant variables can be included in the analysis.
However, our third and final point is a bit of a punch in the face: we must consider two additional competing explanations for the role of a third variable in our models: confounding and mediation.
By far the clearest treatment of this was given by MacKinnon, Krull and Lockwood (2000) in a title that could not be more on point: Equivalence of the Mediation, Confounding, and Suppression Effect. You hardly need to read the paper given the title. The paper opens with, "Once a relationship between two variables has been established, it is common for researchers to consider the role of a third variable in this relationship." This is precisely what we have considered thus far. But to better see this point, we move from Venn diagrams to path diagrams. Let's first consider the two-predictor regression that we have discussed up to this point:

This shows the usual expression of two correlated predictors and one outcome. Note that there are three measured variables, and these are all related to one another (the curved arrow reflects the correlation between the two predictors, and the two one-headed arrows reflect the partial regression coefficients). In a suppression situation, x2 enhances the relation between x1 and y.
However, with a simple re-arrangement of the diagram we get what is called confounding:

Note that all we have done is change the correlation between the two predictors to a regression coefficient, and now x2 is a confounder in that it predicts both x1 and y. This is the situation that is so fun to teach because we can give examples such as: the number of fire trucks sent to a fire is positively correlated with the amount of damage done at the fire, but when the confounder of fire severity is included, there is no relation between number of trucks and damage. Importantly, whereas a suppressor enhances the relation between x1 and y, including a confounder as a second regressor reduces this same relation.
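To put numbers on the fire-truck example, here is a small sketch using the same standardized-coefficient formula as before (the correlations are made up purely for illustration):

```python
# Hypothetical confounding pattern: trucks (x1) relate to damage (y) only
# through fire severity (x2), so the bivariate trucks-damage correlation
# is the product of the two paths through severity.
r_12 = 0.8          # severity with trucks (made-up value)
r_y2 = 0.6          # severity with damage (made-up value)
r_y1 = r_12 * r_y2  # implied trucks-damage correlation: .48, entirely spurious

# Standardized partial coefficient for trucks, controlling severity
b1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
print(round(r_y1, 2), round(b1, 2))  # 0.48 0.0
# Controlling the confounder removes the trucks-damage relation entirely.
```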
Finally, re-directing one arrow in the above path diagram results in mediation:

Now x2 explains the relation between x1 and y. For example, the predictor might be parent’s alcohol use, the outcome is the child’s alcohol use, and the mediator is impaired parenting. The inference is that the parent’s alcohol use impairs their own parenting behavior, and this in turn increases the probability that the child will drink alcohol themselves. In sum, suppression enhances the relation between a predictor and the outcome, confounding reduces the relation, and mediation explains the relation. How the heck do we differentiate among the three? MacKinnon et al. (2000) argue that you do not, and they demonstrate that the statistical tests of these three effects are all identical: each model is a simple re-expression of the others. They conclude the paper saying, “The statistical procedures provide no indication of which type of effect is being tested. That information must come from other sources.” The “other sources” to which they refer are prior knowledge and theory. All three effects are statistically isomorphic, and only theory can discern which most likely holds in the population. Further differentiation might also be possible by moving to experimental or longitudinal designs that allow for better testing of hypotheses about causal pathways.
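One way to see the equivalence MacKinnon et al. describe is that, for standardized variables, the change in x1's coefficient when x2 enters the model (c minus c-prime in the mediation literature) is algebraically identical to the product of the two paths through x2 (a times b). A small sketch, reusing the hypothetical correlations from the suppression example earlier:

```python
# With standardized variables, the drop in x1's coefficient when x2 is added
# (c - c') equals the product of paths a (x1 -> x2) and b (x2 -> y, controlling
# x1), regardless of whether we call x2 a suppressor, confounder, or mediator.

def third_variable_effect(r_y1, r_y2, r_12):
    denom = 1 - r_12 ** 2
    c = r_y1                                 # total effect of x1 (x2 omitted)
    c_prime = (r_y1 - r_y2 * r_12) / denom   # direct effect of x1 (x2 included)
    a = r_12                                 # path from x1 to x2
    b = (r_y2 - r_y1 * r_12) / denom         # path from x2 to y, controlling x1
    return c - c_prime, a * b

diff, ab = third_variable_effect(0.25, 0.0, 0.70)
# diff and ab are the same quantity: a negative value means the coefficient
# grew when x2 was added (enhancement, i.e., suppression); a positive value
# means it shrank (consistent with confounding or mediation).
print(round(diff, 4) == round(ab, 4))  # True
```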
Suggested Readings
Conger, A. J. (1974). A revised definition for suppressor variables: A guide to their identification and interpretation. Educational and Psychological Measurement, 34, 35–46.
Horst, P. (1941). The role of predictor variables which are independent of the criterion. Social Science Research Council Bulletin, 48, 431–436.
MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173–181.
Tzelgov, J., & Henik, A. (1991). Suppression situations in psychological research: Definitions, implications, and applications. Psychological Bulletin, 109, 524–536.
Velicer, W. F. (1978). Suppressor variables and the semipartial correlation coefficient. Educational and Psychological Measurement, 38, 953–958.