My advisor told me I should group-mean center my predictors in my multilevel model because it might “make my effects significant” but this doesn’t seem right to me. What exactly is involved in centering predictors within the multilevel model?

This is an excellent question and the topic of centering is often a source of confusion when using multilevel models (MLMs) in practice. This confusion is in part due to the need to address a complexity that arises within the MLM but is not relevant within the traditional multiple regression model: when modeling the effect of a lower-level predictor on an outcome, we must separate the total effect of the predictor into the within-group component versus the between-group component. Centering refers to the process of deviating an observed score from some referent mean value, and can be used to separate within- and between-group variation in lower-level predictors in MLMs, enabling us to obtain distinct estimates of within- and between-group effects (or within- and between-person effects, in a repeated measures setting).

As a simple example, imagine your sample consists of multiple classrooms, and each classroom contains multiple students. Further, you obtained a student-level predictor reflecting locus-of-control and a student-level outcome reflecting math achievement. Your goal is to examine whether students who report higher levels of control also tend to perform better on a math exam. Given the hierarchical structure of your data (the nesting of students within classrooms), there are actually three possible relations that can exist between your predictor and outcome: the total effect, the within-group effect, and the between-group effect (where, here, group is classroom). Let’s consider these in turn.

First is the total effect (or marginal effect), which represents the regression of math achievement on locus-of-control pooling over all students and classrooms. This total effect actually represents a weighted composite of the within-group and between-group components of the relation. While it is perfectly fine to estimate and interpret total effects from the standpoint of prediction (e.g., pooling over students and classrooms, a 1-unit change in the predictor leads to a so-many-points change in the outcome), it is much more difficult to draw theoretically meaningful conclusions from these effects, as the location of the effect is ambiguous – the total effect is a mish mosh of the within- and between-group effects. For this reason, when working with multilevel data, it is often preferable to estimate and interpret the within- and between-group effects directly instead.

The within-group effect is the relation between student locus-of-control and math achievement within a given classroom; this evaluates whether, on average, students who are higher (or lower) on control with respect to the other students in their class tend to score higher (or lower) on the math assessment. One way to think about this effect is to imagine that you had only sampled students from a single classroom, say Class A. If you ran a simple regression analysis on the data, you would obtain an effect that tells you how differences in locus of control are predictive of differences in math scores for students in Class A. You might assume that there’s nothing particularly special about this class and you would have observed the same effect had you sampled from Class B, or Class C, etc. With the multilevel data, we can leverage the data from all of the classrooms in our sample to estimate this common within-group effect with greater precision (and, if we don’t want to assume the within-group effect is the same in each classroom, we can allow for that too, but that’s a story for another day). The within-group effect continues to tell us how, within a given group, differences in the predictor relate to differences in the outcome.

In contrast, the between-group effect is the relation between the classroom mean of student locus-of-control and math achievement; this evaluates whether, on average, classes characterized by higher (or lower) control tend to score higher (or lower) on math achievement. Here, we can imagine that instead of collecting the individual data, we were only provided with summary data for each classroom. Again, we could run a simple regression on these data, obtaining an estimate of how differences in the average value of locus of control between classrooms relate to differences in the average value of math. With access to the individual, student-level data, we can estimate this effect more optimally (accounting for differences in classroom sizes, for instance), but the interpretation remains the same. If we were to compare two classrooms that differed by 1 unit in their average locus of control values, we would expect the students within these classrooms to differ in their average math scores by the magnitude of the between-group effect.

It is often quite important (if not required) to properly disaggregate the total effect into the within-group component and the between-group component within an MLM, and centering the predictors allows us to accomplish this. To see this, let’s consider a very simple one-predictor MLM for students nested within classrooms in which our predictor is locus-of-control and our outcome is math achievement.
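In commonly used two-level notation (the symbols below are an assumption chosen for illustration, not notation taken from the text), such a model with the raw predictor and a random intercept might be sketched as:

```latex
\begin{align*}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j}\, x_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j} \\
                     & \beta_{1j} = \gamma_{10}
\end{align*}
```

Here $y_{ij}$ is math achievement and $x_{ij}$ is locus-of-control for student $i$ in classroom $j$, $r_{ij}$ is the student-level residual, and $u_{0j}$ is the classroom-level random intercept. The slope is held fixed across classrooms in this sketch.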

Broadly, centering refers to the process of subtracting the mean from a variable (usually a predictor). Unlike in ordinary regression, centering becomes complicated with multilevel data because there are two possible means around which lower-level predictors can be centered. The first is the grand mean that represents the mean of the predictor pooling over all observations and all groups. The second is the group mean that represents the mean of the predictor within the group to which the observations belong.

There are thus two primary choices when centering lower-level predictors: we can grand mean center the predictor, where we deviate each individual score from the overall mean (literally subtracting the grand mean from each person’s score), or we can group mean center the predictor, where we instead deviate each individual score from their own group mean. The former reflects the individual’s relative standing on the predictor with respect to everyone in the sample and the latter reflects the individual’s relative standing on the predictor with respect to everyone in their own group. Either of these rescaled (or “centered”) predictors can then be used in the Level 1 model, as can the raw (or uncentered) version of the predictor. Which is used influences the interpretation of the obtained effects. Further, because the group mean is a characteristic of the group, this itself can be used as an upper-level predictor in the Level 2 equation, regardless of which form of centering is used for the predictor at Level 1 (or even if it is left in the raw scale).
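To make the two centering choices concrete, here is a minimal sketch in plain Python. The data are a hypothetical toy example (two classrooms, three students each), and the variable names are ours rather than anything prescribed by a particular MLM package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (classroom, locus-of-control score) for each student.
students = [
    ("A", 2.0), ("A", 4.0), ("A", 6.0),
    ("B", 5.0), ("B", 7.0), ("B", 9.0),
]

# Grand mean: pools over all students and all classrooms.
grand_mean = mean(score for _, score in students)

# Group means: one mean per classroom.
scores_by_class = defaultdict(list)
for classroom, score in students:
    scores_by_class[classroom].append(score)
group_means = {c: mean(s) for c, s in scores_by_class.items()}

# Build the raw, grand-mean centered, and group-mean centered versions
# of the predictor, plus the group mean itself (a Level 2 variable).
rows = []
for classroom, score in students:
    rows.append({
        "classroom": classroom,
        "raw": score,
        "grand_centered": score - grand_mean,
        "group_centered": score - group_means[classroom],
        "group_mean": group_means[classroom],
    })

for row in rows:
    print(row)
```

Note that the group-mean centered scores sum to zero within every classroom, so they carry no between-classroom information, whereas the grand-mean centered scores are just the raw scores shifted by a constant.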

When using the predictor in the raw scale or with grand-mean centering, it is critical to include the group means of the predictor at Level 2 to properly disaggregate the effects. The effect obtained for the predictor at Level 1 will then be the within-group effect and the effect obtained at Level 2 will then be the difference between the between- and within-group effects, sometimes called the contextual effect.

Problems, however, arise if you fail to include the group means in the model when using the raw scale or grand-mean centered predictor. If you do that, you will get a mish mosh effect estimate for the Level 1 predictor that represents neither the between-group nor the within-group effect. Instead, it confounds these two effects together into a single value that may not resemble either. To make matters worse, this mish mosh also doesn’t represent the total effect, as it weights the within- and between-group effects differently. The obtained estimate is difficult to interpret, outside of a few special cases (see footnote 1).

In contrast, when using the predictor with group-mean centering, the effect obtained for the predictor at Level 1 will always be the within-group effect, regardless of whether the group means are included at Level 2 or not. If the group means are included at Level 2, the effect obtained for them will be the between-group effect. Importantly, MLMs estimated using raw, grand-mean centered, or group-mean centered predictors all fit the data identically, provided the group means are entered as predictors at Level 2 and there are no random slopes in the model (again, a story for another day).
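A short piece of algebra shows why these specifications are equivalent and why the Level 2 coefficient changes meaning. The symbols are assumptions introduced here for illustration: $\gamma_W$ and $\gamma_B$ denote the within- and between-group effects, $\bar{x}_{j}$ the mean of the predictor in group $j$, $\bar{x}$ the grand mean, $u_{0j}$ a random intercept, and $r_{ij}$ the Level 1 residual.

```latex
\begin{align*}
y_{ij} &= \gamma_{00} + \gamma_W\,(x_{ij} - \bar{x}_{j}) + \gamma_B\,\bar{x}_{j} + u_{0j} + r_{ij}
          && \text{(group-mean centered)} \\
       &= \gamma_{00} + \gamma_W\,x_{ij} + (\gamma_B - \gamma_W)\,\bar{x}_{j} + u_{0j} + r_{ij}
          && \text{(raw scale)} \\
       &= (\gamma_{00} + \gamma_W\,\bar{x}) + \gamma_W\,(x_{ij} - \bar{x}) + (\gamma_B - \gamma_W)\,\bar{x}_{j} + u_{0j} + r_{ij}
          && \text{(grand-mean centered)}
\end{align*}
```

All three lines describe the same model, so the Level 1 coefficient is $\gamma_W$ in each case. The coefficient on $\bar{x}_{j}$ is $\gamma_B$ under group-mean centering and $\gamma_B - \gamma_W$ (the contextual effect) otherwise, and grand-mean centering changes only the intercept.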

With this as context, we can now return to your question, the answer to which depends on how you specified your initial model. If you included the group means in your model at Level 2, then you will obtain exactly the same within-group effect estimate (and p-value) for your Level 1 predictor regardless of which method of centering you use. In that case, your advisor would be wrong: group-mean centering won’t change a thing. On the other hand, if you haven’t included the group means in the model at Level 2, then group-mean centering will generate an estimate of the within-group effect that will differ from the mish mosh estimate you previously obtained with the raw scale or grand-mean centering. The significance of the within-group effect might well differ from the mish mosh estimate you had before, in which case your advisor would be right. And then there’s the effect of the group means at Level 2 to consider. Remember that when these are included at Level 2 the obtained estimates differ in interpretation depending on whether group-mean centering is used or not at Level 1. When using the predictor in raw scale or with grand-mean centering, the estimate represents the contextual effect, whereas with group-mean centering, the estimate represents the between-group effect. These will typically differ from one another and may differ in significance as well, since they test different null hypotheses.

The bottom line is that your advisor might or might not be right, depending on which aspect of the relationship between the predictor and outcome you are estimating in your models (e.g., total, within, or between-group effects). Different forms of centering and model specification can lead to important interpretational differences in the model results that are critical to consider when drawing substantive inferences. You need to know exactly which effects you wish to estimate and to ensure that you are specifying the model in such a way that you will obtain tests of those effects.

We can thus draw the following general conclusions:

  1. If either the raw or grand mean centered predictor is entered at Level 1 without the group mean entered at Level 2, the obtained regression coefficient will confound the within- and between-group components of the relation into a single estimate that is difficult to interpret, outside of special circumstances (e.g., where the within- and between-group effects are the same).
  2. If either the raw or grand mean centered predictor is entered at Level 1 and the group mean is entered at Level 2, then the regression coefficient associated with the Level 1 predictor represents an unambiguous estimate of the within-group effect, and the regression coefficient associated with the Level 2 group mean represents the difference between the between-group and within-group effect; this latter effect is sometimes called the contextual effect.
  3. If the group mean centered predictor is entered at Level 1 with or without the group mean entered at Level 2, the regression coefficient represents an unambiguous estimate of the within-group effect.
  4. If the group mean is entered at Level 2 with or without the group mean centered predictor at Level 1, the regression coefficient represents an unambiguous estimate of the between-group effect.
  5. Finally, generalizing from points #3 and #4, if the group mean centered predictor is entered at Level 1 and the group mean is entered at Level 2, this provides simultaneous and unambiguous estimates of both the within-group and between-group effects of the predictor on the outcome.
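
The conclusions above can be checked numerically. The following is a minimal sketch on simulated data, not an analysis from the sources cited here. It assumes NumPy is available and uses made-up sample sizes and effect sizes. As a deliberate simplification, it fits ordinary least squares rather than a multilevel model, so the random intercept is ignored; the relations among the coefficients that it demonstrates are the same reparameterizations described in the text. One difference to keep in mind: in OLS the raw predictor entered alone returns the total effect, whereas an MLM would weight the within- and between-group components differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation settings (assumptions, not values from a real study).
n_groups, n_per_group = 50, 20
within_true, between_true = 0.5, 2.0

# Classroom index for every student, then a predictor that varies
# both between classrooms and within classrooms.
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=n_groups)[group] + rng.normal(size=group.size)

# Observed group means, group-mean centered and grand-mean centered scores.
x_mean = (np.bincount(group, weights=x) / np.bincount(group))[group]
x_gmc = x - x_mean
x_grand = x - x.mean()

# Outcome with different within- and between-group effects.
y = 1.0 + within_true * x_gmc + between_true * x_mean + rng.normal(size=x.size)


def ols(*predictors):
    """Return OLS coefficients (intercept first) of y on the given predictors."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# Conclusion 5: group-mean centered predictor plus group mean.
_, within_gmc, between = ols(x_gmc, x_mean)
# Conclusion 2: raw or grand-mean centered predictor plus group mean.
_, within_raw, contextual = ols(x, x_mean)
_, within_grand, contextual_grand = ols(x_grand, x_mean)
# Conclusions 3 and 4: each term entered on its own.
_, within_only = ols(x_gmc)
_, between_only = ols(x_mean)
# Raw predictor alone: in OLS this is the total (pooled) effect.
_, pooled = ols(x)

print("within (group-mean centered):", within_gmc)
print("within (raw + group mean):   ", within_raw)
print("between:                     ", between)
print("contextual:                  ", contextual)
print("between - within:            ", between - within_gmc)
print("pooled raw slope:            ", pooled)
```

In this sketch the contextual coefficient equals the between-group estimate minus the within-group estimate up to floating-point error, and the pooled slope falls between the within- and between-group estimates.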

Given the above, it is quite easy to see how confusion can arise about different options for centering, and how individual choices can impact subsequent interpretations of model results. Here we have only offered a brief review, and there are many clear and cogent descriptions of these issues as they arise both in hierarchically clustered data (as described above) and in longitudinal data (where we instead talk about within-person and between-person effects). For more detailed discussions of these issues see Raudenbush and Bryk (2002, pages 31-35, 134-149, and 181-183), Enders and Tofighi (2007), Kreft, de Leeuw, and Aiken (1995), and (if we may) Curran and Bauer (2011).

In conclusion, simply know that there is no “right” or “wrong” choice about centering, but there is most definitely an optimal choice based on the theoretical questions under study.

————————————

1 One special case is where the within- and between-group effects are the same. Then the value obtained for the raw-scale or grand-mean centered predictor at Level 1 is an unbiased estimate of these effects. But there is seldom cause to assume these effects to be identical a priori. Another special case is where there is no between-group variance in the predictor, due to balancing across clusters by design, in which case the estimate will resolve to the within-group effect. An example would be in a longitudinal study where the time scores are the same for all people because the assessment schedule is identical across participants and there is no missing data.

————————————

Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62, 583-619.

Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12, 121-138.

Kreft, I. G. G., de Leeuw, J., & Aiken, L. S. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30, 1-21.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). Sage Publications.
