Whether it is colors in the rainbow, notes in a musical scale, types of coffee, or country versus pop singers, there is a natural human desire to categorize objects and experiences. A story by Tom Vanderbilt that recently appeared in the New York Times presents a wonderful exploration of how we all find comfort in defining, seeking out, and confirming categories, in what he calls the “psychology of genre.” In many research applications, it is challenging to know when categorizing individuals is appropriate, or how best to discern these categories from the data at hand. Extracting categories when variation is really continuous presents risks, but so too does failing to identify meaningfully distinct subgroups of individuals within the population. Further, many approaches exist for empirically identifying subgroups, including cluster analysis, latent class analysis, and finite mixture modeling. These techniques bring a level of statistical rigor to our natural desire to categorize, but they can also be complex to implement in practice. As an accessible initial resource on this topic, we recommend Everitt et al. (2011), Cluster Analysis (5th Edition), published by Wiley.
Selecting the number of classes (or components) is one of the most challenging decisions in fitting a finite mixture model, a family that includes latent class analysis and latent profile analysis. In this post, we talk through the conventional wisdom on class enumeration, as well as when that wisdom breaks down.