A Note about MLM and YouGov UK Polling

Much has been made of how the YouGov poll, using multilevel modeling techniques, made the “correct” prediction for the UK election. But for all the interest in MRP (multilevel regression with post-stratification), as they call the technique, some peculiarities of the methodology are worth noting.

The single most important element of the technique is the “post-stratification” part. The methodology operates at two levels: it predicts vote-choice probabilities for different types of voters (which can involve a fairly simple model like logistic regression, nothing fancy, though you can’t easily use an SVM with it unless you get more creative, which I think can be done), and it layers those predictions on top of the estimated composition of voter types in each locality. In effect, this involves two separate sets of predictions: a prediction of turnout for each type of voter in each locality, and, conditional on turning out, the vote choice. So a locality made up of 60% voters of type A and 40% of type B would be predicted to have an electorate of 50% type A and 50% type B, if turnout for type A is estimated at 66.7% and that for type B at 100%. If 90% of type B vote for candidate 2 and 70% of type A vote for candidate 1 (and the rest vote the other way), candidate 2 is estimated to receive 45% + 15% = 60%. The basic logic is simple.
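
To make the arithmetic concrete, here is a minimal sketch in Python that simply reproduces the hypothetical two-type locality above; the numbers and variable names are illustrative, not anyone’s actual model.

```python
# Population composition of the locality, by voter type.
population_share = {"A": 0.60, "B": 0.40}

# Layer 1: estimated turnout probability for each voter type.
turnout = {"A": 0.667, "B": 1.00}

# Layer 2: estimated probability of voting for candidate 2, given turning out.
p_candidate_2 = {"A": 0.30, "B": 0.90}  # type A mostly backs candidate 1

# Post-stratify: weight each type by (population share * turnout),
# then renormalize to get the composition of the actual electorate.
electorate_weight = {t: population_share[t] * turnout[t] for t in population_share}
total = sum(electorate_weight.values())
electorate_share = {t: w / total for t, w in electorate_weight.items()}

# Candidate 2's predicted vote share in this locality.
candidate_2_share = sum(electorate_share[t] * p_candidate_2[t] for t in electorate_share)

print(electorate_share)   # roughly 50% type A, 50% type B
print(candidate_2_share)  # roughly 0.60, i.e. 45% + 15%
```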

The real challenge, then, is about having the right data and being able to model the bottom layer: estimating turnout for different subsets of voters, given the particular election. Asking the right questions of the polled voters can be particularly important: you want to know who will be turning out as much as what their choices are. It may be possible to construct a model of the electorate without having the right polling questions, but in an election where turnout is likely to differ from the usual pattern, even if only in subtle subsets, that is a risky bet. (I speak from experience, having used multilevel methods on the 2016 US election without good data on turnout intentions, and having cobbled together several crude and, at least as it seemed at the time, not very plausible assumptions about voter composition in different states. You want data that is as good as possible about voter composition if you want to post-stratify and expect to get reasonable “predictions.” One could simulate many different scenarios via MLM, but you still need some real data to assign them probabilities.) The real lesson is not that MLM will necessarily give you better predictions, but that it can if you define the problem and approach it appropriately, with the right data.
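
As a rough illustration of that two-layer structure, here is a hedged sketch in Python: one simple model for turnout intention and one for vote choice among likely voters, applied to a post-stratification frame. The toy data, the column names, and the use of plain logistic regression in place of a full multilevel model are all assumptions for the sake of the example, not YouGov’s actual pipeline.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy poll: a demographic cell, a stated turnout intention, and a stated vote choice.
poll = pd.DataFrame({
    "age_group": ["young", "young", "old", "old", "old", "young", "old", "young"],
    "will_vote": [0, 1, 1, 1, 1, 0, 1, 1],
    "votes_con": [0, 0, 1, 1, 0, 0, 1, 0],  # only meaningful where will_vote == 1
})

X = pd.get_dummies(poll[["age_group"]])

# Layer 1: turnout model, fit on everyone polled.
turnout_model = LogisticRegression().fit(X, poll["will_vote"])

# Layer 2: vote-choice model, fit only on respondents who say they will vote.
voters = poll["will_vote"] == 1
choice_model = LogisticRegression().fit(X[voters], poll.loc[voters, "votes_con"])

# Post-stratification frame: how many people of each type live in the locality.
# In practice this comes from census or voter-file data, not from the poll itself.
frame = pd.DataFrame({"age_group": ["young", "old"], "count": [6000, 4000]})
Xf = pd.get_dummies(frame[["age_group"]]).reindex(columns=X.columns, fill_value=0)

frame["p_turnout"] = turnout_model.predict_proba(Xf)[:, 1]
frame["p_con"] = choice_model.predict_proba(Xf)[:, 1]

# Expected votes: population count * turnout probability * choice probability.
expected_voters = frame["count"] * frame["p_turnout"]
con_share = (expected_voters * frame["p_con"]).sum() / expected_voters.sum()
print(f"Predicted share for the candidate in this locality: {con_share:.1%}")
```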

Observe that this is a classic small-data problem nested inside big data. We don’t want to make one big prediction based on aggregated data. We want to make many small predictions based on small subsets of data inside a big chunk of data, and build up a big prediction on top of the many small ones. It’s an approach, a way of thinking, not simply a “methodology,” and perhaps something more “statistics-y” in nature than “data-science-y.” Given its nature, where many subsets of the data will have very few observations, at least relative to the whole, many of the usual “training and validation” techniques simply would not work without doing massive violence to the data, unless one gets more creative than not.
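
One way to see what the multilevel machinery buys you on those tiny subsets is partial pooling: a cell with few observations gets pulled toward the overall rate rather than being taken at face value. The sketch below uses a simple empirical-Bayes-style shrinkage with an assumed prior strength, which is only a stand-in for what a proper multilevel model estimates from the data.

```python
# (cell name, respondents in cell, respondents supporting the candidate)
cells = [("young urban", 12, 9), ("young rural", 5, 1),
         ("old urban", 80, 30), ("old rural", 3, 3)]

total_n = sum(n for _, n, _ in cells)
total_k = sum(k for _, _, k in cells)
overall_rate = total_k / total_n

prior_strength = 20  # assumed: behaves like 20 "pseudo-respondents" at the overall rate

for name, n, k in cells:
    raw = k / n
    pooled = (k + prior_strength * overall_rate) / (n + prior_strength)
    print(f"{name:12s}  n={n:3d}  raw={raw:.2f}  partially pooled={pooled:.2f}")
```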

Observe also that, for elections in the US and the UK, the nature of the electoral rules favors this approach: elections are contested in many small units, each of which is relatively predictable for the most part, even if the aggregate numbers may not be. In a sense, this takes the kind of problems that arise in the context of Simpson’s Paradox and turns them on their head, at least in a context where that is relevant. It is applicable elsewhere, certainly, but only if one considers the appropriate subunits where the interesting trends would arise. Or, in other words, properly taking advantage of MLM requires knowing the context, not just the formula.
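
As a loosely related, purely hypothetical illustration of why the per-unit view matters under these rules: a party can lead the aggregate vote while losing most seats, so building the prediction up from the small units, rather than down from a national total, is what the electoral rules reward. The numbers below are made up.

```python
# (constituency, votes for party X, votes for party Y)
seats = [("Seat 1", 70_000, 30_000),   # X piles up a huge, wasted margin here
         ("Seat 2", 45_000, 55_000),
         ("Seat 3", 47_000, 53_000)]

x_total = sum(x for _, x, _ in seats)
y_total = sum(y for _, _, y in seats)
x_seats = sum(1 for _, x, y in seats if x > y)

print(f"Party X aggregate vote share: {x_total / (x_total + y_total):.1%}")  # ~54%
print(f"Party X seats won: {x_seats} of {len(seats)}")                       # 1 of 3
```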