What is the margin of error?
As I write this post, we’re a few weeks away from presidential elections in the United States. It’s a time of great anxiety, and a time when conversations about polling enter the collective zeitgeist. In the past few elections, trust in polls has somewhat decreased, with many calling out what seems to be an increase in error. But what do we mean by error? Are there different types of error? Can we learn anything about statistics by studying them? In this article, we’ll explore these questions.
The perfect poll — and the next best alternative
There are two major candidates running for the presidency of the United States this year: Donald Trump and Kamala Harris. So far, polls have shown a close race. But we know polls in previous races have been off, sometimes by quite a bit. It’s possible the current polls are too. But what would it take for us to have a poll without any error — a poll that could tell us with perfect certainty how every American would vote?
Well, unfortunately there is only one way to get there: we would need to poll every single American who is going to vote. Millions of citizens, across fifty states. If we did manage to do that, we would have surveyed what we call in statistics the population. A poll that surveys the entire population would, mathematically, be a poll without statistical error; a poll that gives us perfect certainty. Alas, polling the entire population is both impractical and virtually impossible. Therefore, we must resort to the next best alternative: sampling the population. Doing so will allow us to retrieve a prediction, with the important caveat that we inherently accept a statistical margin of error, an imperfection. But how big is that margin of error? And before we get into that, you might even wonder: why does sampling work at all?
Why does sampling work?
The field of statistics rests upon two important mathematical principles that make sampling work as well as it does. The first principle is the Law of Large Numbers (LLN). It tells us that as the sample size increases, the sample average (mean) gets closer to the population average (mean). It helps to visualize this principle with an example.
Let’s pretend we want to estimate the average height of all 10,000 students in a school, but we only have access to a single classroom with 20 students. If we calculate the average height of those 20 students as an estimate of the overall average, we would probably be pretty far off. But what would happen if we had access to 100 students across several classrooms? What about 2,000 students across several floors? Indeed, the larger our sample size, the closer our estimate of the average height gets to the true average we would find if we surveyed all 10,000 students.
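If you’d like to see this convergence for yourself, it’s easy to simulate. Below is a minimal sketch (assuming, purely for illustration, that heights are drawn from a normal distribution centered around 170 cm; the numbers themselves are made up) that draws ever larger samples and checks how far each sample mean lands from the true population mean:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: 10,000 student heights in cm, centered near 170
population = rng.normal(loc=170, scale=10, size=10_000)
true_mean = population.mean()

# Draw progressively larger samples and see how close each sample mean gets
for n in (20, 100, 500, 2_000, 10_000):
    sample = rng.choice(population, size=n, replace=False)
    print(f"n={n:>6}  sample mean={sample.mean():6.2f}  "
          f"off by {abs(sample.mean() - true_mean):.2f}")
```

Notice that once the sample is the entire population of 10,000 students, the error disappears entirely, which is exactly the ‘perfect poll’ scenario from earlier.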
The Law of Large Numbers ensures this convergence happens, although it doesn’t tell us anything about how big our sample really needs to be to get there. That part, however, can be worked out through the second principle at hand: the Central Limit Theorem (CLT). This principle looks not at a single sample but at many of them. If we repeatedly draw samples from the population, in our case classrooms, and compute the mean of each one, the CLT guarantees that as the sample size grows larger, the distribution of those sample means will look increasingly like a normal distribution. So let’s pretend each classroom in the survey above is one sample, and we plot the mean height of each classroom in a histogram. This is what we’d see as the sample size increases.
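Here is a companion sketch for the Central Limit Theorem. To make the point, it deliberately starts from a skewed, non-normal population of hypothetical heights, draws many samples of a given size, and records the mean of each one. As the sample size grows, the collection of sample means clusters more tightly around the population mean and loses its skew:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# A deliberately non-normal (right-skewed) population of hypothetical heights
population = 150 + rng.exponential(scale=20, size=10_000)

# For a few sample sizes, draw many samples and record each sample's mean
for sample_size in (5, 30, 200):
    sample_means = np.array([
        rng.choice(population, size=sample_size).mean()
        for _ in range(5_000)
    ])
    # As the sample size grows, the means cluster more tightly around the
    # population mean and their skewness shrinks toward 0 (a bell shape)
    print(f"n={sample_size:>4}  "
          f"mean of sample means={sample_means.mean():6.1f}  "
          f"spread={sample_means.std():5.2f}  "
          f"skewness={skew(sample_means):5.2f}")
```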
The shape made by repeatedly sampling a population and plotting the average of each sample is what we call the distribution of the sample means. Regardless of the shape of the actual distribution of everyone’s heights in the school, the Central Limit Theorem guarantees that the distribution of the sample means will approach a normal distribution, with its symmetrical shape. This is great news for us, because this symmetry allows us to use all kinds of mathematical tools that we would otherwise not have. Those tools will be extremely important for our calculations.
How much error is there in sampling?
Clearly, sampling works and can be used effectively for many things, from estimating the average height of students all the way to polling. But it comes with the caveat of error. Generally speaking, we can break down polling error, and more generally any kind of survey or study error, into two categories: statistical error and methodological error (sometimes called non-sampling error). These are very different, and we can only truly quantify the former. So let’s start there, with statistical error.
The first type of error, statistical error, is best quantified through the margin of error, a value that represents the range within which the true population value is expected to lie, given a desired confidence interval. This last part is key. It means that depending on how ‘confident’ we want to be, we will get different margins of error. Here’s an example to illustrate the tradeoff. Let’s say you are looking for a rare bird in Brazil. You think you have pinpointed its location to a small city in the center of the country. But if I ask you how confident you are that the bird truly is around there, you would ask me to define what ‘around there’ means. I could be strict and say I want you to be sure the bird is within a few hundred miles of the location you suggested, maybe 10% of Brazil’s area. Or I could be more relaxed and say it can be up to a thousand miles from that location, maybe 90% of Brazil’s area. It would be more helpful for me to know with more certainty, but the more certain I ask you to be, the bigger the area you have to give me, and hence the bigger the margin of error.
Conventionally, when estimating margins of error, we are talking about a 95% confidence interval. In the example above, that would translate to drawing an area around the spot where we think the bird is that is large enough for us to be 95% sure the bird really is inside it.
Confidence Intervals and the Z-Score
When we look at polls, there is an implicit assumption that we would like to know the margin of error with a 95% confidence interval. But why 95%? The answer is that it is a convenient number. To see why, let’s go back to the shape of the normal distribution. A few paragraphs ago, I told you the symmetric shape of a normal distribution is mathematically convenient. What I meant by that is:
- The normal distribution is symmetric, so it can be split right down the middle at its mean, and each half can then be marked off into equal-width regions on either side, measured in units we call standard deviations.
- The area under the curve can be normalized to equal 1, the total probability of all possible outcomes.
Combining both facts, we can use the shape of the distribution to link regions of the curve to probabilities through the area under the curve. We’ll measure the width of each region in standard deviations, and we’ll place the center of the curve at the mean, where the deviation is zero. Putting everything together, we get the familiar picture of how much area each deviation covers: roughly 68% of the area lies within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
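If you’d rather compute those areas than read them off a chart, the standard normal distribution makes it a few lines of code. Here is a small sketch using scipy (my choice of tool for the illustration, nothing more):

```python
from scipy.stats import norm

# Probability mass within ±1, ±2 and ±3 standard deviations of the mean
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within ±{k} standard deviations: {coverage:.4f}")
# -> 0.6827, 0.9545, 0.9973

# The cutoff that covers exactly 95% of the area: the familiar 1.96
print(f"cutoff for 95% of the area: {norm.ppf(0.975):.4f}")
```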
When we choose a confidence interval, what we are really choosing is a cutoff measured in standard deviations. In the case of a 95% confidence interval, as we saw above, that corresponds to a cutoff of roughly 2 standard deviations. To be precise, two standard deviations cover 95.45% of the area, so if we want exactly 95%, we use a cutoff of 1.96 standard deviations. This cutoff, the number of standard deviations away from the mean, is called the Z-score! We’ll use it shortly to calculate the margin of error.
The margin of error
To calculate the margin of error for any estimate, we need to calculate the variability of the estimate, scaled to the confidence interval we want. This can be achieved by multiplying the Z-score for our desired confidence interval by the standard error of our sample. We just determined above what the Z-score is. But what is the standard error? Intuitively, the standard error asks: how much variance do we see in this sample, controlling for the sample size we have? In the case of polling, where respondents choose between candidates, the standard error of a candidate’s proportion is calculated this way:
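$$SE = \sqrt{\frac{p\,(1-p)}{n}}$$

Here, $p$ is the proportion of respondents choosing a given candidate and $n$ is the number of people surveyed.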
Putting all the pieces together, our final calculation for the margin of error is derived by multiplying the standard error from above by the Z-score, a fixed value contingent on our desired confidence interval. Our final equation for the MoE would therefore look like this:
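$$MoE = Z \times SE = Z \times \sqrt{\frac{p\,(1-p)}{n}}$$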
Calculating the margin of error for real polls
Finally, the fun part. Let’s calculate the margin of error of real polls this election season using the equation above. You can use this method for any poll you come across — the math is the same! For this exercise, I am using polls from Real Clear Polling and I am selecting the most recent three polls conducted: one by Reuters/Ipsos, one by Rasmussen Reports, and one by Morning Consult.
The Reuters/Ipsos poll surveyed 769 Americans. In this poll, 47% of respondents said they would vote for Kamala Harris, and 44% said they would vote for Donald Trump. To calculate the margin of error for this poll at the 95% confidence interval, we simply pick either candidate’s proportion and plug it in for p. Let’s go with Kamala Harris’s value of 47% for this exercise. We then plug in 769 for our sample size n, and finally, 1.96 for Z. Doing so gives us a margin of error of 0.035, that is, 3.5%.
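Written out, the arithmetic looks like this:

$$MoE = 1.96 \times \sqrt{\frac{0.47 \times (1 - 0.47)}{769}} \approx 1.96 \times 0.018 \approx 0.035$$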
What does this mean? It means that the statistical error from this sample is 3.5% in either direction, plus or minus. So while 47% of respondents said they would vote for Kamala Harris, it would be reasonable to expect the true value to lie anywhere between 43.5% and 50.5%. The same margin of error applies to Donald Trump’s proportion, meaning we could reasonably expect from this poll that his true share of votes is between 41.5% and 47.5%. Kamala Harris’s 3% lead over Trump in this poll is therefore smaller than the margin of error, and so, statistically speaking, this poll cannot distinguish the two candidates: they are effectively tied.
The Rasmussen Reports poll, in contrast, surveyed more Americans, 1,948 to be precise. They found that Donald Trump had 48% of the vote intention, against 45% for Kamala Harris. The margin of error for this poll works out to 2.2%, which is smaller than Donald Trump’s 3% lead, so from this poll’s perspective, Donald Trump truly has the larger proportion of votes. But an even larger poll by Morning Consult, with 8,647 Americans, found Kamala Harris at 50% of the vote share, against 46% for Trump. Their margin of error is 1.05%, and so from their perspective, Kamala Harris has the larger proportion of the vote.
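If you want to check these numbers, or compute the margin of error for any other poll you come across, the whole calculation fits in a few lines of code. Here is a minimal sketch (the helper function is mine, not part of any polling library) that reproduces the three margins of error above:

```python
from math import sqrt

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error for a sample proportion at the given Z-score."""
    return z * sqrt(p * (1 - p) / n)

# (pollster, leading candidate's share, sample size) as reported above
polls = [
    ("Reuters/Ipsos",     0.47, 769),
    ("Rasmussen Reports", 0.48, 1948),
    ("Morning Consult",   0.50, 8647),
]

for name, p, n in polls:
    print(f"{name:<18} MoE = {margin_of_error(p, n) * 100:.2f}%")
# -> roughly 3.53%, 2.22% and 1.05%
```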
So who do we believe? Well, looking at several of the most recent polls on Real Clear Polling, only 4 of them show a candidate leading by more than the margin of error. Indeed, if you keep scrolling on their website, you’ll see that most polls conducted over the last few months show leads that fall within the margin of error. Reputable websites will aggregate poll results and report their averages instead, which is a good way to cancel out some of the statistical error. But even so, it paints a tight race. And, of course, statistical error is not the only type of error we ought to consider.
Methodological error: a different kind of beast
Finally, let’s talk a bit about methodological error, the second kind of error we said we’d discuss. Calculating the margin of error is a great step towards understanding both the limitations and the validity of polling. But it is not the full story. In 2016, and to some degree in 2020, polls were off by more than their margins of error. When this happens, we attribute the polling error not to statistical error but to methodological error. Methodological error is not a quantitative measure; it is a conceptual one. It describes problems in the sampling methodology that can lead to a sample that is not representative of the population.
While more polling can be done online today, much of it is still conducted over the phone, and that very fact introduces a methodological error: those who are more likely to pick up the phone might not be those who are more likely to vote, nor representative of the general population at large. There are other factors at play here too, such as where the poll was conducted, which states were included, the levels of education of the respondents, and their willingness to divulge their political opinions in the first place. All of these factors force pollsters to apply corrections, some of which can end up adding even more error. That is all to say: polling in 2024 is complicated, but polls make more sense when we interpret them in the context of their errors, statistical and methodological alike.
Conclusion
Polling is statistical in nature. It is an imperfect but powerful estimator of voting intentions. Its methods teach us much about statistical thinking, from understanding the Law of Large Numbers and the Central Limit Theorem to calculating margins of error. Learning about the statistics of polling might not entirely wipe away the anxiety one might feel about elections, but hopefully it can give us a better sense of control over how we interpret polls, and how much we can trust them. Far beyond polling and elections, these statistical lessons can also further our understanding of the world more broadly, from general human behavior to small and large scale physical processes. And those, thankfully, can be studied any year, and with far less anxiety attached to them.