The Misconception of Online Polling

I am not going to act like I have a heap of knowledge about political polling specifically, but I consider statistics my forte and I understand the basics of polling. In about 30 days, the American people will head to the polling stations to vote for our next president. As the election nears, people are paying more attention to polls, and there is too much misinterpretation and misinformation on this topic to ignore. The biggest offender is the use of online polls as legitimate proof of candidate support. Let's start with the basics.

Imagine you have a million marbles in a tank, each either red or blue, scattered randomly throughout the tank. You want to find out what proportion of the marbles is red and what proportion is blue, but counting a million marbles would take far too much of your time. Instead, you take a cup and scoop out a few dozen marbles. Then you count the red and blue marbles in the cup and calculate the percentage of each in your sample. You conclude that these percentages will be reasonably close to the ones you would get if you sat down and counted all one million marbles, because the marbles are well mixed and you are confident that your few dozen represent the whole tank. The more marbles you scoop, the closer you can expect to get.
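
To make the marble tank concrete, here is a minimal Python sketch. Everything specific in it is my own assumption for illustration: the 55% red share, the scoop sizes, the fixed seed, and the function names `scoop_red_share` and `margin_of_error`. The margin of error uses the standard normal approximation for a proportion, which is only a rough guide for very small scoops.

```python
import math
import random


def scoop_red_share(tank_size, true_red_share, scoop_size, seed=0):
    """Estimate the red share of a well-mixed tank from one random scoop."""
    rng = random.Random(seed)
    red_count = round(tank_size * true_red_share)
    # Number the marbles 0..tank_size-1 and treat the first red_count as red.
    # rng.sample draws scoop_size distinct marbles, each equally likely.
    scoop = rng.sample(range(tank_size), scoop_size)
    reds_in_scoop = sum(1 for marble in scoop if marble < red_count)
    return reds_in_scoop / scoop_size


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / sample_size)


if __name__ == "__main__":
    # Hypothetical tank: one million marbles, 55% of them red.
    for scoop_size in (36, 1_000, 10_000):
        estimate = scoop_red_share(1_000_000, 0.55, scoop_size)
        moe = margin_of_error(0.5, scoop_size)
        print(f"scoop of {scoop_size}: {estimate:.1%} red, +/- {moe:.1%}")
```

Under these assumptions, a scoop of three dozen marbles carries a margin of error of roughly 16 percentage points, while a scoop of 1,000 brings it down to about 3. That is why real polls ask around a thousand people rather than a few dozen.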

This is how polling works in a general sense. In a perfect world, pollsters could ask every eligible voter in the United States who they want the next president to be. Of course, this is not viable because it would take an immense amount of resources, money, and time to get hold of more than 125 million people. Instead, pollsters take a random sample of the American public and extrapolate from this data to the entire population. Keep in mind the term “random sample” because it is very important for something I will talk about soon. But you may be asking yourself why every poll does not show the same numbers. There are many reasons, and listing them all would bore you to no end, but here are a few: the method of polling, the distribution of demographics, the wording of the questions, random chance, and whether the poll surveys likely or registered voters. The list goes on. However, with a collection of polls you can gauge the mind of the American public.
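
The point about a collection of polls can be sketched in a few lines of Python. This is only an illustration under assumptions I made up: a true support level of 52%, 20 polls of 800 respondents each, and the hypothetical function names `run_poll` and `poll_average`. It models random chance alone, not the other sources of disagreement listed above, such as question wording or likely-voter screens.

```python
import random
import statistics


def run_poll(true_share, sample_size, rng):
    """Simulate one random-sample poll and return the observed share."""
    # Each respondent independently supports the candidate with
    # probability true_share.
    supporters = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return supporters / sample_size


def poll_average(true_share=0.52, sample_size=800, n_polls=20, seed=1):
    """Run several independent polls; return each result and their mean."""
    rng = random.Random(seed)
    results = [run_poll(true_share, sample_size, rng) for _ in range(n_polls)]
    return results, statistics.mean(results)


if __name__ == "__main__":
    results, average = poll_average()
    print(f"lowest poll:  {min(results):.1%}")
    print(f"highest poll: {max(results):.1%}")
    print(f"average:      {average:.1%}")
```

Individual polls bounce around the true value by a few points in either direction, but the average of many polls sits much closer to it. That is the basic reason a collection of polls is more informative than any single one.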

Recently, I have seen a lot of people, ranging from friends on Facebook to Donald Trump’s Twitter account, using online polls as “proof” that a certain candidate is going to win the election or that they won the second presidential debate. Online polls are beyond flawed for one simple reason: self-selection bias. Let's imagine that the Drudge Report runs a poll on its site that asks the following question: “Who will you vote for in the upcoming presidential election?” The numbers turn out to be 80% in favor of Donald Trump, 20% in favor of Hillary Clinton, and “What is statistics?” in favor of Gary Johnson. Those are some shocking results! You might come to the conclusion that this will be the largest landslide in US history, even larger than FDR's in 1936, with Donald Trump getting 60 percentage points more support than his opponent.

If you do not know anything about the Drudge Report, it is a news aggregation site with a heavily conservative readership and slant. So when you take a poll on this site, you are grossly oversampling conservatives, and the online poll will be extremely biased in favor of Donald Trump. This is not an accurate assessment of the American public because the respondents select themselves. In other words, people choose whether to vote in these polls. People who self-select tend to have a stronger passion for a certain candidate, and are more likely to vote a certain way. But the strength of your passion does not mean more votes in the real world. Someone who is highly invested in the election has one vote, the same as someone who has to be dragged to the polling station. And that is on top of the fact that mostly conservatives will see this poll in the first place, which skews it even further.
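
Here is a rough Python sketch of self-selection bias at work. The numbers are hypothetical and chosen by me to mirror the 80/20 example: a population split evenly between candidates A and B, where A's supporters have an 8% chance of seeing and answering the online poll and B's supporters only 2%. The function name `simulate_polls` and all of its parameters are assumptions for illustration, not a model of any real site.

```python
import random


def simulate_polls(population_size=100_000, true_share_a=0.5,
                   respond_prob_a=0.08, respond_prob_b=0.02,
                   random_sample_size=1_000, seed=0):
    """Compare a random-sample poll with an opt-in poll on the same population."""
    rng = random.Random(seed)
    n_a = round(population_size * true_share_a)
    population = ["A"] * n_a + ["B"] * (population_size - n_a)

    # Random sample: every voter has the same chance of being asked.
    random_sample = rng.sample(population, random_sample_size)
    random_share_a = random_sample.count("A") / random_sample_size

    # Opt-in poll: each voter decides whether to respond, and supporters
    # of A are (by assumption) four times as likely to do so.
    respond_prob = {"A": respond_prob_a, "B": respond_prob_b}
    opt_in = [voter for voter in population if rng.random() < respond_prob[voter]]
    opt_in_share_a = opt_in.count("A") / len(opt_in) if opt_in else float("nan")

    return random_share_a, opt_in_share_a


if __name__ == "__main__":
    random_share_a, opt_in_share_a = simulate_polls()
    print(f"random sample: {random_share_a:.1%} for A")
    print(f"opt-in poll:   {opt_in_share_a:.1%} for A")
```

Under these assumptions, the random sample lands near the true 50%, while the opt-in poll reports something close to 80% for A even though the electorate is evenly split. Nobody's mind changed; the only difference is who chose to answer.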

In short, polls on sites like the Drudge Report are not legitimate because they do not give everyone in the United States an equal opportunity to respond. Legitimate polls, on a general level, use a random sample of people in the United States. In a random sample, each person in the population (in this case, the population is defined as all eligible voters in the United States), white or black, male or female, has the same chance of being asked their presidential preference. This eliminates most forms of bias by giving each person an equal chance of being chosen regardless of their beliefs, race, gender, height, weight, socioeconomic status, etc.

If I run a poll on Facebook asking “Should coal mining be outlawed?”, people who are coal miners, have family who are coal miners, or live in coal-mining areas will respond at much higher rates (and obviously not in favor of outlawing coal mining) than people in New York City, where coal mining does not affect daily life nearly as much. This example is a bit ridiculous, but it is the same premise as polling on conservative sites like Drudge or on liberal-leaning sites that run similar polls. People who are more passionate about a topic are more likely to volunteer themselves to vote in optional polls, and these more passionate people are almost certain to lean a certain way.

Online polling will give you a false sense of what the American people actually think. People who do not know any better might think that Donald Trump is going to win in a landslide. As a result, they might not try as hard to persuade friends to vote for him, or might not volunteer to help his campaign. This is damaging to his campaign, especially when every reputable poll right now has Clinton winning most swing states and leading at the national level. You can try to convince yourself that every pollster is part of some Democratic conspiracy to get Clinton elected (which would be odd, because inflating her numbers would only hurt her for the same reason: overconfident supporters do less), but you would be fooling yourself.