“We increased our conversion rate by 1500%!” aka lying with data.

One of the most unethical practices in business is using data to lie. Unfortunately, though, it’s not uncommon. I’ll show you three examples you might encounter quite often.

Tomi Mester
Data36
7 min read · Dec 5, 2016


My aim with this article is not to point fingers at anyone, but to encourage you to be more critical when you see headlines like “we increased our conversion rate by 1500% in 1 day.”

I see examples of lying with data way too often, and as a data analyst it really hurts my feelings. Data is your best ally in finding the truth. When you use data to lie, it’s like saying to someone, “I swear this is the truth,” and then lying to her face.

#1 “We increased our subscription rate by 200%!”

This is my favorite one. I guess you’ve also seen different online solutions and services promoting themselves with testimonials like:

“I’ve just tried ________’s solution and it instantly increased our conversion rate by 200%!”

That might be true, and I’m not questioning it. Still, I’d like to see some additional information about this incredible success.

A) What was the exact situation and what were the exact numbers? Let’s say the customer (call him Jimmy) originally had a newsletter subscription field at the very bottom of his website, consisting of a grey “Subscribe” button without any description. Each day, out of 10,000 visitors, 1 person subscribed. Then Jimmy implemented a big pop-up on the site. This resulted in 3 subscriptions out of 10,000 visitors per day. Has the subscription rate increased by 200%? Yes. Is it a significant change? I don’t think so… In absolute terms it’s two extra subscribers per day, and with numbers this small the difference could easily be random noise.

B) Can we see more numbers, please? Even if the targeted conversion rate changed significantly (let’s say, instead of 100/10,000 it became 300/10,000), how did this change affect the other numbers?
Did the bounce rate change? Did the newsletter open rate change? Did the sales number change? It’s really easy to push one number and then imply success with that one number. But an online business is usually much more complex than that and you need to check all your important metrics.

C) And finally: what was the research method? Did they A/B test the solution? If they did, what were the different versions and how did “A” perform against “B”? And if they didn’t A/B test it, how can they prove that this success is not the result of seasonality or anything else (e.g. Christmas season, an accidentally viral article, organic growth, etc.)?
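To make points A and C a bit more tangible, here is a minimal Python sketch. It compares Jimmy’s 1 vs. 3 subscribers with the 100 vs. 300 case using a two-proportion z-test. Please treat it as an illustration only: the function names are my own, I’m assuming the two numbers come from two comparable groups of 10,000 visitors each (which a testimonial never tells you), and with counts as tiny as 1 and 3 the normal approximation behind the z-test is rough, so an exact test would be the safer choice in real life.

```python
import math


def relative_lift(before_rate, after_rate):
    """Relative change in percent, the number that ends up in the headline."""
    return (after_rate - before_rate) / before_rate * 100


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Assumes two independent groups (e.g. the A and B arms of a test).
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Both scenarios from the text produce the same "+200%" headline.
for before, after in [(1, 3), (100, 300)]:
    lift = relative_lift(before / 10_000, after / 10_000)
    z, p = two_proportion_z_test(before, 10_000, after, 10_000)
    print(f"{before} -> {after} of 10,000: lift {lift:.0f}%, z = {z:.2f}, p = {p:.3g}")
```

The headline number is identical in both cases, but only the 100 vs. 300 scenario gives a p-value small enough to take seriously. And even then, a test like this says nothing about seasonality or other metrics. It only tells you whether the difference is likely to be more than noise.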

Let me give you a personal example. This is the traffic of my blog (data36.com) in the last few weeks of 2017. I implemented an exit-intent pop-up on the site in the 3rd week of November:

data36.com traffic

But this is not what caused the increased traffic — even if the picture suggests that. In this case the traffic caused the pop-up: I implemented the exit-intent pop-up because I started a Facebook campaign and I expected increased traffic!

As you can see, this is a very tricky way of lying with data, because it is not exactly lying… just leaving some pieces of information out.

(Note: just to make it clear, I’m not saying it’s pointless to use exit-intent pop-ups. I picked this for the sake of example. I could have picked anything else.)

#2 “Look at the charts!”

Another nice way to lie with data is to lie with the visualization. The most basic “trick”: 3D charts! How big is the grey slice?

At first glance I’d say, same size as the blue. But check out the same pie chart in 2D!

In reality the blue one is bigger than the grey one.

Even the great Steve Jobs used this little trick when he needed to! (Look at the green vs. the purple fields!)

The advanced level of lying with a chart is the “truncated graph trick.” If you couldn’t hit the expected growth at the end of the year, you can still show some misleading bar charts to your boss:

But the reality is:

Do you see what the trick is with the first graph? Yes, the Y-axis does not start with zero.
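If you want to put a number on how much a truncated axis exaggerates, a tiny Python sketch is enough. The function name and the revenue figures below are made up for illustration (they are not the values from the charts above). The idea is simply to compare the ratio of the bar heights as drawn with the ratio of the real values.

```python
def drawn_height_ratio(small, large, axis_start=0.0):
    """How many times taller the larger bar looks when the Y-axis starts at axis_start."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)


# Hypothetical numbers: revenue grew from 100 to 104 (a 4% increase).
last_year, this_year = 100, 104

honest = drawn_height_ratio(last_year, this_year, axis_start=0)
truncated = drawn_height_ratio(last_year, this_year, axis_start=98)

print(f"Y-axis from 0:  this year's bar is {honest:.2f}x as tall")
print(f"Y-axis from 98: this year's bar is {truncated:.2f}x as tall")
```

With the axis starting at zero, the bars are almost the same height. Start the axis just below the smaller value and the same 4% growth is drawn as a bar several times taller.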

And finally, here’s my personal favorite in the lying-with-charts category. Fortunately it was only a joke. When Romania beat Hungary in a soccer game, a funny Facebook page introduced the best data visualization to cure the pain of all Hungarian soccer fans.

Even if it was just for fun, it’s still a really good representation of how easy lying with data visualization can be.

#3 “That’s what people say.”

I see, from time to time, university students doing online “research” about different topics. And when I say online research I mean surveying. And when I say surveying I mean posting Google Forms on social media and asking friends to share it.

This is very dangerous, because most of the time the “researchers” don’t realize that they are lying to themselves, as well. The most common issue is something that we call “selection bias.”

Let’s take a simple example. Clara (fictional character) wants to run a study to find out how much time people spend in her university on learning, doing sports, or doing other activities. She creates a survey and tries to get as many answers as she can.

Problem A) If she sends this to her friends and asks them to share it on social media, it won’t represent the university’s population, only her friends and maybe the friends of her friends. This is because she reaches her friends (and friends of friends) with a much higher probability than everyone else.
Do you see the issue?
If Clara is sporty, then she will have friends from the basketball team (let’s say), and the survey results will show that people are sporty at the university. But in reality the only finding is that Clara’s friends are sporty, which is no surprise, as she’s sporty herself. On the other hand, if Clara is a member of the local Book Club, then maybe most of her friends focus more on reading and learning. So it’s no surprise when the survey “finds” that “all the university students” focus on learning as well…

Problem B) Let’s assume that Clara is smart enough not to share her survey with her friends. Most people choose another option: posting it into a relevant Facebook Group where all the students of the university are members. Her research is still biased!
Why?
Because the sporty and studious people probably spend less time on Facebook, so she will likely get fewer survey answers from them, and more from those who are just hanging around on social media all the time.
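Problem B is easy to simulate. The Python sketch below builds a fully made-up student population (every number in it is an assumption I picked for the demo, not real data), then compares a simple random sample with a “Facebook group” sample in which heavy social media users are five times as likely to answer.

```python
import random
from statistics import fmean


def simulate_survey(seed=36, population_size=10_000, sample_size=500):
    """Compare a random sample with a sample skewed toward heavy social media users."""
    rng = random.Random(seed)

    # Hypothetical population: half the students are heavy social media users,
    # and (by assumption) they do fewer hours of sport per week than the rest.
    population = []
    for _ in range(population_size):
        heavy_user = rng.random() < 0.5
        hours = rng.gauss(2.0, 1.0) if heavy_user else rng.gauss(6.0, 1.5)
        population.append((heavy_user, max(0.0, hours)))

    true_mean = fmean(hours for _, hours in population)

    # Simple random sample: every student has the same chance of being picked.
    random_sample = rng.sample(population, sample_size)
    random_mean = fmean(hours for _, hours in random_sample)

    # "Facebook group" sample: heavy users are 5x as likely to respond.
    # (rng.choices draws with replacement, which is fine for an illustration.)
    weights = [5 if heavy_user else 1 for heavy_user, _ in population]
    biased_sample = rng.choices(population, weights=weights, k=sample_size)
    biased_mean = fmean(hours for _, hours in biased_sample)

    return {"true": true_mean, "random": random_mean, "biased": biased_mean}


result = simulate_survey()
for name, value in result.items():
    print(f"{name:>6} mean sport hours per week: {value:.2f}")
```

The random sample should land close to the true average, while the skewed sample clearly underestimates how sporty the students are. Nobody lied in the survey. The bias comes purely from who was more likely to answer.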

So what’s the solution? My honest opinion: never do surveys…
Well, that sounds a bit too radical, right? So I have to mention that there are statistical sampling methods that professional pollsters know really well. (Though it seems they never use this knowledge when it comes to presidential elections… ;-) ) And Clara can learn about them too! If she does, she won’t post Google Forms on social media anymore. She will do something more scientific instead!

Conclusion

There is a plethora of other ways to lie with data. These were only three off the top of my head.

If, while reading this article, you realized that you’ve also used data to lie, don’t worry: we’ve all done that in the past (hopefully unintentionally). Again: the point of this article was not to point a finger at anyone, but to raise awareness of the issue:

Don’t let yourself be scammed with data!

Be critical and think when you see statistics! Always dig into the details and try to understand every aspect of the research you’re reading about! Plus, always learn about the research method itself!


Tomi Mester
author of data36.com
Twitter:
@data36_com
