Analyst’s nightmare, or how to lose $20,000 every day

Maria
Published in IO Technologies
3 min read · Jul 15, 2015

Knowledge is power. When collected data is analyzed correctly, it becomes a powerful tool in an analyst’s hands and leads a project to success. When it is analyzed improperly, it becomes a real nightmare for the project.

The first and biggest problem is data sampling. Most web analytics services apply it to large volumes of data in order to reduce the processing load. This means that only a small part of the data is actually analyzed, and the results are extrapolated to the rest.

But you can’t count the holes in a cheese by looking at a single slice, right? How about losing $20,000 every day just because of sampling?

This happened to a company we used to work with. Let us call it Sampling Victim (SV).

How to lose money with sampling

SV used Google Analytics (GA) for data analysis. Their site averaged 2 million visitors a day. That is a huge load, so GA naturally applied data sampling. Every day SV bought 50,000 visitors at $2 each, so advertising costs amounted to $100,000 per day.

According to Google Analytics, the average conversion rate for paid traffic was 25%. But when SV measured it with t.onthe.io, a service that does not use sampling, the average conversion rate turned out to be 20%.

This means that some data was lost or distorted by sampling. SV was planning its ad spend around an overstated conversion rate, and as a result it was losing $20,000 a day.
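The article does not spell out how the $20,000 figure was derived. One possible reading, sketched below, prices the conversions that appeared only in the sampled report at the cost per conversion that report implied. The function and parameter names are illustrative, and the valuation method is an assumption rather than SV’s actual calculation.

```python
def daily_overspend(visitors, cost_per_visitor, reported_pct, actual_pct):
    """Estimate the daily spend attributable to conversions that never happened.

    Assumption: each conversion is valued at the cost per conversion
    implied by the (overstated) sampled report.
    """
    ad_spend = visitors * cost_per_visitor
    reported_conversions = visitors * reported_pct / 100
    actual_conversions = visitors * actual_pct / 100
    # Conversions that exist only in the sampled report.
    phantom_conversions = reported_conversions - actual_conversions
    reported_cost_per_conversion = ad_spend / reported_conversions
    return phantom_conversions * reported_cost_per_conversion


# Figures from the SV example: 50,000 visitors at $2, 25% reported vs 20% actual.
loss = daily_overspend(visitors=50_000, cost_per_visitor=2,
                       reported_pct=25, actual_pct=20)
print(f"Estimated daily loss: ${loss:,.0f}")
```

Under this reading, a fifth of the reported conversions did not exist, so a fifth of the $100,000 daily budget was paying for results that were never there.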

How to avoid sampling in GA: tips

As we can see, sampled data does not always reflect the real situation accurately. There are several ways to avoid sampling.

1. GA premium account

If you have a premium account, Google will provide unsampled data up to 1 billion hits per month. But this account costs $150,000 a year.

2. Reducing the report date range

If a report covers a long time period (e.g. one year), Google is likely to sample the data. To prevent this, divide the period into shorter intervals, for example months, run a report for each one, and then add up the monthly results manually.
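Splitting the period and adding up the results can be sketched as follows. This is a minimal illustration, not a Google Analytics client: `fetch_count` is a hypothetical callable that would request one report for a given start and end date and return a number.

```python
from datetime import date, timedelta


def monthly_ranges(start, end):
    """Split the inclusive range [start, end] into calendar-month chunks."""
    ranges = []
    chunk_start = start
    while chunk_start <= end:
        # First day of the month after chunk_start.
        if chunk_start.month == 12:
            next_month = date(chunk_start.year + 1, 1, 1)
        else:
            next_month = date(chunk_start.year, chunk_start.month + 1, 1)
        chunk_end = min(next_month - timedelta(days=1), end)
        ranges.append((chunk_start, chunk_end))
        chunk_start = next_month
    return ranges


def total_over_ranges(ranges, fetch_count):
    """Sum an additive metric over all chunks.

    fetch_count(start, end) is a hypothetical function that runs
    one report per chunk and returns the metric for that chunk.
    """
    return sum(fetch_count(chunk_start, chunk_end)
               for chunk_start, chunk_end in ranges)


# Example: one report per month for 2015 instead of one yearly report.
for chunk_start, chunk_end in monthly_ranges(date(2015, 1, 1), date(2015, 12, 31)):
    print(chunk_start, chunk_end)
```

Only additive metrics such as hits or sessions can be summed this way. Counts of unique visitors cannot, because the same person may appear in several months.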

3. Increase precision

You can increase the sampling precision in the GA settings when generating a report. The reported data will be considerably more accurate, but the error will not be reduced to zero.
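To see why a larger sample narrows the error without removing it, here is a rough sketch using the standard margin of error for a proportion. It assumes simple random sampling of sessions, which may not match how Google Analytics actually samples, and the sample sizes are made up for illustration.

```python
import math


def margin_of_error(rate, sample_size, z=1.96):
    """Approximate 95% margin of error for a conversion rate
    estimated from a simple random sample of sessions."""
    return z * math.sqrt(rate * (1 - rate) / sample_size)


# Illustrative sample sizes only; GA's real sample sizes depend on the report.
for sample_size in (10_000, 100_000, 1_000_000):
    moe = margin_of_error(0.25, sample_size)
    print(f"{sample_size:>9,} sessions: +/- {moe * 100:.2f} percentage points")
```

The margin shrinks with the square root of the sample size, so each step up in precision buys less, and it only reaches zero when nothing is sampled at all.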

4. Data segmentation using views

Configure multiple data views. For example, if the website has 10 main sections, you can create 10 data views, each receiving data only from its own section. The website as a whole still gets the same 2 million visitors a day, but each view sees only about 200,000, so the data for each section should not be sampled. The downside, again, is that you will have to merge the data manually to analyze the entire website.
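When merging per-view results, site-wide rates have to be recomputed from the summed counts rather than averaged across views. A minimal sketch, assuming each view’s report has been exported as a dictionary with hypothetical `sessions` and `conversions` fields:

```python
def overall_conversion(views):
    """Site-wide conversion rate computed from summed per-view counts."""
    total_sessions = sum(view["sessions"] for view in views)
    total_conversions = sum(view["conversions"] for view in views)
    if total_sessions == 0:
        return 0.0
    return total_conversions / total_sessions


# Made-up numbers for two sections of very different size.
views = [
    {"section": "news", "sessions": 300_000, "conversions": 30_000},
    {"section": "shop", "sessions": 100_000, "conversions": 30_000},
]

# Averaging the per-view rates ignores how much traffic each view has.
naive_average = sum(v["conversions"] / v["sessions"] for v in views) / len(views)

print(f"Merged from counts: {overall_conversion(views):.1%}")
print(f"Naive average of rates: {naive_average:.1%}")
```

The two numbers differ because the larger section carries more weight in the true site-wide rate.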

You can also pull the data with the Google Analytics Query Explorer tool or with scripts written in R.

Alternative services

Another option is to try services that do not use sampling, for example t.onthe.io, StatHat, Librato, or Sumologic.
