How to avoid Google Analytics sampling when creating custom reports

Adrien Auclair
Serenytics

--

Sampling is the practice of analyzing a subset of all data in order to uncover the meaningful information in the larger data set. As an example, try to count the cherries you see in the above picture. That takes a lot of time. A sampling approach would be to divide the image into 8 parts of the same size, quickly count the number of cherries in one part (i.e. the subset) and multiply the result by 8. That’s a lot faster, but you’ll only get an approximate value.
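To make the arithmetic concrete, here is a minimal sketch of that estimate. The per-part cherry counts are invented for illustration; they are not taken from the picture.

```python
def estimate_total(part_counts, sampled_index):
    """Estimate the total by scaling one part's count by the number of parts."""
    return part_counts[sampled_index] * len(part_counts)


def relative_error(estimate, exact):
    """Return the error of the estimate as a fraction of the exact value."""
    return abs(estimate - exact) / exact


# Hypothetical cherry counts for the 8 equal parts of the image.
part_counts = [12, 9, 11, 10, 8, 13, 10, 11]

exact_total = sum(part_counts)                          # counting everything
estimate = estimate_total(part_counts, sampled_index=0)  # counting one part only

print(f"exact={exact_total} estimate={estimate} "
      f"error={relative_error(estimate, exact_total):.1%}")
```

With these made-up numbers, counting only the first part overestimates the total by about 14%. Picking another part would give a different error, which is why a sampled value is hard to trust.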

When querying data from Google Analytics to create a custom report, Google might use a sampling approach to deliver faster answers. This means that the data you obtain is not based on all the sessions but on a subset of them. This is unpredictable and can lead to completely wrong reports.

If the subset used by Google Analytics is too small, the obtained values can be very different from the real ones. In many situations, you want to be sure you’re working with exact values. For example, if you’re analyzing the evolution of your e-commerce site revenue by device, for the last 12 months, you can’t accept a 20% error margin.

Using non-sampled data in Serenytics

To build custom reports using exact values in Serenytics, we’ve added a “refuse sampled data” option in our Google Analytics connector. With this option enabled, you’re sure the data you get is not sampled. If Google Analytics returns sampled data, you’ll see an error message instead of the data.
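The idea behind such an option can be sketched as follows. This is not the Serenytics implementation; it is an illustration that assumes a report shaped like a Google Analytics Reporting API v4 response, where a sampled report carries `samplesReadCounts` and `samplingSpaceSizes` in its `data` object. The `SampledDataError` class and the `refuse_sampled` function are hypothetical names.

```python
class SampledDataError(Exception):
    """Raised when a report is based on a subset of the sessions."""


def refuse_sampled(report):
    """Return the rows of a report, or raise if the report is sampled.

    Assumes `report` is a dict shaped like one entry of the `reports`
    list in a Reporting API v4 response (an assumption of this sketch).
    """
    data = report.get("data", {})
    read = data.get("samplesReadCounts")
    space = data.get("samplingSpaceSizes")
    if read or space:
        raise SampledDataError(
            f"report is sampled: {read} sessions read out of {space}"
        )
    return data.get("rows", [])


# Hand-written example reports, not real Google Analytics output.
exact_report = {"data": {"rows": [{"dimensions": ["mobile"]}]}}
sampled_report = {
    "data": {
        "rows": [{"dimensions": ["mobile"]}],
        "samplesReadCounts": ["500000"],
        "samplingSpaceSizes": ["2000000"],
    }
}

rows = refuse_sampled(exact_report)  # returns the rows unchanged

try:
    refuse_sampled(sampled_report)
    sampled_was_refused = False
except SampledDataError:
    sampled_was_refused = True
```

Failing loudly instead of returning approximate rows is the point of the option: a report either shows exact values or shows an error.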

If your query generates sampled data, there are three ways to fix the issue:

  • use a smaller date range. Google Analytics will sample if your query targets more than 500k sessions. And depending on the query, this threshold can be much lower in practice.
  • simplify your query (fewer dimensions, fewer metrics, no custom segments…)
  • run a non-sampled query multiple times and accumulate the results in the Serenytics internal data-warehouse. For example, if a particular query spanning a full year is sampled by Google Analytics, try to launch 12 queries, each one querying for one month of data. This can be easily set up with Serenytics.
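The third approach, splitting one long query into monthly ones, can be sketched as follows. This is a generic illustration, not how Serenytics schedules its queries: `monthly_ranges` and `accumulate` are hypothetical helpers, and `fetch_rows` stands for whatever function actually runs the query for a given date range.

```python
import calendar
from datetime import date


def monthly_ranges(start, end):
    """Split the inclusive range [start, end] into one (first, last) pair per month."""
    ranges = []
    current = start
    while current <= end:
        days_in_month = calendar.monthrange(current.year, current.month)[1]
        month_end = min(date(current.year, current.month, days_in_month), end)
        ranges.append((current, month_end))
        # Jump to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return ranges


def accumulate(start, end, fetch_rows):
    """Run one query per month and concatenate the results.

    `fetch_rows(first, last)` is a placeholder for the real query call.
    """
    rows = []
    for first, last in monthly_ranges(start, end):
        rows.extend(fetch_rows(first, last))
    return rows


# A full year becomes 12 smaller queries.
year_ranges = monthly_ranges(date(2017, 1, 1), date(2017, 12, 31))

# Stand-in for a real query: returns one row describing the range it was asked for.
all_rows = accumulate(
    date(2017, 1, 1),
    date(2017, 12, 31),
    lambda first, last: [(first.isoformat(), last.isoformat())],
)
```

Each monthly query covers fewer sessions than the yearly one, so it is less likely to cross the sampling threshold. Note that summing monthly results only gives the yearly value for additive metrics such as sessions or revenue; a metric like unique users cannot be added up this way.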

Another option to avoid sampling is to upgrade to Analytics 360. With 360, the threshold is 100M sessions before sampling is activated. But as far as I know, the price is above $100k/year.

Conclusion

Knowing in advance whether Google Analytics will sample your data for a given query is almost impossible. With the option to refuse sampled data in Serenytics and the possibility to store multiple non-sampled queries in our data-warehouse, you can trust the reports you create and postpone the switch to Google Analytics 360.

--

Adrien Auclair
Serenytics

Serenytics Founder - Planorama Founder - PhD in Computer Vision - Entrepreneur & coder