Our (Bumpy) Road To Self Service Analytics

Sam Morris
QueryClick Tech Blog
6 min read · Feb 9, 2021

This post is part of a series where we outline our progress in the creation of our internal analytics platform both technologically and from a cultural perspective.

At face value, self-service analytics seems so simple: get some data, make it accessible, extract value from it. But it doesn’t always go like that.

So far, our adventure into this area has been interesting, and what we currently lack in deliverables we make up for in learning.

Data at QueryClick

To help understand our learning progress, it’s important to understand how our department works and where our data comes from.

Our data comes from 3 places:

  1. Third-party products, be that an API or a tool. Examples of our standard third-party data providers are Google, Facebook, and Taboola.
  2. Our pixel, which we deploy onto our customers’ sites.
  3. Information entered by our users via our Corvidae product.

What is Self-Service Analytics?

Self-Service Analytics is a form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support. Self-service analytics is often characterized by simple-to-use BI tools with basic analytic capabilities and an underlying data model that has been simplified or scaled down for ease of understanding and straightforward data access. — Gartner

Self Service vs. Internal Reporting

Depending on your background, these two options may seem like the same thing, so it is important to differentiate them, as they cross many of the same paths. We already have Gartner’s definition of self-service, so let’s differentiate the two with the scenario below.

You’re a business user who wants to view your client’s stats for an ad campaign. The campaign has been active for the last 6 months, so there is a fair bit of data for you to analyse.

Self Service Approach

Taking Gartner’s example, I, as the user, would be able to go to a data catalogue and find the relevant data. From there, I would take the lovely user-friendly table, connect it to a business intelligence solution, and use that to generate insight.

Internal Reporting Approach

As a user, you log into a web application that contains a collection of reports; each report contains a collection of charts and tables. It is this data that you must use to extract insight.

Upfront Effort

The upfront effort for full-blown self-service analytics and internal reporting is, strangely, very similar. From what we have experienced so far, this is because only a handful of steps differ.

Consider the example of a user who has found a site that contains useful data for some analysis and has chosen to scrape this data from the site daily.

Collection

This will most likely start on your computer: each day, you manually run a process to collect data from the site and store it somewhere accessible.

The real challenge comes when you try to remove the dependency on your machine, as that normally means a conversation with an IT team which, depending on your organisation, can kill your dreams there and then.
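
To make that concrete, here is a minimal sketch of the kind of daily collection script we mean, assuming the site exposes a simple HTML table; the URL and field names are hypothetical, and requests plus BeautifulSoup are just one common choice.

```python
# A minimal daily collection script. The URL and column layout are
# hypothetical; run this by hand (or via cron) once per day.
import csv
from datetime import date

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/campaign-stats"  # hypothetical source site

def scrape():
    html = requests.get(URL, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append({"keyword": cells[0], "clicks": cells[1], "spend": cells[2]})
    return rows

if __name__ == "__main__":
    rows = scrape()
    with open(f"stats_{date.today().isoformat()}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["keyword", "clicks", "spend"])
        writer.writeheader()
        writer.writerows(rows)
```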

Storage

Store the data somewhere cheap, accessible and, most importantly, appropriate for what you are collecting.

Sticking with our data scraping example this could mean a chat with your technology department to assist you with storage.
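
Whatever store you land on, date-partitioned paths keep daily drops tidy and cheap to query later. A minimal sketch, assuming Azure Blob Storage via the azure-storage-blob SDK; the account, container, and credential handling are all made up.

```python
# Upload today's scrape to a date-partitioned path in blob storage.
# Account URL, container name, and credential are hypothetical.
from datetime import date

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://examplestorage.blob.core.windows.net",
    credential="<access-key>",  # in practice, prefer a managed identity
)
container = service.get_container_client("raw-scrapes")

today = date.today()
path = (
    f"campaign_stats/year={today.year}/"
    f"month={today.month:02d}/day={today.day:02d}/stats.csv"
)
with open(f"stats_{today.isoformat()}.csv", "rb") as f:
    container.upload_blob(name=path, data=f, overwrite=True)
```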

Make it user friendly

Now that the data is being collected on a schedule and stored somewhere accessible, you have the job of actually making it user-friendly.

This may include changing column names, applying standard formatting, adding some bespoke metrics, or converting epoch timestamps into a date format that means something to a normal human.
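
In pandas, that tidy-up is often only a handful of lines. A sketch, with hypothetical column names:

```python
# Make the raw scrape human-friendly: rename columns, fix types, and
# convert epoch timestamps into real dates. Column names are hypothetical.
import pandas as pd

df = pd.read_csv("stats_raw.csv")

df = df.rename(columns={"kw": "Keyword", "clk": "Clicks", "spnd": "Spend", "ts": "CollectedAt"})
df["Clicks"] = pd.to_numeric(df["Clicks"], errors="coerce").fillna(0).astype(int)
df["CollectedAt"] = pd.to_datetime(df["CollectedAt"], unit="s")  # epoch seconds -> datetime
df["CostPerClick"] = (df["Spend"] / df["Clicks"].where(df["Clicks"] > 0)).round(2)  # bespoke metric
```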

Apply Business Logic

Similar to the process of making your data user-friendly, you will also have the task of making it business-friendly.

This is the point at which you ask questions like:

  • “Does keywordA mean keywordA to everyone?”
  • “Should all monetary values be reported in US dollars or pounds sterling?”
  • “Do we trust the metrics from this site or do we check them against our internal processes as a quality check?”
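
The answers to those questions usually end up encoded as small, explicit transformations. Taking the currency question as an example, a sketch (the rates and reporting currency are illustrative; where the rates come from is itself a business-logic decision):

```python
# One business rule made explicit: report all monetary values in GBP.
# The rates below are illustrative; an agreed internal source should
# supply them in practice.
import pandas as pd

RATES_TO_GBP = {"USD": 0.79, "EUR": 0.86, "GBP": 1.0}

def to_gbp(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["SpendGBP"] = out["Spend"] * out["Currency"].map(RATES_TO_GBP)
    return out
```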

Make sure it doesn’t break

You need to make sure that it is repeatable, tested, and won’t cost you a small fortune to run.

If you’re spending a couple of grand a month on your Databricks cluster due to continuously rerunning data through your pipeline, then the value of the process diminishes.
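
“Tested” can start very small: a few assertions over your transformations catch most silent breakage before it burns compute. A pytest sketch, reusing the hypothetical to_gbp rule from above:

```python
# test_pipeline.py - minimal checks that the pipeline stays repeatable.
import pandas as pd
import pytest

from pipeline import to_gbp  # hypothetical module holding the rule

def test_usd_spend_is_converted_to_gbp():
    df = pd.DataFrame({"Spend": [100.0], "Currency": ["USD"]})
    assert to_gbp(df).loc[0, "SpendGBP"] == pytest.approx(79.0)

def test_unknown_currency_is_not_silently_zeroed():
    df = pd.DataFrame({"Spend": [100.0], "Currency": ["XYZ"]})
    assert pd.isna(to_gbp(df).loc[0, "SpendGBP"])  # surfaces a missing rate
```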

Make it accessible

This could be as simple as putting it in SQL Server, but you will also need to get into data modelling and populate fact and dimension tables, and there may be a need to mask certain data, for example a user’s email address or location.
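
Masking can be as simple as a one-way hash, so analysts can still join on a user without ever seeing who they are. A sketch; the salt handling is deliberately simplified:

```python
# Mask PII before it reaches user-facing tables: hash the email so it
# can still act as a join key. Keep the real salt in a secrets store.
import hashlib

import pandas as pd

SALT = b"replace-with-a-secret-salt"

def mask_email(email: str) -> str:
    return hashlib.sha256(SALT + email.lower().encode("utf-8")).hexdigest()

df = pd.DataFrame({"Email": ["jane@example.com"], "Clicks": [42]})
df["Email"] = df["Email"].map(mask_email)
```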

Depending on the choices you have made while processing your new information, you may get some quick wins: connecting your Azure storage solution to PowerBI is quite simple and allows interactive visualisations to be built.

Repeat

New requirements will be created, new metrics will be added, and maybe new datasets will be found.

Lessons Learned So Far

We have had a couple of opportunities to try the self-service analytics approach, and they left much to be desired. The lessons so far are:

Concentrate on value

Value is a vague descriptor and may change depending on what your current focus is. The way I currently measure value is in the decoupling of data and processes, with the target being that one dataset should feed many reports.

Value to you may be saving time: changing an Excel report so the data is fed from a SQL database instead of by a human may be the important first step.
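
That first step can be a very small script. A sketch, where the connection string, table, and file name are all made up:

```python
# Replace the human step: pull the report's data straight from SQL
# Server and write the Excel file. Connection details are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://@reporting_dsn")  # hypothetical DSN
df = pd.read_sql("SELECT campaign, clicks, spend FROM reporting.ad_stats", engine)
df.to_excel("weekly_ad_report.xlsx", index=False)
```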

Start small with a specific goal

Taking into consideration the section on upfront effort, there are many layers to delivering analytics. Using frameworks like Agile can help with the delivery of your insight, and merging this with your definition of value can give you a clear path to your deliverable.

A recent goal for us was to move a report that was generated within SQL Server out of the database and into our data lake; this led to much easier management of the dataset.

Data can be accessible and also not be

Simply making data accessible to others doesn’t make it easy to use.

During one project, we used AWS’s Athena product to serve our collected clickstream data through PowerBI. The engineering effort was low, but the usefulness of the data was also low due to the lack of business rules applied to the output.
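
For reference, the serving side looked roughly like the sketch below, assuming the pyathena driver; the bucket, region, and table names are invented. The queries ran fine, but without business rules the columns meant little to anyone outside engineering.

```python
# Query the clickstream data in Athena; this is the kind of output we
# then pointed PowerBI at. Bucket, region, and table names are invented.
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir="s3://example-athena-results/",
    region_name="eu-west-1",
)
df = pd.read_sql(
    "SELECT page_url, COUNT(*) AS hits FROM clickstream.events GROUP BY page_url",
    conn,
)
```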

Release what you can

If Facebook and Google can release an API to the public with the caveat that it could change at any point, you can release your dataset when you think it has enough value.

This may seem contradictory to the previous point, but if only a handful of columns are difficult to interpret, you still have something of value.

Be prepared to support

Granting users access to data in the way that suits them best will depend on your business.

Jumping to default methods like a SQL database may prove to do more harm than good, as your users may not be familiar with it. Whatever delivery method you choose for your first round, ensure you have enough time to support people using it.

Our Progress So Far

The lessons learned have come from a couple of attempts at delivering data, along with numerous conversations that begin with “Can we have this data please?” and are answered with “Maybe, it’s not really set up like that though”. But we are progressing.

Technology Progress

So far we have most of the groundwork built, but this came as a byproduct of creating our Corvidae product. The main aspect we are missing at the moment is the data catalogue; Azure Purview looks intriguing and could fill this gap.

Our next big hurdle after that is probably our actual analytics stack. By that, I mean how we will allow the user to access the data.

Culture Progress

Culturally, I think we have progressed tremendously. Making data accessible to the rest of the business is something we keep in mind during the development of our pipelines.

We still have a bit of work to do on understanding business rules and applying our own consistent standards too. For example, we still don’t have a data dictionary to build around.

While our data is more accessible now than it was before, we have yet to use it as part of an internal product. Given some more time, and possibly a couple of proofs of concept, we will have another shot at this soon.

Appendix

Gartner’s definition of self-service analytics

Gartner’s definition of self-service BI
