Data Studio: We can do much more than it seems

Some ideas to make better use of this tool and explore many more visualizations than you imagine!

Emi
spikelab
Published in
9 min readMay 27, 2021

--

In a generic machine learning problem, once the initial data processing is done, the next step is probably to visualize the result and go back to a new data cleaning, and so on several times. This is because seeing the result and being able to inspect it from different angles allows us to find different details that we did not see in a line of code. This is why in this article we are going to see how to make some visualizations in Data Studio, due to how easy it is to go from Big Query to this tool.

To get started, I want to mention that a (very) basic knowledge of Data Studio is required to understand what we are going to do, how to connect data sources to a report and add charts/controls to it. In addition, we will take as example the data used in the article I wrote a while ago about Big Query (to which we add some extra columns, as we see in figure 0).

Figure 0. Sample of the data used in the article.

As an additional comment and using the link mentioned above, it would be a good exercise to take the different results obtained in that article and make different visualizations when finishing reading this, combining the new Big Query AND Data Studio tricks.

The “default” plot

If we have a database of transactions, sales, or any other that has temporary information, it is probably a good idea to make a column chart or histogram grouped by date (day or month, for example), to see how sales moved.

In the case of Data Studio, this graph can be added directly with a “column chart”, with the parameters specified in Figure 1. Note that the metric (Record Count), in this case would be counting each row by Month and by Shoes , so Record Count would be each transaction per month and per type of shoe.

Figure 1. Parameters of the Column Chart

The graph is then as seen in Figure 2, where are the transactions per month, separated by type of shoe. However, there are two issues to consider. First of all, we are looking at the data by month, regardless of the year. As we know, there are transactions for 2019 and 2020, so when we see January (or any other month), we have information for January 2019 and 2020, in the same bar. And secondly, we know that there are two types of store, A and B, that again, we are looking at them together. This type of chart does not allow further grouping.

Figure 2. Column chart for stores A and B of the transactions in 2019 and 2020

Controls

In the case of year and stores, there is a first solution: add controls. By adding controls with the variables that are not in the graph as parameters, you can finish choosing what is being displayed. In this case, the variable year and store are missing, so we add those two.

In addition, if it is important that two years or two different stores are never added together, the initial value or the default value of the control can be chosen (Figure 3, left), and that it takes a single value at a time, with single select (Figure 3, right), that is, it can only be 2019 or 2020, but not both at the same time, so we will never see the transactions of the two years together.

Figure 3. Parameters for the year control

When considering the controls, Figure 2 becomes Figure 4 for store A in 2020. It should be noted that the combination of the control and breakdown dimension could be the other way around: the store in the graph (colors) and the type of shoes in the control.

Figure 4 (GIF). Filtered column chart for store A of the transactions in 2020

Date format

For the issue of the months that appear together regardless of the year (Figure 2), there is another way to correct it, as long as that is not what you want to show. We can add a new column to the dataset, either in Big query or in Data Studio (Figure 5), which allows us to sort by year-month and not just month.

Figure 5. Data Studio editor to add a field. In this case the new field is “year_month”.

In Figure 6 we see the result of this change in X-axis variable, which went from “month” to “year-month”. In any case, the year control is preserved, because if it is filtered only for 2019 for example, we would be left with Figure 4 again.

Figure 6 (GIF). Filtered column chart for store A of the transactions grouped by year-month

Dare a try!

Although the issue of the months/years could be solved with different approaches, some simpler than others, there are times when we have a visualization in mind but the options that Data Studio gives us are limited. However, there are always alternatives. They are probably not explicitly in the options board, but you have to find a way.

Combo chart

For the stores, we were able to solve with the control to see them separately or both together, but not independently in the same graph. If we stay on the bar graph, we would have to remove the type of shoes, to enter the type of store and the same would happen: the type of shoes would be a control. This is why we switched to the “combo chart”.

This type of graph allows adding several metrics based on a variable, so the important thing is to define: what metrics do we want? The answer may sound easier than it is to write it in Data Studio, although that is still not complex. Each metric is going to be a store-shoe combination, so that it counts each occurrence separately. It would look like in Figure 7, where the Sandals-A and Boots-A combinations are (left and center). And we also see how all the parameters of the graph would be (Figure 7 right), including the other two combinations as metrics, Sandals-B and Boots-B.

Figure 7. Parameters of the Combo Chart (right) and of the new added metrics (left and center)

The result of this is in Figure 8. As seen in the graph, there are controls for the store and type of shoes, because if we filter by any of them, the corresponding bar will simply disappear, as seen in the gif of Figure 8. That is the important thing about this visualization, that we have everything and we can take it out, unlike the previous one, which was limited in the information that was there from the beginning.

Figure 8 (GIF). Combo chart with the transactions of all the combinations of shoes-stores (2020).

Another benefit that has changed from a graph to one that allows adding many metrics is precisely: adding metrics! (without abusing, please). The only important thing is to think about what we want to show and look for the indicated syntax, or if it is too complicated, bring the feature ready from Big Query.

For example, to this graph of transactions, we can add the average amount per unit (or pair in this case) of shoes, to get an idea of ​​in which month there was the highest unit income. As we see in Figure 9 (left), this metric is constructed as the total, which would be the total paid (including discounts), divided by the number of units, obtaining $/unit. In addition, we take “average” as aggregation to have the average amount, since that is what we are interested in. If we wanted the total, we would choose “sum”. On the right side of Figure 9, there is the difference in the styles between the bars and the lines, in addition to the fact that the line is on the Y-axis on the right side, in case there are differences in the data scales.

Figure 9. Parameters of the bars/lines of the Combo Chart (right) and of the new added metric (left)

The result of adding this new metric is in Figure 10. It should be noted that since we created this variable WITHOUT considerations of the store or shoes in particular (like the other four variables), this metric will be calculated with all the values ​​that are available according to the controls: in Figure 10, there is no filter by store or shoe, so the average sale per month that we see in the graph is the average of both stores and both types of shoes for that month. But since we already know how to do it, it is a matter of putting the necessary conditions on it and we can have our sales per store, shoe, and more!

Figure 10. Combo chart with the transactions of all the combinations of shoes-stores and average amount (2020).

Bonus track: The usefulness of controls

To finish, and just to give an idea of ​​the infinity of things that can be done in Data Studio if we look for the right way, I want to show you a way (because there are probably several more), of how to change the type of metric / kpi / parameter we are looking at, through the controls. To understand it better, in this case we will first see the graph (Figure 11).

Figure 11 (GIF). Comparison of 2019 vs 2020 in the kpi chosen through the control.

What we are seeing is the year-over-year comparison of a KPI, which changes depending on what we choose in the control. We are no longer interested in filtering the years or the stores or the shoes, but rather how did we do with such KPI in 2020 vs 2019.

When changing the “kpi” control, the Y-axis changes, so according to this choice we are looking at the total sold, the total discounts, the quantity sold or the transactions carried out (for this example we leave the store and type of shoes as a control because the point is another, but we could do the same as before: all the kpi-store-shoe combinations, to have all the information).

How did we do it? We start in Big Query:

(SELECT Store, Shoes, month,
SUM(IF(year=2019,quantity,0)) kpi_2019,
SUM(IF(year=2020,quantity,0)) kpi_2020,
'quantity' kpi
FROM `TEMP.transactions`
GROUP BY Store, Shoes, month)
UNION ALL(SELECT Store, Shoes, month,
SUM(IF(year=2019,totalDiscount,0)) kpi_2019,
SUM(IF(year=2020,totalDiscount,0)) kpi_2020,
'totalDiscount' kpi
FROM `TEMP.transactions`
GROUP BY Store, Shoes, month)
UNION ALL...

What we are doing is:

  1. Repeat a block of code for each kpi that we want to add. Then, we join each block with a UNION ALL.
  2. Add the column “kpi” to the table, which will take as a value the name of the kpi corresponding to that block.
  3. In each block, we change the metric or variable that is being added in the IF sentence, which corresponds to the kpi. Also, in all the cases we saw, we are adding values, for example: quantities of shoes grouped by month-store-type of shoes. But maybe there is a kpi that has to be the average, so that SUM() may well be another function.
  4. In this case we are comparing 2019 with 2020, but eventually it may be more years or other years/periods, so these conditions are completely modifiable.

Once we have the query ready, we can create a table directly in Big Query to read it from Data Studio, or make a custom query from Data Studio (to not create a table). As we see in Figure 12 (left), we did it with CUSTOM QUERY and the parameters are shown in Figure 12 (right). With this, we have everything to do Figure 11!

Figure 12. Parameters of the kpis-Combo Chart (right) and Data Studio editor to add a Custom Query as Data Source (left).

Final thoughts

To finish the article, I only have to say that once the initial stage of getting used to and getting to know Data Studio, adding data sources, changing the graphics and their styles, etc., is a matter of taking a single and short step to handle endless visualization options. You just have to open up to the idea that there is no a single way to do it and not a single “good” way, they are all correct.

It is only important to take the time to define what we want to show, see that we have the data and voilà!, how it is going to be increasingly easier to understand/do.

It is also important to consider that sometimes we spend a lot of time, we go around a thousand times, to make a super graph with millions of things, to show the same thing that could come out of a card with a single number. let’s not forget that less is more, especially if it is going to save us time, head and lines of code.

--

--