Photo by Paula Vermeulen on Unsplash

You might want to use Data Studio in Data Science — Part I

Liudmyla Kyrashchuk
Published in LogicAI · 4 min read · Feb 27, 2019


Or at least consider it…

At LogicAI we use Google products at various stages of our data science pipeline, because we like to optimize our work. Typically, Google Engines and Google Storage seem to be enough, but what about Data Studio?

No doubt, it is a good tool for reporting and analysis, especially when Google Analytics or Google AdWords come into play. But is there a place for it in the Data Science world? I think there is, and I will show you two main areas where you can benefit from it. At the very end, I will briefly show you how to connect Data Studio with BigQuery.

Share your doubts about data with others

Every Data Scientist knows that EDA (Exploratory Data Analysis) is one of the key stages: it helps you understand the data, create magic (and sometimes not so magic) features that boost your model performance, and simply learn where and how to clean your data. Sometimes you even need to consult colleagues or your client to make sure that something really is wrong in your data.

The last part is where Data Studio can help you a lot. You can simply export your data in CSV format and upload it, or connect Data Studio to BigQuery, then create the plots you are uncertain about and share them with the people you want to consult.

Note that if you connect Data Studio to BigQuery and release the report to the public, you will be charged for each query made within Data Studio, at least for now.

Clients or colleagues can change dates, metrics or filters on their own with just drag and drop, to test hypotheses about the problem on their side (so you don’t need to keep sending them new plots over and over again).
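As a rough illustration of the CSV route, here is a minimal sketch that serializes a few granular rows to CSV text using only the Python standard library. The column names, the sample values and the helper `rows_to_csv` are made up for this example and are not taken from a real project.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical granular records; the columns are illustrative only.
suspicious_rows = [
    {"date": "2019-02-01", "store": "A", "items_sold": 120},
    {"date": "2019-02-02", "store": "A", "items_sold": -3},  # negative sales: worth asking about
]

csv_text = rows_to_csv(suspicious_rows, ["date", "store", "items_sold"])
print(csv_text)
```

Saving `csv_text` to a file gives you something you can upload as a data source and chart. Keeping the rows granular, rather than pre-aggregated, leaves the people you share the report with free to slice the data their own way.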

Toronto Weather Forecast from Google Report Gallery

Show performance of your model

The second place where Data Studio comes in handy is after you have created your model and even deployed it. Now you and your client can monitor how the model performs. If your model runs on the Google Engine and writes predictions to Google Storage, it is very simple to connect it with Google Data Studio and monitor how good it is.

The good thing is that you get nice visualizations of items sold (if that was your target) by day/week/month, or of clients churned. But the most important part is that you can react pretty quickly when your model starts to fail.

Instead of plotting your model performance every week or month, you can easily spend a few minutes daily just to refresh your plots with the new data in Data Studio, and share updates with your clients. Now they can observe how the model behaves.
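To make "how the model behaves" concrete, here is a minimal sketch of a daily metric you could chart. It assumes you have (day, actual, predicted) records available. The choice of mean absolute error is my assumption for illustration, as are the function name and the sample numbers.

```python
from collections import defaultdict


def daily_mean_absolute_error(records):
    """Return {day: mean absolute error} from (day, actual, predicted) tuples."""
    errors_by_day = defaultdict(list)
    for day, actual, predicted in records:
        errors_by_day[day].append(abs(actual - predicted))
    # Sort by day so the result is ready to plot as a time series.
    return {day: sum(errs) / len(errs) for day, errs in sorted(errors_by_day.items())}


# Hypothetical records: (day, actual items sold, predicted items sold).
records = [
    ("2019-02-25", 100, 90),
    ("2019-02-25", 50, 60),
    ("2019-02-26", 80, 40),
]

daily_mae = daily_mean_absolute_error(records)
print(daily_mae)
```

A sudden jump in a series like this, such as the second day in the sample data, is the kind of signal that lets you react quickly when the model starts to fail.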

Integration with BigQuery

Google Data Studio

The first step in creating a Data Studio report is to select a data source. Luckily for us, Data Studio has a pretty simple connection with BigQuery. After authorization (don’t forget to read that!), the list of your projects with all available datasets will appear.

Data Studio caches results, but you will still pay each time you refresh your graphs or change filters.

Query costs are based on the amount of data processed by the query — Google Pricing
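A back-of-the-envelope sketch of what that means for a shared report follows. The price per terabyte is passed in as a parameter, and the value used below is an assumption for illustration only, so check the current pricing page. The sketch also assumes a binary terabyte, and it ignores free tiers, minimum charges and cache hits.

```python
BYTES_PER_TB = 2 ** 40  # assumption: a terabyte is counted as 2^40 bytes


def estimate_query_cost(bytes_processed, price_per_tb):
    """Cost of one query that processes `bytes_processed` bytes."""
    return bytes_processed / BYTES_PER_TB * price_per_tb


def estimate_monthly_cost(bytes_per_refresh, refreshes_per_day, price_per_tb, days=30):
    """Cost of a report whose every refresh triggers one query."""
    per_refresh = estimate_query_cost(bytes_per_refresh, price_per_tb)
    return per_refresh * refreshes_per_day * days


ASSUMED_PRICE_PER_TB = 5.0  # hypothetical price in USD, not an official figure

# A report reading a 10 GiB raw table vs. a 10 MiB pre-aggregated table,
# refreshed 20 times a day by you and your clients.
raw_cost = estimate_monthly_cost(10 * 2 ** 30, 20, ASSUMED_PRICE_PER_TB)
aggregated_cost = estimate_monthly_cost(10 * 2 ** 20, 20, ASSUMED_PRICE_PER_TB)
print(f"raw table: {raw_cost:.2f} USD/month")
print(f"aggregated table: {aggregated_cost:.4f} USD/month")
```

The absolute numbers depend entirely on the assumed price. The ratio is the point: a report that reads a table 1024 times smaller costs 1024 times less per refresh.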

So, can you minimize the costs? The answer is yes, if you are using Data Studio to monitor your model performance.

If you are using it to show a potential problem in the data, it is better to use CSV files instead. This way, you can use granular data and aggregate it in different ways to investigate the issue.

So, let’s go back to BigQuery integration. To minimize the costs, you can aggregate your data with one query and save the result as a separate table in the BigQuery web UI.

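This means that each time you update the report (say, every other day), you change the date in the query and append the new aggregated rows to your report table. Data Studio then reads only the small report table: the less data processed, the less you pay.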
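One way to picture this update step is a small helper that builds the query text for a given date, which you could then paste into the BigQuery web UI. This is only a sketch: the table names, the column names (`created_at`, `items_sold`, `items_predicted`) and the helper itself are hypothetical, and the SQL assumes BigQuery standard SQL.

```python
import datetime


def build_daily_append_query(source_table, report_table, day):
    """Build a query that aggregates one day of data and appends it to the report table."""
    if not isinstance(day, datetime.date):
        raise TypeError("day must be a datetime.date")
    day_literal = day.isoformat()  # e.g. 2019-02-27
    return (
        f"INSERT INTO `{report_table}` (day, items_sold, items_predicted)\n"
        f"SELECT DATE(created_at) AS day,\n"
        f"       SUM(items_sold) AS items_sold,\n"
        f"       SUM(items_predicted) AS items_predicted\n"
        f"FROM `{source_table}`\n"
        f"WHERE DATE(created_at) = DATE '{day_literal}'\n"
        f"GROUP BY day"
    )


# Hypothetical table names, for illustration only.
query = build_daily_append_query(
    "my_project.my_dataset.predictions",
    "my_project.my_dataset.report_daily",
    datetime.date(2019, 2, 27),
)
print(query)
```

Only the date changes between runs, so the report table grows by a handful of rows each time, and the Data Studio report never has to touch the raw table.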

In the next part, I will show you how to automate this process via App Engine. Stay tuned!

If you want to get familiar with Data Studio, check out these links:

I hope you found this interesting and useful. If you use Data Studio in your Data Science routine in some other way, please share it in the comments!
