5 steps for data analytics using Google Cloud and Marketing tools
How to leverage CSV, Cloud Storage, and Data Studio to support fast data-driven decision making
What comes to mind when you hear “data-driven”? Expensive BI system licenses? Complex ETL processes that prepare data to populate dashboards? Corporate or technical constraints that prevent you from accessing relevant data to make decisions? Or affordable tools that are available whenever you need them, even to guide simple decisions? I hope you answered “The last option!”, but I have good news in case you didn’t.
Well, “data-driven” has been a trending topic for a long time, and it is not difficult to understand why. As we all know, people, gadgets, and devices — aka “things” — generate more and more data day after day; and, fortunately, tools that use such data to support decision-making are becoming easier and more convenient to access. Although such tools are commonly associated with big data, we can also count on some of them to analyze small-scale datasets.
My goal with this article is to share a straightforward workflow I’ve been using to support data-driven decision-making, especially in fast-moving scenarios that require me to get insights as early as possible. This workflow is part of an ongoing effort to exercise my data-driven mindset. There’s no need for complex or expensive tools to start visualizing and sharing data through user-friendly dashboards: CSV files, a Google Cloud Storage bucket or Google Sheet, and Data Studio should be enough.
To demonstrate how it works, I will use a 3-column sample dataset containing timestamps, event ids, and the platforms on which the events were generated, as follows:
| timestamp | event_id | platform |
| --- | --- | --- |
| 2019-06-22 01:15:03 | 600 | ios |
| 2019-06-25 22:34:46 | 1072 | android |
| 2019-06-26 07:36:51 | 1810 | android |
| 2019-06-26 21:04:50 | 290 | ios |
...
This fictitious dataset represents events gathered from a mobile application. You can interpret them as trackable errors reported by the application’s users, or any other identifiable events such as a new installation, new user subscription, and so on. I won’t focus on discussing what the events mean, but on how to use Data Studio to look into them. So, the most important thing to keep in mind is: we can turn 3-column raw data into an interactive dashboard in a few minutes.
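If you would like to follow along with your own fictitious data, a similar dataset can be generated with a few lines of Python. This is a minimal sketch; the file name, event ids, and value ranges are my own assumptions based on the sample above:

```python
import csv
import random
from datetime import datetime, timedelta

def generate_events(n, start, path):
    """Write n fictitious mobile-app events to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # The first line must contain the column names (Data Studio expects this).
        writer.writerow(["timestamp", "event_id", "platform"])
        ts = start
        for _ in range(n):
            # Advance the clock by a random interval so timestamps stay ordered.
            ts += timedelta(minutes=random.randint(1, 300))
            writer.writerow([
                ts.strftime("%Y-%m-%d %H:%M:%S"),
                random.choice([290, 600, 1072, 1810]),  # event ids from the sample
                random.choice(["ios", "android"]),
            ])

generate_events(1000, datetime(2019, 6, 22), "sample_events.csv")
```

Any tool that produces the same 3-column layout works just as well; the point is only that the raw data is trivial to create and exchange.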
Since there are many options to add data to Data Studio dashboards, I needed to make some choices. I decided to go with CSV + Cloud Storage as they allow me to clearly demonstrate how to manually handle data and also leave some thoughts on automation at the end of the article.
More details about CSV files:
- They work as a simple and standard format to exchange data between information systems.
- Lots of tools, such as spreadsheets and database management systems, can export data in this format.
- Anyone with a basic text editor can manipulate a CSV file.
I hope these decisions will make it easier for readers to go through the 5 steps. This is intended to be a hands-on guide, so there are sample files available for download and we will use them later on.
— Hey Ricardo, can I read data from Google Sheets?
— Sure! And also from BigQuery, CloudSQL, MySQL, and 500+ data sets from over 230 connectors available for Data Studio (not all for free). In fact, Data Studio even allows us to blend data from multiple datasets and use them to populate the dashboards, but I won’t go so deep in this introductory blog post.
Let’s get down to work!
Step 1: Export the data as a CSV file
Disclaimer: in real scenarios, getting access to the data is usually the hardest part of the job, and most of the time not due to technical constraints, but due to business, privacy, and security concerns.
That said, please keep the following tips in mind:
- If you aim to use data that belongs to any third-party, start by looking for the policies that govern access to such data.
- Request only what you need to perform your analysis.
- Avoid requesting Personally Identifiable Information (PII), as it is subject to more restrictive access control.
- By ensuring that you do the above, you maximize the chances of getting the data.
Also, I’ve faced situations in which the data didn’t even exist, so I needed to take action to collect or generate it: sometimes by instrumenting information systems, sometimes by doing something as simple as asking people to provide information through Google Forms. Use cases may vary a lot…
Once you are granted access, it’s time to export the data as a CSV. This is pretty straightforward in most tools, as you can see in the examples below. Warning: the first line must contain the column names.
- LibreOffice or Microsoft Excel: click File > Save As…, then select `Text CSV (*.csv)` or a similar file type.
- MySQL: execute a query (for example, a `SELECT … INTO OUTFILE` statement, which can write the result set directly as a CSV file).
- Google Sheets: although possible, there is no need to export as a CSV — Data Studio can read directly from the spreadsheet. In this case, you can skip uploading the files to Google Cloud Storage (step 2), use the `Google Sheets` connector in step 3, and continue to step 4.
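When the export option is buried or missing in your tool, a small script can do the job. Below is a sketch using Python’s standard library, with `sqlite3` standing in for whatever database driver you actually use (with MySQL, for instance, a driver such as mysql-connector-python follows the same cursor-based pattern); the table and column contents here are the sample data from earlier:

```python
import csv
import sqlite3

# Stand-in in-memory database loaded with two of the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (timestamp TEXT, event_id INTEGER, platform TEXT)")
conn.execute("INSERT INTO events VALUES ('2019-06-22 01:15:03', 600, 'ios')")
conn.execute("INSERT INTO events VALUES ('2019-06-25 22:34:46', 1072, 'android')")

cursor = conn.execute("SELECT timestamp, event_id, platform FROM events")
with open("export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Write the column names first, as Data Studio expects.
    writer.writerow([col[0] for col in cursor.description])
    # Then stream every result row into the CSV file.
    writer.writerows(cursor)
```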
For the sake of simplicity, I’ve shared the sample dataset in this folder: https://drive.google.com/drive/folders/132ardN2eZ3GFS8HOax0AJINqbAXGYIEd?usp=sharing. Please download the three CSV files: `2017–202004.csv`, `202005.csv`, and `202006.csv`.
Step 2: Upload to Google Cloud Storage
Time to upload the first file to somewhere Data Studio can read it from. I’m assuming you have a Google Cloud account and are familiar with Cloud Storage, so please go to the Cloud Console: https://console.cloud.google.com.
- Look for STORAGE > Storage in the navigation menu.
- Click CREATE BUCKET and follow the instructions. Select `Standard` as the default storage class, as the dashboard will retrieve data from the bucket several times and this is the most cost-effective class for this use case.
- Click CREATE.
- In the Storage browser screen, click on the bucket you have just created.
- Drop the `2017–202004.csv` file into the bucket. Leave the two other files untouched for now.
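If you prefer the command line over the web console, the same upload can be done with `gsutil`, which ships with the Google Cloud SDK. Here is a small Python sketch that composes the command; the bucket name is a placeholder, and actually running the command requires gsutil to be installed and authenticated:

```python
# Compose the gsutil command that copies a local CSV into a bucket.
# "my-events-bucket" is a hypothetical bucket name: use your own.
def gsutil_upload_command(local_file, bucket):
    return ["gsutil", "cp", local_file, f"gs://{bucket}/"]

cmd = gsutil_upload_command("2017-202004.csv", "my-events-bucket")
print(" ".join(cmd))
# To execute it from Python: subprocess.run(cmd, check=True)
```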
Step 3: Connect Data Studio to Cloud Storage
Now, go to Data Studio: https://datastudio.google.com.
- Click Blank Report on the home screen and you will be redirected to the Add data to report page.
- Select `Google Cloud Storage` in the Google Connectors section.
- Check the `Use all files in path` option.
- Type the name of the bucket you created in step 2 and click Add.
- Click ADD TO REPORT in the You are about to add data to this report popup.
- Data Studio may ask you to provide your Google Cloud credentials in order to access the dataset.
Then you should see a dashboard with a simple table, as follows:
Step 4: Add components to the dashboard
Now that we are connected to the dataset, I will use a short video to demonstrate how to add and set up all the interactive UI components, represented by the yellow balloons in the thumbnail below. Note: static components, such as labels and lines, will not be covered in this guide to keep it succinct, but keep in mind that they bring a substantial improvement to the user experience.
After a few clicks, the interactive components setup is done! For those who might be interested, there is a step-by-step guide in the video’s description on YouTube.
I tried to demonstrate a couple of features that allow readers to get started, but strongly recommend reviewing the instructions later and playing with other components and configurations as well. I’m pretty sure you will find a bunch of cool things provided by Data Studio!
Step 5: Look into the data
With the components in place, we can start the most fun part: looking into the data! I will mix the Year Month filter and the Drill-down arrows to explore some scenarios. Please click the View button at the top of the screen.
Notice there is an increase in usage of the app on iOS, with a spike of event `600` in April 2020. Let’s get more details on what happened in that month:
- In the Year Month field, click ONLY the `Apr 2020` line. You will notice the charts display odd-looking lines and bars. This happens because they are still aggregating data on a monthly basis (remember that we set the `Year Month` granularity for both) and we have just asked for data from a single month, which “breaks” the time-series plot.
- Click the down arrow at the top-right corner of the Platform breakdown chart. Now you can see data on a daily basis.
- Do the same for the Event ID breakdown chart. Notice there were some spikes of event `600` throughout the month.
So, depending on what that event means, the responsible team can take appropriate action. For example: if it represents new subscriptions to a premium usage plan, the Marketing team may conclude they are doing a good job in a campaign focused on iOS users and hence they should keep putting money behind it. Android users, on the other hand, look stuck and might need more attention.
Here we finish the last step. I hope you agree that it is easy to set up a data-driven workflow, and that you can leverage it to guide your decisions from now on. Remember this is only the very beginning. The more you practice, the more you will see the benefits and safety of making decisions based on data.
What’s next?
We are getting close to the end of the article, but since we have already covered the basics, I invite you to a new exercise: start brainstorming on what can be done to improve the workflow. There are lots of feasible improvements, so let me briefly share at least three to promote further discussions.
Keep adding fresh data
We analyzed data from June 2017 to April 2020. But I’m writing this article in the last week of June. What happened in May and the early days of June? To answer this question, we need more data…
Remember those two files you downloaded but didn’t use, `202005.csv` and `202006.csv`? Time to upload them to the Cloud Storage bucket you created.
After uploading the files, with the dashboard in view mode:
- Click the up arrow at the top-right corner of each chart.
- Unselect the Apr 2020 item in the Year Month field.
- Click the Refresh data button at the top of the screen to discover fresh data.
New data brings new insights. It is essential to keep the data updated.
Automate CSV files generation and upload
Although CSV + Cloud Storage allows manual updates, as we have just done, one of the key benefits of this architecture is that it also supports automatic updates with no major setup effort. Adding such capabilities is likely to require some coding and system administration skills, but the resulting convenience definitely pays off.
Steps 1 and 2 of the workflow would be directly impacted by this kind of improvement, so the first thing to think about is how to automate the CSV file generation. It might be simpler if you are fetching data from a database and a bit more complex if you depend on other kinds of tools.
The Google Cloud SDK comes with `gsutil`, a command-line tool that allows us to securely upload files to Cloud Storage. It can be scheduled to run from time to time and push fresh data to the bucket.
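Putting the pieces together, a small script can regenerate the CSV and push it to the bucket, and a scheduler such as cron can run it periodically. Here is a hedged sketch of the scheduling side; the bucket name, script path, and monthly file naming (mirroring `202005.csv` / `202006.csv` above) are assumptions:

```python
import datetime

BUCKET = "my-events-bucket"  # placeholder; use your own bucket name

def monthly_object_name(day):
    # One file per month keeps uploads small and incremental.
    return day.strftime("%Y%m") + ".csv"

def refresh_command(day):
    # Step 1 (regenerating the CSV from your data source) is omitted here;
    # this only composes the gsutil upload command for the given month.
    # Running it for real requires gsutil installed and authenticated.
    return ["gsutil", "cp", monthly_object_name(day), f"gs://{BUCKET}/"]

print(" ".join(refresh_command(datetime.date.today())))

# A crontab entry could then run the full script daily at 02:00, e.g.:
#   0 2 * * * /usr/bin/python3 /opt/scripts/refresh_bucket.py
```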
Use a more robust data storage tool
Hopefully, you will have more data to handle as time goes by, and the Cloud Storage connector may slow down when processing a big dataset. If that happens, consider changing the underlying data storage to a more robust tool — e.g. an RDBMS such as MySQL or a data warehouse such as BigQuery.
Wrapping up
If you have read this far, you now have a basic skill set to get started with Data Studio and leverage this amazing tool to empower your data-driven journey. My suggestion now is to explore more components and configurations. What about adding a new business variable to the sample dataset and comparing the results? You can also create and play with datasets that make more sense for your business.
Yes, there are plenty of opportunities, both from the dataset perspective and from the solution architecture perspective. Enjoy them!
Thanks for reading, I hope it helps.