From device wiring to Google Data Studio: a journey with MicroPython and Google serverless [Part 2]

Nicola Guglielmi
Google for Developers EMEA
7 min read · Dec 10, 2019

Let’s move to the Cloud

This is the second part of a tutorial, the first part is here:

→go to part 1

Brief architecture description

There are several ways to design and deploy a data management pipeline using cloud services.

In this tutorial we will focus on a fully serverless approach.

Alternatively, you can run a VM or a Docker container with tools like RabbitMQ.

In a serverless approach, you can choose among many options, too, as you can see in the chart:

Our entry point is IoT Core; from there we will accumulate the sensor readings in Pub/Sub and process them with a Google Cloud Function (GCF).

I would like to spend a minute examining the differences between the use of Cloud Dataflow and Cloud Functions.

With Cloud Dataflow you can design and execute, in a managed environment, the transformations your data needs before being loaded into BigQuery.

The approach is super easy and effective, but for low data volumes there is a drawback.

Each Dataflow job is executed on a Compute Engine instance, and you can start it as a batch job, to process the data in bulk, or as a streaming job, to process data continuously.

Due to the nature of the Pub/Sub system, it requires stream processing, which prevents the Dataflow job from terminating, keeping it running indefinitely and generating costs.

By contrast, Cloud Functions are executed on events, such as a new Pub/Sub message, and you can leave the function deployed because the cost of each invocation is negligible.

Google Cloud Platform

Point your browser to https://console.cloud.google.com/ and, if you don’t have one, create a free account with $300 of credit for one year.

Create a project or select an existing one, then open IoT Core from the menu:

I pinned some of the menus I use daily; you can find IoT Core under the Big Data section.

Now create a registry and choose a registry ID to identify the IoT Core endpoint. You can have multiple devices sending data to this registry.

Choose a region close to the device that will stream the data and enable the protocol; for this tutorial you only need MQTT, but you can leave both enabled (MQTT/HTTP).

[Default telemetry topic]

Now it’s time to create a Pub/Sub topic to collect the data coming from the device: select Create a topic from the drop-down menu and type in a name for this Pub/Sub topic.

You can leave the key management to Google.

[Device state topic]

In a more complex setup you may wish to redirect device state change events to a separate Pub/Sub topic; that is what the Device state topic is for.

Select the log level you wish; “Info” is enough, and you can change it later.

Keep in mind that these logs use the Stackdriver service and will be billed.

Click on Create and both the registry and the Pub/Sub topic will be created.

Now select the registry you created and click on “Devices” to configure the access for your device:

Choose a device ID to identify your device’s data, set it to allow communication, set the authentication input method to “manual” with the key format RS256, and paste the content of the public key that matches the rsa_private.pem file you created before (the private key itself stays on the device).

Let’s stream some data to the cloud

You should have all the files of the repository you previously cloned inside a folder.

Edit the wifi_config settings in config.py to match the values you set during the creation of the registry and device, and the SDA/SCL pins of your wiring:
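As a reference, here is a hypothetical sketch of what config.py could contain; the exact key names depend on the repo’s file, so adapt them to what the cloned config.py actually expects:

# Hypothetical sketch of config.py -- key names are examples,
# keep the ones the repo's code actually reads.

wifi_config = {
    'ssid': 'my-home-wifi',          # your WiFi network name
    'password': 'my-wifi-password',  # your WiFi password
}

google_cloud_config = {
    'project_id': 'my-gcp-project',   # the GCP project you created
    'cloud_region': 'europe-west1',   # region chosen for the registry
    'registry_id': 'iot-registry',    # registry ID from IoT Core
    'device_id': 'esp32-bme280',      # device ID you registered
}

device_config = {
    'sda': 21,   # I2C data pin wired to the BME280
    'scl': 22,   # I2C clock pin wired to the BME280
}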

Once the config.py file is ready you can upload all the files to the board.

ampy -p COM5 put config.py
ampy -p COM5 put iot-test.py main.py

The second command uploads the file iot-test.py and renames it to main.py so that it runs automatically on each restart.

[If you didn’t use the BME280 sensor and wish to use the integrated temperature reader, upload iot-test-only-temp.py instead of iot-test.py]
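To give an idea of what iot-test.py is doing under the hood, here is a rough MicroPython sketch of an IoT Core telemetry publisher, not the repo’s actual code. The broker address, client ID format and telemetry topic are the ones defined by Cloud IoT Core; the config keys and sensor attribute names are placeholders, and the RS256 JWT creation (done by the repo’s code from rsa_private.pem) is assumed to have happened already:

# Rough sketch of an IoT Core telemetry publisher in MicroPython.
# `jwt` is an RS256-signed token built from rsa_private.pem by the repo's code.
import time
import ujson
from umqtt.simple import MQTTClient

def publish_readings(cfg, jwt, sensor):
    # Cloud IoT Core expects exactly this client ID format
    client_id = 'projects/{}/locations/{}/registries/{}/devices/{}'.format(
        cfg['project_id'], cfg['cloud_region'], cfg['registry_id'], cfg['device_id'])

    # The MQTT bridge ignores the username; the JWT goes in the password field
    client = MQTTClient(client_id, 'mqtt.googleapis.com', port=8883,
                        user='unused', password=jwt, ssl=True)
    client.connect()

    topic = '/devices/{}/events'.format(cfg['device_id'])  # default telemetry topic
    while True:
        payload = ujson.dumps({
            'temperature': sensor.temperature,  # attribute names depend on the BME280 driver
            'humidity': sensor.humidity,
            'pressure': sensor.pressure,
        })
        client.publish(topic.encode(), payload.encode())
        time.sleep(60)  # one reading per minute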

If everything is OK, you should see some data flowing into Pub/Sub. To do a quick check, use the Stackdriver logs:

And you should see some data
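If you prefer to inspect the topic itself rather than the logs, you can attach a pull subscription to it and read a few messages from your workstation. This is optional; the subscription name iot-debug below is just an example of one you would create yourself, and the snippet uses the standard google-cloud-pubsub client:

# Quick check from a workstation: pull a few messages from a subscription
# attached to the telemetry topic ('iot-debug' is an example name).
from google.cloud import pubsub_v1

project_id = 'my-gcp-project'   # replace with your project
subscription_id = 'iot-debug'   # a pull subscription you created on the topic

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(project_id, subscription_id)

response = subscriber.pull(request={'subscription': sub_path, 'max_messages': 5})
for received in response.received_messages:
    print(received.message.data.decode())  # the JSON payload sent by the device

# Acknowledge so the messages are not redelivered
ack_ids = [r.ack_id for r in response.received_messages]
if ack_ids:
    subscriber.acknowledge(request={'subscription': sub_path, 'ack_ids': ack_ids})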

Configuring BigQuery

Now it’s time to process the incoming data.

The data should flow from IoT Core to Pub/Sub and accumulate until some subscription pulls it.

We will build a Google Cloud Function that is called each time new data flows into Pub/Sub and performs a simple data preparation step before pushing it to BigQuery.
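The actual code lives in gcfunctions.py in the repo and will be pasted into the console in a moment; just to show the general shape of such a function, here is a minimal sketch of a Pub/Sub-triggered function that decodes the message and streams a row into BigQuery (the table and field names are placeholders, not the repo’s):

# Minimal sketch of a Pub/Sub-triggered Cloud Function writing to BigQuery.
# Not the repo's gcfunctions.py -- just the general shape, with example names.
import base64
import json
from google.cloud import bigquery

bq_client = bigquery.Client()
TABLE_ID = 'my-gcp-project.IoT_test.readings'  # project.dataset.table (placeholders)

def iot_to_bigquery(event, context):
    """Triggered by a message on the telemetry Pub/Sub topic."""
    # Pub/Sub delivers the payload base64-encoded in event['data']
    payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))

    row = {
        'timestamp': context.timestamp,  # publish time of the message
        'temperature': payload.get('temperature'),
        'humidity': payload.get('humidity'),
        'pressure': payload.get('pressure'),
    }

    errors = bq_client.insert_rows_json(TABLE_ID, [row])
    if errors:
        # Raising makes the failure visible in the function's logs
        raise RuntimeError('BigQuery insert failed: {}'.format(errors))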

First of all, let’s create a dataset and table in BigQuery to receive the incoming data: go to BigQuery from the Google Cloud Console.

You can use whatever names you prefer for the dataset and table, as long as you respect the naming constraints; write them down for later use in the GCF.

Under resource, click on your Project Name and select Create Dataset

Type in the name IoT_test and select the region nearest to you (or to the streaming device)

Now it’s time to create a table for the data.

Click on the newly created dataset, select create table and define the fields you need on the table:

Now you are able to collect the data you are streaming.
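If you prefer to create the table from code instead of the console, a schema matching the BME280 readings could look like the sketch below; the dataset, table and field names are only examples and must match whatever your function inserts:

# Optional: create the table from code instead of the UI.
# Dataset/table/field names are examples -- keep them consistent with your GCF.
from google.cloud import bigquery

client = bigquery.Client()
table_id = 'my-gcp-project.IoT_test.readings'  # project.dataset.table

schema = [
    bigquery.SchemaField('timestamp', 'TIMESTAMP'),
    bigquery.SchemaField('temperature', 'FLOAT'),
    bigquery.SchemaField('humidity', 'FLOAT'),
    bigquery.SchemaField('pressure', 'FLOAT'),
]

table = client.create_table(bigquery.Table(table_id, schema=schema))
print('Created table {}'.format(table.full_table_id))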

Google Cloud Functions setup

The last step of the ingestion process is to write and deploy a GCF that takes the data from your device as soon as it is published to Pub/Sub, applies a small amount of data preparation, and sends it to BigQuery.

Go back to Google Cloud Console and open Cloud Functions from the left menu.

Click on Create Function and enter some information, like the function’s name, memory, etc.

You can choose different memory sizes for your functions as well as different languages.

For this tutorial we use Python and 128MB should be enough.

The GCF will be executed each time an event triggers it, and there are several event triggers to choose from.

Under trigger, select Pub/Sub and select your topic.

Copy the code from the file gcfunctions.py and paste it into the inline code editor.

You can have many functions inside a single GCF, so you have to specify which one should be executed on the trigger event: type the name of that function in “Function to execute”.

If you need specific libs for your function, you can load them in the requirements.txt tab.

We need two additional libs for our project:

google.cloud
google-cloud-bigquery

Clicking on “Environment variables, networking, timeouts and more” gives you access to additional options, like the region in which you want to deploy your function and limits such as the execution timeout and the maximum number of concurrent function instances.

Under environment variables, define two variables for your dataset and your table:
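Inside the function those two values can then be read with os.environ; the variable names DATASET and TABLE below are just examples, use whatever names you typed in the console:

# Reading the dataset/table configuration inside the function.
# 'DATASET' and 'TABLE' are example names -- match what you set in the console.
import os

DATASET = os.environ['DATASET']   # e.g. IoT_test
TABLE = os.environ['TABLE']       # e.g. readings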

Click on Deploy, wait a few seconds, and check the logs: open your function and click on “View logs”.

You should see something like this:

If you see a terminating status of “ok”, the data has probably been processed correctly and you can check in BigQuery that everything is going well.

Open BigQuery in another tab, click on your dataset, then on your table, and click on “Preview”:

Analytics with Google Data Studio

I found some strange behavior using the “Explore with Data Studio” shortcut after querying the table, so I suggest opening Data Studio and creating a data source directly, browsing to:

https://datastudio.google.com/

First create a new datasource pointing to your BigQuery data

Select BigQuery, set a name for this datasource and select the project, dataset and table:

Connect and make a small modification to the timestamp field’s display format so it also shows hours and minutes:

Finally you can click on “Create Report”

Just add a time-series chart to your dashboard, stretch it to maximum size and set it up: add your metrics, selecting Average as the aggregation:

Make some improvements, such as moving the Pressure series to a right-hand axis, and make some small tweaks; experiment with the options (don’t be shy, you can’t break anything! ;))

And finally you can play with styles to give a better look and feel to your dashboard and add widgets and labels:


Nicola Guglielmi
Google for Developers EMEA

Google Cloud Architect & Authorized Trainer • Team Manager • Community Leader