Tell me more Internet of Things — Part 2 — Google Cloud IoT cost and value comparison

Google Cloud IoT — onboarding, data storage and visualization. How to connect your device to Google Cloud IoT and stream the ingested data to a database for storage, visualization and analysis.

Jan Bertrand · Published in Coinmonks · Jan 10, 2019 · 14 min read


Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 | … | Part n

In Part 1 of this series I explained how we intend to compare the IoT platforms based on a specific use case, business case and its value proposition. We defined the scope and size of the project and finished with the hardware setup of the sensor.

The goal of this part is to store sensor data continuously in a scalable database that can serve visualization and data analysis. Within the Google Cloud services there are multiple ways to implement this. I mainly followed these tutorials:

We start by implementing a solution which is scalable and based on the standard services Google Cloud provides. Afterwards we replace the Cloud Dataflow engine with Cloud Functions to increase the transparency of the underlying services and to compare a less scalable version (which could even run for free with only a few devices and for prototyping).

Overview of implementation and results

Implemented (costs estimated) solution

The device is onboarded via a node.js MQTT (Message Queuing Telemetry Transport) agent using RSA public/private key encryption; IoT Core, as the device registry, holds the public key of the device for secured communication M2. From there we use Pub/Sub to collect the data and pull it with Dataflow / Cloud Functions into our database (BigQuery). The visualization is implemented with the help of Data Studio M3.

State of implementation (M0 — M3)

For the current implementation, with no users and only devices ingesting data, the IoT Core costs are the highest, followed by the compute costs of the Cloud Function. Billing assumes a minimum of 1024 bytes per ingested message, which accounts for our higher net ingest (our message is about 100 bytes, a bit more with MQTT overhead). The cost per device increases with more devices, and the prototype is free K1. The service price includes maintaining and updating the infrastructure as well as keeping it secure K2.

On the soft side, onboarding the device agent via the MQTT node.js libraries is relatively easy and various starter tutorials exist K3. The device management (IoT Core) is well explained. There is enough documentation available, but for just visualizing data it is quite a complex composition of services. The possibility of entering a Cloud Shell directly from the browser and the available step-by-step tutorials for the different services are great K4. All implemented milestones (M1-M3) are provided out of the box by various possible services, including the visualization via Data Studio K5. In addition to all that greatness (you know by now that I am close to fanboy status), the sheer endless possibilities to define where your data should be stored or processed make the platform fit for IP (Intellectual Property), ECC (Export Control Classification) and possibly other liability requirements. On the other side of the shiny medal, this free choice increases the complexity of the implementation.

Google Cloud estimated costs running with Cloud Function.

I have created a configurator for this implementation which you can use to explore how I derived the costs for the different services. The inputs are:

  • number of devices (our case: 1; 100; 1,000; 10,000)
  • message size (our case: 100 bytes)
  • message frequency (our case: 20 messages per device per minute)
Screenshot of my cost configuration app (naturally running on Google Cloud App Engine)
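As a rough sketch of what the configurator does for the IoT Core part, a back-of-the-envelope calculation based on the 1024-byte billing minimum mentioned above could look like this (a sketch only, the real configurator covers the other services too):

#!/bin/bash
# Back-of-the-envelope IoT Core ingest estimate (sketch, not the real configurator).
DEVICES=1000        # number of devices
MSG_BYTES=100       # raw message size in bytes
MSG_PER_MIN=20      # messages per device per minute
# IoT Core bills at least 1024 bytes per message.
BILLED_BYTES=$(( MSG_BYTES < 1024 ? 1024 : MSG_BYTES ))
MONTHLY_MB=$(( DEVICES * MSG_PER_MIN * 60 * 24 * 30 * BILLED_BYTES / 1024 / 1024 ))
echo "Billed ingest per month: ${MONTHLY_MB} MB"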

There is, of course, also the official and more complex cost estimator with many more functions and services: https://cloud.google.com/pricing/

Not yet included in this Part 2 implementation (M0-M3): user management (individual visualization and event suggestions), efficient data storage for fast querying by the user, and automated data analytics for the user.

Highly scalable implementation

The solution is based on the following services:

  1. IoT Core: the device gets registered here via a public/private key pair (asymmetric encryption) provided by the device owner. We only send data from the device to Google Cloud IoT Core, but the device manager can do more, like updating the device's configuration (incl. rollbacks etc.).
  2. The device runs a node.js program which sends the sensor data to the registry via the MQTT (Message Queuing Telemetry Transport) protocol.
  3. From there we create a subscription in Pub/Sub to this registry's topic.
  4. Dataflow fetches the data from this subscription and pushes it to the BigQuery database for permanent storage.
  5. BigQuery holds the schema for storage and is queryable via API or with Data Studio.
  6. Data Studio is used for visualization, reporting and manual analytics.

You may ask yourself why we need so many services just to visualize and query data. We are in the same boat here; let's find out together.

Overview of Google cloud services used in this article for the project

This implementation still does not provide:

  1. User Management — individual visualization and event suggestions.
  2. Efficient data storage for fast querying for the user.
  3. Automated data analytics for the user.

If you would like to try it out yourself, get a free-tier Google Cloud account. There is a free trial with roughly $200 of free usage for one year. In addition, there are certain always-free limits too.

In Google Cloud we can separate different projects by project IDs. Each project has its own resources, billing and analysis. We define a new Google Cloud project; my project name (project ID) is tell-me-more-iot. The project ID is a globally unique identifier, so you need to choose your own.

Creating a new project in Google Cloud (project id: tell-me-more-iot)
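If you prefer the command line over the GUI, the project can presumably also be created from any terminal with the gcloud CLI installed and authenticated (a minimal sketch; pick your own unique project ID):

# Create the project and make it the active configuration (sketch).
gcloud projects create tell-me-more-iot
gcloud config set project tell-me-more-iot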

Here is a little overview map for newcomers (there is quite a bit of functionality in this GUI).

An overview of the complex, extensive Google Cloud dashboard

Google IoT Core and Pub/Sub

There is a good quick-start tutorial available for basic onboarding (or even virtual onboarding of Compute Engine instances, if you don't have a pi or a running Linux laptop on hand).

We introduce both possibilities to create our registry and devices: via the GUI (Graphical User Interface) and via the command line. I show both ways; later you can decide how to proceed yourself.

With the GUI

Creating an IoT Core registry (and with that a topic for Pub/Sub)
Creating the topic for the Pub/Sub during IoT Core registry creation

We chose the following names:

PROJECT_ID="tell-me-more-iot"
TOPIC_ID="tmmiot-topic-1"
TOPIC_PATH="projects/tell-me-more-iot/topics/"
REGISTRY_ID="tmmiot-registry-1"
REGION="europe-west1"

With the command line via Cloud Shell

The Cloud Shell (terminal) is an ephemeral Debian Linux instance with pre-configured software. You can define your own Docker image (the image deployed once Cloud Shell opens) with the software you need. It already comes with git, node.js and of course gcloud, the CLI (Command Line Interface) for the entire Google Cloud service portfolio.

Execution of the gcloud command leads to

@cloudshell:~ (tell-me-more-iot)$
gcloud iot registries create
ERROR: (gcloud.iot.registries.create) argument (REGISTRY : --region=REGION): Must be specified.
For detailed information on this command and its flags, run:
gcloud iot registries create --help

We could consult the help further, or you can just create and execute the following script:

@cloudshell:~ (tell-me-more-iot)$ 
mkdir scripts
touch createIoT-1.sh
chmod +x createIoT-1.sh
nano createIoT-1.sh

Copy and paste the following code (mkdir creates a new folder scripts, touch creates a new file named createIoT-1.sh, our shell script, chmod +x makes the file executable, and nano is a command-line editor; alternatively you could use Google's web-based Cloud Editor).

#!/bin/bash
PROJECT_ID="tell-me-more-iot"
TOPIC_ID="tmmiot-topic-1"
TOPIC_PATH="projects/tell-me-more-iot/topics/"
REGISTRY_ID="tmmiot-registry-1"
REGION="europe-west1"
gcloud config set project $PROJECT_ID
gcloud pubsub topics create $TOPIC_ID
gcloud iot registries create $REGISTRY_ID \
--region=$REGION \
--no-enable-http-config \
--enable-mqtt-config \
--event-notification-config=topic=$TOPIC_ID

Then execute the shell script with

@cloudshell:~ (tell-me-more-iot)$ 
./createIoT-1.sh
Creating the registry and Pub/Sub topic via CLI

Not much is happening at the moment, as we have not sent anything to the registry, nor have we defined a device (the representation of our pi in the registry).

Let's start preparing to send the sensor data to the registry.

Status of the implementation

Preparing the device to send data to its cloud registry device

Back to the pi.

We create a folder for our MQTT client software

@raspberry: ~ $ 
mkdir tmmiot && cd tmmiot

Pull the dedicated node.js client script (based on this extensive example repository):

@raspberry: ~/tmmiot/ $ 
git clone https://github.com/jhab82/tellMeMoreIoT-MQTTclient

Once that is downloaded we need to install the dependencies and potentially replace the USB port name ttyUSB0 of your sensor in the script (see Part 1):

@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $ 
npm install

The program would run, but it would crash asking for a private key file, which we need to create first:

  • The private key, which our device uses to sign its messages and JSON Web Tokens (created and kept only on our device)
  • The public key, which is used to verify those signatures (created on our device and shared with the device entry in our IoT Core registry)

According to this tutorial our device needs to sign a JSON Web Token (JWT), which is sent to IoT Core as proof of identity. Furthermore, we use an X.509 certificate which by default expires after 30 days; you could add -days 100000 to the command below to have some more time for device registration.

@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $ 
openssl req -x509 -newkey rsa:2048 -keyout rsa_private.pem -nodes -out rsa_cert.pem -subj "/CN=unused"

Two files have been created. The rsa_private.pem stays where it is and will not be shared with anyone or anything. The rsa_cert.pem is what we now copy over to our registry.
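If you want to double-check when the generated certificate expires (relevant because of the 30-day default mentioned above), a quick sketch with openssl:

@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $
# Print the certificate's expiry date (sketch; openssl is already installed on the pi).
openssl x509 -in rsa_cert.pem -noout -enddate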

Onboarding the device via GUI

Adding the device to the IoT registry and handing over the public key (this public key was generated on our pi; only as a visual aid did I create one here with the console)

We navigate to our tmmiot-registry-1, click create, and fill in the necessary information, including pasting our RS256_X509 certificate into the public key value field (don't forget to add the device). We are now ready to communicate securely with the cloud.
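The same device can presumably also be registered from Cloud Shell with gcloud (a sketch only; rsa_cert.pem has to be copied to Cloud Shell first, and the exact key-type string, e.g. rs256-x509-pem, may differ between gcloud versions):

@cloudshell:~ (tell-me-more-iot)$
# Register the device and hand over the public certificate (sketch).
gcloud iot devices create tmmiot-device-1 \
--region=europe-west1 \
--registry=tmmiot-registry-1 \
--public-key path=rsa_cert.pem,type=rs256-x509-pem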

Sending data to the cloud registry's dedicated device

Back to the pi.

It's time to bring the sensor data to the cloud. Before you are able to start the agent (the client which onboards and sends the data) you need to change some variables in mqtt-agent.js to fit your project.

var argv = {
  projectId: "tell-me-more-iot",
  cloudRegion: "europe-west1",
  registryId: "tmmiot-registry-1",
  deviceId: "tmmiot-device-1",
  privateKeyFile: "rsa_private.pem",
  tokenExpMins: 20,
  numMessages: 10,
  algorithm: "RS256",
  mqttBridgePort: 443,
  mqttBridgeHostname: "mqtt.googleapis.com",
  messageType: "events",
};

I have adapted the node.js MQTT client example provided by Google. The client runs every 3 minutes and queries the sensor, which then runs for 2-3 s and produces a measurement point that gets sent as a JSON formatted string to the Google Cloud registry. Every 20 minutes the JSON Web Token gets renewed. The data we send is JSON formatted as follows:

{"id":"tmmiot-device-1","time":1544729995403,"date":"2018-12-13T19:39:55.403Z","pm2p5":5,"pm10":6.8}

Running the client with:

@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $ 
node mqtt-agent.js

Or use the nohup command to run the program in the background (even if you close the terminal)

@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $ 
nohup node mqtt-agent.js > out.log 2>&1 &
IoT Core registry and first heartbeat and data received from the device

Creating a subscription to a topic

We need to subscribe to the Pub/Sub topic in order to receive or handle the data.

@cloudshell:~ (tell-me-more-iot)$
gcloud pubsub subscriptions create tmmiot-subscription \
--topic=tmmiot-topic-1
Pulled sensor data by command line on Cloud Shell

The subscription will deliver the messages when requested (pull) from the topic and retain them for 7 days (this duration can be configured per subscription). Once a message is received and acknowledged, it is no longer held by the subscription.
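For example, a sketch of how the retention could be shortened on our subscription with gcloud (the flag accepts values between 10 minutes and the 7-day maximum):

@cloudshell:~ (tell-me-more-iot)$
# Keep unacknowledged messages for only one day instead of the default 7 days (sketch).
gcloud pubsub subscriptions update tmmiot-subscription \
--message-retention-duration=1d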

As we have already sent some sensor data to the topic and have the subscription, we can pull (request) the data; the pull will not deliver the messages in any defined order. The following command just pulls one message and acknowledges it.

@cloudshell:~ (tell-me-more-iot)$
gcloud pubsub subscriptions pull --auto-ack tmmiot-subscription
Status of the implementation

The pi ran overnight and we can see the monitoring of the device in the registry.

Monitoring of the device registry in Google IoT — Bytes received peaks at 640B every 20min (JSON web token renewal) and around 128B for the message every 3 min.

Streaming the subscription data to BigQuery

There is quite an easy way to stream the data from the Pub/Sub subscription to a BigQuery table.

The Google Cloud Dataflow service runs a templated Java application which streams all JSON formatted string data into a predefined BigQuery table with the matching schema. The data we send requires a table as shown below.

{"id":"tmmiot-device-1","time":1544729995403,"date":"2018-12-13T19:39:55.403Z","pm2p5":5,"pm10":6.8}
Database schema represented by a simple table

Creating a BigQuery database

Let's use the command line to create the dataset and add our schema. First we need to define a JSON formatted schema file tmmiot_table_schema.json and store it in our Cloud Shell.

[
  {
    "description": "device-id",
    "name": "id",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Measurement time",
    "name": "time",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Measurement date",
    "name": "date",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Sensor 1",
    "name": "pm2p5",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Sensor 2",
    "name": "pm10",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]

Create the dataset tmmiot_dataset

@cloudshell:~
bq mk --dataset tell-me-more-iot:tmmiot_dataset

Then we add a table tmmiot_table to the dataset with the JSON formatted schema

bq mk --table [PROJECT_ID]:[DATASET].[TABLE] [PATH_TO_SCHEMA_FILE]

@cloudshell:~
bq mk --table tell-me-more-iot:tmmiot_dataset.tmmiot_table tmmiot_table_schema.json

Hint: In contrast to the prior CLI commands we won't use gcloud as the command but bq.
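To verify that the table really got the schema we defined, you could have bq print it back (a small sketch):

@cloudshell:~
# Print the table schema back as JSON (sketch).
bq show --schema --format=prettyjson tell-me-more-iot:tmmiot_dataset.tmmiot_table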

BigQuery dataset with table and schema

Creating a Cloud Dataflow stream to BigQuery

Please stop the Dataflow job after your tests, as this service is designed for high throughput and will cost you money over time.

Still, our sensor data is not yet stored for good; the subscription only holds the telemetry event data for 7 days. We would like to store the data permanently, and the easiest and most scalable solution is to use a template from Google Cloud Dataflow.

In order to use Dataflow we first need to define a storage bucket for temporary files.

Hint: Here we use gsutil to create and access Google Cloud Storage, which is organized in buckets. You can certainly just use the GUI again.

gsutil mb -p [PROJECT_NAME] -c [STORAGE_CLASS] -l [BUCKET_LOCATION] gs://[BUCKET_NAME]/

@cloudshell:~ 
gsutil mb -p tell-me-more-iot gs://tmmiot_bucket_1

To create a subfolder we need to copy (cp) a file into the subdirectory.

@cloudshell:~ 
touch tmp_test
gsutil cp tmp_test gs://tmmiot_bucket_1/tmp/

This is all possible with the GUI as well, of course.

Hint: we have created the bucket without taking care of the storage location yet. If you need your data in a specific place, you have to specify that (the -l [BUCKET_LOCATION] option shown above).

Next we open the Dataflow GUI and create a new job with the following settings.

Creating a new Dataflow job (tmmiot-dataflow) from the Cloud Pub/Sub to BigQuery template

Job name: tmmiot-dataflow
Cloud Dataflow template: Cloud Pub/Sub to BigQuery
Regional endpoint: europe-west1
Cloud Pub/Sub input topic: projects/tell-me-more-iot/topics/tmmiot-topic-1
BigQuery output table: tell-me-more-iot:tmmiot_dataset.tmmiot_table
Temporary Location: gs://tmmiot_bucket_1/tmp
Max workers: 1

Hint: n1-standard-1 is the smallest machine type usable for this template (we tried f1-micro and the next larger type, but the following error arises). If you don't hand over max workers and machine type, by default there will be 4 workers of type n1-standard-4 (with 4 vCPUs and 15 GB RAM each).
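The same job can presumably also be launched from Cloud Shell using the public Pub/Sub-to-BigQuery template; a sketch only, so double-check flag names and the template path against your gcloud version:

@cloudshell:~ (tell-me-more-iot)$
# Run the Pub/Sub to BigQuery template as a streaming Dataflow job (sketch).
gcloud dataflow jobs run tmmiot-dataflow \
--gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
--region europe-west1 \
--max-workers 1 \
--staging-location gs://tmmiot_bucket_1/tmp \
--parameters inputTopic=projects/tell-me-more-iot/topics/tmmiot-topic-1,outputTableSpec=tell-me-more-iot:tmmiot_dataset.tmmiot_table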

Error when trying to use a f1-micro compute engine as the data flow engine
Creation of the dataflow job and view of the compute instance where the job is running
Dataflow overview with resource metrics
Querying the database after the Dataflow job has been running for 9 minutes

Hint: A Dataflow job can only be stopped, not entirely deleted; the job will remain visible in the overview. If you made a typo in the bucket name, for example, the job fails and stays in the overview. As I had ingested data into the topic (Pub/Sub) overnight, I would have assumed that all the unacknowledged messages get pushed into the database. But after waiting 9 minutes we only see three entries, meaning the streaming service only accounts for new messages.
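To check what has actually arrived in the table, you can query it from Cloud Shell; a minimal sketch using standard SQL (adjust the project and dataset names if yours differ):

@cloudshell:~
# Show the ten most recent rows in the table (sketch).
bq query --use_legacy_sql=false \
'SELECT id, date, pm2p5, pm10 FROM `tell-me-more-iot.tmmiot_dataset.tmmiot_table` ORDER BY date DESC LIMIT 10'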

The Java template's source code states:

/**
 * The {@link PubSubToBigQuery} pipeline is a streaming pipeline which ingests data in JSON format
 * from Cloud Pub/Sub, executes a UDF, and outputs the resulting records to BigQuery. Any errors
 * which occur in the transformation of the data or execution of the UDF will be output to a
 * separate errors table in BigQuery. The errors table will be created if it does not exist prior to
 * execution. Both output and error tables are specified by the user as template parameters.
 */

Hint: This Dataflow pipeline is heavily oversized for our one lonely sensor, but once you get more traffic you know where to look 😄

Visualizing data with Data Studio

That is the final step to reach our story's goal. But before we go on to Part 3 we will replace the Dataflow service in order to create a less scalable version of the same functionality, for better transparency and to save some euros.

Let's try to visualize the time series for PM10 and PM2.5 from our sensor (as we have only one device, we do it only for this one).

  • Navigate to datastudio.google.com and create a new report.
  • Connect your BigQuery to this report
  • Configure the date to read in as text
  • Create a new field which converts the date string into YYYYMMDDHH format:
TODATE(date, 'RFC_3339', '%Y%m%d%H')
  • Configure the sensor data as average (or max or min)
Create a new report with Data Studio connected to BigQuery
Visualization of sensor data in Google Data Studio
Overview of implementation

Billing

After leaving this running for ~26 h, billing shows me ~€2.50 of consumption, all of it connected to the Cloud Dataflow engine.

Screen from the billing for tell-me-more-iot project

Replace Dataflow with Cloud Function

Let's quickly replace the Dataflow job (designed to handle huge amounts of telemetry events) with the less scalable but quickly deployed alternative: a Cloud Function.

The Cloud Function simply runs with a predefined memory allocation on Google's managed infrastructure whenever a new message arrives on the Pub/Sub topic.

Cloud Function setup

index.js

/**
 * Triggered from a message on a Cloud Pub/Sub topic.
 *
 * @param {!Object} event Event payload and metadata.
 * @param {!Function} callback Callback function to signal completion.
 */
exports.pubsubToBQ = (event, callback) => {
  const pubsubMessage = event.data;
  const BigQuery = require('@google-cloud/bigquery');
  const bigquery = new BigQuery();

  // Decode the base64 Pub/Sub payload back into the JSON object we sent from the pi.
  const row = JSON.parse(Buffer.from(pubsubMessage.data, 'base64').toString());

  bigquery
    .dataset('tmmiot_dataset')
    .table('tmmiot_table')
    .insert(row, { ignoreUnknownValues: true, raw: false })
    .then((data) => {
      console.log('Inserted 1 row');
      console.log(data);
    })
    .catch((err) => {
      if (err && err.name === 'PartialFailureError') {
        if (err.errors && err.errors.length > 0) {
          console.log('Insert errors:');
          err.errors.forEach((e) => console.error(e));
        }
      } else {
        console.error('ERROR:', err);
      }
    })
    // Signal completion only after the insert has finished (or failed).
    .then(() => callback());
};

package.json

{
  "name": "pubsubToBigQuery",
  "version": "0.0.1",
  "dependencies": {
    "@google-cloud/bigquery": "^1.3.0"
  }
}

Hint: If you try it yourself, please change the names of the dataset and table to yours.
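If you prefer the command line over the Cloud Function GUI, a hedged sketch of how such a function could be deployed with a Pub/Sub trigger (the runtime name nodejs8 reflects early 2019 and may need updating):

@cloudshell:~ (tell-me-more-iot)$
# Deploy the function with our topic as trigger (sketch).
gcloud functions deploy pubsubToBQ \
--runtime nodejs8 \
--trigger-topic tmmiot-topic-1 \
--region europe-west1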

We have replaced Cloud Dataflow (big data streaming) with a smaller solution, Cloud Functions.

Overview of implementation

Billing

After ~24 h there is no billable cost from the Cloud Function. There are only 9 invocations registered, but the function reliably executed every 3 minutes. That is most probably due to how Cloud Functions counts invocations.
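One way to convince yourself that the function really fires every 3 minutes is to look at its logs; a small sketch:

@cloudshell:~ (tell-me-more-iot)$
# Show the most recent log entries of the function (sketch).
gcloud functions logs read pubsubToBQ --region europe-west1 --limit 50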

Filtered billable services for Cloud Function after leaving it 24h running

I welcome your comments, corrections and suggestions 👌.
