Tell me more Internet of Things — Part 2 — Google Cloud IoT cost and value comparison
Google Cloud IoT — onboarding, data storage and visualization. How to connect your device to Google Cloud IoT and stream the ingested data to a database for storage, visualization and analysis.
Part 1 | Part 2 | Part 3 | Part 4| Part 5 |Part 6 | … | Part n
In Part 1 of this series I explained how we intend to compare the IoT platforms based on a specific use case, business case and its value proposition. We defined the scope and size of the project and finished with the hardware setup of the sensor.
The goal of this part is to continuously store sensor data in a scalable database that can serve visualization and data analysis. Within Google Cloud there are multiple ways to implement this. I mainly followed these tutorials:
- Real time Data Processing With Cloud IoT Core
- Build a Weather Station using Google Cloud IoT Core and MongooseOS
We start by implementing a solution which is scalable and based on the standard services Google Cloud provides. Afterwards we replace the Dataflow engine with Cloud Functions to increase the transparency of the underlying services and to compare a less scalable version (which could even run for free for only a few devices and for prototyping).
Overview of implementation and results
The device is onboarded via a node.js MQTT (Message Queuing Telemetry Transport) agent, authenticated with an RSA public/private key pair: IoT Core, as the device registry, holds the public key of the device for secured communication (M2). From there we use Pub/Sub to collect the data and pull it with Cloud Dataflow / Cloud Functions into our database (BigQuery). The visualization is implemented with the help of Data Studio (M3).
For the current implementation with no users and only devices ingesting data, the IoT Core costs are the highest, followed by the computing costs of the Cloud Function. There is a 1024-byte minimum billed per ingested message, which accounts for our higher net ingest (as our message is about 100 bytes, with MQTT overhead a bit more). The cost per device increases with more devices, and the prototype is free (K1). The service price includes maintaining and updating the infrastructure as well as keeping it secure (K2).
On the soft side, onboarding the device agent via MQTT node.js libraries is relatively easy and various starter tutorials exist (K3). The device management (IoT Core) is well explained. There is enough documentation available, but for just visualizing data it is quite a complex composition of services. The possibility of entering a Cloud Shell directly from the browser and the available step-by-step tutorials for the different services are great (K4). All implemented milestones (M1-M3) are provided out of the box by various possible services, including the visualization via Data Studio (K5). In addition to all that greatness (you know by now that I am close to fan-boy status), the sheer endless possibilities to define where your data should be stored or processed make the platform fit for IP (Intellectual Property), ECC (Export Control Classification) and possibly other liabilities. On the other side of the shining medal, this free choice increases the complexity of the implementation.
I have created a configurator for this implementation which you can use to explore how I derived the costs for the different services. Inputs are:
- number of devices (our case: 1; 100; 1,000; 10,000)
- message size (our case: 100 bytes)
- message frequency (our case: 20 messages per device per minute)
There is of course also the official, more comprehensive cost estimator with many more functions and services: https://cloud.google.com/pricing/
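To see how the 1024-byte minimum interacts with these inputs, here is a small sketch. This is my own simplified model, not Google's billing logic; only the per-message 1024-byte floor is taken from the pricing notes above.

```javascript
// Simplified model of IoT Core's billed ingest volume. Assumption:
// every message is billed at a minimum of 1024 bytes.
const MIN_BILLED_BYTES = 1024;

function billedBytesPerMonth(devices, messageBytes, messagesPerDeviceMinute) {
  const messagesPerMonth = devices * messagesPerDeviceMinute * 60 * 24 * 30;
  return messagesPerMonth * Math.max(messageBytes, MIN_BILLED_BYTES);
}

// Our case: 1 device, ~100-byte messages, 20 messages per minute.
const actual = 1 * 20 * 60 * 24 * 30 * 100;     // ~86.4 MB of real data per month
const billed = billedBytesPerMonth(1, 100, 20); // ~884.7 MB billed per month
console.log(actual / 1e6, billed / 1e6);
```

The roughly tenfold gap between real and billed volume is exactly the effect described above for small messages.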
Not yet included in this Part 2 implementation (M0-M3): user management with individual visualization and event suggestions, efficient data storage for fast querying by the user, and automated data analytics for the user.
Highly scalable implementation
The solution is based on onboarding the device to:
- IoT Core. Here the device gets registered via a public/private key pair (asymmetric encryption) provided by the device owner. We only send data from the device to Google Cloud IoT Core; the device manager can do more, like updating the device's configuration (incl. rollbacks etc.)
- The device runs a node.js program which sends the sensor data to the registry via MQTT (Message Queuing Telemetry Transport) protocol.
- From there we create a subscription in Pub/Sub to this registry.
- Dataflow is used to fetch the data from this subscription and push it for permanent storage to the database BigQuery.
- BigQuery holds the schema for storage and is queryable by API or with Data Studio.
- Data Studio for visualizing, reporting and manual analytics.
You may ask yourself why we need so many services just to visualize and query data. We are in the same boat here; let's find out together.
This implementation still does not provide:
- User Management — individual visualization and event suggestions.
- Efficient data storage for fast querying for the user.
- Automated data analytics for the user.
If you'd like to try it out yourself, get the free-tier Google Cloud offer. There is a free trial with ~$200 of free usage for one year. In addition there are certain always-free limits too.
In Google Cloud we can separate different projects by project IDs; each has its own resources, billing and analysis. We define a new Google Cloud project. My project name (project ID) is tell-me-more-iot. It is a globally unique identifier, so you need to choose one yourself.
Maybe a little overview map here for the newcomers (there is quite a bit of functionality in this GUI).
Google IoT Core and Pub/Sub
There is a good quick-start tutorial available for basic onboarding (or even virtual onboarding of Compute Engine instances, if you don't have a Pi or a running Linux laptop at hand).
We introduce both possibilities to create our registry and devices: by GUI (Graphical User Interface) and via the command line. I show both ways; later you can decide for yourself how to proceed.
With the GUI
We chose the following names:
PROJECT_ID="tell-me-more-iot"
TOPIC_ID="tmmiot-topic-1"
TOPIC_PATH="projects/tell-me-more-iot/topics/"
REGISTRY_ID="tmmiot-registry-1"
REGION="europe-west1"
With the command line via Cloud Shell
The Cloud Shell (terminal) is an ephemeral Debian Linux instance with preconfigured software. You can define your own Docker profile (the image which should be deployed once Cloud Shell opens) with your necessary software. But it already comes with git, node.js and of course gcloud, the CLI (Command-Line Interface) for the entire service portfolio of Google Cloud.
Execution of the gcloud command leads to:
@cloudshell:~ (tell-me-more-iot)$ gcloud iot registries create
ERROR: (gcloud.iot.registries.create) argument (REGISTRY : --region=REGION): Must be specified.
For detailed information on this command and its flags, run:
gcloud iot registries create --help
We could consult the help further, or just execute the following script with
@cloudshell:~ (tell-me-more-iot)$
mkdir scripts
touch createIoT-1.sh
chmod +x createIoT-1.sh
nano createIoT-1.sh
Copy and paste the following code. (mkdir creates a new folder scripts, touch creates a new file with the name createIoT-1.sh, our shell script, chmod +x makes the file executable, and nano is a command-line editor; alternatively you could use Google's web-based Cloud Editor.)
#!/bin/bash
PROJECT_ID="tell-me-more-iot"
TOPIC_ID="tmmiot-topic-1"
TOPIC_PATH="projects/tell-me-more-iot/topics/"
REGISTRY_ID="tmmiot-registry-1"
REGION="europe-west1"
gcloud config set project $PROJECT_ID
gcloud pubsub topics create $TOPIC_ID
gcloud iot registries create $REGISTRY_ID \
  --region=$REGION \
  --no-enable-http-config \
  --enable-mqtt-config \
  --event-notification-config=topic=$TOPIC_ID
Then execute the shell script with
@cloudshell:~ (tell-me-more-iot)$
./createIoT-1.sh
Not much is happening at the moment, as we have not sent anything to the registry, nor have we defined a device (the representation of our Pi in the registry). Let's start preparing to send the sensor data to the registry.
Prepare the device for sending data to the cloud registry device
Back to the pi.
We create a folder for our MQTT client software
@raspberry: ~ $
mkdir tmmiot && cd tmmiot
Pull the dedicated client node.js script (based on these extensive example repositories):
@raspberry: ~/tmmiot/ $
git clone https://github.com/jhab82/tellMeMoreIoT-MQTTclient
Once that is downloaded, we need to install the dependencies and potentially replace the USB port name ttyUSB0 of your sensor in the script (see Part 1):
@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $
npm install
The program would run but would crash asking for a private key file, which we still need to create:
- The private key, which signs outgoing messages (created and owned by our device)
- The public key, which verifies those signatures (created by our device and shared with our IoT Core registry device)
According to this tutorial our device needs to sign a JSON Web Token (JWT) which is sent to IoT Core as proof of identity. Further we use the X.509 certificate, which expires by default after 30 days; you could add -days 100000 to have some more time for device registration.
@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $
openssl req -x509 -newkey rsa:2048 -keyout rsa_private.pem -nodes -out rsa_cert.pem -subj "/CN=unused"
Two files have been created. The rsa_private.pem stays where it is and will not be shared with anyone or anything. The rsa_cert.pem we now copy over to our registry.
Onboarding the device via GUI
We navigate to our tmmiot-registry-1, click create and fill in the necessary information, including pasting our RS256_X509 certificate into the public key value field (don't forget to add the device). We are now ready to communicate securely with the cloud.
Sending data to the cloud registries dedicated device
Back to the pi
It's time to bring the sensor data to the cloud. Before you are able to start the agent (for onboarding and sending of data) you need to change some variables in mqtt-agent.js to fit your project.
var argv = {
  projectId: 'tell-me-more-iot',
  cloudRegion: 'europe-west1',
  registryId: 'tmmiot-registry-1',
  deviceId: 'tmmiot-device-1',
  privateKeyFile: 'rsa_private.pem',
  tokenExpMins: 20,
  numMessages: 10,
  algorithm: 'RS256',
  mqttBridgePort: 443,
  mqttBridgeHostname: 'mqtt.googleapis.com',
  messageType: 'events',
};
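These values are assembled into the fixed naming scheme of the Cloud IoT Core MQTT bridge. A small helper (illustrative only; the agent builds these strings inline) makes the scheme explicit:

```javascript
// Cloud IoT Core MQTT naming scheme: the client id addresses the device in
// its registry, the topic selects the message type (telemetry events here).
function mqttClientId({ projectId, cloudRegion, registryId, deviceId }) {
  return `projects/${projectId}/locations/${cloudRegion}/registries/${registryId}/devices/${deviceId}`;
}

function mqttTopic(deviceId, messageType) {
  return `/devices/${deviceId}/${messageType}`;
}

console.log(mqttClientId({
  projectId: 'tell-me-more-iot',
  cloudRegion: 'europe-west1',
  registryId: 'tmmiot-registry-1',
  deviceId: 'tmmiot-device-1',
}));
// projects/tell-me-more-iot/locations/europe-west1/registries/tmmiot-registry-1/devices/tmmiot-device-1
console.log(mqttTopic('tmmiot-device-1', 'events')); // /devices/tmmiot-device-1/events
```

If onboarding fails with a connection refusal, a mismatch in one of these path segments is the first thing to check.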
I have adapted the node.js MQTT client example provided by Google. The client runs every 3 min. and queries the sensor, which then runs for 2-3 s and produces a measurement point that gets sent as a JSON formatted string to the Google Cloud registry. Every 20 min. the JSON Web Token gets renewed. The data we send is JSON formatted as follows:
{"id":"tmmiot-device-1","time":1544729995403,"date":"2018-12-13T19:39:55.403Z","pm2p5":5,"pm10":6.8}
Running the client with:
@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $
node mqtt-agent.js
Or use the nohup command to run the program in the background (even after you close the terminal):
@raspberry: ~/tmmiot/tellMeMoreIoT-MQTTclient/ $
nohup node mqtt-agent.js > out.log 2>&1 &
Creating a subscription to a topic
We need to subscribe to the Pub/Sub topic in order to receive and handle the data.
@cloudshell:~ (tell-me-more-iot)$
gcloud pubsub subscriptions create tmmiot-subscription --topic=tmmiot-topic-1
The subscription will deliver messages on request (pull) from the topic and retain them for 7 days (this duration can be shortened or extended). Once a message is received and acknowledged, it is no longer held in the subscription.
As we have already sent some sensor data to the topic and have the subscription, we can pull (request) the data; the pull will not deliver messages in any defined order. The following command pulls just one message and acknowledges it.
@cloudshell:~ (tell-me-more-iot)$
gcloud pubsub subscriptions pull --auto-ack tmmiot-subscription
The Pi runs overnight, and we can see the monitoring of the device in the registry.
Streaming the subscription data to BigQuery
There is a quite easy possibility to stream the data from the Pub/Sub subscription to a BigQuery database. The Google Cloud Dataflow service runs a template Java application which streams all JSON formatted string data into a predefined BigQuery table with the equivalent schema. The data we send requires a table as shown below:
{"id":"tmmiot-device-1","time":1544729995403,"date":"2018-12-13T19:39:55.403Z","pm2p5":5,"pm10":6.8}
Creating a BigQuery database
Let's use the command line for creating the database and adding our schema. First we need to define a JSON formatted schema file tmmiot_table_schema.json and store it in our Cloud Shell.
[
  {
    "description": "device-id",
    "name": "id",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Measurement time",
    "name": "time",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Measurement date",
    "name": "date",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Sensor 1",
    "name": "pm2p5",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "description": "Sensor 2",
    "name": "pm10",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
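Note that every field is declared as STRING although the device sends numbers for pm2p5 and pm10. A tiny sketch (my own helper, not part of any tutorial) shows the stringified row the table effectively stores:

```javascript
// Convert a device payload into a row matching the all-STRING schema above.
// Illustrative only: numeric values end up stored as strings.
function toStringRow(payload) {
  const row = {};
  for (const field of ['id', 'time', 'date', 'pm2p5', 'pm10']) {
    row[field] = payload[field] === undefined ? null : String(payload[field]);
  }
  return row;
}

const row = toStringRow({
  id: 'tmmiot-device-1', time: 1544729995403,
  date: '2018-12-13T19:39:55.403Z', pm2p5: 5, pm10: 6.8,
});
console.log(row); // { id: 'tmmiot-device-1', time: '1544729995403', ... }
```

Keeping everything as strings is the simplest match for the raw JSON, but it means Data Studio later has to cast the values back (see the date handling below).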
Create the dataset tmmiot_dataset:
@cloudshell:~
bq mk --dataset tell-me-more-iot:tmmiot_dataset
Then we add a table tmmiot_table to the dataset with the JSON formatted schema:
bq mk --table [PROJECT_ID]:[DATASET].[TABLE] [PATH_TO_SCHEMA_FILE]
@cloudshell:~
bq mk --table tell-me-more-iot:tmmiot_dataset.tmmiot_table tmmiot_table_schema.json
Hint: In contrast to the prior CLI commands, we won't use gcloud as the command but bq.
Creating a cloud flow stream to BigQuery
Please delete the Dataflow job after your tests, as this service is designed for high performance and will cost you money over time.
Still our sensor data is not stored for good: the subscription only retains telemetry events for 7 days. We would like to store the data permanently, and the easiest, highly scalable solution is to use a template from Google Cloud Dataflow.
In order to use Dataflow we first need to define a storage bucket for temporary files.
Hint: Here we use gsutil to create and access the storage service of Google Cloud, which is organized in buckets. You can certainly just use the GUI again.
gsutil mb -p [PROJECT_NAME] -c [STORAGE_CLASS] -l [BUCKET_LOCATION] gs://[BUCKET_NAME]/
@cloudshell:~
gsutil mb -p tell-me-more-iot gs://tmmiot_bucket_1
To create a subfolder we need to copy (cp) a file into the subdirectory:
@cloudshell:~
touch tmp_test
gsutil cp tmp_test gs://tmmiot_bucket_1/tmp/
This is all possible with the GUI as well, of course.
Hint: we have created storage without taking care of the storage location yet. If you need your data in a specific place, you have to specify that.
Further we open the Dataflow GUI and create a new job with the following settings:
Job name: tmmiot-dataflow
Cloud Dataflow template: Cloud Pub/Sub to BigQuery
Regional endpoint: europe-west1
Cloud Pub/Sub input topic: projects/tell-me-more-iot/topics/tmmiot-topic-1
BigQuery output table: tell-me-more-iot:tmmiot_dataset.tmmiot_table
Temporary Location: gs://tmmiot_bucket_1/tmp
Max workers: 1
Hint: n1-standard-1 is the smallest machine type able to run this template (we tried f1-micro and the next size up, but the following error arises). If you don't hand over max workers and machine type, by default there will be 4 workers of type n1-standard-4 (with 4 vCPUs and 15 GB RAM each).
Hint: A Dataflow job can only be stopped, not entirely deleted; the job will remain in the overview. If you made a typo in the bucket name, for example, the job failed and will stay in the overview. As I had ingested data into the topic (Pub/Sub) overnight, I would have assumed all the unacknowledged messages would be pushed into the database. But after waiting 9 min we only see three entries, meaning the streaming service only accounts for new messages.
The Java template's source code states:
/**
 * The {@link PubSubToBigQuery} pipeline is a streaming pipeline which ingests data in JSON format
 * from Cloud Pub/Sub, executes a UDF, and outputs the resulting records to BigQuery. Any errors
 * which occur in the transformation of the data or execution of the UDF will be output to a
 * separate errors table in BigQuery. The errors table will be created if it does not exist prior to
 * execution. Both output and error tables are specified by the user as template parameters.
 */
Hint: This flow engine is heavily oversized for our one lonely sensor — but once you get more traffic you know where to look for 😄
Visualizing data with Data Studio
That is the final step to reach our story's goal. But before we go on to Part 3 we will replace the Dataflow service in order to create a less scalable version of the same functionality, for better transparency and to save some euros.
Let’s try to visualize the time series for PM10 and PM2.5 from our sensor (as we have only one device we do it only for this).
- Navigate to datastudio.google.com and create a new report.
- Connect your BigQuery to this report
- Configure the date field to be read in as text
- Create a new field which converts the date string into YYYYMMDDHH format:
TODATE(date, 'RFC_3339', '%Y%m%d%H')
- Configure the sensor data as average (or max or min)
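For reference, the Data Studio formula above reduces an RFC 3339 timestamp to hourly buckets. Sketched in JavaScript (using UTC here, while Data Studio may apply the report timezone), it does the following:

```javascript
// JavaScript equivalent of TODATE(date, 'RFC_3339', '%Y%m%d%H'):
// collapse an RFC 3339 timestamp into an hourly bucket key.
function toHourBucket(rfc3339) {
  const d = new Date(rfc3339);
  const pad = (n) => String(n).padStart(2, '0');
  return `${d.getUTCFullYear()}${pad(d.getUTCMonth() + 1)}${pad(d.getUTCDate())}${pad(d.getUTCHours())}`;
}

console.log(toHourBucket('2018-12-13T19:39:55.403Z')); // "2018121319"
```

Averaging the sensor values per bucket then gives one chart point per hour instead of one per 3-minute message.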
Billing
After leaving this running for ~26 h, billing shows me ~2.50 € of consumption, all connected to the Cloud Dataflow engine.
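As a rough plausibility check of that figure (the per-hour rates below are my assumptions based on streaming Dataflow list prices of that era, not figures from this article):

```javascript
// Rough reconstruction of the ~2.50 € Dataflow bill. All rates and the default
// 400 GB streaming disk size are assumptions, in USD per hour.
const rates = { vcpuPerHour: 0.069, memGbPerHour: 0.003557, pdGbPerHour: 0.000054 };
const worker = { vcpus: 1, memGb: 3.75, pdGb: 400 }; // one n1-standard-1 worker

function dataflowCost(hours, w) {
  return hours * (w.vcpus * rates.vcpuPerHour +
                  w.memGb * rates.memGbPerHour +
                  w.pdGb * rates.pdGbPerHour);
}

console.log(dataflowCost(26, worker).toFixed(2)); // roughly in line with ~2.50 €
```

The point of the exercise: almost the entire cost is the always-on worker VM, independent of how little data our single sensor actually streams.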
Replace Dataflow with Cloud Function
Let's quickly replace the Dataflow job (designed to handle huge amounts of telemetry events) with the less scalable but quickly deployed alternative: a Cloud Function.
The Cloud Function just runs with a predefined allocated memory on the App or Compute Engine infrastructure.
index.js
/**
 * Triggered from a message on a Cloud Pub/Sub topic.
 *
 * @param {!Object} event Event payload and metadata.
 * @param {!Function} callback Callback function to signal completion.
 */
exports.pubsubToBQ = (event, callback) => {
  const pubsubMessage = event.data;
  const BigQuery = require('@google-cloud/bigquery');
  const bigquery = new BigQuery();
  bigquery
    .dataset('tmmiot_dataset')
    .table('tmmiot_table')
    .insert(JSON.parse(Buffer.from(pubsubMessage.data, 'base64').toString()),
            {ignoreUnknownValues: true, raw: false})
    .then((data) => {
      console.log('Inserted 1 row');
      console.log(data);
    })
    .catch((err) => {
      if (err && err.name === 'PartialFailureError') {
        if (err.errors && err.errors.length > 0) {
          console.log('Insert errors:');
          err.errors.forEach((e) => console.error(e));
        }
      } else {
        console.error('ERROR:', err);
      }
    });
  callback();
};
package.json
{
  "name": "pubsubToBigQuery",
  "version": "0.0.1",
  "dependencies": {
    "@google-cloud/bigquery": "^1.3.0"
  }
}
Hint: If you try it yourself, please change the names for dataset and table.
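To sanity-check the decoding step in index.js without deploying, you can reproduce locally how the function unwraps a Pub/Sub event (assuming the legacy background-function event shape, where the base64 payload sits in event.data.data):

```javascript
// Reproduce the decode step of pubsubToBQ: Pub/Sub delivers the JSON payload
// base64-encoded inside the event, and the function parses it back.
const original = { id: 'tmmiot-device-1', pm2p5: 5, pm10: 6.8 };
const event = { data: { data: Buffer.from(JSON.stringify(original)).toString('base64') } };

// This is exactly what the function passes to bigquery.insert().
const decoded = JSON.parse(Buffer.from(event.data.data, 'base64').toString());
console.log(decoded.pm2p5); // 5
```

If this round-trip works for your payload, the only remaining failure modes in the function are schema mismatches on the BigQuery side.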
We replaced Cloud Dataflow (big data) with the smaller Cloud Functions solution.
Billing
After ~24 h there is no billable cost from the Cloud Function. Only 9 invocations are registered, but the function reliably executed every 3 minutes. That is most probably due to how Cloud Functions counts invocations.
I like your comments, corrections and suggestions 👌.