Tell me more Internet of Things — Part 5 — Google Cloud — User handling

How to build a web application for our ingested IoT Core data. How to implement individual, secure data visualization and device registration. Using Node.js with App Engine, Datastore, OAuth2 and cron jobs.

Jan Bertrand
14 min read · May 13, 2019

Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 | … | Part n

With Google Cloud Engine (GCE) we continue the story. The last articles were all about getting the data securely into the cloud (or, in the case of IOTA, into the distributed database, which is kind of a cloud at the moment anyway) in a robust and scalable manner.

This article deep-dives into user interaction. We will explore how to get the data into a faster, queryable database (here Datastore, the predecessor of Firestore) and build a web application on Google App Engine for users to visualize their device data. As in the last parts of this series, we will try to estimate the cost introduced by the added functionality.

Milestones: It’s just M4 this time (well, a bit of M3 for the user)

This article will implement a way for users to register their devices (hopefully securely) to their accounts and visualize their data. As BigQuery is neither fast enough nor designed for interactive user queries, we use Google Datastore to store the past days of data. We deploy a web application written in Node.js (Express, …) on Google App Engine with built-in SSL (HTTPS) and full OAuth2 authentication. Users should be able to authenticate with their Google account and register a device by entering its device ID. The device ID has been registered by us, the manufacturer, after creation of the device. Users can then visualize the measurements from any browser (mobile support is added by using Bootstrap stylesheets from bootstrapcdn together with Pug).

Overview of the Implementation: Google Cloud Service architecture with changed code in Cloud functions, added Datastore and App Engine for the web app

We will touch three services, which increase the cost of our solution.

  • Cloud Functions: Besides BigQuery, we now also push the Pub/Sub messages registered via IoT Core to Datastore.
  • Datastore: We store the past day of data for every device; a cron job deletes older data every day.
  • App Engine: We create a web application which serves as the base for user interaction and visualization.
Data flow and extension to user interface (web-app)

The impact on cost comes from the temporary data added to Datastore (which comes with a reasonably high free quota) and the standard App Engine environment (which, with the chosen F2 instance class, will cost you money even for one user if it runs all day). But it most probably won't, as you only access the web app once in a while. Bear in mind that an instance stays up for at least 15 minutes and that this time adds up if you access the endpoint / URL even occasionally.

Some of the cost goes away thanks to the always-free allowance: https://cloud.google.com/free/docs/gcp-free-tier

Always-free usage limits from the Google FAQ for App Engine and Cloud Firestore (~Datastore)
Rough estimate of the cost increase from introducing App Engine and Datastore operations. Assuming 3 F2 instances running 24h are necessary for the 10,000-device (user) scenario. In addition, Datastore is approximated to store only one day of data per device (data storage is then free for all scenarios, as it stays below 1 GB).

The cost jumps to ~10 € per device per year for the 100-device scenario. This is due to the high App Engine costs, which could be optimized with a lower-powered instance class, i.e. by not opting in to the stronger F2 machines. I haven't really looked into that, to be honest, but want to give you a possible explanation and emphasize that this approximation rests on several assumptions and can be optimized.

This article only introduces the user management and the base for all sorts of web-based user configurations and visualizations. App Engine is a very easy-to-deploy, out-of-the-box secure (HTTPS) offering. The beauty of this product is that its standard environment scales up and down (adding or deleting F2 instances) to always provide the best performance for the current load. We established routes for handling data, which we store in the fast, queryable database Datastore.

When I started the tell-me-more-iot project a couple of months ago, I decided to opt in for Datastore. There is currently no way to run Datastore and Firestore in the same project, nor to switch from one to the other (at least none I am aware of). This is not a big deal, but it explains my decision to use Datastore here.

I found it especially nice that SSL and OAuth are relatively easy to add to the web app, improving its security. I am not sure whether my chosen design for authorizing access to the measurement content (by checking against the user.id of the Google account information) holds up against all sorts of attacks, and I would be glad if someone left a comment if they think it should be changed or could be improved.

Cloud Function: From Pub/Sub to Datastore

I always think of Cloud Functions as a single-file App Engine deployment. As we use Node.js, this consists of one application file (index.js) containing the execution logic and one package file (package.json), the “header” file listing the modules our Cloud Function needs.
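For orientation, the package file for this function could look like the following sketch (the name and the pinned versions are assumptions; pick whatever matches your runtime):

{
  "name": "tmmiot-pubsub-to-datastore",
  "version": "1.0.0",
  "dependencies": {
    "@google-cloud/bigquery": "^2.0.0",
    "@google-cloud/datastore": "^2.0.0"
  }
}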

Last time we used Node.js 6 (then the default engine); now I switched to Node.js 8. It took me a whole day to figure out that with Node.js 8 the syntax for creating objects had changed. The logging wasn't really helpful and I did not know how to debug it in Cloud Functions, which was quite frustrating.
I think I went brute-force trial and error until hitting version number 19 :-) before it worked.

Below is the entire code, now extended to push the data to Datastore as soon as it is available.

Cloud Function from pub/sub to BQ and DS
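The Gist itself is embedded in the original article. As a stand-in, a minimal sketch of such a function (assuming the Node.js 8 background-function signature, the client libraries above, and the field names from earlier parts; the export name iotPubsubToStores is made up) might look like this:

const {BigQuery} = require('@google-cloud/bigquery');
const {Datastore} = require('@google-cloud/datastore');

const bigquery = new BigQuery();
const datastore = new Datastore();

// Background Cloud Function triggered by Pub/Sub (Node.js 8 signature).
exports.iotPubsubToStores = (message, context) => {
  // The device telemetry arrives base64-encoded in the Pub/Sub message.
  const payload = JSON.parse(Buffer.from(message.data, 'base64').toString());
  const row = {
    id: payload.id,
    time: payload.time,
    Temp: payload.Temp,
    Hum: payload.Hum,
    pm2p5: payload.pm2p5,
    pm10: payload.pm10,
  };
  // Stream the row into BigQuery (dataset and table names are assumptions)...
  const toBigQuery = bigquery
    .dataset('tmmiot_dataset')
    .table('tmmiot_table')
    .insert([row]);
  // ...and save the same data as a Datastore entity with an auto-generated key.
  const toDatastore = datastore.save({
    key: datastore.key(['tmmiot_datastore_2']),
    data: row,
  });
  return Promise.all([toBigQuery, toDatastore]);
};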

This function automatically writes the incoming data into the Kind tmmiot_datastore_2, with the payload fields as the entity's properties. There is no need to create a new DB or a blank schema first. As Datastore will most likely soon be replaced entirely by Firestore, the syntax would be slightly different with Firestore.

Datastore: Kind and entity in details
Cloud Function web representation

App Engine: Web application for device registration, user handling and more

The web application is again written entirely in JavaScript: Node.js on the back end, and Pug (an HTML template engine) on the front end.

The application architecture is derived from Google's own Node.js end-to-end Bookshelf example app: https://cloud.google.com/nodejs/getting-started/tutorial-app

In addition, for user authentication and authorization we use OAuth with Google OAuth 2.0 client IDs (you need to authorize yourself with your Google account to enter the app).

Furthermore, we enable logging from the start for better debugging.
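The Bookshelf sample wires logging up with Winston in lib/logging.js; as a minimal alternative sketch, assuming the morgan request logger instead (npm install --save morgan), it could be as small as:

// lib/logging.js (a sketch using morgan, not the author's actual setup)
const morgan = require('morgan');

module.exports = {
  // Express middleware that logs every request in Apache "combined" format;
  // on App Engine, stdout is picked up by Stackdriver Logging automatically.
  requestLogger: morgan('combined'),
};

In app.js this would then be mounted with app.use(require('./lib/logging').requestLogger);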

The general setup

I code directly inside Cloud Shell, using its editor functionality as seen in the picture below:

Cloud Shell — my development setup.

It's an ephemeral Docker container spinning up once you hit the Cloud Shell symbol. It comes with all the required software (git, node, npm, …), and of course you can add to the standard image in Docker format.

In order to prepare our project from scratch, first create a project folder. In this folder I add my freely chosen sub-folders and start the Node.js project:

(tell-me-more-iot)$
npm init
mkdir router
mkdir router/private
mkdir model
mkdir views
mkdir views/private
mkdir lib
npm install --save @google-cloud/datastore
npm install --save express
npm install --save express-session
npm install --save pug
touch app.js
....
npm install

A quick walkthrough: npm init initializes the package file with information about the project. With mkdir all folders are created; in router/private we intend to store the Express routers (the endpoints of our URL-based web app). In the sub-folder model I like to store the handlers for Datastore. In views lives all the Pug (HTML) code for the front end. With npm install --save ... we save the dedicated module to our newly created package.json file. Finally we create the first file app.js and, after adding all the code, install the app with npm install. Only then does the node_modules folder fill up with all the code requested in package.json.

npm init executed from Google Cloud Shell
Most simple web-app implementation. Without logging, authentication, data store handling
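The Gists are embedded in the original post; as a stand-in, a minimal app.js in this spirit (a sketch assuming only the express and pug modules installed above, with an assumed views/index.pug) could be:

const express = require('express');
const app = express();

// Tell Express where the Pug templates live.
app.set('views', './views');
app.set('view engine', 'pug');

// One public route rendering a Pug view.
app.get('/', (req, res) => {
  res.render('index.pug', { title: 'tell-me-more-iot' });
});

// App Engine (and the Cloud Shell preview) pass the port via PORT.
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`App listening on port ${PORT}`);
});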

The above shows the simplest implementation of this architecture. We won't deploy it to App Engine just yet. We can simply start the application “locally” (actually in the cloud, inside the spun-up Cloud Shell container) without deploying it, by altering the package.json file:

...
"main": "app.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"start": "node ${SCRIPT:-app.js}"
},
...

Let's start the application with npm start and open the browser via the little browser sign.

Starting the URL of the “locally” deployed app (marked yellow)

You can see that this is an ephemeral instance with a static link (8080-dot-4015961-dot-devshell-appspot.com); we will need to add this URL later to our OAuth authorized endpoints.

It's not much, but it's a web app.

Next we need to add:

  • Datastore routes (model/datastore-device.js)
  • Authentication via OAuth (lib/passport.js)
  • Logging (lib/logging.js)
  • a few more views and a body-parser for POST requests

You can find the code in its final version here if you like:

Adding Datastore communication

We want to fetch a device's measurement data from Datastore. In order to do so we need to add the communication model for Datastore. Half of this happens in /model/datastore-device.js, and the URL which routes to the functionality is added in /router/private/device.js (why did I choose the sub-folder private? Because I want to add OAuth later and make it private; at the moment it is terribly public, though).

I won't go into all the details, but the easiest route is a plain data list. The function which queries all the measurement data for a given deviceId is shown below (in /model/datastore-device.js):

function listDataByDeviceId(deviceId, limit, token, cb) {
  // Query the data Kind, filtered by device ID, newest first, paginated.
  const q = ds
    .createQuery([kind_data])
    .filter('id', '=', deviceId)
    .order('time', {descending: true})
    .limit(limit)
    .start(token); // cursor token for pagination
  ds.runQuery(q, (err, entities, nextQuery) => {
    if (err) {
      cb(err);
      return;
    }
    // Hand back a cursor for the next page, or false when we are done.
    const hasMore =
      nextQuery.moreResults !== Datastore.NO_MORE_RESULTS
        ? nextQuery.endCursor
        : false;
    cb(null, entities.map(fromDatastore), hasMore);
  });
}

The route and exposed endpoint (URL: …/list/tmmiot-device-4) which triggers this function lives in the router (in /router/private/device.js):

router.get('/list/:deviceId', oauth2.required, (req, res, next) => {
  // Check that the requested deviceId is registered to the logged-in user.
  getModelDevice().isDeviceIdRegistered(req.params.deviceId, (err, device) => {
    if (err) { console.log("err: " + err) }
    else if (device[0].userId == req.user.id) {
      // Authorized: fetch the last 10 measurements plus a pagination cursor.
      getModelDevice().listDataByDeviceId(
        req.params.deviceId,
        10,
        req.query.pageToken,
        (err, measurements, cursor) => {
          if (err) {
            res.redirect("/device");
          } else {
            res.render('private/measurementList.pug', {
              measurements: measurements,
              nextPageToken: cursor,
            });
          }
        });
    } else {
      // Not the owner of this device: back to the device overview.
      res.redirect(`${req.baseUrl}/`);
    }
  });
});

The route only queries the last 10 measurements and provides a link to the next 10. The result, in combination with the front-end Pug presentation private/measurementList.pug:

extends ./base.pug

block content
  h3 Data from #{measurements[0].id}

  each measurement in measurements
    .media
      pre
        span Date: #{measurement.date}
        span Temperature: #{measurement.Temp}°C
        span Humidity: #{measurement.Hum}%
        span PM2.5: #{measurement.pm2p5} µg/m³
        span PM10: #{measurement.pm10} µg/m³
  if !measurements.length
    p No measurements found.
  if nextPageToken
    nav
      ul.pager
        li
          a(href=`?&pageToken=${encodeURIComponent(nextPageToken)}`) More
Web app with the list of measurements

One thing that comes with this high-performing database is that you can only run simple queries (no LIKE-style queries). And, as seen above, we want results filtered by id and sorted by time; that means we need to create a composite index for this:

# index.yaml
indexes:
- kind: tmmiot_datastore_2
  properties:
  - name: id
  - name: time

The index.yaml file can be deployed to Datastore with gcloud datastore indexes create index.yaml, straight from Cloud Shell.

Datastore Indexes

As a list is probably the worst representation of a time series, we need to extend the front and back end to use charts.

I will only show the new Pug view /views/measurementChart.pug here; the route and model changes you can review in the git repo. We use the Google Charts JavaScript packages, to which we pass all the measurements by rendering a stringified object list: var measurements = !{measurements};. I am sure there are nicer ways to do such things (it would be great if you could leave a comment about other ways of doing that).

extends ./base.pug

block content
  h3 Chart from #{deviceId}
  script(type='text/javascript').
    google.charts.load('current', {'packages':['corechart'], callback: drawChart});
    var measurements = !{measurements};
    function drawChart() {
      var dataLine = new google.visualization.DataTable();
      dataLine.addColumn('datetime', 'Date');
      dataLine.addColumn('number', 'Temp');
      dataLine.addColumn('number', 'Humidity');
      dataLine.addColumn('number', 'pm2p5');
      dataLine.addColumn('number', 'pm10');
      for (var i = 0; i < measurements.length; i++) {
        var d = new Date();
        d.setTime(measurements[i].time);
        dataLine.addRow([d, parseFloat(measurements[i].Temp), parseFloat(measurements[i].Hum), parseFloat(measurements[i].pm2p5), parseFloat(measurements[i].pm10)]);
      }
      var chartLine = new google.visualization.LineChart(document.getElementById('div_LineChart'));
      chartLine.draw(dataLine, { width: '400', height: '600', legend: {position: 'top'}});
    }
  if measurements.length < 3
    p No measurements found.
  else
    #div_LineChart
  if nextPageToken
    nav
      ul.pager
        li
          a(href=`?&pageToken=${encodeURIComponent(nextPageToken)}`) More

The result is the colorful time-series presentation of our measurement data:

/views/measurementChart.pug

Deploying the app to App Engine

This is almost too easy to deserve its own chapter, but it needs some explanation. The only things really necessary to go from local (testing environment) to production are deciding on the name of the service (here: gce-tmmiot-app) and whether we want to deploy in the flexible or the standard environment. We chose the standard environment with the instance class set to an F2 machine. All this information just needs to be squeezed into the app.yaml file:

# app.yaml
service: gce-tmmiot-app
runtime: nodejs10
instance_class: F2
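With app.yaml in place, deployment is a single command from Cloud Shell (assuming the gcloud project is already set):

(tell-me-more-iot)$
gcloud app deploy app.yaml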

The web page gets HTTPS access by default, and the URL is the following: https://gce-tmmiot-app-dot-tell-me-more-iot.appspot.com

constructed as https://[service-name]-dot-[project-id].appspot.com

After deploying, you can see the live stats.

I would recommend setting the quota's maximum daily allowance to $0 in the beginning; that saves you from paying if you mess things up or a bot army crashes your site. Navigate to Settings:

After doing all that, you can enjoy your app from your phone, your wife's laptop, your work laptop, or anywhere else you have an internet connection.

Spoiler alert: this is how the application looks once you integrate OAuth and the few remaining steps below.

Adding Authentication

As mentioned before, the past exercise was all publicly accessible. But I am sure that, especially going forward, not everyone would like to share their device measurements with the world. This is why we introduce authentication. Despite a common misunderstanding, authentication alone will not keep others from stealing your data (that is solved with authorization concepts); it just enables handing your Google account ID, your name and your profile picture to our application in a secure way (OAuth is safe, at least for now).

In order to control which data is handed over to which user, the user needs to register their device to their Google account ID. This ID is then saved with the device ID, and on every query we check that the authenticated user can access only their own data. Sounds easy? Well, with GCE it kind of is.

API & Services — OAuth setup

As shown above, we need to add our URLs (test and production) to the authorized ones. With that we get a client ID and a secret; both need to be set up in our Node.js application.

The magic happens in lib/oauth.js, where we read our config.json file holding the above-mentioned client ID, secret and OAuth callback link.

Make sure you have one config.json for your testing environment (“local”) and one for production; both files differ only in the OAUTH2_CALLBACK link.

config.json for the production app engine deploy
lib/oauth.js where we read in the config.json file
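The Gists are embedded in the original post; a config.json along those lines (a sketch: the key names follow Google's Bookshelf sample, all values are placeholders, and the callback path is an assumption) might look like:

{
  "OAUTH2_CLIENT_ID": "1234567890-abcdef.apps.googleusercontent.com",
  "OAUTH2_CLIENT_SECRET": "your-client-secret",
  "OAUTH2_CALLBACK": "https://gce-tmmiot-app-dot-tell-me-more-iot.appspot.com/auth/google/callback"
}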

After making all the routes private by adding OAuth to the handlers, I designed the back-end Datastore to hold one Kind for the “produced devices”. Once customers receive their device, they can register it to their account using the device's ID.

...
router.post('/register', oauth2.required, (req, res) => {
...
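A sketch of what the body of such a register route might do (hypothetical: assignDeviceToUser is an assumed model helper, and req.body requires the body-parser mentioned earlier; the real code is in the repo):

router.post('/register', oauth2.required, (req, res) => {
  const deviceId = req.body.deviceId; // from the registration form
  getModelDevice().isDeviceIdRegistered(deviceId, (err, device) => {
    if (err || !device.length) {
      // Unknown device ID: it was never pre-registered by the manufacturer.
      return res.redirect(`${req.baseUrl}/`);
    }
    // Attach the authenticated user's Google account ID to the device entity.
    getModelDevice().assignDeviceToUser(deviceId, req.user.id, (err) => {
      if (err) { console.log('err: ' + err); }
      res.redirect(`${req.baseUrl}/`);
    });
  });
});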

From here we have pretty much everything in place for accessing the streamed data from a web interface. Bear in mind that the device needs to have been manually added to IoT Core with its public key and configured for Pub/Sub (by us, the manufacturer). In addition, the device name has to be added manually (after production) to the database.

Once that is done, any user can log into the web page and register the device by just entering the name of the device (the device ID should be chosen to be unguessable; maybe 81 trytes ;-)).
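For illustration, that manufacturer-side step could be a one-off script like this sketch (the Kind name tmmiot_devices and the field names are assumptions):

const {Datastore} = require('@google-cloud/datastore');
const ds = new Datastore();

// Pre-register a produced device so a customer can later claim it.
ds.save({
  key: ds.key(['tmmiot_devices', 'tmmiot-device-4']), // device ID as key name
  data: { id: 'tmmiot-device-4', userId: null },      // unclaimed until registered
}, (err) => {
  if (err) { console.log('err: ' + err); }
  else { console.log('device pre-registered'); }
});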

We will certainly need to make that process a bit more automated going forward. Next up is scheduling a recurring deletion of all data older than 7 (or 1) days from Datastore.

Deleting the old data from Datastore

App Engine provides so-called cron jobs within its interface, which simply call a publicly accessible URL of your app on a schedule (the “publicly” part can be locked down, and that is what we would really need to do). Anyway, this time I just use a unique, non-guessable URL (“/deleteAllOldData123456789”) and connect it to the cron job. My cron.yaml looks like this:

cron:
- description: "delete every 24h data which is older than 1 day"
  url: /private/device/deleteAllOldData123456789
  schedule: every 24 hours
  target: gce-tmmiot-app
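The cron definition is deployed just like the app itself:

(tell-me-more-iot)$
gcloud app deploy cron.yaml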

In my router (device.js) I added the following route:

// Deliberately without oauth2.required, so the App Engine cron job can reach it;
// the obscure URL is the only protection here.
router.get('/deleteAllOldData123456789', (req, res, next) => {
  getModelDevice().deleteAllOldData(null, (err, response) => {
    if (err) { console.log("err: " + err) }
    else {
      res.status(200).send('Done');
    }
  });
});

And the data-model handling with Datastore uses this (in /model/datastore-device.js):

function _delete(id, cb) {
  const key = ds.key([kind_data, parseInt(id, 10)]);
  ds.delete(key, cb);
}
function deleteAllOldData(option, cb) {
  // Despite the variable name, the cutoff is 1 day (1 * 86400 * 1000 ms).
  const time7Days = Date.now() - 1*86400*1000;
  const q = ds
    .createQuery([kind_data])
    .filter('time', '<', time7Days)
    .select('__key__'); // keys-only query: we only need the IDs to delete
  ds.runQuery(q, (err, entities) => {
    if (err) {
      cb(err);
      return;
    }
    if (!entities.length) {
      cb(null, "done");
      return;
    }
    // Call back once after the last delete finishes; calling back per entity
    // would trigger multiple responses in the route above.
    let remaining = entities.length;
    for (const entity of entities) {
      _delete(entity[ds.KEY]['id'], (err) => {
        if (err) { console.log("err: " + err); }
        if (--remaining === 0) {
          cb(null, "done");
        }
      });
    }
  });
}

Strangely, even without extending index.yaml, the code seems to work (presumably because a keys-only query with a single inequality filter is served by Datastore's built-in indexes, so no composite index is required). The above model is apparently not very robust; assuming massive amounts of new data in Datastore, deleting it should be handled via Cloud Dataflow (another Google service), which is actually quite good at big data and uses strategies like MapReduce to stay robust and in sync.

# maybe to be added to index.yaml
- kind: tmmiot_datastore_2
  properties:
  - name: time
  - name: __key__

In this example we will just keep the past 1 day of data in Datastore. Honestly, I don't know which is more expensive: executing the delete or storing the data. As always, it is a function of how many devices ingest data, I suppose. For the moment we leave it like that and bear in mind that customers may want to skim older data as well; knowing that, we might compress this data in Datastore into a mean-value-per-day representation…

A second way would be to let users query BigQuery directly for older data, accepting the worse latency.

Final application up until here with chart representation of measurement and device registration page

I like your comments, corrections and suggestions 👌.
