The most important devices in IoT systems are sensors and actuators. Sensors collect data from the real world, and actuators act on it based on the data collected, which is why data analysis is central to IoT systems.
A cognitive IoT architecture is a specific type of IoT implementation. Utilising machine learning models trained in a cloud or cluster, a cognitive IoT system makes the majority of its real-time decisions at the edge. This way, decisions are made quickly each time a data set arrives from the sensors. Further analysis of the data and tweaking of the edge model can be done in the cloud later.
The architecture outlined below is what might be implemented to create a ‘smart’ fridge:
Two RuuviTag sensors were placed in the fridge door at Streamr’s office. The data was published via Bluetooth Low Energy (BLE), and a Raspberry Pi was set up to collect it. A Node-RED flow on the Raspberry Pi then aggregated and filtered the data to detect events such as the door being opened. The raw data, along with some aggregations, was then published to a stream on Streamr. A Node-RED instance in IBM Cloud subscribed to that stream, and the data was further analysed in the cloud.
Streamr is used to connect IoT sensor data, collected by a gateway device, to IBM Cloud. This makes connecting the real-time data from the edge to the cloud very simple. An additional benefit of using Streamr is the mechanism for data monetisation. Streamr could be used to sell data collected by user-owned sensors. The Community Products system being developed by Streamr could be employed in this case to create monetised data union firehoses.
An ideal cognitive IoT architecture with the technologies used
The implemented architecture
Step-by-step setup
In the images above, we see the role of the edge more clearly. However, an actual machine learning model isn’t used here; instead, classical statistics are used to implement anomaly detection. The cloud service used was IBM Cloud, which provides a free starter kit with access to Node-RED for easy cloud integrations. (Watch this video to learn more about the starter kit).
Streamr has a library readily available in the Node-RED palette, so no extra configuration outside of creating and connecting to a stream is required. The library to publish and subscribe to Streamr is found in Node-RED by going to:
Manage palette -> Install -> Search for: node-red-contrib-streamr -> Install
After installing the library, you should create a stream in Streamr’s editor. You need your Streamr API key and the created stream’s ID to connect to it in Node-RED. Then drag either the Streamr sub node (for subscribing to a stream) or the Streamr pub node (for publishing data) onto the canvas. Double-clicking the node opens its properties, where you can paste in your API key and stream ID.
Here, Node-RED is also used to connect the RuuviTag sensors to the Raspberry Pi. Connecting the RuuviTags to Node-RED is pretty easy, as demonstrated here. If you are using a Mac to connect the RuuviTags, you might need to reconfigure the Noble library that is used to establish Bluetooth connections in Node-RED. (More info for Mac users is here). If you are using a Raspberry Pi as the edge gateway device, you need to change your Node.js version to 8.x; this is required for the Noble library to work.
The data fields sent by the RuuviTags via Bluetooth
The three acceleration axes are aggregated into a single value representing the sensor’s total acceleration. The edge flow then runs a moving z-score anomaly detection algorithm on the g-force calculated from the sensors in the fridge door. This way, one can infer with fair confidence when the fridge door is opened or closed based on the z-score. The current window for the algorithm is 60 data points, which corresponds to roughly one minute because the RuuviTags send data once per second.
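A minimal Python sketch of this calculation (the real flow implements it as a Node-RED function node; the window size matches the text, but the threshold value here is only illustrative):

```python
import math
from collections import deque

WINDOW = 60  # ~one minute of data at one reading per second


class SlidingZScore:
    """Moving z-score over the most recent WINDOW total-acceleration values."""

    def __init__(self, window=WINDOW):
        self.values = deque(maxlen=window)

    def update(self, ax, ay, az):
        # Aggregate the three axes into one total-acceleration magnitude.
        total = math.sqrt(ax ** 2 + ay ** 2 + az ** 2)
        self.values.append(total)
        n = len(self.values)
        mean = sum(self.values) / n
        var = sum((v - mean) ** 2 for v in self.values) / n
        std = math.sqrt(var)
        z = 0.0 if std == 0 else (total - mean) / std
        return total, z


detector = SlidingZScore()
threshold = 3.0  # illustrative; the real threshold is fetched from the cloud

# A stationary tag reads roughly 1 G on one axis; a door event is a spike.
for _ in range(59):
    detector.update(0.0, 0.0, 1.0)
total, z = detector.update(0.3, 0.2, 1.6)
print(z > threshold)  # → True
```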
The z-score threshold for anomaly detection can be changed by sending an API request to the Node-RED instance in the cloud. The edge flow fetches the threshold value from the cloud at one-minute intervals. Polling this often probably isn’t necessary, since new models may only be processed in IBM Cloud once per hour. However, as manual API requests are currently the only real way to set the threshold, a one-minute interval is a reasonable choice.
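The polling pattern can be sketched like this (the function names and fallback behaviour are assumptions for illustration; the real flow uses Node-RED inject and HTTP request nodes rather than Python):

```python
# Edge-side threshold polling: ask the cloud for a new value at intervals,
# and keep the last known threshold whenever the request fails.
DEFAULT_THRESHOLD = 3.0


def poll_threshold(fetch, current):
    """Fetch a new threshold from the cloud; keep the current one on failure."""
    try:
        value = float(fetch())
    except Exception:
        return current  # unreachable cloud or bad payload: keep last value
    return value if value > 0 else current


def failing_fetch():
    # Stands in for a failed HTTP GET against the cloud Node-RED endpoint.
    raise IOError("cloud unreachable")


threshold = DEFAULT_THRESHOLD
threshold = poll_threshold(lambda: "2.5", threshold)  # cloud returned 2.5
threshold = poll_threshold(failing_fetch, threshold)  # request failed
print(threshold)  # → 2.5
```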
There is also a filter function that discards events whose timestamps are less than three seconds apart. This way, data points that can be assumed to correspond to the same event are filtered out.
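The filtering logic can be sketched as follows (a Python sketch of the Node-RED function; the exact field names in the real flow may differ):

```python
# Events closer than three seconds to the previously accepted event are
# assumed to belong to the same door opening and are dropped.
MIN_GAP_SECONDS = 3.0


def filter_events(timestamps, min_gap=MIN_GAP_SECONDS):
    """Keep only events at least min_gap seconds after the last kept one."""
    kept = []
    for ts in timestamps:
        if not kept or ts - kept[-1] >= min_gap:
            kept.append(ts)
    return kept


# Three raw anomalies within two seconds collapse into a single event.
print(filter_events([0.0, 1.2, 2.0, 10.0, 11.0, 20.0]))  # → [0.0, 10.0, 20.0]
```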
You can import the flow from GitHub if you wish to analyse the flow and its functions further.
Temperature and humidity data is aggregated into means and z-scores and then pushed to a database, so that more analysis can be done on the data later. The get functions are for updating and getting the edge model. The update function can be called via an HTTP request; the value for new has to be between 0 and 1.
The model update can be triggered from the Jupyter Notebook that has been set up as the data analysis platform in IBM’s Watson Studio. Doing machine learning in batches using Notebook instances with Apache Spark is a great way to update the edge models when required.
In the picture above you can see the basics of how to use Notebook with Apache Spark and Python in IBM’s Watson Studio. Watson Studio also supports Notebook with Scala. (Check out this video to find out how to launch Jupyter Notebook instances in Watson Studio).
You can also see in the picture above how a Cloudant connection can be established from the Notebook. You can create read-only credentials easily in IBM Bluemix’s console for Cloudant: just go to the permissions tab in your Cloudant database’s console and click on generate API key. This gives you a username and password for the database. “Cloudant.host” is simply the base URL of your Cloudant database. After the Cloudant connection is set up, you can easily make SQL queries with Spark to analyse or visualise the data.
As you can set up a Jupyter Notebook to run, for example, every hour in the cloud, it is possible to update the models on the Raspberry Pi at those intervals. For example, you could train a logistic regression model in the cloud and then deliver the new model to the edge via an HTTP request. However, setting up a Raspberry Pi to receive external HTTP requests takes a bit more work, because you need to configure routing and take care of security. Creating external connections to edge gateway devices is not a recommended security practice; instead, the gateway should be responsible for making the GET or POST requests to the cloud. This is why there is no REST API on the Raspberry Pi when it is used as an edge gateway. It fetches its model from the Node-RED instance in the cloud, and model updates produced by the data analysis are first pushed to the cloud.
IBM has connected Apache Spark to Notebooks for fast data exploration, and the Pixiedust library is used to visualise the data. By exploring the data, I was able to decide which model to use to filter events where a door opening is likely. Logistic regression and anomaly detection were the leading candidates, and I decided on sliding z-score anomaly detection for this use case. Anomaly detection is the safer method because the default total acceleration of an immobile RuuviTag can change depending on which side it is lying on, and violent door openings quite often flipped the RuuviTags around.
The meanTotalAcceleration displayed in the graph is calculated at the same time as the sliding z-score, so it is calculated over the same window of data. This means the mean is sliding as well.
All the spikes in the graph are easily caught by the z-score anomaly detection. However, it could be useful to implement a logistic regression model that estimates whether a door has been opened without relying on the z-score, as multiple door openings within the same window might not register as anomalies.
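A toy sketch of what such a model could look like, using plain-Python logistic regression on a single synthetic window feature (the feature choice, data, and hyperparameters are all illustrative, not the project’s actual model):

```python
import math

# Toy logistic regression trained by stochastic gradient descent on one
# feature per window (here, the spread of total acceleration within the
# window). A real model would be trained on labelled door events.


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weight and bias for p(door opened | x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x  # gradient of the log-loss w.r.t. w
            b -= lr * (p - y)      # gradient of the log-loss w.r.t. b
    return w, b


def predict(w, b, x):
    return 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5


# Synthetic windows: closed-door spread near 0, opened-door spread larger.
xs = [0.01, 0.02, 0.03, 0.5, 0.6, 0.8]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(predict(w, b, 0.02), predict(w, b, 0.7))
```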
More technical documentation can be found in the code repository of the example application.
As the application has now been running for two months, we have some interesting data to share from the 1,700,000+ RuuviTag data points stored in Streamr. The free version of IBM Cloud’s Cloudant can only store up to 500 MB, so Streamr’s historical data capabilities are of great use here.
Starting with the g-force of the RuuviTags, the average g-force across all data points is 1.05 G, roughly equivalent to the Earth’s gravitational pull. However, for the anomaly-detected door events, the average g-force of the recorded events is below that, at 0.937 G. This is most likely because the RuuviTags were set up to move as much as possible, which caused them to jump around a bit; effectively, the RuuviTags could be in near-weightless states at the peaks of the jumps. You can see this visualised in the acceleration graph earlier in the blog. Moreover, the lowest total acceleration recorded was 0.071 G. At the other extreme, the highest acceleration recorded was 1.7 G. Perhaps someone was having a bad moment at 13:39 on May 29th.
We can also analyse the recorded temperature, humidity and pressure fields of the RuuviTags. By looking at the highest temperature recorded at 13.66°C, we can assume that the fridge door was left open for quite a while on June 4th between 6 and 7 PM. The application has also confirmed that the office fridge can reach freezing temperatures, as the minimum temperature during the two months was -0.94°C.
The RuuviTags measured the lowest pressure in the fridge on the 5th of July at 14:49. This happened to correspond to the weather event shown below.
Overall, the pressure did not change much in the fridge, and the minimum and maximum measurements were not far apart. Still, it’s interesting to see a clear low-pressure area moving through Helsinki at the time the lowest pressure was measured.
Lastly, unsurprisingly the humidity of the fridge correlated with the temperature of the fridge. These two values went pretty much hand in hand throughout the recorded two months.
Hopefully, the implemented architecture and historical data analysis gave you an idea about how to use Streamr in IoT and other data-driven applications. All of the data you publish to Streamr can be published on the Marketplace where you can share it for free or sell it to other users interested in your data. You can find the two data streams from this experiment (one for continuous RuuviTag data and the other for anomaly detected data points) here.