Analysing AWS-IoT stack for applications in building energy efficiency

Published in Zenergy, Dec 25, 2015

One of my research directions at IIIT Delhi was architecting systems for managing building energy data. A lot of thought went into designing how the different system components would interact when we worked on our research system, SensorAct, during 2013. Around the same time, good research groups across the world published a few other systems papers, including BOSS (from UC Berkeley) and BuildingDepot (from UC San Diego).

While working on system architectures, we also looked into how some of the commercial IoT stacks were architected. The systems we evaluated included Cosm (later renamed to Xively), nimbits and Mango. After evaluating many such stacks, when we started Zenatix, we decided to use the open source stack from BOSS as the core and build on top of it.

So far, we haven’t been constrained by the core stack and have been able to modify it as per our use case requirements. Early this year, I looked at the different services offered by AWS (Kinesis, DynamoDB, Lambda, EC2, SNS, SES, ElastiCache and Machine Learning) and felt that all of these could be combined to create a generalised stack for data management and analysis in IoT applications such as ours. Little did I know that AWS had similar thoughts: they launched AWS-IoT in October 2015.

The academic buff in me wanted to work on implementing AWS-IoT for our application and in the process evaluate it in more detail (to answer much more concretely many of the doubts raised later in the article).

I am still looking for someone who would like to undertake this as their internship with us (with so many other things going on at Zenatix, I can’t find enough time to implement and evaluate it myself). If you are interested, contact me.

However, I did end up spending some time reading through the documentation that came with AWS-IoT and evaluating the utility of its different features for our energy analytics application (I reserve my detailed thoughts on generalised vs application-specific stacks for a future post).

Let me first pull up a system architecture diagram from AWS-IoT documentation.

AWS-IoT Architecture (from the AWS Design Guide)

Let me start with some seemingly useful and new things in this architecture:

Keeping the Thing Shadow and the Thing Registry in the cloud, for situations when your Things are not connected, is useful in certain contexts.
Doing basic stream processing as the data is received in the cloud, and treating everything (including the DB storage) as subscribing services, also seems interesting.
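
To make the "everything is a subscribing service" idea concrete, here is a minimal in-process sketch in Python. It is only an illustration under stated assumptions: `MiniBroker`, the topic name and both handlers are hypothetical and are not part of any AWS API. The point it shows is that storage and alerting can both be plain subscribers on the same topic.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class MiniBroker:
    """Toy in-process stand-in for a message broker (hypothetical, not an AWS API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Every subscriber sees every message published on its topic.
        for handler in self._subscribers[topic]:
            handler(topic, message)


stored: List[dict] = []   # stands in for the database
alerts: List[str] = []    # stands in for a notification service


def store_reading(topic: str, message: dict) -> None:
    # Storage is just another subscriber.
    stored.append({"topic": topic, **message})


def high_temperature_alert(topic: str, message: dict) -> None:
    # Basic stream processing done as the data arrives.
    if message["value"] > 25:
        alerts.append(f"{topic}: {message['value']}")


broker = MiniBroker()
broker.subscribe("building/room1/temperature", store_reading)
broker.subscribe("building/room1/temperature", high_temperature_alert)
broker.publish("building/room1/temperature", {"ts": 0, "value": 24.0})
broker.publish("building/room1/temperature", {"ts": 60, "value": 26.5})
```

In this toy model, swapping the database for another one only means registering a different storage handler.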

A few aspects which are glaringly missing are:

How to connect Things to their architecture: an SDK for common platforms such as Arduino and Raspberry Pi would have been a useful value add for getting started.
How to replace some of their services with a third party (or my own) implementation: for example, can I use a third party time series database instead of their DynamoDB? And if I want to use web sockets for communication (rather than MQTT or HTTP, which are what is currently available), how can the stack be extended? For the database replacement, one can argue that this could be done by subscribing directly to the MQTT broker, but then everything around the new service has to be built from the ground up ourselves. I would have preferred to use the glue logic from AWS and just replace the DB with another one based on application requirements.
End to end costs involved: we have found AWS pricing notoriously hard to predict when using their EBS services. Even a simple decision, such as choosing between magnetic volumes (lower per GB cost, but with an associated cost for I/O requests) and SSD volumes (higher per GB cost, and throttled I/O), can only be taken once we know the numbers from a few months of usage. When many services, each with multiple cost options, are combined in IoT, it would be useful if AWS gave some real examples of the costs involved for a few simple applications that use all their services (just saying "pay for what you use" doesn’t help!).
How to separate the meta information from the data coming from Things: this meta information will be static and should be associated at the stream level.
Sharing privileges: if users who own their Things want to share data with other users in the system, how can such privileges be shared across different users?
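
As a rough illustration of the point about keeping static meta information at the stream level, here is a minimal Python sketch. The `Stream` class, the `find_streams` helper and all attribute names are hypothetical; they are not taken from AWS-IoT or from our stack. Metadata is written once per stream, while each reading carries only a timestamp and a value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Stream:
    """A stream with static metadata kept apart from its time series data."""

    stream_id: str
    metadata: Dict[str, str]  # static, set once at the stream level
    readings: List[Tuple[int, float]] = field(default_factory=list)

    def append(self, timestamp: int, value: float) -> None:
        # Readings stay small: no metadata is repeated per sample.
        self.readings.append((timestamp, value))


def find_streams(streams: List[Stream], **wanted: str) -> List[Stream]:
    """Return the streams whose metadata matches every requested attribute."""
    return [
        s for s in streams
        if all(s.metadata.get(key) == value for key, value in wanted.items())
    ]


meter = Stream(
    "meter-1/power",
    {"building": "Block A", "floor": "2", "type": "energy", "unit": "kW"},
)
sensor = Stream(
    "room-7/temperature",
    {"building": "Block A", "floor": "3", "type": "temperature", "unit": "C"},
)
meter.append(0, 1.2)
meter.append(60, 1.4)

matches = find_streams([meter, sensor], building="Block A", type="energy")
```

With this layout, the more attributes a stream carries, the richer the queries that `find_streams` can answer, without changing the format of the readings.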

Now, coming to the use cases, specific to our existing requirements, that we believe are constrained by the current architecture of AWS-IoT:

Limit of 3 custom attributes with each Thing: we like to describe our Things (or streams, as we call them) with as many meta information attributes as possible. I don’t understand why there is a limit of 3 attributes. Adding more fields makes our query interface richer as well.
Thing Shadow: in many circumstances, keeping a shadow in the cloud (that automatically syncs itself with the device) may be unnecessary. As an example, suppose I have to switch the AC on or off when the temperature is more than 25. If, at that instant, the AC is not reachable from the cloud, I would not want the control command to be kept in the cloud and automatically synced to the AC when it comes back online.
Many of the rules on which we generate alerts require us to access historical data (e.g. if the energy consumed in the last 15 minutes is more than 100). It is unclear from the documentation how to write rules for such scenarios.
Data buffering on the Things side: we see a lot of connectivity loss from devices connected over cellular networks. AWS seems to address this by keeping a device shadow in the cloud. What is required instead is a seamless way (as we already do) to buffer data on the device side and upload it when the device reconnects, to ensure near zero data loss.
We prefer performing control actions over websockets rather than MQTT (though we have been using MQTT for data collection!). It’s unclear how we would extend the architecture to support websockets.
We use several automated services that monitor the health of our system from different perspectives. It would be useful if AWS could add such automated checks to their system and provide the results as regular reports to the user.
Computed streams: in any IoT application, including ours, there will be several use cases that require performing some basic computation on one or more streams and then either returning the computed data as a query response (rather than returning raw data all the time) or storing these computed streams automatically. It is unclear how one can support such computed streams, which would require interaction across multiple services subscribed to the rules engine.
Grouped streams for joint processing: consider an energy meter with multiple parameters sampled from it (e.g. voltage, current, power, energy). It is unclear how one could create the notion of a meter with multiple streams inside it. A similar thing is required on the control side when, based on a certain input (time or stream based), we may want to control a set of appliances simultaneously.
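
To show what a rule over historical data looks like in practice, here is a minimal Python sketch of the "energy consumed in the last 15 minutes is more than 100" example. It is not an AWS-IoT rule: the `WindowedSumRule` class is hypothetical, and it assumes that each sample is the energy consumed since the previous sample and that timestamps (in seconds) arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedSumRule:
    """Fires when the sum of samples in a sliding time window exceeds a threshold."""

    def __init__(self, window_seconds: int, threshold: float) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._samples: Deque[Tuple[int, float]] = deque()
        self._total = 0.0

    def update(self, timestamp: int, value: float) -> bool:
        """Add one sample and report whether the rule fires for the current window."""
        self._samples.append((timestamp, value))
        self._total += value
        # Drop samples that are no longer inside (timestamp - window, timestamp].
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            _, old_value = self._samples.popleft()
            self._total -= old_value
        return self._total > self.threshold


# 15 minute window (900 s), alert when more than 100 units are consumed in it.
rule = WindowedSumRule(window_seconds=900, threshold=100)
results = [
    rule.update(0, 40.0),    # window total 40
    rule.update(300, 40.0),  # window total 80
    rule.update(600, 30.0),  # window total 110, rule fires
    rule.update(900, 10.0),  # sample at t=0 expires, window total 80
]
```

The rule has to keep state between messages, and that is the part that is hard to express when each rule only sees the message that triggered it.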


Written by Amarjeet Singh