Basics of the internet of things
Connecting hardware devices to the internet often seems like a good idea, and at first sight a simple one. Everyone wants to digitalize. You can monitor what your devices are doing: great. You can gain insights into your product and your customers, and you can react on the service or business level if needed: perfect.
However, around 3 out of 4 IoT projects fail according to a recent report.
If every company has become agile, which is supposed to reduce the number of failing software projects, why is the failure rate still so high?
If an IoT solution needs to scale (success!), the data challenges become huge. Within the five-layer complexity described in the next section, the usual scalability issues of data projects arise.
Let us dive into a more detailed introduction to IoT first.
What is the internet of things?
The internet of things (IoT) is the generic idea of connecting hardware items, sensors, or devices (things) to the internet to generate benefits for the end user and the thing provider. The term was coined by Kevin Ashton in 1999 to get the word “internet” into the title of a talk in front of management; he wanted to motivate the need to put RFID chips into the company’s devices in the times of the dot-com bubble. By now IoT is applied throughout very many industries, and it is very often combined with current trends in artificial intelligence (AI).
Often IoT is described a little more pompously as the fourth industrial revolution: the next step after the moves from manual to machine-based labor, the introduction of electricity, and computerization. As the technology penetrates almost all areas of life, it might indeed be seen this way. Others like to refer to AI as the fourth industrial revolution, and there is a large overlap.
IoT has an inherent five-layer complexity, as sketched below:
1. device hardware
2. device software
3. network communication
4. cloud environments
5. cloud applications
This means that IoT products require business and technical decisions in all of these layers, not only in software or hardware, which raises the bar for the managers’ and engineers’ skills. IoT projects tend to be complex when the number of devices in scope is high, and social risks grow with that complexity: it becomes hard to find people with the right skill set for the projects.
In hardware-driven businesses, where IoT ideas often emerge, there are often gaps in expertise with respect to networks, software, and cloud. Treating software projects the same way hardware-focused projects were run might not work out as expected.
IoT is by now related to most industries, e.g.:
- consumer electronics, e.g. for home automation
- healthcare devices, e.g. for monitoring/optimizing long-term therapies
- factory equipment, e.g. preventing complete breakdowns of costly machines
- farming, e.g. optimization of water and fertilizer levels
- energy, e.g. for wind turbine maintenance
- finance, e.g. for payment options with ‘some device’
- automotive, e.g. car-to-x communication
One of the most complex “things” to consider is the car: it communicates device states and user interactions to the cloud, communicates with other cars or infrastructure, and uses a vast number of different sensors to understand its environment and drive autonomously. Information from various sources can be gathered and analyzed in order to improve the provided service, or to adjust or create products based on insights from infrequent states and user behavior.
This is the challenge the German automotive industry is currently dealing with. It has always been strong in manufacturing at scale, while Tesla, for example, as a young competitor is far more software-driven. Building a prototype everyone loves does not mean it is ready for mass production, no matter whether it is hardware or software. Tesla experienced this the other way around from the German OEMs when it had to scale Model 3 production and moved parts of it into a temporary tent.
The following schematic describes common IoT set-ups in some more detail.
Sensors measure relevant information, e.g. on machines or devices. The events need to pass a gateway: data from the devices/sensors needs to be preprocessed (often filtered to reduce volume) and translated, as different communication technologies are used, e.g. Bluetooth on the sensor/actuator side and LAN on the cloud side.
In the cloud the incoming events are monitored and evaluated. Different actions can be carried out based on the insights that are gained.
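As a minimal sketch of such gateway preprocessing, the following Python snippet filters raw sensor readings so that only noticeable changes are forwarded to the cloud; the field names and the 0.5 °C threshold are illustrative assumptions, not part of any specific product.

```python
import json

# Hypothetical threshold: forward a reading only if the temperature changed
# by at least 0.5 °C since the last forwarded value of that sensor.
TEMP_DELTA_MIN = 0.5

def preprocess(readings):
    """Filter raw sensor readings on the gateway to reduce data volume,
    and translate them into the JSON format the cloud side expects."""
    last_sent = {}
    forwarded = []
    for r in readings:
        sensor_id, temp = r["sensor_id"], r["temperature"]
        if sensor_id not in last_sent or abs(temp - last_sent[sensor_id]) >= TEMP_DELTA_MIN:
            last_sent[sensor_id] = temp
            forwarded.append(json.dumps({"id": sensor_id, "t": temp}))
    return forwarded

readings = [
    {"sensor_id": "a", "temperature": 21.0},
    {"sensor_id": "a", "temperature": 21.1},  # change below threshold: dropped
    {"sensor_id": "a", "temperature": 22.0},
    {"sensor_id": "b", "temperature": 19.5},
]
print(len(preprocess(readings)))  # 3 events forwarded instead of 4
```

In practice such filtering is what keeps network and cloud costs manageable once device counts grow.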
The large cloud providers (e.g. AWS, GCP, Azure) offer their individual IoT environments. However, it may become more complicated when you work on premises, without a private cloud, to avoid vendor lock-in and costs that are hard to estimate. For industrial IoT (IIoT) there are large companies like Bosch that offer gateway solutions and also provide software on the application side; Siemens, for example, offers its cloud platform MindSphere.
What is the benefit? IoT allows for new business cases and increased efficiency. You gain the information value cycle, which suits consumer electronics at scale particularly well when the application software is improved continuously:
The technological expertise needed to run this cycle is high. Most of the effort happens on the application side, i.e. within the software running in the cloud.
The following section explains different use-cases.
Monitor, analyze, act
When you have the gateway under control, possibly large amounts of data arrive in the cloud or your own IoT platform.
Depending on the use-case of IoT, the monitor, analyze, act triad that needs to be built in software must be considered in different ways. This is broken down in the following schematic, where maintenance/service, demand-driven business, and machine-to-machine use-cases are described.
Maintenance/service:
This use-case is dominant in predictive maintenance for production processes, even for smaller endeavors. In suitable business cases the assumption is that monitoring and analyzing the condition of expensive production parts with suitable sensors allows failure to be predicted before the parts break. A much cheaper repair can then circumvent long downtimes and large costs.
Service can mean using a downtime of the production chain for cost-minimal replacements or repairs. A typical example where IoT is successfully employed are wind turbines, but there are also mid-sized use-cases, such as reducing downtimes of the filling machines of a beverage supplier.
The goals here are to reduce costs and/or increase revenue by spending less on spare parts and having fewer downtimes.
Business actions can be triggered as well. For example, when wind turbines run into problems, the stakeholders can start to understand under which conditions this happens, and the turbines can then be optimized based on this knowledge.
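To make the monitor, analyze, act triad concrete, here is a deliberately simple condition-monitoring sketch in Python: it compares each vibration reading of a machine part against the recent average and raises an alert when the deviation is large. The window size and threshold factor are assumptions for illustration; real predictive-maintenance models are usually far more sophisticated.

```python
from collections import deque
from statistics import mean

# Assumed parameters: compare against the last 5 readings,
# and alert when a value exceeds the recent average by 50 %.
WINDOW = 5
FACTOR = 1.5

def monitor(vibration_stream):
    """Return (index, value) alerts for readings that deviate strongly
    from the recent average; the "act" step would notify the service team."""
    window = deque(maxlen=WINDOW)
    alerts = []
    for i, value in enumerate(vibration_stream):
        if len(window) == WINDOW and value > FACTOR * mean(window):
            alerts.append((i, value))
        window.append(value)
    return alerts

# Stable machine behavior, then a spike that may precede a breakdown:
stream = [1.0, 1.1, 0.9, 1.0, 1.0, 1.05, 2.4, 1.0]
print(monitor(stream))  # [(6, 2.4)]
```

The same shape of logic (rolling statistics plus a decision rule) underlies many monitoring pipelines, just with far larger windows and learned thresholds.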
Demand-driven business:
This use-case applies to consumer goods, for example vending machines. User interactions can be stored and analyzed, and the gained insights used to further optimize the business, for example by adjusting the marketing strategy.
The goals are to increase revenue and to optimize the service and its costs.
Machine to machine:
Automation is every manager’s favorite goal.
A great example of automation with multiple communication partners are crop fields with monitored soil quality and moisture: bad readings automatically trigger messages to machines that pour water on the right areas with the right mix of fertilizer, while humans are only responsible for monitoring and maintaining the machinery and the sensors. Based on the outcomes of this automation, the business can be driven further.
The results are efficiency and improved processes, and hence increased quality and revenue.
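A rule like the crop-field example above can be sketched in a few lines. The field names, the moisture threshold, and the water formula below are purely illustrative assumptions; a real system would send the commands to actual irrigation machines over the network.

```python
# Assumed minimum soil moisture in percent below which irrigation is triggered.
MOISTURE_MIN = 30.0

def irrigation_commands(field_readings):
    """Map soil-moisture readings to irrigation commands
    without any human in the loop (machine to machine)."""
    commands = []
    for reading in field_readings:
        if reading["moisture"] < MOISTURE_MIN:
            deficit = MOISTURE_MIN - reading["moisture"]
            # Illustrative formula: 10 liters per missing percentage point.
            commands.append({"area": reading["area"], "water_liters": round(10 * deficit)})
    return commands

readings = [
    {"area": "north", "moisture": 22.0},
    {"area": "south", "moisture": 41.0},  # fine, no action needed
]
print(irrigation_commands(readings))  # [{'area': 'north', 'water_liters': 80}]
```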
General challenges for IoT
We saw some of the many use-cases for IoT. The information value cycle fits perfectly with the continuous learning and continuous delivery approaches of our time.
But let us take a deeper look into the challenges it brings along, as we learned that if four of you start an IoT project, three will fail.
Some of the most important challenges for IoT resulting from the five-layer complexity are (partly taken from J. Holdowsky, M. Mahto, M. E. Raynor, M. Cotteleer, Inside the Internet of Things (IoT), Deloitte University Press, 2015):
- Energy consumption, not only from the sensors, but also from the network
- Scaling of a secure network
- Complexity and interdisciplinarity of the IoT topic
- Standardization (law, technical standards, worldwide)
- Continuous development and delivery
- Scaling of real-time calculations and data operations
- Data storage
- Data lineage, from unique IDs for sensors to data lineage for dashboards
- Data Analytics
Note that security is one of the largest issues in IoT, in particular for consumer electronics. If you want to learn more about it, you may refer to one of the many sources devoted mainly to this topic, e.g. V. Hassija et al., A Survey on IoT Security: Application Areas, Security Threats, and Solution Architectures, IEEE Access 2019.
Many IoT projects are motivated “from the hardware side”: large enterprises working with huge machines determine the optimization potential of IoT and need to get a grip on the data, the oil of the current age. The next section describes important aspects of suitable IoT architectures on the application side.
Data architectures for the cloud applications
If an IoT solution consists of few devices and little log data, many solutions seem possible. For simple industrial IoT (IIoT) one may build, for example, upon the Elastic Stack; many of its components are available as managed services from the big cloud providers by now. It consists mainly of
- Data shippers (e.g. Beats) that forward logs
- Logstash, a server-side data processing pipeline
- Elasticsearch, a search engine
- Kibana, a visualization tool based on the Elasticsearch index
Kibana may include alerting, and you can decide whether the person who should act receives an e-mail, a Slack notification, or any other kind of message.
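To give an impression of the data flow into Elasticsearch, the sketch below shapes device log events into the newline-delimited bulk-API payload that shippers and Logstash produce under the hood. The index name and event fields are made up for illustration.

```python
import json

def to_bulk_payload(events, index="device-logs"):
    """Format device events as an Elasticsearch bulk-API payload (NDJSON):
    one action line followed by one document line per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # bulk payloads must end with a newline

events = [{"device": "d1", "level": "ERROR", "msg": "overheating"}]
payload = to_bulk_payload(events)
print(payload)
```

Such a payload would be POSTed to the cluster's `_bulk` endpoint; in practice the shippers handle batching and retries for you.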
When the requirements are high, though, this is no longer enough: data pours in rapidly, and analytics need to be carried out fast, with vast flexibility, and with exact numbers (Kibana approximates aggregated numbers when it deals with big data). A suitable architecture with proper ETL pipelines needs to be thought through, with the possibility for different groups to work in parallel. Imagine, for example, how much data needs to be handled for smart watches that are sold by the millions per year.
Either one relies on cloud web services that are designed for the specific IoT use-case, or one builds up one's own pipeline, which can be migrated if vendor lock-in is to be avoided.
AWS, for example, offers a whole set of services that you may use for building an IoT pipeline. It is easy to lose the overview over the many services and the different costs they generate, and you will not be able to migrate easily once you have implemented all relevant elements in this stack. Hence it might be wise to use just the “infinite” storage (S3) and computational resources (EC2) and write a modular pipeline that is deployed to AWS EKS. Not only might you keep a better overview of the resources, it should then still be possible to migrate the whole system either on premises or to another cloud with managed Kubernetes; the big cloud providers offer this service by now, and the S3 protocol has become a de-facto standard for object stores.
Instead of the Amazon Kinesis Firehose streaming service, Kafka, for example, is a suitable platform to consider, available under the Apache-2.0 license. It can handle many millions of writes per minute and works in a producer-consumer architecture.
Various sensors, devices, or machines may produce logs to be stored in Kafka. This needs to be taken care of first; once the relevant raw data is stored, you can work on the consumer side one step at a time. Due to its success, Kafka is by now also offered as a managed service on the major cloud platforms.
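Since a real Kafka cluster needs a running broker, the following self-contained Python sketch only imitates the producer-consumer pattern with an in-memory queue; all names are illustrative. Note one important simplification: unlike a plain queue, Kafka retains events in partitioned topics, so several independent consumer groups can each read the full stream.

```python
import queue

# In-memory stand-in for a Kafka topic; the event structure is illustrative.
topic = queue.Queue()

def produce(sensor_events):
    """Producer side: devices/gateways append raw events to the topic."""
    for event in sensor_events:
        topic.put(event)

def consume(batch_size):
    """Consumer side: an extraction process reads a batch of events
    at its own pace, decoupled from the producers."""
    batch = []
    while len(batch) < batch_size and not topic.empty():
        batch.append(topic.get())
    return batch

produce([{"device": i, "temp": 20 + i} for i in range(3)])
print(consume(batch_size=10))  # all three raw events, in production order
```

The decoupling is the point: producers keep writing at device speed while each consumer (search index, SQL sink, OLAP store) processes the stream independently.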
No matter which exact stack is used, for large projects the probability is high that you end up with a producer-consumer architecture. Extraction processes can then be added one after another, for example
- to index relevant logs with Elasticsearch and use Kibana for search and visualizations
- to visualize certain aspects based on a PostgreSQL database that is connected to Tableau
- to maintain OLAP cubes with Druid and base visualizations on Superset for computationally demanding analytics use-cases.
This kind of architecture allows for flexibility and scalability.
When starting an IoT project, at least the core use-cases need to be specified first, along with the expected data volumes and acceptable downtimes, and hence an adapted (distributed) architecture.
If you have a small, fixed number of devices, this is comparably easy. If the plan is to scale massively, but without knowing whether 10,000 or 1,000,000 devices will be registered within the next years, the architecture must be horizontally scalable to cope with increasing loads. Again, if you are unsure whether the system shall run on AWS, GCP, Azure, or even on premises, Kubernetes with Docker services might be a good option, provided the expertise to take this path is available.
In case you plan to scale, I suggest reading some of the articles from the Netflix technology blog. Although Netflix is a movie platform, its data science approaches are state of the art for anything where large amounts of data are streamed continuously and evaluated by various stakeholders. You can face similar challenges with IoT software given a suitable number of devices.
The different use-cases for IoT were introduced, together with the information value cycle and the inherent five layers of IoT solutions. This interdisciplinary complexity leads many projects to failure.
The IoT landscape is wide, so the challenges diverge. It is a completely different project to connect three robots in a factory to an already available IoT platform, accepting downtimes of the monitoring system, than to write the cloud software from scratch for a highly reliable system that might end up with millions of devices in scope in case of success. Both ends were touched upon in this article.
Challenges for a large-scale solution were described: it makes a secure and scalable network, data flow, and data analytics platform mandatory. If this is your use-case, think about a suitable, scalable, and extendable architecture first. Producer-consumer approaches suit this case well.
Thanks for reading!
If you found the read interesting, you might want to catch up on some of my previous articles: