Resolving large-scale performance bottlenecks in IoT Networks accessing Big Data

Dr. GP Pulipaka
10 min read · Aug 2, 2016

Introduction

Juniper Research estimates that the number of devices connected through the Internet of Things (IoT) will grow by 285% and reach 38.5 billion by 2020. The connected revolution of the Internet of Things, part of Web 4.0, has reached government, retail, intelligent grids, homes, healthcare, mobile healthcare, aviation, wearables, agriculture, utilities, and the automotive industry.

The rise of the Internet of Things

· In the early 1990s, during the Web 1.0 revolution, around 300,000 desktop computers were connected to the Internet, mostly to broadcast business brochures on the static web.

· By the 2000s, the number of connected desktop computers had reached 300 million.

· As the Web 4.0 revolution reached the masses, 2 billion phones were connected to the Internet as of 2016.

· By 2020, according to a research report sponsored by RS solutions, an electrical manufacturer, around 13 billion refrigerators, kettles, television sets, tablets, lights, security cameras, and smoke detectors will be connected to the Internet, powering the Internet of Things revolution.

· Around 3.5 billion global positioning system (GPS) navigation units, car systems, and various sensors in the connected car are connected to the Internet. Google, Toyota, and Intel are advancing the connected car revolution through their research labs and their acquisitions of Internet of Things companies.

· Wearables are another category of the Internet of Things. Connected watches, shoes, wristbands, socks, and clothing add another 411 million devices.

· Approximately 646 million mobile health devices, known as mIoT (medical Internet of Things), are connected to the Internet. They include devices for precision medicine, diabetes care, and general health care, as well as heart rate and blood pressure monitors.

· The Internet of Things revolution is fueling innovation in smart cities such as San Jose and many other cities in the US. Around 9.7 billion smart buildings, traffic lights, parking meters, and city pollution monitors are connected to the Internet.

Together, these connected devices are fueling the technological singularity and challenging Moore’s Law, which has held for decades as the rule that the processing capability of chips keeps doubling. According to Deloitte University Press, the Internet of Things moves through an information value loop. The following are the key technologies and components that enable the Internet of Things.

· A sensor in the Internet of Things is a device that generates an electronic signal from a physical condition or event. Sensors are everywhere, from clothing to aviation. Sensors are transducers: they convert one form of energy, such as heat, pressure, or motion, into an electrical signal. Sensor types include geographical, geospatial, and position-based sensors; sensors at rest and in motion; force and touch sensors; barometric and pressure sensors; smart water meters that measure flow; sonic sensors that measure audio; acoustic sensors that convert sound into digital or analog signals; moisture sensors that measure the quantity of water vapor; radiation sensors; infrared detectors; biosensors that measure biological levels; chemical sensors; and temperature sensors.

· The fundamental mechanism for routing the electronic signal generated by a sensor is a network, typically wireless, that can carry the data at rates of 300 megabits to one gigabit per second (Holdowsky, Mahto, Raynor, & Cotteleer, 2015).

Networks of Internet of Things

Performance bottlenecks can stem from the networks that carry the data generated by the Internet of Things. Raw data flows from the sensors to the network in real time and in an unstructured format, and the data from billions of sensors is pooled into a big data tsunami. Along the way, the data passes through several kinds of network equipment, such as bridges, gateways, switches, and routers. Traditional desktop computers, laptops, smartphones, and tablets transmit their data through the router of a Wi-Fi network. To distinguish one device from another, each device is assigned a unique IPv4 or IPv6 address. IPv4 offers only about 4.3 billion (2^32) addresses, and nearly all of them have already been allocated. IPv6 was designed to solve this scalability problem: it offers about 340 undecillion (2^128) addresses, more than enough for the projected 38.5 billion devices by 2020. The data generated by the Internet of Things can be transmitted over PAN, LAN, and WAN networks using technologies such as Bluetooth, NFC, and Wi-Fi.
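The address arithmetic above can be checked with a short Python sketch that uses only the standard `ipaddress` module. The helper name `addresses_per_device` is an illustrative assumption, and the device count is the Juniper Research projection quoted in this article.

```python
import ipaddress

# The full IPv4 and IPv6 address spaces, expressed as "match everything" networks.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

# Projection quoted in the article (Juniper Research, devices by 2020).
projected_devices = 38_500_000_000


def addresses_per_device(total_addresses: int, devices: int) -> float:
    """Average number of addresses available for each device (hypothetical helper)."""
    return total_addresses / devices


print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
print(f"IPv4 addresses per device: {addresses_per_device(ipv4_total, projected_devices):.3f}")
print(f"IPv6 addresses per device: {addresses_per_device(ipv6_total, projected_devices):.3e}")
```

The IPv4 ratio comes out below one address per projected device, which is the scalability problem in a single number; the IPv6 ratio is many orders of magnitude above one.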

Massive data aggregation process from Internet of Things

Data aggregation can be another bottleneck, because the raw, unstructured data accumulated by the sensors has to be processed with high-performance computing methods. Telematics data from connected cars and transportation fleets, for example, can be uploaded to the cloud for processing. The pooled, unprocessed data is extracted, transformed, and loaded (ETL). Big data SQL and NoSQL tools then query the data and join tables to produce the final data visualizations. Big data ETL tools can break the petabyte barrier by processing the variety of big data in parallel with MapReduce and HDFS.
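As a rough illustration of the MapReduce idea, the following single-process Python sketch averages the speed of each vehicle from a handful of telematics readings. It is not Hadoop code: the field names `vehicle_id` and `speed_kmh` and all function names are assumptions made for this example, and a real cluster would run the map and reduce phases in parallel across many nodes.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[float, int]  # (sum of speeds, number of readings)


def map_phase(reading: dict) -> Tuple[str, Pair]:
    """Map: emit one (key, value) pair per raw reading."""
    return reading["vehicle_id"], (float(reading["speed_kmh"]), 1)


def shuffle(pairs: Iterable[Tuple[str, Pair]]) -> Dict[str, List[Pair]]:
    """Shuffle: group all values that share the same key."""
    groups: Dict[str, List[Pair]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values: List[Pair]) -> float:
    """Reduce: combine the partial sums and counts into an average."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count


def average_speed(readings: Iterable[dict]) -> Dict[str, float]:
    grouped = shuffle(map(map_phase, readings))
    return {vehicle: reduce_phase(values) for vehicle, values in grouped.items()}


# Hypothetical sample data for illustration only.
sample = [
    {"vehicle_id": "car-1", "speed_kmh": 60},
    {"vehicle_id": "truck-7", "speed_kmh": 50},
    {"vehicle_id": "car-1", "speed_kmh": 80},
]
print(average_speed(sample))
```

Carrying a (sum, count) pair instead of a running average keeps the reduce step associative, which is what allows partial results from different nodes to be merged in any order.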

Performance challenges from Internet of Things

As Web 4.0 evolves, machine-to-machine (M2M), machine-to-human (M2H), and human-to-machine (H2M) communications have developed rapidly, with a particular impact on the Internet of Things. IoT devices communicate with each other device-to-device (D2D), and the big data they generate through the web and sensors is transferred to servers in a device-to-server (D2S) architecture. Challenges arise when devices hold a constant connection to the server: the servers need high scalability and high bandwidth to handle the load. As with big data generated from the web, performance problems can come from network latency, low bandwidth, dropped packets, and large numbers of users communicating with devices and causing deadlocks on database inserts and updates. The following are a few Internet of Things protocols that can create potential performance bottlenecks.

· AMQP (Advanced Message Queuing Protocol): This protocol connects one server to another for processing the web data of the Internet of Things. Some IoT architectures use a pub/sub (publish and subscribe) design, and AMQP builds on a message bus architecture. It is suited to high-speed transmission of the web-based big data that devices generate. The services connected to IoT devices can run in a cloud or on a PaaS (Platform-as-a-Service) publish and subscribe model. AMQP works much like Apache Kafka, which also follows a publish and subscribe model; each Kafka broker can manage thousands of reads and writes, amounting to gigabytes of data, from an array of clients. AMQP is also a popular protocol for web services.

· MQTT (Message Queuing Telemetry Transport): This protocol is built for collecting data from each IoT device and transmitting it interactively to the server in a device-to-server (D2S) model. IBM originally developed the protocol and later released its source code as open source.

· DDS (Data Distribution Service): This protocol is for device-to-device communication.

· XMPP (Extensible Messaging and Presence Protocol): This protocol is for communication between humans and machines.
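The publish and subscribe model shared by several of these protocols can be sketched in a few lines of Python. This is an in-memory toy, not an MQTT or AMQP client: the `Broker` class and `topic_matches` function are hypothetical names, and the matching is a simplified version of MQTT-style topic filters, where `+` stands for one topic level and `#` for all remaining levels.

```python
from typing import Any, Callable, List, Tuple

Callback = Callable[[str, Any], None]


def topic_matches(pattern: str, topic: str) -> bool:
    """Simplified MQTT-style filter matching ('+' = one level, '#' = the rest)."""
    pattern_parts = pattern.split("/")
    topic_parts = topic.split("/")
    for i, part in enumerate(pattern_parts):
        if part == "#":
            return True  # multi-level wildcard swallows everything that follows
        if i >= len(topic_parts):
            return False  # pattern is longer than the topic
        if part != "+" and part != topic_parts[i]:
            return False  # literal level does not match
    return len(pattern_parts) == len(topic_parts)


class Broker:
    """Toy in-process broker: publishers and subscribers only share topic names."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Callback]] = []

    def subscribe(self, pattern: str, callback: Callback) -> None:
        self._subscriptions.append((pattern, callback))

    def publish(self, topic: str, payload: Any) -> int:
        """Deliver the payload to every matching subscriber; return the count."""
        delivered = 0
        for pattern, callback in self._subscriptions:
            if topic_matches(pattern, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


broker = Broker()
temperatures: list = []
everything: list = []
broker.subscribe("sensors/+/temperature", lambda t, p: temperatures.append((t, p)))
broker.subscribe("sensors/#", lambda t, p: everything.append((t, p)))

broker.publish("sensors/car1/temperature", 21.5)
broker.publish("sensors/car1/speed", 80)
```

Because a publisher never addresses a subscriber directly, devices and servers can be added or removed without changing each other's code, which is one reason this model suits large device fleets.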

In the context of the Internet of Things and the large-scale data these devices generate, the Intel IoT Analytics Dashboard offers data mining across geographically distributed device sensors. For example, wireless web-based temperature sensors deployed along transportation freeways send their data to a local data collection point. This massive amount of data initially reaches the collection point through a wireless, wired, or Bluetooth network. The Intel IoT Analytics website gathers all of this data by connecting to an array of sensor-based networks on the web, and preprocesses it to provide dashboard analytics. All of this data is collected into the Intel Galileo and Edison dashboards. Once the data is collected and preprocessed through the rules engine, it is stored on the IoT analytics web servers.

Vodafone is investing heavily in the Industrial Internet of Things for the connected car revolution. Its Vodafone Automotive arm delivers in-car telematics for tracking stolen vehicles. Mobile health devices and wearables that track and monitor health collect a trove of big data through the web and require higher network bandwidth on 3G, 4G, and LTE networks. Large-scale indoor web traffic and outdoor mobile web data create severe performance bottlenecks for the Internet of Things. According to Cisco, 80% of web-based traffic is generated indoors. IoT devices create big data and fast data through web and cellular networks. TowerSource addresses such performance bottlenecks with fiber connectivity network infrastructure deployment solutions, offered as SaaS (Software-as-a-Service).

Amazon recently released AWS IoT. AWS IoT is not just another platform delivering an SDK to IoT developers; it has specific objectives and a design built up over years. One of the first steps in IoT is activating a connected device, conventionally by installing an X.509 certificate on it. AWS IoT does not rely solely on X.509 certificate installation: it also supports authentication with Signature Version 4 (SigV4). A local hub gateway handles the authentication of the device before transmitting the data. Amazon built its IAM authentication engine natively and integrated it tightly into the IoT ecosystem. Each IoT device has to be uniquely identified through identity management, and Amazon leverages Cognito for that scenario. As mentioned earlier, MQTT, AMQP, DDS, and XMPP are the vital protocols that can create performance bottlenecks for the Internet of Things. Amazon supports MQTT (Message Queuing Telemetry Transport) with the publish/subscribe model. MQTT is specifically designed to ease these bottlenecks for low-powered devices by optimizing the network bandwidth used for data transmissions. HTTP can also be used to communicate with the cloud-based gateway. Amazon also stated that AWS IoT is highly customizable and that custom protocols can be implemented through development. AWS IoT is also an analytics platform. To determine what data needs to be sent and what data needs to be filtered, AWS IoT provides the following core capabilities.

Connectivity to the devices

Device connectivity is established through either native or custom protocols. Once the devices are connected to the gateway, messages and acknowledgments flow in both directions.

Data orchestration

The orchestration engine includes a rules engine that allows creating rules for the types of data, or letting third-party data processing engines handle the data. Lambda, S3, SNS, and Kinesis Firehose are a few examples of services that can process these messages or route them to specific data lakes, not necessarily data warehouses. A rule looks a lot like an SQL query statement.
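To make the SQL-like shape of a rule concrete, here is a minimal Python sketch of a rule with a SELECT list, a WHERE condition, and an action. It does not use the AWS SDK or the AWS IoT rules engine itself: the `Rule` class, the field names, and the SQL text in the comment are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule, roughly equivalent to the illustrative statement:
    SELECT device_id, temperature FROM 'sensors/+/temperature' WHERE temperature > 50
    """
    select: List[str]                  # fields to keep (the SELECT list)
    where: Callable[[Message], bool]   # filter condition (the WHERE clause)
    action: Callable[[Message], None]  # destination, e.g. a queue or a data lake

    def apply(self, message: Message) -> Optional[Message]:
        """Filter the message, project the selected fields, and run the action."""
        if not self.where(message):
            return None  # message is filtered out; the action is not triggered
        projected = {key: message[key] for key in self.select if key in message}
        self.action(projected)
        return projected


# A plain list stands in for the data lake in this sketch.
data_lake: List[Message] = []
hot_sensor_rule = Rule(
    select=["device_id", "temperature"],
    where=lambda m: m.get("temperature", 0) > 50,
    action=data_lake.append,
)

hot_sensor_rule.apply({"device_id": "t-1", "temperature": 72, "humidity": 40})
hot_sensor_rule.apply({"device_id": "t-2", "temperature": 20, "humidity": 35})
print(data_lake)
```

Filtering and projecting at the gateway means only the fields and messages that matter travel on to storage, which reduces both bandwidth and downstream processing load.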

New challenges for developing an indexing scheme used to assist accessing large-scale Web data generated by IoT

Developing an indexing scheme for the web data generated by the Internet of Things is challenging because billions of devices transmit time- and location-dependent observations and monitoring data in real time with complex event processing. The IoT data lifecycle differs from the traditional lifecycle of data processing and management with conventional RDBMS technologies. Data extraction, aggregation, transformation, and loading take place while all the devices are connected online, whereas logging can be performed offline. The IoT data lifecycle therefore needs both online and offline capabilities for logging and storing data, with large-scale processing capacity. The data is retrieved from the Internet of Things through database queries. Data transfers from IoT devices frequently require sending workflow notifications to different people. The data is also entirely different from traditional structured data: it can be text, audio, voice messages, or video content from the security cameras of banks. IoT data is pre-processed from its unstructured format so that data from disparate sensor sources can be combined.

Indexes are built on IoT database applications for high-speed data analytics, aggregation, fusion, filtering, and processing. The number of database inserts coming from IoT-generated web data is extremely high, and maintaining an index under such high-speed database operations can be expensive for system performance. Because the data is streamed in real time, there is a trade-off between the cost of building the index and the efficiency of retrieving the data. Several indexing schemes can be applied, such as window indexing, dynamic indexing, multi-granule indexing, time indexing, and wave indexes. Dynamic indexing is applied to streaming data that is accessed very frequently.
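One way to picture the trade-off between insert cost and retrieval speed is a small sliding-window time index. The Python sketch below is an illustration under stated assumptions, not a scheme taken from the cited literature: the `WindowIndex` class is hypothetical, and it assumes readings arrive in timestamp order, so each insert is an append and each range query is a binary search.

```python
import bisect
from typing import Any, List


class WindowIndex:
    """Keeps only the readings from the last `window_seconds`, ordered by time."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._timestamps: List[float] = []  # sorted, because input is assumed ordered
        self._readings: List[Any] = []      # reading i belongs to timestamp i

    def insert(self, timestamp: float, reading: Any) -> None:
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("this sketch assumes in-order timestamps")
        self._timestamps.append(timestamp)
        self._readings.append(reading)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop every entry older than the window so the index stays small."""
        cutoff = now - self.window
        first_kept = bisect.bisect_left(self._timestamps, cutoff)
        if first_kept:
            del self._timestamps[:first_kept]
            del self._readings[:first_kept]

    def query(self, start: float, end: float) -> List[Any]:
        """Return the readings with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_right(self._timestamps, end)
        return self._readings[lo:hi]

    def __len__(self) -> int:
        return len(self._timestamps)


index = WindowIndex(window_seconds=10)
for ts in (0, 5, 10, 15, 20):
    index.insert(ts, {"ts": ts, "temperature": 20 + ts})

recent = index.query(12, 20)
```

Bounding the index to a time window limits how much structure each insert has to maintain, at the price of losing fast access to older data, which would have to be served from offline storage instead.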

References

Abu-Elkheir, M., Hayajneh, M., & Ali, N. A. (2013, November 14). Data Management for the Internet of Things: Design Primitives and Solution. Sensors, 13(11), 15582. http://dx.doi.org/10.3390/s131115582

Cotts, T. (n.d.). Breaking the IoT Bottleneck. Retrieved April 17, 2016, from http://www.ospmag.com/issue/article/Breaking-the-IoT-Bottleneck

Gorey, C. (2016). IoT Day: A timeline of how IoT is changing the world (infographic). Retrieved April 9, 2016, from https://www.siliconrepublic.com/machines/2016/04/09/iot-day-infographic

Holdowsky, J., Mahto, M., Raynor, M. E., & Cotteleer, M. J. (2015). Inside the Internet of Things (IoT). Retrieved April 9, 2016, from http://dupress.com/articles/iot-primer-iot-technologies-applications/

Lieberman, B. (2015). The Internet Of Things - Analytics: Using The Intel® IoT Analytics Website for Data Mining. Retrieved April 17, 2016, from https://software.intel.com/en-us/articles/the-internet-of-things-analytics-using-the-intel-iot-analytics-website-for-data-mining

MSV, J. (2015). AWS IoT: Amazon’s Knock Out Punch To The Competition. Retrieved April 17, 2016, from http://www.forbes.com/sites/janakirammsv/2015/10/13/aws-iot-amazons-knock-out-punch-to-the-competition/#2cfe9553a94c

Postscapes (2011). IoT Standards and Protocols. Retrieved August 1, 2016, from https://postscapes.com/internet-of-things-protocols/

Schneider, S. (2013). Understanding The Protocols Behind The Internet Of Things. Retrieved April 9, 2016, from http://electronicdesign.com/iot/understanding-protocols-behind-internet-things


Ganapathi Pulipaka | Founder and CEO @deepsingularity | Bestselling Author | Big data | IoT | Startups | SAP | MachineLearning | DeepLearning | DataScience