All pictures taken by Michael Hausenblas—donated to the public domain.

Sensors: primary source of Big Data

How the Internet of Things/M2M/the Programmable Web will require us to think differently about processing data.

Michael Hausenblas
Published in I. M. H. O.
Aug 21, 2013

Prompted by a recent Gartner release on Emerging Technologies, I’m going to make the case that sensors will be the primary source of Big Data in the years to come.

While this is not an entirely new thought, I suppose it is interesting to have a look at the challenges we face with the data torrent from the Internet of Things (IoT), M2M, the Programmable Web, or whatever your favourite term is.

Where are all the sensors?

Many have written about what the IoT will bring us. But let’s step back a bit and have a look at what sensors can mean and where we (increasingly) find them deployed:

  • Every smartphone has a dozen or so sensors.
  • More and more wearables are being rolled out, such as Google Glass, with others, like Apple’s rumoured iWatch, expected to follow.
  • We find arrays of sensors in commercial buildings, and in the future surely more in homes as well.
  • All kinds of transportation systems, including public transport, are being equipped with sensors as part of smart cities.
  • The manufacturing industry has been using sensors for a while already and is increasingly exploring ways to derive more value from them.
  • Retailers benefit from sensors in supply chain management.
Source: Cisco blog post, http://blogs.cisco.com/news/the-internet-of-things-infographic/

Right. So, apparently, there is no lack of sensors now or in the near future: estimates range between 40 billion and 50 billion devices by 2020.
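
To get a feel for what 40 to 50 billion devices mean in terms of data volume, here is a back-of-envelope calculation in Python. The per-device rate is purely an assumption for illustration: a hypothetical sensor that emits a single 10-byte reading once a minute, which comes to roughly 5 MB per year.

```python
# Back-of-envelope estimate of yearly sensor data volume.
# ASSUMPTION (hypothetical): each device emits one 10-byte reading per minute.

PB = 10**15  # petabyte, decimal units


def yearly_volume_pb(devices: int, bytes_per_device_per_year: int) -> float:
    """Total data produced by all devices in one year, in petabytes."""
    return devices * bytes_per_device_per_year / PB


readings_per_year = 60 * 24 * 365        # one reading per minute: 525,600 readings
bytes_per_year = readings_per_year * 10  # 5,256,000 bytes, roughly 5 MB per device

for billions in (40, 50):
    volume = yearly_volume_pb(billions * 10**9, bytes_per_year)
    print(f"{billions} billion devices -> {volume:.0f} PB/year")
# 40 billion devices -> 210 PB/year
# 50 billion devices -> 263 PB/year
```

Even with such tiny readings, and before counting metadata, replication, or indexes, the total lands in the hundreds of petabytes per year.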

Challenges

So, what challenges do we face concerning the IoT?

I won’t focus on the more obvious ones, such as the consolidation of protocols (Bluetooth Low Energy, Z-Wave, WirelessHART, ZigBee, Link Layer, 6LoWPAN, CoAP, you name it), power consumption, or the integration with existing (Web) infrastructure.

The question I’m interested in is: are we equipped with the tools to gain insights from sensor data? Greater minds argued a while ago that relational databases are not suitable for the task at hand. I’d like to argue that Hadoop, Storm, and NoSQL databases are a good fit, and here is why:

  • One needs to be able to capture and store all the sensor data. While a single, simple sensor (say, a temperature sensor) might generate only a few MB per year, the sheer number of sensors quickly puts us into the high PB range.
  • For many applications, combining historical data with new, incoming data from sensors is essential (cf. also the Lambda Architecture).
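
The Lambda Architecture idea of combining historical and incoming data can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not how Hadoop or Storm are actually programmed: `batch_view` stands in for a batch job (such as one run on Hadoop) over the full history, `SpeedLayer` stands in for a stream processor (such as a Storm topology) handling readings that arrived since the last batch run, and `query_mean` merges both views at query time. All names, including the `Reading` record and the `cutoff` timestamp, are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: int  # seconds since the epoch
    value: float


def batch_view(master: list[Reading], cutoff: int) -> dict[str, tuple[float, int]]:
    """Batch layer stand-in: recompute (sum, count) per sensor from the full
    history of readings older than `cutoff` (the time of the last batch run)."""
    view: dict[str, tuple[float, int]] = defaultdict(lambda: (0.0, 0))
    for r in master:
        if r.timestamp < cutoff:
            s, c = view[r.sensor_id]
            view[r.sensor_id] = (s + r.value, c + 1)
    return dict(view)


class SpeedLayer:
    """Speed layer stand-in: incrementally maintain (sum, count) per sensor
    for readings that arrived at or after `cutoff`."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.view: dict[str, tuple[float, int]] = {}

    def update(self, r: Reading) -> None:
        if r.timestamp >= self.cutoff:  # older readings belong to the batch view
            s, c = self.view.get(r.sensor_id, (0.0, 0))
            self.view[r.sensor_id] = (s + r.value, c + 1)


def query_mean(
    sensor_id: str, batch: dict[str, tuple[float, int]], speed: SpeedLayer
) -> float | None:
    """Serving layer stand-in: merge batch and real-time views at query time."""
    bs, bc = batch.get(sensor_id, (0.0, 0))
    ss, sc = speed.view.get(sensor_id, (0.0, 0))
    n = bc + sc
    return (bs + ss) / n if n else None


# Usage: three readings; the last one arrives after the batch run at t=300.
history = [Reading("t1", 100, 20.0), Reading("t1", 200, 22.0), Reading("t1", 310, 24.0)]
batch = batch_view(history, cutoff=300)  # ignores the reading at t=310
speed = SpeedLayer(cutoff=300)
for r in history:  # replay the stream; only the t=310 reading is counted here
    speed.update(r)
print(query_mean("t1", batch, speed))  # 22.0
```

Keeping (sum, count) pairs rather than averages is what makes the merge trivial: partial aggregates from both layers can simply be added, whereas two averages could not be combined without their counts.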

The way I see it, sensor data will dominate Big Data within the next couple of years. The IoT presents us with plenty of challenges, not only in terms of volume but equally in terms of the velocity (think: stream processing) and variety (the formats used) of the data items being processed.
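
To make the velocity point concrete, a common stream-processing building block is aggregating readings over fixed time windows as they arrive, rather than storing everything first and querying later. The sketch below assumes a hypothetical stream of `(sensor_id, timestamp, value)` tuples arriving in timestamp order and computes per-sensor averages over tumbling (non-overlapping) windows, emitting each window as soon as it closes. Stream processors such as Storm run this kind of logic continuously; this is plain Python and does not use Storm's API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def tumbling_window_means(
    stream: Iterable[tuple[str, int, float]], window_s: int
) -> Iterator[tuple[int, str, float]]:
    """Yield (window_start, sensor_id, mean) for each window once it closes.

    Assumes readings arrive ordered by timestamp; late or out-of-order
    data is not handled.
    """
    current_window: int | None = None
    acc: dict[str, list] = defaultdict(lambda: [0.0, 0])  # sensor -> [sum, count]
    for sensor_id, ts, value in stream:
        window = ts - ts % window_s  # start of the window this reading falls into
        if current_window is not None and window != current_window:
            # A new window has started, so the previous one is complete: emit it.
            for sid, (total, count) in sorted(acc.items()):
                yield current_window, sid, total / count
            acc.clear()
        current_window = window
        acc[sensor_id][0] += value
        acc[sensor_id][1] += 1
    # End of stream: flush whatever the last window holds.
    if current_window is not None:
        for sid, (total, count) in sorted(acc.items()):
            yield current_window, sid, total / count


readings = [("t1", 0, 20.0), ("t1", 30, 22.0), ("t2", 45, 18.0), ("t1", 61, 25.0)]
for window_start, sensor, mean in tumbling_window_means(readings, window_s=60):
    print(window_start, sensor, mean)
# 0 t1 21.0
# 0 t2 18.0
# 60 t1 25.0
```

Because only one window's running sums are held in memory at a time, the state stays small no matter how long the stream runs, which is the property that makes this style of processing viable at sensor-scale velocity.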
