Jeffrey Ellin
Aug 22, 2017

Visualizing Amazon Kinesis IoT data with Zeppelin

The Internet of Things (IoT) is an increasingly important topic in application development, because IoT devices continuously send high-velocity data that must be processed and analyzed. Amazon Kinesis and Amazon IoT are a perfect pair for receiving this data, and Spark Streaming can process it as it arrives.

To demonstrate this technology, I will use the Amazon Simple Beer Simulator. The simulator is a Python script that posts random data representing a beer keg, including temperature, pour volume and humidity.
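To make the shape of the data concrete, here is a minimal Python sketch of a keg reading generator in the spirit of the simulator. It is not the Simple Beer Simulator's actual code: the field names (`deviceId`, `temperature`, `pourVolume`, `humidity`), value ranges and units are assumptions chosen for illustration.

```python
import json
import random
from datetime import datetime, timezone


# Hypothetical field names, ranges and units; the real simulator may differ.
def make_keg_reading(device_id, rng=random):
    """Return one simulated keg reading as a dict of random values."""
    return {
        "deviceId": device_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "temperature": round(rng.uniform(1.0, 6.0), 2),  # assumed degrees Celsius
        "pourVolume": round(rng.uniform(0.0, 0.5), 3),   # assumed litres
        "humidity": round(rng.uniform(30.0, 70.0), 1),   # assumed percent
    }


def to_payload(reading):
    """Serialize a reading as UTF-8 JSON bytes, the form a device would publish."""
    return json.dumps(reading).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs produce the same values
    for _ in range(3):
        print(to_payload(make_keg_reading("keg-001", rng)))
```

Passing a seeded `random.Random` keeps test runs reproducible, while the default uses the module-level generator.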

The simulator sends data to Amazon IoT, where a rule publishes it to Kinesis. From there, an Apache Spark Streaming application reads and transforms the data, which is ultimately visualized with Apache Zeppelin.
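Records arrive from Kinesis as raw bytes, so the first transform in the streaming job is decoding them. The following plain-Python sketch stands in for that step. The original application is written in Scala on Spark, and this is not that code. The sketch assumes each record is a UTF-8 JSON document with the hypothetical fields `deviceId`, `temperature`, `pourVolume` and `humidity`, and it skips malformed records instead of failing the whole batch.

```python
import json
from typing import Iterable, List, NamedTuple


class KegReading(NamedTuple):
    device_id: str
    temperature: float
    pour_volume: float
    humidity: float


def parse_records(records: Iterable[bytes]) -> List[KegReading]:
    """Decode raw record payloads (assumed UTF-8 JSON) into typed readings.

    Field names are hypothetical and must match what the device publishes.
    Records that are not valid JSON or lack a field are dropped.
    """
    readings = []
    for raw in records:
        try:
            doc = json.loads(raw.decode("utf-8"))
            readings.append(KegReading(
                device_id=str(doc["deviceId"]),
                temperature=float(doc["temperature"]),
                pour_volume=float(doc["pourVolume"]),
                humidity=float(doc["humidity"]),
            ))
        except (UnicodeDecodeError, ValueError, KeyError, TypeError):
            # ValueError also covers json.JSONDecodeError and bad floats.
            continue
    return readings
```

Dropping bad records keeps one corrupt message from stalling the stream. In production you would probably want to count or log them.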

Apache Zeppelin is a web-based tool for running notebooks. It gives data scientists easy access to big data tools such as Spark and Hive. It also provides an integration point for JavaScript visualization libraries such as D3 and Plotly via its Angular interpreter. In addition, Zeppelin has built-in visualizations that can be leveraged for quick and dirty dashboards.
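One simple way to feed Zeppelin's built-in charts is its display system. Output that begins with `%table` is rendered as a table (tab-separated columns, newline-separated rows, header row first), and the chart buttons can then plot it. The helper below is a small Python sketch of that formatting. The function name is hypothetical, and the sketch assumes the paragraph runs under Zeppelin's Python interpreter.

```python
from typing import Sequence


def to_zeppelin_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Format rows using Zeppelin's %table display convention:
    tab-separated columns, newline-separated rows, header first."""
    def clean(value: object) -> str:
        # A tab or newline inside a value would be read as a column/row break.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(h) for h in header)]
    lines.extend("\t".join(clean(v) for v in row) for row in rows)
    return "%table " + "\n".join(lines)
```

In a notebook paragraph you would print the result, for example `print(to_zeppelin_table(["minute", "avg_temp"], [["12:00", 3.4], ["12:01", 3.6]]))`, and then pick a line or bar chart from the result toolbar.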

Notebooks make it possible to build interactive visualizations without needing to deploy code onto a big data platform.

The first few notebook paragraphs in my sample build the streaming application in Scala. The last paragraph renders a graph that updates automatically every minute.
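Conceptually, each one-minute refresh of the graph reflects one batch of readings. This plain-Python sketch shows the kind of per-batch summary such a graph might plot: reading count, average temperature and total pour volume per minute. It is not the notebook's Scala code. The 60-second bucket size and the `(epoch_seconds, temperature, pour_volume)` tuple layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BATCH_SECONDS = 60  # assumed batch interval, matching the one-minute chart refresh


def summarize_batches(
    readings: Iterable[Tuple[float, float, float]],
) -> Dict[int, Dict[str, float]]:
    """Group (epoch_seconds, temperature, pour_volume) tuples into fixed
    one-minute buckets and compute statistics for each bucket."""
    buckets = defaultdict(list)
    for ts, temp, pour in readings:
        start = int(ts // BATCH_SECONDS) * BATCH_SECONDS  # bucket start time
        buckets[start].append((temp, pour))

    summary = {}
    for start in sorted(buckets):  # oldest bucket first
        rows = buckets[start]
        summary[start] = {
            "count": len(rows),
            "avg_temperature": sum(t for t, _ in rows) / len(rows),
            "total_pour": sum(p for _, p in rows),
        }
    return summary
```

Spark Streaming forms these batches by arrival time rather than by a timestamp field. Bucketing on the reading's own timestamp, as here, is a simplification that makes the grouping easy to see.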

For a complete walkthrough of how this is done, refer to two of my blog posts.

The linked articles detail how to set up Kinesis and how to configure an Amazon IoT rule, and they use an integrated Docker container to run Zeppelin.

In a real-world deployment, it would be better to use a distributed install of Zeppelin to take advantage of the capacity of a multi-node cluster. Since we are only dealing with one Kinesis shard, a container running on a laptop can easily handle this use case.

Amazon IoT and Kinesis: Spark Streaming

Amazon IoT and Kinesis: Introducing Zeppelin

At the end of the day, this illustrates how to use Zeppelin to visualize data as Spark Streaming retrieves it. It is a rather contrived example, because you would most likely want to save the results to some sort of persistent storage before querying them. With the above setup, you can only query data from the last batch interval. One possible solution is to write the streaming output into a store such as Elasticsearch or Redis and then have the chart query it over the desired time window.
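The persistent-store idea can be sketched as follows. `TimeWindowStore` is a hypothetical in-memory stand-in for something like a Redis sorted set scored by timestamp, where `ZADD` and `ZRANGEBYSCORE` would play the roles of `add` and `query`. The streaming job would append each batch's results, and the chart would query a time range instead of seeing only the last batch.

```python
import bisect
from typing import Any, List, Tuple


class TimeWindowStore:
    """Hypothetical in-memory stand-in for a persistent store such as a
    Redis sorted set, where each reading is scored by its timestamp."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[Any] = []

    def add(self, ts: float, value: Any) -> None:
        # Keep both lists ordered by timestamp so range queries can bisect.
        i = bisect.bisect_right(self._times, ts)
        self._times.insert(i, ts)
        self._values.insert(i, value)

    def query(self, start: float, end: float) -> List[Tuple[float, Any]]:
        """Return (timestamp, value) pairs with start <= ts <= end, oldest first."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))
```

For example, a chart covering the last 15 minutes would call `store.query(now - 900, now)`. Unlike this sketch, a real store would also survive restarts and could be shared between the streaming job and the notebook.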
