Flink Streaming SQL Example

Apache Flink has hit version 1.3 lately and the SQL support has been extended. You can find what is supported from the docs. In this blog post, I will be giving a small example for using Streming SQL a Tumbling Window.

Consider you are streaming room temperatures that are being read from many sensors. You want to find the average tempetarute in windows of 10 seconds and act upon it. You want to do it in SQL. The SQL Query that is supported in Flink looks like the following:

Window SQL Example

After setting up the Flink Execution environment, you need to get your data from a stream, parse and format it to a Tuple or a POJO format, and assign timestamps so that Flink knows how to handle time (Event Time vs Processing Time) and finally execute the SQL and print the results to a stream. In our case we will be printing the results to another stream, but you can pipe the stream to another stream if you wish, filter it, and execute another SQL in it. Possibilities are only limited by your imagination.

You maybe wondering why this Streaming SQL is needed. Surely you could just use regular SQL and for 10 second intervals, you could query the latest 10 seconds data to find the average. However, it would travel the whole data at once, while in streaming SQL, the data is being filtered/aggregated in real-time without actually storing it and the results are also being updated real-time. It can also work in parallel. This might be an interesting and a differentiating use case for your applications. Of course, it is not always the feasible option, for instance if your time window is very large, it might be slowing things down, or requiring more memory than the regular SQL version.

The code example

As a result of running the program above with proper configuration, you will see the following output:

living room,2017–07–02 08:51:40.0,36.7043450149447
living room,2017–07–02

The full example can be found on: https://gist.github.com/mustafaakin/457859b8bf703c64029071c1139b593d