Playing with Flux, a data scripting language

Daniel Lázaro Iglesias
Worldsensing TechBlog
5 min read · Jun 28, 2018

At Worldsensing we have recently been using the TICK stack in several places across the company, for different purposes. We have used the full stack for infrastructure monitoring, which is the most common use case for the software. We have also been using InfluxDB and Kapacitor, together with Grafana, to quickly and easily process and visualize data in several R&D projects. Finally, we are also using InfluxDB and Kapacitor to implement real-time anomaly detection in our products.

Considering the wide range of use cases we had, we thought it would be worthwhile to go to InfluxDays London. There were many interesting talks, and we had a chance to resolve some doubts and find new ways of doing things with the software. However, the most interesting thing for us was the presentation of Flux, their new data scripting language.

Basically, they argue that, while SQL is good for querying data, when one has to process data one often ends up either writing huge, complex queries or loading all the data into pandas and working with it there. Considering this, they decided to create a scripting language instead of a query language, so that as much of the processing as possible can happen in the database.

Also, the engine that processes this language will be separate from the storage engine. This decoupling means that the two can be scaled independently, but also that, potentially, the query engine (they called it that, even though it’s a scripting language; well, okay!) could have connectors to different types of databases, so the language could end up being kind of universal. Well, that’s ambitious, but it sounds cool! So let’s see what the language looks like.

A simple Flux script would look like this:

from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user" and r.cpu == "cpu3")
|> range(start:-1h)

(the filter line can be split into three separate filter calls; the engine will merge them anyway).
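For illustration, the split version would look something like this (our own sketch, equivalent to the query above):

from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu")
|> filter(fn:(r) => r._field == "usage_user")
|> filter(fn:(r) => r.cpu == "cpu3")
|> range(start:-1h)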

This would be equivalent to this query in InfluxQL:

SELECT "usage_user", "cpu" FROM "telegraf"."autogen"."cpu" WHERE time > now() - 1h and "cpu" = 'cpu3'

Ok, so even though the two are more or less the same length, they really do read like two different kinds of languages. Another difference is in the result set. For example, if we remove the filter on the cpu tag, the two queries no longer return the same thing. For this Flux query:

from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user")
|> range(start:-10s)

This is the result (both in the Chronograf editor and in the CLI):

[Image: query result in the Chronograf Flux editor]
[Image: query result in the Flux CLI]

In InfluxQL:

> SELECT "usage_user", "cpu" FROM "telegraf"."autogen"."cpu" WHERE time > now() - 10s
name: cpu
time                usage_user         cpu
----                ----------         ---
1529505814000000000 34.997452878215555 cpu-total
1529505814000000000 35.09127789030023  cpu0
1529505814000000000 35.88709677412375  cpu1
1529505814000000000 33.814432989675645 cpu2
1529505814000000000 34.83606557375216  cpu3
1529505819000000000 35.65261554090017  cpu-total
1529505819000000000 36.290322580690116 cpu0
1529505819000000000 37.97979797984089  cpu1
1529505819000000000 32.18623481780154  cpu2
1529505819000000000 36.196319018418144 cpu3

As we can see, Flux returns 5 separate “tables” (one per value of the cpu tag), while InfluxQL returns all the results together, mixed into a single series. By default, Flux prepares the results to be further processed in subsequent steps: filtered, joined or aggregated as necessary. To get this separation in InfluxQL we have to add a GROUP BY clause, which Flux effectively applies by default. So this Flux script:

from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user")
|> range(start:-10s)
|> mean()

is equivalent to:

SELECT mean("usage_user") as "usage_user" FROM "telegraf"."autogen"."cpu" WHERE time > now() - 10s GROUP BY "cpu"

If you want to mix values with different tags, you can also select explicitly what to use for grouping with the group(by:[column_names]) function.
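For example, here is a minimal sketch that groups by host instead, averaging across all CPUs of each host (this assumes the data carries a host tag, which Telegraf adds by default):

from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user")
|> range(start:-10s)
|> group(by:["host"])
|> mean()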

But what about more complex queries? Can Flux make our lives simpler? Well, this will supposedly be possible when the language and engine are finished:

from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user")
|> range(start:-1h)
|> exponentialMovingAverage()

But sadly, for now it’s still in development and the available functions are more or less the same ones implemented in InfluxQL. However, it’s already possible to define your own functions, so you can at least make queries look simpler.
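As a quick illustration (our own sketch, not from the talk, using the pipe-forward parameter syntax from the Flux spec), a helper that bundles the filtering boilerplate from the earlier examples could look like this:

// Hypothetical helper: keeps only user CPU usage rows.
cpuUser = (table=<-) => table
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user")

from(db:"telegraf")
|> cpuUser()
|> range(start:-1h)
|> mean()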

We’re big fans of Kapacitor, since it’s a simple way to process streams of data (way simpler than running e.g. an Apache Spark cluster, if you don’t work at that scale). So the first question we had when learning about Flux was: how will Flux and Kapacitor interact? Well, basically it seems like Flux will very much replace Kapacitor: it will be able to write results back to a database or generate an alarm, and once continuous queries are implemented, everything that Kapacitor can do should be even easier to build with Flux.

As an example, let’s look at a script that compares current CPU usage data with that of one hour ago (we could just as well compare with a week ago or whatever we find useful). Here is the script for Kapacitor (you can also find it in this repository):
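(The original post embedded the script as a gist, which is not reproduced here; the following is our minimal TICKscript sketch of the idea, with the measurement and field names taken from the earlier examples: shift the series one hour forward and join it with the live data.)

var now = stream
    |from()
        .measurement('cpu')

var hour_ago = stream
    |from()
        .measurement('cpu')
    |shift(1h)

now
    |join(hour_ago)
        .as('now', 'hour_ago')
    |eval(lambda: "now.usage_user" - "hour_ago.usage_user")
        .as('diff')
    |influxDBOut()
        .database('telegraf')
        .measurement('cpu_diff')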

Now let’s see how this would be in Flux:
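(Again, the embedded gist is not reproduced here; this is our rough sketch using the shift() and join() functions as presented at the time, whose exact signatures were still changing.)

now = from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user")
|> range(start:-1h)

hour_ago = from(db:"telegraf")
|> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user")
|> range(start:-2h, stop:-1h)
|> shift(shift:1h)

join(tables:{now:now, hour_ago:hour_ago}, on:["_time","cpu"],
fn:(t) => t.now["_value"] - t.hour_ago["_value"])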

The same can be accomplished in a few fewer lines of code and, I would say, in a cleaner, easier-to-understand way. Moreover, Flux should transparently manage the arrival of new data for continuous queries, so that, unlike with Kapacitor, we wouldn’t have to worry about choosing between stream and batch data.

We are eager to get access to the release version of Flux and try all the new features, and also to see how the community receives it. Will it really make a place for itself in the data world? Will it rival SQL’s popularity for data access and analysis? Let’s not forget that other similar databases, like Timescale, fully embrace SQL as their language and still seem to offer powerful timeseries queries, so it will be interesting to see how the timeseries landscape evolves.

If you want to try Flux for yourself with the current development version, follow these instructions:

1- Download the nightly builds of InfluxDB, Chronograf (to use the graphical Flux editor) and Flux.

  • They’re available from the Influx downloads page with installation instructions.
  • Follow the instructions for each and run the InfluxDB and Flux daemons.

2- To also have Telegraf inserting data (and Kapacitor, if you also want it), use the sandbox.

  • Remove InfluxDB and Chronograf from the docker-compose.yml file.
  • Modify the Telegraf configuration (and Kapacitor’s, if you are running it) in its folder to point to your local Docker IP, so that it sends data to the nightly build of InfluxDB you are running (see the snippet below).
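For reference, here is a minimal sketch of the relevant section of telegraf.conf, assuming the default Docker bridge IP 172.17.0.1 (yours may differ):

[[outputs.influxdb]]
  ## Point Telegraf at the nightly InfluxDB running on the host.
  urls = ["http://172.17.0.1:8086"]
  database = "telegraf"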

3- Start writing Flux scripts right away! There are two ways (actually three, but one involves sending scripts through curl, so I won’t even list it):

  • platform_nightly_xxx/influx repl: command-line interface. Write a script and type run() on the next line (see the example after this list).
  • http://localhost:8888/sources/1/delorean: Flux editor in Chronograf. You can edit scripts and explore the results.
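Based on the workflow just described, a REPL session would look roughly like this (our sketch; the result output is omitted):

$ ./influx repl
> from(db:"telegraf") |> filter(fn:(r) => r._measurement == "cpu" and r._field == "usage_user") |> range(start:-1m)
> run()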

Have fun!
