Apache NiFi — Streamr processors

Published in Streamr, Jun 12, 2019 (4 min read)

When building larger architectures, especially those focused on data flowing from system to system such as IoT, developers are likely to encounter issues with maintainability and scalability. You may even need to write a script to connect and integrate different systems. This problem is only exacerbated in architectures where data formats differ between systems, devices and applications.

Apache NiFi is a tool built specifically to address these and many other issues concerning data flows. Here at Streamr Labs, we have created publish and subscribe processors for Apache NiFi. Connecting your streams to NiFi lets you integrate real-time data with different systems with little to no effort. One great use case for these processors is data analysis with Apache Spark. The ExecuteSparkInteractive processor, provided out of the box in NiFi, allows you to inject code into PySpark to refine your data.

What is Apache NiFi?

Apache NiFi was built to ease the automation of data flows between systems. NiFi makes data flows easier to understand and modify, even in larger architectures, through a drag-and-drop user interface. Its design is closely related to the main ideas of Flow-Based Programming. However, NiFi doesn’t just offer a UI for creating connections between systems; it also has multiple features that make it a viable backbone for a data flow architecture. Some of its most important features are:

Data buffering with back pressure and pressure release
Encryption with 2-Way SSL
Horizontal scalability (clustering)
And (most importantly for Streamr Labs) extensibility. This is what allows anyone to build a custom processor for NiFi.

You can read more about NiFi and its features in its official documentation.

NiFi also has a subproject called MiNiFi that is better suited for use at the edge of an architecture. For example, MiNiFi can be set up on a Raspberry Pi to create a light data flow that processes sensor data and pushes it forward in an IoT architecture. Whereas NiFi is built on Java, MiNiFi has both a Java version and a C++ version for lighter data flows.

Streamr has also integrated with Node-RED.

How does Apache NiFi differ from the Streamr stack?

Comparing Apache NiFi to the Streamr stack is like comparing apples to oranges. There are similarities: they are both open source and utilise flow-based programming models. They also operate on top of Java and ingest data. But as solutions, they serve different purposes. Apache NiFi is an open source enterprise-grade data flow automation or ETL tool designed to automate the flow of data between software systems. The Streamr stack could be seen in this context as a holistic data processing stack that has generalised data ingestion capabilities, a data transportation layer, data analysis tools and a data Marketplace.

One could say that these two solutions operate in different markets and segments, but the differences between Apache NiFi and Streamr can be beneficial. With the Streamr processors, Apache NiFi users can easily integrate their existing information systems and data so that the data can be monetised on the Streamr data Marketplace.

Getting started with the Streamr processors on NiFi

If you haven’t installed Apache NiFi yet, you can find the guides and downloads for it here. After you have Apache NiFi installed, the easiest way to get started with the Streamr processors is to:

Download the .nar file for the processors
Copy the .nar file to NiFi’s /lib directory. The lib directory is where all the processors’ .nar files are stored. The lib directory is most likely at NIFI_HOME/lib or NIFI_HOME/libexec/lib.
Restart NiFi
Check that the processors were correctly added to NiFi by dragging the Add Processor icon onto the flow and checking that StreamrLabs was added to the Source drop-down menu.
The processors should be found under the StreamrLabs source. If you cannot see the processors, make sure that you added the .nar file to the correct directory.
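The copy step above can also be scripted. The following is a minimal Python sketch, not part of the Streamr tooling: the helper names `find_lib_dir` and `install_nar` are made up for illustration, and the only layout it assumes is the one described above, where the lib directory sits at NIFI_HOME/lib or NIFI_HOME/libexec/lib.

```python
import shutil
from pathlib import Path


def find_lib_dir(nifi_home: Path) -> Path:
    """Return the first existing lib directory under nifi_home.

    Checks NIFI_HOME/lib first, then NIFI_HOME/libexec/lib.
    """
    for candidate in (nifi_home / "lib", nifi_home / "libexec" / "lib"):
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no lib directory found under {nifi_home}")


def install_nar(nar_file: Path, nifi_home: Path) -> Path:
    """Copy a .nar file into NiFi's lib directory and return the new path."""
    if nar_file.suffix != ".nar":
        raise ValueError(f"expected a .nar file, got {nar_file.name}")
    target = find_lib_dir(nifi_home) / nar_file.name
    shutil.copy2(nar_file, target)
    return target


# Hypothetical usage; both paths are placeholders for your own setup:
# install_nar(Path("downloads/streamr-processors.nar"), Path("/opt/nifi"))
```

NiFi still needs to be restarted after the copy so that the new processors are loaded.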

The StreamrSubscribe processor allows you to subscribe to real-time data in Streamr and transfer the data onwards to a NiFi flow. The StreamrPublish processor allows you to publish data to Streamr from a NiFi flow.

Here is an example flow showing how to use the processors. You can try out the example flow yourself by importing this .xml file to your NiFi flow. You can also see how the processors and controllers are configured in the example.

The processors use a single-line, single-entry JSON string as their data format.

{"string": "string", "number": 123123, "list": [123, 123]}
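If the data is produced by your own code before it reaches NiFi, a string in this shape can be built with a standard JSON library. Here is a small Python sketch; the function name `to_streamr_line` is hypothetical and only illustrates the format, it is not part of the processors.

```python
import json


def to_streamr_line(entry: dict) -> str:
    """Serialize one entry as a single-line JSON string."""
    if not isinstance(entry, dict):
        raise TypeError("expected a single JSON object (dict)")
    # json.dumps emits no line breaks unless indent is set, and line breaks
    # inside string values are escaped, so the result stays on one line.
    return json.dumps(entry)


line = to_streamr_line({"string": "string", "number": 123123, "list": [123, 123]})
print(line)  # {"string": "string", "number": 123123, "list": [123, 123]}
```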

JSON strings can easily be converted to and from other formats using ConvertRecord, one of NiFi’s out-of-the-box processors. Simply configure the processor with a JsonTreeReader if you want to read JSON, or with a JsonRecordSetWriter if you want to write JSON. The JsonRecordSetWriter should be configured to output only one line per object, as the StreamrPublish processor only processes and publishes one entry at a time.
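To make the "one line per object" shape concrete, here is a rough Python sketch of the kind of conversion described above, going from CSV records to one JSON object per line. It is not NiFi code and does not use ConvertRecord; the function name `csv_to_json_lines` and the sample columns are invented for illustration. Because no schema is applied here, every value stays a string.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> list[str]:
    """Turn CSV text with a header row into one JSON object string per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes its own single-line JSON object, so each one can be
    # published as a separate entry.
    return [json.dumps(row) for row in reader]


sample = "sensor,value\na,1\nb,2\n"
for json_line in csv_to_json_lines(sample):
    print(json_line)
# {"sensor": "a", "value": "1"}
# {"sensor": "b", "value": "2"}
```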

Go to https://github.com/streamr-dev/streamr-nifi for more information on how to get started and how to use the Streamr processors.


Written by Santeri Juslenius