Apache NiFi — Streamr processors

Santeri Juslenius
Jun 12, 2019 · 4 min read
Streamr has created publish and subscribe processors for Apache NiFi

When building larger architectures, especially those focused on data flowing from system to system such as IoT, developers are likely to encounter issues with maintainability and scalability. You may even need to write a script to connect and integrate different systems. This problem is only exacerbated in architectures where data formats differ between systems, devices and applications.

Apache NiFi is a tool built specifically to address these and many other data flow issues. Here at Streamr Labs, we have created publish and subscribe processors for Apache NiFi. Connecting your streams to NiFi lets you integrate real-time data with different systems with little to no effort. One great use case for these processors is data analysis with Apache Spark: the ExecuteSparkInteractive processor, which ships with NiFi, lets you submit PySpark code (through Apache Livy) to refine your data.

What is Apache NiFi?

Apache NiFi was built to automate data flows between systems. Its drag and drop user interface makes data flows easier to understand and modify, even in larger architectures, and its design closely follows the main ideas of Flow-Based Programming. However, NiFi offers more than a UI for connecting systems: it has multiple features that make it a viable backbone for a data flow architecture. Some of its most important features are:

  • Data buffering with back pressure and pressure release
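Back pressure and pressure release are easiest to see in a toy model. The Python sketch below is not NiFi’s implementation; the `Connection` class and its parameters are hypothetical, loosely mirroring the per-connection back pressure object threshold and FlowFile expiration settings you can configure in NiFi. When the queue is full, the upstream side is refused instead of overwhelming the downstream side, and items that wait too long are aged off so a stalled queue can recover.

```python
import time
from collections import deque


class Connection:
    """Hypothetical toy model of a NiFi-style connection queue (not NiFi code).

    Back pressure: once the queue holds `object_threshold` items, offers are
    refused until it drains. Pressure release: items older than
    `expiration_seconds` are aged off (dropped).
    """

    def __init__(self, object_threshold, expiration_seconds):
        self.object_threshold = object_threshold
        self.expiration_seconds = expiration_seconds
        self._queue = deque()  # (enqueue_time, item) pairs, oldest first

    def is_back_pressured(self):
        return len(self._queue) >= self.object_threshold

    def offer(self, item, now=None):
        """Enqueue the item, or return False to signal back pressure upstream."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if self.is_back_pressured():
            return False
        self._queue.append((now, item))
        return True

    def poll(self, now=None):
        """Return the oldest non-expired item, or None if the queue is empty."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        return self._queue.popleft()[1] if self._queue else None

    def _expire(self, now):
        # Pressure release: drop items that have waited longer than allowed.
        while self._queue and now - self._queue[0][0] > self.expiration_seconds:
            self._queue.popleft()

    def __len__(self):
        return len(self._queue)
```

With `object_threshold=2`, a third `offer` at the same moment returns `False`; offering again after the first two items have expired succeeds. In NiFi itself the threshold acts more like a soft limit: it stops the upstream processor from being scheduled, so a queue can briefly exceed it.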

You can read more about NiFi and its features in its official documentation.

NiFi also has a subproject called MiNiFi that is better suited to the edge of an architecture. For example, MiNiFi can be set up on a Raspberry Pi to run a lightweight data flow that processes sensor data and pushes it onward in an IoT architecture. Whereas NiFi is built on Java, MiNiFi has both a Java version and a C++ version for lighter data flows.

Streamr has also integrated with Node-RED.

How does Apache NiFi differ from the Streamr stack?

Comparing Apache NiFi to the Streamr stack is like comparing apples to oranges. There are similarities: both are open source and use flow-based programming models. Both also run on Java and ingest data. But as solutions, they serve different purposes. Apache NiFi is an open source, enterprise-grade data flow automation or ETL tool designed to automate the flow of data between software systems. In this context, the Streamr stack can be seen as a holistic data processing stack with generalised data ingestion capabilities, a data transportation layer, data analysis tools and a data Marketplace.

These two solutions arguably operate in different markets and segments, but their differences can be beneficial. With the Streamr processors, Apache NiFi users can easily integrate their existing information systems and data and monetise them on the Streamr data Marketplace.

Getting started with the Streamr processors on NiFi

If you haven’t installed Apache NiFi yet, you can find installation guides and downloads on the official Apache NiFi website. Once Apache NiFi is installed, the easiest way to get started with the Streamr processors is to:

  1. Download the .nar file for the processors

The StreamrSubscribe processor allows you to subscribe to real-time data in Streamr and transfer the data onwards to a NiFi flow. The StreamrPublish processor allows you to publish data to Streamr from a NiFi flow.
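To make the two directions concrete, here is a minimal in-memory sketch. `InMemoryStream`, `subscribe_into_flow` and `publish_from_flow` are hypothetical names invented for illustration; they do not use the Streamr client library or NiFi’s processor API. The subscribe side turns each incoming message into one single-line JSON entry in a flow queue, and the publish side drains that queue one entry at a time.

```python
import json
from collections import deque


class InMemoryStream:
    """Hypothetical stand-in for a Streamr stream; not the Streamr client API."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # Register a callback that receives every message published afterwards.
        self._subscribers.append(callback)

    def publish(self, message):
        # Deliver the message to every current subscriber.
        for callback in self._subscribers:
            callback(message)


def subscribe_into_flow(stream, flow_queue):
    """Subscribe direction: each stream message becomes one single-line JSON entry."""
    stream.subscribe(lambda message: flow_queue.append(json.dumps(message)))


def publish_from_flow(stream, flow_queue):
    """Publish direction: drain the flow queue, publishing one entry at a time."""
    published = 0
    while flow_queue:
        stream.publish(json.loads(flow_queue.popleft()))
        published += 1
    return published


if __name__ == "__main__":
    source, sink = InMemoryStream(), InMemoryStream()
    flow = deque()
    subscribe_into_flow(source, flow)
    sink.subscribe(print)
    source.publish({"temperature": 21.5})
    publish_from_flow(sink, flow)  # prints {'temperature': 21.5}
```

Keeping the flow entries as serialised strings, rather than Python objects, mirrors the fact that data moves through a NiFi flow as content bytes.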

Here is an example flow showing how to use the processors. You can try it out yourself by importing this .xml file into your NiFi flow. The example also shows how the processors and controller services are configured.


The processors use single-line JSON strings, each holding a single entry, as their data format:

{"string": "string", "number": 123123, "list": [123, 123]}
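A minimal Python check for this format, assuming that an entry is exactly one JSON object on one line (written with plain ASCII double quotes, as JSON requires). The function name `parse_streamr_entry` is hypothetical and not part of the processors; it simply shows which inputs fit the description above.

```python
import json


def parse_streamr_entry(line):
    """Parse one single-line, single-entry JSON string (hypothetical helper).

    Assumption: an entry is one JSON object on one line. Multi-line input,
    several objects on one line, top-level arrays and bare values are rejected.
    """
    if "\n" in line.strip():
        raise ValueError("entry must fit on a single line")
    value = json.loads(line)  # JSONDecodeError (a ValueError) on extra data
    if not isinstance(value, dict):
        raise ValueError("entry must be a single JSON object")
    return value
```

The example entry parses to a dictionary with the keys `string`, `number` and `list`, while something like `[{"a": 1}, {"a": 2}]` is rejected because it bundles several entries into one array.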

These JSON strings can easily be converted to and from other formats with ConvertRecord, a processor that ships with NiFi. Configure it with a JsonTreeReader as the record reader when reading JSON, or a JsonRecordSetWriter as the record writer when writing JSON. The JsonRecordSetWriter should be configured to output one line per object, because the StreamrPublish processor processes and publishes only one entry at a time.
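Outside NiFi, the effect of a CSV reader feeding a JsonRecordSetWriter set to one line per object can be approximated in a few lines of Python. This is a sketch for illustration only: `csv_to_json_lines` is a hypothetical helper, and unlike NiFi’s record readers it infers no schema, so every value stays a string.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text into one single-line JSON object per record.

    Hypothetical helper: a rough analogue of ConvertRecord with a CSV reader
    and a JSON writer emitting one line per object. No type inference is done.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Compact separators keep each record on one short line.
    return [json.dumps(row, separators=(",", ":")) for row in reader]
```

For example, `csv_to_json_lines("sensor,value\nA,1\n")` returns `['{"sensor":"A","value":"1"}']`, one entry per record, each of which could then be published on its own.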

Go to https://github.com/streamr-dev/streamr-nifi for more information on how to get started and how to use the Streamr processors.

