Hello-World in Apache NiFi

Suci Lin
Nov 25, 2017

I use Apache NiFi processors to ingest data for various purposes. NiFi lets users collect and process data through flow-based programming in a web UI, which makes ingesting data very convenient. At the time of writing, the latest version is 1.4.

This post shows how to run NiFi with Docker, with Homebrew on a Mac, or on Windows, and then walks through a Hello-World dataflow.

PRE-REQUISITES

  • Docker running successfully on your machine (for running NiFi in Docker)
  • (Nice to have) Basic NiFi concepts: data flow, processor, and connection. See the Apache NiFi User Guide for more.

Getting Started

1. Run NiFi

You can run NiFi by a.) using Docker, b.) installing it on a Mac, or c.) installing it on Windows.

a.) Run a NiFi container

I usually use Docker to run everything on my Mac. If Docker runs well on your local machine, you can use one of the following commands to download and run NiFi 1.4.0 or the latest version.

# version 1.4.0
docker run -d -P -h nifi -p 38080:8080 -p 38181:8181 --name nifi_1.4.0 --memory=4g -v /docker/apache/nifi apache/nifi:1.4.0

# latest version (run one or the other; both commands map the same host ports)
docker run -d -P -h nifi -p 38080:8080 -p 38181:8181 --name nifi_latest --memory=4g -v /docker/apache/nifi apache/nifi:latest
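Before opening the browser, it can help to confirm that the container is actually up. The following is a minimal sketch, not part of the original walkthrough; the helper name nifi_container_status is hypothetical, and it assumes the container name nifi_1.4.0 used in the docker run command.

```shell
# Hypothetical helper: print the name and status of a running NiFi container.
# Assumes the container was started with --name nifi_1.4.0 (pass another name as $1).
nifi_container_status() {
  container_name="${1:-nifi_1.4.0}"
  # docker ps lists running containers only; --filter narrows by name,
  # --format prints just the name and status columns.
  docker ps --filter "name=${container_name}" --format '{{.Names}}: {{.Status}}'
}
```

Calling nifi_container_status (or nifi_container_status nifi_latest) prints one line if the container is running and nothing if it is not. If nothing is printed, docker logs nifi_1.4.0 usually shows why the container stopped.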

b.) Download, install and run NiFi on a Mac

brew install nifi

Then follow the official Apache NiFi documentation. Find the folder where NiFi was installed (e.g. /usr/local/Cellar/nifi/1.4.0) and use bin/nifi.sh from there:

bin/nifi.sh run (runs NiFi in the foreground)

bin/nifi.sh start (runs NiFi in the background)

bin/nifi.sh status (checks the status)

bin/nifi.sh stop (stops NiFi)
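To avoid changing into the install folder each time, the nifi.sh subcommands can be wrapped in a small function. This is only a sketch: nifi_ctl and the NIFI_HOME variable are names chosen here for illustration, and the default path is the example folder mentioned above, which may differ on your machine.

```shell
# Hypothetical wrapper around bin/nifi.sh.
# NIFI_HOME is an assumed variable pointing at the NiFi install folder.
nifi_ctl() {
  nifi_home="${NIFI_HOME:-/usr/local/Cellar/nifi/1.4.0}"
  if [ ! -x "${nifi_home}/bin/nifi.sh" ]; then
    echo "nifi.sh not found under ${nifi_home}/bin" >&2
    return 1
  fi
  # Pass the subcommand (run, start, status, stop) straight through.
  "${nifi_home}/bin/nifi.sh" "$@"
}
```

With this in place, nifi_ctl start followed by nifi_ctl status starts NiFi in the background and reports whether it is running.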

c.) Download, install and run NiFi on Windows

  1. Download NiFi from the official Apache NiFi website.
  2. Choose nifi-x.y.z-bin.zip (e.g. nifi-1.4.0-bin.zip).
  3. Extract it to a folder of your choice (e.g. c:\Users\username\nifi).
  4. Find and execute run-nifi.bat in “your nifi folder\bin\”.

2. Verify that NiFi is running

With a NiFi instance running locally, open http://localhost:38080/nifi (or http://localhost:8080/nifi if you installed NiFi directly on a Mac or Windows) in the browser to see the NiFi user interface.
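NiFi can take a little while to start, so the page may not load immediately. The check can also be done from a terminal by polling the URL until it responds. This is a minimal sketch that assumes curl is installed; the function name wait_for_nifi, the retry count, and the delay are arbitrary choices made here.

```shell
# Hypothetical helper: poll the NiFi UI until it answers or we give up.
# $1 = URL (defaults to the Docker port mapping used in this post)
# $2 = number of attempts, $3 = seconds to wait between attempts
wait_for_nifi() {
  url="${1:-http://localhost:38080/nifi}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s hides progress output, -f fails on HTTP error codes, -o discards the page body
    if curl -sf -o /dev/null "$url"; then
      echo "NiFi UI is reachable at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "NiFi UI did not respond at $url" >&2
  return 1
}
```

For a local install on a Mac or Windows, pass the other address: wait_for_nifi http://localhost:8080/nifi.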

3. Building your first Apache NiFi dataflow

I use the simplest possible use case as an example here. (Warning: many screenshots! You may want to zoom out in your browser.)

Only two processors are used in this dataflow:

  • GenerateFlowFile as input and PutFile as output

GenerateFlowFile generates random text as FlowFiles. PutFile writes the content it receives from GenerateFlowFile to disk as individual files.

a) Add a processor

To start, drag the processor icon, which sits next to the NiFi logo in the menu bar, onto the canvas.

Type in the Filter box to search, then choose GenerateFlowFile to add it to the NiFi canvas.

Configure the settings according to the warning message and the behavior you expect. Detailed processor configuration is not covered here; you can find usage details by right-clicking the processor.

This processor will generate FlowFiles with random data.

Then add another processor, PutFile, and fill in its required configuration values.

b) Connect the processors

Drag from one processor to the other to connect them.

c) Run the two processors as your first NiFi dataflow

d) Verify the data flow

I ran NiFi in Docker, so I entered the NiFi container to check whether data was written to the /tmp/ folder.
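The same check can be made without opening a shell inside the container. The sketch below counts the entries in the output folder through docker exec. It assumes the container name nifi_1.4.0 and that PutFile was configured to write to /tmp; count_flow_output is a hypothetical helper name. Note that /tmp may also contain files unrelated to the flow, so a growing count is the signal to look for rather than an exact number.

```shell
# Hypothetical helper: count entries in the PutFile target folder inside the container.
# $1 = container name, $2 = folder inside the container
count_flow_output() {
  container_name="${1:-nifi_1.4.0}"
  target_dir="${2:-/tmp}"
  # ls runs inside the container; wc and tr run on the host.
  # tr strips the padding some wc implementations add.
  docker exec "$container_name" ls -1 "$target_dir" | wc -l | tr -d ' '
}
```

Running count_flow_output twice, a few seconds apart, should show the number rising while the processors are running.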

Then stop the processors so they do not keep writing files.

Conclusion

I introduced a VERY simple example to show what NiFi is, using one input processor and one output processor: GenerateFlowFile and PutFile. The basic steps are adding the appropriate processors, connecting them, and running them.

Apache NiFi is a powerful tool with scalable directed graphs for pulling data from external sources; routing, transforming, and aggregating it; and finally delivering it to its final destinations.

More NiFi usages and use cases will be explored as it keeps growing. However, as with many data ingestion pipeline tasks, you should first design reasonable and appropriate ingestion logic based on your purpose and your data infrastructure environment.

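This post showed ❶ how to set up a NiFi environment with Docker and on Mac/Windows, and ❷ the basic concepts of NiFi through a simple dataflow.

The main steps are:

  1. Choose suitable processors (here, GenerateFlowFile and PutFile). GenerateFlowFile generates random text and passes it to PutFile as FlowFiles, and PutFile writes that content into the specified folder.
  2. Configure the two processors
  3. Connect the two processors
  4. Run and verify the dataflow

Note that before using NiFi, you should think through and prepare the relevant information (data source, target, etc.), design each data processing step and its logic, and then select suitable processors to do the data ingestion work.

This post uses a simple NiFi dataflow, with one input and one output, as a basic test. See the references below to build a more advanced NiFi environment and design more advanced data processing logic.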

Ref:

  1. Getting-started with NiFi from official docs of Apache NiFi
  2. Apache NiFi How to Build a Flow, Part 1 and Part 2 (YouTube)
  3. NiFi in Depth
  4. 『NiFi 学习之路』入门 (Chinese)
  5. Apache NiFi Overview and Apache NiFi User Guide

Suci Lin

Data Engineer, focus on stream processing and IoT. Passionate about data storytelling with data visualization and building an engineering culture.