Hello-World in Apache NiFi
I use Apache NiFi processors to ingest data for various purposes. NiFi lets users collect and process data with flow-based programming in a web UI, which makes ingesting data very convenient. At the time of writing, the latest version is 1.4.
In this post I show how to run NiFi via Docker or Homebrew on a Mac (or on Windows), followed by a Hello-World example.
Prerequisites:
- Docker running on your machine (only needed to run NiFi in Docker)
- (Nice to have) Basic NiFi concepts: dataflow, processor, and connection (see the Apache NiFi User Guide)
1. Run NiFi
You can run NiFi in one of three ways: a.) in Docker, b.) by installing it on a Mac, or c.) by installing it on Windows.
a.) Run a NiFi container
I usually use Docker to run everything on my Mac. If Docker works on your local machine, you can use the following command to download and run NiFi 1.4:
docker run -d -P -h nifi -p 38080:8080 -p 38181:8181 --name nifi_1.4.0 --memory=4g -v /docker/nifi apache/nifi:1.4.0
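The following Python sketch spells out what each flag in that command does. It builds the same `docker run` invocation as an argument list and, when run as a script, launches it with `subprocess`. It is a trimmed-down version based on assumptions. It keeps the port mappings, container name, and memory limit from the command above, but it leaves out `-P` and the `-v` volume flag. The helper name `build_docker_run_args` is hypothetical.

```python
import shlex
import subprocess


def build_docker_run_args(
    name: str = "nifi_1.4.0",
    image: str = "apache/nifi:1.4.0",
    ui_host_port: int = 38080,
    memory: str = "4g",
) -> list:
    """Hypothetical helper: return the argv for starting NiFi in a background container."""
    return [
        "docker", "run",
        "-d",                          # detached: the container keeps running in the background
        "-h", "nifi",                  # hostname inside the container
        "-p", f"{ui_host_port}:8080",  # host port -> NiFi's HTTP port (8080 in NiFi 1.4)
        "-p", "38181:8181",            # second port mapping, copied from the command above
        "--name", name,                # container name, reused later by `docker exec`
        f"--memory={memory}",          # memory limit for the container
        image,                         # image and tag to run
    ]


if __name__ == "__main__":
    args = build_docker_run_args()
    print(shlex.join(args))           # the equivalent shell command
    subprocess.run(args, check=True)  # needs Docker installed and running
```

Keeping the arguments in a list avoids shell-quoting problems. `shlex.join` prints the equivalent one-line command for copying into a terminal.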
b.) Download, install and run NiFi on a Mac
brew install nifi
Then follow the official Apache NiFi documentation. Go to the folder where NiFi was installed (e.g. /usr/local/Cellar/nifi/1.4.0) and use:
- bin/nifi.sh run: run in the foreground
- bin/nifi.sh start: run in the background
- bin/nifi.sh status: check the status
- bin/nifi.sh stop: stop NiFi
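If you script the Homebrew install, a small Python wrapper can keep the four `nifi.sh` actions listed above in one place. This sketch rests on a few assumptions. `nifi_sh_command` is a hypothetical helper. It only accepts the four actions listed here. Its default install path is the example folder from above, which may differ on your machine.

```python
import subprocess
import sys
from pathlib import Path

# The bin/nifi.sh actions listed above and what each one does.
NIFI_ACTIONS = {
    "run": "run in the foreground",
    "start": "run in the background",
    "status": "check the status",
    "stop": "stop NiFi",
}


def nifi_sh_command(action: str, nifi_home: str = "/usr/local/Cellar/nifi/1.4.0") -> list:
    """Hypothetical helper: build the argv for `bin/nifi.sh <action>` under nifi_home."""
    if action not in NIFI_ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(NIFI_ACTIONS)}")
    return [str(Path(nifi_home) / "bin" / "nifi.sh"), action]


if __name__ == "__main__":
    # Usage: python nifi_ctl.py start   (defaults to "status")
    action = sys.argv[1] if len(sys.argv) > 1 else "status"
    subprocess.run(nifi_sh_command(action), check=False)
```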
c.) Download, install and run NiFi on Windows
- Download NiFi from the official Apache NiFi website.
- Choose nifi-x.y.z-bin.zip (e.g. nifi-1.4.0-bin.zip).
- Extract it to a folder of your choice (e.g. c:\Users\username\nifi).
- Find and run run-nifi.bat in the bin\ subfolder of your NiFi folder.
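2. Verify that NiFi is running
Open http://localhost:8080/nifi in your browser. If you used the Docker command above, which maps host port 38080 to the container's port 8080, open http://localhost:38080/nifi instead.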
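NiFi can take a while to start serving its UI after launch. The following Python function is a minimal sketch that assumes the UI URL from above. It polls that URL with `urllib` until it answers with HTTP 200 or a timeout expires. `wait_for_nifi` and its parameters are hypothetical names.

```python
import time
import urllib.request


def wait_for_nifi(url="http://localhost:8080/nifi/", timeout=120.0, interval=2.0,
                  opener=urllib.request.urlopen):
    """Poll `url` until it returns HTTP 200 (True) or `timeout` seconds pass (False).

    `opener` is injectable so the polling logic can be tested without a live NiFi.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors (urllib.error.URLError
            # is a subclass of OSError): NiFi is probably still starting up.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # For the Docker setup above, the UI is published on host port 38080.
    print("NiFi is up" if wait_for_nifi("http://localhost:38080/nifi/") else "NiFi did not respond")
```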
3. Build your first Apache NiFi dataflow
Here I walk through the simplest possible use case. (Warning: many screenshots! You may want to zoom out in your browser.)
Only two processors are used in this dataflow:
- GenerateFlowFile as input and PutFile as output
GenerateFlowFile generates FlowFiles containing random text. PutFile writes the content of each FlowFile it receives from GenerateFlowFile to disk as an individual file.
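Here is a plain-Python analogy that makes the roles of the two processors concrete. One function produces random text payloads, as GenerateFlowFile does, and another writes each payload to its own file, as PutFile does. This is only an illustration under assumptions, not how NiFi works internally. The function names, the payload size, and the `flowfile-<n>.txt` file names are all hypothetical.

```python
import random
import string
import tempfile
from pathlib import Path


def generate_flowfiles(count, size=16, seed=None):
    """Yield `count` random text payloads (stand-in for GenerateFlowFile output)."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    for _ in range(count):
        yield "".join(rng.choices(alphabet, k=size))


def put_files(payloads, directory):
    """Write each payload to its own file in `directory` (stand-in for PutFile)."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(payloads):
        path = out_dir / f"flowfile-{i}.txt"
        path.write_text(text)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = put_files(generate_flowfiles(5, seed=42), tmp)
        print(len(written), "files written to", tmp)
```

The generator hands payloads over one at a time, which loosely mirrors how FlowFiles travel along a connection from one processor to the next.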
a.) Add a processor
To start, drag the Processor icon (next to the NiFi logo in the menu bar) onto the canvas.
In the Add Processor dialog, type in the Filter box, select GenerateFlowFile, and add it to the NiFi canvas.
Configure its settings according to the warning message and the behavior you expect. Detailed processor configuration is not covered here; you can find each processor's usage documentation by right-clicking it on the canvas.
This processor will generate FlowFiles with random data.
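You can also script this configuration. The following sketch assumes NiFi 1.x's REST API: you send a `PUT` to `/nifi-api/processors/{id}` with a JSON body that holds the processor's current `revision` and the changed `component` fields. The sketch builds such an update for GenerateFlowFile. It sets a 1-second run schedule so the processor does not produce FlowFiles as fast as it can, and it picks a small file size. The helper names are hypothetical and the property values are illustrative choices. The processor ID and revision version must come from your own NiFi instance, for example from a `GET` on the same URL.

```python
import json
import urllib.request


def generate_flowfile_update(processor_id, revision_version,
                             run_schedule="1 sec", file_size="1 KB"):
    """Build a ProcessorEntity-style JSON body that reconfigures GenerateFlowFile."""
    return {
        "revision": {"version": revision_version},  # must match NiFi's current revision
        "component": {
            "id": processor_id,
            "config": {
                "schedulingPeriod": run_schedule,   # Run Schedule in the Scheduling tab
                "properties": {
                    "File Size": file_size,         # size of each generated FlowFile
                    "Batch Size": "1",              # FlowFiles per run
                    "Data Format": "Text",          # Text instead of Binary
                    "Unique FlowFiles": "false",
                },
            },
        },
    }


def put_json(url, payload):
    """Send `payload` as a JSON PUT request and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `put_json(f"http://localhost:38080/nifi-api/processors/{pid}", generate_flowfile_update(pid, version))` sends the update. Here `pid` is the processor ID shown in the Settings tab of its Configure dialog.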
Then add a second processor, PutFile, and configure it. At minimum, set its Directory property to the folder where files should be written (this example writes to /tmp/).
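The following is a hedged sketch of the JSON body for configuring PutFile through NiFi 1.x's REST API. It sets `Directory` to `/tmp/` and auto-terminates PutFile's `success` and `failure` relationships. Nothing comes after PutFile in this flow, and NiFi marks a processor invalid while its relationships are neither connected nor auto-terminated. The helper name is hypothetical. Send the body with an HTTP `PUT` to `/nifi-api/processors/{id}`, together with the processor's current revision.

```python
import json


def put_file_update(processor_id, revision_version, directory="/tmp/"):
    """Build a ProcessorEntity-style JSON body that configures PutFile."""
    return {
        "revision": {"version": revision_version},
        "component": {
            "id": processor_id,
            "config": {
                "properties": {
                    "Directory": directory,                # where files are written
                    "Create Missing Directories": "true",  # create `directory` if absent
                },
                # PutFile is the last step, so end both relationships here.
                "autoTerminatedRelationships": ["success", "failure"],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(put_file_update("<processor-id>", 0), indent=2))
```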
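b.) Connect the processors
Drag from one processor to the other to connect them: hover over GenerateFlowFile, drag the connection arrow that appears onto PutFile, and select the success relationship in the Create Connection dialog.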
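You can also create connections through the REST API. This minimal sketch assumes NiFi 1.x's endpoint for creating a connection, `POST /nifi-api/process-groups/{group_id}/connections`. The body names the source and destination processors and the relationship to route, here GenerateFlowFile's `success`. The helper name and IDs are hypothetical placeholders; take the real IDs from your canvas.

```python
import json


def connection_body(group_id, source_id, destination_id, relationships=("success",)):
    """Build a ConnectionEntity-style JSON body linking two processors in one group."""
    def endpoint(component_id):
        # Both ends of this connection are processors in the same process group.
        return {"id": component_id, "groupId": group_id, "type": "PROCESSOR"}

    return {
        "revision": {"version": 0},  # revision 0 for a component that does not exist yet
        "component": {
            "source": endpoint(source_id),
            "destination": endpoint(destination_id),
            "selectedRelationships": list(relationships),
        },
    }


if __name__ == "__main__":
    print(json.dumps(connection_body("<group-id>", "<generate-id>", "<putfile-id>"), indent=2))
```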
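c.) Run the two processors as your first NiFi dataflow
Select both processors and click Start in the Operate palette, or right-click each processor and choose Start.
d.) Verify the dataflow
Because I ran NiFi in Docker, I entered the NiFi container to check whether data had been written to the /tmp/ folder.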
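You can script this check in two ways. With Docker, run `ls` inside the container with `docker exec`. With a local install, count the files in the output folder directly. This minimal Python sketch assumes the container name from the Docker command above and the `/tmp/` output folder. Both helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def docker_ls_command(container="nifi_1.4.0", directory="/tmp/"):
    """Build the argv that lists `directory` inside the running NiFi container."""
    return ["docker", "exec", container, "ls", "-l", directory]


def count_files(directory):
    """Count regular files directly under `directory` (for a non-Docker install)."""
    return sum(1 for p in Path(directory).iterdir() if p.is_file())


if __name__ == "__main__":
    result = subprocess.run(docker_ls_command(), capture_output=True, text=True, check=True)
    print(result.stdout)
```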
Then stop the processors so they do not keep writing files.
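You can also script starting and stopping. My understanding of NiFi 1.4's REST API is that you start or stop a processor with a `PUT` to `/nifi-api/processors/{id}` whose `component.state` is `RUNNING` or `STOPPED`. Newer releases also provide a dedicated run-status endpoint, so check the REST API documentation for your version. This sketch only builds the request body, and the helper name is hypothetical.

```python
VALID_STATES = {"RUNNING", "STOPPED"}


def run_state_body(processor_id, revision_version, state):
    """Build a body that asks NiFi to start (RUNNING) or stop (STOPPED) a processor."""
    state = state.upper()
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {sorted(VALID_STATES)}, got {state!r}")
    return {
        "revision": {"version": revision_version},  # current revision from a prior GET
        "component": {"id": processor_id, "state": state},
    }
```

To stop the flow after checking /tmp/, send `run_state_body(pid, version, "STOPPED")` for each processor. Use the latest revision version that NiFi returned for that processor.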
I introduced a VERY simple example to show what NiFi is, using one input processor and one output processor: GenerateFlowFile and PutFile. The basic steps are to add appropriate processors, connect them, and then run them.
Apache NiFi is a powerful tool with scalable directed graphs for pulling data from external sources; routing, transforming, and aggregating it; and finally delivering it to its final destinations.
I will explore more NiFi features and use cases as the project keeps growing. However, as with any data ingestion pipeline, you should first design sensible ingestion logic based on your purpose and your data infrastructure.
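This post showed (1) how to set up a NiFi environment with Docker or on Mac/Windows, and (2) the basic concepts of NiFi through a simple dataflow:
- Choose suitable processors (here GenerateFlowFile and PutFile). GenerateFlowFile generates random text and passes it to PutFile as FlowFiles; PutFile writes this content into the specified folder.
- Configure the two processors.
- Connect the two processors.
Note that before using NiFi, you should think through and gather the relevant information (data source, target, etc.), design each data processing step and its logic, and then choose the appropriate processors to carry out the data ingestion.
This post uses a simple NiFi dataflow with one input and one output as a quick test. Please consult further reference material to build a more advanced NiFi environment and to design more advanced data processing logic.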