Tutorial: Build Your First Streaming Application Now!!

Tim Spann
6 min read · Jan 18, 2023


Using Apache NiFi, Apache Pulsar, and Apache Flink for streaming apps

How to Get Started with NiFi + Pulsar + Flink (FLiPN)

Getting started with NiFi + Pulsar + Flink is a simple process. Follow the step-by-step instructions below to configure Apache Pulsar and build full FLiPN applications with Apache NiFi and Apache Flink.

The easiest way is to use Docker; you will need at least 16 GB of RAM for everything to run smoothly.

I have recently done a few talks on this, and you can check them out before or after following this tutorial.

Note:

If you need to change ports, you can update the docker-compose.yml file. If you are running on a Mac with an M1 or M2 chip, you probably want to set the Docker platform to "linux/x86_64".
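For example, the platform override looks roughly like this in docker-compose.yml (a minimal sketch; the service name and image tag here are assumptions, so match them to the repository's actual compose file):

```yaml
services:
  pulsar:
    # image/tag is an assumption; use the one in the repo's docker-compose.yml
    image: apachepulsar/pulsar:2.10.2
    # forces x86_64 emulation on Apple Silicon (M1/M2)
    platform: "linux/x86_64"
```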

Follow the step-by-step instructions below:

  1. You must have Docker and Docker Compose installed.
  2. You must have Git installed.
  3. Docker must be running.
  4. Clone this repository locally: https://github.com/tspannhw/create-nifi-pulsar-flink-apps
  5. Download the Pulsar NiFi connector into the nifi directory in that new folder, using one of the two options below.
  6. Option 1: Download and build the connector with Maven and Java 11+.
    https://github.com/streamnative/pulsar-nifi-bundle
  7. Option 2: Download the prebuilt NARs: https://search.maven.org/remotecontent?filepath=io/streamnative/connectors/nifi-pulsar-nar/1.18.0/nifi-pulsar-nar-1.18.0.nar and https://search.maven.org/remotecontent?filepath=io/streamnative/connectors/nifi-pulsar-client-service-nar/1.18.0/nifi-pulsar-client-service-nar-1.18.0.nar
  8. Run the Pulsar and NiFi clusters in Docker via docker-compose up
  9. Run the Flink SQL cluster in Docker via:

sudo docker run --rm -it --platform "linux/x86_64" --volume flink:/flink --name "flink" streamnative/pulsar-flink:1.15.1.4 /bin/bash

  10. You will be connected to the Flink cluster and in a command shell.
  11. Run ./bin/start-cluster.sh
  12. Run ./bin/sql-client.sh
  13. You will now be in the Flink SQL Client.
  14. You can now connect Flink to Pulsar by creating a catalog:
CREATE CATALOG pulsar WITH (
  'type' = 'pulsar-catalog',
  'catalog-admin-url' = 'http://<Your PC Name>:8080',
  'catalog-service-url' = 'pulsar://<Your PC Name>:6650'
);

Now you will use that catalog.

USE CATALOG pulsar;

We will create a database for our topic.

CREATE DATABASE sql_examples;

Then use that database to create a table for our topic.

USE sql_examples;
CREATE TABLE citibikenyc (
  num_docks_disabled DOUBLE,
  eightd_has_available_keys STRING,
  station_status STRING,
  last_reported DOUBLE,
  is_installed DOUBLE,
  num_ebikes_available DOUBLE,
  num_bikes_available DOUBLE,
  station_id DOUBLE,
  is_renting DOUBLE,
  is_returning DOUBLE,
  num_docks_available DOUBLE,
  num_bikes_disabled DOUBLE,
  legacy_id DOUBLE,
  valet STRING,
  eightd_active_station_services STRING,
  ts DOUBLE,
  uuid STRING
) WITH (
  'connector' = 'pulsar',
  'topics' = 'persistent://public/default/citibikenyc',
  'format' = 'json'
);
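Once the table exists, you can try a simple continuous query against it; for example (the filter columns and values here are illustrative):

```sql
-- Stream stations that are renting and currently report available bikes
SELECT station_id, num_bikes_available, num_docks_available
FROM citibikenyc
WHERE is_renting = 1 AND num_bikes_available > 0;
```

Because this is an unbounded streaming query, it keeps emitting rows as new events arrive on the topic; press Q in the SQL client's result view to stop it.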

| SQL Command | Purpose |
| --- | --- |
| SHOW TABLES | Show the tables in the current database |
| USE CATALOG pulsar | Use the pulsar catalog |
| SHOW CURRENT DATABASE | Display the current database |
| SHOW DATABASES | Display all the databases |
| CREATE TABLE citibikenyc … | Create a table in the current database |
| DESC citibikenyc | Describe the columns of a topic |
| SHOW CREATE TABLE citibikenyc | Display the DDL to recreate a table |
| SELECT * FROM citibikenyc | Query the topic for current data and all events that arrive |
| CREATE CATALOG pulsar … | Create the catalog to connect to Apache Pulsar |
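Beyond simple selects, Flink SQL can also run continuous aggregations over the topic. A sketch (the column choices are illustrative):

```sql
-- Report count and peak bike availability per station; this is an unbounded
-- GROUP BY, so results update continuously as new events arrive
SELECT station_id,
       COUNT(*) AS reports,
       MAX(num_bikes_available) AS max_bikes_available
FROM citibikenyc
GROUP BY station_id;
```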

Congratulations! After completing the steps above, you’ve configured Pulsar to work with Apache NiFi and Apache Flink. Now we’ll discuss building a quick application. It’s so easy; my cat, Sploot, can write one.

Building a FLiPN App The Easy Way

We already created our topic and table, so the hard part is done. In a production system, the catalog would automatically contain all the data from our topics defined with schemas.

  1. From a browser, navigate to your local NiFi at http://localhost:9999/nifi/
  2. You can build an app by dragging and dropping NiFi components onto the canvas, or load one from a JSON file or a NiFi Registry. A finished example flow is in the nifi directory of the GitHub repository; upload it by adding a Process Group and importing the JSON file.
  3. To run the flow, highlight the first processor and click Run.
  4. Once data flows to Pulsar, you can start a SQL query in the Flink SQL client.
  5. You have now built a NiFi-Pulsar-Flink app.
  6. For a more advanced example that includes a Pulsar function, see https://github.com/tspannhw/pulsar-transit-function
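A quick sanity check for step 4 might look like this in the Flink SQL client (assuming the catalog, database, and table created earlier):

```sql
USE CATALOG pulsar;
USE sql_examples;
-- Should start streaming rows as soon as NiFi publishes to the topic
SELECT uuid, station_id, num_bikes_available, ts FROM citibikenyc;
```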

Build an Apache NiFi flow:

Add processors to the canvas.

Then pick the ones you need, following the suggestions in the steps below.

InvokeHTTP

HTTP URL: https://gbfs.citibikenyc.com/gbfs/en/station_status.json

Schedule it to run every 10 minutes. We will run it once for the demo.
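The station_status feed returns JSON shaped roughly like this (a trimmed sketch with illustrative values), which the SplitJson step below breaks into per-station records:

```json
{
  "data": {
    "stations": [
      {
        "station_id": "72",
        "num_bikes_available": 10,
        "num_docks_available": 25,
        "is_installed": 1,
        "is_renting": 1,
        "is_returning": 1,
        "last_reported": 1674000000
      }
    ]
  },
  "last_updated": 1674000300,
  "ttl": 5
}
```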

QueryRecord

Create a new JsonTreeReader with defaults.

Create a new JsonRecordSetWriter with defaults.

SplitJson

$.*.data.stations

UpdateRecord

/ts ${now():toNumber()}
/uuid ${uuid}

Create a Pulsar Client Service controller.

The Pulsar Service URL is pulsar://<YourPCName>:6650.

PublishPulsarRecord

Configure the Pulsar Producer.

The Topic Name is persistent://public/default/citibikenyc.

Use the Pulsar Client Service that you created before.

Use the existing JsonTreeReader.

Use the existing JsonRecordSetWriter.

RetryFlowFile

We use the default settings to retry on failure.

You are now ready to run and test your application.

Or you can schedule it to run.

This has been tested on a MacBook Pro with an M1 processor (ARM).

Resources


Learn How to Use Apache Pulsar, Apache Flink, and Apache NiFi

https://dev.to/tspannhw/learn-how-to-use-apache-pulsar-apache-flink-and-apache-nifi-16i7


Tim Spann

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/