Writing critical syslog events to Apache Iceberg for analysis

Tim Spann
Cloudera
Jun 25, 2023

A few weeks have passed since you built your data flow with DataFlow Designer to filter out critical syslog events to a dedicated Kafka topic. Now that everyone has better visibility into real-time health, management wants to do historical analysis on the data. Your company is evaluating Apache Iceberg to build an open data lakehouse and you are tasked with building a flow that ingests the most critical syslog events into an Iceberg table.

Ensure your table is built and accessible.

Create an Apache Iceberg Table

  1. From the Home page, click Data Hub Clusters, then navigate to oss-kudu-demo in the Data Hubs list.
  2. Navigate to Hue from the Kudu Data Hub.
  3. Inside Hue you can now create your table. You will have your own database to work with. To get to your database, click the ‘<’ icon next to the default database. You should see your database in the format <YourEmailWithUnderscores>_db. Click on your database to open the SQL Editor.
  4. Create your Apache Iceberg table with the SQL below, clicking the play icon to execute the query. Note that the table name must be prefixed with your Workload User Name (userid).

CREATE TABLE <<userid>>_syslog_critical_archive
(priority int, severity int, facility int, version int, event_timestamp bigint, hostname string,
 body string, appName string, procid string, messageid string,
 structureddata struct<sdid:struct<eventid:string,eventsource:string,iut:string>>)
STORED BY ICEBERG;
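
If you want to double-check that the table was built and is accessible before wiring up the flow, a quick look in the same Hue SQL editor works; SHOW TABLES and DESCRIBE are standard in both Hive and Impala (the table name below is the same placeholder used above):

-- list the tables in your database, then inspect the new table's schema
SHOW TABLES;
DESCRIBE <<userid>>_syslog_critical_archive;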

  5. Once you have sent data to your table, you can query it; a sample query is sketched below.
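
A sketch of such a query, assuming the schema created above (swap in your own userid prefix; exact behavior depends on whether you run it from Hive or Impala in Hue):

-- count the archived critical events, then look at the most recent ones
SELECT count(*) FROM <<userid>>_syslog_critical_archive;

SELECT event_timestamp, hostname, severity, facility, appName, body
FROM <<userid>>_syslog_critical_archive
ORDER BY event_timestamp DESC
LIMIT 100;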

2.1 Open ReadyFlow & start Test Session

  1. Navigate to DataFlow from the Home Page
  2. Navigate to the ReadyFlow Gallery
  3. Explore the ReadyFlow Gallery
  4. Search for the “Kafka to Iceberg” ReadyFlow.
  5. Click “Create New Draft” to open the ReadyFlow in the Designer. Name it yourid_kafkatoiceberg, e.g. tim_kafkatoiceberg.
  6. Start a Test Session by either clicking the Start a test session link in the banner or going to Flow Options and selecting Start in the Test Session section.
  7. In the Test Session creation wizard, select the latest NiFi version and click Start Test Session. Notice how the status at the top now says “Initializing Test Session”.

2.2 Modifying the flow to read syslog data

The flow consists of three processors and looks very promising for our use case. The first processor reads data from a Kafka topic, the second batches events into larger files, and the third, PutIceberg, writes those files out to Iceberg.
All we have to do now to reach our goal is customize the flow's configuration for our use case.

  1. Provide values for the predefined parameters.
  2. Navigate to Flow Options → Parameters.
  3. Select all parameters that show No value set and provide the following values:

Name | Description | Value
CDP Workload User | CDP Workload User | <Your own workload user name>
CDP Workload User Password | CDP Workload User Password | <Your own workload user password>
