Apache Kafka to Oracle NoSQL Database: Build a Streaming Data Pipeline on OCI using GoldenGate

Shadab Mohammad
Published in Oracle Developers
May 11, 2023


Building a Streaming Data Pipeline from Apache Kafka to Oracle NoSQL Database on OCI using GoldenGate for Big Data


Introduction:

In the bustling arena of today’s business world, the ability to process data in real-time has become an invaluable asset for making swift, informed decisions. This is where the true significance of data integration and streaming solutions comes into play — as an essential link that bridges the various components of a data pipeline.

In this blog post, I’m going to walk you through constructing a data pipeline from Apache Kafka to Oracle Cloud Infrastructure’s (OCI) NoSQL Database, utilizing the power of OCI GoldenGate for Big Data.

OCI offers a fully managed NoSQL Cloud Database, a robust alternative to Amazon DynamoDB, and a comprehensive Change Data Capture tool known as OCI GoldenGate. We’ll explore how to leverage these powerful tools to read data from an Apache Kafka topic and write it to an OCI NoSQL database.

Oracle GoldenGate orchestrates data replication from source to sink through two processes: Extract (capture) and Replicat (apply). The Extract process captures data from the source and records it to trail files; a Replicat process then reads those trail files and applies the records to the sink.

Data Pipeline: Kafka to OCI NoSQL Database

Prerequisites:

Before we begin, ensure that you have the following resources set up:

  1. An Oracle Cloud Infrastructure account.
  2. An Apache Kafka 3.2 cluster running in your environment.
  3. OCI GoldenGate for Big Data installed and configured.
  4. Network connectivity from OCI GoldenGate to your Kafka environment, with rules for port 9092 in all relevant security lists and network security groups.
  5. IAM privileges for your OCI user to create OCI GoldenGate and OCI NoSQL resources.
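Prerequisite 4 can be verified before going further. Below is a minimal reachability check using bash's built-in /dev/tcp pseudo-device; `<your-ip>` is a placeholder for your broker's address:

```shell
#!/bin/bash
# Check that the Kafka broker port (9092 by default) is reachable from this host.
broker="<your-ip>"   # placeholder: replace with your Kafka broker address
port=9092

if timeout 3 bash -c "cat < /dev/null > /dev/tcp/${broker}/${port}" 2>/dev/null; then
  echo "Broker ${broker}:${port} is reachable"
else
  echo "Cannot reach ${broker}:${port} - check your security lists/NSGs"
fi
```

If the broker is unreachable, revisit the ingress rules on the security lists and network security groups before proceeding.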

Writing Streaming Data to the Apache Kafka Topic and Creating the OCI GoldenGate Setup

Step 1: Generate Sample Data and Write to Apache Kafka Topic

To deploy the Apache Kafka topic and producer below on OCI, you can refer to this article.

We need to create a script that generates sample sensor data and writes it to a Kafka topic called “sensors”. Here’s a shell script (sensor_data.sh) that generates random readings as JSON and writes them to the topic:

#!/bin/bash

topic="sensors"
kafka_server="<your-ip>:9092"
log_file="sensors.log"
KAFKA_HOME="/home/opc/kafka_build_323/kafka_2.13-3.2.3"

while true; do
  # Generate a random sensor_id, temperature (50-100), and humidity (20-80)
  sensor_id=$(printf "s%03d" $((RANDOM % 100 + 1)))
  temperature=$(echo "scale=2; 50 + (100 - 50) * $RANDOM / 32767" | bc)
  humidity=$(echo "scale=2; 20 + (80 - 20) * $RANDOM / 32767" | bc)
  timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

  # Create a single-line JSON payload
  payload="{\"schema\": {\"type\": \"struct\", \"fields\": [{\"type\": \"string\", \"optional\": false, \"field\": \"sensor_id\"}, {\"type\": \"float\", \"optional\": false, \"field\": \"temperature\"}, {\"type\": \"float\", \"optional\": false, \"field\": \"humidity\"}, {\"type\": \"string\", \"optional\": false, \"field\": \"timestamp\"}]}, \"payload\": {\"sensor_id\": \"$sensor_id\", \"temperature\": $temperature, \"humidity\": $humidity, \"timestamp\": \"$timestamp\"}}"

  # Send the payload to the Kafka topic
  echo "$payload" | "$KAFKA_HOME"/bin/kafka-console-producer.sh --bootstrap-server "$kafka_server" --topic "$topic"

  # Append the payload to the log file
  echo "$payload" >> "$log_file"

  sleep 1
done
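The script relies on `bc` for the floating-point arithmetic; if `bc` is not installed on your producer host, the same value ranges can be generated with awk. A small standalone sketch (same ranges: temperature 50-100, humidity 20-80):

```shell
#!/bin/bash
# Generate one sensor reading using awk instead of bc (same ranges as the script above)
sensor_id=$(printf "s%03d" $((RANDOM % 100 + 1)))
temperature=$(awk -v r="$RANDOM" 'BEGIN { printf "%.2f", 50 + 50 * r / 32767 }')
humidity=$(awk -v r="$RANDOM" 'BEGIN { printf "%.2f", 20 + 60 * r / 32767 }')
echo "{\"sensor_id\": \"$sensor_id\", \"temperature\": $temperature, \"humidity\": $humidity}"
```

awk is part of every standard Linux install, so this variant has no extra dependencies.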

A sample JSON message written to the Kafka topic looks like this:

{"schema": {"type": "struct", "fields": [{"type": "string", "optional": false, "field": "sensor_id"}, {"type": "float", "optional": false, "field": "temperature"}, {"type": "float", "optional": false, "field": "humidity"}, {"type": "string", "optional": false, "field": "timestamp"}]}, "payload": {"sensor_id": "s076", "temperature": 54.23, "humidity": 65.45, "timestamp": "2023-05-10T12:26:47Z"}}
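Before wiring up GoldenGate, it is worth sanity-checking that the producer emits valid single-line JSON. A quick check against one captured message, assuming python3 is available on the producer host:

```shell
# Validate one captured payload with python3's JSON parser; a parse error exits non-zero
payload='{"schema": {"type": "struct", "fields": [{"type": "string", "optional": false, "field": "sensor_id"}]}, "payload": {"sensor_id": "s076", "temperature": 54.23, "humidity": 65.45, "timestamp": "2023-05-10T12:26:47Z"}}'
echo "$payload" | python3 -m json.tool > /dev/null && echo "valid JSON"
```

The same check can be run over the log file, e.g. `tail -n 1 sensors.log | python3 -m json.tool`.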

Step 2. Deploy OCI GoldenGate for Big Data and Create Connections for Apache Kafka and the OCI NoSQL Database using the OCI Console

OCI GoldenGate for Big Data allows you to integrate and stream data from Kafka to OCI NoSQL. To deploy OCI GoldenGate for Big Data, follow the guide here.

2.a) Create the Kafka OCI GG Connection:
Create a connection to the Apache Kafka bootstrap server from Step 1. Go to OCI Console > Oracle Database > GoldenGate > Connections > Create Connection.

Assign the connection to the OCI Big Data Deployment

2.b) Create OCI NoSQL Database OCI GG Connection:

First, create an API key for your user using a private-public key pair.

Create a connection to Oracle NoSQL: Go to OCI Console > Oracle Database > GoldenGate > Connections > Create Connection.

Step 3. Create the Extract (Capture) Process for Kafka

Go to the OCI GoldenGate for Big Data console URL.

3.a) Create a New credential for Kafka

As per Oracle’s official documentation, follow these steps to add a credential:

  1. Open the navigation menu, and then click Configuration.
  2. On the Credentials page, click Add Credential (plus icon).
  3. Enter the following details in the fields provided, and then click Submit:
  • For Credential Domain, enter OracleGoldenGate.
  • For Credential Alias, enter kafka.
  • For User ID, enter kafka://.
  • For Password, enter a password.
  • For Verify Password, enter the password again.
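If you prefer the command line, the same credential can also be created from the GoldenGate Admin Client instead of the web UI. A sketch of the equivalent command (verify the exact syntax against your GoldenGate version's documentation):

```
ALTER CREDENTIALSTORE ADD USER kafka:// PASSWORD <password> ALIAS kafka DOMAIN OracleGoldenGate
```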

3.b) Add Kafka Extract

Parameter File:

-- Parameter file for Kafka extract.
EXTRACT KFH
EXTTRAIL KI
-- TODO: Create a credential store entry with a userid prefix kafka://
SOURCEDB USERIDALIAS kafka DOMAIN OracleGoldenGate
JVMOPTIONS CLASSPATH $OGG_HOME/ggjava/ggjava.jar:$THIRD_PARTY_DIR/kafka/*
-- Connection alias: KafkaICN
TRANLOGOPTIONS _CONNECTIONID ocid1.goldengateconnection.oc1.ap-seoul-1.amaaaaaap77apcqath3ac2jsznxxnxsxiicy27s3wtcph54d6v63xme2jjta
TABLE source.*;

The Kafka topic is captured in real time, and the records are written to the trail file ‘KI’.
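Once the Extract is running, you can confirm its status and throughput from the deployment's Admin Client; for example, with standard GoldenGate commands (sketched here for the KFH Extract created above):

```
INFO EXTRACT KFH, DETAIL
STATS EXTRACT KFH
```

INFO shows the process status and checkpoint positions, while STATS reports the number of operations captured to the trail.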

Step 4. Create the Replicat (Apply) Process for OCI NoSQL

From the OCI GoldenGate console, go to ‘Overview’ > Create Replicat.

Parameters file:

REPLICAT NOSQLKFH
MAP source.sensors, TARGET source.kfhsensors;

Properties file:

# Properties file for Replicat NOSQLKFH
# NoSQL Handler Template
gg.handlerlist=nosql
gg.handler.nosql.type=nosql
gg.handler.nosql.connectionId=ocid1.goldengateconnection.oc1.ap-seoul-1.amaaaaaap77apcqadb7vzzgykj2twstqzn3yfmhxd57r3jnns3s54pgwahaq
gg.handler.nosql.compartmentID=ocid1.compartment.oc1..aaaaaaaak72343qs5qs7wvjvjxlqzfbahtdi2mritj3fwostqqbocvfnk7za
gg.handler.nosql.ddlHandling=CREATE,ADD,DROP
gg.handler.nosql.interactiveMode=true
gg.handler.nosql.storageGb=10
gg.handler.nosql.readUnits=50
gg.handler.nosql.writeUnits=50
gg.handler.nosql.mode=op
gg.classpath=$THIRD_PARTY_DIR/nosqlsdk/*
jvm.bootoptions=-Xmx512m -Xms32m

Step 5. Monitor the Data Streaming

5.a) Check the sensors.log file from Step 1; the producer script should still be writing data to the Kafka ‘sensors’ topic:

tail -f sensors.log

5.b) Check the Statistics of your Extract and Replicat

Extract Stats
Replicat Stats

5.c) Query the OCI NoSQL Database to check that the records from Kafka are being persisted to the NoSQL table in real time

Go to OCI Console > Databases > Oracle NoSQL Database > Tables

Make sure you are in the same compartment that you specified as the compartmentID in the properties file in Step 4 above. You should now see a table here called ‘kfhsensors’.

Query the table to check the records
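For example, from the table's query view in the OCI Console you can run a simple SQL-for-NoSQL statement against the table created by the Replicat mapping in Step 4:

```sql
-- Fetch a handful of rows from the target table
SELECT * FROM kfhsensors LIMIT 10
```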

We can see that all the records written to the Kafka topic are now being persisted to the OCI NoSQL Database.

Conclusion

The synergy of Apache Kafka, Oracle NoSQL, and OCI GoldenGate provides a solid foundation for real-time data streaming. The integration of these powerful tools gives businesses a streamlined path to making more effective data-driven decisions, thus providing them a leg up in a competitive marketplace.

Embracing OCI’s suite of managed services, which includes the NoSQL Cloud Database and GoldenGate, allows you to scale effortlessly and maintain reliable, high-performance real-time data processing. This arrangement not only alleviates the complexity and overhead associated with managing your data infrastructure but also guarantees high availability and durability of your data.

To sum up, crafting a streaming data pipeline from Apache Kafka to Oracle NoSQL Database on OCI using GoldenGate is a highly efficient method to handle and interpret real-time data. Let the data revolution propel your business forward!

In a future blog post I will demonstrate how to fork the Apache Kafka topic to write one stream to OCI NoSQL Database and another stream to persist data to Oracle Object Storage.


Oracle Developers and Oracle OCI Free Tier

Join our Oracle Developers channel on Slack to discuss this and other topics!

Build, test, and deploy your applications on Oracle Cloud — for free! Get access to OCI Cloud Free Tier!

Shadab Mohammad, Cloud Solutions Architect @ Oracle (The statements and opinions expressed here are my own and do not necessarily represent those of my employer)